Testing Software that Involves LLMs, and more
In this post you will read about testing software that involves LLMs, creating in-house benchmarks, the relationship between physical movement and emotion regulation, and my half marathon race.
In my Substack, deep random thoughts, I share a randomly selected set of my writing and updates every week. My posts relate to LLMs (and AI in general), product dev and UX, health (founders’ flavor), startup-related topics, and of course the events we run!
Asks and Announcements
One-Day Workshop on LLM Evaluation - add to your calendar
Last week at Aggregate Intellect!
This past week we needed to navigate a relatively complex situation regarding one of the deals we are working on. This sparked some internal conversations about our long-term strategy and how short-term necessities can be aligned with long-term priorities. It involved reviving some of the strategy work we had been doing previously and adapting it to our more recent view of how we should run the business.
One of the interesting things was figuring out an effective way to communicate strategy so that, both internally and in interactions with anyone who wants to help us, we can maximize alignment and minimize misunderstanding. “Business canvas” type artifacts are pretty common for this purpose, but I don’t particularly like them. I had previously expanded that idea with some details I think are important to make explicit, for example how different decisions impact each other, and called it the “business control panel”. So dusting that off and adding new details to it was a helpful exercise.
Last week’s unsung hero
This past week I spent quite a bit of time bouncing ideas off Osh, both in the context of the contract negotiation with the client and our long-term strategy.
Thank you, Osh, for being a great mentor and partner.
LLM Stuff
In our recent LLM workshop, Matt Lewis discussed the role of augmented intelligence (AI) in life sciences, emphasizing the synergy between AI and humans, the potential of augmented intelligence to transform the field, and the challenges of implementing change and innovation in organizations.
Topics:
-------
∎ Definition of Augmented Intelligence
* Augmented intelligence is the intentional design of software by humans to enhance cognitive performance and decision-making.
* AI and human collaboration can significantly reduce error rates and improve effectiveness in various areas of life sciences and healthcare.
∎ Transforming Life Sciences with Augmented Intelligence
* Augmented intelligence has the potential to revolutionize insight generation, content generation, strategic decision-making, and customer interactions in life sciences.
* Major professional societies encourage viewing AI as a tool to enhance work rather than a threat to jobs.
∎ Challenges of Implementing Change and Innovation
* Implementing change and innovation requires a comprehensive understanding of the stages of change, adoption curve, organizational resistance, post-implementation challenges, regulatory frameworks, critical mass adoption, and the importance of trust.
* Change requires more than just champions advocating for it.
1. How do we test systems based on Large Language Models, and what challenges do they pose compared to traditional software testing?
2. What practical implementations can be employed to ensure the robustness and reliability of components interacting with LLMs, using the Sherpa project as a case study?
3. What are the latest updates to the Sherpa project, and what open questions and challenges will be addressed in the next phase?
Join Percy (Boqi Chen), a PhD student at McGill University, as he dives into the world of model-driven software engineering, trustworthy AI, and verification for ML systems. In this workshop, we'll explore the intriguing realm of testing LLM-based systems, gaining insights into the unique challenges they pose compared to traditional software testing approaches. This is an opportunity to understand how the open-source project Sherpa, a "thinking companion," ensures the robustness and reliability of components interacting with LLMs.
Percy brings a wealth of experience to the table through his PhD work at McGill University, which focuses on model-driven software engineering and trustworthy AI. He has not only contributed actively to the academic landscape but has also dealt with the real-world applications and implications of AI, making him a credible source for this insightful session.
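To make the testing theme concrete, here is a minimal sketch of the general idea: test properties of an LLM component's output rather than exact strings, and stub the model so tests stay deterministic. This is my own illustration, not code from Percy's talk or the Sherpa codebase; `summarize_ticket` and the fake model are hypothetical.

```python
# A sketch of unit-testing a component that wraps an LLM. The component is
# hypothetical; the point is checking output *properties* (valid JSON, an
# allowed priority value) with the model stubbed out for determinism.

import json
import unittest

def summarize_ticket(ticket_text: str, llm_call) -> dict:
    """Hypothetical component under test: asks the LLM for a JSON summary."""
    raw = llm_call(f"Summarize as JSON with keys 'title' and 'priority':\n{ticket_text}")
    data = json.loads(raw)  # fails loudly if the model returns non-JSON
    if data.get("priority") not in {"low", "medium", "high"}:
        raise ValueError(f"unexpected priority: {data.get('priority')}")
    return data

class TestSummarizeTicket(unittest.TestCase):
    def test_valid_output_passes_property_checks(self):
        fake_llm = lambda prompt: '{"title": "Login fails", "priority": "high"}'
        result = summarize_ticket("Users cannot log in since v2.3", fake_llm)
        self.assertIn(result["priority"], {"low", "medium", "high"})

    def test_malformed_output_is_rejected(self):
        fake_llm = lambda prompt: "Sure! Here is your summary: ..."
        with self.assertRaises(json.JSONDecodeError):
            summarize_ticket("anything", fake_llm)

if __name__ == "__main__":
    unittest.main()
```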
1. How can we create custom evaluation metrics for our unique use-cases with Large Language Models?
2. What strategies are effective for managing model drift in real-world applications?
3. What methodologies can be employed to accurately test model accuracy across diverse tasks?
Join us for an insightful workshop session with Abi Aryan, a seasoned machine learning engineer, and learn essential insights into developing and managing Large Language Models (LLMs)!
Abi's expertise lies in machine learning infrastructure design and building production-level applications at scale. She is the founder of Abide AI, and is also the author of the upcoming book "LLMOps: Managing Large Language Models in Production" for O'Reilly Publications. Abi's vast experience includes being a Venture Capital Fellow at Laconia, a machine learning scientist at ASSENT, and a research scholar at UCLA. Her talk on developing in-house benchmarks addresses the critical need for tailored evaluation metrics and strategies in the rapidly evolving field of machine learning.
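As a taste of what an in-house benchmark can look like, here is a minimal sketch of a harness that scores a model over a fixed dataset with a custom metric; tracking that score across releases is one simple way to notice drift. This is my own illustration, not code from Abi's talk or book, and `run_benchmark`, `exact_match`, and the stub model are hypothetical names.

```python
# A sketch of an in-house benchmark harness: a fixed dataset of
# (input, reference) pairs plus a pluggable metric function.

from statistics import mean
from typing import Callable

def exact_match(prediction: str, reference: str) -> float:
    """Toy metric: 1.0 if the normalized strings match, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())

def run_benchmark(model_fn: Callable[[str], str],
                  dataset: list[tuple[str, str]],
                  metric: Callable[[str, str], float]) -> float:
    """Score a model over a fixed dataset; comparing this number across
    model or prompt changes is a cheap way to catch drift."""
    scores = [metric(model_fn(inp), ref) for inp, ref in dataset]
    return mean(scores)

if __name__ == "__main__":
    # 'model_fn' would normally call your LLM; a stub keeps this runnable.
    dataset = [("capital of France?", "Paris"), ("2 + 2 =", "4")]
    stub_model = lambda q: "Paris" if "France" in q else "4"
    print(f"exact-match score: {run_benchmark(stub_model, dataset, exact_match):.2f}")
```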
1. How can we ensure the trustworthiness of generative AI outputs, such as those from OpenAI or LLama 2?
2. What are the standard tools used to validate outputs from language models?
3. How can these tools address the challenge of stochastic generative output?
Join us for an insightful session with Benjamin Labaschin, Principal MLE and founding engineer at Workhelix. With extensive experience as a data scientist and economist, Ben has played pivotal roles in companies like Hopper, XPO Logistics, Great Learning, Revantage, and Arity. Come hear how he uses the typical "tools of the trade" to meet 90% of his needs when working with LLMs.
Ben's extensive background showcases his expertise in implementing practical solutions, teaching data science fundamentals, and leading teams in both academia and industry. His experience ranges from saving millions in shipment prioritization to creating anomaly detection services that recover misallocated funds.
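One common "tools of the trade" pattern for taming stochastic generative output is schema validation with retries; the sketch below uses Pydantic (v2) for the validation step. This is my own illustration of the general idea, not necessarily the exact stack Ben covers, and `parse_with_retries` and the stub model are hypothetical.

```python
# A sketch of validating stochastic LLM output: parse into a Pydantic model
# and re-sample on failure, since any single generation may be malformed.
# Assumes `pip install pydantic` (v2).

import json
from pydantic import BaseModel, ValidationError, field_validator

class ProductReview(BaseModel):
    rating: int
    summary: str

    @field_validator("rating")
    @classmethod
    def rating_in_range(cls, v: int) -> int:
        if not 1 <= v <= 5:
            raise ValueError("rating must be between 1 and 5")
        return v

def parse_with_retries(llm_call, prompt: str, max_attempts: int = 3) -> ProductReview:
    """Re-sample the model until its output parses and validates."""
    last_error = None
    for _ in range(max_attempts):
        raw = llm_call(prompt)
        try:
            return ProductReview(**json.loads(raw))
        except (json.JSONDecodeError, ValidationError) as err:
            last_error = err  # malformed sample; try again with a fresh one
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_error}")

if __name__ == "__main__":
    stub = lambda p: '{"rating": 4, "summary": "Solid product."}'
    print(parse_with_retries(stub, "Extract a review as JSON."))
```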
Some resources from our Slack channel (join):
Podcasts
Here are the podcasts I enjoyed this week:
Book Me