Enhancing AI Agents with Causality
Given the remarkable brute-force power we have access to, namely lots of data and computational power, is causal inference the "light weight and feathers" of cognition?
We recently hosted Ali Madani for an insightful session on the intersection of AI agents and causality, exploring a fundamental question that rarely gets enough attention: can AI agents truly make reliable decisions without understanding cause and effect?
The distinction between correlation and causation is like the difference between telling students to skip exams to avoid weight gain (because exams correlate with weight gain) and addressing the actual causal chain (exams → stress eating → weight gain). This example, shared during the session, perfectly illustrates why causal reasoning matters in practical applications.
A few years ago I read The Book of Why and, as a physicist, I really enjoyed it. The book explores causality: how we determine cause-and-effect relationships rather than mere correlations. It argues that traditional statistical methods like correlation and regression, the foundations of most of what we do in ML, are insufficient for understanding causality. It introduces a causal inference framework based on causal diagrams and do-calculus, which lets us answer counterfactual questions like "What would have happened if X had not occurred?" It contrasts three levels of causation using the "Ladder of Causation":
Association (Seeing) – Correlation and pattern recognition (e.g., "Smokers tend to get lung cancer").
Intervention (Doing) – Understanding the effects of actions (e.g., "What happens if we ban smoking?").
Counterfactuals (Imagining) – Reasoning about alternate realities (e.g., "Would this person have avoided cancer if they had never smoked?").
The book critiques traditional statistical methods (like those used in machine learning) for their reliance on correlation without causal understanding. It also discusses real-world applications in medicine, economics, AI, and social sciences.
The math we use in science often relies heavily on counterfactuals to understand fundamental assertions that generalize very broadly within the boundaries of their assumptions (think F = ma and such). In physics, for instance, sparse causal relationships enable tremendous generalizability. As Ali illustrated: "Newton didn't have millions of data points, it was an apple and then all the experiments and then he came up with the formulas, it worked out."
By identifying similarly sparse causal relationships in other domains, we might achieve comparable generalizability without the massive datasets that correlation-based approaches require. This is one of the most compelling aspects of marrying causality with classical ML: the hope of improving generalization with less data, addressing a fundamental challenge in traditional machine learning.
After a few years, I still believe that causal inference can be a significant addition to how we do AI, but I have moderated my view of it from "absolutely necessary" to "practically useful". My go-to analogy for this kind of thing is flight: how nature flies is mechanistically very different from how humans fly. "Artificial" flight leverages a remarkable brute-force power called the jet engine to lift a significantly heavier object off the ground. That means the absolutely necessary properties of natural flight, like light weight, feathers, and wings, become largely irrelevant. The question I'm struggling with these days is this: given the remarkable brute-force power we have access to, namely lots of data and computation, is causal inference the "light weight and feathers" of cognition?
Well, I only think about that question when I have my philosopher hat on. When I have my pragmatic AI company hat on, I spend a lot of time thinking about how causal structures can create scaffolding for the agentic systems we build for commercial and research purposes. While there is a remarkably successful effort underway to build reasoning into the statistical models we use and love (R1, say), I think it remains important, in parallel and for practical applications, to think about causality and counterfactual reasoning when designing agentic systems, especially those involving multi-agent interactions, autonomous decision-making, and adaptive learning.
Now let’s get into some notes from the session with Ali.
The Promise and Limitations of AI Agents
AI agents, at their core, are systems designed to interact with their environment through an iterative process of assessment, information processing, and autonomous decision-making. They're characterized by their ability to learn, adapt, and operate with varying degrees of independence. The recent explosion of large language models has accelerated interest in these agents, particularly for their potential to automate complex tasks across industries.
In healthcare alone, AI agents could revolutionize prevention, detection, diagnosis, and patient monitoring, not by replacing doctors, but by handling repetitive tasks and providing real-time support. The economic implications are significant, with potential cost reductions across multiple sectors.
But here's where things get interesting: most AI systems today operate primarily on correlative relationships rather than causal ones. This creates a fundamental limitation.
The Correlation Trap
"If you go correlative and identify association between different variables, you can see that exams definitely have correlation with gaining weight," Ali noted. "So many students go through stress eating through exams and they gain weight."
Imagine we want to recommend actions to help students avoid weight gain. Data analysis might show a strong correlation between exams and weight gain. A purely correlative approach might suggest the absurd recommendation to "avoid taking exams" to prevent weight gain. However, a causal understanding reveals that exams cause stress eating, which then causes weight gain. With this causal chain identified, we can make more meaningful recommendations targeting the actual mechanism (stress eating) rather than the initial trigger (exams).
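To make this concrete, here is a minimal simulation (my own illustrative numbers, not from the session) of the chain exams → stress eating → weight gain. Observationally, exams correlate strongly with weight gain; but intervening on the mediator, do(stress eating = 0), removes the effect entirely, which is exactly what the causal chain predicts:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def simulate(intervene_on_stress=None):
    """Simulate the causal chain exams -> stress eating -> weight gain."""
    exams = rng.binomial(1, 0.5, n)                # exam period or not
    stress = exams * 0.8 + rng.normal(0, 0.1, n)   # exams drive stress eating
    if intervene_on_stress is not None:            # do(stress = value)
        stress = np.full(n, float(intervene_on_stress))
    weight = 2.0 * stress + rng.normal(0, 0.1, n)  # stress eating drives weight gain
    return exams, stress, weight

# Observational data: exams look strongly "responsible" for weight gain.
exams, stress, weight = simulate()
print(np.corrcoef(exams, weight)[0, 1])            # strong correlation

# Intervene on the mediator: do(stress eating = 0).
exams_i, _, weight_i = simulate(intervene_on_stress=0.0)
print(weight_i[exams_i == 1].mean())               # near zero: exams no longer matter
```

The "avoid exams" recommendation fails because it targets the trigger; intervening on the mediator is what actually moves the outcome.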
This example highlights why correlation isn't enough for truly intelligent systems. Without causality, AI agents risk making recommendations based on spurious correlations, like the correlation between wind in Taiwan and Googling “I’m tired”.
The problem extends beyond obvious examples. In drug discovery, researchers spend years designing chemical compounds without knowing if they'll have the expected effect on patients. Some causal relationships remain unknown even to human experts, creating a significant challenge for AI systems.
Bringing Causality to AI Agents
There are several approaches to incorporating causality into AI agents, each with different applications:
1. Randomized Interventions
The gold standard for establishing causality involves randomized interventions, where confounding variables are controlled through randomization. This approach is widely used in clinical trials and allows for direct measurement of causal effects:
Causal Effect = Outcome(Treatment) - Outcome(Control)
While powerful, randomization isn't always feasible due to cost constraints or ethical considerations. As Ali noted, "From an ethical perspective in many situations, for example in the case of drugs, we cannot test every single thing that we hypothesize to work on human beings."
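As a sketch of why randomization works, the toy example below (synthetic data with an assumed true effect of 1.5) includes a confounder that influences the outcome; because treatment is assigned by coin flip, a plain difference of means still recovers the causal effect:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# A confounder (e.g., baseline health) that affects the outcome.
confounder = rng.normal(0, 1, n)
# Randomized assignment: independent of the confounder by construction.
treat = rng.binomial(1, 0.5, n)
# Outcome depends on treatment (true effect = 1.5) and the confounder.
outcome = 1.5 * treat + 2.0 * confounder + rng.normal(0, 1, n)

# Causal Effect = Outcome(Treatment) - Outcome(Control)
ate = outcome[treat == 1].mean() - outcome[treat == 0].mean()
print(ate)  # close to the true effect, 1.5
```

If assignment were instead driven by the confounder, the same difference of means would be biased; randomization is what licenses the simple formula above.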
2. Causal Discovery Algorithms
These algorithms aim to generate directed acyclic graphs (DAGs) that represent causal relationships between variables. Unlike correlation, which merely shows association, these graphs reveal directionality: which variables cause changes in which others.
For scenarios where controlled experiments aren't possible, causal discovery algorithms can extract causal relationships from observational data:
"We have causal discovery algorithms that aim to generate causal graphs and directed acyclic graphs... when you provide these values across variables into some of these causal discovery algorithms, what they try to do is to check some of the causality assumptions and at the end generates a directed acyclic graph for you."
These algorithms come in two main varieties:
Statistical methods (traditional constraint-based or score-based approaches like PC)
Machine learning-based gradient algorithms (more computationally efficient)
What's particularly valuable is that these approaches don't require massive datasets; hundreds or thousands of data points can suffice, making them practical for many real-world applications.
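Here is a tiny from-scratch sketch of the constraint-based idea (not the full PC algorithm, and using simple partial correlation as the independence test). For a ground-truth chain X → Y → Z, X and Z are correlated marginally, but become independent once we condition on Y, so a constraint-based method drops the X–Z edge:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000  # a few thousand points is enough

# Ground truth: X -> Y -> Z (no direct X -> Z edge).
X = rng.normal(0, 1, n)
Y = 0.9 * X + rng.normal(0, 1, n)
Z = 0.9 * Y + rng.normal(0, 1, n)

def partial_corr(a, b, given):
    """Correlation of a and b after regressing `given` out of both."""
    g = np.column_stack([np.ones_like(given), given])
    ra = a - g @ np.linalg.lstsq(g, a, rcond=None)[0]
    rb = b - g @ np.linalg.lstsq(g, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]

# Step 1: marginal correlation alone would draw an X--Z edge.
print(abs(np.corrcoef(X, Z)[0, 1]))   # clearly nonzero

# Step 2: conditioning on Y makes X and Z independent, so the edge is removed.
print(abs(partial_corr(X, Z, Y)))     # near zero: X and Z are linked only via Y
```

Constraint-based methods like PC repeat this kind of conditional-independence test systematically over all variable pairs and conditioning sets to prune and orient the graph.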
3. Causal Representation Learning
This emerging field aims to learn representations that reveal unknown causal structures. It's based on a fundamental insight from physics: most phenomena are governed by a sparse set of causal rules rather than thousands of continuous features.
This fundamentally differs from traditional representation learning. While traditional approaches summarize raw features into latent variables, causal representation learning aims to uncover the underlying causal structure of the data.
This approach draws inspiration from physics, where sparse sets of fundamental rules determine complex phenomena. As Ali explained: "We have a sparse set of rules that determine a specific phenomena... those rules are based on the causal roots... like gravity for example, the electromagnetic rules."
This sparsity principle applies across domains. In cancer research, for instance, while there isn't a single gene causing poor outcomes, we don't expect thousands of genes to be equally responsible either. Causal representation learning seeks to identify these sparse causal factors.
4. Large Language Models and Causality
While LLMs weren't explicitly trained for causal reasoning, research has shown they can effectively tackle certain causal tasks with proper prompting. A paper highlighted during the session demonstrated that models like GPT-4 can achieve up to 96% accuracy in identifying known pairwise causal relationships.
The key lies in smart but simple prompting strategies. Rather than asking broadly about causal relationships between multiple variables, researchers found success by asking direct questions like: "Which cause and effect relationship is more likely: changing A causes a change in B, or changing B causes a change in A?"
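A minimal sketch of that prompting strategy; the wording below is an illustrative template of my own, not the exact prompt from the paper:

```python
def pairwise_causal_prompt(a: str, b: str) -> str:
    """Build a direct pairwise causal question of the kind described
    in the session (illustrative template, not the paper's exact prompt)."""
    return (
        "Which cause-and-effect relationship is more likely?\n"
        f"(A) Changing {a} causes a change in {b}.\n"
        f"(B) Changing {b} causes a change in {a}.\n"
        "Answer with the single letter A or B."
    )

# Send the result to any LLM API as a user message.
print(pairwise_causal_prompt("altitude", "air pressure"))
```

Narrowing the question to a forced choice between two directions is what makes the task tractable for the model, compared with asking it to reason about a whole multivariate graph at once.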
Importantly, LLMs excel at retrieving known causal relationships but cannot uncover novel ones:
"This way of using the large language models replace the experts for graph generation... But it doesn't uncover unknown relationship."
The key insight: domain knowledge is crucial. LLMs can only identify causal relationships they've encountered during pre-training.
This creates a natural categorization of causal tasks for AI agents:
Known causal relationships: LLMs can reliably retrieve these (e.g., smoking causing lung cancer)
Abundant data but unclear causality: Areas where causal discovery algorithms might help (e.g., sales data, web page optimization)
Unknown relationships: Domains requiring experimental validation and specialized causal learning algorithms (e.g., novel drug discovery)
5. Reinforcement Learning and Causality
The final piece of the puzzle involves using reinforcement learning to improve AI agents' causal reasoning. By providing feedback based on causal relationships, either from experts, experiments, or causal modeling, we can fine-tune models to make better causal inferences over time.
"The success of large language models was partially related to reinforcement learning... putting the transformers-based large language models and reinforcement learning for providing the feedback and fine-tuning and penalizing them and rewarding them resulted in huge success in the field."
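A hedged sketch of what such causal feedback could look like: score a model's claimed causal edge against a reference graph (here a toy hand-coded set standing in for expert or experimental knowledge), producing a reward signal that an RL fine-tuning loop could consume:

```python
# Toy reference graph: a stand-in for expert- or experiment-derived knowledge.
REFERENCE_EDGES = {
    ("smoking", "lung cancer"),
    ("exams", "stress eating"),
    ("stress eating", "weight gain"),
}

def causal_reward(claimed_cause: str, claimed_effect: str) -> float:
    """Score a model's claimed causal edge against the reference graph:
    +1 for a correct edge, -1 for a reversed edge, 0 when the pair is unknown."""
    if (claimed_cause, claimed_effect) in REFERENCE_EDGES:
        return 1.0
    if (claimed_effect, claimed_cause) in REFERENCE_EDGES:
        return -1.0
    return 0.0

print(causal_reward("smoking", "lung cancer"))        # rewarded
print(causal_reward("weight gain", "stress eating"))  # penalized: direction reversed
```

In a real pipeline the reference knowledge would come from experts, experiments, or causal models rather than a hand-written set, and the scalar reward would feed a standard RL fine-tuning objective.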
Practical Applications Across Domains
The integration of causality with AI agents offers compelling applications:
Healthcare
More accurate diagnosis through root cause identification
Prevention and detection capabilities
Patient monitoring with causal understanding
Treatment recommendation based on causal effects
Business Applications
Understanding true drivers of sales beyond correlations
Designing effective A/B tests to measure intervention impacts
Web optimization based on causal rather than correlative insights
Drug Discovery
Target identification for different cancer types
Biomarker discovery for drug response prediction
Analysis of treatment regimens and patient journeys
Conclusion
What struck me most about Ali's presentation wasn't a single breakthrough technique, but rather the recognition that enhancing AI agents with causality requires integration across multiple approaches. It's not about waiting for perfect causal reasoning models, but about strategically incorporating causal thinking into existing systems.
As AI agents become more integrated into critical domains like healthcare, finance, and education, their ability to reason causally will directly impact human lives. An AI that recommends interventions based on genuine causal understanding rather than statistical correlation is not just more accurate, it's more trustworthy.
The future of AI isn't just about bigger models or more data, it's about smarter reasoning. And at the heart of smarter reasoning lies causality: understanding not just what happens, but why.
Q&A
Q: What's the relationship between reasoning models like R1 and causality?
A: While these models demonstrate impressive capabilities, true reasoning arguably requires causal understanding. Ali suggested we don't need to wait for perfect causal reasoning models, we can immediately begin providing causal feedback to existing models through reinforcement learning approaches while developing more fundamentally causal architectures in parallel.
Q: How does causal representation learning differ from traditional representation learning?
A: Traditional representation learning summarizes raw features into latent variables, while causal representation learning aims to uncover underlying causal structures. The latter involves additional assumptions beyond traditional IID (independent and identically distributed) assumptions, with the goal of identifying sparse causal relationships that enable better generalization and out-of-distribution performance.
Q: Can you give practical examples of how causality has helped in your work?
A: In drug discovery, Ali's team has used causal discovery and inference to identify new gene targets for different cancer types. They've also applied causal approaches to biomarker discovery, identifying underlying mechanisms related to drug responses. While specific results remain confidential, these applications demonstrate the practical value of causal approaches in real-world settings.