Phantom Pipeline: Why I Built a Liar-Catcher for My Forecasts
In enterprise forecasting, AI hallucinations aren't a quirk. They're a firing offense.
My AI forecast cited a deal in "Perception Analysis" stage with a 75% close probability.
The problem? That stage doesn't exist. The probability was invented. The AI had fabricated Salesforce-sounding terminology that appeared nowhere in my CRM—and delivered it with complete confidence.
Phantom pipeline destroys credibility, misallocates resources, and burns trust with leadership. I needed AI that was terrified of being wrong.
Why build this myself?
Salesforce ships Agentforce with the Atlas Reasoning Engine—enterprise-grade AI with built-in guardrails. But I built my own. Why?
Simple: I wanted to understand how these systems actually work, not just consume them. Reading architecture docs is one thing. Implementing a working agentic loop—even a simplified one—teaches you what the docs leave out.
Why use an LLM for forecasting at all?
Forecasts are about numbers. LLMs are bad at math. Seems like a mismatch.
But here's the insight: forecasting isn't just arithmetic. It's interpretation.
Rich data matters more than raw totals. Pipeline value tells you nothing about deal health. You need to analyze next steps, competitive situation, close plans, compelling events—the qualitative signals buried in CRM notes. LLMs excel at surfacing red flags that numbers alone won't show.
Tools solve the math problem. The agentic loop gives the LLM calculators, not calculation tasks. It chooses which tools to run, I feed it the results, it interprets what they mean. We're not asking AI to add—just to understand.
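To make the "calculators, not calculation tasks" idea concrete, here's a minimal sketch. The tool names (`weighted_pipeline`, `coverage_ratio`) and the tool-call shape are hypothetical, not the actual implementation: the model only emits a tool name and arguments, and the host runs the arithmetic.

```python
# Hypothetical tools the host executes on the model's behalf.
def weighted_pipeline(deals):
    """Sum of amount * probability across open deals."""
    return sum(d["amount"] * d["probability"] for d in deals)

def coverage_ratio(pipeline_total, quota):
    """How many times quota is covered by weighted pipeline."""
    return round(pipeline_total / quota, 2)

TOOLS = {
    "weighted_pipeline": weighted_pipeline,
    "coverage_ratio": coverage_ratio,
}

def run_tool(call):
    """Execute a tool call the model emitted, e.g.
    {"tool": "coverage_ratio", "args": {...}}. The model never
    does the math itself; it only interprets the returned numbers."""
    return TOOLS[call["tool"]](**call["args"])

deals = [
    {"amount": 100_000, "probability": 0.6},
    {"amount": 50_000, "probability": 0.3},
]
total = run_tool({"tool": "weighted_pipeline", "args": {"deals": deals}})
ratio = run_tool({"tool": "coverage_ratio",
                  "args": {"pipeline_total": total, "quota": 25_000}})
print(total, ratio)  # 75000.0 3.0
```

The LLM's job starts after the `print`: explaining whether 3x coverage is healthy given what it sees in the deal notes.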
Context is everything. What does a manager forecast "IN" mean? How does attrition affect KPIs? When does the fiscal year start? Without this context via RAG, the LLM is guessing. With it, the LLM is reasoning.
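The retrieval step can be sketched like this. The glossary entries are illustrative, and the scoring is naive keyword overlap purely to show the shape; a real RAG setup would use embeddings.

```python
# Illustrative glossary of forecasting context; contents are made up.
GLOSSARY = [
    "A manager forecast of IN means the deal is committed for the quarter.",
    "Attrition reduces run-rate revenue and is subtracted from growth KPIs.",
    "The fiscal year starts on February 1.",
]

def retrieve(question, docs, k=1):
    """Return the k snippets sharing the most words with the question.
    Stand-in for embedding similarity search."""
    q = set(question.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

# Retrieved context is prepended to the prompt so the model reasons
# from it instead of guessing from training data.
context = retrieve("When does the fiscal year start?", GLOSSARY)
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: ..."
print(context[0])  # The fiscal year starts on February 1.
```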
The defense system
Knowing hallucinations would happen, I built three layers:

Evidence first. The LLM never operates from memory. Instead, it follows a ReAct loop: reason about what data is needed, select a tool, execute it, observe the results, decide if more is needed. This iterative process means the AI builds its understanding from retrieved evidence, not training data. It doesn't "know" that Stage 4 deals typically close at 75%—it only sees what's actually in my CRM right now.
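The loop above reduces to a small control structure. This is a skeleton, not the production code: the scripted model stands in for the real LLM call (which would return its next action as JSON), and the single CRM tool is hypothetical.

```python
def scripted_model(transcript):
    """Stand-in for the LLM: with no evidence yet, ask for CRM data;
    once an observation exists, answer from it."""
    if not any(step["type"] == "observation" for step in transcript):
        return {"type": "action", "tool": "get_opportunities", "args": {}}
    return {"type": "answer", "text": "2 open deals in the CRM right now."}

# Hypothetical tool returning live CRM rows.
TOOLS = {
    "get_opportunities": lambda: [{"name": "Acme"}, {"name": "Globex"}],
}

def react_loop(model, max_steps=5):
    """Reason -> select tool -> execute -> observe, until the model answers."""
    transcript = []
    for _ in range(max_steps):
        step = model(transcript)
        if step["type"] == "answer":
            return step["text"], transcript
        # Execute the chosen tool and feed the observation back in,
        # so every claim is grounded in retrieved evidence.
        result = TOOLS[step["tool"]](**step["args"])
        transcript.append({"type": "observation", "tool": step["tool"],
                           "result": result})
    raise RuntimeError("no answer within step budget")

answer, transcript = react_loop(scripted_model)
print(answer)  # 2 open deals in the CRM right now.
```

The key property: the model cannot answer until the transcript contains observations, so its understanding is built from what the tools returned, not from memory.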

Structured output. Every response must be valid JSON matching a strict schema. This powers my generative UI system—the AI outputs structured components, not freeform text. Malformed output gets auto-repaired or rejected. But here's the gap: the schema validates structure, not meaning. An invented stage name like "Perception Analysis" passes validation just fine. That's where the fact-checker earns its keep.
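A minimal version of that structural gate, assuming a hypothetical `insight_card` component with three required keys. The repair step handles the most common LLM slip: JSON wrapped in markdown fences or trailing prose. Note the gap discussed above is visible here too: an invented stage name in `body` would sail straight through.

```python
import json

REQUIRED = {"component", "title", "body"}  # assumed schema keys

def repair(raw):
    """Strip markdown fences and anything outside the outermost braces."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found")
    return raw[start:end + 1]

def validate(raw):
    """Parse (repairing if needed) and check required keys; this
    validates structure only, never the truth of the values."""
    data = json.loads(repair(raw))
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

raw = '```json\n{"component": "insight_card", "title": "Acme", "body": "..."}\n```'
card = validate(raw)
print(card["component"])  # insight_card
```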
The fact-checker. Before any response reaches me, a separate LLM call validates claims against source data. I'm using AI to catch AI: "Does this response contain anything not supported by the evidence?" The same model that might hallucinate is surprisingly good at spotting hallucinations when given an explicit verification task. If a claim can't be verified, it gets flagged or stripped.
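The verification pass looks roughly like this. In the real system, `sentence_is_grounded` is a second LLM call with an explicit "is this supported by the evidence?" prompt; the stand-in below keeps the same interface but just checks that quoted terms actually appear in the source data. The evidence string and draft are illustrative.

```python
import re

EVIDENCE = "Stage: Negotiation. Next step: align on mutual close plan."

def sentence_is_grounded(sentence, evidence):
    """Stand-in for the verifier LLM call: reject any sentence
    that quotes a term absent from the evidence."""
    quoted = re.findall(r"'([^']+)'", sentence)
    return all(term in evidence for term in quoted)

def strip_hallucinations(draft, evidence):
    """Keep only sentences the verifier can ground in source data."""
    sentences = [s.strip() for s in draft.split(".") if s.strip()]
    kept = [s for s in sentences if sentence_is_grounded(s, evidence)]
    return ". ".join(kept) + "."

draft = ("You are in the 'Negotiation' stage. "
         "Ensure no friction in the 'Perception Analysis' phase.")
print(strip_hallucinations(draft, EVIDENCE))
# -> You are in the 'Negotiation' stage.
```

The second sentence dies because "Perception Analysis" appears nowhere in the evidence, which is exactly what happened in the example below.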
Caught in the act
Real example from today—January 24, 2026.
The AI drafted a response about one of my opportunities. The grounding layer caught these fabrications:
| What the AI claimed | Reality |
|---|---|
| "probability of closing typically jumps to 75%" | Not in source data |
| "Perception Analysis phase that follows" | Invented stage name |
| "Manager Forecast Judgment (MFJ)" | Not mentioned in data |
| "attrition concerns being raised" | Fabricated |
What I almost received: "This is a pivotal phase where the probability of closing typically jumps to 75%... ensure there's no friction in the 'Perception Analysis' phase..."
What I actually received: "This is a pivotal phase where you are actively negotiating financial details and mutual plan alignment. Securing this deal would single-handedly shift the outlook of your entire pipeline."
The liar-catcher stripped the hallucinations before they reached me.
The point
We talk about AI replacing managers. But right now, AI is the over-eager junior rep who pads the numbers to look good. We don't need smarter AI. We need AI that's terrified of being wrong.
I still use AI insights to challenge my own analysis. The AI complements my judgment—it doesn't replace it. But I only trust it because I built the verification layer myself.
Trust, but verify? No.
Verify, then trust.
But here's the thing
The liar-catcher isn't about distrusting AI. It's about trusting it appropriately.
When constrained properly, AI saves me hours every week. It surfaces patterns across dozens of opportunities that I'd miss manually. It catches stalled deals, forgotten competitors, missing next steps. It drafts narratives grounded in actual data so my forecasts come with explanations, not just numbers.
I built guardrails so I could stop worrying about hallucinations and start benefiting from the speed and insight that makes this technology genuinely valuable.