Predicting campaign ROI before launch—how reliable are AI forecasts actually, and when should I trust them?

I got burned last month. We launched a campaign based on AI-generated performance predictions, and the results were… well, not great. The model forecast a 4.5x ROAS; we hit 2.1x. That got me thinking about how much I should actually trust these predictions, and honestly, I’m skeptical now.

But before I dismiss AI forecasting entirely, I want to understand what went wrong. The model we used pulls from historical cross-market data—past influencer campaigns, engagement patterns, audience demographics—and generates predictions for new campaigns. In theory, that should work. More data usually means better predictions, right?

Except… markets aren’t identical. What works in the US might not work in Russia, even with the same product category. Seasonality, cultural trends, influencer credibility—all of those shift. And I’m wondering if the AI model I was using was actually accounting for those differences, or just averaging them out.

I’ve started validating predictions differently now. For campaigns with AI forecasts above a certain confidence threshold, I’m cross-referencing with:

  • Historical campaign performance for similar creators in the same market
  • Engagement quality metrics, not just volume
  • Seasonality adjustments for the specific launch window
  • Input from team members who know the market intimately

When I layer all of that together, the predictions feel more grounded. They’re not always right, but they’re more honest about their uncertainty.

Here’s what I’m wrestling with: is the issue with AI forecasting in general, or with how I’m applying it? Are you guys using predictive models for campaign planning? And more importantly—what made you trust or distrust the predictions you got? What signals actually mattered when the campaign went live?

ROAS prediction failures almost always stem from one of three issues: model training data quality, mismatched market conditions, or incorrect outcome definition. Let me break this down.

First, what data was your model trained on? If it was trained primarily on successful campaigns, it’s going to systematically overpredict because it’s not learning from failure modes. You need balanced training data that includes underperforming campaigns, seasonal dips, and market-specific variations.

Second, ROAS is a terrible outcome variable for early prediction because it conflates two independent problems: volume and margin. A campaign can miss volume targets by 40% but still hit ROAS if the product margin is high, or vice versa. I’d recommend predicting engagement lift or conversion rate separately, then multiplying those by your cost structure to get to ROAS.
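To make that concrete, here’s a rough sketch of what I mean by building ROAS up from its components instead of predicting it in one shot. Every name and number below is a placeholder, not data from a real campaign; the point is just that the funnel pieces get predicted (or assumed) separately and only combined into ROAS at the end.

```python
# Build a ROAS estimate from separately predicted funnel metrics.
# All inputs below are hypothetical placeholders.

def forecast_roas(predicted_impressions: float,
                  predicted_ctr: float,
                  predicted_conversion_rate: float,
                  avg_order_value: float,
                  campaign_cost: float) -> float:
    """Combine funnel-level predictions into a ROAS estimate."""
    clicks = predicted_impressions * predicted_ctr
    orders = clicks * predicted_conversion_rate
    revenue = orders * avg_order_value
    return revenue / campaign_cost

# e.g. 500k impressions, 1.2% CTR, 3% conversion, $60 AOV, $5k spend
print(forecast_roas(500_000, 0.012, 0.03, 60.0, 5_000))  # ~2.16x
```

The payoff is that when a campaign misses, you can see which component missed instead of staring at one collapsed number.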

Third—and this is critical—are you re-training your model with actual campaign results? Every campaign you run is a data point that should feed back into the model. If you’re not doing that, your predictions will systematically diverge from reality over time.
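If it helps, that feedback loop can be as simple as logging every finished campaign somewhere the next retrain reads from. A minimal sketch with a made-up file name and schema, not a prescription for any particular tool:

```python
# Append each finished campaign so future retrains can learn from it.
# File name and record schema are hypothetical.

import json

HISTORY_FILE = "campaign_results.jsonl"

def log_campaign_result(features: dict, predicted_roas: float, actual_roas: float) -> None:
    record = {"features": features, "predicted": predicted_roas, "actual": actual_roas}
    with open(HISTORY_FILE, "a") as f:
        f.write(json.dumps(record) + "\n")

def load_training_rows() -> list[dict]:
    with open(HISTORY_FILE) as f:
        return [json.loads(line) for line in f]

# After every campaign wraps:
# log_campaign_result({"market": "US", "creator_tier": "mid"}, predicted_roas=4.5, actual_roas=2.1)
# ...then retrain on load_training_rows() on whatever cadence you use.
```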

What I’d recommend: build a prediction confidence interval, not just a point estimate. Your model should be saying “ROAS will be between 1.8x and 5.2x with 80% confidence,” not “4.5x.” Then you can make informed decisions about whether that range justifies the spend. And track which campaigns fall outside the interval—those are your learning opportunities.
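One cheap way to get that interval without touching the model internals is to look at how far off its past predictions were and apply that error distribution to the new point forecast. A minimal sketch with invented numbers; with only a handful of past campaigns the interval will be noisy, so treat it as a sanity check rather than gospel:

```python
# Turn a point forecast into an ~80% interval using past prediction errors.
# The predicted/actual figures below are made up for illustration.

import numpy as np

past_predicted = np.array([3.0, 4.2, 2.5, 5.0, 3.8, 4.5])
past_actual    = np.array([2.4, 3.1, 2.6, 3.3, 3.5, 2.1])
ratios = past_actual / past_predicted   # multiplicative miss per campaign

new_point_forecast = 4.5
low, high = new_point_forecast * np.quantile(ratios, [0.10, 0.90])
print(f"80% interval: {low:.1f}x to {high:.1f}x")
```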

Man, I feel this. We’re launching in three new European markets right now, and I tried using an AI tool to predict campaign performance for each one. The same influencer, same product, different market—the model gave wildly different predictions. It told me the campaign would perform 40% better in Germany than in France, which made no sense given what I know about those markets.

So I stopped trusting the black-box predictions and started asking for more transparency. What actually matters when you’re predicting cross-market performance? I think it comes down to: Does the model understand local cultural nuances? Can it account for regional influencer credibility (some creators are huge in Russia but nobody in the US knows them)? Is it learning from regional campaign data, or just global averages?

Here’s what I started doing: I collect performance data from my existing campaigns and use that to calibrate the AI predictions. Like, if my actual ROAS is consistently 20% higher than the model predicts for my market, I factor that in. It’s not perfect, but it’s better than trusting the raw forecast.
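For anyone who wants to do the same, the math is just the average actual-over-predicted ratio from your own campaigns, applied to each new forecast. A sketch with made-up figures:

```python
# Calibrate raw AI forecasts using the historical actual/predicted ratio
# for your own market. All figures are illustrative.

past_predicted = [3.0, 4.0, 2.8]
past_actual    = [3.6, 4.6, 3.5]

calibration = sum(a / p for a, p in zip(past_actual, past_predicted)) / len(past_actual)

raw_forecast = 4.5
print(f"calibration factor: {calibration:.2f}, adjusted forecast: {raw_forecast * calibration:.2f}x")
```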

I’m also starting smaller now. Instead of betting the whole budget on a prediction, I run smaller test campaigns first, see what actually happens, and use those results to inform bigger commitments. Slower, but less painful when predictions miss.

Your 4.5x vs 2.1x miss is actually a really useful data point. Let me ask a clarifying question: did the model miss on volume, conversion rate, or both? Because those require different troubleshooting.

I’ve found that AI forecasts tend to be more reliable for engagement metrics (likes, comments, shares) than for business outcomes (ROAS, conversions), because engagement is more directly tied to content quality and influencer reach. Conversions depend on so many downstream factors—landing page design, product seasonality, competitive landscape—that even the best model is just making an educated guess.

Here’s what I track now when evaluating forecast accuracy (quick code sketch after the list):

  1. Model bias: Is it consistently over- or under-predicting? (Yours seems to overpredict)
  2. Model variance: How wide is the prediction range? Tight ranges with high accuracy are great; wide ranges aren’t useful
  3. Market-specific performance: ROAS in one market often differs from another by 15-30%. Are you stratifying predictions by market?
  4. Holdout validation: Before launching live campaigns, I always validate the model’s predictions against recent historical data it hasn’t seen. That’s a better proxy for real-world performance than looking at training accuracy.
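As promised, a rough sketch of the first three checks, computed from logged predicted-vs-actual pairs. The campaign rows are invented placeholders; swap in your own log:

```python
# Bias, error spread, and per-market error from logged forecasts.
# Data below is hypothetical.

from statistics import mean, stdev
from collections import defaultdict

campaigns = [
    {"market": "US", "predicted": 4.5, "actual": 2.1},
    {"market": "US", "predicted": 3.0, "actual": 2.6},
    {"market": "DE", "predicted": 2.8, "actual": 3.1},
    {"market": "DE", "predicted": 3.5, "actual": 3.0},
]

errors = [c["actual"] - c["predicted"] for c in campaigns]
print(f"bias: {mean(errors):+.2f}  (negative = model overpredicts)")
print(f"error spread (stdev): {stdev(errors):.2f}")

by_market = defaultdict(list)
for c in campaigns:
    by_market[c["market"]].append(c["actual"] - c["predicted"])
for market, errs in by_market.items():
    print(f"{market}: mean error {mean(errs):+.2f} over {len(errs)} campaigns")
```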

One tactical thing: ask your tool provider (or whoever built the model) about calibration options. Some tools let you weight recent data more heavily, or adjust for market-specific factors. That can close the gap between predicted and actual performance pretty quickly.
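If your tool doesn’t expose that, you can approximate recency weighting yourself when you compute a calibration factor, along the lines of the adjustment mentioned earlier in the thread. A sketch with an arbitrary decay rate and made-up data:

```python
# Recency-weighted calibration: newer campaigns count more than older ones.
# Decay rate and figures are assumptions for illustration.

predicted = [3.0, 4.0, 2.8, 3.6]   # oldest -> newest
actual    = [3.3, 4.2, 3.2, 4.5]

decay = 0.7  # each step back in time counts 70% as much as the one after it
weights = [decay ** (len(predicted) - 1 - i) for i in range(len(predicted))]

calibration = sum(w * (a / p) for w, a, p in zip(weights, actual, predicted)) / sum(weights)
print(f"recency-weighted calibration factor: {calibration:.2f}")
```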

What’s the composition of your historical data? How many campaigns per market?

Look, I’m going to be straight with you: AI forecasting tools are useful for fast filtering and scenario planning, but they’re not reliable enough to be your primary decision driver yet. We use them to quickly model “what if” scenarios—“what if we spend 2x on this creator?”, “what if we add a second influencer from this niche?”—and that’s valuable. But when we go to pitch a client, I always include a margin of safety in our forecasts because I know the model can miss.

What actually works for us is combining AI forecasts with sales intuition. My team reviews the AI prediction, then we ask: Does this align with what we’ve seen from this creator in the past? Does it match client expectations? Do we know competitors doing similar campaigns, and how did they perform?

That human layer catches a lot of misses. For example, we had an AI model predict strong performance for a TikTok creator in the beauty space, but our team flagged that her audience skewed too young for our client’s product. Turned out the model was right about engagement, but wasn’t accounting for purchase intent. That’s not a model failure—that’s a variable the model can’t see.

My suggestion: use AI forecasts to narrow down your choices and get fast answers, but don’t rely on them for final investment decisions. Layer in domain expertise. And absolutely track prediction accuracy over time so you know when the model is drifting.

This is such a great question, and I love that you’re being honest about the miss. You know, from a relationship perspective, I think one thing AI models sometimes miss is the influencer’s own motivation and capacity. Like, an influencer might have the reach and engagement metrics to hit a 4.5x ROAS, but if they’re not genuinely excited about your product, or if they’re overcommitted that month, the results will suffer.

I always talk to creators before committing, and I ask: Are you excited about this? Do you have bandwidth? Has anything changed since these metrics were captured? That human conversation catches so much that data can’t.

Maybe the gap between predicted and actual performance is partly because the AI model is looking at historical metrics from a different time in the creator’s journey? People change, audiences evolve, creators get burnt out or hit new peaks. That’s why I think you’re right to validate with your team—they can spot those qualitative shifts that numbers can’t.