Running A/B tests on referral messaging across languages—does changing the message actually move the needle?

I’ve been working with brands in both the US and Russian markets for a while now, and one thing I keep noticing is that a message that kills in English falls completely flat when translated to Russian, and vice versa. It’s not even a translation thing—it’s a tone thing, an urgency thing, an incentive framing thing.

We recently tried to run a referral campaign with consistent messaging across both markets. Same offer, same timeline, same call-to-action. The US side saw maybe a 12% conversion on the referral message. The Russian side? 3%. That’s a massive gap, and it made me think: what if we actually tested different versions of the message in each language?

So I started experimenting. For the Russian audience, I tried positioning the referral as a way to “build a trusted network and get priority access to exclusive campaigns.” For the US side, I went with “refer a partner and earn commission on every project they bring in.” Both are fundamentally the same offer, but the framing is totally different.

The problem is, I don’t have a structured way to run this. I’m just kind of guessing at what might work, sending out different versions to different groups, and hoping I can remember what performed better. No tracking, no statistical significance, no way to know if I’m actually onto something or just getting lucky.

Has anyone actually set up a proper A/B testing framework for referral messages across languages? What metrics did you track? Did you notice that the message actually matters more than the incentive, or is it the other way around?

You’re asking exactly the right question, and the fact that you saw a 12% vs. 3% split tells me your testing instinct is correct. That’s too big a gap to ignore.

Here’s how I’d structure this properly:

Test Setup:

  • Declare your hypothesis upfront. Example: “Framing referrals around network prestige will drive higher conversion in Russian markets than commission-based framing.”
  • Split your audience randomly and send different message versions to each segment. Keep everything else identical—timing, channel, incentive structure.
  • Pick a single primary conversion metric: the percentage of people who complete the referral action (not just click, but actually refer someone).
  • Run long enough to hit your planned sample size before calling a winner. With email lists, that's usually 2-4 weeks minimum. (A minimal sketch of the split and the significance test follows this list.)
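If you want the split to be reproducible, here's a minimal Python sketch: a seeded random assignment plus a two-proportion z-test on the results. The contact list, variant names, and counts are all placeholders, and it assumes statsmodels is installed; plug in your own numbers.

```python
# Minimal sketch: reproducible random split plus a two-proportion z-test.
# `contacts`, the variant names, and the counts below are illustrative
# placeholders; assumes statsmodels is available (pip install statsmodels).
import random

from statsmodels.stats.proportion import proportions_ztest

VARIANTS = ["network_prestige", "commission"]

def assign_variants(contacts, seed=42):
    """Shuffle once with a fixed seed, then deal contacts round-robin."""
    rng = random.Random(seed)
    shuffled = list(contacts)
    rng.shuffle(shuffled)
    return {v: shuffled[i::len(VARIANTS)] for i, v in enumerate(VARIANTS)}

# After the test window, plug in observed numbers per variant:
completions = [36, 9]    # completed referrals (illustrative)
recipients = [300, 300]  # recipients per variant

z_stat, p_value = proportions_ztest(completions, recipients)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")  # p < 0.05 => likely signal
```

The fixed seed matters more than it looks: it lets you re-derive later exactly who saw which message, which you'll want when you start attributing downstream referral quality.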

Metrics to track:

  1. Click-through rate (did they actually click, not just open?)
  2. Referral completion rate (did they actually refer someone?)
  3. Quality of referral (did the person they referred convert?)
  4. Time-to-conversion (how long between seeing the message and making a referral?)

The thing is, you probably care most about #3. A high completion rate means nothing if the people being referred don’t convert.
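If it helps, all four metrics fall out of a simple per-recipient event log. Here's a rough sketch; the record fields (clicked, referred, referral_converted, hours_to_referral) are hypothetical, so map them to whatever your email platform or CRM actually exports.

```python
# Per-variant funnel math for the four metrics above.
# The event-log shape here is hypothetical; adapt field names to your data.
from statistics import median

events = [
    {"variant": "network_prestige", "clicked": True, "referred": True,
     "referral_converted": True, "hours_to_referral": 18},
    {"variant": "commission", "clicked": True, "referred": False,
     "referral_converted": False, "hours_to_referral": None},
    # ... one record per recipient
]

def funnel(rows):
    sent = len(rows)
    clicks = sum(r["clicked"] for r in rows)
    referrals = sum(r["referred"] for r in rows)
    converted = sum(r["referral_converted"] for r in rows)
    times = [r["hours_to_referral"] for r in rows if r["referred"]]
    return {
        "click_through_rate": clicks / sent if sent else 0.0,
        "referral_completion_rate": referrals / sent if sent else 0.0,
        "referral_quality": converted / referrals if referrals else 0.0,
        "median_hours_to_referral": median(times) if times else None,
    }

by_variant = {}
for row in events:
    by_variant.setdefault(row["variant"], []).append(row)

for variant, rows in by_variant.items():
    print(variant, funnel(rows))
```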

Quick win: vary the message alone first. Don't change the incentive structure yet. Once you know which message drives behavior, then optimize the incentive.

Are you testing via email, or through the platform itself?

One more critical thing: make sure your sample size is large enough per variant. If you've got 50 people per message variant, you're going to see noise, not signal. I'd aim for a minimum of 200-300 people per variant, ideally more depending on your baseline conversion rate.
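You can sanity-check that floor against your own baseline with a quick power calculation. A sketch, assuming a two-sided two-proportion test at 80% power; the 3% baseline and 2-point minimum detectable lift are illustrative numbers of mine, not a recommendation:

```python
# Rough per-variant sample-size check (two-proportion test,
# alpha = 0.05, 80% power). Baseline and target are illustrative.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline = 0.03  # current referral conversion rate
target = 0.05    # smallest lift worth acting on

effect = proportion_effectsize(target, baseline)  # Cohen's h
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, alternative="two-sided"
)
print(f"~{n_per_variant:.0f} recipients per variant")
```

With a 3% baseline, that comes out around 740 per variant, which is why "ideally more" is doing a lot of work in that advice: the lower your baseline, the bigger the list you need.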