
Why synthetic research needs evidence grounding (and most tools skip it)

The shortcut for building a synthetic persona is straightforward. Give an AI a description of your audience. Ask it to roleplay. It works. Sort of.

The persona sounds plausible. It has opinions. It answers your questions in full sentences. What it doesn't have is any connection to what your audience actually thinks, feels, or does. It's a confident fiction. And confident fictions are the most dangerous research outputs there are, because they confirm what you already believe and bury what you didn't know.

This is the line that separates synthetic research from AI improvisation: evidence grounding. Here's what it is, why most tools skip it, and what you lose if you skip it too.

The shortcut everyone takes

Spin up a chat-based AI. Write a prompt: "You are a 35-year-old marketing director at a mid-market SaaS company. You're frustrated with attribution. Answer my questions in character." Hit enter.

The model generates a character. The character has frustrations. The character has opinions. The character will tell you it values data-driven decisions and finds tool sprawl exhausting, because countless marketing directors have said exactly those things in the text the model was trained on.

This is the persona-generation default. It's what happens when you ask any general-purpose AI model to roleplay. It's what happens inside most one-page "AI persona generator" tools. It's also what happens inside some products that call themselves synthetic research platforms.

The shortcut works because language models are pattern matchers. Given a description, they fill in plausible attributes. Given a question, they produce plausible answers. The output is fluent and internally consistent. It looks like research. It just doesn't come from anywhere real.

What evidence grounding actually means

Evidence grounding flips the order of operations.

Instead of generating a persona and asking it questions, you start by retrieving real evidence about the target audience. Published research. Your uploaded customer interviews. Survey data. Market reports. Public web sources. Validated behavioral distributions from peer-reviewed studies.
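A minimal sketch of the retrieval step, assuming the evidence has already been parsed into plain-text documents. It swaps in bag-of-words cosine similarity as a stand-in for a real embedding pipeline, and the document IDs and snippets are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Term counts; a stand-in for an embedding model (assumption)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Rank evidence documents by similarity to a description of the audience."""
    q = vectorize(query)
    scored = [(doc_id, cosine(q, vectorize(text))) for doc_id, text in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical evidence snippets keyed by hypothetical document IDs.
corpus = {
    "interview-03": "We stopped trusting attribution after the last platform migration.",
    "survey-q2": "Respondents rank onboarding time above price for analytics tools.",
    "report-11": "Mid-market SaaS marketing teams report attribution gaps across channels.",
}
top = retrieve("marketing director, mid-market SaaS, attribution", corpus, k=2)
print([doc_id for doc_id, _ in top])  # ['report-11', 'interview-03']
```

The point is the order of operations: the audience description is used as a query against real material, not as a prompt for the model to improvise from.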

Then you synthesize a persona from that evidence, with every attribute tagged for provenance: grounded in source data, inferred from a behavioral pattern, calibrated from a published distribution, or sampled at random when no evidence is available. Every claim the persona makes in an interview can be traced back to its source.
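One way to represent those four provenance tags, sketched under the assumption that every piece of evidence has a stable ID. The class names, the `source_id` field, and the example IDs are hypothetical, not a description of any particular product's schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Provenance(Enum):
    GROUNDED = "grounded"      # taken directly from source data
    INFERRED = "inferred"      # inferred from a behavioral pattern
    CALIBRATED = "calibrated"  # calibrated from a published distribution
    SAMPLED = "sampled"        # sampled at random; no evidence available


@dataclass(frozen=True)
class Attribute:
    name: str
    value: str
    provenance: Provenance
    source_id: Optional[str] = None  # hypothetical pointer to an evidence record


@dataclass
class Persona:
    attributes: dict[str, Attribute] = field(default_factory=dict)

    def add(self, attr: Attribute) -> None:
        # Only random samples may lack a source; everything else must cite one.
        if attr.provenance is not Provenance.SAMPLED and attr.source_id is None:
            raise ValueError(f"{attr.name} is {attr.provenance.value} but cites no source")
        self.attributes[attr.name] = attr

    def trace(self, name: str) -> str:
        """Answer "where did this come from?" for a single attribute."""
        attr = self.attributes[name]
        origin = f"source {attr.source_id}" if attr.source_id else "no source"
        return f"{name} = {attr.value} ({attr.provenance.value}, {origin})"


persona = Persona()
persona.add(Attribute("primary_frustration", "attribution gaps", Provenance.GROUNDED, "interview-03"))
persona.add(Attribute("commute_mode", "train", Provenance.SAMPLED))
print(persona.trace("primary_frustration"))
# primary_frustration = attribution gaps (grounded, source interview-03)
```

Rejecting an unsourced non-sampled attribute at insertion time is one design choice among several; the alternative is to allow it and flag it in a report, but then the audit trail depends on someone reading the report.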

The personality traits don't get assigned by the AI's intuition either. They are drawn from OCEAN (Big Five) trait distributions published in peer-reviewed studies, broken down by region and occupation. Cognitive biases are assigned at research-backed intensities, not as binary labels. Responses are checked against the persona's established profile before delivery, so the persona's answer at question 7 doesn't contradict what it said at question 2.
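The trait sampling and the pre-delivery check can be sketched as follows. The norms table holds placeholder numbers, not published values; the 1-5 scale, the independent per-trait normal draws, and the claim-extraction step that feeds `find_contradictions` are all assumptions made for brevity.

```python
import random
from dataclasses import dataclass

TRAITS = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")


@dataclass(frozen=True)
class TraitNorm:
    mean: float
    sd: float


# Placeholder norms, NOT published values: a real system would load
# peer-reviewed means and SDs keyed by (region, occupation).
NORMS: dict[tuple[str, str], dict[str, TraitNorm]] = {
    ("region-x", "marketing"): {t: TraitNorm(mean=3.0, sd=0.6) for t in TRAITS},
}


def sample_ocean(region: str, occupation: str, rng: random.Random) -> dict[str, float]:
    """Draw Big Five scores on an assumed 1-5 scale, clamped to that range.

    Traits are sampled independently here for brevity; real trait scores are
    correlated, so a fuller version would sample from a joint distribution.
    """
    norms = NORMS[(region, occupation)]
    return {t: min(5.0, max(1.0, rng.gauss(norms[t].mean, norms[t].sd))) for t in TRAITS}


def find_contradictions(profile: dict[str, str], draft_claims: dict[str, str]) -> list[str]:
    """List profile attributes that a draft answer contradicts.

    `draft_claims` is assumed to come from a separate (hypothetical) step that
    extracts attribute/value claims from the draft response.
    """
    return [k for k, v in draft_claims.items() if k in profile and profile[k] != v]


scores = sample_ocean("region-x", "marketing", random.Random(42))
profile = {"team_size": "4", "current_tool": "spreadsheet"}  # established at question 2
print(find_contradictions(profile, {"team_size": "12"}))  # ['team_size'] -> regenerate before delivery
```

A non-empty result means the draft should be regenerated or revised before the user sees it; claims about attributes not yet in the profile pass through and could be appended to it.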

This is what "synthetic research" means when it's done as research and not as AI improvisation. It's a real method, with a real audit trail.

Why most synthetic tools skip it

Evidence retrieval is hard engineering work that doesn't show up in the demo.

To do it well, you have to build document parsing, embedding pipelines, web search infrastructure, signal extraction, gap-filling logic, critic agents, and provenance tracking. None of that is visible when a user clicks "generate persona" and sees output in 10 seconds.
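Of the components listed above, gap-filling is the easiest to show in isolation. This sketch assumes signal extraction has already produced attribute/value pairs with source IDs; the schema, fallback values, and IDs are hypothetical. It prefers evidence, samples only where none exists, and returns the gaps so they can be reported rather than hidden.

```python
import random
from typing import Optional

# Hypothetical attribute schema for a B2B buyer persona.
REQUIRED = ("role", "company_size", "primary_frustration", "budget_authority")


def fill_gaps(
    evidence: dict[str, tuple[str, str]],
    fallbacks: dict[str, list[str]],
    rng: random.Random,
) -> tuple[dict[str, dict[str, Optional[str]]], list[str]]:
    """Fill every required attribute, preferring extracted evidence.

    evidence:  attribute -> (value, source_id) from signal extraction
    fallbacks: attribute -> candidate values, used only when evidence is missing
    Returns the filled persona and the attributes that had to be sampled.
    """
    persona: dict[str, dict[str, Optional[str]]] = {}
    gaps: list[str] = []
    for attr in REQUIRED:
        if attr in evidence:
            value, source = evidence[attr]
            persona[attr] = {"value": value, "provenance": "grounded", "source": source}
        else:
            # No evidence: sample, tag it as sampled, and record the gap.
            persona[attr] = {"value": rng.choice(fallbacks[attr]), "provenance": "sampled", "source": None}
            gaps.append(attr)
    return persona, gaps


evidence = {
    "role": ("marketing director", "interview-03"),
    "primary_frustration": ("attribution gaps", "report-11"),
}
fallbacks = {"company_size": ["50-200", "200-1000"], "budget_authority": ["yes", "no"]}
persona, gaps = fill_gaps(evidence, fallbacks, random.Random(1))
print(gaps)  # ['company_size', 'budget_authority']
```

None of this shows up in a demo, which is the point made above: the output looks the same whether or not `gaps` is empty, and only the provenance record tells you the difference.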

Without the engineering, you can ship a persona generator in a weekend. With it, you're building a system. Most vendors took the weekend route, because the weekend route demos well and ships fast.

The result is a category where the marketing copy claims "synthetic research" but the underlying method is closer to "AI imagines a customer for you." The two produce different outputs, and the difference matters more than the surface-level fluency suggests.

It also produces a buyer-side problem. If you're evaluating a vendor and you don't know what to ask, you'll see the same kind of confident persona output across rigorous and lightweight tools. The output looks comparable. The methodology behind it is not.

What you lose without it

Three things, all of which compound over time.

Confirmation bias amplified. A persona generated from a description tends to agree with the description. You wrote that your audience is frustrated with attribution. The persona will be frustrated with attribution. You wrote they want self-serve onboarding. They want self-serve onboarding. You didn't really learn anything new. You just got a more articulate version of your own assumptions, delivered in someone else's voice.

Discovery surface collapsed. Real customers surprise you. They surface workarounds you didn't know existed, frustrations you didn't think to ask about, decision criteria you wouldn't have considered. Evidence-grounded synthetic personas can do this too, because the evidence includes real-world signal you haven't seen. Ungrounded personas can't, because there's no signal in the system other than what you put in.

Trust erodes. The first time a stakeholder asks "where did this finding come from?" and the answer is "the AI said so," you've lost the persuasive value of the research. Without provenance, a synthetic insight is an output with no traceable input. Hard to defend in a stage gate. Hard to revisit later. Easy to dismiss when someone disagrees with the conclusion.

The teams using ungrounded persona tools rarely notice this immediately. The findings seem useful. They're plausible, internally consistent, and aligned with the team's intuitions. The real cost shows up later, when the launched product underperforms the synthetic research's predictions and nobody can reconstruct why the research said what it said.

The thing worth saying out loud

Synthetic research is real research when it's grounded in real evidence. It's AI improvisation when it isn't. Both have uses. They are not the same product, and they should not be priced, evaluated, or trusted the same way.

If you're considering a synthetic platform, ask the vendor three questions:

  1. Where does the evidence for your personas come from?
  2. What personality and bias model do you use, and how is it calibrated?
  3. How do you enforce consistency across interview sessions?

If the answers are specific (data sources, citation methods, named frameworks, validation methods), you're looking at synthetic research. If the answers are some version of "the AI figures it out," you're looking at AI improvisation, and you should price it accordingly.

We built Candor with evidence grounding as the foundation, not the feature. That's the thing worth saying out loud in this category. The shortcut works for AI improvisation. It doesn't work for research.

Common questions

How can I tell whether a synthetic research vendor uses evidence grounding?

Ask three questions. Where does the evidence come from? What personality and bias model is used, and how is it calibrated? How is consistency enforced across interview sessions? Vendors with rigorous methodology will answer specifically: data sources cited by name, frameworks named, validation methods explained. Vendors without it will give general answers about AI capability. The clarity of the answer tells you a lot.

Is there still a place for ungrounded, description-based personas?

Yes, but it's important to know which mode you're in. Brainstorming personas from a description is a useful exercise for stakeholder alignment and early hypothesis generation. Evidence-grounded research is for decisions where you need traceable findings. Use the right mode for the question. The risk comes when a tool blurs the two without telling you which one is running under the hood.

Why does grounding matter more in synthetic research than in traditional research?

In traditional research, the evidence is the participant. You're talking to a real customer; their words are the evidence by definition. In synthetic research, there is no participant by default. The AI has to be grounded in something to produce credible output. That something has to be real evidence about the audience, or the persona is just generating plausible-sounding fiction. Evidence grounding is the equivalent of having real participants in the room.

Does evidence grounding make synthetic research slower?

A little, but not in the way people expect. Once the document parsing, web evidence retrieval, and signal extraction stages are complete, the interview itself runs at the same speed as an ungrounded persona interview. The slower part happens once per study: building the upfront evidence layer. The compounding benefit is that every subsequent interview in that study pulls from the same grounded evidence base, so once the layer is built, research-grade outputs arrive as quickly as ungrounded ones.
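The once-per-study cost can be sketched with an in-process cache. This assumes the evidence layer is keyed by a study ID and stubs out the parsing, retrieval, and extraction work; `evidence_base`, `run_interview`, and `BUILD_COUNT` are hypothetical names, and a production system would persist the evidence base rather than rely on `functools.lru_cache`.

```python
from functools import lru_cache

BUILD_COUNT = {"n": 0}  # instrumentation for the example only


@lru_cache(maxsize=None)
def evidence_base(study_id: str) -> tuple[str, ...]:
    """The slow, one-time step: parsing, retrieval, extraction (stubbed here)."""
    BUILD_COUNT["n"] += 1
    return (f"{study_id}:signal-1", f"{study_id}:signal-2")


def run_interview(study_id: str, question: str) -> str:
    evidence = evidence_base(study_id)  # cache hit after the first interview
    return f"answer to {question!r} grounded in {len(evidence)} signals"


for q in ("What frustrates you about attribution?", "How do you pick tools?", "Who signs off on budget?"):
    print(run_interview("study-a", q))
print(BUILD_COUNT["n"])  # 1: the evidence layer was built once and reused
```

Caching on the study ID means a new study pays the upfront cost again, which matches the "one-time per study" framing above.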

Candor is in development.

Powered by Highline Beta.