In Candor, evidence grounding means every attribute on every persona traces back to a real source: published research, your uploaded documents, validated population distributions, or flagged inference. The audit trail is visible to anyone reviewing the study. This piece walks through how the pipeline actually works.
If you want the opinionated version of why this matters, read "Why synthetic research needs evidence grounding." This piece is the technical companion: what the system does, step by step, and what the methodology gives you.
What evidence grounding means in practice
Most "AI persona" tools work the same way. The user describes an audience. The AI invents a character that fits the description. The output sounds plausible because language models are pattern matchers, and the patterns of "a 35-year-old marketing director" exist in training data abundantly enough to produce a fluent character on demand. There is no link between the persona's attributes and any specific evidence about the actual audience.
Evidence grounding flips that. Before any persona is generated, Candor retrieves real evidence about the target audience: published research, peer-reviewed studies, market reports, any first-party documents the customer uploaded, and validated behavioral and demographic distributions. The personas are constructed from that evidence, not from a prompt alone. Each attribute carries a tag showing which source informed it and at what confidence level. When a persona answers an interview question, the underlying reasoning traces back to evidence that exists in the system, not to the AI's pattern-matched best guess.
The practical difference shows up in three places. First, the personas reason about pain points, beliefs, and behaviors in ways that reflect what the actual audience has been documented to say, not the AI's compressed average of every "marketing director" in training data. Second, the methodology is auditable: a UXR lead or insights director can inspect the provenance and push back where the evidence is thin. Third, when the audience evolves (new market entrants, shifting attitudes, new behavioral data), re-running the evidence stage produces personas that reflect the updated reality, not the same generic pattern.
The rest of this piece walks through the five stages that produce that result.
Stage 1: Evidence retrieval. Where the data comes from
The first stage of every Candor study is evidence retrieval. At study setup, the user describes the target audience: B2C or B2B context, the learning goals they want to address, the industry, and the region. They can also upload any first-party research they already have. The specific interview type is selected later, when the interview guide is generated; value-prop testing and assumption validation are formats layered on top of whichever interview type is chosen. From the initial audience input, Candor builds a search plan and runs two parallel retrieval workflows.
Uploaded documents come first. If the customer has uploaded research (PDF, DOC, DOCX, CSV, TXT, or MD files: customer interview transcripts, survey data, market reports, prior research deliverables), Candor parses and indexes those documents. They become the primary evidence source. Internal research about your specific audience is more relevant than general public evidence about the broader category, so uploaded documents get weighted higher in retrieval.
The documents are parsed into chunks, embedded into a vector space, and stored in a searchable index. When the pipeline needs evidence about a specific dimension of the audience (their pain points, their decision criteria, their attitudes toward category X), it retrieves the relevant chunks via vector search.
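As a rough illustration of the chunk, embed, and retrieve flow, here is a minimal Python sketch. Everything in it is an assumption for illustration rather than Candor's implementation: the `Chunk` type, the `ORIGIN_WEIGHT` values, and the toy bag-of-words `embed` function, which stands in for a real embedding model. It shows one simple way uploaded documents could be weighted above public sources at retrieval time.

```python
import math
from collections import Counter
from dataclasses import dataclass

# Hypothetical weights: uploaded first-party documents rank above web sources.
ORIGIN_WEIGHT = {"uploaded": 1.25, "web": 1.0}


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str   # e.g. a file name or publication title
    origin: str   # "uploaded" or "web"


def embed(text: str) -> Counter:
    """Toy sparse bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[Chunk], top_k: int = 2) -> list[tuple[float, Chunk]]:
    """Score every chunk against the query, apply the origin weight, keep the best."""
    q = embed(query)
    scored = [(cosine(q, embed(c.text)) * ORIGIN_WEIGHT[c.origin], c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


chunks = [
    Chunk("claims process takes too long", "industry-report", "web"),
    Chunk("claims process takes too long", "interviews.pdf", "uploaded"),
    Chunk("pricing page is confusing", "survey.csv", "uploaded"),
]
results = retrieve("claims process", chunks)
for score, chunk in results:
    print(f"{score:.2f}  {chunk.origin:8}  {chunk.source}")
```

In this toy example the two matching chunks have identical text, so the uploaded one outranks the web one purely because of the origin weight. A production system would more plausibly use a learned embedding model and a dedicated vector index.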
Web search runs alongside the documents. In parallel with the document layer, Candor runs structured web searches across published research, industry analyses, peer-reviewed studies, public consumer-health and market-behavior datasets, and any other source likely to contain evidence about the audience. The search strategy is iterative rather than batch: the pipeline runs a broad initial pass, then assesses what's been retrieved, identifies coverage gaps, and runs targeted follow-up searches to fill them.
The web layer is critical when the uploaded documents are thin or when the audience extends beyond what the customer has researched directly. For pre-launch products, healthcare populations that are hard to recruit, regulated industries, and any audience where first-party research is incomplete, the web layer is often the bulk of the evidence.
Both layers feed the same evidence pool, with provenance distinguishing them. When a persona attribute later traces back to "Source A," the provenance tag indicates whether Source A was a customer-uploaded document or a public source retrieved by Candor. This matters at the synthesis stage, where the customer can see which findings rest on their own data versus public evidence.
Stage 2: Signal extraction. Turning raw evidence into structured persona attributes
Retrieved evidence by itself is not yet usable. A PDF of a customer-interview transcript or a peer-reviewed paper about consumer behavior is unstructured text. To generate personas from it, Candor first extracts structured signals from the evidence.
A signal is a structured representation of one piece of behavioral or attitudinal information about the audience. Candor extracts hundreds of signals per study across several categories:
- Behaviors: what the audience actually does (purchase patterns, app usage, communication preferences, decision-making sequences)
- Pain points: what frustrates them, what they avoid, what they actively complain about
- Attitudes: what they believe, what they're skeptical of, what they value
- Constraints: what limits their choices (budget, time, regulatory, organizational, capability)
- Goals: what they're trying to accomplish, both stated and implied
- Beliefs: what they assume to be true about the category, the alternatives, the world
- Preferences: what they prefer when given choices, including the reasoning behind the preference
- Decision rules: how they actually decide, including shortcuts and heuristics
Each signal is extracted from a specific piece of evidence and tagged with metadata: which source it came from, which category it belongs to, what confidence level the extraction has, and what segment(s) of the audience it applies to.
The extraction is run by language models with structured-output validation, so the signals come out in a consistent schema rather than as free text. This matters because the next stages (segmentation, persona generation, interview behavior) all operate on the structured signal representation, not on raw evidence.
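To make "structured-output validation" concrete, here is a minimal sketch of what a signal schema and its validator might look like. The field names, the category identifiers, and the 0-to-1 confidence scale are all assumptions chosen for illustration; the article does not specify Candor's actual schema.

```python
import json
from dataclasses import dataclass

# Category identifiers assumed from the list above; the real names may differ.
CATEGORIES = {
    "behavior", "pain_point", "attitude", "constraint",
    "goal", "belief", "preference", "decision_rule",
}
REQUIRED_FIELDS = {"statement", "category", "source_id", "confidence", "segments"}


@dataclass(frozen=True)
class Signal:
    statement: str
    category: str
    source_id: str
    confidence: float            # assumed scale: 0.0 (weak) to 1.0 (strong)
    segments: tuple[str, ...]


def parse_signal(raw: str) -> Signal:
    """Validate one model output (a JSON string) and return a Signal.

    Raises ValueError when the output does not match the schema, so malformed
    extractions never reach later pipeline stages.
    """
    data = json.loads(raw)  # invalid JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("signal must be a JSON object")
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["category"] not in CATEGORIES:
        raise ValueError(f"unknown category: {data['category']!r}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if not isinstance(data["segments"], list) or not data["segments"]:
        raise ValueError("segments must be a non-empty list")
    return Signal(
        statement=str(data["statement"]),
        category=data["category"],
        source_id=str(data["source_id"]),
        confidence=confidence,
        segments=tuple(str(s) for s in data["segments"]),
    )
```

Rejecting malformed output at this boundary is what lets segmentation and persona generation assume a consistent shape instead of re-parsing free text.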
Signal extraction is also where the pipeline starts to see the shape of the audience: which segments cluster around similar pain points, which signals are universal versus segment-specific, where the evidence is dense versus thin. That shape feeds the next stage.
Stage 3: Provenance tagging. The audit trail at the attribute level
This is the stage where Candor's approach to rigor becomes visible in the product surface. Every signal extracted from evidence (and every attribute later derived from those signals on individual personas) carries a provenance tag. There are five:
- Grounded: the attribute traces directly to a specific source in the evidence pool. The customer can click through to the underlying citation. This is the highest-confidence tag; the attribute reflects something that was documented in the audience's actual behavior or attitudes.
- Inferred: the attribute was extrapolated from a behavioral pattern documented in the evidence. The pattern is grounded; the specific extrapolation is the inference. This is common when the evidence supports a general behavior and a specific persona instance follows from it.
- Calibrated: the attribute was drawn from a validated statistical distribution (a peer-reviewed population distribution for personality traits, a research-backed range for a behavioral parameter). The persona's specific value is a sample; the distribution it was sampled from is calibrated.
- Sampled: the attribute was sampled at random because no specific evidence was available and no validated distribution applies to the value. This is honest noise, and it is labeled as such.
- Weak-confidence: the attribute is grounded or inferred, but the underlying evidence is sparse enough that the customer should treat it as a hypothesis worth real-customer validation rather than as a finding.
The provenance tag is visible on every persona attribute and on every signal cited in the synthesis report. When a synthesis finding reads "Personas in Segment A consistently described claims-process friction as their top frustration," the customer can drill into the underlying signals, see which sources contributed, and see the confidence level of each contributing signal.
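The five tags and the drill-down described above can be sketched as a small data model. This is an illustrative assumption, not Candor's internal representation: `TaggedSignal`, `Finding`, and `drill_down` are hypothetical names, and the source identifiers are made up for the example.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Provenance(Enum):
    GROUNDED = "grounded"
    INFERRED = "inferred"
    CALIBRATED = "calibrated"
    SAMPLED = "sampled"
    WEAK_CONFIDENCE = "weak-confidence"


@dataclass(frozen=True)
class TaggedSignal:
    statement: str
    provenance: Provenance
    source_id: Optional[str]  # None when no source applies, e.g. sampled values


@dataclass(frozen=True)
class Finding:
    summary: str
    signals: tuple[TaggedSignal, ...]


def drill_down(finding: Finding) -> dict:
    """List the distinct sources and the tag mix behind one synthesis finding."""
    sources = sorted({s.source_id for s in finding.signals if s.source_id is not None})
    tag_counts = Counter(s.provenance.value for s in finding.signals)
    return {"sources": sources, "tag_counts": dict(tag_counts)}


finding = Finding(
    summary="Segment A describes claims-process friction as its top frustration",
    signals=(
        TaggedSignal("Claims take weeks to resolve", Provenance.GROUNDED, "interviews.pdf"),
        TaggedSignal("Status updates are hard to find", Provenance.GROUNDED, "industry-report"),
        TaggedSignal("Likely to switch after a denied claim", Provenance.INFERRED, "interviews.pdf"),
    ),
)
report = drill_down(finding)
print(report)
```

Keeping the tag on each signal, rather than on the finding as a whole, is what allows a reviewer to challenge one attribute without rejecting the entire study.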
This is the auditability that separates synthetic research from AI improvisation. A UXR lead reviewing a study can challenge the methodology at the attribute level, not just at the conclusion level. An insights director defending a launch decision to leadership can cite the provenance chain rather than relying on the team's word that the AI was "trained on the right data." A skeptical stakeholder asking "where did this finding come from" gets a real answer with a real source, not a wave at the model.
Stage 4: Iterative gap-fill. Closing the holes
Evidence retrieval doesn't return uniform coverage. Some audiences have rich published research and dense first-party data; others have sparse coverage in one segment or across one signal category. Candor's pipeline assesses coverage after the first retrieval pass and runs targeted follow-up searches to fill identified gaps.
The gap-fill iteration looks at:
- Signal-category balance: are some categories (behaviors, pain points, attitudes, constraints, etc.) under-represented relative to others? If so, run searches biased toward the under-represented categories.
- Segment coverage: are all the defined audience segments equally represented in the evidence pool? If one segment is thin, run searches targeted at that segment.
- Diversity within signals: are the retrieved signals reflecting a diverse range of perspectives, or are they clustering around a single source's framing? If the latter, run searches to broaden the perspective base.
- Recency: is recent evidence present, or is the retrieved material older than is useful for the question? If older, run searches biased toward recent sources.
Each gap-fill iteration runs targeted web searches against the identified gaps, extracts signals from the new evidence, and re-assesses coverage. The loop continues until either (a) the gaps are sufficiently closed, or (b) the search budget for the study is exhausted.
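The loop and its two stopping conditions can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: signals are reduced to their category labels, only the signal-category balance check is modeled (segment coverage, diversity, and recency are omitted), and `search`, `min_per_category`, and `search_budget` are hypothetical names and parameters.

```python
from collections import Counter
from typing import Callable, Sequence


def find_gaps(
    signal_categories: Sequence[str],
    all_categories: Sequence[str],
    min_per_category: int,
) -> list[str]:
    """Return the categories that have fewer signals than the minimum."""
    counts = Counter(signal_categories)
    return [c for c in all_categories if counts[c] < min_per_category]


def gap_fill(
    signal_categories: Sequence[str],
    all_categories: Sequence[str],
    search: Callable[[str], list[str]],
    min_per_category: int = 2,
    search_budget: int = 5,
) -> tuple[list[str], list[str], int]:
    """Run targeted searches until gaps close or the budget runs out.

    `search` takes an under-represented category and returns the category
    labels of any new signals extracted from what it found.
    """
    pool = list(signal_categories)
    searches_used = 0
    while searches_used < search_budget:
        gaps = find_gaps(pool, all_categories, min_per_category)
        if not gaps:
            break  # stopping condition (a): gaps sufficiently closed
        pool.extend(search(gaps[0]))  # target the first open gap
        searches_used += 1
    # Leaving the loop without a break is stopping condition (b): budget exhausted.
    remaining = find_gaps(pool, all_categories, min_per_category)
    return pool, remaining, searches_used
```

Returning the remaining gaps alongside the pool matters: when the budget runs out first, the unresolved gaps are still known and can be reported downstream rather than silently dropped.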
This iterative pattern is why audience generation in Candor takes 25 to 35 minutes rather than seconds: the system is doing real evidence work, not generating a persona from a prompt. The time goes to retrieving evidence and extracting signals, not to waiting on slow infrastructure.
Stage 5: Critic review. Quality control before persona generation
Before the evidence pool is handed off to persona generation, a critic agent reviews the assembled evidence and signal set for quality and diversity. The critic is a separate AI agent with a structured rubric covering:
- Diversity of perspectives: are multiple viewpoints represented, or is the evidence biased toward one framing of the audience?
- Signal redundancy versus coverage: has the gap-fill iteration produced redundant signals (the same finding from too many sources, with diminishing marginal information) or genuine breadth?
- Provenance health: is the evidence pool dominated by weak-confidence signals (which would produce hypothesis-grade personas rather than research-grade personas), or is there a healthy proportion of grounded and calibrated signals?
- Segment differentiation: do the segments actually differentiate from each other on the signal evidence, or are they overlapping enough that the segmentation may not be meaningful for the study?
When the critic flags issues, the pipeline either runs additional retrieval to address them or passes the issues forward to the synthesis stage as caveats the customer should see in the final report. The critic isn't a binary pass/fail gate; it's a quality-signal layer that surfaces concerns the customer should know about before drawing conclusions.
The critic step also catches a class of failure modes that earlier synthetic research tools regularly shipped without noticing: when the evidence pool contains internal contradictions, when one source dominates the signal pool disproportionately, or when the segmentation doesn't actually correspond to differentiated signal patterns. These aren't catastrophic failures; they're quality issues that, surfaced honestly, let the customer decide whether to proceed, re-run with different inputs, or escalate the question to real-customer research.
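Two of the rubric items, source dominance and provenance health, lend themselves to a simple numeric check. The sketch below is a deterministic stand-in for illustration only: the critic described here is an AI agent working from a rubric, and the thresholds, dictionary keys, and caveat wording are assumptions. What it does mirror is the design choice of returning caveats instead of a pass/fail verdict.

```python
from collections import Counter


def critic_review(
    signals: list[dict],
    max_source_share: float = 0.5,   # hypothetical threshold
    max_weak_share: float = 0.4,     # hypothetical threshold
) -> list[str]:
    """Return human-readable caveats about the signal pool (empty list if none).

    Each signal is a dict with "source_id" and "provenance" keys.
    """
    if not signals:
        return ["evidence pool is empty"]

    caveats = []
    total = len(signals)

    # Source dominance: does a single source contribute too much of the pool?
    source, count = Counter(s["source_id"] for s in signals).most_common(1)[0]
    if count / total > max_source_share:
        caveats.append(f"source '{source}' contributes {count} of {total} signals")

    # Provenance health: is the pool dominated by weak-confidence signals?
    weak = sum(1 for s in signals if s["provenance"] == "weak-confidence")
    if weak / total > max_weak_share:
        caveats.append(f"{weak} of {total} signals are weak-confidence")

    return caveats


skewed_pool = [
    {"source_id": "a", "provenance": "grounded"},
    {"source_id": "a", "provenance": "grounded"},
    {"source_id": "a", "provenance": "weak-confidence"},
    {"source_id": "b", "provenance": "inferred"},
]
for caveat in critic_review(skewed_pool):
    print("caveat:", caveat)
```

Because the function returns a list of caveats, the caller can decide whether to trigger more retrieval or attach the caveats to the final report, which matches the two outcomes described above.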
What evidence grounding gives you (and what it still can't do)
The cumulative effect of these five stages is research with an audit trail. Specifically:
Personas that reason about the audience's real patterns. Pain points, beliefs, and decision rules in the synthetic interviews trace back to documented evidence about the audience, not to AI-pattern-matched stereotypes. This produces interview content that researchers familiar with the audience will recognize as authentic rather than dismiss as generic.
Findings that survive stakeholder scrutiny. When a synthesis report goes to a stage gate, the provenance chain is the answer to "where did this come from." Researchers can defend the methodology. Insights leads can challenge specific attributes rather than dismissing or accepting the whole study as a black box.
A methodology that adapts to new evidence. When the audience evolves, re-running the evidence stage produces personas grounded in the updated evidence, not in last year's snapshot. This is part of why synthetic research can model drift over time: the underlying evidence can be refreshed.
A clear handoff to real-customer research. Synthetic research is at its best as a first-pass layer that sharpens what to study with real customers. The provenance trail makes this handoff concrete: the surviving hypotheses are documented, the evidence each one rests on is named, and the gaps the real research should address are explicit.
What evidence grounding still can't do:
It can't produce statistically bounded point estimates. The signals are qualitative; the personas reason about the audience; the synthesis finds themes and tensions. None of this produces "27% of customers prefer X, plus-or-minus 3pp at 95% confidence." That number requires real-respondent panel research at sufficient N.
It can't substitute for real-customer ground truth. The evidence pool reflects what's been documented; real customers behave in ways that haven't been documented yet. Real-customer research surfaces the unexpected; synthetic research reflects the expected within evidence-supported bounds.
It can't make up for missing evidence. If the audience is genuinely under-researched (a brand-new category, a niche professional segment with no public coverage), evidence grounding only goes as far as the evidence does. The gap-fill iteration helps; it doesn't manufacture evidence that doesn't exist. In those cases, the critic surfaces the gap, and the responsible move is to lean harder on the customer's first-party data or treat the synthetic findings as hypothesis-grade.
It doesn't replace real human judgment about what the research means. Evidence grounding produces grounded findings. Strategic interpretation of those findings remains a human job. The teams getting the most value from Candor treat the platform as an evidence-acceleration layer and the strategic synthesis as their own work.