
How archetype clustering works

Archetype clustering is the stage of Candor's pipeline that turns a pool of evidence about an audience into 3 to 8 distinct types of person, each with calibrated personality and cognitive bias ranges. The personas you eventually interview are sampled from those archetype-level ranges, with sibling personas in the same archetype kept meaningfully different from each other.

This is the technical companion to the earlier piece on how evidence grounding works. Evidence grounding produces the signal pool. Archetype clustering turns the signal pool into the population structure that drives the rest of the study.

What an archetype is in Candor, and what it isn't

The category vocabulary in synthetic research is fuzzy, and the four terms most often confused are signal, segment, archetype, and persona. Each one is a distinct object in Candor's pipeline and they nest in a specific order.

A signal is an atomic piece of extracted evidence. It comes out of the evidence-retrieval stage and represents one structured claim: a behavior, a pain point, an attitude, a constraint, a goal, a belief, a preference, or a decision rule. Every signal carries a provenance tag and a source citation. Most studies produce hundreds of signals.

A segment is a cluster of the audience defined by inclusion criteria and supporting evidence. Segments come out of the audience-generation pipeline. The user reviews segments on the audience-review screen and decides which to include or skip before persona generation. Segments describe the population structure: who exists in the audience, in what proportions, with what characteristics.

An archetype is one of 3 to 8 meaningfully distinct types of person within the included segments, each representing a fundamentally different way of being a member of the audience. Archetypes are generated after segment approval, by synthesizing the signal pool against the dimensions that best separate the audience. Each archetype has a name, a description, defining dimensions, an OCEAN personality profile expressed as a range per trait, a set of primary cognitive biases with intensity ranges, and a population weight indicating its proportion of the total persona set.

A persona is an individual synthetic respondent sampled from an archetype. Each persona inherits the archetype's core (defining dimensions, OCEAN range, bias range, memory structure template) and adds independent secondary traits that vary within those ranges. Multiple personas can share an archetype, and the system enforces meaningful separation between sibling personas so they don't trivially collide on personality or bias values.

The order matters: signals come from evidence, segments come from signal clustering at the audience layer, archetypes come from synthesizing signals across included segments, and personas come from sampling archetypes. Each layer carries the provenance of the layer below it. Anything a persona says in an interview traces back through the archetype's defining attributes, through the signal evidence those attributes rest on, to the source documents and published research the evidence was extracted from.
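The nesting can be pictured as a small data model. The following is a minimal sketch, not Candor's actual schema: every class, field name, and sample record is an assumption made for illustration, and the segment layer is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Signal:
    """One atomic piece of extracted evidence (hypothetical fields)."""
    signal_id: str
    kind: str        # e.g. "behavior", "attitude", "decision rule"
    claim: str
    provenance: str  # provenance tag
    source: str      # source citation


@dataclass
class Archetype:
    name: str
    # Maps each defining dimension to the ids of the signals that support it.
    defining_dimensions: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class Persona:
    name: str
    archetype: Archetype


def trace_sources(persona: Persona, dimension: str, pool: dict[str, Signal]) -> list[str]:
    """Walk persona -> archetype -> signals -> source citations for one dimension."""
    signal_ids = persona.archetype.defining_dimensions.get(dimension, [])
    return [pool[sid].source for sid in signal_ids if sid in pool]


# Made-up records, purely to exercise the trace.
pool = {
    "s1": Signal("s1", "attitude", "Distrusts vendor benchmarks", "public", "Example report A"),
    "s2": Signal("s2", "constraint", "Needs security review", "uploaded", "Example interview B"),
}
skeptic = Archetype("The Tooling Skeptic", {"build-vs-buy disposition": ["s1", "s2"]})
persona = Persona("Persona 1", skeptic)
print(trace_sources(persona, "build-vs-buy disposition", pool))
```

The point of the sketch is that each layer holds a reference to the layer below it, so a trace never needs to guess which evidence a trait rests on.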

What the user controls, and what runs automatically

A useful methodology essay starts by being honest about what the user actually controls. In Candor's archetype-clustering stage, the user's direct decisions are limited and intentional.

What the user controls:

  • Study setup (audience type, learning goals, industry, region, any uploaded research).
  • Segment selection on the audience-review screen (toggle which segments to include or skip).
  • Interview-type selection at the interview-guide stage (one of problem discovery, problem validation, concept testing, or price testing).

What runs automatically inside the archetype-clustering pipeline:

  • Signal summarization (turning hundreds of signals into a structured implication pool).
  • Dimension selection (picking the 5 to 9 dimensions that best separate the included audience).
  • Archetype generation (synthesizing the archetypes themselves).
  • Critic validation (checking the archetypes for realism, distinctness, evidence alignment, and segment coverage).
  • Respondent sampling (drawing the specific OCEAN and bias values for each persona from the archetype's ranges, with sibling-distance enforcement).
  • Memory generation (building each persona's six memory structures from the signal evidence).

The user can inspect the dimensions and the candidate trait pool on the audience-review screen if they want to understand what the system is working with, but the dimensions that actually drive archetype clustering are picked by code, not by the user. This is deliberate. Letting users hand-pick clustering dimensions in a low-volume research context usually produces archetypes that confirm what the team already believes, which defeats the point of running the research at all. Automating the dimension selection from evidence-anchored variance makes the output less convenient to manipulate and more useful as a research instrument.

The pipeline at a glance

The archetype-clustering pipeline is one phase of persona generation. End to end, persona generation takes 7 to 12 minutes of background time after the user approves segments. The clustering stages run in this order:

  1. Signal summarization.
  2. Dimension selection.
  3. Archetype generation.
  4. Critic validation.
  5. Respondent sampling and memory generation.

The first four stages produce the archetypes themselves. The fifth stage produces the individual personas by drawing from the archetype-level ranges. The rest of this piece walks through each.

Stage 1: From signals to clusters

The signal pool that comes out of evidence retrieval is rich but unstructured for clustering purposes. Hundreds of behavioral, attitudinal, and contextual signals, each tagged with provenance and source, do not yet describe a clustered audience. They describe a population's behavior, attitudes, and decisions in fragments.

The first thing the persona-generation pipeline does is summarize those signals into a structured implication pool that's usable for clustering. The summary preserves the diversity of perspectives in the underlying signals (so the archetype stage doesn't collapse the audience into a single "average" view) and translates raw signal language into the categories that downstream clustering can reason about: which behaviors recur across the audience, which attitudes split the audience into recognizably different camps, which constraints differentiate one segment of buyers from another.

This summarization step is also where the pipeline starts to surface the shape of the audience that the segments alone don't show. Segments describe demographic and behavioral clusters; signals capture the underlying reasoning patterns. Two segments that look similar demographically can be quite different in how they reason about a decision, and the signal-summarization stage exposes those reasoning differences as inputs for the clustering work.

Stage 2: Choosing the dimensions that separate the audience

Before generating archetypes, Candor picks the dimensions that will actually separate one archetype from another. This is the trickiest part of the methodology, because the answer to "which dimensions matter" depends entirely on the audience.

A consumer audience for a wellness product might cluster cleanly on values, lifestyle, and risk tolerance. A B2B audience for a developer tool might cluster on engineering maturity, build-vs-buy disposition, and team size. A regulated CX audience might cluster on technology comfort, plan tenure, and prior experience with care navigation. Picking the wrong dimensions produces archetypes that look distinct on paper but reason about decisions identically.

The dimension-selection step runs deterministically inside the persona-generation pipeline. It pulls from the candidate trait pool that audience generation produced (the user can browse this pool on the audience-review screen, but doesn't pick from it) and selects 5 to 9 dimensions that:

  • Show high variance across the included segments (so the archetypes will actually look different).
  • Are grounded in the signal evidence (so the dimensions aren't generic stereotypes).
  • Distinguish on reasoning patterns, not just demographic surface (so the archetypes capture different ways of thinking, not different ways of looking).
  • Span the relevant category mix for the audience type (B2B emphasizes role, decision environment, organizational context, and constraints; B2C emphasizes lifestyle, identity, and aspirational self).

The output is a set of 5 to 9 selected dimensions, each with a short rationale for why it was picked. The archetype-generation stage uses these as the scaffolding for clustering.
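As a rough illustration of the first two criteria (variance across segments and grounding in evidence), here is a minimal sketch. It is not Candor's selection code: the input shape, the 0-to-1 scores, the `min_signals` threshold, and the use of population variance as the ranking metric are all assumptions, and the reasoning-pattern and category-mix criteria are not modeled.

```python
from statistics import pvariance


def select_dimensions(candidates, min_signals=3, k_min=5, k_max=9):
    """Rank evidence-backed candidate dimensions by cross-segment variance.

    candidates: dict mapping dimension name -> (per-segment scores, supporting signal count).
    All thresholds here are illustrative assumptions.
    """
    grounded = {
        name: pvariance(scores)
        for name, (scores, n_signals) in candidates.items()
        if n_signals >= min_signals and len(scores) >= 2
    }
    # Sort by variance (descending), then by name, so ties resolve deterministically.
    ranked = sorted(grounded, key=lambda name: (-grounded[name], name))
    if len(ranked) < k_min:
        raise ValueError("not enough evidence-backed dimensions to cluster on")
    return ranked[:k_max]


# Hypothetical candidate pool: scores are per-segment averages on a 0-to-1 scale.
candidates = {
    "risk tolerance": ([0.2, 0.8, 0.5], 12),
    "team size": ([0.1, 0.9, 0.5], 8),
    "build-vs-buy disposition": ([0.3, 0.7, 0.4], 15),
    "technology comfort": ([0.5, 0.5, 0.6], 6),
    "plan tenure": ([0.4, 0.6, 0.5], 5),
    "favorite color": ([0.0, 1.0, 0.5], 1),  # high variance, but almost no evidence
}
selected = select_dimensions(candidates)
print(selected)
```

Filtering on evidence before ranking on variance is what keeps a high-variance but unsupported dimension (the last entry above) out of the result.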

The user doesn't choose these dimensions. The selection is automatic, deterministic, and grounded in evidence variance, for the reason given earlier: in low-volume research contexts, hand-picked clustering dimensions tend to confirm what the team already believes rather than produce insight.

Stage 3: Generating the archetypes

With the selected dimensions in hand, the pipeline synthesizes the archetypes themselves. This is the most LLM-heavy step in the clustering pipeline, and it produces structured output rather than free text.

For each archetype, the generation step produces:

  • A name: a short, human-readable label (for example, "The Cautious Evaluator," "The Tooling Skeptic," "The First-Time Buyer"). Names are intended to be memorable and to capture the archetype's core stance, not to be flippant.
  • A description: two or three sentences summarizing the archetype's defining stance, current behaviors, and decision pattern.
  • Defining dimensions: the 3 to 5 dimensions where this archetype is most distinct from the other archetypes in the set. Each dimension carries a value and a distinctiveness indicator (how much this dimension separates this archetype from peers).
  • A full trait profile: the values for all of the selected dimensions on this archetype, not just the defining ones. This is the complete behavioral and attitudinal shape.
  • An OCEAN profile: a range for each of the five Big Five personality traits (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism). Ranges, not points. Each persona sampled from this archetype will get specific OCEAN values drawn from inside these ranges.
  • Primary biases: the cognitive biases that are most relevant to this archetype's decision-making, each with an intensity range (for example, status-quo bias 0.7 to 0.9) rather than a point value, for the same reason as OCEAN. Candor's bias library covers twenty biases across shared, B2B-specific, and B2C-specific categories.
  • Signal sources: which signals from the evidence pool informed this archetype's construction. The audit trail at the archetype level.
  • A population weight: the proportion of the total persona set that this archetype should represent. Population weights sum to 1.0 across the archetype set and drive how many personas get allocated to each archetype downstream.
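A structural check on an archetype set might look like the sketch below. It is a simplified illustration rather than Candor's schema: the `Archetype` fields shown are a subset of the list above, the 0-to-1 scale for OCEAN and bias ranges is an assumption inferred from the examples in this piece, and the function name is hypothetical.

```python
import math
from dataclasses import dataclass

OCEAN_TRAITS = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")


@dataclass
class Archetype:
    name: str
    ocean: dict[str, tuple[float, float]]   # trait -> (low, high)
    biases: dict[str, tuple[float, float]]  # bias -> (low, high) intensity
    population_weight: float


def validate_archetype_set(archetypes):
    """Return a list of problems; an empty list means the set is well-formed."""
    problems = []
    if not 3 <= len(archetypes) <= 8:
        problems.append(f"expected 3 to 8 archetypes, got {len(archetypes)}")
    for a in archetypes:
        missing = [t for t in OCEAN_TRAITS if t not in a.ocean]
        if missing:
            problems.append(f"{a.name}: missing OCEAN traits {missing}")
        for label, (low, high) in {**a.ocean, **a.biases}.items():
            if not 0.0 <= low <= high <= 1.0:
                problems.append(f"{a.name}: bad range for {label}: ({low}, {high})")
    total = sum(a.population_weight for a in archetypes)
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        problems.append(f"population weights sum to {total:.3f}, not 1.0")
    return problems


def make(name, weight):
    """Build an archetype with placeholder ranges for the example."""
    return Archetype(name, {t: (0.3, 0.7) for t in OCEAN_TRAITS},
                     {"status-quo bias": (0.7, 0.9)}, weight)


good = [make("The Cautious Evaluator", 0.5), make("The Tooling Skeptic", 0.3),
        make("The First-Time Buyer", 0.2)]
bad = [make("The Cautious Evaluator", 0.5), make("The Tooling Skeptic", 0.3),
       make("The First-Time Buyer", 0.3)]
print(validate_archetype_set(good))
print(validate_archetype_set(bad))
```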

How many archetypes get generated per study depends on three things: the number of included segments, the diversity level configured for the study, and whether the audience is niche. Niche audiences cap at 4 archetypes (because forcing 7 distinct types out of a thin population is a synthetic rigor problem, not a feature). Broader, higher-diversity audiences can reach 8. The typical mid-range study with 3 to 4 included segments and medium diversity produces 5 archetypes.

The archetypes are generated as a coherent set, not one at a time. The system reasons about all of them together so they end up meaningfully different from each other rather than minor variations on the same core type.

Stage 4: Critic validation

The generated archetypes go through an automatic critic agent before they're persisted. The critic checks the archetype set against four criteria:

  • Realism: does each archetype describe a plausible person given the signal evidence, or does any archetype combine traits in ways that don't track to documented behavior?
  • Distinctness: are the archetypes meaningfully different from each other on at least three dimensions, or are some archetypes too close to others to be useful as separate research targets?
  • Evidence alignment: are the defining dimensions on each archetype grounded in the signal pool, or has the generation step invented traits that have no support in the underlying evidence?
  • Segment coverage: do the archetypes collectively represent the audience segments the user approved, or has the set drifted toward one segment at the expense of others?
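The distinctness criterion is the easiest of the four to express mechanically. The sketch below is an assumption-heavy illustration, not the critic agent itself: it treats each archetype's trait profile as numeric scores on a 0-to-1 scale and uses a hypothetical `min_gap` threshold to decide when two scores count as different.

```python
from itertools import combinations


def too_similar_pairs(profiles, min_distinct_dims=3, min_gap=0.25):
    """Return archetype pairs that differ on fewer than min_distinct_dims dimensions.

    profiles: dict mapping archetype name -> {dimension: score in [0, 1]}.
    min_gap is an assumed threshold for counting two scores as "different".
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(profiles.items(), 2):
        shared = a.keys() & b.keys()
        distinct = sum(1 for dim in shared if abs(a[dim] - b[dim]) >= min_gap)
        if distinct < min_distinct_dims:
            flagged.append((name_a, name_b, distinct))
    return flagged


# Made-up profiles: the first two differ clearly on only one dimension.
profiles = {
    "The Cautious Evaluator": {"risk tolerance": 0.2, "team size": 0.8,
                               "build-vs-buy": 0.3, "technology comfort": 0.5},
    "The Tooling Skeptic": {"risk tolerance": 0.3, "team size": 0.7,
                            "build-vs-buy": 0.9, "technology comfort": 0.6},
    "The First-Time Buyer": {"risk tolerance": 0.8, "team size": 0.1,
                             "build-vs-buy": 0.7, "technology comfort": 0.1},
}
flagged = too_similar_pairs(profiles)
print(flagged)
```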

The critic is structurally similar to the critic in the audience-generation pipeline. It runs as a separate validation step rather than as part of generation, and when it identifies issues, the system addresses them before continuing rather than passing weak output downstream. The user sees only the final, critic-passing archetype set, which is one reason archetype output tends to read as more coherent than what a single-pass LLM persona generator would produce.

Stage 5: Sampling personas from archetype ranges

Once the archetype set passes critic validation, the pipeline generates individual personas by sampling from the archetype-level ranges. This is where the move from cluster-level structure to instance-level synthetic respondents happens.

Each archetype produces between 2 and 4 personas depending on the diversity level configured for the study, with the total persona count across all archetypes capped at 24. The allocation is proportional to each archetype's population weight, so an archetype with a 30% population weight gets more personas than one with 10%, while every archetype gets at least one persona (the minimum-coverage rule).
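One way to satisfy all three constraints at once (proportional to weight, at least one persona per archetype, total capped at 24) is a largest-remainder allocation. The sketch below assumes that approach; the source does not say which rounding rule Candor uses, and the function name and inputs are hypothetical.

```python
import math


def allocate_personas(weights, total, cap=24):
    """Split `total` personas across archetypes in proportion to weight.

    weights: dict mapping archetype name -> population weight (assumed to sum to 1.0).
    Every archetype gets at least one persona; the total never exceeds `cap`.
    """
    total = min(total, cap)
    if total < len(weights):
        raise ValueError("need at least one persona per archetype")
    # Reserve one persona each (minimum coverage), then share out the remainder.
    spare = total - len(weights)
    quotas = {name: w * spare for name, w in weights.items()}
    counts = {name: 1 + math.floor(q) for name, q in quotas.items()}
    leftover = total - sum(counts.values())
    # Largest-remainder rule: leftover personas go to the biggest fractional parts.
    by_remainder = sorted(weights, key=lambda n: (-(quotas[n] - math.floor(quotas[n])), n))
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Hypothetical five-archetype study with 15 personas.
weights = {"A": 0.3, "B": 0.3, "C": 0.2, "D": 0.1, "E": 0.1}
counts = allocate_personas(weights, total=15)
print(counts)
```

Reserving the minimum first and distributing only the spare personas proportionally is a design choice: it guarantees coverage for low-weight archetypes at the cost of slightly compressing the spread between the largest and smallest allocations.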

Each persona's sampling process:

  • OCEAN values are drawn from the archetype's OCEAN ranges. Two personas in the same archetype never share the same point in personality space: the system enforces meaningful separation between sibling personas on OCEAN so they don't end up nearly identical by accident.
  • Bias intensities are drawn from the archetype's bias ranges. Same approach: sibling personas in the same archetype get different bias intensity values, so a study with three personas in one archetype produces three meaningfully different decision-reasoning profiles.
  • Secondary traits (demographic, B2B firmographic or B2C lifestyle fields, behavioral specifics) are sampled independently for each persona, so even sibling personas have distinct surface identities.
  • Memory structures (six memory types per persona: identity, behavioral, belief, language, decision, and conversation memory) are generated from the signal evidence plus the persona's sampled traits. The conversation memory starts empty; the others are populated at generation time and updated as interviews accumulate.

The sibling-distance enforcement is one of the methodology details that separates archetype clustering from naive replication. Without it, three personas sampled from the same archetype's OCEAN ranges can come out as statistical near-twins, which collapses the research breadth that multiple personas per archetype are supposed to provide. The system prevents this at sampling time by checking each proposed OCEAN profile against already-sampled siblings and resampling if they collide.
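The check-and-resample loop can be sketched as simple rejection sampling. This is an illustration under stated assumptions, not Candor's sampler: uniform draws within each range, Euclidean distance across the five traits, and the `min_distance` and `max_tries` values are all choices made for the example.

```python
import math
import random

OCEAN_TRAITS = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")


def distance(a, b):
    """Euclidean distance between two OCEAN profiles (an assumed metric)."""
    return math.dist([a[t] for t in OCEAN_TRAITS], [b[t] for t in OCEAN_TRAITS])


def sample_siblings(ocean_ranges, n, min_distance=0.15, max_tries=1000, seed=0):
    """Draw n OCEAN profiles from the ranges, rejecting draws too close to a sibling."""
    rng = random.Random(seed)
    siblings = []
    tries = 0
    while len(siblings) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("ranges too narrow for the requested separation")
        candidate = {t: rng.uniform(low, high) for t, (low, high) in ocean_ranges.items()}
        # Keep the draw only if it is far enough from every accepted sibling.
        if all(distance(candidate, s) >= min_distance for s in siblings):
            siblings.append(candidate)
    return siblings


# Hypothetical archetype ranges on a 0-to-1 scale.
ranges = {
    "openness": (0.4, 0.7),
    "conscientiousness": (0.6, 0.9),
    "extraversion": (0.6, 0.9),
    "agreeableness": (0.3, 0.6),
    "neuroticism": (0.2, 0.5),
}
siblings = sample_siblings(ranges, n=3)
for profile in siblings:
    print({t: round(v, 2) for t, v in profile.items()})
```

The `max_tries` guard matters in practice: if an archetype's ranges are too narrow for the requested separation, a loop without it would never terminate.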

B2B versus B2C: structural separation, not lip service

Candor models B2B and B2C as fundamentally different research domains throughout the archetype-clustering stage, not just at the cosmetic level. The differences show up in four specific places:

Dimension emphasis. B2B archetype dimensions emphasize organizational context, decision environment, role and responsibility, and external constraints (vendor lock-in, security review processes, procurement cycles). B2C archetype dimensions emphasize lifestyle, identity, aspirational self, and lived-experience constraints. The dimension selection stage uses different category weights depending on whether the study is B2B or B2C.

Cognitive bias baselines. B2B and B2C audiences have different default bias intensity baselines. Status-quo bias, sunk-cost fallacy, career-risk aversion, and authority bias run higher at baseline in B2B because the decision-making environment rewards these biases (changing tools is risky for the buyer's career, established vendors carry implicit authority). Optimism bias, present bias, and scarcity or FOMO effects run higher at baseline in B2C because consumer decisions are made under different incentive structures.

The private-versus-public-stance distinction (B2B specifically). B2B personas maintain two layers of belief: the persona's actual private view of a topic, and the persona's likely committee-public stance on the same topic. These can differ. A B2B persona might privately think a competing vendor has the better product but publicly advocate for the incumbent because supporting the existing choice protects their relationship with the procurement team. This duality is baked into B2B archetype memory structures.

Decision-rule shape. B2B decision rules tend to include explicit committee dynamics (who needs to approve, in what order, with what evidence). B2C decision rules tend to include impulse and identity considerations. The system models these differently rather than treating them as variations of the same underlying decision framework.

The result is that B2B archetypes and B2C archetypes don't just have different content. They have different structural shapes, which produces interview behavior that reflects how each audience actually reasons rather than producing B2B interviews that feel like consumer interviews with formal language pasted on top.

What archetype clustering gives you, and what it still can't do

The cumulative effect of the five stages is a population structure that's grounded in evidence, audited at the cluster level, and varied at the persona level. Specifically:

Meaningful breadth in your research. Three personas in the same archetype aren't statistical near-twins; they're three different reasoning profiles within the same broad type. A five-archetype study with 15 personas gives you 15 meaningfully different interview perspectives, not three copies each of five perspectives.

Provenance from cluster to instance. When a persona says something distinctive in an interview, you can trace through their archetype's defining attributes to the signal evidence that grounded those attributes, and back to the source documents. The audit trail crosses every layer.

Calibrated psychology, not labels. Personality and bias enter the model as ranges that get sampled, not as binary tags on the personas. A persona with "high anchoring bias" actually reasons differently from a sibling with "moderate anchoring bias," because the bias intensity is a real continuous parameter, not a flag.

Honest population structure for the included segments. The archetype set represents the audience the user approved, in proportions that reflect that audience, with weighting visible to the researcher rather than hidden in averages.

What archetype clustering can't do:

It can't capture genuine outliers below the population-weight threshold. A 5-archetype model collapses minor variations into the nearest archetype. If your audience contains a small but strategically important subgroup that doesn't justify its own archetype slot, that subgroup is folded into the cluster nearest to it and loses some fidelity. The fix is to include that subgroup as a separate audience segment at Phase 1, not to expect clustering to recover it.

It can't update the archetype core from interview feedback. Personas update their belief memory and decision memory as interviews accumulate within a study. Archetype-level traits don't update. If the underlying audience drifts mid-study, you'd see it in persona-level belief evolution but not in the archetype set itself. For longitudinal questions, the right answer is to re-run audience generation with updated evidence rather than expecting in-study drift to propagate up.

It can't model per-domain bias variation. A persona's bias intensity is a single value across all decision domains in the study scope. Real human anchoring bias on price differs from anchoring bias on feature comparisons; the system uses a single intensity for both. This is a fidelity compromise the methodology accepts in exchange for tractable models.

It can't recover from poorly-defined segments at Phase 1. Archetype clustering inherits the segment structure the user approved. If the segments are wrong (too broad, too narrow, missing a real subgroup, including a non-existent one), the archetypes reflect that flaw. The critic agent validates archetypes against evidence, not against real-world segment structure, so it can't catch this.

It can't model genuinely novel audiences with no evidence support. The clustering pipeline depends on signal evidence to be meaningful. For a brand-new category or a niche professional segment with no public research coverage and no first-party data, the candidate trait pool will be thin and the resulting archetypes will be flagged as weak-confidence rather than rigorous research output. The honest path in those cases is to either acquire first-party evidence first or treat the synthetic research as hypothesis-grade rather than findings-grade.

Common questions

How many archetypes does a study produce?

Between 3 and 8, with most studies landing at 5. The exact count depends on the number of included audience segments, the diversity level configured for the study, and whether the audience is classified as niche. Niche audiences cap at 4 archetypes because forcing more clusters than the evidence supports produces synthetic rigor problems. Broader audiences with high diversity can produce 7 or 8.

What is the difference between an archetype and a persona?

An archetype is a cluster: a type of person, with defining dimensions, OCEAN and bias ranges, and a population weight. A persona is an individual instance sampled from one archetype, with specific OCEAN and bias values drawn from the archetype's ranges and unique secondary traits. Multiple personas can share an archetype, and the system enforces meaningful separation between sibling personas so two personas in the same archetype don't trivially collide on personality or reasoning patterns.

Can I choose the clustering dimensions myself?

No, and that's intentional. The dimension selection step runs deterministically inside the persona-generation pipeline, picking 5 to 9 dimensions from the candidate trait pool based on variance across segments and grounding in the signal evidence. Users can browse the candidate trait pool on the audience-review screen if they want to understand the methodology, but the actual selection isn't a user pick. The reason is that letting users hand-pick clustering dimensions in low-volume research contexts tends to produce archetypes that confirm what the team already believes, which defeats the point of running the research.

Why are personality and bias values expressed as ranges rather than points?

A range allows sibling personas in the same archetype to be meaningfully different from each other rather than statistical near-twins. If an archetype has an extraversion range of 0.6 to 0.9, three personas sampled from that archetype get three different extraversion values inside that range, and the system enforces minimum separation between siblings. This produces actual breadth within an archetype, not just three copies of the same personality profile. Cognitive bias intensities work the same way: ranges at the archetype level, distinct sampled values at the persona level.

Are B2B and B2C audiences modeled differently?

Yes, structurally. The dimension selection stage emphasizes different category mixes (organizational context and role for B2B; lifestyle and identity for B2C). The cognitive bias library applies different baseline intensities (status-quo bias and career-risk aversion run higher in B2B; optimism bias and present bias run higher in B2C). B2B personas maintain a distinction between their private view of a topic and their likely committee-public stance, which doesn't exist in the B2C model. The decision-rule shapes differ accordingly. These aren't cosmetic differences. They produce interview behavior that reflects how each audience actually reasons.

What are the limitations of archetype clustering?

Five honest limitations. It can't capture genuine outliers below the population-weight threshold, which means small strategically-important subgroups get folded into the nearest archetype. It can't update the archetype core from interview feedback, so longitudinal drift questions need re-running audience generation rather than waiting for the clusters to evolve. It can't model per-domain bias variation; one bias intensity covers all decision domains. It can't recover from poorly-defined segments at Phase 1, which the user controls. It can't model genuinely novel audiences with no public evidence; in those cases the candidate trait pool is thin and the archetypes are flagged as weak-confidence rather than research-grade.

Candor is in development.
