When concept testing with synthetic users works (and when it doesn't)

Concept testing is one of synthetic research's strongest use cases. It's also one of the easiest to misuse. The teams getting the most out of synthetic concept testing know exactly which concepts belong on synthetic and which ones belong on a real-respondent panel. The teams getting the least out of it don't, and they spend the difference learning the distinction the slow way.

Here are the four conditions where synthetic concept testing earns its keep, the four conditions where it falls apart, and a decision framework for routing each concept.

What concept testing actually is

Before the framework, a working definition. Concept testing is a research method where you put a concept (a product idea, a positioning angle, a value-prop framing, a campaign idea, a pricing structure) in front of your target audience and measure how they respond. The output answers two questions: does this resonate, and if not, why not.

Two main shapes. Monadic concept testing puts one concept in front of one set of respondents at a time. Comparative concept testing puts multiple concepts in front of the same respondents and asks them to react to each, often ranking or choosing between them. Both shapes work in Candor, with the platform mapping them to different interview-guide structures.

The traditional cost profile is the same as the broader synthetic-research economics: a panel-based concept test costs $15K to $50K (industry median around $23K) and takes four to eight weeks per round including recruitment and analysis. Most insights teams run four to six panel rounds a year and kill the other concepts on internal judgment. That's the math problem synthetic concept testing was built to solve. See concept testing for the use-case-level walkthrough.

The four conditions where synthetic concept testing earns its keep

These are not all-or-nothing tests. They're cumulative. The more of them a concept-test situation hits, the stronger the synthetic case.

You have more concepts than panel budget can absorb. This is the core economic argument. Concept volume is project-driven, not annually budgeted, and it stacks fast across every shape of team. A startup iterating on features, value propositions, and positioning typically tests 10-50 concepts per project. A consultancy running a strategy engagement is in the same range. A large company launching new products in-market is testing concepts across upfront screening, iteration, and finalist rounds at similar volume per launch. Most teams run multiple such projects a year, stacking up to hundreds of concepts tested in some form. Panel research can absorb a small fraction of that. Most concepts get killed on internal judgment rather than respondent signal. Synthetic screening replaces the intuition step with a respondent-grounded one across the full volume, then panel research handles the survivors that reach launch-gate scrutiny. Volume and budget asymmetry is where synthetic concept testing pays back fastest.

The concept is text-describable. Candor's synthetic personas react to written concept descriptions, not to designed visuals. A concept like "a value tier that includes priority support and a quarterly check-in for $X/month" tests cleanly. A concept like "this specific packaging redesign with this label and these colors" doesn't, because Candor doesn't currently support multimodal review of visual stimuli. The category-wide shorthand "concept testing" covers both shapes; synthetic is good at the first and not yet good at the second. If you can write your concept in two paragraphs and a buyer would understand it, synthetic concept testing fits.

You're at the early gates, not the launch gate. Synthetic concept testing produces qualitative-style depth and directional signal. It does not produce statistical point estimates with confidence intervals. For early decisions about which concepts to invest in further, directional signal is what you need. For the final launch gate, where the decision triggers manufacturing or media spend at scale, you need real-respondent data with the methodology a launch committee can defend. Synthetic at the early gates; panel at the launch gate.

You need to iterate. Most concepts that survive a test need refinement. Synthetic concept testing supports rapid iteration: refine the concept, re-test against the same audience definition, compare. The platform time per iteration is roughly one to two hours; the calendar time per iteration is whatever pace your team can move at. Panel research can't iterate at this speed because each round is a fresh recruitment cycle. Iteration is where synthetic concept testing's speed advantage compounds.

These four conditions overlap in practice. Most teams hitting one are hitting two or three. When you hit all four, synthetic concept testing isn't a nice-to-have; it's an unfair advantage over teams that don't use it.

The four conditions where synthetic concept testing falls apart

The failure mode for synthetic concept testing is using it where it doesn't belong, then concluding the method doesn't work. Four conditions where synthetic is the wrong tool.

The concept is visual or interactive. Packaging redesigns, mobile-app screens, prototype flows, ad creative, video storyboards. These need a real respondent who can see and click. Candor doesn't currently support multimodal review of visual stimuli. If your concept lives in a Figma file or a render, send it to a real-respondent platform that handles design feedback. Synthetic isn't the right tool yet, and pretending otherwise produces test outputs that read as plausible but reflect the persona's reaction to your textual description of the design, not the design itself.

The output needs to substantiate a regulated claim. Anything that's going on a label, in a clinical document, in a regulatory submission, or in an advertising substantiation file needs real-respondent data with documented sampling methodology. Synthetic research is appropriate for exploring concepts and screening which ones are worth substantiating later. It is not appropriate for the substantiation itself. The legal exposure is real. Stay on panel methodology where the regulator expects panel methodology.

The decision requires a statistical point estimate with confidence intervals. "27% of category buyers prefer Concept A, plus-or-minus 3 percentage points at 95% confidence." That kind of number requires real-respondent sample sizes drawn with documented methodology. Synthetic concept testing produces directional signal across persona variance, not statistically-bounded percentages from real-respondent N. If your launch committee's gating criterion is a specific percentage with a specific confidence interval, that's a panel job by definition.
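To make the sample-size point concrete, here is a minimal sketch of the standard margin-of-error formula for a proportion, n = z^2 * p * (1 - p) / E^2. It assumes a simple random sample of real respondents; the function name and defaults are illustrative and not part of any Candor tooling.

```python
import math


def required_sample_size(p: float, margin: float, z: float = 1.96) -> int:
    """Respondents needed to estimate a proportion p within +/- margin.

    Assumes simple random sampling. z = 1.96 corresponds to 95% confidence.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if margin <= 0:
        raise ValueError("margin must be positive")
    # Standard formula: n = z^2 * p * (1 - p) / margin^2, rounded up.
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# The example from the text: 27% preference, +/- 3 points, 95% confidence.
print(required_sample_size(p=0.27, margin=0.03))  # 842
```

Under those assumptions, the quoted figure needs on the order of 840 real respondents, which is far more than a set of personas can stand in for.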

You're at the final launch gate. When the decision triggers manufacturing, distribution agreements, media spend, or other commitments at scale, the cost of being a little wrong is high enough that you want real-respondent validation in the loop. Synthetic concept testing is the right first pass and the right iteration tool. It's not the right last word before a major launch. The teams that try to skip the panel round entirely are the ones that learn this distinction the slow way.

The hybrid model is the answer

The cleanest mental model: synthetic concept testing is a screening layer that runs before panel research, not a replacement for it.

A typical project running through this hybrid looks like this. Stage one: the team generates the project's concept pool (10-50 concepts depending on the scope of the work). Stage two: synthetic-test all of them against the target audience. Each test runs in roughly one to two hours of platform time; even a 50-concept project fits in a couple of work weeks of insights-team time. Stage three: review the synthetic output, kill the weakest with respondent-grounded reasoning, advance the promising ones to iteration. Stage four: refine the survivors through one or two rounds of synthetic iteration, narrow to a handful of finalists. Stage five: send the finalists to a real panel for statistical validation, claim substantiation, and the launch-committee-grade evidence the decision actually requires. A team running multiple such projects a year stacks up hundreds of concepts tested at the synthetic layer while concentrating panel spend on the high-stakes finalists.
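The stage-two time estimate can be checked with simple arithmetic. The sketch below assumes tests are counted one after another and a 40-hour work week; both are simplifying assumptions, and the function names are illustrative.

```python
def platform_hours(concepts: int, hours_low: float = 1.0, hours_high: float = 2.0) -> tuple[float, float]:
    """Low and high total platform hours, at one to two hours per concept test."""
    return concepts * hours_low, concepts * hours_high


def work_weeks(hours: float, hours_per_week: float = 40.0) -> float:
    """Convert hours to work weeks (assumes a 40-hour week)."""
    return hours / hours_per_week


low, high = platform_hours(50)
print(low, high)                          # 50.0 100.0
print(work_weeks(low), work_weeks(high))  # 1.25 2.5
```

A 50-concept project lands between roughly one and two and a half work weeks of platform time, consistent with the "couple of work weeks" figure above.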

The math at the panel layer: each panel round runs $15K to $50K (industry median around $23K). Most teams aren't budgeted for panel rounds on every concept they generate; they pick a small number of finalists and reserve panel research for those. The hybrid version doesn't change the per-round panel cost; it shifts what those rounds buy. Instead of splitting panel budget across concepts that haven't been screened, panel rounds concentrate on finalists that have already survived synthetic iteration, producing richer per-concept studies. Synthetic screening adds research coverage on the concepts that would otherwise die on intuition.
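A rough sketch of why panel budget can't cover the full concept pool, using the $15K to $50K range and the roughly $23K median quoted earlier in this piece. The 50-concept pool comes from the text; the five-finalist count is an illustrative assumption, not a recommendation.

```python
# Per-round panel cost figures quoted in this piece (USD).
PANEL_ROUND_LOW = 15_000
PANEL_ROUND_MEDIAN = 23_000
PANEL_ROUND_HIGH = 50_000


def panel_spend(rounds: int, cost_per_round: int) -> int:
    """Total panel spend if each concept gets its own panel round."""
    return rounds * cost_per_round


concept_pool = 50  # upper end of the 10-50 concepts per project
finalists = 5      # hypothetical "handful" of finalists

# Panel-testing every concept in the pool, at the median cost per round.
print(panel_spend(concept_pool, PANEL_ROUND_MEDIAN))  # 1150000
# Panel-testing only the finalists that survived synthetic screening.
print(panel_spend(finalists, PANEL_ROUND_MEDIAN))     # 115000
```

At the median, panel-testing the whole pool would cost about ten times the finalist-only spend, which is why most of those concepts currently get no respondent signal at all.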

The pattern works because synthetic and panel research are good at different things, and the strongest play is sequencing them rather than picking one. Teams who treat it as zero-sum get less out of both tools than teams who run the hybrid.

A decision framework you can actually use

If you're trying to decide whether to send a concept through synthetic or send it straight to panel, four questions in order.

One: can you describe the concept in text? If yes, synthetic can react to it. If no (visual, prototype, design-dependent), it's a real-respondent job until Candor adds multimodal support.

Two: will the output drive an early-stage decision or a launch-gate decision? If early-stage (which concepts merit further investment), synthetic is in scope. If launch-gate (manufacturing commitments, advertising substantiation, regulatory submission), it's a panel job.

Three: do you need a statistically-bounded number, or do you need to understand which concepts resonate and why? If the former, panel. If the latter, synthetic produces the explanatory signal cleanly.

Four: how many concepts do you have, and what's the panel budget? If concept volume exceeds what panel rounds can cover (the common case for any team running more than four or five concepts a year), synthetic does the screening so panel does the validating.

If all four answers point to synthetic (text-describable, early-stage, explanatory rather than statistical, more concepts than panel budget can cover), synthetic concept testing fits. If any answer points the other way, the concept goes to panel directly, or waits for a future Candor capability that handles the specific failure mode. The teams getting the most value treat this as a per-concept routing decision, not a per-organization methodology choice.
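The four questions can be sketched as a per-concept routing function. This is an illustration of the framework above, not a Candor API; the `Concept` fields and the `route` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Concept:
    """Hypothetical record of the four routing answers for one concept."""
    name: str
    text_describable: bool             # Q1: can it be described in text?
    early_stage_decision: bool         # Q2: early gate rather than launch gate?
    needs_statistical_estimate: bool   # Q3: is a statistically bounded number required?
    volume_exceeds_panel_budget: bool  # Q4: more concepts than panel rounds can cover?


def route(concept: Concept) -> tuple[str, str]:
    """Return (destination, reason), checking the four questions in order."""
    if not concept.text_describable:
        return "panel", "concept is visual or interactive"
    if not concept.early_stage_decision:
        return "panel", "launch-gate decision"
    if concept.needs_statistical_estimate:
        return "panel", "statistically bounded number required"
    if not concept.volume_exceeds_panel_budget:
        return "panel", "panel budget already covers the concept volume"
    return "synthetic", "all four answers point to synthetic screening"


pricing_tier = Concept("value tier with priority support", True, True, False, True)
packaging = Concept("packaging redesign", False, True, False, True)

print(route(pricing_tier)[0])  # synthetic
print(route(packaging)[0])     # panel
```

Running the check per concept, rather than once per organization, mirrors the routing decision described above.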

If you're considering synthetic concept testing as a layer in your operation rather than as a replacement for your existing methodology, read the prior opinion piece on why synthetic research needs evidence grounding and the methodology walkthrough on how evidence grounding works. The screening layer only delivers value when the synthetic side is grounded in real evidence rather than improvising plausible-sounding output.

Common questions

Does synthetic concept testing replace panel research?

No, and treating it that way is the most common way teams get burned. Synthetic concept testing is a screening layer that runs before panel research, not a replacement for it. It excels at early-gate decisions: which of 20 concepts deserve further investment, which positioning angle resonates, where a concept is weakest, why a concept fails. It does not produce statistically-bounded point estimates, can't substantiate regulated claims, and isn't designed to be the last word at a launch gate. The teams getting the most value run a hybrid: synthetic for screening and iteration, panel for the surviving concepts at the launch gate.

Can synthetic concept testing handle visual or interactive concepts?

Not currently. Candor's synthetic personas react to text descriptions of concepts, not to visual stimuli. If your concept lives in a Figma file, a packaging render, a video storyboard, or an interactive prototype, that's a real-respondent job until Candor adds multimodal review (a candidate future direction, not a shipping capability). For text-describable concepts (value propositions, positioning angles, pricing structures, benefit framings, campaign ideas you can describe in a paragraph), synthetic concept testing works cleanly.

How many concepts can you test?

There's no hard ceiling. Each synthetic concept test runs in roughly one to two hours of platform time, most of which is background pipeline work. Real concept volume is project-driven: a single project typically generates 10 to 50 concepts in some form, and most teams run multiple such projects a year. Across a year, that stacks into hundreds of concepts tested at the synthetic layer. The pattern holds regardless of company shape. Startups iterating on features, value propositions, and positioning; consultancies running strategy engagements; large companies launching new products in-market all generate concept volume in this range. The bottleneck moves from "can I afford to test this concept" to "can I generate enough concepts worth testing", which is a healthier place for an insights operation to be.

What output does a synthetic concept test produce?

Qualitative-style depth on which concepts resonate, which fall flat, and the reasoning behind both. Per-persona reactions across 8 to 16 evidence-grounded personas, structured into a synthesis report with extracted signals, themes, archetype-level reactions, identified tensions, and opportunity framing. The output is directional rather than statistical. You get clear understanding of how the concept lands, why it lands that way, and which segments respond differently. You do not get "X% of category buyers prefer Concept A at 95% confidence". That number requires a panel.

How does the hybrid model change research spend?

The pattern is consistent regardless of exact volume. Concept volume is project-driven and stacks fast: a typical project generates 10 to 50 concepts across upfront screening, iteration, and finalist rounds, and most teams run multiple such projects a year, reaching hundreds of concepts tested in some form. Traditional panel research absorbs only the finalists; the rest get killed without respondent signal. Routing the upfront concepts through synthetic screening first, then concentrating panel rounds on the surviving finalists, doesn't change per-round panel cost (typically $15K to $50K). It changes what those rounds buy: richer studies on validated concepts rather than thin coverage across un-screened ones. Total research coverage rises significantly because more concepts get tested at all rather than being killed on intuition.
