panel run Reference — synthpanel · Run synthetic focus groups with any LLM

Multi-model — `--models`, `--blend`

--models has two distinct shapes, selected by whether the spec contains a colon (:). It is mutually exclusive with --model.

Weighted per-persona assignment

A spec with colons splits the panel across models in the given ratio. Weights are normalized — a:2,b:3 and a:0.4,b:0.6 behave identically. Per-persona model: fields in the YAML always win over the --models assignment.

synthpanel panel run \
  --personas examples/personas.yaml \
  --instrument examples/survey.yaml \
  --models 'haiku:0.5,gemini-2.5-flash:0.5'

With 6 personas and the spec above, 3 answer on haiku and 3 on Gemini. 7 personas → 3 + 4 (the last model absorbs the remainder). The assignment is fully deterministic and printed to stderr before the run:

Model assignment:
  Maya Chen        → haiku
  Derek Washington → haiku
  Priya Patel      → haiku
  Sam Torres       → gemini-2.5-flash
  Julia Hoffman    → gemini-2.5-flash
  Omar Rashid      → gemini-2.5-flash
Totals: haiku=3, gemini-2.5-flash=3

Ensemble mode

A spec without colons runs the full panel once per model. Combine with --blend to weight-average the response distributions across models for each question.

synthpanel panel run \
  --personas examples/personas.yaml \
  --instrument pricing-discovery \
  --models 'haiku,sonnet,gemini-2.5-flash' \
  --blend

With 6 personas and 3 models, this runs 18 sessions total. --blend then computes weighted-average distributions using the weights in the spec (equal weights when no : is given). Use haiku:0.5,sonnet:0.3,gemini:0.2 (with colons, in ensemble mode with --blend) to apply custom blend weights.

Flag	Default	Description
`--models SPEC`	—	Multi-model spec. `a:w,b:w` = weighted split; `a,b,c` = ensemble (full panel per model).
`--blend`	off	Weight-average distributions across ensemble models. Requires `--models`.

Persona variants — `--variants`, `--personas-merge`

--variants N

Generate N LLM-perturbed variants per persona before running the panel. The original personas are replaced by N × M variants (M = number of original personas). Each variant perturbs one axis — trait swap, mood context, demographic shift, or background rephrase — via a single LLM call per variant.

synthpanel panel run \
  --personas small-panel.yaml \    # 5 personas
  --instrument survey.yaml \
  --variants 4                     # → 20 total panelists

Useful for stress-testing whether results are stable across plausible persona perturbations, or for expanding a small hand-crafted panel into a larger synthetic one.

--personas-merge

Append additional YAML files to --personas. Repeatable. Files are merged in order; later entries override earlier ones on name collision (controlled by --personas-merge-on-collision).

synthpanel panel run \
  --personas base-panel.yaml \
  --personas-merge extra-personas.yaml \
  --personas-merge regional-overrides.yaml \
  --instrument survey.yaml

Flag	Default	Description
`--variants N`	—	Generate N LLM-perturbed variants per persona. Panel size becomes N × original count.
`--personas-merge PATH`	—	Merge additional persona YAML into the panel. Repeatable.
`--personas-merge-on-collision`	`dedup`	`dedup` (later file wins, warning emitted), `error` (abort on any collision).

Structured extraction — `--extract-schema`

Unlike --schema (which forces structured-only output and replaces free text), --extract-schema preserves the full free-text response and adds a second LLM call that extracts structured data into an extraction key alongside the raw response.

synthpanel panel run \
  --personas panel.yaml \
  --instrument feedback-survey.yaml \
  --extract-schema '{"type":"object","properties":{"sentiment":{"type":"string","enum":["positive","neutral","negative"]},"themes":{"type":"array","items":{"type":"string"}}}}'

The value can be a JSON file path (extract.json) or an inline JSON string. The extraction runs after the panelist answers and is stored under extraction in the result JSON. Use this when you need both the qualitative narrative and a structured signal you can aggregate (e.g. sentiment counts, theme frequency).

Flag	Description
`--extract-schema SCHEMA`	JSON Schema for post-hoc extraction. Preserves full free-text; adds structured `extraction` key. File path or inline JSON.
`--schema SCHEMA`	JSON Schema for structured-only output. Replaces free-text responses. Use when you want only structured data.

Synthesis strategy — `--synthesis-strategy`, `--synthesis-auto-escalate`

The synthesis step aggregates all panelist responses into a final summary. For small panels the default auto strategy concatenates everything into one LLM call. For large panels (n ≥ 50) it automatically switches to map-reduce.

Strategy	How it works	When to use
`single`	All responses concatenated into one call.	Small panels (n<50). Cheapest and most coherent.
`map-reduce`	One summary call per question in parallel, then one reduce call across summaries.	Large panels where responses overflow the synthesis model's context.
`auto` (default)	Pre-flight token estimate picks `single` or `map-reduce`.	Most runs. Falls back to `single` whenever estimate fits context.

--synthesis-auto-escalate

In map-reduce mode, a single question whose responses overflow context normally partitions panelists into sub-batches for an inner reduce. With --synthesis-auto-escalate, instead of sub-batching, that question's map call is retried on a larger-context model (gemini-2.5-flash-lite, 1 M ctx) and a warning is emitted. Use this to preserve single-model semantics when sub-batch results are unacceptable.

Flag	Default	Description
`--synthesis-strategy`	`auto`	Aggregation strategy: `single`, `map-reduce`, or `auto`.
`--synthesis-auto-escalate`	off	In map-reduce, retry overflowing question maps on a large-context model instead of sub-batching.

Rate limiting — `--rate-limit-rps`

By default the orchestrator fires one concurrent request per panelist (bounded by --max-concurrent). --rate-limit-rps adds a token-bucket that smooths bursts — useful when a provider enforces a requests-per-second limit on top of a concurrency cap.

synthpanel panel run \
  --personas large-panel.yaml \
  --instrument survey.yaml \
  --max-concurrent 20 \
  --rate-limit-rps 5.0          # at most 5 new requests/sec

Accepts fractional values: 0.5 means one request every two seconds. Works across all providers on the same client.

Flag	Default	Description
`--max-concurrent N`	unbounded	Cap concurrent in-flight LLM requests across the panel.
`--rate-limit-rps RPS`	—	Token-bucket rate cap in requests per second. Accepts fractional values.

Checkpointing & resume — `--checkpoint-dir`, `--resume`

For long or expensive runs, checkpointing writes per-panelist progress to disk so you can resume after an interruption without re-running completed panelists.

Starting a checkpointed run

synthpanel panel run \
  --personas panel.yaml \
  --instrument survey.yaml \
  --checkpoint-dir /tmp/runs \    # opts in; default: ~/.synthpanel/checkpoints
  --checkpoint-every 10           # flush every 10 completed panelists

The run id is printed to stderr. Each checkpoint is written to <checkpoint-dir>/<run-id>/state.json. Omitting --checkpoint-dir runs without snapshots.

Resuming

synthpanel panel run --resume <run-id>

When --personas and --instrument are omitted they are recovered from the checkpoint's saved CLI args. The resume refuses to start if the current config (model, temperature, questions) does not match the checkpointed config — pass --allow-drift to downgrade this to a warning and continue (statistically inconsistent results).

Flag	Default	Description
`--checkpoint-dir PATH`	`~/.synthpanel/checkpoints`	Directory for per-run snapshots. Setting this opts in to checkpointing.
`--checkpoint-every N`	25	Flush a checkpoint every N completed panelists.
`--resume RUN_ID`	—	Resume a checkpointed run. Skips already-completed panelists.
`--allow-drift`	off	With `--resume`: downgrade config-mismatch errors to warnings.
`--force-overwrite`	off	Replace existing checkpoint state for the same run id instead of refusing.

Convergence & auto-stop — `--convergence-*`

At large-n scales most of the token budget goes toward diminishing returns — once a response distribution has stabilized, additional panelists add <2% signal. The convergence feature surfaces that signal live so you can see when distributions stabilize and optionally halt early.

Convergence tracking applies only to bounded question types (Likert, yes/no, pick-one, any question with a JSON Schema enum). Free-text questions are not tracked.

synthpanel panel run \
  --personas panel.yaml \
  --instrument pricing-discovery \
  --convergence-check-every 20 \  # compute JSD every 20 panelists
  --auto-stop \                    # halt once all questions converge
  --convergence-eps 0.02 \         # JSD threshold (default: 0.02)
  --convergence-min-n 50 \         # don't stop before n=50
  --convergence-m 3                # 3 consecutive checks below eps

When the run finishes the JSON output contains a convergence key:

{
  "convergence": {
    "final_n": 487,
    "auto_stopped": true,
    "overall_converged_at": 473,  // run this many next time
    "per_question": {
      "pricing": { "converged_at": 473, "curve": [...] },
      "tier_preference": { "converged_at": 410, "curve": [...] }
    }
  }
}

overall_converged_at answers "how many panelists did you actually need?" — use it to right-size future runs.

Flag	Default	Description
`--convergence-check-every N`	off	Compute running JSD every N completing panelists. Setting this opts in.
`--auto-stop`	off	Halt once all tracked questions converge. Requires `--convergence-check-every`.
`--convergence-eps FLOAT`	0.02	JSD threshold below which a question is treated as converged.
`--convergence-min-n N`	50	Minimum panelists before `--auto-stop` is allowed to fire.
`--convergence-m N`	3	Consecutive checks below epsilon required to declare convergence.
`--convergence-log PATH`	stderr	Write each convergence check as a JSON line to PATH (for dashboards).
`--convergence-baseline DATASET:Q`	—	Load a human baseline convergence curve from SynthBench and include in the report. Requires `pip install 'synthpanel[convergence]'`.

SynthBench — `--calibrate-against`, `--best-model-for`, `--submit-to-synthbench`

--calibrate-against

Attaches inline calibration to a panel run by comparing the extracted response distribution against a published human baseline. Forces convergence tracking; pair with --convergence-check-every to control cadence (it is never implicit). v1 supports gss and ntia.

Auto-derives a pick-one extractor schema from the baseline when option count ≤ 5; otherwise pass --extract-schema explicitly. Requires pip install 'synthpanel[convergence]'.

synthpanel panel run \
  --personas panel.yaml \
  --instrument happiness-probe \
  --calibrate-against gss:HAPPY \
  --convergence-check-every 20

--best-model-for

Consult the SynthBench leaderboard and use the top-ranked model for the given topic instead of the default. A recommendation line is printed to stderr before the run. Overrides --model; mutually exclusive with --models.

synthpanel panel run \
  --personas panel.yaml \
  --instrument survey.yaml \
  --best-model-for 'political-opinion:globalopinionqa'

Format: TOPIC (ranked against the default dataset globalopinionqa), TOPIC:DATASET (specific dataset), or :DATASET (rank by SPS across the full dataset). The leaderboard is cached for 24 h at ~/.synthpanel/synthbench-cache.json.

--submit-to-synthbench

After a calibrated run, upload the per-question JSD and distributions to the SynthBench public leaderboard. Requires --calibrate-against and SYNTHBENCH_API_KEY in your environment (mint one at synthbench.org/account). First use prompts for consent, which is recorded locally and not re-prompted. Pass --yes to bypass the consent prompt for CI.

export SYNTHBENCH_API_KEY=sk_synthbench_...

synthpanel panel run \
  --personas panel.yaml \
  --instrument happiness-probe \
  --calibrate-against gss:HAPPY \
  --convergence-check-every 20 \
  --submit-to-synthbench

The submission step is non-fatal — a slow or rejecting SynthBench emits a warning but does not affect the panel run's exit code or output. See docs/synthbench-integration.md for the privacy model and what gets uploaded.

Flag	Default	Description
`--calibrate-against DATASET:Q`	—	Inline calibration vs a human baseline. v1 allowlist: `gss`, `ntia`. Requires `synthpanel[convergence]`.
`--best-model-for TOPIC[:DATASET]`	—	Use the SynthBench leaderboard top model for the topic. Overrides `--model`.
`--submit-to-synthbench`	off	Upload calibration results to SynthBench after the run. Requires `--calibrate-against` and `SYNTHBENCH_API_KEY`.
`--yes`	off	Bypass the SynthBench consent prompt (for CI/non-interactive use).

Multi-model — --models, --blend

Weighted per-persona assignment

Ensemble mode

Persona variants — --variants, --personas-merge

--variants N

--personas-merge

Structured extraction — --extract-schema

Synthesis strategy — --synthesis-strategy, --synthesis-auto-escalate

--synthesis-auto-escalate

Rate limiting — --rate-limit-rps

Checkpointing & resume — --checkpoint-dir, --resume

Starting a checkpointed run

Resuming

Convergence & auto-stop — --convergence-*

SynthBench — --calibrate-against, --best-model-for, --submit-to-synthbench

--calibrate-against

--best-model-for

--submit-to-synthbench

See also

Pack Calibration

MCP Server

Ensemble deep-dive

Convergence deep-dive