panel run reference

Full reference for synthpanel panel run. Covers the advanced flags for multi-model panels, persona variants, structured extraction, synthesis tuning, convergence auto-stop, checkpointing, and SynthBench integration.

Multi-model — --models, --blend

--models has two distinct shapes, selected by whether the spec contains a colon (:). It is mutually exclusive with --model.

Weighted per-persona assignment

A spec with colons splits the panel across models in the given ratio. Weights are normalized — a:2,b:3 and a:0.4,b:0.6 behave identically. Per-persona model: fields in the YAML always win over the --models assignment.

synthpanel panel run \
  --personas examples/personas.yaml \
  --instrument examples/survey.yaml \
  --models 'haiku:0.5,gemini-2.5-flash:0.5'

With 6 personas and the spec above, 3 answer on haiku and 3 on Gemini. 7 personas → 3 + 4 (the last model absorbs the remainder). The assignment is fully deterministic and printed to stderr before the run:

Model assignment:
  Maya Chen        → haiku
  Derek Washington → haiku
  Priya Patel      → haiku
  Sam Torres       → gemini-2.5-flash
  Julia Hoffman    → gemini-2.5-flash
  Omar Rashid      → gemini-2.5-flash
Totals: haiku=3, gemini-2.5-flash=3
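
The split described above can be sketched in a few lines of Python. This is an illustrative sketch, not synthpanel's actual code: the function names `parse_weighted_spec` and `assign_models` are hypothetical, and rounding each share down is an assumption chosen to reproduce the 3 + 3 and 3 + 4 examples.

```python
import math

def parse_weighted_spec(spec: str) -> dict[str, float]:
    """Parse 'a:2,b:3' into weights normalized to sum to 1."""
    raw = {}
    for part in spec.split(","):
        # rpartition keeps any earlier colons inside the model name
        name, _, weight = part.strip().rpartition(":")
        raw[name] = float(weight)
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}

def assign_models(personas: list[str], spec: str) -> dict[str, str]:
    """Assign personas to models in order; the last model absorbs the remainder."""
    weights = parse_weighted_spec(spec)
    models = list(weights)
    # Assumption: every model except the last gets floor(n * weight) personas.
    counts = [math.floor(len(personas) * weights[m] + 1e-9) for m in models[:-1]]
    counts.append(len(personas) - sum(counts))
    assignment = {}
    remaining = iter(personas)
    for model, count in zip(models, counts):
        for _ in range(count):
            assignment[next(remaining)] = model
    return assignment
```

Because the weights are normalized before counting, `assign_models(p, "a:2,b:3")` and `assign_models(p, "a:0.4,b:0.6")` produce the same assignment in this sketch.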

Ensemble mode

A spec without colons runs the full panel once per model. Combine with --blend to weight-average the response distributions across models for each question.

synthpanel panel run \
  --personas examples/personas.yaml \
  --instrument pricing-discovery \
  --models 'haiku,sonnet,gemini-2.5-flash' \
  --blend

With 6 personas and 3 models, this runs 18 sessions in total. --blend then computes a weighted average of the per-question distributions, using equal weights when the spec contains no colon. To apply custom blend weights, pass a spec with colons together with --blend, for example haiku:0.5,sonnet:0.3,gemini:0.2; the run stays in ensemble mode and the weights are used for blending.
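
As a rough illustration of what weight-averaging means here, the sketch below blends per-model answer distributions for a single question. The function name `blend_distributions` and the dict-of-dicts input shape are assumptions made for the example, not synthpanel's internal representation.

```python
def blend_distributions(
    per_model: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> dict[str, float]:
    """Weighted average of answer distributions, one distribution per model."""
    total = sum(weights[model] for model in per_model)
    blended: dict[str, float] = {}
    for model, distribution in per_model.items():
        share = weights[model] / total  # normalize so shares sum to 1
        for option, probability in distribution.items():
            blended[option] = blended.get(option, 0.0) + share * probability
    return blended
```

With equal weights this reduces to a plain mean of the per-model distributions.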

| Flag | Default | Description |
|---|---|---|
| --models SPEC | | Multi-model spec. a:w,b:w = weighted split; a,b,c = ensemble (full panel per model). |
| --blend | off | Weight-average distributions across ensemble models. Requires --models. |

Persona variants — --variants, --personas-merge

--variants N

Generate N LLM-perturbed variants per persona before running the panel. The original personas are replaced by N × M variants (M = number of original personas). Each variant perturbs one axis — trait swap, mood context, demographic shift, or background rephrase — via a single LLM call per variant.

# small-panel.yaml holds 5 personas; --variants 4 yields 20 panelists in total
synthpanel panel run \
  --personas small-panel.yaml \
  --instrument survey.yaml \
  --variants 4

Useful for stress-testing whether results are stable across plausible persona perturbations, or for expanding a small hand-crafted panel into a larger synthetic one.

--personas-merge

Append additional YAML files to --personas. Repeatable. Files are merged in order; later entries override earlier ones on name collision (controlled by --personas-merge-on-collision).

synthpanel panel run \
  --personas base-panel.yaml \
  --personas-merge extra-personas.yaml \
  --personas-merge regional-overrides.yaml \
  --instrument survey.yaml

| Flag | Default | Description |
|---|---|---|
| --variants N | | Generate N LLM-perturbed variants per persona. Panel size becomes N × original count. |
| --personas-merge PATH | | Merge additional persona YAML into the panel. Repeatable. |
| --personas-merge-on-collision | dedup | dedup: later file wins, warning emitted. error: abort on any collision. |
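
The two collision policies can be illustrated with a short sketch. It assumes each file has already been loaded into a list of persona dicts keyed by a `name` field; `merge_personas` is a hypothetical helper, not part of synthpanel.

```python
import warnings

def merge_personas(files: list[list[dict]], on_collision: str = "dedup") -> list[dict]:
    """Merge persona lists in order, resolving name collisions by policy."""
    merged: dict[str, dict] = {}
    for personas in files:
        for persona in personas:
            name = persona["name"]
            if name in merged:
                if on_collision == "error":
                    raise ValueError(f"duplicate persona name: {name!r}")
                # "dedup": warn and let the later file win
                warnings.warn(f"persona {name!r} overridden by a later file")
            merged[name] = persona
    return list(merged.values())
```

In this sketch an overridden persona keeps its original position in the panel and only its content changes; whether the real merge preserves order this way is not stated above.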

Structured extraction — --extract-schema

Unlike --schema (which forces structured-only output and replaces free text), --extract-schema preserves the full free-text response and adds a second LLM call that extracts structured data into an extraction key alongside the raw response.

synthpanel panel run \
  --personas panel.yaml \
  --instrument feedback-survey.yaml \
  --extract-schema '{"type":"object","properties":{"sentiment":{"type":"string","enum":["positive","neutral","negative"]},"themes":{"type":"array","items":{"type":"string"}}}}'

The value can be a JSON file path (extract.json) or an inline JSON string. The extraction runs after the panelist answers and is stored under extraction in the result JSON. Use this when you need both the qualitative narrative and a structured signal you can aggregate (e.g. sentiment counts, theme frequency).
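
To show the kind of aggregation this enables, the sketch below counts sentiments and themes across responses. The record shape (a list of dicts, each with an optional `extraction` dict matching the schema in the example command) is an assumption; check the actual result JSON before relying on it.

```python
from collections import Counter

def sentiment_counts(responses: list[dict]) -> Counter:
    """Count extracted sentiment labels, skipping responses without an extraction."""
    return Counter(
        r["extraction"]["sentiment"]
        for r in responses
        if r.get("extraction") and "sentiment" in r["extraction"]
    )

def theme_frequency(responses: list[dict]) -> Counter:
    """Count how often each extracted theme appears across all responses."""
    return Counter(
        theme
        for r in responses
        for theme in (r.get("extraction") or {}).get("themes", [])
    )
```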

| Flag | Description |
|---|---|
| --extract-schema SCHEMA | JSON Schema for post-hoc extraction. Preserves full free-text; adds structured extraction key. File path or inline JSON. |
| --schema SCHEMA | JSON Schema for structured-only output. Replaces free-text responses. Use when you want only structured data. |

Synthesis strategy — --synthesis-strategy, --synthesis-auto-escalate

The synthesis step aggregates all panelist responses into a final summary. For small panels the default auto strategy concatenates everything into one LLM call. For large panels (n ≥ 50) it automatically switches to map-reduce.

| Strategy | How it works | When to use |
|---|---|---|
| single | All responses concatenated into one call. | Small panels (n<50). Cheapest and most coherent. |
| map-reduce | One summary call per question in parallel, then one reduce call across summaries. | Large panels where responses overflow the synthesis model's context. |
| auto (default) | Pre-flight token estimate picks single or map-reduce. | Most runs. Falls back to single whenever estimate fits context. |
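
The auto decision can be pictured as a simple pre-flight check. The sketch below is illustrative only: `estimate_tokens` uses a rough 4-characters-per-token heuristic, which is an assumption and not the estimator synthpanel uses, and `pick_strategy` is a hypothetical name.

```python
def estimate_tokens(responses: list[str], chars_per_token: float = 4.0) -> int:
    """Very rough token estimate (assumed heuristic: about 4 characters per token)."""
    return int(sum(len(r) for r in responses) / chars_per_token)

def pick_strategy(responses: list[str], context_limit: int, requested: str = "auto") -> str:
    """Return 'single' or 'map-reduce' for the synthesis step."""
    if requested in ("single", "map-reduce"):
        return requested  # an explicit choice bypasses the estimate
    if requested != "auto":
        raise ValueError(f"unknown strategy: {requested!r}")
    fits = estimate_tokens(responses) <= context_limit
    return "single" if fits else "map-reduce"
```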

--synthesis-auto-escalate

In map-reduce mode, when a single question's responses overflow the context, panelists are normally partitioned into sub-batches for an inner reduce. With --synthesis-auto-escalate, that question's map call is instead retried on a larger-context model (gemini-2.5-flash-lite, 1M-token context) and a warning is emitted. Use this to preserve single-model semantics when sub-batch results are unacceptable.

| Flag | Default | Description |
|---|---|---|
| --synthesis-strategy | auto | Aggregation strategy: single, map-reduce, or auto. |
| --synthesis-auto-escalate | off | In map-reduce, retry overflowing question maps on a large-context model instead of sub-batching. |

Rate limiting — --rate-limit-rps

By default the orchestrator fires one concurrent request per panelist (bounded by --max-concurrent). --rate-limit-rps adds a token-bucket that smooths bursts — useful when a provider enforces a requests-per-second limit on top of a concurrency cap.

synthpanel panel run \
  --personas large-panel.yaml \
  --instrument survey.yaml \
  --max-concurrent 20 \
  --rate-limit-rps 5.0          # at most 5 new requests/sec

Accepts fractional values: 0.5 means one request every two seconds. Works across all providers on the same client.
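
For readers unfamiliar with token buckets, here is a minimal generic sketch of the technique. It is not synthpanel's implementation; the class name, the `capacity` parameter, and the injectable `clock` are assumptions made so the example is self-contained and testable.

```python
import time

class TokenBucket:
    """Generic token bucket: refills at rate_rps tokens per second up to capacity."""

    def __init__(self, rate_rps: float, capacity: float = 1.0, clock=time.monotonic):
        self.rate = rate_rps
        self.capacity = capacity
        self.tokens = capacity  # start full so the first request goes out immediately
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller should wait."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `rate_rps=0.5` and a capacity of 1, a token becomes available every two seconds, which matches the fractional-rate behavior described above.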

| Flag | Default | Description |
|---|---|---|
| --max-concurrent N | unbounded | Cap concurrent in-flight LLM requests across the panel. |
| --rate-limit-rps RPS | | Token-bucket rate cap in requests per second. Accepts fractional values. |

Checkpointing & resume — --checkpoint-dir, --resume

For long or expensive runs, checkpointing writes per-panelist progress to disk so you can resume after an interruption without re-running completed panelists.

Starting a checkpointed run

# --checkpoint-dir opts in to checkpointing (default: ~/.synthpanel/checkpoints)
# --checkpoint-every 10 flushes every 10 completed panelists
synthpanel panel run \
  --personas panel.yaml \
  --instrument survey.yaml \
  --checkpoint-dir /tmp/runs \
  --checkpoint-every 10

The run id is printed to stderr. Each checkpoint is written to <checkpoint-dir>/<run-id>/state.json. Omitting --checkpoint-dir runs without snapshots.

Resuming

synthpanel panel run --resume <run-id>

When --personas and --instrument are omitted, they are recovered from the checkpoint's saved CLI args. The resume refuses to start if the current config (model, temperature, questions) does not match the checkpointed config. Pass --allow-drift to downgrade this error to a warning and continue, at the cost of statistically inconsistent results.
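
The resume behavior can be sketched as two small checks: compare the saved config with the current one, then skip panelists that already finished. The names `check_drift`, `pending_panelists`, and `ConfigDriftError`, and the flat dict config shape, are hypothetical and only illustrate the logic described above.

```python
import warnings

class ConfigDriftError(Exception):
    """Raised when the current config differs from the checkpointed config."""

def check_drift(saved: dict, current: dict, allow_drift: bool = False,
                keys: tuple = ("model", "temperature", "questions")) -> list[str]:
    """Return the drifted keys; raise unless drift is explicitly allowed."""
    drifted = [k for k in keys if saved.get(k) != current.get(k)]
    if drifted and not allow_drift:
        raise ConfigDriftError(f"config drift in: {', '.join(drifted)}")
    if drifted:
        warnings.warn(f"continuing despite config drift in: {', '.join(drifted)}")
    return drifted

def pending_panelists(all_names: list[str], completed: set[str]) -> list[str]:
    """Panelists that still need to run, in their original order."""
    return [name for name in all_names if name not in completed]
```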

| Flag | Default | Description |
|---|---|---|
| --checkpoint-dir PATH | ~/.synthpanel/checkpoints | Directory for per-run snapshots. Setting this opts in to checkpointing. |
| --checkpoint-every N | 25 | Flush a checkpoint every N completed panelists. |
| --resume RUN_ID | | Resume a checkpointed run. Skips already-completed panelists. |
| --allow-drift | off | With --resume: downgrade config-mismatch errors to warnings. |
| --force-overwrite | off | Replace existing checkpoint state for the same run id instead of refusing. |

Convergence & auto-stop — --convergence-*

On large panels, much of the token budget is spent on diminishing returns: once a response distribution has stabilized, additional panelists add <2% signal. The convergence feature reports this live, so you can see when distributions stabilize and optionally halt early.

Convergence tracking applies only to bounded question types (Likert, yes/no, pick-one, any question with a JSON Schema enum). Free-text questions are not tracked.

# --convergence-check-every 20: compute JSD every 20 panelists
# --auto-stop: halt once all questions converge
# --convergence-eps 0.02: JSD threshold (default: 0.02)
# --convergence-min-n 50: don't stop before n=50
# --convergence-m 3: 3 consecutive checks below eps
synthpanel panel run \
  --personas panel.yaml \
  --instrument pricing-discovery \
  --convergence-check-every 20 \
  --auto-stop \
  --convergence-eps 0.02 \
  --convergence-min-n 50 \
  --convergence-m 3
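
The stopping rule configured by these flags can be sketched for a single question as follows. This is a hedged illustration: it assumes the JSD is measured between the running distribution at consecutive checks, which the text above does not spell out, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Optional

def jsd(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence with log base 2, so the result lies in [0, 1]."""
    keys = set(p) | set(q)
    mid = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a: dict[str, float]) -> float:
        return sum(a[k] * math.log2(a[k] / mid[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

def converged_at(answers: list[str], check_every: int = 20, eps: float = 0.02,
                 min_n: int = 50, m: int = 3) -> Optional[int]:
    """Return the n at which the answer distribution is declared converged, else None."""
    counts: Counter = Counter()
    previous = None
    streak = 0  # consecutive checks with JSD below eps
    for n, answer in enumerate(answers, start=1):
        counts[answer] += 1
        if n % check_every:
            continue
        current = {k: v / n for k, v in counts.items()}
        if previous is not None:
            streak = streak + 1 if jsd(previous, current) < eps else 0
        previous = current
        if streak >= m and n >= min_n:
            return n
    return None
```

In a full run, --auto-stop would require every tracked question to satisfy this rule before halting.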

When the run finishes the JSON output contains a convergence key:

{
  "convergence": {
    "final_n": 487,
    "auto_stopped": true,
    "overall_converged_at": 473,  // run this many next time
    "per_question": {
      "pricing": { "converged_at": 473, "curve": [...] },
      "tier_preference": { "converged_at": 410, "curve": [...] }
    }
  }
}

overall_converged_at answers "how many panelists did you actually need?" — use it to right-size future runs.

| Flag | Default | Description |
|---|---|---|
| --convergence-check-every N | off | Compute running JSD every N completing panelists. Setting this opts in. |
| --auto-stop | off | Halt once all tracked questions converge. Requires --convergence-check-every. |
| --convergence-eps FLOAT | 0.02 | JSD threshold below which a question is treated as converged. |
| --convergence-min-n N | 50 | Minimum panelists before --auto-stop is allowed to fire. |
| --convergence-m N | 3 | Consecutive checks below epsilon required to declare convergence. |
| --convergence-log PATH | stderr | Write each convergence check as a JSON line to PATH (for dashboards). |
| --convergence-baseline DATASET:Q | | Load a human baseline convergence curve from SynthBench and include in the report. Requires pip install 'synthpanel[convergence]'. |

SynthBench — --calibrate-against, --best-model-for, --submit-to-synthbench

--calibrate-against

Attaches inline calibration to a panel run by comparing the extracted response distribution against a published human baseline. Calibration forces convergence tracking on, but the check cadence is never implied, so pass --convergence-check-every explicitly. v1 supports the gss and ntia baselines.

A pick-one extraction schema is derived automatically from the baseline when it has at most 5 options; otherwise pass --extract-schema explicitly. Requires pip install 'synthpanel[convergence]'.

synthpanel panel run \
  --personas panel.yaml \
  --instrument happiness-probe \
  --calibrate-against gss:HAPPY \
  --convergence-check-every 20

--best-model-for

Consult the SynthBench leaderboard and use the top-ranked model for the given topic instead of the default. A recommendation line is printed to stderr before the run. Overrides --model; mutually exclusive with --models.

synthpanel panel run \
  --personas panel.yaml \
  --instrument survey.yaml \
  --best-model-for 'political-opinion:globalopinionqa'

Format: TOPIC (ranked against the default dataset globalopinionqa), TOPIC:DATASET (specific dataset), or :DATASET (rank by SPS across the full dataset). The leaderboard is cached for 24 h at ~/.synthpanel/synthbench-cache.json.
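
The three accepted forms can be told apart with a single split on the first colon. The parser below is a sketch under that assumption; `parse_best_model_for` is a hypothetical name, and returning `None` for the topic in the :DATASET form is an illustrative choice.

```python
from typing import Optional

DEFAULT_DATASET = "globalopinionqa"  # default dataset named in the text above

def parse_best_model_for(value: str) -> tuple[Optional[str], str]:
    """Split TOPIC, TOPIC:DATASET, or :DATASET into (topic, dataset)."""
    if not value:
        raise ValueError("empty --best-model-for value")
    topic, sep, dataset = value.partition(":")
    if not sep:
        return topic, DEFAULT_DATASET  # bare TOPIC uses the default dataset
    if not dataset:
        raise ValueError(f"missing dataset in {value!r}")
    # An empty topic (":DATASET") means rank across the full dataset.
    return (topic or None), dataset
```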

--submit-to-synthbench

After a calibrated run, upload the per-question JSD and distributions to the SynthBench public leaderboard. Requires --calibrate-against and SYNTHBENCH_API_KEY in your environment (mint one at synthbench.org/account). First use prompts for consent, which is recorded locally and not re-prompted. Pass --yes to bypass the consent prompt for CI.

export SYNTHBENCH_API_KEY=sk_synthbench_...

synthpanel panel run \
  --personas panel.yaml \
  --instrument happiness-probe \
  --calibrate-against gss:HAPPY \
  --convergence-check-every 20 \
  --submit-to-synthbench

The submission step is non-fatal: if SynthBench is slow or rejects the upload, a warning is emitted, but the panel run's exit code and output are unaffected. See docs/synthbench-integration.md for the privacy model and what gets uploaded.

| Flag | Default | Description |
|---|---|---|
| --calibrate-against DATASET:Q | | Inline calibration vs a human baseline. v1 allowlist: gss, ntia. Requires synthpanel[convergence]. |
| --best-model-for TOPIC[:DATASET] | | Use the SynthBench leaderboard top model for the topic. Overrides --model. |
| --submit-to-synthbench | off | Upload calibration results to SynthBench after the run. Requires --calibrate-against and SYNTHBENCH_API_KEY. |
| --yes | off | Bypass the SynthBench consent prompt (for CI/non-interactive use). |