Recommended models
SynthPanel can consult the SynthBench public leaderboard to pick the best-ranked model for the kind of research you're running. This closes the credibility loop: scores measured on the bench drive defaults in the harness.
Quick start
    # Use the top-ranked model for a specific topic
    synthpanel panel run \
      --personas examples/personas.yaml \
      --instrument pricing-discovery \
      --best-model-for "Economy & Work"

    # Top-ranked model across a whole dataset (by SPS)
    synthpanel panel run ... --best-model-for ":globalopinionqa"

    # Topic within a non-default dataset
    synthpanel panel run ... --best-model-for "Technology & Digital Life:globalopinionqa"
Before the run, SynthPanel prints a recommendation line to stderr so you can cancel and override:
synthbench: best model for globalopinionqa/Economy & Work → claude-haiku-4-5-20251001 · SPS 0.850 · JSD 0.091 · n=100 · $0.032/100q · cached 0h ago · source=synthbench.org
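The three argument shapes in the quick start (a bare topic, `:dataset`, and `topic:dataset`) can be read as one small grammar. The sketch below is only an illustration of that reading, not SynthPanel's actual parser: the function name `parse_best_model_spec` is hypothetical, and splitting on the last colon is an assumption.

```python
from typing import Optional, Tuple

DEFAULT_DATASET = "globalopinionqa"  # the default dataset named on this page


def parse_best_model_spec(spec: str) -> Tuple[Optional[str], str]:
    """Split a 'topic[:dataset]' string into (topic, dataset).

    Hypothetical helper: assumes the dataset, when present, follows the
    last colon in the argument.
    """
    topic, sep, dataset = spec.rpartition(":")
    if not sep:
        # No colon at all: the whole argument is a topic name.
        topic, dataset = spec, ""
    # An empty topic means "rank by overall SPS"; an empty dataset
    # falls back to the default.
    return (topic.strip() or None, dataset.strip() or DEFAULT_DATASET)
```

Under these assumptions, `"Economy & Work"` yields a topic with the default dataset, while `":globalopinionqa"` yields no topic, which corresponds to ranking across the whole dataset.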
How it works
1. On first use, SynthPanel fetches `https://synthbench.org/data/leaderboard.json` and caches it at `~/.synthpanel/synthbench-cache.json` for 24 hours.
2. Entries are filtered to the requested `dataset` (default `globalopinionqa`), then ranked: by the named topic's score when a topic is given, otherwise by overall SPS.
3. The top entry's `model` field is resolved through SynthPanel's alias table (so `"haiku"` becomes `claude-haiku-4-5-20251001`) and stamped onto `--model` for the rest of the pipeline.
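The filter, rank, and resolve steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not SynthPanel's implementation: the entry keys `sps` and `topic_scores`, the `ALIASES` table contents, and the function name `pick_best_model` are all hypothetical; only the `dataset` and `model` fields and the `"haiku"` alias come from this page.

```python
from typing import Optional

# Illustrative alias table; the real table lives inside SynthPanel.
ALIASES = {"haiku": "claude-haiku-4-5-20251001"}


def pick_best_model(entries: list, dataset: str = "globalopinionqa",
                    topic: Optional[str] = None) -> Optional[str]:
    """Return the resolved model id of the top-ranked entry, or None."""
    # Step 2a: keep only entries for the requested dataset.
    candidates = [e for e in entries if e.get("dataset") == dataset]
    if topic is not None:
        # Rank by the named topic's score (assumed key: "topic_scores").
        candidates = [e for e in candidates
                      if topic in e.get("topic_scores", {})]
        score = lambda e: e["topic_scores"][topic]
    else:
        # Otherwise rank by overall SPS (assumed key: "sps").
        score = lambda e: e["sps"]
    if not candidates:
        return None  # nothing to recommend; caller keeps its own model
    top = max(candidates, key=score)
    # Step 3: resolve short aliases to canonical ids.
    return ALIASES.get(top["model"], top["model"])
```

Returning `None` rather than raising keeps the recommendation advisory: the caller can simply keep whatever model was already selected.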
Environment knobs
| Variable | Effect |
|---|---|
| `SYNTHPANEL_SYNTHBENCH_URL` | Override the fetch URL (useful for forks or air-gapped environments). |
| `SYNTHPANEL_SYNTHBENCH_OFFLINE=1` | Never hit the network; use the cache if present, otherwise skip the recommendation. |
| `SYNTHPANEL_SYNTHBENCH_REFRESH=1` | Bypass the 24h TTL and force a fresh fetch (ignores the cached ETag). |
| `SYNTH_PANEL_DATA_DIR` | Override the data dir where the cache lives. |
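To make the interaction between these variables concrete, here is a rough sketch of how they might be read and combined. It is an assumption-laden illustration, not SynthPanel's code: the helper names `load_settings` and `should_fetch` are hypothetical, the cache file is assumed to sit directly inside the data dir, and letting the offline flag win over the refresh flag is a guess the table does not confirm.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_URL = "https://synthbench.org/data/leaderboard.json"
TTL_SECONDS = 24 * 60 * 60  # the 24-hour cache lifetime


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the four documented variables from an environment mapping."""
    env = os.environ if env is None else env
    data_dir = Path(env.get("SYNTH_PANEL_DATA_DIR")
                    or Path.home() / ".synthpanel")
    return {
        "url": env.get("SYNTHPANEL_SYNTHBENCH_URL", DEFAULT_URL),
        "offline": env.get("SYNTHPANEL_SYNTHBENCH_OFFLINE") == "1",
        "refresh": env.get("SYNTHPANEL_SYNTHBENCH_REFRESH") == "1",
        # Assumption: the cache file lives directly in the data dir.
        "cache_path": data_dir / "synthbench-cache.json",
    }


def should_fetch(settings: dict, cache_age_seconds: Optional[float]) -> bool:
    """Decide whether to hit the network (None age means no cache yet)."""
    if settings["offline"]:
        return False  # assumed to take precedence over refresh
    if settings["refresh"]:
        return True   # bypass the TTL
    return cache_age_seconds is None or cache_age_seconds > TTL_SECONDS
```

Passing the environment as a mapping keeps the sketch easy to exercise without touching the real process environment.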
Graceful offline behaviour
- Stale cache + network error → stderr warning, use stale cache.
- No cache + network error → stderr "synthbench unavailable", fall through to whatever `--model` or default was already in effect.
- Empty entries after filter → same fall-through.
No recommendation is ever fatal. --best-model-for
is advisory: a bad network day won't take the panel down.
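The fall-through rules can be expressed as a pair of small functions. This is a sketch of the described behaviour only; `load_entries`, `choose_model`, and the injected `fetch`, `warn`, and `pick` callables are hypothetical names, and treating network failures as `OSError` is an assumption.

```python
from typing import Callable, Optional


def load_entries(fetch: Callable[[], list], cached: Optional[list],
                 warn: Callable[[str], None]) -> Optional[list]:
    """Try the network, then a stale cache, then give up quietly."""
    try:
        return fetch()
    except OSError as exc:  # assumption: network errors surface as OSError
        if cached is not None:
            warn(f"synthbench: fetch failed ({exc}); using stale cache")
            return cached
        warn("synthbench unavailable")
        return None


def choose_model(entries: Optional[list], current_model: str,
                 pick: Callable[[list], Optional[str]]) -> str:
    """Keep the model already in effect unless a recommendation exists."""
    if not entries:
        return current_model  # no data, or empty after filtering
    return pick(entries) or current_model
```

Neither function raises on failure, which mirrors the rule that a missing recommendation never stops the panel run.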
Use-case → top-ranked model
Snapshot from leaderboard.json on 2026-04-24.
The live data updates continuously — consult the CLI flag or
synthbench.org
for current picks.
| Use case | Dataset | Top SynthBench pick |
|---|---|---|
| General attitudes research | globalopinionqa | claude-haiku-4-5-20251001 |
| Economic / workplace surveys | globalopinionqa | claude-haiku-4-5-20251001 |
| Tech product discovery | globalopinionqa | gemini-2.5-flash |
| Health & science messaging | globalopinionqa | see `--best-model-for "Health & Science"` |
| International affairs / policy | globalopinionqa | see CLI |
| Trust & wellbeing | globalopinionqa | see CLI |
Caveats
- Ensembles & product configs. Some leaderboard entries are SynthPanel product configs (`framework=product`, `is_ensemble=true`). These aren't runnable as a plain `--model` value, so SynthPanel falls back to the underlying base model inferred from the entry's `config_id`. A stderr note records the substitution.
- Sparse topics. When the top entry's `run_count < 3`, a low-confidence warning is emitted. Treat those recommendations as suggestive rather than authoritative.
- Provider/model strings vary. The leaderboard publishes the raw `model` string the run used, sometimes a canonical id, sometimes a short alias. SynthPanel passes the string through the alias resolver so either shape works, but the raw value is preserved in the recommendation line as `raw_model`.
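The first two caveats amount to checks on the top entry before its recommendation is trusted. The sketch below shows one way such checks could look; it is illustrative only. The function name `recommendation_notes` and the message wording are hypothetical, flagging an entry when either `framework` or `is_ensemble` matches is an assumption, and the base-model inference from `config_id` is deliberately left out because its format is not described here.

```python
LOW_CONFIDENCE_RUNS = 3  # threshold from the "run_count < 3" caveat


def recommendation_notes(entry: dict) -> list:
    """Collect advisory notes for a leaderboard entry (hypothetical helper)."""
    notes = []
    # Assumption: either marker alone is enough to flag the entry.
    if entry.get("framework") == "product" or entry.get("is_ensemble"):
        notes.append("product config or ensemble: "
                     "not runnable as a plain --model value")
    # Assumption: a missing run_count is treated as zero runs.
    if entry.get("run_count", 0) < LOW_CONFIDENCE_RUNS:
        notes.append("low confidence: fewer than "
                     f"{LOW_CONFIDENCE_RUNS} runs")
    return notes
```

An empty list means neither caveat applies to the entry.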
Scoping
--best-model-for picks a single model for the whole
panel. It is mutually exclusive with --models
(which splits the panel across multiple models) — mixing the two is
rejected at parse time.
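Parse-time rejection of this kind is what Python's standard `argparse` mutually exclusive groups provide. The sketch below shows the general technique; whether SynthPanel is built on `argparse` is an assumption, `build_parser` is a hypothetical name, and the value format accepted by `--models` is not specified on this page, so it is treated as an opaque string.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser where the two flags cannot be combined."""
    parser = argparse.ArgumentParser(prog="synthpanel-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--best-model-for", metavar="TOPIC[:DATASET]",
                       help="pick one leaderboard-recommended model")
    group.add_argument("--models", metavar="SPEC",
                       help="split the panel across several models "
                            "(format not specified here)")
    return parser
```

With this setup, supplying both flags makes `parse_args` print a usage error and exit with status 2 before any panel work starts.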