Team experiment registry — snapshot 2026-05-08

Mirrored from the team Google Sheet (1D18voT5Yxr4BxR4qowNmS91qR4q_4F7VmRUTCNJlVwk) on 2026-05-08. The source CSVs sit alongside this file in 2026-05-08-team-experiments/.

What changed in this snapshot: added Coop% @5 columns for both Qwen3.5-9B and Qwen3.5-35B-A3B-SFT to the Task Inventory tab, based on the 4-VM K=5 run on coop_pass_at_k. Other tabs are unchanged from the source.

Diff vs prior snapshot: none. This is the first snapshot, so there is no prior snapshot to diff against.

Sheet definitions & conventions
  1. Data Registry — every training dataset that exists. One row per dataset.
  2. Training Runs — every fine-tuning run. One row per checkpoint. References Data ID.
  3. Eval Results — every evaluation. One row per (checkpoint × eval set × mode). References Run ID.
  4. Task Inventory (optional) — characterization of CooperBench tasks (solo vs coop pass rate).
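
The cross-references above (Training Runs pointing at a Data ID, Eval Results pointing at a Run ID) can be checked mechanically after each mirror. The following is a minimal sketch, not the team's actual tooling. It assumes the key columns are literally named "Data ID" and "Run ID" in the exported CSVs. The sample rows, the other column names, and the ID formats (D001, R001) are invented for illustration and are not taken from the sheet.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def dangling_refs(child_rows, parent_rows, key):
    """Return the sorted values of `key` used in child_rows but absent from parent_rows."""
    known = {row[key] for row in parent_rows}
    return sorted({row[key] for row in child_rows} - known)


# Hypothetical sample data; the real tabs have more columns and different IDs.
data_registry = read_rows("Data ID,Name\nD001,example-a\nD002,example-b\n")
training_runs = read_rows("Run ID,Data ID\nR001,D001\nR002,D003\n")
eval_results = read_rows("Run ID,Eval Set,Mode\nR001,set-x,solo\nR001,set-x,coop\n")

# Training Runs must reference a dataset that exists in the Data Registry.
print(dangling_refs(training_runs, data_registry, "Data ID"))  # ['D003']

# Eval Results must reference a run that exists in Training Runs.
print(dangling_refs(eval_results, training_runs, "Run ID"))  # []
```

To run this against the snapshot, replace the inline strings with the contents of the exported CSV files. An empty list for both checks means every reference resolves.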

ID conventions

Filling rules

Caveats & provenance (Task Inventory)

Trajectories