Targeted synthetic data turns failures into successes
We find where the policy fails, generate synthetic demos in those regions, and raise the Square_D0 success rate without changing the BC-RNN architecture.
- 74.0%: best measured result (mild KDE adversarial sampling on 95 generated demos)
- +4.5pp: lift over the seed baseline (74.0% vs 69.5% with the same 1000 seed demos)
- +2.5pp: lift over the uniform control (74.0% vs 71.5% at a matched 95-demo MimicGen budget)
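The page does not spell out how the "mild KDE adversarial" sampler works, so the following is only a minimal sketch of one way to read it: fit a Gaussian kernel density estimate to the initial states where the policy failed, then draw new initial states from a mixture of that density and a uniform distribution. The function name `sample_initial_states`, the `mix` and `bandwidth` parameters, the workspace bounds, and the failure coordinates are all hypothetical; in particular, treating "mild" as a mixture weight below 1 is an assumption, not something the source states.

```python
import random

def sample_initial_states(failures, n, bandwidth, mix, bounds, rng):
    """Draw n (x, y) initial states from a mixture of a Gaussian KDE over
    failure states (weight `mix`) and a uniform distribution (weight 1 - mix).

    Hypothetical helper: names and parameters are illustrative assumptions.
    """
    (x_lo, x_hi), (y_lo, y_hi) = bounds
    samples = []
    while len(samples) < n:
        if rng.random() < mix:
            # Sampling from a Gaussian KDE: pick one kernel centre uniformly
            # at random, then add Gaussian noise with the kernel bandwidth.
            cx, cy = rng.choice(failures)
            x = rng.gauss(cx, bandwidth)
            y = rng.gauss(cy, bandwidth)
        else:
            # Uniform component keeps coverage of the whole workspace.
            x = rng.uniform(x_lo, x_hi)
            y = rng.uniform(y_lo, y_hi)
        # Reject draws outside the workspace (this slightly shifts the
        # effective mixture weights near the boundary).
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            samples.append((x, y))
    return samples

# Made-up failure locations and bounds, for illustration only.
failures = [(0.10, 0.20), (0.12, 0.22), (-0.05, 0.18)]
bounds = ((-0.15, 0.15), (0.10, 0.30))
rng = random.Random(0)

# 95 matches the generated-demo budget quoted above.
targets = sample_initial_states(failures, n=95, bandwidth=0.02,
                                mix=0.5, bounds=bounds, rng=rng)
```

Under this reading, setting `mix=0.0` reduces the sampler to the uniform control, which is what makes the two 95-demo budgets directly comparable.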
Before and after
Same seed, same scene, same BC-RNN architecture. The seed policy is trained on 1000 demos; the adversarial policy adds 95 targeted synthetic demos.
Benchmark
Square_D0 success rate over 200 rollouts per policy. All three policies are trained from the same 1000-demo seed dataset.
| Policy | Training data | Success |
|---|---|---|
| Seed baseline | 1000 seed demos | 69.5% |
| Uniform control | 1000 seed + 95 uniform MimicGen demos | 71.5% |
| Mild KDE adversarial | 1000 seed + 95 adversarial MimicGen demos | 74.0% |
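The headline lifts follow directly from the success counts over 200 rollouts (139, 143, and 148, as listed under Failure modes). The sketch below recomputes the rates and the percentage-point lifts, and adds a binomial standard error per rate. The standard error assumes independent rollouts with a fixed success probability, which is an assumption of this sketch rather than something the page claims.

```python
ROLLOUTS = 200

# Success counts per policy, taken from the failure-mode table.
successes = {"seed": 139, "uniform": 143, "adversarial": 148}

def rate(k, n=ROLLOUTS):
    """Success rate as a fraction."""
    return k / n

def std_error(k, n=ROLLOUTS):
    """Binomial standard error of the success rate (assumes i.i.d. rollouts)."""
    p = k / n
    return (p * (1 - p) / n) ** 0.5

rates = {name: rate(k) for name, k in successes.items()}

# Lifts in percentage points, rounded to one decimal as in the summary.
lift_vs_seed = round(100 * (rates["adversarial"] - rates["seed"]), 1)
lift_vs_uniform = round(100 * (rates["adversarial"] - rates["uniform"]), 1)

for name, k in successes.items():
    print(f"{name:12s} {100 * rates[name]:.1f}% (SE {100 * std_error(k):.1f}pp)")
print(f"lift vs seed: +{lift_vs_seed}pp, lift vs uniform: +{lift_vs_uniform}pp")
```

With 200 rollouts, each rate carries a standard error of roughly 3 percentage points under that assumption, which is useful context when reading lifts of a few points.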
Failure modes
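Counts across the same 200 rollouts per policy. Relative to the seed policy, the adversarial policy cuts placement near misses from 45 to 28 and drops from 3 to 1, while never-grasped failures rise from 13 to 22. Relative to the uniform control, it cuts never-grasped failures from 27 to 22.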
| Outcome | Seed | Uniform | Adversarial |
|---|---|---|---|
| Successes | 139 | 143 | 148 |
| Placement near misses | 45 | 27 | 28 |
| Never grasped | 13 | 27 | 22 |
| Dropped | 3 | 3 | 1 |
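To see which outcomes moved, one can tabulate the per-outcome change of the adversarial policy against each reference policy. The sketch below does this from the counts in the table; the variable names are illustrative. It also sums each column as a sanity check: the seed and uniform columns total 200, while the adversarial column as listed totals 199.

```python
# Outcome counts per policy, in the order (seed, uniform, adversarial).
outcomes = {
    "Successes": (139, 143, 148),
    "Placement near misses": (45, 27, 28),
    "Never grasped": (13, 27, 22),
    "Dropped": (3, 3, 1),
}

# Change in count for the adversarial policy relative to each reference.
delta_vs_seed = {name: adv - seed for name, (seed, _uni, adv) in outcomes.items()}
delta_vs_uniform = {name: adv - uni for name, (_seed, uni, adv) in outcomes.items()}

# Column totals, one per policy, to check the counts against 200 rollouts.
column_totals = [sum(col) for col in zip(*outcomes.values())]

for name in outcomes:
    print(f"{name:22s} vs seed {delta_vs_seed[name]:+d}, "
          f"vs uniform {delta_vs_uniform[name]:+d}")
print("column totals (seed, uniform, adversarial):", column_totals)
```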