targeted synthetic data turns failures into successes

We find where the policy fails, generate synthetic demos in those regions, and raise the success rate on Square_D0 while keeping the same BC-RNN architecture.
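The "find failures, then generate demos there" step can be sketched roughly as follows. This is a minimal illustration, not the pipeline used here: it assumes failures are summarized by the object's initial (x, y) position, and that "mild" means mixing a Gaussian kernel density estimate over failure points with uniform sampling. The function name `sample_mild_kde`, the parameters `bandwidth` and `mix`, the bounds, and the failure points are all hypothetical.

```python
import random

def sample_mild_kde(failure_points, bounds, n, bandwidth=0.02, mix=0.5, seed=0):
    """Draw n initial (x, y) positions for new synthetic demos.

    With probability `mix`, sample from a Gaussian KDE over the failure
    points (pick one failure point at random, then add Gaussian noise with
    standard deviation `bandwidth`). Otherwise sample uniformly in `bounds`.
    All names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    (x_lo, x_hi), (y_lo, y_hi) = bounds
    samples = []
    while len(samples) < n:
        if failure_points and rng.random() < mix:
            cx, cy = rng.choice(failure_points)
            x, y = rng.gauss(cx, bandwidth), rng.gauss(cy, bandwidth)
        else:
            x, y = rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi)
        # Reject draws that fall outside the valid reset region.
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            samples.append((x, y))
    return samples

# Hypothetical failure locations and workspace bounds, for illustration only.
failures = [(0.10, 0.08), (0.11, 0.09), (-0.05, 0.12)]
bounds = ((-0.15, 0.15), (-0.15, 0.15))
starts = sample_mild_kde(failures, bounds, n=95)
print(len(starts), starts[0])
```

Setting `mix=0.0` in this sketch reduces it to plain uniform sampling, which is the analogue of a uniform control.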

Before and after

Same seed, same scene, same BC-RNN architecture. The seed policy is trained on 1000 demos; the adversarial policy adds 95 targeted synthetic demos.

Before
Seed policy misses the grasp. 1000 seed demos · never grasped
After
Synthetic adversarial data solves it. 1000 seed + 95 synthetic demos · success

Benchmark

Square_D0 success rate across 200 rollouts per policy, all trained from the same seed dataset.

Policy               | Training data                                | Success
Seed baseline        | 1000 seed demos                              | 69.5%
Uniform control      | 1000 seed + 95 uniform MimicGen demos        | 71.5%
Mild KDE adversarial | 1000 seed + 95 adversarial MimicGen demos    | 74.0%
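As a quick check of the arithmetic, the percentages follow directly from the success counts reported under Failure modes (139, 143, and 148 out of 200 rollouts). The short sketch below recomputes them and the gain over the seed baseline; the variable names are illustrative.

```python
ROLLOUTS = 200  # rollouts per policy, as stated above

# Success counts taken from the failure-mode counts on this page.
successes = {
    "Seed baseline": 139,
    "Uniform control": 143,
    "Mild KDE adversarial": 148,
}

# Success rate in percent for each policy.
rates = {name: 100.0 * k / ROLLOUTS for name, k in successes.items()}
baseline = rates["Seed baseline"]

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% ({rate - baseline:+.1f} points vs. seed)")
```

With 200 rollouts, each rollout is worth 0.5 percentage points, so the 4.5-point gain of the adversarial policy over the seed baseline corresponds to 9 additional successful rollouts.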

Failure modes

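Counts across the same 200 rollouts per policy. Relative to the seed baseline, the adversarial policy cuts placement near misses from 45 to 28 and drops from 3 to 1, but never-grasped failures rise from 13 to 22. Relative to the uniform control, never-grasped failures fall from 27 to 22, while placement near misses are essentially unchanged (27 vs. 28).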

Outcome               | Seed | Uniform | Adversarial
Successes             | 139  | 143     | 148
Placement near misses | 45   | 27      | 28
Never grasped         | 13   | 27      | 22
Dropped               | 3    | 3       | 1
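To see where the rollouts moved between outcomes, it can help to compute the change in each outcome for the adversarial policy against both comparison policies. The sketch below uses only the counts in the table above; the variable names are illustrative.

```python
POLICIES = ("Seed", "Uniform", "Adversarial")

# Outcome counts per policy, in the column order of POLICIES.
counts = {
    "Successes": (139, 143, 148),
    "Placement near misses": (45, 27, 28),
    "Never grasped": (13, 27, 22),
    "Dropped": (3, 3, 1),
}

# Change in each outcome for the adversarial policy.
vs_seed = {o: c[2] - c[0] for o, c in counts.items()}
vs_uniform = {o: c[2] - c[1] for o, c in counts.items()}

# Column totals; as listed, the adversarial column sums to 199, not 200.
totals = tuple(sum(col) for col in zip(*counts.values()))

for outcome in counts:
    print(f"{outcome:<22} vs seed {vs_seed[outcome]:+d}, "
          f"vs uniform {vs_uniform[outcome]:+d}")
print(dict(zip(POLICIES, totals)))
```

Against the seed baseline, the gain in successes comes from fewer placement near misses (-17) and drops (-2), partly offset by more never-grasped rollouts (+9). Against the uniform control, the gain comes mainly from fewer never-grasped rollouts (-5).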