
targeted synthetic data finds policy failure regions

We map where robot policies fail, generate synthetic demos in those regions, and test whether adding that data improves the trained policy. On Square_D1, the transfer result is +7.0pp; on Square_D0, results are high-variance across five seeds.
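The loop described above (find failure regions, then sample new demo scenes inside them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's sampler: it assumes each rollout is summarized by a 2D start position, the function name `sample_targeted_starts` and the `mix` and `bandwidth` parameters are hypothetical, and "mild" is read here as mixing failure-density draws with uniform draws.

```python
import random

def sample_targeted_starts(failures, bounds, n, mix=0.5, bandwidth=0.02, seed=0):
    """Draw n start positions from a mix of uniform and Gaussian-KDE draws.

    failures  : list of (x, y) start positions where the policy failed
    bounds    : ((x_min, x_max), (y_min, y_max)) workspace limits
    mix       : probability of drawing from the failure KDE (hypothetical knob)
    bandwidth : standard deviation of the Gaussian kernel, in workspace units
    """
    rng = random.Random(seed)
    (x_lo, x_hi), (y_lo, y_hi) = bounds
    samples = []
    while len(samples) < n:
        if failures and rng.random() < mix:
            # Sampling a Gaussian KDE: pick a data point, then add kernel noise.
            cx, cy = rng.choice(failures)
            x, y = rng.gauss(cx, bandwidth), rng.gauss(cy, bandwidth)
        else:
            x, y = rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi)
        # Reject draws that fall outside the workspace.
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            samples.append((x, y))
    return samples

# Made-up failure positions, for illustration only.
failed_starts = [(0.10, 0.20), (0.12, 0.18), (0.11, 0.22)]
starts = sample_targeted_starts(failed_starts, ((-0.3, 0.3), (-0.3, 0.3)), n=95)
print(len(starts))
```

Setting `mix=0.0` in this sketch reduces it to a uniform control, which is one way to keep the two conditions comparable at the same demo budget.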

+7.0pp · Square_D1 transfer · 55.0% to 62.0% over 200 rollouts
5 seeds · Square_D0 replication · Original single-seed lift did not reproduce
+3.1pp · Targeted vs uniform · 70.6% vs 67.5%; baseline remains 73.9%
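The headline lifts are plain differences in percentage points. The sketch below reproduces them and adds a rough normal-approximation standard error for the Square_D1 difference. It assumes 200 rollouts for each of the two policies being compared, which the summary above does not state explicitly; the helper names are hypothetical.

```python
import math

def pp_lift(before_pct, after_pct):
    """Difference between two success rates, in percentage points."""
    return after_pct - before_pct

def diff_standard_error_pp(p1_pct, p2_pct, n1, n2):
    """Normal-approximation standard error of a rate difference, in pp."""
    p1, p2 = p1_pct / 100.0, p2_pct / 100.0
    variance = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return 100.0 * math.sqrt(variance)

lift = pp_lift(55.0, 62.0)
# Assumption: 200 rollouts per policy.
se = diff_standard_error_pp(55.0, 62.0, 200, 200)
print(f"Square_D1 lift: {lift:+.1f}pp, standard error about {se:.1f}pp")
print(f"Targeted vs uniform: {pp_lift(67.5, 70.6):+.1f}pp")
```

The standard error only captures rollout sampling noise for a single trained policy; it says nothing about variance across training seeds.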

Before and after

The videos show one illustrative Square_D0 scene: same seed, same scene, same BC-RNN architecture. The aggregate result comes from the five-seed table below.

Before

Seed policy misses the grasp. 1000 seed demos · never grasped

After

Synthetic adversarial data solves this scene. 1000 seed + 95 synthetic demos · success

Benchmark

Square_D0 success rate across five independent training seeds, with 200 rollouts per seed. The original single-seed lift was not robust: targeted data beats the uniform control on mean success (70.6% vs 67.5%) but does not beat the seed baseline (73.9%).

Policy	Training data	Mean success
Seed baseline	1000 seed demos	73.9% ± 9.0pp
Uniform control	1000 seed + 95 uniform MimicGen demos	67.5% ± 8.6pp
Mild KDE adversarial	1000 seed + 95 adversarial MimicGen demos	70.6% ± 8.8pp
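One quick way to read the table is to compare each gap between means with the reported ± values. The sketch below does only that. It is a crude heuristic, not a significance test, and it makes no assumption about what the ± values denote (the table does not say); the dictionary keys and the `gap_report` helper are hypothetical names.

```python
# Mean success and reported ± values from the five-seed Square_D0 table.
results = {
    "seed_baseline": (73.9, 9.0),
    "uniform_control": (67.5, 8.6),
    "mild_kde_adversarial": (70.6, 8.8),
}

def gap_report(a, b):
    """Return (gap in pp, True if the gap exceeds both reported ± values)."""
    (mean_a, spread_a), (mean_b, spread_b) = results[a], results[b]
    gap = mean_a - mean_b
    return gap, abs(gap) > max(spread_a, spread_b)

for a, b in [("mild_kde_adversarial", "uniform_control"),
             ("mild_kde_adversarial", "seed_baseline")]:
    gap, clear = gap_report(a, b)
    print(f"{a} vs {b}: {gap:+.1f}pp, exceeds reported spread: {clear}")
```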

Failure modes

Counts below are from the original 200-rollout Square_D0 example used for the visual demo. They explain the failure regions the sampler targets; they are not the five-seed aggregate result.

Outcome	Seed	Uniform	Adversarial
Successes	139	143	148
Placement near misses	45	27	28
Never grasped	13	27	22
Dropped	3	3	1
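To see how the failure mix shifts in this single example, the counts above can be differenced against the seed column. The sketch below is illustrative arithmetic on the table as printed; the `delta_vs_seed` helper is a hypothetical name.

```python
# Outcome counts copied from the single-run Square_D0 table above.
counts = {
    "Successes": {"Seed": 139, "Uniform": 143, "Adversarial": 148},
    "Placement near misses": {"Seed": 45, "Uniform": 27, "Adversarial": 28},
    "Never grasped": {"Seed": 13, "Uniform": 27, "Adversarial": 22},
    "Dropped": {"Seed": 3, "Uniform": 3, "Adversarial": 1},
}

def delta_vs_seed(outcome, policy):
    """Change in rollout count for one outcome, relative to the seed policy."""
    row = counts[outcome]
    return row[policy] - row["Seed"]

for outcome in counts:
    uniform = delta_vs_seed(outcome, "Uniform")
    adversarial = delta_vs_seed(outcome, "Adversarial")
    print(f"{outcome}: uniform {uniform:+d}, adversarial {adversarial:+d}")
```

In this one run, both augmented policies trade placement near misses for more never-grasped rollouts; as noted above, these counts are not the five-seed aggregate.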