2026-03-30
SoftMimicGen: Data Generation System for Scalable Robot Learning in Deformable Object Manipulation
Masoud Moghani, Mahdi Azizian, Animesh Garg, Yuke Zhu, Sean Huver, Ajay Mandlekar et al.
problem
large-scale robot datasets enable foundation model training but are expensive to collect via teleoperation. synthetic data generation systems (MimicGen, DexMimicGen) work well for rigid-body manipulation because they exploit object-centric invariance: trajectories can be re-expressed in a static reference frame rigidly attached to the object. that assumption breaks down for deformable objects, which have no fixed frame. SoftMimicGen extends automated data generation to deformable object manipulation by using non-rigid registration to adapt source demonstrations to new object instances and contexts.
architecture
pipeline: (1) a human teleoperator collects 1-10 source demonstrations per task. (2) SoftMimicGen generates thousands of demonstrations for novel object instances by extracting object-centric trajectories and adapting them via non-rigid registration that accounts for changes in the deformable object's state. (3) visuomotor policies trained on the generated data transfer zero-shot to the real world, and improve further with sim-real co-training.
non-rigid registration. unlike MimicGen, which assumes a rigid reference frame, SoftMimicGen uses non-rigid registration to transform source trajectories when the deformable object changes state (e.g., a rope in a different configuration, a towel folded differently). this handles the fact that deformable objects don't maintain a fixed frame.
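a minimal sketch of what a non-rigid trajectory warp could look like: fit a smooth displacement field from source-to-target object keypoint correspondences, then push the end-effector waypoints through it. this Gaussian-kernel (Shepard-style) interpolation is an illustrative stand-in, not the paper's registration method, and `sigma` is an assumed smoothing scale:

```python
import numpy as np

def fit_warp(src_pts, tgt_pts, sigma=0.05):
    """fit a smooth displacement field from corresponding object keypoints
    (src -> tgt) and return a function warping any 3D point through it.
    Gaussian-weighted interpolation is an assumption, not the paper's method."""
    disp = tgt_pts - src_pts                      # per-keypoint displacement
    def warp(x):
        d2 = np.sum((src_pts - x) ** 2, axis=1)   # squared distance to each keypoint
        w = np.exp(-d2 / (2 * sigma ** 2))
        w /= w.sum() + 1e-12                      # normalized kernel weights
        return x + w @ disp                       # blended displacement
    return warp

# toy case: a straight rope translated to a new pose
src = np.stack([np.linspace(0, 1, 20), np.zeros(20), np.zeros(20)], axis=1)
tgt = src + np.array([0.1, 0.05, 0.0])
warp = fit_warp(src, tgt)
print(warp(np.array([0.5, 0.0, 0.0])))  # ≈ [0.6, 0.05, 0.0]
```

note the warp only lives in the data generation loop; the trained policy never sees it (consistent with "no explicit registration needed at inference time" below).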
simulation suite: high-fidelity environments with real-time (or faster) simulation across: stuffed animals, rope, tissue, towel. manipulation behaviors include high-precision threading, dynamic whipping, folding, pick-and-place. four robot embodiments: single-arm (Franka), bimanual (Franka x2), humanoid, surgical robot.
policy training: behavioral cloning from generated demonstrations. policies take visual observations and output actions. no explicit registration needed at inference time.
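the BC objective is plain maximum likelihood; with a fixed-variance Gaussian action head it reduces to mean-squared error on actions. a toy sketch with a linear policy on made-up features (everything here, including dimensions, is illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
obs = rng.normal(size=(256, 16))        # stand-in for visual features
W_true = rng.normal(size=(16, 7)) * 0.1
act = obs @ W_true                      # 7-DoF "expert" actions from generated demos

# gradient descent on the MSE (= Gaussian negative log-likelihood up to constants)
W = np.zeros((16, 7))
for _ in range(300):
    grad = obs.T @ (obs @ W - act) / len(obs)   # gradient of mean-squared error
    W -= 0.1 * grad
mse = float(np.mean((obs @ W - act) ** 2))
```

the point is only that "max likelihood" here is ordinary supervised regression on (observation, action) pairs from the generated dataset.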
training
- source demonstrations: 1-10 per task (human teleoperation)
- generated datasets: thousands of demonstrations per task
- policy: visuomotor behavioral cloning (max likelihood)
- simulation: Isaac Gym / MuJoCo with deformable object plugins
- real-world transfer: zero-shot sim-to-real + sim-real co-training
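sim-real co-training can be as simple as sampling each minibatch with a fixed mixing ratio between the small real dataset and the large synthetic one. a sketch, where the 25% real fraction is an assumption rather than a number from the paper:

```python
import random

def cotrain_batches(sim_data, real_data, batch_size=8, real_frac=0.25, n_steps=4):
    """yield co-training minibatches mixing a small real-world dataset with a
    large synthetic one at a fixed ratio (real_frac is an assumed hyperparameter)."""
    n_real = max(1, round(batch_size * real_frac))
    for _ in range(n_steps):
        batch = random.choices(real_data, k=n_real) + \
                random.choices(sim_data, k=batch_size - n_real)
        random.shuffle(batch)
        yield batch

# toy usage: 1000 synthetic vs. 20 real transitions
sim = [("sim", i) for i in range(1000)]
real = [("real", i) for i in range(20)]
batches = list(cotrain_batches(sim, real))
```

oversampling the real data like this keeps it from being drowned out by the much larger synthetic set.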
evaluation
systematic evaluation across the full task suite with ablations on data generation parameters, object variations, and sim-to-real transfer. policies trained on SoftMimicGen data achieve:
- high success rates in simulation across all task types and embodiments
- zero-shot sim-to-real transfer on real-world deformable manipulation tasks
- further improvement via sim-real co-training (combining small real-world dataset with large synthetic dataset)
the key result: synthetic deformable manipulation data, when generated with non-rigid registration, enables training policies that generalize to novel object instances and transfer to real robots.
reproduction guide
- visit https://softmimicgen.github.io for simulation environments and pipeline code
- collect 1-10 teleoperated demonstrations per task in simulation
- run SoftMimicGen to generate thousands of demonstrations with varied object instances/contexts
- train visuomotor BC policy on generated data
- for real-world: deploy zero-shot or fine-tune with small real-world dataset via sim-real co-training
- gotchas: deformable object simulation is computationally expensive. the non-rigid registration step is the key differentiator from rigid-body MimicGen. simulation fidelity matters for sim-to-real transfer
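the steps above boil down to a generation loop like this sketch (all helper names are hypothetical; the real pipeline would also filter out failed replays in simulation):

```python
import random
from dataclasses import dataclass

@dataclass
class Demo:
    object_state: list      # keypoints of the deformable object
    eef_waypoints: list     # end-effector waypoints of the source demo

def generate_dataset(source_demos, sample_context, register, n_demos=100):
    """sketch of the outer generation loop: sample a new object state, warp a
    source demo's object-centric trajectory to it via non-rigid registration,
    and record the adapted demonstration."""
    dataset = []
    for _ in range(n_demos):
        tgt_state = sample_context()                    # new object instance/state
        demo = random.choice(source_demos)
        warp = register(demo.object_state, tgt_state)   # non-rigid registration
        traj = [warp(p) for p in demo.eef_waypoints]
        dataset.append((tgt_state, traj))               # + success filtering in practice
    return dataset

# toy usage: "registration" degenerates to a pure 2D translation
demo = Demo(object_state=[(0.0, 0.0)], eef_waypoints=[(0.0, 0.0), (1.0, 0.0)])
register = lambda src, tgt: (lambda p: (p[0] + tgt[0][0], p[1] + tgt[0][1]))
data = generate_dataset([demo], lambda: [(0.5, 0.2)], register, n_demos=5)
```

swap in the released environments and a real registration method and this loop is the 1-10 demos → thousands of demos step.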
notes
- this is the first systematic automated data generation pipeline for deformable object manipulation, which is a huge gap in the robot learning ecosystem
- the non-rigid registration approach is a clean generalization of MimicGen’s rigid-body assumption. it should also work on rigid tasks (subsuming MimicGen), though the paper focuses on deformable cases
- for bopi: deformable manipulation (cables, cloth, soft objects) is one of the hardest open problems in robotics. having a scalable data generation pipeline for it is significant. the sim-real co-training results suggest this is practical for real deployment
- the four embodiments tested (single-arm, bimanual, humanoid, surgical) show the approach generalizes across morphologies, which is important for VLA foundation models that need multi-embodiment data