2026-03-30
SoftMimicGen: Data Generation System for Scalable Robot Learning in Deformable Object Manipulation
Masoud Moghani, Mahdi Azizian, Animesh Garg, Yuke Zhu, Sean Huver, Ajay Mandlekar et al.
problem
large-scale robot datasets enable foundation model training but are expensive to collect via teleoperation. synthetic data generation systems (MimicGen, DexMimicGen) work well for rigid-body manipulation because they exploit object-centric invariance: trajectories can be re-expressed in a static reference frame rigidly attached to the object. that assumption breaks down for deformable objects, which have no fixed frame. SoftMimicGen extends automated data generation to deformable object manipulation by using non-rigid registration to adapt source demonstrations to new object instances and contexts.
architecture
pipeline: (1) a human teleoperator collects 1-10 source demonstrations per task. (2) SoftMimicGen generates thousands of demonstrations for novel object instances by extracting object-centric trajectories and adapting them via non-rigid registration that accounts for changes in the deformable object's state. (3) visuomotor policies trained on the generated data transfer zero-shot to the real world, and improve further with sim-real co-training.
non-rigid registration. unlike MimicGen, which assumes a rigid reference frame, SoftMimicGen uses non-rigid registration to transform source trajectories when the deformable object changes state (e.g., a rope in a different configuration, a towel folded differently). this handles the fact that deformable objects don't maintain a fixed frame.
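a minimal sketch of what a non-rigid trajectory warp could look like: fit a smooth displacement field from source-to-target object keypoint correspondences, then push the end-effector waypoints through it. this Gaussian-kernel (Shepard-style) interpolation is an illustrative stand-in, not the paper's registration method, and `sigma` is an assumed smoothing scale:

```python
import numpy as np

def fit_warp(src_pts, tgt_pts, sigma=0.05):
    """fit a smooth displacement field from corresponding object keypoints
    (src -> tgt) and return a function warping any 3D point through it.
    Gaussian-weighted interpolation is an assumption, not the paper's method."""
    disp = tgt_pts - src_pts                      # per-keypoint displacement
    def warp(x):
        d2 = np.sum((src_pts - x) ** 2, axis=1)   # squared distance to each keypoint
        w = np.exp(-d2 / (2 * sigma ** 2))
        w /= w.sum() + 1e-12                      # normalized kernel weights
        return x + w @ disp                       # blended displacement
    return warp

# toy case: a straight rope translated to a new pose
src = np.stack([np.linspace(0, 1, 20), np.zeros(20), np.zeros(20)], axis=1)
tgt = src + np.array([0.1, 0.05, 0.0])
warp = fit_warp(src, tgt)
print(warp(np.array([0.5, 0.0, 0.0])))  # ≈ [0.6, 0.05, 0.0]
```

note the warp only lives in the data generation loop; the trained policy never sees it (consistent with "no explicit registration needed at inference time" below).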
simulation suite: high-fidelity environments with real-time (or faster) simulation across: stuffed animals, rope, tissue, towel. manipulation behaviors include high-precision threading, dynamic whipping, folding, pick-and-place. four robot embodiments: single-arm (Franka), bimanual (Franka x2), humanoid, surgical robot.
policy training: behavioral cloning from generated demonstrations. policies take visual observations and output actions. no explicit registration needed at inference time.
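the BC objective is plain maximum likelihood; with a fixed-variance Gaussian action head it reduces to mean-squared error on actions. a toy sketch with a linear policy on made-up features (everything here, including dimensions, is illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
obs = rng.normal(size=(256, 16))        # stand-in for visual features
W_true = rng.normal(size=(16, 7)) * 0.1
act = obs @ W_true                      # 7-DoF "expert" actions from generated demos

# gradient descent on the MSE (= Gaussian negative log-likelihood up to constants)
W = np.zeros((16, 7))
for _ in range(300):
    grad = obs.T @ (obs @ W - act) / len(obs)   # gradient of mean-squared error
    W -= 0.1 * grad
mse = float(np.mean((obs @ W - act) ** 2))
```

the point is only that "max likelihood" here is ordinary supervised regression on (observation, action) pairs from the generated dataset.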
training
- source demonstrations: 1-10 per task (human teleoperation)
- generated datasets: thousands of demonstrations per task
- policy: visuomotor behavioral cloning (max likelihood)
- simulation: Isaac Gym / MuJoCo with deformable object plugins
- real-world transfer: zero-shot sim-to-real + sim-real co-training
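sim-real co-training can be as simple as sampling each minibatch with a fixed mixing ratio between the small real dataset and the large synthetic one. a sketch, where the 25% real fraction is an assumption rather than a number from the paper:

```python
import random

def cotrain_batches(sim_data, real_data, batch_size=8, real_frac=0.25, n_steps=4):
    """yield co-training minibatches mixing a small real-world dataset with a
    large synthetic one at a fixed ratio (real_frac is an assumed hyperparameter)."""
    n_real = max(1, round(batch_size * real_frac))
    for _ in range(n_steps):
        batch = random.choices(real_data, k=n_real) + \
                random.choices(sim_data, k=batch_size - n_real)
        random.shuffle(batch)
        yield batch

# toy usage: 1000 synthetic vs. 20 real transitions
sim = [("sim", i) for i in range(1000)]
real = [("real", i) for i in range(20)]
batches = list(cotrain_batches(sim, real))
```

oversampling the real data like this keeps it from being drowned out by the much larger synthetic set.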
evaluation
systematic evaluation across the full task suite with ablations on data generation parameters, object variations, and sim-to-real transfer. policies trained on SoftMimicGen data achieve:
- high success rates in simulation across all task types and embodiments
- zero-shot sim-to-real transfer on real-world deformable manipulation tasks
- further improvement via sim-real co-training (combining small real-world dataset with large synthetic dataset)
the key result: synthetic deformable manipulation data, when generated with non-rigid registration, enables training policies that generalize to novel object instances and transfer to real robots.
reproduction guide
- visit https://softmimicgen.github.io for simulation environments and pipeline code
- collect 1-10 teleoperated demonstrations per task in simulation
- run SoftMimicGen to generate thousands of demonstrations with varied object instances/contexts
- train visuomotor BC policy on generated data
- for real-world: deploy zero-shot or fine-tune with small real-world dataset via sim-real co-training
- gotchas: deformable object simulation is computationally expensive. the non-rigid registration step is the key differentiator from rigid-body MimicGen. simulation fidelity matters for sim-to-real transfer
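the steps above boil down to a generation loop like this sketch (all helper names are hypothetical; the real pipeline would also filter out failed replays in simulation):

```python
import random
from dataclasses import dataclass

@dataclass
class Demo:
    object_state: list      # keypoints of the deformable object
    eef_waypoints: list     # end-effector waypoints of the source demo

def generate_dataset(source_demos, sample_context, register, n_demos=100):
    """sketch of the outer generation loop: sample a new object state, warp a
    source demo's object-centric trajectory to it via non-rigid registration,
    and record the adapted demonstration."""
    dataset = []
    for _ in range(n_demos):
        tgt_state = sample_context()                    # new object instance/state
        demo = random.choice(source_demos)
        warp = register(demo.object_state, tgt_state)   # non-rigid registration
        traj = [warp(p) for p in demo.eef_waypoints]
        dataset.append((tgt_state, traj))               # + success filtering in practice
    return dataset

# toy usage: "registration" degenerates to a pure 2D translation
demo = Demo(object_state=[(0.0, 0.0)], eef_waypoints=[(0.0, 0.0), (1.0, 0.0)])
register = lambda src, tgt: (lambda p: (p[0] + tgt[0][0], p[1] + tgt[0][1]))
data = generate_dataset([demo], lambda: [(0.5, 0.2)], register, n_demos=5)
```

swap in the released environments and a real registration method and this loop is the 1-10 demos → thousands of demos step.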
notes
- this is the first systematic automated data generation pipeline for deformable object manipulation, which is a huge gap in the robot learning ecosystem
- the non-rigid registration approach is a clean generalization of MimicGen’s rigid-body assumption. it should also work on rigid tasks (subsuming MimicGen), though the paper focuses on deformable cases
- for bopi: deformable manipulation (cables, cloth, soft objects) is one of the hardest open problems in robotics. having a scalable data generation pipeline for it is significant. the sim-real co-training results suggest this is practical for real deployment
- the four embodiments tested (single-arm, bimanual, humanoid, surgical) show the approach generalizes across morphologies, which is important for VLA foundation models that need multi-embodiment data