Towards Embodied AI with MuscleMimic: Unlocking full-body musculoskeletal motor learning at scale

Chengkun Li, Cheryl Wang, Bianca Ziliotto, Merkourios Simos, Jozsef Kovecses, Guillaume Durandau, Alexander Mathis et al.

problem

learning motor control for muscle-driven musculoskeletal models is hindered by two bottlenecks: the computational cost of biomechanically accurate simulation (seconds per timestep on CPU), and the scarcity of validated open full-body models. most prior work on humanoid control uses simplified rigid-body dynamics with joint torque actuators, ignoring the complexity of real muscles (activation dynamics, force-length-velocity relationships, tendon compliance). this limits transfer to real biomechanics research and prosthetics.

prior approaches:

DeepMimic (peng et al.): joint-torque-driven characters, no muscle dynamics. can’t study biomechanics.
AMP (peng et al.): adversarial motion priors for torque-driven humanoids. same limitation.
MyoSuite (caggiano et al.): musculoskeletal but limited to isolated body parts (hand, arm). no full-body locomotion.
learning muscle control for biomechanics (various): small-scale, single-task, often hand-crafted reward functions.

architecture

flowchart LR
    mocap[SMPL motion capture] --> RT[retargeting pipeline]
    RT --> joint_ref[reference joint angles]
    joint_ref --> PPO[PPO policy]
    obs[joint state, muscle state, task info] --> PPO
    PPO --> exc[muscle excitations u_t]
    exc --> Sim[MuJoCo GPU simulator]
    Sim --> torque[Hill-type muscle forces]
    torque --> env[environment physics]
    env --> obs
    
    style Sim fill:#c4b8a6,color:#fff
    style PPO fill:#b09a84,color:#fff

simulator: MuJoCo with custom muscle actuators. each muscle is modeled as a Hill-type actuator with:

activation dynamics: $\dot{a} = (u - a) / \tau_a$, where $u \in [0, 1]$ is the neural excitation and $\tau_a$ is the activation time constant
muscle force: $f = a \cdot f_l(l) \cdot f_v(v) \cdot f_{\max}$, where $f_l$ and $f_v$ are the force-length and force-velocity curves
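a minimal sketch of the two equations above, in plain python. the curve shapes for $f_l$ and $f_v$ (a gaussian and a piecewise-linear function), the explicit-Euler step, and all numeric values are assumptions for illustration; the notes do not give the curves or constants used in the paper.

```python
import math


def step_activation(a, u, tau_a, dt):
    """One explicit-Euler step of da/dt = (u - a) / tau_a, clamped to [0, 1]."""
    a_next = a + dt * (u - a) / tau_a
    return min(1.0, max(0.0, a_next))


def force_length(l_norm, width=0.45):
    """Assumed Gaussian force-length curve; peaks at optimal length l_norm = 1."""
    return math.exp(-((l_norm - 1.0) / width) ** 2)


def force_velocity(v_norm):
    """Assumed force-velocity curve; v_norm < 0 means shortening.

    Force falls linearly to 0 at max shortening speed (v_norm = -1),
    equals 1 when isometric, and rises to 1.4 when lengthening.
    """
    if v_norm <= 0.0:
        return max(0.0, 1.0 + v_norm)
    return 1.0 + 0.4 * min(v_norm, 1.0)


def muscle_force(a, l_norm, v_norm, f_max):
    """f = a * f_l(l) * f_v(v) * f_max (active force only, as in the text)."""
    return a * force_length(l_norm) * force_velocity(v_norm) * f_max


# Drive one muscle with full excitation for 100 ms (dt = 1 ms, tau_a = 10 ms).
a = 0.0
for _ in range(100):
    a = step_activation(a, u=1.0, tau_a=0.01, dt=0.001)

# Isometric contraction at optimal fiber length, so f_l = f_v = 1.
force = muscle_force(a, l_norm=1.0, v_norm=0.0, f_max=1000.0)
```

after ten time constants the activation has nearly saturated, so the force is close to `f_max`. the point of the activation filter is that force lags excitation, which is one reason muscle control is harder than torque control.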

two validated embodiments:

fixed-root upper-body ($N = 126$ muscles, $30$ DOF): for manipulation tasks (reaching, grasping, tool use)
full-body ($N = 416$ muscles, $76$ DOF): for locomotion (walking, running, jumping)

retargeting pipeline: maps SMPL motion capture data to musculoskeletal joint space via inverse kinematics, then extracts joint angle trajectories as reference for imitation.

policy: a policy trained with PPO that outputs one excitation per muscle at every timestep, collected in the vector $u_t$. the observation space includes joint positions, velocities, muscle states (length, velocity, activation), and task-specific information (target positions, object states).

\[u_t \sim \pi_\theta(\cdot \mid o_t), \qquad u_t \in [0, 1]^{N_{\text{muscles}}}\]
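one common way to get a stochastic policy whose output lies in $[0, 1]^{N_{\text{muscles}}}$ is to sample from a gaussian and squash with a sigmoid. the sketch below shows that idea only; the notes do not say how the paper bounds its actions, so the squashing function, the fixed `log_std`, and the zero mean vector standing in for a network output are all assumptions.

```python
import math
import random


def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_excitations(mean, log_std, rng):
    """Sample a pre-squash Gaussian action per muscle, then squash to (0, 1)."""
    std = math.exp(log_std)
    return [sigmoid(rng.gauss(m, std)) for m in mean]


N_MUSCLES = 416                 # full-body embodiment from the text
rng = random.Random(0)          # seeded for repeatability
mean = [0.0] * N_MUSCLES        # placeholder for the policy network output given o_t
u_t = sample_excitations(mean, log_std=-0.5, rng=rng)
```

with a zero mean, the samples scatter around $0.5$; a trained network would shift each mean so that only the muscles needed for the motion are excited.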

massively parallel GPU simulation: custom MuJoCo-based simulator runs $4096+$ environments in parallel on a single GPU, giving order-of-magnitude speedup over CPU simulation.
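to get a feel for the scale, here is the size of the action batch for the full-body model at the stated lower bound of $4096$ environments (simple arithmetic on numbers from these notes, not a figure from the paper):

\[4096 \ \text{envs} \times 416 \ \text{muscles} = 1{,}703{,}936 \ \text{excitations per simulation step}\]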

training

hardware: single NVIDIA GPU (type not specified, likely A100 or similar)
training time: “days” for a generalist policy across hundreds of motions (vs months on CPU)
algorithm: PPO with shared generalist policy across diverse motion clips
dataset: hundreds of diverse motions from CMU Mocap and AMASS, retargeted to both embodiments via SMPL pipeline
reward: combination of imitation reward (joint position tracking) and biomechanical regularization (muscle effort penalties, joint limit avoidance)
curriculum: starts with easy motions, progressively adds harder ones
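the reward described above can be sketched as a tracking term minus weighted penalties. this is an illustration, not the paper's reward: the exponential tracking kernel, the squared-excitation effort term, and the weights `scale`, `w_effort`, and `w_limit` are all assumed values.

```python
import math


def imitation_reward(q, q_ref, scale=5.0):
    """exp(-scale * mean squared joint-angle error); 1.0 for perfect tracking."""
    mse = sum((a - b) ** 2 for a, b in zip(q, q_ref)) / len(q)
    return math.exp(-scale * mse)


def effort_penalty(excitations):
    """Mean squared excitation, in [0, 1]."""
    return sum(u * u for u in excitations) / len(excitations)


def limit_penalty(q, q_min, q_max):
    """Total amount by which joints exceed their range (0 inside the range)."""
    return sum(
        max(0.0, lo - x) + max(0.0, x - hi)
        for x, lo, hi in zip(q, q_min, q_max)
    )


def total_reward(q, q_ref, excitations, q_min, q_max,
                 w_effort=0.1, w_limit=1.0):
    """Imitation reward minus weighted biomechanical regularizers."""
    return (
        imitation_reward(q, q_ref)
        - w_effort * effort_penalty(excitations)
        - w_limit * limit_penalty(q, q_min, q_max)
    )


# Two joints tracked perfectly: only the effort term changes the reward.
q = [0.1, -0.2]
relaxed = total_reward(q, q, [0.0, 0.0], [-1.0, -1.0], [1.0, 1.0])
tense = total_reward(q, q, [1.0, 1.0], [-1.0, -1.0], [1.0, 1.0])
```

`relaxed` and `tense` describe the same pose, so a kinematics-only reward could not tell them apart. the effort term is what pushes the policy toward the cheaper solution.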

evaluation

single generalist policy performance:

trained on hundreds of diverse motions, achieves robust performance across unseen motion categories
strong biomechanical validation against experimental walking/running data
mean correlation $r = 0.90$ for joint kinematics against ground-truth motion capture
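a sketch of how a mean kinematic correlation like the one above can be computed: take the pearson $r$ between simulated and reference angle trajectories for each joint, then average over joints. how the paper aggregates (per joint, per trial, per gait cycle) is not stated in these notes, so the per-joint averaging and the joint names and toy trajectories are assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_joint_correlation(sim, ref):
    """Average Pearson r over joints; sim and ref map joint name -> angle series."""
    rs = [pearson_r(sim[name], ref[name]) for name in ref]
    return sum(rs) / len(rs)


# Toy data: one gait-like cycle, with the simulated knee lagging slightly.
t = [i / 100 for i in range(100)]
ref = {
    "hip": [math.sin(2 * math.pi * s) for s in t],
    "knee": [math.cos(2 * math.pi * s) for s in t],
}
sim = {
    "hip": [0.9 * v + 0.05 for v in ref["hip"]],               # scaled and offset
    "knee": [math.cos(2 * math.pi * s - 0.3) for s in t],      # phase lag
}
r_mean = mean_joint_correlation(sim, ref)
```

note that pearson $r$ ignores scale and offset, so the hip scores a perfect correlation despite tracking with the wrong amplitude. a high $r$ is therefore a statement about trajectory shape, not absolute error.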

muscle activation analysis:

key finding: kinematic imitation alone does NOT achieve physiological muscle fidelity
the policy can match joint trajectories while using physiologically implausible muscle coordination patterns
this suggests future work needs explicit biomechanical objectives (e.g., EMG matching, metabolic cost minimization), not just motion matching

embodiment-specific results:

upper-body (126 muscles): successful manipulation across diverse reaching and grasping tasks
full-body (416 muscles): locomotion (walking, running, turning) with physically plausible ground reaction forces

reproduction guide

clone the repo: git clone https://github.com/amathislab/musclemimic
install dependencies: MuJoCo, PyTorch, CUDA. check the README for the authoritative dependency list and exact version pins
download preprocessed motion datasets (links in repo)
for a quick test: train on a single locomotion motion first (e.g., walking). expect convergence in hours on a single GPU
for the full generalist: train across the full motion library. this takes days on a single GPU
the repo includes pre-trained checkpoints, musculoskeletal model files, and retargeted datasets - excellent reproducibility
known gotcha: muscle simulation is sensitive to timestep size. start from the default timestep from the paper. smaller timesteps improve stability but need more simulation steps per second of motion, which slows training
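a toy illustration of why the timestep matters, using only the activation equation $\dot{a} = (u - a) / \tau_a$ with explicit Euler. each step multiplies the error $a - u$ by $(1 - \Delta t / \tau_a)$, so the update is stable only for $\Delta t < 2\tau_a$. the integrator, the value $\tau_a = 10$ ms, and the timesteps tried are assumptions; the real simulator uses its own integrator and has other stiff components (tendons, contacts), so this shows the principle rather than the actual limit.

```python
def simulate_activation(dt, tau_a=0.01, u=1.0, t_end=0.2):
    """Integrate da/dt = (u - a) / tau_a with explicit Euler from a = 0.

    No clamping is applied, so an unstable timestep shows up as divergence.
    """
    a = 0.0
    for _ in range(round(t_end / dt)):
        a = a + dt * (u - a) / tau_a
    return a


# tau_a = 10 ms, so the stability bound for this scheme is dt < 20 ms.
results = {dt: simulate_activation(dt) for dt in (0.001, 0.005, 0.025)}
for dt, a_final in results.items():
    print(f"dt = {dt * 1000:.0f} ms -> final activation {a_final:.4f}")
```

the two small timesteps settle at the target activation, while the 25 ms step oscillates with growing amplitude. the small steps are also where the cost shows up: 1 ms needs five times as many steps as 5 ms to cover the same 200 ms.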

notes

this is highly relevant for bopi’s embodied AI interests. the key contribution is making musculoskeletal simulation practical at scale through GPU parallelization. $416$ muscles is a real full-body model, not a toy.

the finding that kinematic imitation alone doesn’t produce physiological muscle fidelity is important and underappreciated. it means if you want to study real biomechanics (for prosthetics, rehabilitation, ergonomics), you need to go beyond motion matching and add explicit biomechanical objectives.

the open-source nature is excellent - code, models, datasets, and checkpoints all available. this makes it a strong foundation for future work.

open questions:

can you combine muscle-based control with learned tactile feedback for dexterous manipulation?
what happens if you add explicit EMG matching as a training objective? does it improve physiological fidelity?
can the retargeting pipeline handle motion capture from different body shapes/sizes?
how does the sample efficiency compare to torque-driven approaches? is the extra complexity of muscle dynamics worth it for robotics applications that don’t need biomechanical accuracy?