MOMA-Force: Visual-Force Imitation for Real-World Mobile Manipulation

Taozheng Yang*, Ya Jing*, Hongtao Wu*, Jiafeng Xu*, Kuankuan Sima,
Guangzeng Chen, Qie Sima, Tao Kong

ByteDance Research    *Equal Contribution


We present a novel method for mobile manipulators to perform multiple contact-rich manipulation tasks. While learning-based methods have the potential to generate actions in an end-to-end manner, they often suffer from insufficient action accuracy and poor robustness against noise. Classical control-based methods, on the other hand, can enhance system robustness, but at the cost of extensive parameter tuning. To address these challenges, we present MOMA-Force, a visual-force imitation method that seamlessly combines representation learning for perception, imitation learning for complex motion generation, and admittance whole-body control for system robustness and controllability. MOMA-Force enables a mobile manipulator to learn multiple complex contact-rich tasks with high success rates and small contact forces. In a real household setting, our method outperforms baseline methods in terms of task success rate. Moreover, it achieves smaller contact forces and smaller force variances than baseline methods without force imitation. Overall, we offer a promising approach for efficient and robust mobile manipulation in the real world.


Visual-Force Imitation

The observation images of the expert data are converted to representation vectors by a visual encoder. During rollout, the current observation image is encoded with the same visual encoder. The action and target wrench are then predicted by retrieving the expert action and wrench whose representation has the top-1 similarity to the current observation's representation. Admittance whole-body control (a-WBC) is used to control the robot.
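The retrieval step above can be sketched as nearest-neighbor lookup over encoded expert observations. This is a minimal illustration, not the paper's implementation: the `encode` function is a hypothetical stand-in for the visual encoder, and cosine similarity over L2-normalized vectors is one common choice of similarity measure.

```python
import numpy as np

def encode(image):
    # Hypothetical stand-in for the visual encoder: maps an image
    # to an L2-normalized representation vector.
    v = np.asarray(image, dtype=np.float64).ravel()
    return v / (np.linalg.norm(v) + 1e-8)

def retrieve(obs_image, expert_reprs, expert_actions, expert_wrenches):
    """Return the expert action and target wrench whose stored
    representation is most similar (cosine) to the current observation.

    expert_reprs: (N, D) array of L2-normalized expert representations.
    expert_actions, expert_wrenches: (N, ...) arrays aligned with expert_reprs.
    """
    z = encode(obs_image)
    sims = expert_reprs @ z           # cosine similarity (rows are unit-norm)
    k = int(np.argmax(sims))          # top-1 match
    return expert_actions[k], expert_wrenches[k]
```

The retrieved action drives the motion while the retrieved wrench becomes the force target for the admittance controller.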


MOMA-Force achieves the best average success rate among all compared baseline methods. Compared to BC, MOMA-Force obtains a performance gain of 53.33%, even though it performs multi-task learning while BC performs single-task learning. Without force imitation, the success rate of MOMA-Force w/o FC decreases by 28.3% on average; this ablation mainly struggles with tasks involving both translation and rotation.

With force imitation, the average absolute contact forces and torques of MOMA-Force along the x, y, and z axes are all smaller than those of the baseline methods without force imitation. In addition, MOMA-Force has a smaller force variance, indicating less oscillation and more stable contact during the rollout.
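The force-tracking behavior behind these results can be illustrated with a minimal one-axis admittance law: the commanded velocity is adapted so the measured contact force converges to the retrieved target wrench. This is a sketch only; the gains `m`, `d`, and the time step are illustrative values, not the paper's tuned whole-body controller parameters.

```python
def admittance_step(v, f_meas, f_target, m=2.0, d=20.0, dt=0.01):
    """One discrete admittance update along a single axis.

    Implements m * dv/dt + d * v = f_meas - f_target, so the velocity
    command yields and settles where the force error is absorbed by
    the damping term. Returns the updated velocity command.
    """
    dv = (f_meas - f_target - d * v) / m
    return v + dv * dt
```

When the measured force equals the target, the update leaves a zero velocity command unchanged; a persistent force error drives a compliant corrective motion instead of a stiff position fight, which is what keeps contact forces and their variance small.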




@inproceedings{yang2023momaforce,
      author    = {Yang, Taozheng and Jing, Ya and Wu, Hongtao and Xu, Jiafeng and Sima, Kuankuan and Chen,
                  Guangzeng and Sima, Qie and Kong, Tao},
      title     = {MOMA-Force: Visual-Force Imitation for Real-World Mobile Manipulation},
      booktitle = {2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
      year      = {2023}
}