Human-human motion generation is essential for understanding humans as social beings. Current methods fall into two main categories: single-person-based methods and separate modeling-based methods. To delve into this field, we abstract the overall generation process into a general framework, MetaMotion, which consists of two phases: temporal modeling and interaction mixing. For temporal modeling, single-person-based methods directly concatenate the two people into a single one, while separate modeling-based methods skip the modeling of interaction sequences. The inadequate modeling described above results in sub-optimal performance and redundant model parameters. In this paper, we introduce TIMotion (Temporal and Interactive Modeling), an efficient and effective framework for human-human motion generation. Specifically, we first propose Causal Interactive Injection to model the two separate sequences as a single causal sequence, leveraging their temporal and causal properties. We then present Role-Evolving Scanning to adapt to the changes in active and passive roles throughout the interaction. Finally, to generate smoother and more rational motion, we design Localized Pattern Amplification to capture short-term motion patterns. Extensive experiments on InterHuman and InterX demonstrate that our method achieves superior performance. The project code will be released upon acceptance.
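Below is a minimal sketch of the Causal Interactive Injection idea as described in the abstract: the two individual motion sequences are interleaved frame-by-frame into a single causal sequence, so each person's frame at a given timestep can attend to the other person's frames at the same and earlier timesteps. The function names, the interleaving order, and the feature dimensions are illustrative assumptions, not the authors' reference implementation.

```python
import torch


def interleave_causal(motion_a: torch.Tensor, motion_b: torch.Tensor) -> torch.Tensor:
    """Merge two (B, T, D) motion sequences into one (B, 2T, D) causal sequence.

    Frames are ordered a_1, b_1, a_2, b_2, ..., preserving temporal order so a
    standard causal temporal model can consume the joint sequence directly.
    """
    b, t, d = motion_a.shape
    merged = torch.stack((motion_a, motion_b), dim=2)  # (B, T, 2, D)
    return merged.reshape(b, 2 * t, d)


def split_causal(merged: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Recover the two per-person sequences from the interleaved causal sequence."""
    b, t2, d = merged.shape
    pair = merged.reshape(b, t2 // 2, 2, d)
    return pair[:, :, 0], pair[:, :, 1]


if __name__ == "__main__":
    a = torch.randn(4, 64, 262)   # hypothetical batch of person-A motions
    bm = torch.randn(4, 64, 262)  # hypothetical batch of person-B motions
    seq = interleave_causal(a, bm)     # (4, 128, 262) causal sequence
    a_rec, b_rec = split_causal(seq)   # round-trips back to the inputs
    assert torch.equal(a_rec, a) and torch.equal(b_rec, bm)
```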
Quantitative evaluation on the InterHuman test set.
Quantitative evaluation on the InterX test set.
We compare against InterGen for human-human motion generation. The motions synthesized by our proposed method are more consistent with the descriptions.
two persons perform a synchronized dancing move together.
one person lifts the magazine in front of themselves with both hands, while the other person kicks up their right leg to assault the magazine.
one person embraces the other person's back with both arms, while the other person reciprocates the gesture.
two individuals are sparring with each other.
@article{wang2024tim,
  title={Temporal and Interactive Modeling for Efficient Human-Human Motion Generation},
  author={Yabiao Wang and Shuo Wang and Jiangning Zhang and Ke Fan and Jiafu Wu and Zhengkai Jiang and Yong Liu},
  year={2024},
  eprint={2408.17135},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}