GamePose: Self-Supervised 3D Human Pose Estimation from Multi-View Game Videos

Zhou, Yang; Guo, Tianze; Xu, Hao; Wei, Xilei; Xu, Lang; Tang, Xiangjun; Yang, Sipeng; Kou, Qilong; Jin, Xiaogang

Date

2024

Authors

Zhou, Yang
Guo, Tianze
Xu, Hao
Wei, Xilei
Xu, Lang
Tang, Xiangjun
Yang, Sipeng
Kou, Qilong
Jin, Xiaogang

Publisher

The Eurographics Association

Abstract

Recovering 3D character animations from published games is crucial when the original animation assets are lost. One way to recover such assets is 3D human pose estimation from single or multiple views. Our insight is to preserve the ease of use of single-view estimation while improving its accuracy with information from multi-view videos. This is difficult: it requires explicitly modelling the correlation among multi-view inputs to achieve superior accuracy, and then converting the multi-view correlation model into a single-view model without degrading that accuracy. Both problems remain unresolved. To this end, we propose a novel self-supervised 3D pose estimation framework that models the correlation of multi-view input during training and produces highly accurate estimates from single-view input. Our framework consists of two main components: the Single-View Module (SM) and the Cross-View Module (CM). The SM predicts approximate 3D poses and extracts features from a single viewpoint, while the CM enhances the learning process by modelling correlations across multiple viewpoints. This design facilitates effective self-distillation, improving the accuracy of single-view estimates. As a result, our method supports highly accurate inference with both multi-view and single-view data. We validate our method on 3D human pose estimation benchmarks and create a new dataset using Mixamo assets to demonstrate its applicability in gaming scenarios. Extensive experiments show that our approach outperforms state-of-the-art methods in self-supervised learning scenarios.
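
The abstract gives no implementation details, so the following is only a minimal sketch of how a single-view module, a cross-view module, and a self-distillation loss might fit together. Everything in it is an assumption made for illustration: the function names, the layer shapes, the use of softmax attention across views, the choice of 2D keypoints as input, and the simplification that all poses live in one shared coordinate frame. It is not the authors' architecture.

```python
import numpy as np

# Hypothetical sizes, chosen only for this illustration.
N_VIEWS, N_JOINTS, FEAT_DIM = 4, 17, 32


def single_view_module(x, w_feat, w_pose):
    """Hypothetical SM: per-view features and a coarse 3D pose.

    x has shape (views, joints * 2): flattened 2D keypoints for each view.
    """
    feats = np.tanh(x @ w_feat)                           # (views, FEAT_DIM)
    poses = (feats @ w_pose).reshape(len(x), N_JOINTS, 3)
    return feats, poses


def cross_view_module(feats, w_pose):
    """Hypothetical CM: mix features across views, then predict refined poses."""
    scores = feats @ feats.T / np.sqrt(feats.shape[1])    # (views, views)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)         # softmax over views
    fused = attn @ feats                                  # each view sees the others
    return (fused @ w_pose).reshape(len(feats), N_JOINTS, 3)


def distillation_loss(student_poses, teacher_poses):
    """Mean per-joint Euclidean distance between student and teacher poses."""
    return float(np.linalg.norm(student_poses - teacher_poses, axis=-1).mean())


# Usage with random stand-in data and untrained random weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(N_VIEWS, N_JOINTS * 2))
w_feat = rng.normal(scale=0.1, size=(N_JOINTS * 2, FEAT_DIM))
w_pose = rng.normal(scale=0.1, size=(FEAT_DIM, N_JOINTS * 3))

feats, sm_poses = single_view_module(x, w_feat, w_pose)
cm_poses = cross_view_module(feats, w_pose)
loss = distillation_loss(sm_poses, cm_poses)  # SM is the student, CM the teacher
```

In this sketch the cross-view branch is needed only to compute the training loss; at inference time `single_view_module` can run on one view alone, which mirrors the single-view usability the abstract describes.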

CCS Concepts: Computing methodologies → Motion capture

@inproceedings{10.2312:pg.20241316,
  booktitle = {Pacific Graphics Conference Papers and Posters},
  editor = {Chen, Renjie and Ritschel, Tobias and Whiting, Emily},
  title = {{GamePose: Self-Supervised 3D Human Pose Estimation from Multi-View Game Videos}},
  author = {Zhou, Yang and Guo, Tianze and Xu, Hao and Wei, Xilei and Xu, Lang and Tang, Xiangjun and Yang, Sipeng and Kou, Qilong and Jin, Xiaogang},
  year = {2024},
  publisher = {The Eurographics Association},
  ISBN = {978-3-03868-250-9},
  DOI = {10.2312/pg.20241316}
}

URI

https://doi.org/10.2312/pg.20241316
https://diglib.eg.org/handle/10.2312/pg20241316

Collections

PG2024 Conference Papers and Posters
