Text-Guided Diffusion with Spectral Convolution for 3D Human Pose Estimation
Date
2025
Publisher
The Eurographics Association and John Wiley & Sons Ltd.
Abstract
Although monocular video-based 3D human pose estimation has made significant progress, existing methods lack guidance from fine-grained, high-level prior knowledge such as action semantics and camera viewpoints, which makes accurate pose reconstruction challenging in scenarios where visual features are severely missing, that is, under complex occlusions. We observe that 3D human pose estimation is fundamentally a canonical inverse problem, and we propose a motion-semantics-based diffusion (MS-Diff) framework that addresses this issue by combining high-level motion semantics with spectral feature regularization to suppress interference noise in complex scenes and improve estimation accuracy. Specifically, we design a Multimodal Diffusion Interaction (MDI) module that incorporates motion semantics, including action categories and camera viewpoints, into the diffusion process, establishing semantic-visual feature alignment through a cross-modal mechanism to resolve pose ambiguities and handle occlusions effectively. Additionally, we introduce a Spectral Convolutional Regularization (SCR) module that performs adaptive filtering in the frequency domain to selectively suppress noise components. Extensive experiments on the large-scale public datasets Human3.6M and MPI-INF-3DHP demonstrate that our method achieves state-of-the-art performance.
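The idea of filtering in the frequency domain to suppress noise can be illustrated with a minimal sketch. It is not the authors' SCR module, whose details the abstract does not give. It assumes a single joint coordinate tracked over 32 frames, a naive discrete Fourier transform, and a fixed low-pass gain vector standing in for whatever adaptive, learned weights the paper uses. All function and variable names here are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform, O(n^2); adequate for short sequences."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns complex samples."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def low_pass_gains(n, cutoff):
    """Gain 1 for bins whose frequency index is <= cutoff, else 0.

    The gains are symmetric (bin k and bin n - k match), so a real input
    stays real after filtering. Assumption: a fixed mask replaces the
    adaptive weights a learned module would produce.
    """
    return [1.0 if min(k, n - k) <= cutoff else 0.0 for k in range(n)]


def spectral_filter(trajectory, gains):
    """Scale each frequency bin of the trajectory by its gain, then invert."""
    if len(gains) != len(trajectory):
        raise ValueError("gains must have one entry per frequency bin")
    spectrum = dft(trajectory)
    filtered = [g * c for g, c in zip(gains, spectrum)]
    return [value.real for value in idft(filtered)]


# Toy data: a slow motion component plus high-frequency jitter.
n_frames = 32
clean = [math.sin(2 * math.pi * t / n_frames) for t in range(n_frames)]
noisy = [
    clean[t] + 0.3 * math.cos(2 * math.pi * 12 * t / n_frames)
    for t in range(n_frames)
]

smoothed = spectral_filter(noisy, low_pass_gains(n_frames, cutoff=3))
unchanged = spectral_filter(noisy, [1.0] * n_frames)
```

In this toy setting the jitter occupies only bins 12 and 20, so zeroing those bins recovers the slow component up to floating-point error. Real pose noise is not confined to a few bins, which is presumably why an adaptive filter is used rather than a fixed cutoff.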
Description
CCS Concepts: Computing methodologies → Activity recognition and understanding
@article{10.1111:cgf.70263,
journal = {Computer Graphics Forum},
title = {{Text-Guided Diffusion with Spectral Convolution for 3D Human Pose Estimation}},
author = {Shi, Liyuan and Wu, Suping and Yang, Sheng and Qiu, Weibin and Qiang, Dong and Zhao, Jiarui},
year = {2025},
publisher = {The Eurographics Association and John Wiley \& Sons Ltd.},
ISSN = {1467-8659},
DOI = {10.1111/cgf.70263}
}