DGFormer: Dynamic graph transformer for 3D human pose estimation

Abstract

Despite the significant progress for monocular 3D human pose estimation, it still faces challenges due to self-occlusions and depth ambiguities. To tackle those issues, we propose a novel Dynamic Graph Transformer (DGFormer) to exploit local and global relationships between skeleton joints for pose estimation. Specifically, the proposed DGFormer mainly consists of three core modules: Transformer Encoder (TE), immobile Graph Convolutional Network (GCN), and dynamic GCN. TE module leverages the self-attention mechanism to learn the complex global relationships among skeleton joints. The immobile GCN is responsible for capturing the local physical connections between human joints, while the dynamic GCN concentrates on learning the sparse dynamic K-nearest neighbor interactions according to different action poses. By building the adequately global long-range, local physical, and sparse dynamic dependencies of human joints, experiments on Human3.6M and MPI-INF-3DHP datasets demonstrate that our method can predict 3D pose with lower errors outperforming the recent state-of-the-art image-based performance. Furthermore, experiments on in-the-wild videos demonstrate the impressive generalization abilities of our method. Code will be available at: https://github.com/czmmmm/DGFormer

Publication
Pattern Recognition, 152
Ju Dai(代菊)
Ju Dai(代菊)

My research interests include distributed robotics, mobile computing and programmable matter.

Junjun Pan(潘俊君)
Junjun Pan(潘俊君)
Professor of Beihang University

My research interests include computer vision, computer graphics, animation and medical simulation.