Peeking into Occluded Joints: A Novel Framework for Crowd Pose Estimation
ECCV 2020

Lingteng Qiu1,2,3, Yanran Li4, Guanbin Li5, Xiaojun Wu3, Zixiang Xiong6,
Xiaoguang Han1,2,#, Shuguang Cui1,2
1CUHK-SZ, 2SRIBD, 3HIT-SZ, 4Bournemouth University,
5Sun Yat-sen University, 6Texas A&M University

The current SOTA method (left) vs. our method (right). Our method produces more natural and accurate estimates for occluded joints.

Abstract

Although occlusion is ubiquitous in natural scenes and remains a fundamental challenge for pose estimation, existing heatmap-based approaches suffer severe degradation under occlusion. Their intrinsic limitation is that they localize joints directly from visual evidence, which invisible joints lack. Instead of localization, our framework estimates invisible joints from an inference perspective, via an Image-Guided Progressive GCN module that jointly exploits image context and pose structure. Moreover, existing benchmarks contain only limited occlusion for evaluation. We therefore study this problem thoroughly and propose the OPEC-Net framework together with a new Occluded Pose (OCPose) dataset of 9k annotated images. Extensive quantitative and qualitative evaluations on benchmarks demonstrate that OPEC-Net achieves significant improvements over recent leading methods. Notably, OCPose is the most heavily occluded dataset in terms of average IoU between adjacent instances. Source code and OCPose will be publicly available.

Method

This figure depicts the two-stage estimation of a single pose. The GCN-based pose correction stage contains two modules: Cascaded Feature Adaptation and the Image-Guided Progressive GCN. First, a base module generates heatmaps. An integral regression method then transforms the heatmap representation into a coordinate representation, which serves as the initial pose for the GCN. The initial pose and the three feature maps from the base module are fed into the Image-Guided Progressive GCN. The multi-scale feature maps are refined by the Cascaded Feature Adaptation module and passed to each ResGCN-Attention block. J1, J2, and J3 are node features extracted from the image features at the corresponding joint locations (x, y). The errors of the Initial Pose, Pose1, Pose2, and Final Pose are all included in the objective function, and OPEC-Net is trained end-to-end to estimate the human pose.
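The integral regression step mentioned above (often called soft-argmax) converts a heatmap into joint coordinates by taking the expectation over the softmax of the heatmap, making the coordinate extraction differentiable. A minimal numpy sketch of this idea follows; the function name and the single-channel setup are illustrative, not taken from the paper's code.

```python
import numpy as np

def soft_argmax(heatmap):
    """Integral regression (soft-argmax): return the (x, y) coordinate
    as the expectation over the softmax distribution of one heatmap."""
    h, w = heatmap.shape
    # Numerically stable softmax over all heatmap entries.
    probs = np.exp(heatmap - heatmap.max())
    probs /= probs.sum()
    # Coordinate grids: ys[i, j] = i (row), xs[i, j] = j (column).
    ys, xs = np.mgrid[0:h, 0:w]
    # Expected coordinate under the probability map.
    x = float((probs * xs).sum())
    y = float((probs * ys).sum())
    return x, y

# A sharply peaked heatmap recovers the peak location.
hm = np.zeros((8, 8))
hm[3, 5] = 50.0
x, y = soft_argmax(hm)
```

Because the output is a smooth function of the heatmap values, gradients from the downstream GCN stage can flow back through this step into the base heatmap module.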

Result

Comparison

Our approach outperforms existing state-of-the-art methods in both multi-person and couple situations.

Multi-Person

Couple People

More Qualitative Results

The current SOTA method (left) vs. our method (right) on the CrowdPose dataset.
The current SOTA method (left) vs. our method (right) on the OCHuman dataset.
The current SOTA method (left) vs. our method (right) on the OCPose dataset.

OCPose Dataset

We create a new dataset called OCPose to evaluate our method in highly crowded scenes. More details and the download link can be found here.

BibTeX

@inproceedings{qiu2020peeking,
      title={Peeking into occluded joints: A novel framework for crowd pose estimation},
      author={Qiu, Lingteng and Zhang, Xuanye and Li, Yanran and Li, Guanbin and Wu, Xiaojun and Xiong, Zixiang and Han, Xiaoguang and Cui, Shuguang},
      booktitle={European Conference on Computer Vision},
      pages={488--504},
      year={2020},
      organization={Springer}
    }