End-to-End Detection-Segmentation System for Face Labeling

Abstract

In this paper, we propose an end-to-end detection-segmentation system for detailed face labeling. Fully convolutional networks (FCNs) have become the mainstream approach to semantic segmentation owing to their state-of-the-art performance. However, a general FCN usually produces smooth and homogeneous results. Moreover, when the semantic categories are extremely unbalanced across samples, as in the face labeling problem, features for some categories cannot be well explored by an FCN. To alleviate these problems, a face image is first encoded into multi-level feature maps by a pyramid FCN; features of the different facial components are then extracted separately according to the bounding boxes provided by a one-stage detection head. Three class-specific sub-networks process the extracted features to obtain the respective segmentation results. The skin-hair region can be decoded directly from the back end of the pyramid FCN. Finally, the overall segmentation result is obtained by combining the different branches. Moreover, the proposed method, trained on a single-face labeled dataset, can be applied directly to detailed multi-face labeling tasks without any network modification or any additional modules or data. The overall structure can be trained in an end-to-end manner while maintaining a small network size (12 MB). Experiments show that the proposed method generates more accurate (single- or multi-) face labeling results than previous works and achieves state-of-the-art results on the HELEN face dataset.
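The two data-flow steps named in the abstract, extracting features inside a detected bounding box and combining the branch outputs into one label map, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `crop_features` and `combine_branches`, the integer `(x0, y0, x1, y1)` box format, the label ids, and the hand-written mask standing in for a class-specific sub-network's output are all assumptions made for illustration.

```python
import numpy as np


def crop_features(feature_map, box):
    """Crop a (C, H, W) feature map to an integer box (x0, y0, x1, y1).

    Hypothetical helper: the box is assumed to come from a detection head.
    """
    x0, y0, x1, y1 = box
    return feature_map[:, y0:y1, x0:x1]


def combine_branches(base_labels, component_results):
    """Paste per-component binary masks into a base label map.

    base_labels: (H, W) integer array, assumed to be the skin-hair branch output.
    component_results: list of (label_id, box, mask) tuples, where mask has the
        same height and width as its box and marks the component's pixels.
    """
    out = base_labels.copy()
    for label_id, (x0, y0, x1, y1), mask in component_results:
        region = out[y0:y1, x0:x1]  # basic slicing returns a view into `out`
        region[mask.astype(bool)] = label_id
    return out


# Toy example: an 8-channel 6x6 feature map and one assumed "eye" box.
features = np.zeros((8, 6, 6))
base = np.ones((6, 6), dtype=int)   # label 1 everywhere (assumed: skin)
eye_box = (1, 1, 4, 3)              # x0, y0, x1, y1
eye_feat = crop_features(features, eye_box)

# Stand-in for the output of a class-specific sub-network on eye_feat.
eye_mask = np.array([[0, 1, 1],
                     [0, 1, 0]])
labels = combine_branches(base, [(2, eye_box, eye_mask)])
```

Because each component is handled inside its own box and then pasted back, the same combining step works unchanged for any number of detected boxes, which is consistent with the abstract's remark that multi-face images need no network modification.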

DOI

10.1109/TETCI.2019.2947319

Year

2021