
2019년 한국방송·미디어공학회 하계학술대회

심층 신경망을 이용한 얼굴 영상에서의 헤어 영역 제거

Jonathan Samuel Lumentut, 이정우, 박인규
인하대학교

jlumentut@gmail.com, leejw2807@gmail.com, pik@inha.ac.kr

Hair Removal on Face Images using a Deep Neural Network

Jonathan Samuel Lumentut, Jungwoo Lee, In Kyu Park
Inha University

Abstract

The task of image denoising has gained popularity in the computer vision research field. Its main objective, restoring a sharp image from a given noisy input, is in demand in nearly every image processing pipeline. In this work, we treat the removal of residual hair on face images as a task similar to image denoising. In particular, our method removes the residual hair present on frontal or profile face images and in-paints it with the relevant skin color. To achieve this objective, we employ a deep neural network that is able to perform both tasks at once. Furthermore, a simple technique of residual hair color augmentation is introduced to increase the number of training data. This approach is beneficial for improving the robustness of the network. Finally, the experimental results demonstrate the superiority of our network in both quantitative and qualitative performance.

1. Introduction

The field of computer vision has helped humans obtain noise-free images, which are always preferred for exhibition or for other computer vision tasks. The art of obtaining a noise-free image is known as denoising in both the computer vision and image processing areas. Image noise consists of undesired pixels in a certain region of an image, caused by some random or non-random mathematical distribution. In a simple understanding, the noise in an image is the product of convolving each pixel of the image with a specific noise kernel.

In this manner, a noisy image can be reproduced once the correct noise kernel is identified. The predicted noise kernel can then be used to deconvolve the noisy image and output a noise-free (denoised) image. We treat the residual hairs present on a face image as noise.
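To make this framing concrete, the following is a minimal sketch (not part of the original paper) of synthesizing a degraded image by convolving a clean image with a kernel and recovering an estimate with a classical Wiener-style deconvolution; the box kernel and regularization constant are illustrative assumptions, not the method proposed in this work.

```python
# Minimal sketch of the noise-as-convolution framing; illustrative only.
import numpy as np

def degrade(clean: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Synthesize a degraded image via circular convolution with a noise kernel."""
    H = np.fft.fft2(kernel, s=clean.shape)
    return np.real(np.fft.ifft2(np.fft.fft2(clean) * H))

def wiener_deconvolve(degraded: np.ndarray, kernel: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Estimate the clean image by Wiener-style deconvolution in the frequency
    domain (a classical baseline, not the neural approach of this paper)."""
    H = np.fft.fft2(kernel, s=degraded.shape)
    G = np.fft.fft2(degraded)
    return np.real(np.fft.ifft2(G * np.conj(H) / (np.abs(H) ** 2 + eps)))

# Illustrative example: a 5x5 box kernel plays the role of the "noise" kernel.
clean = np.random.rand(128, 128)
kernel = np.ones((5, 5)) / 25.0
restored = wiener_deconvolve(degrade(clean, kernel), kernel)
```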

Residual hair on frontal or profile images can be seen in Figures 1 and 2, where it is produced by an imperfect hair filter in an image processing tool. The objective of this work is to completely remove the residual hair from these 2D frontal and profile images. The problem is challenging because a residual-hair image does not show a full-hair representation of the human face, as shown in Figure 2. It is further complicated by the variety of human hair colors. We answer these challenges with a neural network approach that learns to remove the residual hair (noise) and substitute it with skin color according to the ground truth. We also introduce a simple color augmentation method for the residual hair only. With this approach, the work of predicting the correct noise kernel and removing it by in-painting is done in one pass. In summary, our contributions are:

- We provide a neural network architecture that removes the residual hair on any frontal or profile face image and fills the region with skin color in one pass.

- We provide a simple residual hair color augmentation technique to obtain robust neural network performance.

2. Related Works

To the best of our knowledge, no previous work addresses the task of residual hair removal. We therefore take as an example the most complicated denoising task, known as deblurring, which aims to remove undesired blurry regions from an image. Many blur types can be considered; however, in this study, we specifically discuss work on motion blur and interpolated blur.

Motion blur in an image occurs due to the movement of the capturing tool (camera) or of the object itself. Recent deblurring works employ neural network approaches to achieve faster and more robust processing.

The work of Nah et al. [1] utilizes a very deep network that performs multi-scale image motion deblurring. In contrast, the work of Kim et al. [2] proves that motion deblurring can be achieved with near real-time performance using a non-deep neural network. Similarly, Lumentut et al. [3] provide a recurrent network to deblur a light field under a 6-DOF motion constraint with fast processing and full-resolution capability.

Interpolated blur usually occurs due to the process of interpolating a low-resolution image to its higher-resolution version.

This problem is widely known as the super-resolution task in computer vision. Recent super-resolution works also employ neural networks to produce a deblurred version of the super-resolved image.

A pioneering work on learning-based super-resolution is that of Kim et al. [4], which employs a very deep neural network architecture accommodated with gradient clipping for faster convergence. Another improved work, on video super-resolution with a deep neural network, is that of Jo et al. [5]. Their work employs dynamic and learnable up-sampling filters that utilize neighboring frames (images) in a video.

While the previous works are widely exploited for motion deblurring and super-resolution, we frame residual hair removal as a denoising task. In the following, we elaborate our neural network architecture and its augmentation technique to achieve this objective.


Figure 1. Network Architecture for Residual Hair Removal on Frontal or Profile Face Image.

Figure 2. Example of a residual hair map (3rd column) obtained by computing the absolute difference between the input image (1st column) and the ground truth image (2nd column). The map is used for color augmentation (4th column) when training ResHairNet.

3. Proposed Method

Our neural network architecture is shown in Figure 1. It takes as input an image (I) that contains the residual hair. The network predicts an in-painted region (R) whose color matches the surrounding skin. The final output (O) is the sum of the I and R images. With this approach, our network recognizes residual hair regions and substitutes them with skin color at the same time. For simplicity, Figure 1 shows only the frontal face case; the profile face case is handled by simply switching the input image. We elaborate on the details of our network in the following discussion.

Our residual hair removal network (ResHairNet) is designed in an encoder-decoder style. The encoder is represented by the 2DConv_1, 2DConv_2, and ResNet layers, while the decoder consists of the 2DDeconv and 2DConv_4 layers. As shown in Figure 1, the height (h) and width (w) of the feature maps change at 2DConv_2 and 2DConv_4. This is due to the use of stride = 2 in their preceding layers, 2DConv_1 and 2DDeconv, respectively. Note that our network is fully convolutional, so an input of arbitrary size can be processed as long as its dimensions are divisible by 2. Our network also utilizes 20 residual network (ResNet) blocks in the style of the original authors, He et al. [6]. The only difference is that all blocks are equipped with Instance Normalization instead of Batch Normalization. We draw from a previous study [4] the observation that predicting a certain region is easier for network convergence than predicting the whole image. This is expressed in our network through a skip connection, so that the network's task is only to predict the in-painted residual hair, shown as R in Figure 1. Each layer of the network is followed by a ReLU activation, except the last layer (2DConv_4), which uses a Tanh activation without any normalization. The final output (O) is the image with the residual hair cleared. Note that, for profile face image training, the 5x5 filter of the 2DConv_1 layer is changed to 7x7 to obtain a larger receptive field over the hair regions.
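A minimal PyTorch sketch of this layout is given below, assuming 64 feature channels, 3x3 filters for the layers whose sizes are not specified, and Instance Normalization on the non-residual layers; only the layer order, the 5x5 (or 7x7) first filter, the stride-2 down/up-sampling, the 20 Instance-Normalized residual blocks, the ReLU/Tanh activations, and the skip connection O = I + R are taken from the text.

```python
# Hedged sketch of ResHairNet; channel widths and unspecified kernel sizes are assumptions.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block with Instance Normalization (He et al. [6] style)."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch))
    def forward(self, x):
        return x + self.body(x)

class ResHairNet(nn.Module):
    def __init__(self, first_kernel: int = 5, ch: int = 64, n_blocks: int = 20):
        super().__init__()
        pad = first_kernel // 2
        self.conv1 = nn.Sequential(  # 2DConv_1: stride 2 halves h and w
            nn.Conv2d(3, ch, first_kernel, stride=2, padding=pad),
            nn.InstanceNorm2d(ch), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(  # 2DConv_2
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.InstanceNorm2d(ch), nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])
        self.deconv = nn.Sequential(  # 2DDeconv: stride 2 restores h and w
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1),
            nn.InstanceNorm2d(ch), nn.ReLU(inplace=True))
        self.conv4 = nn.Conv2d(ch, 3, 3, padding=1)  # 2DConv_4: no normalization

    def forward(self, I):
        # Predict the in-painted residual R, then add it to the input (skip connection).
        R = torch.tanh(self.conv4(self.deconv(self.blocks(self.conv2(self.conv1(I))))))
        return I + R  # O = I + R

# Profile-face variant noted in the text: ResHairNet(first_kernel=7)
```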

To increase the robustness of the network, augmentation is employed in the training phase. We introduce a simple residual hair color augmentation technique in this work. To obtain the residual hair region, we simply compute the absolute difference between the input image (I) and the ground truth. Note that we obtain the ground truth (residual-hair-free) image using the Pixlr tool. Once the difference map is obtained, we use it to locate the residual hair and simply randomize its color. An example of this procedure is shown in Figure 2. We also apply spatial augmentation by cropping large 400x400 patches of the frontal and profile face images to maintain a large region of the face. In the specific case of frontal-face ResHairNet training, we additionally employ horizontal image flipping. The network is trained using the ADAM optimizer with a learning rate of 0.0001. Finally, we train the network in an end-to-end manner by minimizing the mean squared error (MSE) between the ground truth and the final predicted output (O).
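A hedged NumPy sketch of this augmentation pipeline follows; the difference threshold and the use of a single random color per image are assumptions, since the paper only states that the absolute difference map locates the residual hair whose color is then randomized, together with 400x400 crops and horizontal flips for frontal faces.

```python
# Sketch of residual-hair color augmentation and spatial augmentation; threshold is assumed.
import numpy as np

def color_augment(inp: np.ndarray, gt: np.ndarray, thresh: float = 0.05) -> np.ndarray:
    """Randomize the color of the residual-hair pixels of an HxWx3 uint8 input."""
    diff = np.abs(inp.astype(np.float32) - gt.astype(np.float32)).mean(axis=2)
    mask = diff > (thresh * 255.0)                    # residual-hair location map
    out = inp.copy()
    random_color = np.random.randint(0, 256, size=3)  # one random hair color per sample
    out[mask] = random_color
    return out

def spatial_augment(inp, gt, crop=400, flip=False):
    """Random 400x400 crop (and optional horizontal flip for frontal faces)."""
    h, w = inp.shape[:2]
    y, x = np.random.randint(0, h - crop + 1), np.random.randint(0, w - crop + 1)
    inp_c, gt_c = inp[y:y+crop, x:x+crop], gt[y:y+crop, x:x+crop]
    if flip and np.random.rand() < 0.5:
        inp_c, gt_c = inp_c[:, ::-1], gt_c[:, ::-1]
    return inp_c, gt_c
```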

For training, we use 78 images as the training set and 9 images as the test set, in both the frontal and profile face cases.
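A short sketch of the corresponding end-to-end training loop (ADAM, learning rate 0.0001, MSE loss) is given below; the data loader of augmented (input, ground-truth) patch pairs, the epoch count, and the device handling are assumptions.

```python
# Hedged end-to-end training loop for ResHairNet; loader and epoch count are assumed.
import torch
import torch.nn.functional as F

def train(model, loader, epochs: int = 200, device: str = "cuda"):
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # ADAM, lr = 0.0001
    for epoch in range(epochs):
        for inp, gt in loader:                 # augmented 400x400 patch pairs
            inp, gt = inp.to(device), gt.to(device)
            out = model(inp)                   # O = I + R
            loss = F.mse_loss(out, gt)         # end-to-end MSE objective
            opt.zero_grad()
            loss.backward()
            opt.step()
```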

4. Experimental Results

To show the robustness of ResHairNet, we present quantitative and qualitative results on the test set. We achieve average PSNR values of 35.79 and 33.31 on the 9 frontal and 9 profile face images, respectively. The average SSIM values are 0.9799 and 0.9754 for the frontal and profile face cases. These numbers are relatively high and are achieved with only a small amount of training data. To clearly show the robustness of our network, we provide qualitative results in Figure 3. We also include, in the last row of Figure 3, the 3D face rendering result obtained when the cleaned residual-hair image is used as input.
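The average PSNR and SSIM above can be reproduced with standard metrics; a small sketch using scikit-image (an assumed tooling choice, not specified in the paper) is shown below.

```python
# Sketch of the quantitative evaluation; requires a recent scikit-image (channel_axis argument).
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(outputs, ground_truths):
    """Average PSNR and SSIM over paired uint8 RGB test images."""
    psnrs, ssims = [], []
    for out, gt in zip(outputs, ground_truths):
        psnrs.append(peak_signal_noise_ratio(gt, out, data_range=255))
        ssims.append(structural_similarity(gt, out, channel_axis=2, data_range=255))
    return float(np.mean(psnrs)), float(np.mean(ssims))
```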

5. Conclusion

In this work, we present a neural network based residual hair removal method for frontal and profile face images. Drawing on knowledge from previous studies, we successfully train the network with a simple augmentation technique that increases the number of training data, and equip it with a skip connection to produce the in-painting effect over the residual hair area.

References

[1] S. Nah, T. H. Kim, and K. M. Lee, “Deep multi-scale convolutional neural network for dynamic scene deblurring,” In Proc. of the IEEE CVPR, 2017.

[2] T. H. Kim, K. M. Lee, B. Scholkopf, and M. Hirsch, “Online video deblurring via dynamic temporal blending network,” In Proc. of the IEEE ICCV, pages 4058-4067, 2017.

[3] J. S. Lumentut, T. H. Kim, R. Ramamoorthi, and I. K. Park, “Fast and full-resolution light field deblurring using a deep neural network,” arXiv preprint arXiv:1904.00352, 2019.

[4] J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” In Proc. of the IEEE CVPR, 2016.

[5] Y. Jo, S. W. Oh, J. Kang, and S. J. Kim, “Deep video super-resolution network using dynamic upsampling filters without explicit motion compensation,” In Proc. of the IEEE CVPR, pages 3224-3232, 2018.

[6] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” In Proc. of the IEEE CVPR, pages 770-778, 2016.


Figure 3. First to fourth columns of the first four rows: ground truth, input with residual hair, removed residual hair using ResHairNet (O), and in-painted residual hair (R) images. The first two rows and the next two rows show results for frontal and profile face images, respectively. Note that each image has a different size; our network can handle arbitrary input and output sizes. The last row demonstrates the 3D model of the cleaned residual-hair face.
