Journal of Internet Computing and Services (JICS) 2018. Oct.: 19(5): 43-54
Single Image Dehazing Based on Depth Map Estimation via Generative Adversarial Networks
(Korean title: Depth-Map-Based Dehazing Using Generative Adversarial Networks)
Yao Wang 1, Woojin Jeong 1, Young Shik Moon 1*
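SUMMARY
Images captured in hazy conditions suffer from poor visibility because of low contrast. The process of removing the effect of haze from such a blurred image is called dehazing. One of the most important problems in dehazing is accurately estimating the transmission map or the depth map. In this paper, we propose a method for accurate depth map estimation using a Generative Adversarial Network (GAN). The proposed GAN model learns a nonlinear mapping between a hazy input image and its corresponding depth map. In the dehazing stage, the trained model is used to estimate the depth map of the input image, which is then used to compute the transmission map. Next, a guided filter is used to refine the transmission map. Finally, the haze-free image is recovered based on the atmospheric scattering model. The proposed GAN model was trained on synthetic indoor images, but it can also be applied to real hazy images, which we demonstrate through experiments. The experiments also show that the proposed method outperforms previously studied methods both visually and quantitatively.
☞ Keywords: dehazing, depth map estimation, generative adversarial networks
ABSTRACT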
Images taken in hazy weather are characterized by low contrast and poor visibility. The process of reconstructing a clear-weather image from a hazy image is called dehazing. The main challenge of image dehazing is to estimate the transmission map or depth map for an input hazy image. In this paper, we propose a single image dehazing method that utilizes a Generative Adversarial Network (GAN) for accurate depth map estimation. The proposed GAN model is trained to learn a nonlinear mapping between the input hazy image and its corresponding depth map. With the trained model, the depth map of the input hazy image is first estimated and used to compute the transmission map. Then a guided filter is utilized to preserve the important edge information of the hazy image, yielding a refined transmission map. Finally, the haze-free image is recovered via the atmospheric scattering model. Although the proposed GAN model is trained on synthetic indoor images, it can be applied to real hazy images. The experimental results demonstrate that the proposed method achieves superior dehazing results compared with state-of-the-art algorithms on both real and synthetic hazy images, in terms of quantitative and visual performance.
☞ keyword : dehaze, depth estimation, generative adversarial networks.
1. Introduction
Haze is an atmospheric phenomenon in which smoke, dry particulates, and dust obscure the clarity of the sky. Pictures taken in hazy weather suffer from poor visibility, low contrast, and blur. A sample hazy image is shown on the left side of Figure 1.
1 Department of Computer Science and Engineering, Hanyang University, 1271 Sa-3 Dong, Ansan-Si, Gyeonggi-do, Korea
* Corresponding author ([email protected])
[Received 15 June 2018, Reviewed 21 June 2018(R2 13 August 2018), Accepted 10 September 2018]
Image dehazing is an important topic in the computer vision and photography fields. First, most computer vision algorithms, such as object recognition, target tracking, and image analysis, assume haze-free input images. Therefore, if the input images suffer from haze, the results may be degraded, for example through reduced accuracy and misidentification. Second, haze removal can enhance image quality significantly, so that the images have a better visual effect.
ISSN 1598-0170 (Print) / ISSN 2287-1136 (Online) http://www.jics.or.kr Copyright ⓒ 2018 KSII
In order to improve image quality by resolving the shortcomings of hazy images, many dehazing methods have been proposed [1]. Early methods often rely on reference information such as multiple images of the same scene [2] [3] or additional depth information [4]. However, for the single image dehazing problem, information about depth and multiple images of the same scene may not be available.
Therefore, prior-based algorithms have been proposed to deal with this issue, such as the dark channel prior [5] [6] [7] and the color attenuation prior [8]. These algorithms achieve better results in most hazy scenes; nevertheless, they may cause color distortion and over-saturation in some regions, such as sky and light-colored floors. Recently, machine learning based methods have shown great potential in haze removal [9] [10]. They regard image dehazing as a regression problem, training a convolutional neural network to estimate the transmission map and then recovering the haze-free image via the atmospheric scattering model. These methods work well for thin haze, but they are not suitable for images with dense haze.
Motivated by the machine learning based methods, we consider the image dehazing problem as three separate tasks: depth map estimation, transmission map refinement, and haze-free image recovery. In the first part, the depth map of the input hazy image is estimated through the trained GAN framework. In the second part, the transmission map is computed from the estimated depth information and refined by a guided filter. Finally, the haze-free image is recovered via the atmospheric scattering model. The contributions of this work are summarized as follows.
demonstrated the validity and feasibility of the proposed method.
2. Background
In this section, we review the atmospheric scattering model, which is useful for solving the dehazing problem, and some important related works on image dehazing.
2.1 Atmospheric Scattering Model
The following atmospheric scattering model was proposed by Koschmieder [11] to formulate hazy images. This model is widely used by most recent dehazing methods:
$$I(x) = J(x)\,t(x) + A\,(1 - t(x)) \tag{1}$$
where I(x) is the observed image, J(x) is the scene radiance, A is the global atmospheric light, and t(x) is the medium transmission. The goal of image dehazing is to recover J(x) by:
$$J(x) = \frac{I(x) - A}{t(x)} + A \tag{2}$$
This requires us to estimate the global atmospheric light A and the medium transmission t(x). The transmission is directly related to the depth of the scene and is expressed as:
$$t(x) = e^{-\beta d(x)} \tag{3}$$
where β is the medium attenuation coefficient, and d(x) represents the depth of the scene.
Based on the above discussion, estimating the depth information of the scene is the crux of the image dehazing problem.
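As an illustration of how equations (1) to (3) fit together, the following is a minimal NumPy sketch that synthesizes a hazy image from a clear image and a depth map, and then inverts the model. The function names, the value of β, and the lower bound `t_min` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def transmission_from_depth(depth, beta=1.0):
    """Equation (3): t(x) = exp(-beta * d(x))."""
    return np.exp(-beta * depth)

def add_haze(J, t, A):
    """Equation (1): I(x) = J(x) t(x) + A (1 - t(x))."""
    t3 = t[..., np.newaxis]  # broadcast the transmission over colour channels
    return J * t3 + A * (1.0 - t3)

def remove_haze(I, t, A, t_min=0.1):
    """Equation (2): J(x) = (I(x) - A) / t(x) + A.

    t_min is a hypothetical lower bound (not from the paper) that keeps
    the division stable where the transmission is close to zero.
    """
    t3 = np.maximum(t, t_min)[..., np.newaxis]
    return (I - A) / t3 + A

# Round trip on random data: haze a clear image, then recover it.
rng = np.random.default_rng(0)
J = rng.random((4, 5, 3))   # clear image with values in [0, 1)
depth = rng.random((4, 5))  # depth map with values in [0, 1)
A = 0.9                     # assumed global atmospheric light
t = transmission_from_depth(depth, beta=1.0)
I = add_haze(J, t, A)
J_rec = remove_haze(I, t, A)
```

When the true transmission and atmospheric light are known, the recovery is exact; in practice the quality of the result depends on how well t(x) and A are estimated, which is why the depth map matters.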
2.2 Generative Adversarial Networks
Generative adversarial networks (GANs) [12] are generative models used for generating images from noise vectors. The training of a GAN can be regarded as a min-max game between two components, a generator G and a discriminator D: D is optimized to give a high probability to real data and a low probability to generated data, and G is then optimized to increase the probability that D assigns to the generated data. The effectiveness of GANs has led to a variety of applications, such as image-to-image translation [13], gray-scale image colorization [14], and image super-resolution [15]. In this work, we employ a GAN to estimate the depth map of a hazy image.
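For reference, the min-max game described above is commonly written as the value function of [12]. The noise vector z and the distributions p_data and p_z are standard notation assumed here, not symbols defined in this paper:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The first term rewards D for rating real samples highly, and the second rewards D for rating generated samples low, while G is trained to reduce the second term.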
2.3 Related Works
Image dehazing is a challenging problem because it is ill-posed, and many algorithms have been proposed to deal with it. Early methods often require multiple images of the same scene. For example, the polarization-based methods [3] [16] employ several images taken under different degrees of polarization to estimate the atmospheric light, and then remove the haze through the atmospheric degradation model. Depth-based methods [17] use additional depth information of the input images or 3D models to recover the haze-free images.
In recent years, many algorithms based on assumptions and prior information have been proposed. Fattal [18] assumes that the image shading and transmission components are statistically uncorrelated and formulates an image formation model that accounts for both the surface shading and the transmission function to resolve ambiguities in the data. Based on the same model, Tan [19] observed that haze-free images generally have higher contrast than hazy images and therefore removes haze by maximizing the contrast per patch of the input image. He [5] proposed the well-performing dark channel prior algorithm, based on the observation that, in outdoor haze-free images, some pixels have a very low intensity, close to zero, in at least one color channel. The transmission can then be computed with this prior.
Based on the dark channel prior, Meng [20] proposed an improved method that utilizes a boundary constraint on the transmission function.
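To make the dark channel prior of [5] concrete, the following is a minimal NumPy sketch of the dark channel and the transmission estimate derived from it. The small default patch size and the plain double loop are simplifications chosen for readability, and the function names are hypothetical; the factor `omega = 0.95` follows the value reported in [5].

```python
import numpy as np

def dark_channel(image, patch=3):
    """Minimum over colour channels, then minimum over a patch x patch window."""
    min_rgb = image.min(axis=2)
    r = patch // 2
    padded = np.pad(min_rgb, r, mode="edge")  # replicate borders
    h, w = min_rgb.shape
    out = np.empty_like(min_rgb)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def dcp_transmission(hazy, A, omega=0.95, patch=3):
    """Transmission estimate of [5]: t = 1 - omega * dark_channel(I / A)."""
    return 1.0 - omega * dark_channel(hazy / A, patch)
```

A region whose dark channel is close to zero is treated as nearly haze-free (t close to 1), whereas a uniformly bright region is treated as heavily hazed, which is one reason why this prior can misjudge sky and light-colored surfaces.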
More recently, various learning-based methods have been proposed. Tang [21] proposed a regression-based method that extracts different haze-relevant features to estimate a more accurate transmission map.
Following the success of convolutional neural networks (CNNs) in computer vision tasks such as object recognition, image generation, and classification, Cai [10] proposed an end-to-end trainable CNN model that learns haze-relevant features automatically. Ren [9] trained a multi-scale convolutional neural network that combines a coarse-scale net with a fine-scale net to estimate the scene transmission map more accurately.
3. Proposed Method
The proposed method consists of three main steps, illustrated in Figure 2. In the first step, the trained GAN model takes a hazy image as input and generates the corresponding depth map as output. The depth map is then used to compute the coarse transmission map. In the second step, a guided filter is applied to the coarse transmission map to obtain a refined transmission map. In the third step, the haze-free image is recovered using the atmospheric scattering model. Each step is described in detail in the following subsections.
3.1 Depth Map Estimation
The task of estimating a depth map from a given input hazy image can be regarded as an image regression problem. The objective is to learn a nonlinear mapping between a hazy image and its corresponding depth map by minimizing the loss between the estimated and ground-truth depth maps. A convolutional neural network is a natural choice for this type of problem, but in consideration of the effectiveness of the adversarial loss, we employ a Generative Adversarial Network conditioned on the hazy image to estimate the depth map.
(Figure 2) Overview of the proposed method, consisting of three main parts: (a) Depth map estimation, (b) Transmission map refinement, (c) Single image haze removal.
(Figure 3) The training procedure of the GAN model for depth map estimation.
The proposed network for learning the depth map consists of two models: a generative model G and a discriminative model D.
The generator takes the hazy image as input and learns to produce samples that are similar to the real depth map, such that the discriminator is unable to distinguish between generated samples and the actual distribution. The discriminator takes an image pair consisting of the hazy image and an unknown depth map as input. It is trained to give a high probability to the real depth map and a low probability to the generated depth map. The training procedure is shown in Figure 3.
With a conditional GAN, both the generator and the discriminator are conditioned on some extra information. We feed the hazy image as the extra information to the generator. The adversarial loss function is expressed as:
$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x, y \sim p_{data}(x, y)}\left[\log D(y \mid x)\right] + \mathbb{E}_{x \sim p_{data}(x)}\left[\log\left(1 - D(G(x) \mid x)\right)\right] \tag{4}$$
where x is the hazy image and y is the ground-truth depth map.
Besides, we consider the L1 distance between the ground truth and the generated depth map in the generator, since previous approaches to conditional GANs have found it beneficial to mix the GAN objective function with such a distance [22]. The L1 loss is expressed by:
$$\mathcal{L}_{L1}(G) = \mathbb{E}_{x, y \sim p_{data}(x, y)}\left[\lVert y - G(x) \rVert_1\right] \tag{5}$$
By alternating gradient optimization between the generator and the discriminator, the GAN model gradually converges to producing depth maps that are as realistic as the ground truth in the dataset. Therefore, the ultimate objective combines the adversarial loss with the L1 loss as:
$$G^{*} = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D) + \lambda \mathcal{L}_{L1}(G) \tag{6}$$
where λ is a weighting factor, as in [22].
(Figure 4) The structure of the generator. Given a hazy image as input, the generator estimates a corresponding depth map as output.
(Figure 5) The structure of the discriminator. The network takes a pair of a hazy image and an unknown depth map as input and outputs a 30x30 map in which each pixel value, between 0 and 1, indicates the probability that the unknown depth map is real.
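The way an adversarial term and a distance term can be combined in objectives (4) to (6) may be sketched numerically as follows. This is an illustrative NumPy sketch, not the authors' training code: the distance is taken to be L1 as in [22], the weight `lam = 100.0` is the default reported in [22] and is an assumption here, and the generator term uses the non-saturating form -log D(fake) that is common in practice.

```python
import numpy as np

EPS = 1e-12  # avoids log(0)

def discriminator_loss(d_real, d_fake):
    """Negative of the adversarial objective: D maximises
    log D(real) + log(1 - D(fake)), so we minimise its negative."""
    return -np.mean(np.log(d_real + EPS) + np.log(1.0 - d_fake + EPS))

def generator_loss(d_fake, depth_fake, depth_true, lam=100.0):
    """Adversarial term plus lam times the L1 distance to the ground truth.

    lam is an assumed weight (value taken from [22], not from this paper).
    """
    adversarial = -np.mean(np.log(d_fake + EPS))
    l1 = np.mean(np.abs(depth_true - depth_fake))
    return adversarial + lam * l1
```

Here `d_real` and `d_fake` stand for the discriminator's output maps on real and generated depth maps. When the discriminator outputs 0.5 everywhere, it cannot tell the two apart, and the generator loss is then driven by the L1 term alone.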
We use an encoder-decoder model as the generator. It consists of eight encoder layers and eight decoder layers.
Each encoder layer comprises a convolution operation for extracting haze-relevant features and a leaky Rectified Linear Unit (ReLU) as the activation function. Each decoder layer contains a transposed convolution for upsampling, so that the output has the same size as the input image. Note that convolution with stride 2 generally causes a loss of important spatial information of the input image, so we add shortcut connections between the symmetric layers, as in U-Net [23], to preserve low-level information. Figure 4 illustrates the structure of the generator. The discriminator consists of five encoder layers and outputs a 30x30x1 map in which each pixel value, between 0 and 1, indicates the probability that the unknown depth map is real. The structure of the discriminator is illustrated in Figure 5.
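The layer counts and the 30x30 output can be checked with simple size arithmetic. The sketch below assumes a 256x256 input, 4x4 kernels with padding 1, and discriminator strides of (2, 2, 2, 1, 1), as in the reference design of [22]; none of these values is stated in this paper, so they are assumptions for illustration only.

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1

def encoder_sizes(size, layers=8):
    """Feature-map sizes through a stack of stride-2 encoder layers."""
    sizes = [size]
    for _ in range(layers):
        sizes.append(conv_out(sizes[-1]))
    return sizes

def discriminator_sizes(size, strides=(2, 2, 2, 1, 1)):
    """Feature-map sizes through the five discriminator layers."""
    sizes = [size]
    for stride in strides:
        sizes.append(conv_out(sizes[-1], stride=stride))
    return sizes

print(encoder_sizes(256))        # sizes after each of the eight encoder layers
print(discriminator_sizes(256))  # sizes after each of the five discriminator layers
```

Under these assumptions, eight stride-2 layers reduce a 256x256 input to 1x1, which is consistent with an eight-layer encoder, and the five discriminator layers end at 30x30, matching the output map described above.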
3.2 Transmission Map Refinement
After obtaining the estimated depth map, its corresponding transmission map can be computed directly by equation (3).
However, the obtained transmission map is coarse due to noise and fuzzy boundaries. Therefore, the guided filter [24] is needed to refine the coarse transmission map.
(Figure 6) Estimating the transmission map with and without the guided filter. (a) Input hazy images. (b) Results without guided filter. (c) Results with guided filter.
The guided filter is an edge-preserving filter that computes its output by considering the content of both the filtering input and the guidance image. In this task, we use the coarse transmission map as the filtering input and the hazy image as the guidance image. The filtering output keeps not only the content information of the filtering input, but also the important edge information of the guidance image. The estimation results with and without the guided filter are illustrated in Figure 6.
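The refinement step can be sketched with the gray-scale form of the guided filter of [24]. This is a minimal NumPy sketch under stated assumptions: the guidance image is a single-channel (for example, gray-scale) version of the hazy image, and the radius `r` and regularization `eps` are illustrative values not given in this paper.

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, replicating edges at the border."""
    k = 2 * r + 1
    padded = np.pad(img, r, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.pad(padded, ((1, 0), (1, 0)), mode="constant").cumsum(0).cumsum(1)
    h, w = img.shape
    total = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
             - integral[k:k + h, :w] + integral[:h, :w])
    return total / (k * k)

def guided_filter(guide, src, r=8, eps=1e-3):
    """Gray-scale guided filter: src is filtered, guide supplies the edges."""
    mean_I = box_mean(guide, r)
    mean_p = box_mean(src, r)
    cov_Ip = box_mean(guide * src, r) - mean_I * mean_p
    var_I = box_mean(guide * guide, r) - mean_I * mean_I
    a = cov_Ip / (var_I + eps)   # local linear coefficient
    b = mean_p - a * mean_I      # local offset
    return box_mean(a, r) * guide + box_mean(b, r)
```

In the setting of this section, `src` would be the coarse transmission map and `guide` the hazy image, for example `refined = guided_filter(gray_hazy, coarse_t)`, where both array names are hypothetical.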
3.3 Single Image Haze Removal
In addition to the transmission map t(x), the atmospheric light A is also needed to recover the haze-free image according to equation (2). We estimate the atmospheric light using the depth map. First, we pick the top 10% brightest pixels in the depth map, since these pixels usually have the most serious haze. Then, we compute the intensity of these pixels in the corresponding hazy image, and the pixel with the highest intensity is chosen as the value of the atmospheric light.
At this point, we have obtained the estimated transmission map using the proposed GAN model and estimated the atmospheric light by the proposed algorithm. Finally, the haze-free image can be recovered directly by equation (2).
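The atmospheric light selection described above can be sketched as follows. The function name is hypothetical, and taking the per-pixel intensity as the mean over color channels is an assumption, since the paper does not define how intensity is computed.

```python
import numpy as np

def estimate_atmospheric_light(hazy, depth, top_fraction=0.10):
    """Return the colour of the brightest hazy pixel among the
    top_fraction of pixels with the largest depth-map values."""
    h, w = depth.shape
    n = max(1, int(round(h * w * top_fraction)))
    flat_depth = depth.reshape(-1)
    # Indices of the n largest depth values (brightest in the depth map).
    candidates = np.argpartition(flat_depth, -n)[-n:]
    flat_hazy = hazy.reshape(-1, hazy.shape[2])
    # Assumed intensity measure: mean over colour channels.
    intensity = flat_hazy[candidates].mean(axis=1)
    best = candidates[np.argmax(intensity)]
    return flat_hazy[best]
```

Restricting the search to the deepest pixels avoids picking a bright but nearby object, such as a white wall close to the camera, as the atmospheric light.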
4. Experimental Results
In this section, we first introduce the experimental environment and the datasets used for GAN model training.
Then we evaluate the proposed method through comparison experiments with state-of-the-art algorithms on both synthetic and real hazy images.
4.1 Experimental Environment
In order to train the GAN model, we use images from [25]. This dataset contains about 1500 pairs of synthetic indoor hazy images with their corresponding depth maps. We divided the images in a ratio of 8:2, using 1159 pairs for training and the rest for testing. We trained the proposed model with TensorFlow under the Ubuntu 16.04 operating system, running on an NVIDIA GeForce GTX 960. The number of epochs is 200, and the learning rate is set to 2e-4 with the Adam optimizer.
(Figure 7) Comparison of the estimated depth maps and the ground truth. (a) Hazy images. (b) Estimated depth maps. (c) Ground truth.
4.2 Evaluation on Depth Map
Depth information of the hazy image plays an important role in the dehazing problem according to the atmospheric scattering model. The quality of the dehazing result heavily depends on the estimated depth map. In this section, we evaluate the performance of the GAN model by comparing the estimated depth maps with the ground truth. As shown in Figure 7, although the depth maps estimated by the proposed model lack some details, they approximate the ground truth depth maps. The results illustrate the effectiveness of the GAN model.
4.3 Evaluation on Synthetic Hazy Images
In this section, we compare the proposed method with four state-of-the-art dehazing algorithms on synthetic hazy images using the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) metrics. We use six examples including both indoor and outdoor images: Bicycle, Flower, Women, Man, Building, and Tree. For a more comprehensive evaluation of the validity of the proposed model, the first four evaluation images are randomly selected from the Middlebury [26] and NYU [27] datasets. The last two test images were taken with a Samsung smartphone.
Figure 8 shows the comparison with other methods [5] [9] [10] [20] on synthetic hazy images. As shown in Figure 8(b), He et al. [5] tends to generate darker results, because the assumption that the dark channel of a clear image is zero may overestimate the haze thickness. The dehazing results generated by Meng et al. [20] and Ren et al. [9] tend to leave some haze in the recovered images. For example, the flower behind the cupboard is still obscured by haze, as shown in the second row of Figure 8(c)(e). The method proposed by Cai et al. [10] achieves better dehazing results than the others. However, it may lead to some color distortions, such as the asphalt road in the last row of Figure 8(d), which is darker than the ground truth. In contrast, the dehazing results of the proposed method are closer to the ground truth images than those of the other methods and also recover more details. For example, the plants behind the flowers are clearer than in the other results, as shown in the second row of Figure 8(f).
(Figure 8) Comparison results on synthetic images: Bicycle, Flower, Women, Man, Building, and Tree. (a) Input, (b) He et al., (c) Meng et al., (d) Cai et al., (e) Ren et al., (f) Ours, followed by the ground truth.
(Figure 9) PSNR comparison results.
(Figure 10) SSIM comparison results.
(Figure 11) Comparison results on real images. (a) Input, (b) He et al., (c) Meng et al., (d) Cai et al., (e) Ren et al., (f) Ours.
(Table 1) Average PSNR and SSIM of dehazing results on synthetic images.
Average Metrics   He [5]   Meng [20]   Cai [10]   Ren [9]   Ours
PSNR (dB)         17.40    19.55       19.06      18.26     21.98
SSIM              0.87     0.89        0.88       0.88      0.90
For quantitative performance evaluation, PSNR and SSIM are used. Figure 9 and Figure 10 show the PSNR and SSIM comparison results of the different methods, respectively. The average PSNR and SSIM of the dehazing results are shown in Table 1. The comparison results show that the proposed method performs well on each evaluation image compared with the other dehazing methods in terms of PSNR and SSIM, with the highest average PSNR (21.98 dB) and SSIM (0.90) in Table 1, which indicates that our model can estimate a better scene transmission map.
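For readers unfamiliar with the metric, PSNR can be computed as in the following minimal sketch. It assumes images scaled to [0, 1] (so the peak value is 1.0); the function name is hypothetical, and this is not the evaluation script used for Table 1.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(peak^2 / MSE)."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

A higher PSNR means a smaller mean squared error with respect to the ground truth; for instance, a uniform error of 0.1 on a [0, 1] image gives a mean squared error of 0.01 and hence 20 dB.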
4.4 Evaluation on Real Hazy Images
In this section, we compare the proposed method with four other methods [5] [9] [10] [20] on real hazy images.
Although the proposed GAN model is trained on synthetic indoor images, we note that it can also be applied to outdoor images. Figure 11 shows the comparison results.
The results of He et al. [5] tend to misestimate the thickness of the haze. For example, in the second row of Figure 11(b), the ground is overly bright. The results of Meng et al. [20] improve the visibility but lose most details. Cai et al. [10] generates good dehazing results when the haze is not dense, but for images with dense haze it tends to leave some haze in the recovered images, as shown in the second and third rows of Figure 11(d). The results of Ren et al. [9] have significant color distortions in some scenes, such as the color of the grass in the first row of Figure 11(e). In contrast, the proposed method recovers clear haze-free images without artifacts or distortions even when the haze is dense.
method can achieve better dehazing results compared with other algorithms on the synthetic hazy images, and visually perform well on the real hazy images without suffering in color distortions and darkness.
References
[1] D. Nan, D.-Y. Bi, L.-Y. He, S.-P. Ma and Z.-L. Fan, "A Variational Framework for Single Image Dehazing Based on Restoration," KSII Transactions on Internet and Information Systems, vol. 10, no. 3, pp. 1182-1194, 2016.
http://dx.doi.org/10.3837/tiis.2016.03.013
[2] S. G. Narasimhan and S. K. Nayar, "Chromatic framework for vision in bad weather," Proceedings IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2000.
https://doi.org/10.1109/CVPR.2000.855874
[3] Y. Y. Schechner, S. G. Narasimhan, and S. K. Nayar, "Instant dehazing of images using polarization," Proceedings IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2001.
https://doi.org/10.1109/CVPR.2001.990493
[4] J. Kopf, B. Neubert, B. Chen, M. Cohen, D. Cohen-Or, O. Deussen, M. Uyttendaele, and D. Lischinski, "Deep photo: Model-based photograph enhancement and viewing," Proceedings ACM Transactions on Graphics, vol. 27, 2008.
https://doi.org/10.1145/1457515.1409069
[5] K. He, J. Sun, and X. Tang, "Single image haze removal using dark channel prior," Proceedings IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, pp. 2341-2353, 2010.
http://dx.doi.org/10.3837/tiis.2016.07.021
[8] Q. Zhu, J. Mai, and L. Shao, "Single Image Dehazing Using Color Attenuation Prior," British Machine Vision Conference, 2014.
https://doi.org/10.1109/TIP.2015.2446191
[9] W. Ren, S. Liu, H. Zhang, J. Pan, X. Cao, and M.-H. Yang, "Single image dehazing via multi-scale convolutional neural networks," Proceedings European Conference on Computer Vision, pp. 154-169, 2016.
https://doi.org/10.1007/978-3-319-46475-6_10
[10] B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao, "DehazeNet: An End-to-End System for Single Image Haze Removal," Proceedings IEEE Transactions on Image Processing, vol. 25, pp. 5187-5198, 2016.
https://doi.org/10.1109/TIP.2016.2598681
[11] H. Koschmieder, "Theorie der horizontalen Sichtweite," Beiträge zur Physik der freien Atmosphäre, vol. 640, pp. 7-10, 1959.
https://doi.org/10.1007/978-3-663-04661-5_2
[12] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative Adversarial Nets," Advances in Neural Information Processing Systems Conference, 2014.
https://dl.acm.org/citation.cfm?id=2969125
[13] Z. Yi, H. Zhang, P. Tan, and M. Gong, "DualGAN: Unsupervised Dual Learning for Image-to-Image Translation," IEEE International Conference on Computer Vision, pp. 2868-2876, 2017.
http://doi.org/10.1109/ICCV.2017.310
[14] P. L. Suarez, A. D. Sappa, and B. X. Vintimilla, "Infrared image colorization based on a triplet DCGAN architecture," IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 212-217, 2017.
https://doi.org/10.1109/CVPRW.2017.32
[15] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, and W. Shi, "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network," Proceedings IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2017.
http://dx.doi.org/10.1109/CVPR.2017.19
[16] S. Shwartz, E. Namer, and Y. Y. Schechner, "Blind haze separation," Proceedings IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 2006.
https://doi.org/10.1109/CVPR.2006.71
[17] S. G. Narasimhan and S. K. Nayar, "Interactive (de) weathering of an image using physical models," Proceedings IEEE Workshop on Color and Photometric Methods in Computer Vision, vol. 6, 2003.
https://www.ri.cmu.edu/publications/interactive-deweathering-of-an-image-using-physical-models/
[18] R. Fattal, "Single image dehazing," Proceedings ACM Transactions on Graphics, vol. 27, 2008.
https://doi.org/10.1145/1360612.1360671
[19] R. T. Tan, "Visibility in bad weather from a single image," Proceedings IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2008.
https://doi.org/10.1109/CVPR.2008.4587643
[20] G. Meng, Y. Wang, J. Duan, S. Xiang, and C. Pan, "Efficient image dehazing with boundary constraint and contextual regularization," Proceedings IEEE International Conference on Computer Vision, pp. 617-624, 2013.
https://doi.org/10.1109/ICCV.2013.82
[21] K. Tang, J. Yang, and J. Wang, "Investigating haze-relevant features in a learning framework for image dehazing," Proceedings IEEE Conference on Computer Vision and Pattern Recognition, pp. 2995-3000, 2014.
https://doi.org/10.1109/CVPR.2014.383
[22] P. Isola, J. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," Computer Vision and Pattern Recognition, 2017.
http://doi.ieeecomputersociety.org/10.1109/CVPR.2017.632
[23] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," Proceedings International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234-241, 2015.
https://doi.org/10.1007/978-3-319-24574-4_28
[24] K. He, J. Sun, and X. Tang, "Guided Image Filtering," Proceedings European Conference on Computer Vision, pp. 1-14, 2010.
https://doi.org/10.1109/TPAMI.2012.213
[25] C. Ancuti, C. O. Ancuti, and C. De Vleeschouwer, "D-hazy: a dataset to evaluate quantitatively dehazing algorithms," Proceedings IEEE International Conference on Image Processing, pp. 2226-2230, 2016.
http://dx.doi.org/10.1109/ICIP.2016.7532754
[26] D. Scharstein, H. Hirschmüller, Y. Kitajima, G. Krathwohl, N. Nesic, X. Wang, and P. Westling, "High-resolution stereo datasets with subpixel-accurate ground truth," Proceedings German Conference on Pattern Recognition, pp. 31-42, 2014.
http://dx.doi.org/10.1007/978-3-319-11752-2_3
[27] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, "Indoor segmentation and support inference from rgbd images," Proceedings European Conference on Computer Vision, pp. 746-760, 2012.
https://doi.org/10.1007/978-3-642-33715-4_54
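2012: B.S., Department of Computer Science and Engineering, Hanyang University. 2012 to present: Ph.D. course, Department of Computer Science and Engineering, Hanyang University. Research interests: computer vision, pattern recognition, deep learning. E-mail : [email protected]
Young Shik Moon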