Surface Water Mapping of Remote Sensing Data Using Pre-Trained Fully Convolutional Network


https://doi.org/10.7848/ksgpc.2018.36.5.423


Song, Ah Ram¹⁾ · Jung, Min Young²⁾ · Kim, Yong Il³⁾

Abstract

Surface water mapping has been widely used in various remote sensing applications. Water indices have been commonly used to distinguish water bodies from land; however, determining the optimal threshold and discriminating water bodies from similar objects such as shadows and snow are difficult. Deep learning algorithms have greatly advanced image segmentation and classification. In particular, the FCN (Fully Convolutional Network) is state-of-the-art in per-pixel image segmentation and is used in most benchmarks such as PASCAL VOC2012 and Microsoft COCO (Common Objects in Context). However, these data sets are designed for daily scenarios, and few studies have been conducted on applications of FCNs using large-scale remotely sensed data sets. This paper aims to fine-tune a pre-trained FCN using the CRMS (Coastwide Reference Monitoring System) data set for surface water mapping. The CRMS provides color infrared aerial photos and ground truth maps for the monitoring and restoration of wetlands in Louisiana, USA. To effectively learn the characteristics of surface water, we used the pre-trained DeepWaterMap network, which classifies water, land, snow/ice, clouds, and shadows using Landsat satellite images. Furthermore, the DeepWaterMap network was fine-tuned for the CRMS data set using two classes: water and land. The fine-tuned network finally classifies surface water without any additional learning process. The experimental results show that the proposed method enables high-quality surface water mapping from the CRMS data set and show the suitability of pre-trained FCNs using remote sensing data for surface water mapping.

Keywords : Surface Water Mapping, Deep Learning, Fully Convolutional Networks, DeepWaterMap, Coastwide Reference Monitoring System

Original article

Received 2018. 10. 17, Revised 2018. 10. 21, Accepted 2018. 10. 23

1) Member, Dept. of Civil and Environmental Engineering, Seoul National University (E-mail: [email protected])
2) Member, Dept. of Civil and Environmental Engineering, Seoul National University (E-mail: [email protected])
3) Corresponding Author, Dept. of Civil and Environmental Engineering, Seoul National University (E-mail: [email protected])

1. Introduction

Surface water refers to water on the surface of the earth and includes oceans, rivers, lakes, and wetlands. Surface water mapping is an important task in various applications, such as monitoring coastline changes, predicting floods and droughts, evaluating water resources, and managing coastal zones (Sarp and Ozcelik, 2017).

As part of the effort to monitor surface water and protect coastal landscapes and wetlands, the CRMS (Coastwide Reference Monitoring System) has collected aerial digital photos from about 390 observation sites in Louisiana, USA (Steyer, 2010). The aerial photos were obtained with digital orthophoto quarter quadrangles every 3 years, starting in 2005 and ending in 2012, at all CRMS sites. The photos are CIR (Color InfraRed) images. CRMS provides not only CIR images but also ground-truth maps with two classes: land and water. To produce the CRMS ground-truth maps, the aerial images were classified as either land or water through supervised and unsupervised classification methods with a threshold in the NIR (Near-InfraRed) band.

Many methods have been developed to extract water bodies from remote sensing data. The use of thresholds in classification is a traditional method for water mapping. Also, water indices with two or more spectral bands, such as the NDWI (Normalized Difference Water Index; McFeeters,

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://


CRMS data set without thresholds, data-based approaches are needed.
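For reference, the threshold-based index approach discussed above can be sketched in a few lines. This is only an illustration: it assumes the commonly cited NDWI form (Green − NIR) / (Green + NIR) and a threshold of zero, and the function names and sample reflectance values are hypothetical rather than taken from the CRMS data.

```python
def ndwi(green, nir, eps=1e-12):
    """NDWI = (Green - NIR) / (Green + NIR) for one pixel."""
    return (green - nir) / (green + nir + eps)


def water_mask(green_band, nir_band, threshold=0.0):
    """Label a pixel as water (1) when its NDWI exceeds the threshold, else land (0)."""
    return [
        [1 if ndwi(g, n) > threshold else 0 for g, n in zip(g_row, n_row)]
        for g_row, n_row in zip(green_band, nir_band)
    ]


# Hypothetical 2x2 reflectances: water absorbs NIR, vegetation reflects it.
green = [[0.10, 0.08], [0.09, 0.12]]
nir = [[0.02, 0.40], [0.03, 0.35]]
mask = water_mask(green, nir)
```

The result depends entirely on the chosen threshold, which is the weakness that motivates a data-based approach.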

Deep learning algorithms have considerably advanced image segmentation and classification. Many studies have shown the effectiveness of deep learning algorithms such as ANNs (Artificial Neural Networks) and FCNs (Fully Convolutional Networks) for the high-accuracy classification of water bodies (Karpatne et al., 2016; Isikdogan et al., 2017). In particular, FCNs represent a state-of-the-art image segmentation method that replaces the fully connected layers at the end of CNNs (Convolutional Neural Networks) with convolutional layers (Fu et al., 2017).

FCNs can learn semantic features for classification; however, in many remote sensing applications, the number of labeled samples is insufficient for network parameter training. Even when a large amount of data is available, high-performance computing resources are required because of the many training iterations involved. In these cases, a typical solution is to use pre-trained networks, which have previously been trained on a large data set. Many kinds of FCNs have been successfully trained on large data sets consisting of daily scenes, such as ImageNet, PASCAL VOC2012, and Microsoft's COCO (Common Objects in Context). However, there are some differences between daily images and remote sensing images. Many remotely sensed images have multispectral bands, including the NIR, while everyday scenes generally have only red, green, and blue bands. The NIR is especially important for water because water strongly absorbs radiation in the NIR and beyond. Because of this absorption, water can be detected and delineated in remote sensing images. Therefore, in order to effectively classify water on

mapping. The pre-trained DeepWaterMap network (Isikdogan et al., 2017) was fine-tuned on the CRMS data set with different learning rates. DeepWaterMap was previously trained on Landsat images and GLCF (Global Land Cover Facility) data sets, which are ground truth data that provide per-pixel labels for each Landsat image. After the fine-tuning process, the DeepWaterMap network classified the CRMS images into two classes, water and land, using the modified weights without any additional learning process. Finally, the classification results of the fine-tuned DeepWaterMap network are compared with those of NDWI, ANNs, and fine-tuned FCN-8s to validate the effectiveness of the proposed method.

2.1 Fully convolutional networks

FCNs usually extend well-trained CNN models, such as AlexNet, GoogLeNet, and VGGNet (Long et al., 2015). These nets contain several fully connected layers, which have fixed dimensions and produce non-spatial outputs. However, in many remote sensing applications, 2D (Two-Dimensional) spatial information is required as output. Various approaches have been proposed to maintain the 2D spatial structure, and the representative approaches are patch-based CNNs and FCNs.

Patch-based CNNs divide the input image into small patches and apply the CNN model to each patch to predict the label of the center pixel of the patch (Fu et al., 2017). However, patch-based CNNs perform many redundant computations because the patches overlap.

FCNs replace all fully connected layers with convolutional layers. The fully connected layers can be treated as convolutional layers with kernels that cover their entire input regions (Long et al., 2015). Replacing the fully connected layers with convolutional layers makes it possible to accept an input of arbitrary size and enables pixel-level classification instead of image-level classification, because convolutional layers can produce a coarse feature map instead of a single node as output (Long et al., 2015). Also, FCNs can be trained on entire images at a time, which reduces the implementation complexity (Fu et al., 2017). Usually, FCNs are composed of convolutional, pooling, deconvolutional, and activation layers. The convolutional layer can be expressed as Eq. (1) (Jiao et al., 2017):

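$$x_j^{l} = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\right), \quad (1)$$

where "$*$" refers to the convolutional operator, and $f(\cdot)$ is the nonlinear activation function. $x_i^{l-1}$ is the $i$th feature map in layer $l-1$, and the output of the convolutional layer is $x_j^{l}$. $M_j$ is the set of input images, and $k_{ij}^{l}$ is the convolutional kernel of layer $l$ that connects the $i$th map in layer $l-1$ and the $j$th map in layer $l$. $b_j^{l}$ is the bias.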
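The computation described by Eq. (1), where each output feature map is the activation of the summed convolutions of the input maps plus a bias, can be sketched as follows. This is a minimal illustration under stated assumptions: ReLU is used as the activation, the stride is 1 with no padding, and the kernel is applied without flipping (the cross-correlation convention common in deep learning libraries). The function names and numeric values are hypothetical.

```python
def correlate2d_valid(x, k):
    """Slide kernel k over map x (stride 1, no padding, no kernel flip)."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(x[r + a][c + b] * k[a][b] for a in range(kh) for b in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(v):
    return v if v > 0.0 else 0.0


def conv_layer(inputs, kernels, biases, activation=relu):
    """Output map j = f(sum_i inputs[i] * kernels[i][j] + biases[j])."""
    outputs = []
    for j, bias in enumerate(biases):
        acc = None
        for i, x in enumerate(inputs):
            term = correlate2d_valid(x, kernels[i][j])
            if acc is None:
                acc = term
            else:
                acc = [[p + q for p, q in zip(pr, qr)] for pr, qr in zip(acc, term)]
        outputs.append([[activation(v + bias) for v in row] for row in acc])
    return outputs


# One 3x3 input map and two 2x2 kernels give two 2x2 output maps.
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
kernels = [[[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0], [0.0, 0.0]]]]
feature_maps = conv_layer([x], kernels, biases=[0.5, 0.0])
```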

The output number of the last convolutional layer (also called a feature map) is equal to the number of classes to be separated. The feature maps can be seen as the stacking of heat maps (score distribution) for all classes (Fu et al., 2017).
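To make the idea of stacked per-class score maps concrete, the sketch below converts two hypothetical score maps (one per class) into a label map by applying a softmax at each pixel and taking the most probable class. The score values and function names are illustrative assumptions, not outputs of the networks used in this paper.

```python
import math


def pixel_softmax(scores):
    """Softmax over the class scores of a single pixel (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(score_maps, class_names):
    """Turn stacked per-class score maps into a per-pixel label map."""
    rows, cols = len(score_maps[0]), len(score_maps[0][0])
    labels = []
    for r in range(rows):
        row = []
        for c in range(cols):
            probs = pixel_softmax([m[r][c] for m in score_maps])
            row.append(class_names[probs.index(max(probs))])
        labels.append(row)
    return labels


water_scores = [[2.0, -1.0], [0.5, 3.0]]
land_scores = [[0.0, 1.5], [1.0, -2.0]]
label_map = classify([water_scores, land_scores], ["water", "land"])
```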

For example, the VGG-verydeep-16 model contains 13 convolutional layers, 5 pooling layers, and 3 fully connected layers for image-level classification. The fully connected layers are turned into fully convolutional layers in FCNs based on VGG16, and the last convolutional layer produces two feature maps: in this paper, one for the water class and the other for the land class.
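The equivalence between a fully connected layer and a convolution whose kernel covers the entire input can be checked with a small sketch. The weights and inputs below are hypothetical. On an input of the original size, both forms give the same number; on a larger input, the convolutional form yields a coarse score map instead of a single value.

```python
def dense(x, w, b):
    """Fully connected unit on a flattened H x W input."""
    flat_x = [v for row in x for v in row]
    flat_w = [v for row in w for v in row]
    return sum(p * q for p, q in zip(flat_x, flat_w)) + b


def conv_valid(x, k, b):
    """Same weights applied as a sliding kernel (stride 1, no padding)."""
    kh, kw = len(k), len(k[0])
    return [
        [
            sum(x[r + a][c + d] * k[a][d] for a in range(kh) for d in range(kw)) + b
            for c in range(len(x[0]) - kw + 1)
        ]
        for r in range(len(x) - kh + 1)
    ]


w = [[1.0, -1.0], [0.5, 2.0]]        # weights of one hypothetical output unit
small = [[1.0, 2.0], [3.0, 4.0]]     # input of the size the dense layer expects
large = [[1.0, 2.0, 0.0], [3.0, 4.0, 1.0], [0.0, 2.0, 2.0]]

dense_out = dense(small, w, b=0.1)        # a single number
conv_small = conv_valid(small, w, b=0.1)  # 1x1 map holding the same number
conv_large = conv_valid(large, w, b=0.1)  # 2x2 coarse score map
```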

Because there are several convolutional layers and pooling layers in FCNs, the size of the input images is reduced by subsampling to keep the filters small and the computational burden reasonable. In remote sensing image-classification tasks, the resulting loss of spatial information is a serious problem. To upsample the coarse feature maps to the same size as the input images, deconvolutional layers and skip architectures such as FCN-8s (Long et al., 2015) can be used.

FCN-8s is made by fusing predictions from pool3 with a 2-pixel stride deconvolution of feature maps fused from pool4 and conv7 (Long et al., 2015). The deconvolutional layer produces enlarged and dense feature maps by increasing the spatial resolution of the outputs. The deconvolutional kernel can be learned in the same way as the convolutional computation (Eq. (2)) (Jiao et al., 2017):

$$x_j^{l} = f\left(\sum_{i \in M_j} x_i^{l-1} * \tilde{k}_{ij}^{l} + b_j^{l}\right), \quad (2)$$

where $x_i^{l-1}$ and $x_j^{l}$ are the input and output feature maps, and $\tilde{k}_{ij}^{l}$ is the deconvolution kernel of layer $l$ that connects the $i$th map in layer $l-1$ and the $j$th map in layer $l$. $b_j^{l}$ is the bias. The skip connections are used to combine the final coarse feature maps (produced by the deep layers) and the fine-scale feature maps (produced by the shallow layers).
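The upsampling and fusion steps can be sketched as a stride-2 transposed convolution followed by an element-wise sum with a shallower score map. This is an illustrative sketch only: the all-ones kernel, the element-wise sum as the fusion rule, and the numeric values are assumptions made for the example, and in a trained network the kernel would be learned.

```python
def transposed_conv2d(x, k, stride=2):
    """Upsample map x by scattering each input value through kernel k."""
    kh, kw = len(k), len(k[0])
    out_h = (len(x) - 1) * stride + kh
    out_w = (len(x[0]) - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]
    for r, row in enumerate(x):
        for c, v in enumerate(row):
            for a in range(kh):
                for b in range(kw):
                    out[r * stride + a][c * stride + b] += v * k[a][b]
    return out


def fuse(upsampled, skip):
    """Skip connection: element-wise sum of two score maps of equal size."""
    return [[u + s for u, s in zip(ur, sr)] for ur, sr in zip(upsampled, skip)]


coarse = [[1.0, 2.0], [3.0, 4.0]]      # deep, low-resolution score map
kernel = [[1.0, 1.0], [1.0, 1.0]]      # hypothetical kernel (copies each value)
fine = [[0.1] * 4 for _ in range(4)]   # shallow, higher-resolution score map

up = transposed_conv2d(coarse, kernel, stride=2)
fused = fuse(up, fine)
```

With this kernel, the 2×2 coarse map becomes a 4×4 map in which each value is repeated over a 2×2 block before the finer map is added.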

The final feature maps are passed through the softmax (normalized exponential) function. The predicted classification labels were determined by minimizing the normalized sum of the multinomial logistic loss of the softmax outputs (Audebert

Fig. 1. Flowchart of the proposed method for surface water mapping


verydeep-16 trained on the renowned benchmark data set instead of remote sensing data has been used as a pre-trained network for many land-cover classification applications (Fu et al., 2017; Jiao et al., 2017). Because it is difficult to obtain reliable ground-truth data for large-scale remotely sensed image data sets, these data sets have rarely been used for deep-learning models.

In this paper, DeepWaterMap was used as a pre-trained network for surface water mapping (Isikdogan et al., 2017). DeepWaterMap is a multiscale FCN trained on multispectral Landsat imagery for mapping surface water. Landsat satellites have been collecting global coverage of multispectral imagery for more than 40 years. DeepWaterMap used the Global Land Cover Facility data set as ground-truth data for the Landsat imagery and trained the model using Landsat imagery for water mapping. DeepWaterMap acts like an encoder-decoder network, and it consists of 53 convolutional layers, 6 pooling layers, 9 deconvolution layers, 53 batch normalization layers, and a final softmax layer with five classes. DeepWaterMap finally distinguishes water bodies from land, snow/ice, clouds, and shadows. DeepWaterMap has two types of skip connections, which make it possible to reuse previous features. DeepWaterMap was shown to learn the global characteristics of water bodies efficiently (Isikdogan et al., 2017).

2.3 FCN fine-tuning

The process of fine-tuning should be conducted on the new data set in order to use the pre-trained networks for the CRMS data set. The task of fine-tuning is to update the parameters of an already trained network so that they adapt to the new

results were copied to three bands to initialize the fine-tuned network (Wang et al., 2015). Also, the weights of the encoding layers were initialized from the corresponding layers of DeepWaterMap and updated with the learning rate for the encoder, and the weights of the decoding layers were updated with the learning rate for the decoder (Fig. 2). In this study, the learning rate for the decoder is 10 times higher than that for the encoder. This difference is used because the first few layers capture universal features that are applicable to the new data set. In addition, setting a lower learning rate for the encoder part of the FCN may act as a regularizer and prevent overfitting problems (Audebert et al., 2016).
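The effect of using a lower learning rate for the encoder than for the decoder can be illustrated with a single update step. The sketch below assumes plain gradient descent and a hypothetical base rate of 1e-4; the paper does not state the optimizer or the actual rates, only that the two rates differ by a factor of 10. All weights and gradients are made-up values.

```python
def sgd_step(params, grads, lr):
    """Plain gradient-descent update for one group of parameters."""
    return [p - lr * g for p, g in zip(params, grads)]


lr_encoder = 1e-4               # hypothetical value
lr_decoder = 10 * lr_encoder    # decoder rate is 10 times the encoder rate

# Hypothetical pre-trained weights and gradients for each part of the network.
encoder_w = [0.50, -0.20]
decoder_w = [0.30, 0.10]
encoder_g = [1.0, -1.0]
decoder_g = [1.0, -1.0]

encoder_w_new = sgd_step(encoder_w, encoder_g, lr_encoder)
decoder_w_new = sgd_step(decoder_w, decoder_g, lr_decoder)

encoder_shift = abs(encoder_w_new[0] - encoder_w[0])
decoder_shift = abs(decoder_w_new[0] - decoder_w[0])
```

For the same gradient, the encoder weights move one tenth as far as the decoder weights, so the pre-trained low-level features are largely preserved while the decoder adapts to the new data.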

Finally, the number of feature maps should be updated to match the number of new classes. Because the CRMS data set has labels with two classes (water and land), the output number of the last softmax layer was modified to 2.

3. Results

In this paper, a fine-tuned DeepWaterMap was applied to classify surface water from land. For comparison, representative methods such as NDWI, ANNs, and fine-tuned FCN-8s were also used for surface water mapping on the CRMS data set. In particular, FCN-8s based on VGG-verydeep-16 was trained on PASCAL VOC2012 and was fine-tuned in the same way as DeepWaterMap.

The CRMS images were divided into two subsets, referred to as the training and validation sets. The spatial resolution of CRMS images is 1 m, and they consist of 1000×1000 pixels.

In total, 284 images were used as training sets and 72 images were used as validation sets. To account for graphics processing unit memory limitations, we split the input images and corresponding ground-truth maps into smaller 300×300 pixel patches. Overall, patches of the CRMS data set were used for training and validation; the output of the softmax layer was set to 2, and the argmax was applied to the two feature maps. The final feature maps represent the score distribution of the corresponding classes.
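Splitting a 1000×1000 image into 300×300 patches can be sketched as below. Because 1000 is not a multiple of 300, some rule for the remainder is needed; the paper does not describe one, so this sketch assumes that the last patch along each axis is shifted back to stay inside the image (and therefore overlaps its neighbour). The function names are hypothetical, and the same offsets would be applied to the ground-truth maps.

```python
def tile_origins(length, tile):
    """Start offsets of tiles along one axis; the last tile is shifted back to fit."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, tile))
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts


def split_into_patches(image, tile=300):
    """Cut a 2D image (list of rows) into tile x tile patches."""
    rows = tile_origins(len(image), tile)
    cols = tile_origins(len(image[0]), tile)
    return [
        [row[c:c + tile] for row in image[r:r + tile]]
        for r in rows
        for c in cols
    ]


image = [[0] * 1000 for _ in range(1000)]   # placeholder for one 1000x1000 band
patches = split_into_patches(image)
```

Under this assumption, each image yields 16 patches with offsets 0, 300, 600, and 700 along each axis.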

Two test images that were not used for training and validation were classified to show the classification results using the trained networks. The classification results were evaluated in terms of their F1 score and overall accuracy. The F1 score is the harmonic mean of precision and recall, and it is useful when data are imbalanced. The F1 score is defined by Eqs. (3) and (4) (Audebert et al., 2016):

$$F1_i = 2 \cdot \frac{precision_i \times recall_i}{precision_i + recall_i}, \quad (3)$$

$$precision_i = \frac{TP_i}{TP_i + FP_i}, \quad recall_i = \frac{TP_i}{TP_i + FN_i}, \quad (4)$$

where $TP_i$ is the number of true positives, $FP_i$ is the number of false positives, and $FN_i$ is the number of false negatives for class $i$. The F1 score takes both precision and recall into account, and its best value is 1 (perfect precision and recall).
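As a worked illustration of these metrics, the sketch below computes the per-class F1 score (the harmonic mean of precision and recall, from true positives, false positives, and false negatives) and the overall accuracy for a small hypothetical label map. The labels are invented for the example and are unrelated to the results reported in this paper.

```python
def class_counts(truth, pred, cls):
    """True positives, false positives and false negatives for one class."""
    pairs = [(t, p) for t_row, p_row in zip(truth, pred) for t, p in zip(t_row, p_row)]
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def overall_accuracy(truth, pred):
    """Fraction of pixels whose predicted label matches the ground truth."""
    pairs = [(t, p) for t_row, p_row in zip(truth, pred) for t, p in zip(t_row, p_row)]
    return sum(1 for t, p in pairs if t == p) / len(pairs)


truth = [["water", "water", "land"], ["land", "land", "water"]]
pred = [["water", "land", "land"], ["land", "water", "water"]]

f1_water = f1_score(*class_counts(truth, pred, "water"))
f1_land = f1_score(*class_counts(truth, pred, "land"))
accuracy = overall_accuracy(truth, pred)
```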

The classification results are shown in Figs. 3 and 4, and Tables 1 and 2 list the producer accuracy, F1 scores, and overall accuracy for each class.

Fig. 2. Overall architecture of DeepWaterMap. The red rectangles show the phases of freeze and train (Isikdogan et al., 2017)

Fig. 3. The classification results obtained from site 1 with the different methods: (a) an aerial photo (the areas marked in white rectangles are enlarged in Fig. 5), (b) a ground-truth map, (c) NDWI, (d) ANN, (e) fine-tuned FCN-8s, and (f) fine-tuned DeepWaterMap. Color legend: water (blue) and land (brown)

Table 1. Accuracy comparison of various methods (NDWI, ANN, fine-tuned FCN-8s, and fine-tuned DeepWaterMap) on site 1

Site 1
Method                     Water    Land     Overall F1   Overall Accuracy
NDWI                       66.34%   82.55%   76.34%       77.01%
ANN                        88.19%   96.15%   92.56%       94.19%
Fine-tuned FCN-8s          86.39%   95.35%   91.03%       93.07%
Fine-tuned DeepWaterMap    90.03%   95.99%   93.16%       94.28%

Fig. 4. The classification results obtained from site 2 with the different approaches: (a) an aerial photo, (b) a ground-truth map, (c) NDWI, (d) ANN, (e) fine-tuned FCN-8s, and (f) fine-tuned DeepWaterMap. Color legend: water (blue) and land (brown)

Table 2. Accuracy comparison of various methods (NDWI, ANN, fine-tuned FCN-8s, and fine-tuned DeepWaterMap) on site 2

Site 2
Method                     Water    Land     Overall F1   Overall Accuracy
NDWI                       84.19%   77.97%   82.24%       81.59%
ANN                        97.43%   97.26%   97.37%       97.35%
Fine-tuned FCN-8s          96.38%   96.27%   96.44%       96.32%
Fine-tuned DeepWaterMap    97.62%   97.49%   97.61%       97.56%

Fig. 5. The enlarged images of the classification results of site 1: (a) and (d) ground-truth map, (b) and (e) the ANN results, and (c) and (f) the fine-tuned DeepWaterMap results

zero or negative values. Although the main body of seawater is classified relatively well, water in the land area is not classified correctly. Most land areas were misclassified as water. This misclassification is due to the relatively dark pixels in the land.

The ANN and the fine-tuned DeepWaterMap had higher F1

detailed classification results of the two methods, the ANN and the fine-tuned DeepWaterMap, whose accuracies are relatively high. Compared to the results of the ANN, the fine-tuned DeepWaterMap successfully classified the main stream of the water. However, the fine-tuned DeepWaterMap tends to misclassify land pixels with low reflectance, such as shadows, as water.

As shown in Tables 1 and 2, compared with the other methods, the fine-tuned DeepWaterMap model has the highest overall accuracy and overall F1 score at both sites, and the highest per-class values except for the land class at site 1, where the ANN is marginally higher (96.15% versus 95.99%). These results suggest that a DeepWaterMap network could more effectively classify water bodies in land regions as compared with other networks. Although shorelines are well classified by both FCNs, the fine-tuned FCN-8s did not represent detailed water bodies in land regions. DeepWaterMap effectively classified not only water but also land. This effective classification is possible because the CRMS images have NIR information, which is significant for water classification, and DeepWaterMap was also trained on Landsat images, including NIR bands. Furthermore, the target classes to be separated in the CRMS and Landsat data have similar spectral characteristics, so the pre-trained weights could be more applicable to CRMS than weights trained on PASCAL VOC2012.

4. Conclusion

In this paper, the use of FCNs and remote sensing data for surface water mapping was investigated. In order to effectively learn the characteristics of water bodies from pre-trained networks, the DeepWaterMap network trained on Landsat images was used, and the network was fine-tuned with different learning rates on the CRMS data set. Finally, without any additional learning process, the fine-tuned network classifies surface water from land.

The experimental results showed that the fine-tuned DeepWaterMap performed better than other traditional surface water mapping approaches, such as the NDWI and ANNs, at classifying water. In particular, it classified water bodies better than the fine-tuned FCN-8s trained on PASCAL VOC2012. The reason for this enhanced classification is that CRMS and Landsat data include the same NIR band, which is significant for water classification.

Furthermore, effectively fine-tuning DeepWaterMap by setting different learning rates for the encoder and decoder phases made it possible to optimally use the pre-trained weights for specific data sets.

This study shows that FCNs pre-trained on remotely sensed images are well suited to surface water mapping with remote sensing data.

In future work, we will conduct further experiments to demonstrate that the weights obtained from the CRMS data set can be applied as pre-trained weights to other data sets with different resolutions.

Acknowledgment

This work was supported by the Global Surveillance Research Center (GSRC) program funded by the Defense Acquisition Program Administration (DAPA) and the Agency for Defense Development (ADD).

References

Audebert, N., Le Saux, B., and Lefèvre, S. (2016), Semantic segmentation of earth observation data using multimodal and multi-scale deep networks, In: Asian Conference on Computer Vision, Computer Vision-ACCV 2016, Springer, Cham, Taipei, Taiwan, pp. 180-196.

Fu, G., Liu, C., Zhou, R., Sun, T., and Zhang, Q. (2017), Classification for high resolution remote sensing imagery using a fully convolutional network, Remote Sensing, Vol. 9, No. 5, pp. 498.

Isikdogan, F., Bovik, A.C., and Passalacqua, P. (2017), Surface water mapping by deep learning, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 10, No. 11, pp. 4909-4918.

Jiao, L., Liang, M., Chen, H., Yang, S., Liu, H., and Cao, X. (2017), Deep fully convolutional network-based spatial distribution prediction for hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing, Vol. 55, No. 10, pp. 5585-5599.

Karpatne, A., Khandelwal, A., Chen, X., Mithal, V., Faghmous, J., and Kumar, V. (2016), Global monitoring of inland water dynamics: state-of-the-art, challenges, and opportunities, 4th International Conference on Computational Sustainability, Springer, Cham, 6-8 July, New York, USA, pp. 121-147.

Long, J., Shelhamer, E., and Darrell, T. (2015), Fully convolutional networks for semantic segmentation,

(CRMS). US Geological Survey Fact Sheet 2010-3018, USGS National Wetlands Research Center, Lafayette, LA, pp. 2.

Wang, L., Xiong, Y., Wang, Z., and Qiao, Y. (2015), Towards good practices for very deep two-stream ConvNets, Cornell University Library, Ithaca, New York, https://arxiv.org/abs/1507.02159 (last date accessed 17 October 2018).