Automatic Extraction of Liver Region from Medical Images by Using an MFUnet


Vo Thi Tuong Vi*, A-Ran Oh**, Guee-Sang Lee***, Hyung-Jeong Yang***, Soo-Hyung Kim***

Abstract

This paper presents a fully automatic tool for recognizing the liver region in CT images based on a deep learning model, the Multiple Filter U-net (MFUnet). The advantages of both U-net and Multiple Filters are combined to construct an autoencoder model, MFUnet, for segmenting the liver region from computed tomography. The MFUnet architecture includes an autoencoding model used for regenerating the liver region, a backbone model trained on ImageNet for extracting features, and a predicting model used for liver segmentation. The LiTS and Chaos datasets were used to evaluate our approach. The results show that integrating Multiple Filters into U-net improves the performance of liver segmentation and opens up many research directions in the medical image processing field.

Keywords : liver segmentation | medical image | CT images

I. INTRODUCTION

Understanding the necessary preconditions of complicated medical procedures is a challenge for physicians and affects the success of health-related activities. As a result, many methods have been proposed to support physicians, such as extracting the region of interest from DICOM images as an advanced tool for three-dimensional visualization. Accordingly, accurate segmentation of abdominal organs such as the liver and kidney is essential for several clinical procedures, including pre-evaluation of the liver before living-donor liver transplant surgery and detailed analysis of abdominal organs to determine the exact position of a graft before abdominal aortic surgery. This encourages ongoing research to achieve better segmentation performance and to overcome numerous challenges stemming from both the flexible anatomical properties of the abdomen and the limitations of the acquired image characteristics.

The liver is the largest internal organ in the human body and plays an important role in natural metabolism. Its functions include storing glycogen, synthesizing plasma proteins, participating in nutrient metabolism, and detoxification. In addition, it produces clotting factors such as fibrinogen and prothrombin.

Among the problems of identifying organs in the body, we are initially interested in the liver. We use computed tomography (CT) datasets to approach the liver segmentation problem. The CT datasets were collected after the injection of contrast agents for pre-evaluation. As a result, the intensity contrast between the liver and nearby organs is low, which makes automatic liver segmentation a challenging task.

To solve this pervasive problem, we propose a simple but effective architecture named MFUnet. By building Multiple Filter blocks into the classical U-net model, the intermediate feature maps are expected to be used more effectively. Multiple filters learn to focus on the target structure automatically. In this way, an external organ localization module can be discarded while retaining high prediction performance. U-net models with Multiple Filters can be trained from scratch in the same way as fully convolutional network models. In this paper, we generalize this design to recognize local regions.

*This work was supported by the Bio & Medical Technology Development Program of the National Research Foundation (NRF) & funded by the Korean government (MSIT) (NRF-2019M3E5D1A02067961).

* Student Member, Graduate Student, Dept. of AI Convergence, Chonnam National University

** Member, Researcher, Dept. of AI Convergence, Chonnam National University

*** Member, Professor, Dept. of AI Convergence, Chonnam National University

Manuscript: 2020.02.13 Revised: 2020.04.14, 2020.06.23

Confirmation of Publication: 2020.09.15 Corresponding Author: Soo-Hyung Kim, e-mail: [email protected]

We demonstrate the performance of Multiple Filters for CT liver segmentation on two datasets obtained from two challenges: the Liver Tumor Segmentation challenge (LiTS dataset) and the Combined Healthy Abdominal Organ Segmentation challenge (Chaos dataset). We combine multiple filters with a variant of U-net, termed MFUnet, to show that it can automatically localize the region of interest and improve the overall segmentation performance. The experiments show that Multiple Filters improve predictive accuracy across different training datasets and sizes while reaching performance competitive with modern methods without requiring multiple CNN models.

The paper is divided into five sections. Section II describes the related work on which our results are founded. Sections III and IV present our method and the experimental results. Finally, Section V discusses and summarizes our method and outlines further research.

II. RELATED WORK

2.1 Deep learning in medical image analysis

Convolutional neural networks (CNNs) are driving progress in medical image segmentation, resulting in superior performance in many applications. Most existing medical image segmentation architectures are inspired by the fully convolutional network (FCN) [1], which has achieved great success in a wide range of recognition and segmentation problems. FCN uses convolutional layers instead of fully connected layers to obtain dense pixel predictions in a single forward pass. The prediction is then up-sampled to restore the original resolution of the input image. In addition, skip connections that use intermediate feature maps are embedded in the network to improve its predictions. Two well-known FCN-based architectures are U-net [2] and the Feature Pyramid Network (FPN) [3]. U-net uses skip connections to concatenate the features from the contracting path to the expansive path, the two main paths of the U-net architecture. Both paths are built from stacks of convolutional layers, with pooling layers in the former and up-sampling layers in the latter. While U-net is useful for segmentation [4], FPN is useful for detecting objects at different scales and serves as a fundamental component of recognition systems.

Like U-net, FPN has lateral connections between the bottom-up pyramid on the left and the top-down pyramid on the right. Later, Inception [5] was introduced, which uses convolutional layers with varying kernel sizes in parallel to inspect the points of interest in images at different scales. These features are combined and passed deeper into the network.

Inspired by this success, MultiResUNet [6] includes multiple filters in the network for medical image segmentation. The MultiRes block replaces the sequence of two convolution layers at each level of the original U-net. The authors started with a simple Inception-like block that uses three convolutional filters (3×3, 5×5, and 7×7) in parallel to reconcile spatial features from different context sizes. They then established the MultiRes block by increasing the number of filters in the three consecutive layers and adding a residual connection, along with 1×1 filters for preserving dimensions.

Fig. 1. The complex contrast on the LiTS dataset

Fig. 2. The blurred edges of the liver region on the Chaos dataset

More than 95 medical imaging benchmarks were organized from 2010 to 2019. They concentrated on segmentation, detection, classification, and prediction tasks for organs in the human body. Examples include whole heart segmentation using multiple modalities [7], prostate segmentation [8], and whole-body segmentation in 3D medical images [9]. Others focus on brain tumor segmentation, Alzheimer’s disease detection [10], and skin lesion analysis [11].

For liver and liver tumor tasks, there are two related benchmarks. The first, the liver segmentation benchmark Sliver’07, was introduced at the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2007. This dataset includes 30 CT volumes [12]. The second is the Liver Tumor Segmentation Challenge (LTSC’08), which was organized in conjunction with the MICCAI 2008 workshop [6]. It also used 30 CT volumes for liver tumor segmentation.

2.2 Liver segmentation in CT image

Because the liver is a common site of primary tumor growth, automatic liver analysis is the first and necessary step for liver tumor diagnosis as well as for liver transplantation systems and 3D liver volume rendering. Manual liver segmentation is feasible, but it is time-consuming, cost-intensive, and dependent on operator variability. Image processing and machine learning techniques provide several semi-automatic and automatic methods that segment liver images more quickly.

Nevertheless, liver segmentation from abdominal images is a laborious task for three main reasons. First, the contrast between the liver and the surrounding area is low and complex, as shown in Figure 1, and the edges of the liver are blurred, as shown in Figure 2. Second, the intensity of pixels in the liver region is similar to that of other tissues in the abdominal image. Third, the liver varies considerably in shape and position. Therefore, various researchers have proposed semi-automatic or automatic methods to solve these problems.

Graph-cut segmentation algorithms [13,14] and the live-wire algorithm [15] use a boundary term and region terms to delineate the liver region. Beck and Aurich [16] proposed a region-growing algorithm that exploits the fact that neighboring pixels or voxels share similar gray values. The level set is another approach [17,18]: the user draws a rough contour outside or inside the object, and the contour then shrinks or expands until it meets the boundary of the object. These approaches are suitable for small datasets.
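The region-growing idea described above can be illustrated with a minimal sketch. It is a generic textbook variant, not the system of Beck and Aurich [16]; the function name `region_grow`, the 4-connectivity, the fixed `tolerance`, and the toy slice are all assumptions made for illustration.

```python
from collections import deque

def region_grow(image, seed, tolerance):
    """Grow a region from `seed`, adding 4-connected pixels whose gray value
    is within `tolerance` of the gray value at the seed."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

# Toy 4x4 "slice": a bright block (values near 100) on a dark background.
toy_slice = [
    [10, 12, 11, 10],
    [11, 98, 101, 12],
    [10, 100, 99, 11],
    [12, 11, 10, 95],
]
liver_like = region_grow(toy_slice, seed=(1, 1), tolerance=5)
print(sorted(liver_like))
```

The bright pixel in the bottom-right corner has a similar gray value but is not connected to the seed, so it is left out. This shows why such methods depend on a user-provided seed and on connectivity.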

Slagmolen et al. [19] therefore built an atlas matching method with 20 training images, a larger number of anatomical images than previous methods used to segment the liver. Lamecker et al. [20] and Kainmüller et al. [21] combined a statistical shape model with principal component analysis (PCA). Other automatic methods are region growing [22,23] and gray-level analysis [24], which use the histogram of intensities to analyze the distribution of the liver, combined with clustering, voxel classification, or the EM algorithm, to segment the liver.

Since challenging datasets such as SLIVER07 and LiTS were released, many deep learning studies on these datasets have been proposed [25,26]. With its advantages in performance and robustness for particular segmentation tasks, the U-net architecture, introduced by Ronneberger et al. [27], has established an important role in biomedical image segmentation. Besides U-net, other approaches that use 2D CNNs without cross-connections have shown good results in resolving the low-contrast problem between the liver region and nearby organs, especially on SLIVER07 [25].

H-DenseUNet [28] is a hybrid densely connected U-net that consists of two parts. The first is a 2D DenseUNet, which efficiently extracts intra-slice features. The second is a 3D counterpart, which hierarchically aggregates volumetric contexts. Even with a single model, H-DenseUNet surpassed other state-of-the-art methods on liver lesion segmentation and achieved competitive performance on liver segmentation.


Recently, in the single-model setting, transfer learning for 3D medical image analysis (Med3D) [29] applied to the state-of-the-art DenseASPP segmentation network achieved a 94.6% Dice coefficient, a competitive result on the LiTS challenge for the liver segmentation task.

Yuan [30] also introduced hierarchical convolutional-deconvolutional neural networks for automatic liver and tumor segmentation and obtained a mean Dice score of 0.963 for liver segmentation.

In this paper, we focus primarily on automatic segmentation techniques and propose the multiple filter U-net as a first step toward liver segmentation and, more broadly, medical image analysis. Our model is trained on the LiTS and Chaos datasets to compare the performance of MFUnet mainly with U-net, as well as with other methods.

III. PROPOSED METHOD

3.1 Convolutional neural network

3.1.1 Advantages of U-net

We build MFUnet on the standard U-net by adding multiple filter blocks. First, the down-sampling path of U-net captures the context of the input image and extracts essential features; each of its levels is composed of two 3×3 convolution layers with an activation function (and batch normalization), followed by 2×2 max pooling [2]. This coarse contextual information is then transferred to the up-sampling path through skip connections.

Second, each level of the up-sampling path up-samples its input via deconvolution, concatenates the result with the feature map from the down-sampling path, and applies two 3×3 convolution layers with an activation function (and batch normalization). The objective of this path is to enable precise localization combined with contextual information from the down-sampling path.

Fig. 3. The Multi Filter block (MF block) in MFUnet

Fig. 4. The MFUnet architecture. MF block is short for Multi Filter block

Third, a further advantage comes from combining the location information from the down-sampling path with the contextual information in the up-sampling path. This is necessary for a good segmentation prediction, which relies on both localization and context, combining general information from the whole image.
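The interplay of the two U-net paths can be sketched with NumPy alone. This is an illustration under simplifying assumptions, not the network used in this paper: convolutions are omitted, nearest-neighbour up-sampling stands in for the learned deconvolution, and the helper names are hypothetical.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling on an array of shape (H, W, C); H and W must be even."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample_2x(x):
    """Nearest-neighbour 2x up-sampling (a stand-in for learned deconvolution)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def skip_concat(encoder_features, decoder_features):
    """Skip connection: concatenate encoder features with up-sampled decoder
    features along the channel axis."""
    return np.concatenate(
        [encoder_features, upsample_2x(decoder_features)], axis=-1)

encoder = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)  # (H, W, C) = (4, 4, 2)
bottleneck = max_pool_2x2(encoder)         # coarse context, shape (2, 2, 2)
merged = skip_concat(encoder, bottleneck)  # fine detail + context, shape (4, 4, 4)
print(bottleneck.shape, merged.shape)
```

The merged tensor keeps the full spatial resolution of the encoder features while carrying the coarser context in additional channels, which is the combination the skip connections are meant to provide.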

3.1.2 Advantages of Multiple Filter

The input image contains many kinds of features, such as edges, arches, and colors. The network can be confident about the input image if it can recognize a wide variety of shapes and patterns of the object in the image. A filter is shifted across the image and applied at different positions until the whole image has been covered. However, each filter learns exactly one pattern or feature that excites it; a single filter cannot be equally excited by a horizontal and a vertical line. Informally, the filter size determines how much neighboring information is visible when processing the current layer. For example, with a 3×3 filter, each neuron observes information from 8 neighbors: upper, lower, left, right, upper left, upper right, lower left, and lower right. In addition, some filters may learn the same features. Therefore, one filter is not enough for segmentation problems.

The weights of the filters are randomly initialized. A filter is represented by a weight vector that is convolved with the input, and its response measures how closely an input patch resembles a feature. All filters are applied in the same way, but the values in each filter's matrix differ from those of the other filters, so they extract different features from the image.
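The relation between filter size and visible neighborhood can be checked with a few lines of Python. The helper names are hypothetical, and an odd, square filter centered on the current pixel is assumed.

```python
def neighbours_seen(kernel_size):
    """Number of neighbouring pixels covered by a square k x k filter,
    excluding the centre pixel itself (k is assumed to be odd)."""
    return kernel_size * kernel_size - 1

def neighbour_offsets(kernel_size):
    """(row, column) offsets of those neighbours relative to the centre."""
    half = kernel_size // 2
    return [(dr, dc)
            for dr in range(-half, half + 1)
            for dc in range(-half, half + 1)
            if (dr, dc) != (0, 0)]

for k in (1, 3, 5):
    print(k, neighbours_seen(k))
# prints:
# 1 0
# 3 8
# 5 24
```

A 1×1 filter sees only the pixel itself, a 3×3 filter sees the 8 neighbors listed above, and a 5×5 filter sees 24, which is why filters of different sizes respond to structures at different scales.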

3.2 Multiple filter U-net

In this study, we construct our multiple filter model on a standard U-net architecture. Our goal is to leverage the advantages of Multiple Filters as a filtering hierarchy, which has semantics from low to high levels, and to concatenate all of them into a feature pyramid with high-level semantics throughout. We then apply Multiple Filters to U-net to improve its performance. The architecture of MFUnet is shown in Figure 4.

Our model takes a single-scale layer of arbitrary size as input and produces correspondingly sized feature maps at multiple filter levels as output. The underlying concept is the combination of convolution filters of different sizes applied to the convolved image. We refer to the Multiple Filter block as the MF block, which is shown in Figure 3.

MFUnet applies three filters of different sizes (1×1, 3×3, and 5×5) in parallel, directly to the same input layer, instead of applying a single filter size to that layer. The multi-filter block therefore handles scale by enlarging the filter size instead of iteratively reducing the image size.

The multi-filter block is interleaved three times in our architecture, a choice made through extensive experimentation. The first layer has a filter size of 1×1; the next layer raises the filter size to 3×3 with a down-sampling size of 2; and the last layer has a filter size of 5×5 with a down-sampling size of 2 to extract features. After that, we concatenate all of them and apply the ReLU activation function, as given in (1).
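The core idea of the MF block, several filter sizes applied to the same input followed by concatenation and ReLU, can be sketched as follows. This is a simplified illustration and not the trained Keras layer: it assumes a single-channel input, stride 1 with zero padding (the down-sampling mentioned above is omitted), and randomly initialized kernels; `conv2d_same` and `multi_filter_block` are hypothetical names.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Stride-1 2-D cross-correlation with zero padding, so the output has
    the same height and width as the input."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, pad)
    out = np.zeros(image.shape, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.sum(padded[r:r + k, c:c + k] * kernel)
    return out

def multi_filter_block(image, kernels):
    """Apply every kernel to the same input, stack the resulting maps along a
    new channel axis, and apply ReLU."""
    maps = [conv2d_same(image, kernel) for kernel in kernels]
    stacked = np.stack(maps, axis=-1)  # shape (H, W, number of kernels)
    return np.maximum(stacked, 0.0)    # ReLU

rng = np.random.default_rng(0)
# Randomly initialised 1x1, 3x3 and 5x5 kernels; a trained network learns these.
kernels = [rng.standard_normal((size, size)) for size in (1, 3, 5)]
image = rng.standard_normal((8, 8))
features = multi_filter_block(image, kernels)
print(features.shape)
```

Because all three branches keep the spatial size of the input, their outputs can be concatenated channel-wise without resizing, which is what allows the block to be dropped into a U-net level.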

Fig. 5. Visualization of the last feature maps of MFUnet and U-net (columns: raw image, ground truth, MFUnet, U-net)

It is noteworthy that the spatial resolution decreases as we go up the hierarchy. Not surprisingly, the semantic value of each layer increases as more high-level structures are discovered.

The multi-filter block aims to detect features at multiple sizes. To arrive at these features, we experimented with fewer and more convolution filter sizes. Splitting the input regularly into smaller 1×1, 3×3, and 5×5 square sub-regions retains essential spatial information, and for each sub-region we explore a few plain features. After training for 50 epochs, we visualized the last feature maps of each network in Figure 5. They show that MFUnet recognizes the local region better.

IV. EXPERIMENT AND RESULTS

The proposed MFUnet model is modular and independent of any particular application, so it can be easily adapted to image segmentation tasks. To illustrate its suitability for image segmentation, we evaluated the multiple-filter-based FCN models on two challenging tasks: task 1 of the Liver Tumor Segmentation challenge and task 2 of the Chaos challenge.

4.1 Evaluation datasets

In this section, we introduce the image datasets used in the segmentation experiments. The computed tomography (CT) datasets for liver segmentation were acquired at the portal phase after contrast injection, for the pre-evaluation of living liver transplantation donors.

4.1.1 LiTS dataset

The data come from several clinical sites, including the Ludwig Maximilian University of Munich, Radboud University Medical Center of Nijmegen, Polytechnique & CHUM Research Center Montreal, Tel Aviv University, Sheba Medical Center, IRCAD Institute Strasbourg, and the Hebrew University of Jerusalem. The dataset is publicly available through the LiTS challenge, which was organized in conjunction with ISBI 2017 and MICCAI 2017 [27].

There are 201 contrast-enhanced CT volume scans of the liver, with manual tumor annotations performed slice by slice; the number of slices differs between volumes. Each image in the dataset has a resolution of 512×512. The dataset is already divided into two parts: the training set contains 131 CT scans, and the test set contains 70 CT scans.

The image data vary widely in resolution and image quality, as shown in Figure 1, because they were acquired with different CT scanners and acquisition protocols. In particular, the image resolution ranges from 0.56 mm to 1.0 mm in the axial plane and from 0.45 mm to 6.0 mm in the z-direction. In addition, the number of slices in z ranges from 42 to 1026.

4.1.2 Chaos dataset

The dataset was collected from the PACS of DEU Hospital. It contains CT images of 40 different patients, with the same orientation and alignment for all patients. These patients are potential liver donors with healthy livers; in other words, the livers have no tumors, lesions, or other diseases.

Each data set consists of DICOM images with a resolution of 512×512. The x-y spacing is in the range of 0.7 to 0.8 mm, and the inter-slice distance (ISD) is 3 to 3.2 mm (a smaller ISD cannot be used for these acquisitions because of routine clinical procedures). This corresponds to an average of 90 slices per data set (minimum 77, maximum 105). In total, there are 2874 slices for training and 3533 slices for private testing.

These CT images were taken from the upper abdomen of each patient after contrast agent injection. During the portal phase, the hepatic parenchyma is maximally enhanced through the blood supplied by the portal vein. The portal veins are well enhanced, and some enhancement of the hepatic veins is also visible. Consequently, this phase is widely used for liver and vascular segmentation before surgery.

Although this dataset contains only healthy livers, the enhanced vascular structures caused by the contrast injection give adjacent organs a similar Hounsfield value range, which is the main challenge of this task, as shown in Figure 2. As with the LiTS dataset, this dataset also includes irregular liver shapes (i.e., various sizes or orientations) and substantial shape disparities of anatomical structures among patients.

4.2 Model training and implementation details

The datasets contain an imbalance issue that needs to be addressed. For these CT datasets, HU value clipping and data augmentation with the Albumentations library were used.

With CT scans, the most common procedure is HU-value clipping, which restricts the intensities to a range of values relevant to the liver. It helps the network concentrate on the important parts of an image for the segmentation task. Another technique used to simplify the network learning task is data normalization. Therefore, for an input CT image, we apply a Hounsfield windowing function to remove unrelated organs and tissues. The image is then normalized to the range 0.0 to 1.0 as float64 values.
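HU clipping followed by normalization to the range 0.0 to 1.0 can be sketched as below. The window limits `hu_min` and `hu_max` are assumptions chosen for illustration (the paper does not state its clipping range), and the function name is hypothetical.

```python
import numpy as np

def clip_and_normalize_hu(slice_hu, hu_min=-200.0, hu_max=250.0):
    """Clip Hounsfield units to [hu_min, hu_max] and rescale to [0.0, 1.0].

    The default window is an assumed soft-tissue range, not a value taken
    from the paper.
    """
    clipped = np.clip(slice_hu.astype(np.float64), hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)

# Toy 2x2 slice: air, lower window edge, soft tissue, dense bone.
slice_hu = np.array([[-1000, -200],
                     [25, 1200]])
normalized = clip_and_normalize_hu(slice_hu)
print(normalized)
```

Values outside the window, such as air and dense bone, collapse to 0.0 and 1.0, so the available intensity range is spent on the tissues around the liver.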

We observe that the heterogeneity in liver contrast is extensive among slices. As shown in Figure 1 and Figure 2, the contrast, brightness, size, and shape of the liver differ considerably among images. Therefore, the next step is standardization.

Given an input image, we define a transform function with parameters α, β, and γ that includes a basic linear transform and gamma correction. A threshold (selected through experimentation) is used to balance gamma and contrast among volumes. This normalization stage helps alleviate the effect of contrast and illumination conditions during scanning. The processing stage is performed as in (2).

(2)

Here, the threshold is the mean of the image matrix and is applied to all images in the dataset. The values of α and β are used to scale the input, and the value of γ adjusts the contrast, brightness, and illumination of the image. The transform is applied to the pixel value at each point. All of these parameters are set empirically in our experiments.
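A combination of a linear transform and gamma correction can be sketched as follows. The exact form used in our experiments is the one in (2); the composition below (scale and shift, clip to [0, 1], then raise to the power γ) is only one plausible arrangement, the mean-based threshold is not modeled, and the function name and default values are assumptions.

```python
import numpy as np

def linear_gamma_transform(image, alpha=1.0, beta=0.0, gamma=1.0):
    """Scale and shift an image with values in [0, 1], clip back to [0, 1],
    then apply gamma correction (assumed ordering, for illustration only)."""
    scaled = np.clip(alpha * image + beta, 0.0, 1.0)
    return scaled ** gamma

image = np.array([0.0, 0.25, 1.0])
brighter = linear_gamma_transform(image, gamma=0.5)   # gamma < 1 brightens mid-tones
stretched = linear_gamma_transform(image, alpha=2.0)  # alpha > 1 stretches contrast
print(brighter, stretched)
```

With γ below 1 the mid-tones are lifted while black and white stay fixed, and with α above 1 the contrast is stretched and bright values saturate at 1.0.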

Albumentations [31] is a library that provides additional data samples from the available data through transformations that preserve or alter the properties of the data. It is more robust than plain data augmentation. Data augmentation is widely used in deep learning because the models require a large amount of training data. Here, we considered three augmentation techniques. The first was a transformation through rotation of the image, limited to 15 degrees, in addition to flipping (mirroring) the image along the vertical or horizontal axis. The second technique applied a gamma adjustment. Augmentation was performed on the fly during training. No test-time augmentation was applied.
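For segmentation, every geometric augmentation has to be applied identically to the image and to its mask. The sketch below shows this for random flips using NumPy only; it does not use the Albumentations API, it omits the rotation (which requires interpolation), and the function name and the probability of 0.5 are assumptions.

```python
import numpy as np

def random_flip(image, mask, rng):
    """Randomly mirror an image and its mask together so that they stay aligned."""
    if rng.random() < 0.5:  # horizontal flip (left-right)
        image, mask = np.flip(image, axis=1), np.flip(mask, axis=1)
    if rng.random() < 0.5:  # vertical flip (up-down)
        image, mask = np.flip(image, axis=0), np.flip(mask, axis=0)
    return image, mask

rng = np.random.default_rng(0)
image = np.arange(16).reshape(4, 4)
mask = image > 7  # toy "liver" mask: the bottom half of the image
aug_image, aug_mask = random_flip(image, mask, rng)
print(np.array_equal(aug_image > 7, aug_mask))  # the mask still matches the image
```

Calling such a function inside the training loop yields a different variant of each slice in every epoch, which corresponds to performing augmentation on the fly.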

We used the Python 3 programming language to conduct the experiments. The network models were implemented using Keras with a TensorFlow backend. We minimized the Dice loss, and the models were trained for 50 epochs using the Adam optimizer with a learning rate of 0.001.
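The Dice loss minimized during training can be written in a few lines. The sketch uses NumPy rather than Keras so that it stays self-contained; the smoothing constant `smooth=1.0` is a common choice and an assumption here, not a value reported in this paper.

```python
import numpy as np

def soft_dice_loss(y_true, y_pred, smooth=1.0):
    """Soft Dice loss: 1 - (2 * sum(t * p) + s) / (sum(t) + sum(p) + s).

    y_true holds binary labels and y_pred holds probabilities in [0, 1].
    The smoothing term s avoids division by zero on empty masks.
    """
    intersection = np.sum(y_true * y_pred)
    dice = (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)
    return 1.0 - dice

y_true = np.array([1.0, 1.0, 0.0, 0.0])
perfect = soft_dice_loss(y_true, y_true)                            # 0.0
disjoint = soft_dice_loss(y_true, np.array([0.0, 0.0, 1.0, 1.0]))   # 0.8
print(perfect, disjoint)
```

Because the loss is computed from the overlap rather than from per-pixel errors, it is less sensitive to the imbalance between the small liver region and the large background than a plain pixel-wise loss.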

4.3 Liver segmentation results

The goal of this experiment is to demonstrate the ability to identify the liver region automatically by using multiple filter sizes in U-net networks. The experiments are carried out in the following steps:

• We evaluate the performance of deep learning models based on their ability to discriminate the liver region.

• We assess the effect of multiple filter sizes.

The liver segmentation method is evaluated by comparing the predicted results with the ground-truth image, which is provided by a radiologist. The common evaluation measurements are the Dice score, the Jaccard index, and RVD.

Given A as the ground-truth image and B as the predicted image, we define the above measurements as follows. The Dice score [32] quantifies the performance of an image segmentation method by measuring the overlap between the ground-truth image and the predicted image. It is defined in (3).

$$\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|} \qquad (3)$$

Fig. 6. Sample predictions by MFUnet

The Jaccard index, also known as IoU, is similar to the Dice score, with some differences. IoU tends to penalize single instances of bad classification more heavily than the Dice score, even when both agree that the instance is bad.
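The three measurements can be computed from binary masks as sketched below. The Dice score and Jaccard index follow their standard set definitions; for RVD, the common signed form (|B| − |A|) / |A| is assumed, since the paper does not spell it out. Non-empty masks are assumed, and the function names are hypothetical.

```python
import numpy as np

def dice_score(truth, pred):
    """Dice = 2 |A ∩ B| / (|A| + |B|) for binary masks A (truth) and B (pred)."""
    truth, pred = truth.astype(bool), pred.astype(bool)
    return 2.0 * np.sum(truth & pred) / (np.sum(truth) + np.sum(pred))

def jaccard_index(truth, pred):
    """Jaccard (IoU) = |A ∩ B| / |A ∪ B|."""
    truth, pred = truth.astype(bool), pred.astype(bool)
    return np.sum(truth & pred) / np.sum(truth | pred)

def relative_volume_difference(truth, pred):
    """RVD = (|B| - |A|) / |A| (assumed signed definition)."""
    truth, pred = truth.astype(bool), pred.astype(bool)
    return (np.sum(pred) - np.sum(truth)) / np.sum(truth)

truth = np.array([[1, 1, 0],
                  [1, 1, 0],
                  [0, 0, 0]])
pred = np.array([[1, 1, 1],
                 [1, 0, 0],
                 [0, 0, 0]])

d = dice_score(truth, pred)                    # 2 * 3 / (4 + 4) = 0.75
j = jaccard_index(truth, pred)                 # 3 / 5 = 0.6
rvd = relative_volume_difference(truth, pred)  # (4 - 4) / 4 = 0.0
print(d, j, rvd)
```

For a single pair of masks the two overlap scores are tied by J = D / (2 − D), so the Jaccard index is never larger than the Dice score and drops faster as the overlap worsens. The example also shows that RVD can be zero for an imperfect prediction, because it compares only the volumes and not their positions.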

We analyze the computed tomography (CT) scan datasets to evaluate the performance of the proposed architecture. Some images predicted by MFUnet are shown in Figure 6. The red line indicates the MFUnet prediction, the green line shows the ground truth, and the yellow line is their overlap. The overall accuracy is reported in Table 1 and Table 2. The accuracy ranged from 94.0% to 96% for the Dice score. The MFUnet model achieved a higher Dice score (range 95.5% to 96.0%) than FPN and the U-net baseline, which means that MFUnet can segment the region of interest slightly better. Additionally, the Jaccard index of FPN is 86.7%, which is considerably lower than those of U-net and MFUnet, at 90.8% and 90.4%, respectively. Therefore, Multiple Filters perform better when combined with U-net than the feature pyramid does in the FPN architecture.

Compared with state-of-the-art methods, the methods of Li et al. [28] and Yuan et al. [30] reached Dice scores of 96.5 and 96.7, respectively (Table 1). While our model was trained end to end to obtain the final result, Yuan et al. [30] proposed two models for liver segmentation. A first, simple CDNN model is used for coarse liver segmentation of the entire CT volume. Another CDNN is then trained on the liver region for fine liver segmentation.

The 2D DenseUNet [28] model was also not trained end to end. Its authors used ResNet to obtain the coarse liver segmentation in the first stage. They then trained the 2D DenseUNet and the 3D DenseUNet with fine-tuning to obtain the result. H-DenseUNet is a complex network.

Table 2 shows the number of layers of our proposed MFUnet and of H-DenseUNet. H-DenseUNet has 199 layers, more than twice as many as MFUnet with 87 layers. MFUnet is therefore more efficient to train than H-DenseUNet while achieving a competitive result.

Additionally, we report our experiments on the Chaos dataset in Table 3. We examine the effect of multiple filter sizes by comparing the performance of U-net and MFUnet, both trained from scratch. The Dice score of U-net is 74.75%, while the Dice score of MFUnet is 93.87% under the same settings, so the effect is most evident in this case. Next, when the batch size was changed from 16 to 8, the Dice score increased from 93.87% to 94.92%. We conclude that a batch size of 8 is appropriate for our model. The best result on this dataset is a Dice score of 94.92%. This shows that MFUnet improves on the performance of the U-net architecture.

For instance, Figure 7 shows U-net and MFUnet segmenting a liver with a given boundary. Figures 7a and 7b show the raw image and the ground truth from the LiTS dataset. Figures 7c and 7d show the outputs of the original U-net and MFUnet, respectively. In some complicated cases, the U-net model either over-segments or under-segments (Figure 7e), whereas our proposed MFUnet performs considerably better (Figure 7f).

Our proposed model performs better, although only slightly. We suppose that the multiple filter block allows MFUnet to perform better pixel classification.

Table 1. Comparison of the proposed method (MFUnet) with other state-of-the-art architectures on the LiTS dataset

Model              Dice (%)   Jaccard (%)
U-net [2]          95.2       90.8
FPN [3]            90.9       86.7
MFUnet             96.0       90.4
Li et al. [28]     96.5       92.6
Yuan et al. [30]   96.7       92.9

Table 2. The number of layers

Method               Number of layers
U-net [2]            39
Proposed - MFUnet    87
H-denseUNet [28]     199

Table 3. Ablation study of the proposed method on the Chaos dataset

Method               Image size   Batch size   Dice score (%)
U-net [2]            512x512      16           74.75
Proposed - MFUnet    512x512      16           93.87
Proposed - MFUnet    512x512      8            94.92

Fig. 7. Segmenting a liver. Panels: a) raw image; b) ground truth; c) output from U-net; d) output from MFUnet; e) comparison (ground truth and U-net); f) comparison (ground truth and MFUnet). Both models seem to segment the liver close to the ground truth. The red line indicates the ground truth, the green line indicates the output of U-net or MFUnet, and yellow shows their overlap.

V. DISCUSSIONS AND CONCLUSION

In this research, we considered a multiple filter mechanism and examined how to incorporate it into segmentation to exploit the local features in abdominal CT images. We observed empirically that multi-filter blocks help the network learn more local features than the standard network. Our model is trained end to end. It is not only efficient to train but also competitive with other methods. As demonstrated in the segmentation framework, training was slightly more complex than for the standard U-net architecture. Specifically, it generates 4,832,713 parameters, more than the original structure with 485,673 parameters.

In future research, we will focus on segmenting tumors, and the performance of MFUnet can be further enhanced by using fine-resolution input batches without any additional heuristics.

REFERENCES

[1] E. Shelhamer, J. Long, and T. Darrell, “Fully Convolutional Networks for Semantic Segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 4, pp. 640–651, 2017

[2] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 9351, pp. 234–241, 2015

[3] T. Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” Proc. - 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, vol. 2017-Janua, pp. 936–944, 2017

[4] S. J. Kim and H. S. Kim, “Multi-Tasking U-net 기반 파프리카 병해충 진단 [Paprika disease and pest diagnosis based on Multi-Tasking U-net],” Smart Media J., vol. 9, no. 1, pp. 16–22, 2020

[5] M. Längkvist, L. Karlsson, and A. Loutfi, “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning,” Pattern Recognit. Lett., vol. 42, no. 1, pp. 11–24, 2014

[6] M. Scully, V. Magnotta, C. Gasparovic, P. Pelligrimo, D. Feis, and H. J. Bockholt, “3D Segmentation In The Clinic: A Grand Challenge II at MICCAI 2008 - MS Lesion Segmentation,” Miccai, pp. 1–6, 2008

[7] X. Zhuang, “Challenges and methodologies of fully automatic whole heart segmentation: A review,” J. Healthc. Eng., vol. 4, no. 3, pp. 371–407, 2013

[8] G. Litjens et al., “Evaluation of prostate segmentation algorithms for MRI: the PROMISE12,” Med. Image Anal., vol. 18, no. 2, pp. 359–373, 2015

[9] O. A. J. Del Toro et al., “VISCERAL - VISual concept extraction challenge in RAdioLogy: ISBI 2014 challenge organization,” CEUR Workshop Proc., vol. 1194, pp. 6–15, 2014

[10] A. Basher, S. Ahmed, and H. Y. Jung, “One Step Measurements of hippocampal Pure Volumes from MRI Data Using an Ensemble Model of 3-D Convolutional Neural Network,” Smart Media J., vol. 9, no. 2, pp. 22–32, 2020

[11] N. C. F. Codella et al., “Skin lesion analysis toward melanoma detection: A challenge at the 2017 International symposium on biomedical imaging (ISBI), hosted by the international skin imaging collaboration (ISIC),” Proc. - Int. Symp. Biomed. Imaging, vol. 2018-April, pp. 168–172, 2018

[12] B. van Ginneken, T. Heimann, and M. Styner, “3D segmentation in the clinic: A grand challenge,” Int. Conf. Med. Image Comput. Comput. Assist. Interv., vol. 10, pp. 7–15, 2007

[13] Y. Boykov et al., “Liver Segmentation in CT Data: A Segmentation Refinement Approach,” Methods, vol. 70, no. c, pp. 109–131, 2007

(11)

[14] Y. Boykov and G. Funka-Lea,

“Graph cuts and efficient N-D image segmentation,”

Int. J. Comput. Vis.

, vol.

70, no. 2, pp. 109–131, 2006

[15] W. A. Barrett and E. N. Mortensen,

“Interactive live-wire boundary extraction,”

Med. Image Anal.

, vol. 1, no. 4, pp. 331–341, 1997

[16] A. Beck and V. Aurich, “Hepatux - a semiautomatic liver segmentation system,”

3D Segmentation Clin. A Gd.

Chall.

, pp. 225–233, 2007

[17] B. M. Dawant, R. Li, B. Lennon, and S. Li, “Semi-automatic segmentation of the liver and its evaluation on the MICCAI 2007 grand challenge data set,”

3D Segmentation Clin. A Gd.

Chall.

, pp. 215–221, 2007

[18] A. Wimmer, G. Soza, and J.

Hornegger, “Two-stage

semi-automatic organ segmentation framework using radial basis functions and level sets,”

3D segmentation Clin. a Gd. challenge, LNCS, Springer Berlin, Heidelb.

, pp. 207–214, 2007

[19] P. Slagmolen, A. Elen, D. Seghers, D. Loeckx, F. Maes, and K. Haustermans,

“Atlas based liver segmentation using nonrigid registration with a B-spline transformation model,”

Proc. MICCAI Work. 3D segmentation Clin. a Gd. Chall.

, pp. 197–206, 2007

[20] H. Lamecker, T. Lange, and M.

Seebass, “Segmentation of the liver using a 3D statistical shape model,”

Konrad-Zuse-Zentrum f

ü

r Informationstechnik Berlin

, vol. 09, no.

April, pp. 27, 2004

[21] D. Kainmueller, T. Lange, and H.

Lamecker, “Shape constrained automatic segmentation of the liver based on a heuristic intensity model,”

Proc. MICCAI Work. 3D Segmentation Clin. A Gd. Chall.

, pp. 109–116, 2007 [22] R. Susomboon, “A hybrid

approach for liver segmentation,”

… Segmentation Clin. …

, vol. i, pp.

151–160, 2007

[23] L. Rusko and G. Bekes, “Fully automatic liver segmentation for contrast-enhanced CT images,”

Int.

Conf. Med. Image Comput. Comput.

Interv. Segmentation Clin. a Gd.

challenge, 2007

, pp. 143–150, 2007 [24] S. J. Lim, Y. Y. Jeong, and Y. S. Ho,

“Automatic liver segmentation for volume measurement in CT Images,”

J. Vis. Commun. Image Represent.

, vol. 17, no. 4, pp. 860–875, 2006

[25] A. B. B, I. Diamant, E. Klang, and M.

Amitai, “Fully Convolutional Network for Liver,”

Deep Learn. Data Labeling Med. Appl.

, vol. 1, pp. 77–85, 2016 [26] H. T. Tran, A. R. Oh, I. S. Na, and S.

H. Kim, “Liver Segmentation and 3D Modeling from Abdominal CT Images,”

Smart Media J.

, vol. 5, no. 1, pp. 49–54, 2016

[27] P. Bilic

et al.

, “The Liver Tumor Segmentation Benchmark (LiTS),” pp.

1–43, 2019

[28] X. Li, H. Chen, X. Qi, Q. Dou, C. W.

Fu, and P. A. Heng, “H-DenseUNet:

Hybrid Densely Connected UNet for Liver and Tumor Segmentation from CT Volumes,”

IEEE Trans. Med. Imaging

, vol. 37, no. 12, pp. 2663–2674, 2018 [29] S. Chen, K. Ma, and Y. Zheng,

“Med3D: Transfer Learning for 3D Medical Image Analysis,” pp. 1–12, 2019

[30] Y. Yuan, “Hierarchical Convolutional-Deconvolutional Neural Networks for Automatic Liver and Tumor Segmentation,” vol. i, pp. 3–6, 2017

[31] A. Buslaev, A. Parinov, E.

Khvedchenya, V. I. Iglovikov, and A. A.

Kalinin, “Albumentations: fast and flexible image augmentations,” 2018 [32] C. H. Sudre, W. Li, T. Vercauteren,

S. Ourselin, and M. Jorge Cardoso,

“Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations,”

Lect.

Notes Comput. Sci. (including Subser.

Lect. Notes Artif. Intell. Lect. Notes Bioinformatics)

, vol. 10553 LNCS, pp.

240–248, 2017

(12)

Authors

Vo Thi Tuong Vi

She received her Bachelor's degree from the Faculty of Commerce and Tourism, Industrial University of Ho Chi Minh City (IUH), Viet Nam, in 2017 and the M.Eng. degree from the Department of Electronics and Computer Engineering, Chonnam National University, in 2020. She is currently pursuing a Ph.D. at the Department of AI Convergence, Chonnam National University, South Korea. Her research interests include medical image understanding and deep learning.

A-Ran Oh

She received her B.S. degree from the School of Computer Statistics, Chosun University, Korea, in 2009. She is currently a research engineer at the Department of AI Convergence, Chonnam National University, Korea.

Guee-Sang Lee

He received the B.S. degree in Electrical Engineering and the M.S. degree in Computer Engineering from Seoul National University, Korea, in 1980 and 1982, respectively. He received the Ph.D. degree in Computer Science from Pennsylvania State University in 1991. He is currently a Professor in the Department of Electronics and Computer Engineering at Chonnam National University, Korea. His research interests are mainly in the fields of image processing, computer vision, and video technology.

Hyung-Jeong Yang

She received her B.S., M.S., and Ph.D. degrees from Chonbuk National University, Korea. She is currently an associate professor at the Department of Electronics and Computer Engineering, Chonnam National University, Gwangju, Korea. Her main research interests include multimedia data mining, pattern recognition, artificial intelligence, e-Learning, and e-Design.

Soo-Hyung Kim

He received his B.S. degree in Computer Engineering from Seoul National University in 1986, and the M.S. and Ph.D. degrees in Computer Science from the Korea Advanced Institute of Science and Technology in 1988 and 1993, respectively. Since 1997, he has been a Professor in the Department of AI Convergence, Chonnam National University, South Korea. His research interests include pattern recognition, document image processing, medical image understanding, and deep learning.