• 검색 결과가 없습니다.

multi-scale highlighting foregrounds White matter hyperintensities segmentation using the ensemble U-Net with NeuroImage

N/A
N/A
Protected

Academic year: 2024

Share "multi-scale highlighting foregrounds White matter hyperintensities segmentation using the ensemble U-Net with NeuroImage"

Copied!
10
0
0

로드 중.... (전체 텍스트 보기)

전체 글

(1)

ContentslistsavailableatScienceDirect

NeuroImage

journalhomepage:www.elsevier.com/locate/neuroimage

White matter hyperintensities segmentation using the ensemble U-Net with multi-scale highlighting foregrounds

Gilsoon Park

a

, Jinwoo Hong

a

, Ben A. Duffy

b

, Jong-Min Lee

a,

, Hosung Kim

b

aDepartment of Biomedical Engineering, Hanyang University, Seoul, South Korea

bUSC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA 90033, USA

a r t i c le i n f o

Keywords:

White matter hyperintensities Segmentation

Deep learning U-Net

Multi-scale highlighting foregrounds

a b s t r a ct

Whitematterhyperintensities(WMHs)areabnormalsignalswithinthewhitematterregiononthehumanbrain MRIandhavebeenassociatedwithagingprocesses,cognitivedecline,anddementia.Inthecurrentstudy,wepro- posedaU-Netwithmulti-scalehighlightingforegrounds(HF)forWMHssegmentation.Ourmethod,U-Netwith HF,isdesignedtoimprovethedetectionoftheWMHvoxelswithpartialvolumeeffects.Weevaluatedthesegmen- tationperformanceoftheproposedapproachusingtheChallengetrainingdataset.Thenweassessedtheclinical utilityoftheWMHvolumesthatwereautomaticallycomputedusingourmethodandtheAlzheimer’sDisease NeuroimagingInitiativedatabase.WedemonstratedthattheU-NetwithHFsignificantlyimprovedthedetection oftheWMHvoxelsattheboundaryoftheWMHsorinsmallWMHclustersquantitativelyandqualitatively.Upto date,theproposedmethodhasachievedthebestoverallevaluationscores,thehighestdicesimilarityindex,and thebestF1-scoreamong39methodssubmittedontheWMHSegmentationChallengethatwasinitiallyhosted byMICCAI2017andiscontinuouslyacceptingnewchallengers.Theevaluationoftheclinicalutilityshowed thattheWMHvolumethatwasautomaticallycomputedusingU-NetwithHFwassignificantlyassociatedwith cognitiveperformanceandimprovestheclassificationbetweencognitivenormalandAlzheimer’sdiseasesubjects andbetweenpatientswithmildcognitiveimpairmentandthosewithAlzheimer’sdisease.Theimplementation ofourproposedmethodispubliclyavailableusingDockerhub(https://hub.docker.com/r/wmhchallenge/pgs).

1. Introduction

Whitematterhyperintensities(WMHs)appearasabnormal hyper- signalswithinthewhitematterregiononthehumanbrainMRIinclud- ingT2-weighted (T2w),protondensity(PD),andfluid-attenuatedin- versionrecovery(FLAIR)imaging.Theseatypicalsignalsmostlyresult fromagingprocesses suchasdemyelinationandaxonalloss,bothas aresultofcerebralsmallvesseldiseases(PrinsandScheltens,2015).

Theyarefrequentlyobservedintheelderlyandtendtoincreaseinsize andnumberwithage(Habesetal.,2016),whileatthesametimebe- ingassociatedwithseveralpotentialvascularriskfactors,particularly hypertension(Abrahametal.,2016).

Based on quantitative analyses of WMHs (Barber et al., 1999; Duboiset al., 2014; Habes etal., 2016; Leeet al., 2016; Prinsand Scheltens,2015),previousstudiesshowedthatthepresenceandsever- ityofWMHs areassociatedwithdementia(Duboisetal.,2014)and increaserisk ofconditions suchasAlzheimer’sdisease(Habeset al., 2016;Leeetal., 2016),vasculardementia,anddementiawithLewy bodies(Barberetal.,1999).Thesestudiessuggestthatthequantitative

Correspondingauthor.

E-mailaddress:[email protected](J.-M.Lee).

characterizationofWMHs playsanimportantroleinvariousclinical researchintoneurologicaldisorders.

ManualdelineationofWMHsprovidesground-truthforvolumetric quantificationofWMHs.However,itisalaborious,tedious,andtime- consuming taskandrequiresahighlevelofexpertise toavoidunac- ceptablelevelsofintra-andinter-ratervariability.Besides,thisbecomes moreproblematicwiththesizeofadataset,encouragingautomatedseg- mentation.

Anumberof methodshavebeenproposedtosegmentWMHs au- tomatically.Jeonetal.(2011)attemptedWMHssegmentationbased on theMarkov random fieldand anintensity thresholding method.

Otherstudieshavedevelopedk-nearestneighbors-basedclusteringap- proaches (Griffanti etal., 2016; Jianget al., 2018;Steenwijk etal., 2013).Thesemethodsusedvariousfeatures(e.g.,spatialinformation, intensityinformation,andtexture)fromT1wandFLAIRimagesasthe inputoftheclusteringalgorithm.Dadaretal.(2017)evaluatedvari- ousclassifierssuchaslogisticregression,supportvectormachines,de- cisiontrees,andrandomforests.Theyobservedthattherandomfor- estachievedthebestperformance.Recently,convolutionalneuralnet- works(CNN), aclassof deepneuralnetworks,have rapidlybecome

https://doi.org/10.1016/j.neuroimage.2021.118140.

Received15May2020;Receivedinrevisedform13April2021;Accepted16April2021 Availableonline3May2021.

1053-8119/© 2021TheAuthors.PublishedbyElsevierInc.ThisisanopenaccessarticleundertheCCBY-NC-NDlicense

(2)

aprimarymethodinmedicalimagesegmentationandshownremark- ableperformance(Litjensetal.,2017).ForthesegmentationofWMHs, Moeskopsetal. (2018)proposed aCNNmodelbasedon multi-scale patchesextractedfromT1w,T1w inversion-recovery,andFLAIRim- ages.Rachmadietal.(2018)addedglobalspatialinformationtothe patchthatwasusedasaninputtoaCNNtoimprovethesegmentation ofWMHs.

Theuseofdifferentevaluationmetrics(e.g.,Dice,Jaccard,Haus- dorff indices),as wellas evaluations againstmanually WMHsdelin- eationsbydifferentexperts,makesitdifficulttocomparetheperfor- manceofsegmentationmethodsfromvariousstudiessystematically.To addresstheseissues,theWMHSegmentationChallenge2017washeld forastandardizedcomparisonoftheautomaticsegmentationofWMHs (Kuijfetal.,2019)inconjunctionwiththe20thInternationalConfer- enceonMedicalImageComputingandComputerAssistedIntervention (MICCAI)(Descoteauxetal.,2017).TheChallengeprovidedapublic platformtostandardizetheevaluationofWMHssegmentationmethods basedonaunifieddatasetofMRI,evaluationmetrics,andexpertlabel- ing.Twentyteamsproposednewmethodsandperformedtrainingand testingonthedatasetprovided.Byusinganensembleapproach,which isagoodwaytoreduceover-fittingofdeeplearningalgorithms,withthe U-Netwhichisadeepencoder-decoderarchitectureandhasskipcon- nectionsconcatenatingthefeaturemapsintheencodertothefeature mapsinthedecoder,Lietal.(2018)achievedthebestperformance.

Partialvolumeeffects(PVE)onMRIimagesduetolimitedspatial resolutionisassociatedwiththeproblemwhereonepixel/voxelrepre- sentsasignalofamixtureofdifferentbraintissues.Thiseffectbecomes strongatvoxelsoftheboundarybetweenstructureshavingdifferenttis- suecharacteristicsorbetweenalesionandthesurroundingbraintissue.

UnidentificationofthevoxelwithPVEmayresultinunderestimating thesegmentationofthetargetlesion,inparticularforsmalllesionsand lesionsofboundaries.

Suchaproblemmaynotbeinevitablefordeepneuralnetworkap- proaches.Indeed,wetestedastandardU-NetmethodonWMHsegmen- tationandobservedtheunsuccessfulclassificationofvoxelsplacedat boththeedgesofWMHsorsmallWMHs.Thepredictedprobabilityof foregroundislowfortheWMHvoxelswithstrongPVEduetotheirfea- turesuncertainty,andthenetworkstendtofailtoidentifythosepartial volumeWMHs(PV-WMHs).Ingeneral,thevolumeofthePV-WMHsis relativelysmallcomparedtothevolumeoftheoverallWMHs.Thusthe PV-WMHscontributelittletothenetworkloss,resultinginthenetwork insufficientlylearnthePV-WMHs.

Weproposethenetworktobetrainedusingamulti-scaleapproach ofhighlightingforegrounds(HF).InthestandardU-Net,thetrainingis accomplishedbyminimizingtheDicelossattheoutputlayer,whichis computedbycomparingtheposteriorprobabilitymapwiththeground truthlabel.ToemphasizethelabelvoxelsonWMHboundaries,which wouldlikelylieonPV-WMHvoxels,weproposetocomputeandmin- imizetheDicelossesattheparticularintermediatedecoderlayersby comparingtheiroutputprobabilitymapswiththecorrespondinglabels generatedusingtheproposedHFmethod.TheHFapproachdownsam- plesthegroundtruthlabelssequentiallybyapplying2×2max-pooling withstride2,resizingthelabelstothesizeofeachofthedecoderlayers.

Theapproachemphasizestheinfluenceofthevoxelslyingonthelesion boundariesorconsistingofsmalllesionsonthetrainingofanetwork.

WeusedthelabelsgeneratedusingtheproposedHFmethodtotrain auxiliaryclassifiersintheintermediatedecoderlayers.Trainingbyin- sertingauxiliaryclassifiersintheintermediatelayersisknownasdeep supervision.Innaturalimageclassification,GoogLeNet(Szegedyetal., 2015) is based on this method. However, this study did not sys- tematically evaluate the use of auxiliary classifiers. Independently, Leeetal.(2015)proposeddeeply-supervisednetworksthatwerecom- binedwithauxiliaryclassifiersatallintermediatelayersforimageclas- sification.They showed thatthe deepsupervision method improved theconvergencerateofnetworks byalleviatingthevanishingorex- plodinggradientproblem.Italsoimprovedthediscriminativeability

ofthefeatureslearnedbydirectlydrivingthelow-andmid-levelfea- turesinintermediatelayerstoveryhigh-levelfeatures(i.e.,targetout- put).Wangetal.(2015)appliedthedeepsupervisionmethodtodeeper convolutionalnetworks.Theyproposedtoaddauxiliaryclassifiersaf- tercertainintermediatelayersforbetterclassificationperformancein thedeeperconvolutionalnetworks.Chenetal.(2016a)usedthedeep supervisionframeworkforneuronalstructuresegmentationonelectron microscopyimages.Theirmethodaddeddeconvolutionallayerstothe intermediateencoderlayerstotrainauxiliaryclassifiersandfusedthe outputs.Severalvariantsofthedeeply-supervisednetworkswerefurther introducedtosegmentbraintissue(Chenetal.,2018),liver,wholeheart andgreatvessel(Douetal.,2017),andretinalvessels(Linetal.,2018).

Zhuetal.(2017)proposedtouseaU-Netwithdeepsupervisionfor prostatesegmentationinMRimages.Thesepreviousworks(Chenetal., 2018;Chenetal.,2016a;Douetal.,2017;Linetal.,2018;Zhuetal., 2017)reportedthatdeepsupervisioncouldimprovesegmentationac- curacy.

Though theproposed method is similar tothese previous works (Chenetal.,2018;Chenetal.,2016a;Douetal.,2017;Linetal.,2018; Zhuetal.,2017),ourapproachhasseveraldifferencesintheprocessing.

Wegeneratelabelimagesatdifferentresolutionsfromthegroundtruth labelingwhileemphasizingtheforegroundvoxelsusingtheproposed highlightforeground(HF)method.Thegeneratedmulti-scalelabelim- ageisusedforthenetworktolearnthefeaturesfocusingonthefore- groundareaatdifferentresolutions.Ontheotherhand,theseprevious worksusedthegroundtruthlabelswithoutmodificationattheoriginal imageresolution.Duringtraining,theyupsampledfeaturemapstothe originalimageresolutioninordertogeneratetheoutput.Instead,the proposedmethodcomputesthelossfunctionswithouttheneedforthe upsamplingofnetworkoutputs,mitigatingGPUmemoryusage.

TheMICCAI WMHSegmentationChallenge continuestohostfur- therstudiessinceitsinitialopening.Ofthe39methodssubmittedthe challenge asofMarch1st,2020,ourteamthatproposedtheensem- bleU-Netwithmulti-scaleHFcurrentlyachievesthebestperformance.

Inthefollowingsections,weoutlinethemethodsweusedtoachieve state-of-the-artperformanceonthistask.WefirstoutlinetheU-Netwith HFsmodel.Then,weshowthattheproposedmethodsignificantlyim- provedtheWMHssegmentationperformancecomparedtothestandard U-Net.Intheresultssection,weevaluatetheproposedmethodincom- parison toothermethodsandassessthe influenceofthe multi-scale HFsandtheeffectaccordingtotheirorganization.Finally,using the Alzheimer’sDiseaseNeuroimagingInitiative(ADNI)dataset,weinves- tigatethepotentialclinicalutilityoftheHFmethodbyautomatically segmenting WMHsandassociatingtheWMH volumeswithcognitive performancescores thatareusedforthediagnosis ofmild cognitive impairmentandAlzheimer’sDisease(i.e.,Mini-MentalStateExamina- tion(MMSE),Alzheimer’sDiseaseAssessmentScale-CognitiveSubscale (ADAS-Cog),andClinicalDementiaRatingScalesum(CDRsum)).

2. Materialsandmethods

2.1. Challengedatasetandpre-processing

Wevalidatedourmethodbasedonthedatasetandevaluationframe- workin WMHSegmentation Challenge2017becausethisprovidesa standardized assessment of thesegmentation performance of WMHs (Kuijf et al., 2019).The challenge organizationprovided a training datasetandatestdatasetconsistingof170subjectsintotal.Detailsof thedatasetsaregiveninTable1.Thetrainingdatasetincluded60sub- jectsandwaspubliclyavailableanddownloadableafterregistrationat https://wmh.isi.uu.nl/data/.Weusedthisdatasettotrainournetwork andinvestigatetheeffectoftheHFmethod.Thetestdatasetincluding therestofthe110subjectswasonlyavailableforevaluatingpredictions whensubmittedtothechallenge.

Each subjectincludedthebrainMR imagesbefore andafterpre- processing for T1w andFLAIR images anda manual delineation of

(3)

Table1

TheoverviewofthecharacteristicsoftheWMHSegmentationChallenge2017dataset.

Institute Scanner T1 voxel size(TR/TE(/TI)) FLAIR voxel size(TR/TE/TI) Train Test

UMC Utrecht 3T Philips Achieva 1 . 00 × 1 . 00 × 1 . 00 m m 3(7.9/4.5 ms) 0 . 96 × 0 . 95 × 3 . 00 m m 3(11,000/125/2800 ms) 20 30 NUHS Singapore 3T Siemens TrioTim 1 . 00 × 1 . 00 × 1 . 00 m m 3(2300/1.9/900 ms) 1 . 00 × 1 . 00 × 3 . 00 m m 3(9000/82/2500 ms) 20 30 VU Amsterdam 3T GE Signa HDxt 0 . 94 × 0 . 94 × 1 . 00 m m 3(7.8/3.0 ms) 0 . 98 × 0 . 98 × 1 . 20 m m 3(8000/126/2340 ms) 20 30 1.5T GE Signa HDxt 0 . 98 × 0 . 98 × 1 . 50 m m 3(12.3/5.2 ms) 1 . 21 × 1 . 21 × 1 . 30 m m 3(6500/117/1987 ms) 0 10 3T Philips Ungenuity 0 . 87 × 0 . 87 × 1 . 00 m m 3(9.9/4.6 ms) 1 . 04 × 1 . 04 × 0 . 56 m m 3(4800/279/1650 ms) 0 10

Abbreviations:TR:repetitiontime;TE:echotime;TI:inversiontime.

WMHs.TheimageswereacquiredfromfivedifferentMRscannersin threedifferent institutes.Theimagesacquiredinthethree MRscan- ners(i.e.,3TPhilipsAchieva,3TSiemensTrioTim,and3TGESigna HDxt)wereusedforbothtrainingandtesting.Theimagesacquiredin theothertwoMR scanners(i.e.,1.5TGESignaHDxtand3TPhilips Ingenuity)wereusedonlyfortesting.Allthe3DFLAIRimagesfrom VUAmsterdaminstitutewereresampledintotheaxialdirectionwith 3mmslicethickness.TheWMHswerelabeledontheFLAIRimagesby twoexperts,basedonStandardsforReportIngVascularchangesonnEu- roimaging(STRIVE)criteria(Wardlawetal.,2013).Theorganizerspro- videdthepre-processeddatalike1)T1wimagesthatwereregisteredto theFLAIRimagesusingtheElastixtoolbox(Kleinetal.,2009);2)the T1wandFLAIRimagesthatunderwentcorrectionfortheintensitynon- uniformityusingSPM12(AshburnerandFriston,2000).

We further pre-processed these data for training or testing our method.First,toreducefalsepositives,weremovednon-braintissue, using ROBEX(Iglesiasetal., 2011).Second, we performedintensity normalizationtomatchtheintensitydistributionamongthetraining data.Foreachimage,wecalculatedmeansandvariancesusingintensi- tiesrangingfrom2ndto98thpercentilesinthebrainregion.Wethen normalizedtheintensitieswithinthebrainineachimageusingz-score transformation.Finally,toequalizethesizeoftheinputdatatothenet- work,theaxialslicesineach3Dimagewerecroppedorpaddedtoa sizeof200×200.Weused2Dslicestotrainour2DCNNmodel.

2.2. Networkarchitecture

Inthecurrentstudy,weproposetocombinemulti-scaleHFswith aU-Netarchitecture(Ronnebergeretal.,2015).Themainideaofthe U-Netistheskipconnectionsbetweentheencoderanddecodertoal- lowthenetworktoreusethefeaturemapsintheencoder.Thisgener- allyhelpsthenetworktopredictdensesegmentationresultsandalle- viatesthevanishinggradientproblem.VariantsoftheU-Nethavebeen usedindiversemedicalimagesegmentationproblemsandhavedemon- stratedoutstanding performance(Çiçeket al.,2016; Drozdzalet al., 2018;Guerreroetal.,2018).IntheoriginalChallengein 2017,Four amongthetopteams,includingthetopteam(Lietal.,2018),exploited theU-Netarchitecture(Kuijfetal.,2019).Here,weadvancea2DU- NetforWMHssegmentation(Fig.1)withtwonovelcomponents:useof differentdetailsofthenetworkarchitectureandinclusionofthemulti- scaleHFs.

2.2.1. Detailsofthenetworkconfiguration

Asseenin Fig.1, themodified networkis basedontheencoder- decoderstructure.Intheencoderduringbothtrainingandtesting,the axialslicesofFLAIRandT1wmodalitiesarefedintothenetworkas atwo-channelinput(i.e.,theinputsizeis200×200×2).Wechose theaxialsliceforthe2Dinputdatabecausethebestimageresolution wasfoundontheFLAIRaxialslice.Inournetwork,weadoptthefol- lowingconfigurationsassuggestedinrecentworks:1)Insteadof3×3 kernelconvolutionsinthestandardU-Net,weuse5×5kernelconvo- lutionsinthefirsttwolayersforhandlingdifferenttransformationsas inLietal.(2018);2)Batchnormalizationisaddedtothe18convolu- tionallayerseach,whichacceleratesthetrainingprocessandimprove thenetworkperformancebyreducinginternalcovariateshift(Ioffeand

Szegedy, 2015).3) Finally,we use anexponential linearunit (ELU) (Clevertetal.,2015)astheactivationfunctionfornon-linearitycapac- ityinsteadofarectifiedlinearunit(ReLU)inthestandardU-Net.ReLU isneitheractivatednorupdatedatanegativevalue,whileELUdoesnot onlyhaveallthestrengthsofReLUbutalsoisactivatedandupdatedat negativevalues,improvingthelearningcharacteristics(Clevertetal., 2015).Theencodercontainsfour2×2max-poolinglayerswithastride 2aftereverytwoconvolutionlayersfordownsampling(Fig.1).Upsam- plinglayersbasedonnearest-neighborinterpolationareappliedafter everytwoconvolutionallayersinthedecoder(Fig.1).Priortodown- sampling,thefeaturemapsintheencoderareconcatenatedtothefea- ture mapsrightafterupsampling in thedecoder.At theoutputcon- volutionallayers,a1×1convolutionwithsoftmaxfunctionisusedto convertthefeaturemapsintothelabelspacewiththedepthoftwo(i.e., twoclasses;WMHsandnon-WMHs).

2.2.2. Multi-scalehighlightingforegrounds

Weaddmodifiedlabelimagesbythemulti-scaleHFapproachtothe intermediatelayersinthedecoderinournetwork(Fig.1).Theinter- mediateoutputconvolutionallayersconvertthefeaturemapsintothe multi-scalesegmentationprobabilitymaps(blackarrowsinFig.1).The multi-scaleHFsmax-poolthelabelimage(Fig.2).Giventheforeground pixelsinonelabelimage,thelabelimagemax-pooledbyHFisdefined asfollow:

𝑙𝐻𝐹𝑘 =𝑓𝑀𝑃( 𝑙𝐻𝐹𝑘−1)

(𝑘=1,2,,𝑛), (1)

Where𝑓𝑀𝑃 isa2 × 2max-poolingoperatorwithstride2and𝑙𝑘𝐻𝐹 is thelabelimagemax-pooled,whichgeneratedbyapplying𝑓𝑀𝑃 𝑘times and𝑙0𝐻𝐹istheoriginalforegroundimage(i.e.,groundtruthlabel).The backgroundimage,𝑙𝐵𝐺𝑘 ,isdefinedasfollow:

𝑙𝐵𝐺𝑘 =1−𝑙𝐻𝐹𝑘 (𝑘=0, 1,2,,𝑛), (2) Where𝑙𝐵𝐺0 istheoriginalbackgroundimageand𝑙𝐵𝐺𝑘 isabackgroundof inverting𝑙𝐻𝐹𝑘 .

Theforeground/backgroundimagesbymulti-scaleHFsareusedto generatelossesthroughcomparisonwiththecorrespondingoutputs.We used asoft Dice score asalossfunction. Let 𝐿𝑐=(𝑙𝑐0, 𝑙𝑐1,,𝑙𝑐𝑀)be ground truth labelimage (𝑙𝑐0) and𝑀 represents different scalesfor multi-scaleHF.The𝑐 representsthetypeeitherHFor BG.Then,let 𝑆𝑐=(𝑠𝑐0, 𝑠𝑐1,,𝑠𝑐𝑀)bethesegmentationresultingfromthenetwork.

𝑠𝑐0and(𝑠𝑐1,, 𝑠𝑐𝑀)istheoutputsegmentationatthesizeoftheorigi- nallabelimage𝑙𝑐0andmulti-scalelabelimages(𝑙𝑐1,, 𝑙𝑐𝑀)each.The Multi-scaleLossFunctions(MLF)canbewrittenas

MLF=

C

c=1

M

m=0

wm2(lcmscm)+𝜖

lcm+scm+𝜖, (3)

where𝐶istheforegroundorbackground,istheelement-wiseprod- uct,𝑤𝑚istheweightfor𝑚thscalelossfunction,and𝜖isasmoothing constanttopreventMLFfromdivisionby0,whichwesetas0.00001for currentnetworktraining.Thesumofallof𝑤𝑚isone.Weevaluatedvar- ioussetsof𝑤𝑚andfoundthebestsegmentationperformancewhen𝑤𝑚 wasthesameforallofthelossesasdescribedinthefollowingsection.

(4)

Fig.1.TheworkflowoftheproposedU-Netwithmulti-scalehighlightingforegrounds.

Fig.2.Illustrationofthemulti-scalehighlightingforegrounds.

3. Experiments

3.1. Evaluationmetricsandrankingsystem

WeusedthefiveevaluationmetricsthattheWMHchallengeadopted tocomparethemethodsthatparticipatingteamsdevelopedquantita- tively(detailsinTable3).LetMLbetheWMHsmanuallylabeledby expertandALbetheWMHsautomaticallylabeledbytheproposedap- proach.Thefiveevaluationmetricsareasfollow:(1)theDicesimilarity coefficient(DSC)astheoverlapindexbetweenMLandAL,(2)amod- ifiedHausdorff distance(95thpercentile;H95)astheoveralldistance betweenMLandALboundaries,(3)theabsolutepercentagevolumedif- ference(AVD)betweenthetotalWMHsofMLandAL,(4)recallasthe sensitivityindetectingindividuallesions,and(5)F1-scoreastheaver- ageofprecisionandrecallindetectingindividuallesions.Thechallenge

organizationdefinedtheindividuallesionsinbothrecallandF1as3D connectedcomponentswithinanimage.Allfivemeasurementsarepos- itivereal-valuedandthecloserthemeasuredvaluesofDSC,recall,and F1approachone,thehigherthesimilaritybetweenMLandAL.Onthe contrary,thecloserthemeasuredvaluesofH95andAVDaretozero, thehigherthesimilaritybetweenMLandAL.Table3detailsthedef- initionofthesefivemetrics.Thesemetricsforourtestingresultswere computedbythechallengeorganization.

Thechallengeorganizationproposedasystemforrankingtheover- all performanceof participatingteams.Thissystemconsistedoffour steps.First,themeanofeachmetricwascomputedoveralltestdatafor eachteam’smethod.Second,foreachevaluationmetric,theorganiza- tionsortedalloftheteamsfrombesttoworst.Next,thebestandworst teamsreceivedarankscoreofzeroandone,respectively,forthatmetric.

Otherteamswereassignedarankscorebetweenzeroandonefollowing

(5)

theirresultswithintherangeofthatmetric.Finally,thefiverankscores wereaveragedintotheoverallrankscoreindicatingtheoverallperfor- manceofthatteam.Theoverallrankscorewasusedtodeterminethe rankingofeachteamontheresultboardonthechallengehomepage (https://wmh.isi.uu.nl/results/).

3.2. Implementationdetails

The proposed network and experimental networks were imple- mentedinPythonusingTensorflow(Abadietal.,2016).Thenetworks weretrainedonfourNVIDIATitan-XpGPUswith12GBRAM.Thehyper- parametersof thenetworks weresetasfollows: mini-batchsize=30, optimizer=Adam (Kingma andBa, 2014),learning rate=0.0002, the numberofepochs=1000,andHeinitialization(Heetal.,2015).Early stoppingbasedonavalidationdatasetwasusedtoavoidoverfittingin trainingdata.Theperformanceofallnetworksconvergedwithin1000 epochs.Dataaugmentationwasappliedduringtrainingtoenhancero- bustnessin thefaceof limitedtrainingdata. Tothis end,flipping of axial-slicedimagestoeachaxisandvariousaffinetransformsincluding translation,rotation,scaling,andshearingwererandomlyapplied.The detailsoftheparametersofdataaugmentationwereasfollows:theprob- abilityofflippingeachaxis=0.5,therangeoftranslationratio=(−0.1, 0.1),therangeofrotation=(−15,15),therangeofscalingratio=(0.9, 1.1),therangeofshearing=(−18,18).Thenetworksweretrainedon an1:3ratiooforiginaldatatoaugmenteddataateachepoch.

Toachieve robust segmentation results, we applied anensemble methodandaflipaveragingtotheproposedmethod.Inatrainingstep, weperformed5-fold cross-validationwhereeachfold(n=12)images wererandomlyandequallysampledfromeachsitedatasetinthewhole trainingdataset(n=60).Then,validatingeachfold,wetrainedthepro- posednetworkusingtheotherfourfolds(n=48),resultingin5networks trainedseparately.Inatestingstep,wefirstflippedeachindividualim- agewithrespect tothex-axis,y-axis,andxy-axis,whichgenerated3 flippedimagesperindividual.Then,eachofthefourimageswasused asinputtoanetworkgeneratedfrom5-foldcross-validation.Theflip averagingwasthemajorvotingoffouroutputsfromanoriginalinput andthreeinputsflippedtothex-axis,y-axis,andxy-axisinanetwork.As aresult,thefinaloutputwasthemajorvotingoffiveoutputsgenerated fromtheflipaveragingofeachnetwork.

3.3. Theevaluationoftheimportanceofmulti-scalehighlighting foregrounds

Toinvestigatetheimportanceofmulti-scaleHFinWMHssegmen- tation,weevaluatedourmodelwithvariousparametersofmulti-scale HF,whichincluded1)thetypeofpoolingforHF,and2)theweightsof lossesatoutputlayers.Weperformedthisevaluationthroughacross- scannervalidationusing thetrainingdatasets(60subjects)of WMHs segmentationchallenge.Thedatasetfortheevaluationwassplitintoa trainingdataset(1stand2ndsites:40subjects)andavalidationdataset (3rdsite:20subjects).Ourevaluationresultsarebasedontherawnet- workoutput(i.e.,withoutensemblingandflipaveraging).

ThetypeofpoolingforHF:Wecomparedthenetworkusingthepro- posedmax-pooledlabelimageswiththecorrespondingaverage-pooling version.Wedownsampledthegroundtruthlabelimageintothelower resolutionsattheintermediate outputlayers byrepeatedlyusinga2

×2averagepoolingoperationwithstride2.Wethengeneratedtheir hard labelsbythresholdingthedownsampled softlabels at 0.5.The networkforthisexperimentwasequippedwitheitherfourmax-pooled orfouraverage-pooledlabelimagesandthesamelossweightsatall oftheoutputlayers.Toavoidcherry-pickingresultsbyrandomseed, weperformedacross-scannervalidation(3-foldcrossvalidation)with 5runsof randominitialization(resultingin15 trainednetworks for eachtestedmethod)fortheoriginalU-Net,U-NetwithHF,andU-Net withaverage-pooledlabels(AVG).Eachfoldrepresentedeachscanner dataset:i.e.,theUMCdataset,NUHSdataset,orVUdataset.Wetreated

theresultsofeachfoldasanindividualsample.Weperformedpairedt- testsforthecomparisonamongthetestedmethodsandcalculated95%

confidenceinterval(CI)accordingtothepairedsampletest.

The weights oflosses about allof the outputlayers: In this experi- ment,weinvestigatedtheimpactofthesetof𝑤𝑚 inEq.(3).Thenet- workwithfourmax-pooledlabelimageswasusedforthisexperiment.

Three setsof𝑤𝑚 weretested:1)𝑤0=155,𝑤1=154,𝑤2=153,𝑤3=152, and𝑤4= 1

15;2)𝑤0= 3

15,𝑤1= 3

15,𝑤2= 3

15,𝑤3= 3

15,and𝑤4= 3

15;3)Fi- nally,𝑤0= 1

15,𝑤1= 2

15,𝑤2= 3

15,𝑤3= 4

15,and𝑤4= 5

15;Forthisexper- iments,wetrainedthreenetworks(onlyonerunforeachfold;atotal ofthreeruns)andaveragedtheresultsonapatient-leveltoinvestigate statisticaldifferences.

3.4. Clinicalutilityevaluation

Theaimherewastoassesstheclinicalutilityoftheproposedmethod byevaluatingwhethertheproposedHF-basedautomatedvolumetryis abiomarkerof thedementiaseverity(i.g.,cognitiveperformancede- cline)oradiagnosticmeasurementofdementia.Accordingly,weana- lyzedtheassociationoftheWMHvolumeswithcognitiveperformance scoresorwiththediagnosisamongcognitivelynormal(CN),mildcog- nitiveimpairment(MCI),andearlyAlzheimer’sDisease(AD).Also,to analyzetheeffectofWMHvolumeonsubjectdiagnosis(CN,MCI,and AD),wecomparedclassificationperformancesusinglogisticregression underthreedifferentconditions(WMHvolumeonly,clinicalvariables only,andWMHvolumeandclinicalvariablescombined).

3.4.1. ADNIdatasetandpreprocessing

WeemployedthedatasetsfromtheAlzheimer’sDiseaseNeuroimag- ingInitiative(ADNI)database(http://adni.loni.usc.edu).Werandomly selected243ADNIsubjectswhocompletedcognitiveevaluationsand MRIscansattheirbaselinevisits.Thedatainclude73CN,115MCI, and55ADsubjectsatbaselinediagnosis.WeusedMini-MentalState Examination(MMSE),Alzheimer’sDiseaseAssessmentScale-Cognitive Subscale (ADAS-Cog),andClinical DementiaRatingScalesum(CDR sum)ascognitiveperformanceevaluationsthatfollowedastandardized protocol(Petersenetal.,2010).ThescorerangesofMMSE,ADAS-Cog, andCDRsumwere0–30,0–70,and0–18,respectively.AhigherADAS- CogoraCDRsumscoreindicateslowercognitiveperformance.Onthe otherhand,alowerMMSEmeanslowercognitiveperformance.Each individualMRIscanconsistedofasetofaT1wimageandFLAIRimage acquiredaxial-plane.Table2detailsdemographicandclinicalinforma- tion.

We pre-processedall imagesthrough thesamesteps thatarede- scribedin Section2.1. Challengedatasetandpre-processingfor con- sistentdataprocessing.Wedidnotperformthecroppingorpaddingof theimagesto200×200axialplanesbecauseinputtinganarbitrarysize imageisacceptedinfullyconvolutionalnetworksattesttime.

3.4.2. Statisticalanalysis

Weusedagenerallinearmodel(GLM)toassesswhetherWMHvol- umescomputedusingourmethodwereassociatedwiththecognitive performancescoresorthediagnosisofsubjects.Weincludedeachtype ofthecognitivescoresorthediagnosis(i.e.,CN,MCI,andAD)asade- pendentvariableandtheWMHsvolumeasanindependentvariable.In theGLM,wealsoincludedage,cardiovascularrisk,education,gender, ApoE4genotype,andraceascovariatestocollectfortheirconfound- ingeffects.Thecardiovascularriskscorerangedfrom0to5bycount- ingthefollowingdiseasesorcharacteristicsindividually:hypertension, stroke,smoking,diabetesmellitus,andcardiovasculardisease.Tomiti- gatethepossibleissueoftheskeweddistributionofthecognitivescores andWMHvolumes,weappliedthesquareroottransformationtoMMSE andCDRsum,andthelog-transformationtoWMHvolumesassuggested inCarmichaeletal.(2010).Then,allthetransformedvaluesandADAS- Cogwerenormalizedusingthez-scoretransformation.Allthestatistical testswereimplementedusingMatlab2019b.

(6)

Table2

Theoverviewofthecharacteristicsof243ADNIsubjects.

Characteristic

Total,Mean (SD)

Diagnosis at baseline, Mean (SD) Cognitively normal MCI AD

Sample, No. 243 73 115 55

Age, y 73 (7.5) 75 (6.5) 71 (7.9) 76 (7.1)

CV risk 1.7 (1.0) 1.7 (1.1) 1.7 (1.0) 1.6 (1.0) Education, y 16 (2.7) 16 (2.5) 17 (2.7) 16 (2.7) Male, No. (%) 123 (51) 36 (49.3) 56 (48.7) 31 (56.4) ApoE4 genotype, 0/1/2 114/100/29 48/23/2 50/49/16 16/28/11 Race

White 218 63 105 50

Other 25 10 10 5

MMSE score 27 (2.8) 29 (1.3) 28 (2.1) 23 (2.1) ADAS-Cog score 11 (6.8) 6 (3.1) 10 (4.8) 20 (5.8) CDR sum 1.75 (1.99) 0.12 (0.53) 1.4 (1.1) 4.6 (1.7) WMHs, c m 3 9.94 (11.74) 9.13 (14.33) 8.72 (9.04) 13.58 (12.41)

Table3

Listofevaluationmetricsandtheirdefinitions.Theorganizationofthechallengedefined individuallesionsforrecallandF1as3Dconnectedcomponents.𝑁meansthenumber of3Dconnectedcomponents. ̂𝑑(𝑋,𝑌)meansmax

𝑥𝑋min

𝑦𝑌𝑑(𝑥,𝑦),where𝑑(𝑥,𝑦)isthedistance betweenxandypoints.𝑉𝑀𝐿and𝑉𝐴𝐿meantheWMHvolumecomputedbymanual segmentationandourmethod,respectively.

DSC H95 AVD Recall F 1

Equation 𝐹𝑃+2𝐹𝑇 𝑃𝑁+2𝑇 𝑃 max { ̂𝑑 ( 𝑋, 𝑌 ) , ̂𝑑 ( 𝑌 , 𝑋 )} |𝑉𝑀𝐿𝑉𝑉𝐴𝐿|

𝑀𝐿

𝑁𝑇 𝑃 𝑁𝑇 𝑃+𝑁𝐹 𝑁

2𝑁𝑇 𝑃 2𝑁𝑇 𝑃+𝑁𝐹 𝑁+𝑁𝐹 𝑃

Abbreviations–DSC:theDiceSimilarityCoefficient;H95:amodifiedHausdorff dis- tance(95thpercentile);AVD:theabsolutepercentagevolumedifference;Recall:the sensitivityfordetectingindividuallesions;F1:F1-scoreforindividuallesions;TP:true positive;FN:falsenegative;FP:falsepositive.

3.4.3. Classificationanalysis

WeassessedwhetheralterationsinWMHvolumewereusedtoclas- sifyanunseenindividualintoCN, MCI,orAD.Tothisend,weused logisticregressionasaclassifierandperformedtheclassificationunder threedifferentconditionswhereinputfeaturesvaried:1)WMHvolume only;2)clinicalvariablesonly(age,cardiovascularrisk,education,gen- der,ApoE4genotype,andrace;thesearementionedinSection3.4.2), and3)WMHvolumeandclinicalvariablescombined.Theclassification wasevaluatedusingaleave-one-outstrategy.Wecalculatedthereceiver operatingcharacteristic(ROC)curvesfromtheclassificationresulting infromeachofthethreeconditionsandcomparedtheirareaunderthe curve(AUC)values.ClassificationanalysiswasprocessedusingMatlab 2019b.

4. Results

Inthissection,wecompareourresultswiththeresultsofthetop 2nd–5thalgorithmslistedintheWMHSegmentationChallenge asof April30,2020.Wealsoshowtheinfluenceofmulti-scaleHFonthepro- posednetwork’ysperformance.Finally,theresultsoftheclinicalanaly- sisarepresented.

4.1. ResultsoftheWMHsegmentationchallenge

As of March 1st2020, the following four teams, as well as our team, were listed as the top five in the challenge leaderboard: 1) sysu_media_2– deep2Dmulti-scalestackedU-Netandensemblelearning;

2)sysu_media– fullyconvolutionalensembleneuralnetworks(Lietal., 2018);3)anonymous_20200413– brainatlasguidedattentionU-Net;

and4)coroflo– multi-dimensionalconvolutionalgatedrecurrentunits andensemblelearning.Moredescriptionofeachmethodisavailableon thechallengewebsite.

Table4showstheresultsofthetopfiveteamsaboutfivemetrics(i.e., DSC,H95,AVD,recall,andF1).Boldtextindicatesthebestperformance

amongallalgorithmsforthegivenmetric.Ourmethodpgsachievedthe bestoverallperformance.

TheresultsofourmethodaredetailedinTable5.Partofthetest dataset(n=90)wasfromthreesites(Utrecht,Singapore,andAMSGE3T) thatmatchedthesitesfromwhichthetrainingdatasetswereacquired.

Whereastheother20subjectswerefromAMSGE1.5TandAMSPETMR whichwerecompletelyunseen.However,theproposedmethoddemon- stratedsimilarperformanceinthisunseendatasetrelativetothethree- sitedataset.

4.2. Theinfluenceofthemulti-scalehighlightingforegroundsandother parametersonsegmentationaccuracy

Weevaluatedthesegmentationaccuracyoftheproposednetwork underdifferentconfigurationsofthemulti-scaleHF.Table6showsthe segmentationresultsand95%confidenceintervalsfor15multipleruns oforiginalU-Net,U-NetwithHF,andU-NetwithAVG.TheU-Netwith HFachievedthebestperformanceacrossallmetricscomparedtothe othertwomethods.Table7showstheeffectoftheweightsoflossesat alltheoutputlayersonsegmentationaccuracy.Thenetworkwithequal weightsamongall thefiveoutputlayersoutperformed thenetworks usingotherarrangementsoftheweights.

4.3. Clinicalutilityoftheproposedsegmentationapproach

WMHvolumescomputedusingourmethodweresignificantlyasso- ciatedwithcognitivescoresandgroupdiagnosisoftheANDIsubjects (Table8;MMSE,CDRsum,andgroupdiagnosis:Bonferronicorrection p-value <0.05; ADAS-Cog:FDRcorrectionp-value< 0.05). Inother words,thebiggerWMHwas,themorewascognitivedeclineandthe moreseverewasthediagnosisofapatientobserved.Furthermore,our linearmodelshowedthata1-standarddeviation(SD)increaseinWMH volumecorresponded0.133and0.165SDincreasesinADAS-Cogand CDRsumscores,respectively,anda0.176SDdecreaseinMMSEscore.

(7)

Table4

ComparisonofourmethodwithothermethodsintheMICCAIWMHsegmentationchal- lenge.Boldfontsindicatethebestresultamongallteams.

Team Rank DSC H95 (mm) AVD (%) Recall F 1

1 pgs (ours) 0.0185 0.81 5.63 18.58 0.82 0.79 2 sysu_media_2 0.0187 0.80 5.76 28.73 0.87 0.76 3 sysu_media 0.0288 0.80 6.30 21.88 0.84 0.76 4 anonymous_20200413 0.0314 0.79 6.17 22.99 0.83 0.77

5 coroflo 0.0493 0.79 5.46 22.53 0.76 0.77

Abbreviations–DSC:theDiceSimilarityCoefficient;H95:amodifiedHausdorff distance (95thpercentile);AVD:theabsolutepercentagevolumedifference;Recall:thesensitivity fordetectingindividuallesions;F1:F1-scoreforindividuallesions.

Fig.3. Thereceiveroperatingcharacteristic(ROC)curvesforlogisticregressionabouttwotypesoftheclassification(CNvsADandMCIvsAD).

Table5

Resultsaboutfivemetricsoftheproposedmethodinthetestdatasetofthe challenge.

DSC H95 (mm) AVD (%) Recall F 1 Utrecht ( n = 30) 0.81 6.76 18.62 0.81 0.75 Singapore ( n = 30) 0.84 4.71 15.69 0.83 0.80 AMS GE3T ( n = 30) 0.80 3.74 22.03 0.83 0.82 AMS GE1.5T ( n = 10) 0.74 9.30 22.09 0.75 0.77 AMS PETMR ( n = 10) 0.80 7.00 13.25 0.82 0.81 Weighted average 0.81 5.63 18.58 0.82 0.79 rank [0…1] 0.000 0.004 0.003 0.086 0.000

EventhoughfeedingWMH volumealonetotheclassifierdid not achieveabetterclassificationcomparedtousingclinicalvariablesonly (Fig.3),WeobservedthatcombiningWMHvolumewithclinicalvari-

ablesresultedinthebestclassificationperformancesbetweenCNand AD(AUC=0.75)andbetweenMCIandAD(AUC=0.67).Intheclassifica- tionofCNvs.MCI,theclassificationperformancewasnotimprovedby combiningWMHvolumeandclinicalvariables(AUC=0.58)compared tousingclinicalvariablesonly(AUC=0.59).

5. Discussion

WeproposedanewU-Netvariantwithmulti-scalehighlightingfore- grounds(HF)inthispaper.Ournetworkframeworkwasdesignedto improvethedetectionoftheWMHvoxelsinvolvingadegreeofpar- tial volumeeffects.Weaddedthemulti-scalelabelimagesthatwere max-pooledbyHFtoaU-NetforWMHssegmentation.Theproposed methodhasbeenplacedatthetoprankfortheoverallscoreintheMIC- CAIWMHSegmentationChallenge.TheWMHvolumecomputedusing ourautomatedapproachwassignificantlyassociatedwithcognitiveper-

Table6

Thesegmentationresultsand95%confidenceintervalsforthe15multiplerunsoforiginalU-Net,U-NetwithHF,andU-Netwith average-pooledlabels(AVG).Boldfontsindicatethebestresult.

DSC HD95 (mm) AVD (%) recall F 1-score

Original U-Net Mean ± SD 0.8033 ± 0.0204 6.2509 ± 1.6870 18.8337 ± 6.086 0.7261 ± 0.0538 0.7273 ± 0.0466 95% CI [0.0060, 0.0089] [ 0.7435, − 0.3917] [ 2.8959, 1.2721] [0.0212, 0.0440] [0.0228, 0.0298]

Original + AVG Mean ± SD 0.8037 ± 0.0191 6.9537 ± 2.5907 17.6740 ± 6.4449 0.7373 ± 0.0552 0.7340 ± 0.0513 95% CI [0.0055, 0.0085] [ 1.7264, − 0.8143] [ 2.0247, 0.1762] [0.0086, 0.0341] [0.0126, 0.0266]

Original + HF Mean ± SD 0.8107 ± 0.0203 5.6833 ± 1.8080 16.7497 ± 6.9472 0.7587 ± 0.0456 0.7536 ± 0.0436

p-valuecorrectedbyFDR<0.05betweenU-NetwithHFandother(pairedt-test).

Abbreviation– SD:standarddeviation;CI:confidenceinterval.

(8)

Table7

Influenceofvariousweightingamongalltheoutputlayerlossesonthesegmentationaccuracy.Boldfontsindicatethebest result.

DSC H95 (mm) AVD (%) Recall F 1

Original + HF (all 0.2) 0.8139 ± 0.0878 5.0383 ± 4.3894 15.893 ± 24.845 0.7725 ± 0.1033 0.7575 ± 0.0903 Original + HF (1/15 →5/15) 0.8095 ± 0.0952 5.4751 ± 4.9443 17.890 ± 29.337 0.7707 ± 0.1055 0.7523 ± 0.0956 Original + HF(5/15 →1/15) 0.8093 ± 0.0912 6.1427 ± 5.7800 16.709 ± 18.847 0.7496 ± 0.1020 0.7595 ± 0.0746

Table8

SummaryoftheresultsofthegenerallinearmodelusingWMHvolumesandcovariatesasindependentvariables andcognitivescoresandgroupdiagnosisasdependentvariables.

Cognitive scores

Group(CN/MCI/AD)

MMSE ADAS-Cog CDR sum

β p-value β p-value β p-value β p-value

WMH volume 0.176 < 0.001 0.133 0.017 0.165 0.009 0.131 0.008

Age

CV risk

Education 0.062 < 0.001 0.061 0.001 0.047 0.031

Gender

ApoE4 genotype 0.296 < 0.001 0.184 < 0.001 0.421 < 0.001 0.299 < 0.001

Race

p-valuecorrectedbyFDR<0.05.

p-valuecorrectedbyBonferronimethod<0.05.

formancescores(MMSE,ADAS-Cog,andCDR-sum)andthedementia diagnosis(CN,MCI,andAD).Furthermore,theautomatedWMHvol- umetryimprovedtheclassificationofunseensubjectsintoCN,MCI,or AD.

5.1. ComparisontoothermethodsevaluatedintheWMHsegmentation challenge

Ourmethodhasachievedthebestoverallevaluationscores,thehigh- estdicesimilarityindex,andthebestF1-scoreintheMICCAIWMHSeg- mentationChallengeamongallthelisted39methods.Ourmethodhas alsoachievedthetop5forotherevaluationmetrics(Hausdorff distance 95:2nd,averagevolumedifference:3rd, andrecall:5th).Giventhat theDicesimilarityindexrepresentstheoverlapbetweentheautomated andmanualsegmentationandtheHausdorff distancerepresentstheir boundarygap,ourachievementofhighaccuracyinthesetwoindices demonstratesthattheproposedmethodsuccessfullydetectstheWMH voxelsthatarelocatedeitherattheboundaryoftheWMHorinsmall WMHvolumesandconsequentlyexposedtoPVE.Indeed,avisualin- spectionofindividualsegmentationsshowsthesuperiorsegmentation ofsuchpartialvolumevoxelsintheproposedU-Netwithmulti-scaleHF comparedtothestandardU-Net(Fig.4).InFig.4,theproposedmethod moreaccuratelysegmentedWMHsontheboundaryofmanualWMHs (Subject1)aswellassmallclustersofWMHs(Subjects2-4)compared tothestandardU-Net.Despitearelativelylowrecall(5thrank,meaning relativelymorefalse-negativevoxelsdetected),weachievedthebestF1- scorewhichistheharmonicmeanofrecallandprecision.Thissuggests twothings.First,ourmethodyieldedahighertrue-positiverateanda lowerfalse-positiveratethanothermethods.Second,ourmethodmay havedifficultyin detectingsomeWMHvoxelseventhoughitdetects thesmallclusterandtheborderofWMHsbetterthanotherapproaches.

5.2. Configurationofthemulti-scalehighlightingforegrounds

Thesegmentationperformancewasthebestwhensettingequalall theweightsofdifferentscaleHFssuggestingthatthefeaturemapsgen- eratedfromallscalesareequallyimportantfordeeplearningofWMHs segmentation.Our method achievedstatistically significantimprove- mentsin allmetricscompared totheoriginalU-Net, andsignificant improvementsinallmetricsexceptedaveragevolumedifferencecom- paredtotheU-NetwithAVG(Table6).AlthoughU-NetwithAVGtends

tohavebetterperformanceinallmetricsthantheoriginalU-Net,these differences didnotreachthestatisticalsignificance.These resultsin- dicatethatthesuperiorperformanceofU-NetwithHFdidnotmerely resultfromaluckyrandomseed.ThesignificantimprovementsinDice scoreandF1-scorebyourmethodsuggestthattheproposedmethodcan moreaccuratelydetectWMHclustersthatarehardtodetectbyother methods,suchassmallWMHsinvolvinglargepartialvolumeeffects.

Basedontheseresults,therefore,weconfirmthattheproposedHF approachishighlyadvantageoustoWMHssegmentationandlikelyto segmentationofbrainlesionswhichhavesimilarcharacteristics(size, shape,orintensity),suchasmultiplesclerosis(Weedaetal.,2019),mi- crobleeds(Seghieretal.,2011),andperivascularspace(Ballerinietal., 2018).Sinceourmethodusesmulti-scalelabelimagesemphasizingfore- groundvoxels(i.e.,unbalanceddata),itcanbeusedincombinationwith variouslossfunctions(boundaryloss,(Kervadecetal.,2019);focalloss, (Linetal.,2017);Tverskyloss,(Salehietal.,2017);andfocalTversk loss,(AbrahamandKhan,2019))toovercometheproblemofunbal- anceddata. Itmaybeworth tryingtocombinetheabove-mentioned lossfunctionsandHFinvariouswaysandcomparetheresultstofinda lossfunctionthatfitswellwithHF.Additionally,ourmethodcanalso simplybe appliedtovariousnetworks basedontheencoder-decoder structuresuchasU-Netvariantsorothervariantsofdeepsupervision methods.

5.3. Clinicalevaluation

PreviousstudiesshowedthattheincreaseinWMHsvolumerelatesto cognitivedecline(Barberetal.,1999;Duboisetal.,2014;Habesetal., 2016;Leeetal.,2016;PrinsandScheltens,2015).Onthebasisoffind- ingsinthesestudies,weevaluatedtheclinicalutilityoftheproposed approachbyinvestigatingwhetherautomatedWMHvolumetrycanpre- dictcognitiveperformancedeclinesorthediagnosisofasubject(CN, MCI,andAD).Ourresultsdemonstratethattheautomaticallycomputed WMHvolumeissignificantlyassociatedwithcognitiveperformancein thedirectionwehypothesized(Table8).

Furthermore,wehypothesizedthatfeedingtheautomaticallycom- putedWMHvolumestoaclassifierindividuallycandiagnosesubjects.

Theresultsofthisexperimentshowedthatclassificationperformancefor CNvs.ADandMCIvs.ADcanbeimprovedusingthecombinedfeature- setofWMHvolumesandclinicalvariables.Theseresultsareconsistent

(9)

Fig.4.Segmentationresultsinthetrainingdataset.

Fromtoptobottomareaxialslicesoffourdifferent subjects.FromlefttorightareFLAIRimage,ground truth,theresults fromthestandard U-Net,andthe resultsfromtheproposedU-Netwithmulti-scaleHF method.YellowvoxelsindicateWMHssegmentedus- ingeachmethod.Colorboxesshowourapproach’sim- provementoffalse-positives(orangeborder)orfalse- negatives(greenborder)observedinthestandardU- Net.

withpreviousfindingsthatWMHsprovideanimagingmarkerforAD (Habesetal.,2016;Leeetal.,2016;PrinsandScheltens,2015).

6. Conclusions

Inthecurrentstudy,weproposedaU-Netwithmulti-scalehighlight- ingforegrounds(HF).Ourvariousevaluationsshowthattheproposed method improvesdetectingWMH voxels withpartialvolume effects asintended.However,itstillremainschallengingforourmodeltore- tainbothhighprecisionandrecall.Attention-basedmodels(Chenetal., 2016b;Wooetal.,2018)thateffectivelylearnimportantcharacteristics ofatargetstructureforsegmentationcanpotentiallybeawaytosolve thisissue.ToimproveWMHssegmentation,integratinganattention- basedmodelintodeepneuralnetworksisthussuggestedinthefuture.

Ourclinicalevaluationdemonstratestheclinicalutilityofourmethod.

Yet,theindividualdiagnosisof unseensubjectsusingWMH volumes aloneisbelowtheclinicalstandard.Inafurtherstudy,otherinforma- tionofWMHssuchasthelocationordistributionofWMHvolumesand thelongitudinaltrajectory ofWMH volumechangeswouldbe incor- poratedfortheimprovementofindividualdiagnosis.Theimplementa- tionofourproposedmethodisavailableatDockerhub(Merkel(2014); https://hub.docker.com/r/wmhchallenge/pgs).

Dataandcodeavailabilitystatements

The data that support the findings of this study are openly available in 2017 MICCAI WMH Segmentation Chal- lenge homepage at https://wmh.isi.uu.nl/data/, reference number (Kuijf, 2019). Our code also is openly available at https://hub.docker.com/r/wmhchallenge/pgs.

Creditauthorshipcontributionstatement

GilsoonPark:Methodology,Software,Formalanalysis,Writing– originaldraft,Visualization.JinwooHong:Methodology,Formalanal- ysis. BenA. Duffy:Methodology, Software. Jong-Min Lee:Concep- tualization,Methodology,Supervision.HosungKim:Formalanalysis, Methodology,Writing– originaldraft.

Acknowledgments

ThisresearchwassupportedbytheBio&MedicalTechnologyDevel- opmentProgramoftheNationalResearchFoundation(NRF)&funded by theKoreangovernment(MSIT) (No.2020M3E5D9080788)anda grantoftheKoreaHealthTechnologyR&DProjectthroughtheKorea HealthIndustryDevelopmentInstitute(KHIDI),fundedbytheMinistry of Health & Welfare, RepublicofKorea (grantnumber: HI19C0745) andagrantoftheNationalInstitutesofHealthgrants(P41EB015922, U54EB020406,U19AG024904),BrightFocus(A2019052S).

References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., 2016. Tensorflow: Large-Scale Machine Learning on Hetero- geneous Distributed Systems. arXiv preprint arXiv:1603.04467.

Abraham, N. , Khan, N.M. , 2019. A novel focal tversky loss function with improved atten- tion u-net for lesion segmentation. IEEE, pp. 683–687 .

Abraham, H.M. , Wolfson, L. , Moscufo, N. , Guttmann, C.R. , Kaplan, R.F. , White, W.B. , 2016. Cardiovascular risk factors and small vessel disease of the brain: Blood pressure, white matter lesions, and functional decline in older persons. J. Cereb. Blood Flow Metab. 36, 132–142 .

Ashburner, J. , Friston, K.J. , 2000. Voxel-based morphometry —the methods. Neuroimage 11, 805–821 .

참조

관련 문서

(2017) Efficacy and tolerability of rivastigmine patch therapy in patients with mild-to-moderate Alzheimer’s dementia associated with minimal and moderate ischemic

ABSTRACT The translocator protein (TSPO) is one of the important targets for Positron Emission Tomography (PET) imaging because it is associated with brain cancer, stroke,

Purpose : To evaluate white matter abnormalities on diffusion tensor imaging (DTI) in patients with mild Alzheimer dis- ease (AD) and mild cognitive impairment (MCI), using

Assessment and estimation of particulate matter formation potential and respiratory effects from air emission matters in industrial sectors and cities/regions, Journal

O bjectives:The aim of this study was to compare activities of daily living (ADLs) according to degenera- tive changes in brain [i.e., medial temporal lobe atrophy (MTA),

Diffusion tensor images were obtained to investigate the damage of brain white matter in non-smokers and an adequate drinking group (less than 10 points)