ContentslistsavailableatScienceDirect
NeuroImage
journalhomepage:www.elsevier.com/locate/neuroimage
White matter hyperintensities segmentation using the ensemble U-Net with multi-scale highlighting foregrounds
Gilsoon Park
a, Jinwoo Hong
a, Ben A. Duffy
b, Jong-Min Lee
a,∗, Hosung Kim
baDepartment of Biomedical Engineering, Hanyang University, Seoul, South Korea
bUSC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA 90033, USA
a r t i c le i n f o
Keywords:
White matter hyperintensities Segmentation
Deep learning U-Net
Multi-scale highlighting foregrounds
a b s t r a ct
Whitematterhyperintensities(WMHs)areabnormalsignalswithinthewhitematterregiononthehumanbrain MRIandhavebeenassociatedwithagingprocesses,cognitivedecline,anddementia.Inthecurrentstudy,wepro- posedaU-Netwithmulti-scalehighlightingforegrounds(HF)forWMHssegmentation.Ourmethod,U-Netwith HF,isdesignedtoimprovethedetectionoftheWMHvoxelswithpartialvolumeeffects.Weevaluatedthesegmen- tationperformanceoftheproposedapproachusingtheChallengetrainingdataset.Thenweassessedtheclinical utilityoftheWMHvolumesthatwereautomaticallycomputedusingourmethodandtheAlzheimer’sDisease NeuroimagingInitiativedatabase.WedemonstratedthattheU-NetwithHFsignificantlyimprovedthedetection oftheWMHvoxelsattheboundaryoftheWMHsorinsmallWMHclustersquantitativelyandqualitatively.Upto date,theproposedmethodhasachievedthebestoverallevaluationscores,thehighestdicesimilarityindex,and thebestF1-scoreamong39methodssubmittedontheWMHSegmentationChallengethatwasinitiallyhosted byMICCAI2017andiscontinuouslyacceptingnewchallengers.Theevaluationoftheclinicalutilityshowed thattheWMHvolumethatwasautomaticallycomputedusingU-NetwithHFwassignificantlyassociatedwith cognitiveperformanceandimprovestheclassificationbetweencognitivenormalandAlzheimer’sdiseasesubjects andbetweenpatientswithmildcognitiveimpairmentandthosewithAlzheimer’sdisease.Theimplementation ofourproposedmethodispubliclyavailableusingDockerhub(https://hub.docker.com/r/wmhchallenge/pgs).
1. Introduction
Whitematterhyperintensities(WMHs)appearasabnormal hyper- signalswithinthewhitematterregiononthehumanbrainMRIinclud- ingT2-weighted (T2w),protondensity(PD),andfluid-attenuatedin- versionrecovery(FLAIR)imaging.Theseatypicalsignalsmostlyresult fromagingprocesses suchasdemyelinationandaxonalloss,bothas aresultofcerebralsmallvesseldiseases(PrinsandScheltens,2015).
Theyarefrequentlyobservedintheelderlyandtendtoincreaseinsize andnumberwithage(Habesetal.,2016),whileatthesametimebe- ingassociatedwithseveralpotentialvascularriskfactors,particularly hypertension(Abrahametal.,2016).
Based on quantitative analyses of WMHs (Barber et al., 1999; Duboiset al., 2014; Habes etal., 2016; Leeet al., 2016; Prinsand Scheltens,2015),previousstudiesshowedthatthepresenceandsever- ityofWMHs areassociatedwithdementia(Duboisetal.,2014)and increaserisk ofconditions suchasAlzheimer’sdisease(Habeset al., 2016;Leeetal., 2016),vasculardementia,anddementiawithLewy bodies(Barberetal.,1999).Thesestudiessuggestthatthequantitative
∗Correspondingauthor.
E-mailaddress:[email protected](J.-M.Lee).
characterizationofWMHs playsanimportantroleinvariousclinical researchintoneurologicaldisorders.
ManualdelineationofWMHsprovidesground-truthforvolumetric quantificationofWMHs.However,itisalaborious,tedious,andtime- consuming taskandrequiresahighlevelofexpertise toavoidunac- ceptablelevelsofintra-andinter-ratervariability.Besides,thisbecomes moreproblematicwiththesizeofadataset,encouragingautomatedseg- mentation.
Anumberof methodshavebeenproposedtosegmentWMHs au- tomatically.Jeonetal.(2011)attemptedWMHssegmentationbased on theMarkov random fieldand anintensity thresholding method.
Otherstudieshavedevelopedk-nearestneighbors-basedclusteringap- proaches (Griffanti etal., 2016; Jianget al., 2018;Steenwijk etal., 2013).Thesemethodsusedvariousfeatures(e.g.,spatialinformation, intensityinformation,andtexture)fromT1wandFLAIRimagesasthe inputoftheclusteringalgorithm.Dadaretal.(2017)evaluatedvari- ousclassifierssuchaslogisticregression,supportvectormachines,de- cisiontrees,andrandomforests.Theyobservedthattherandomfor- estachievedthebestperformance.Recently,convolutionalneuralnet- works(CNN), aclassof deepneuralnetworks,have rapidlybecome
https://doi.org/10.1016/j.neuroimage.2021.118140.
Received15May2020;Receivedinrevisedform13April2021;Accepted16April2021 Availableonline3May2021.
1053-8119/© 2021TheAuthors.PublishedbyElsevierInc.ThisisanopenaccessarticleundertheCCBY-NC-NDlicense
aprimarymethodinmedicalimagesegmentationandshownremark- ableperformance(Litjensetal.,2017).ForthesegmentationofWMHs, Moeskopsetal. (2018)proposed aCNNmodelbasedon multi-scale patchesextractedfromT1w,T1w inversion-recovery,andFLAIRim- ages.Rachmadietal.(2018)addedglobalspatialinformationtothe patchthatwasusedasaninputtoaCNNtoimprovethesegmentation ofWMHs.
Theuseofdifferentevaluationmetrics(e.g.,Dice,Jaccard,Haus- dorff indices),as wellas evaluations againstmanually WMHsdelin- eationsbydifferentexperts,makesitdifficulttocomparetheperfor- manceofsegmentationmethodsfromvariousstudiessystematically.To addresstheseissues,theWMHSegmentationChallenge2017washeld forastandardizedcomparisonoftheautomaticsegmentationofWMHs (Kuijfetal.,2019)inconjunctionwiththe20thInternationalConfer- enceonMedicalImageComputingandComputerAssistedIntervention (MICCAI)(Descoteauxetal.,2017).TheChallengeprovidedapublic platformtostandardizetheevaluationofWMHssegmentationmethods basedonaunifieddatasetofMRI,evaluationmetrics,andexpertlabel- ing.Twentyteamsproposednewmethodsandperformedtrainingand testingonthedatasetprovided.Byusinganensembleapproach,which isagoodwaytoreduceover-fittingofdeeplearningalgorithms,withthe U-Netwhichisadeepencoder-decoderarchitectureandhasskipcon- nectionsconcatenatingthefeaturemapsintheencodertothefeature mapsinthedecoder,Lietal.(2018)achievedthebestperformance.
Partialvolumeeffects(PVE)onMRIimagesduetolimitedspatial resolutionisassociatedwiththeproblemwhereonepixel/voxelrepre- sentsasignalofamixtureofdifferentbraintissues.Thiseffectbecomes strongatvoxelsoftheboundarybetweenstructureshavingdifferenttis- suecharacteristicsorbetweenalesionandthesurroundingbraintissue.
UnidentificationofthevoxelwithPVEmayresultinunderestimating thesegmentationofthetargetlesion,inparticularforsmalllesionsand lesionsofboundaries.
Suchaproblemmaynotbeinevitablefordeepneuralnetworkap- proaches.Indeed,wetestedastandardU-NetmethodonWMHsegmen- tationandobservedtheunsuccessfulclassificationofvoxelsplacedat boththeedgesofWMHsorsmallWMHs.Thepredictedprobabilityof foregroundislowfortheWMHvoxelswithstrongPVEduetotheirfea- turesuncertainty,andthenetworkstendtofailtoidentifythosepartial volumeWMHs(PV-WMHs).Ingeneral,thevolumeofthePV-WMHsis relativelysmallcomparedtothevolumeoftheoverallWMHs.Thusthe PV-WMHscontributelittletothenetworkloss,resultinginthenetwork insufficientlylearnthePV-WMHs.
Weproposethenetworktobetrainedusingamulti-scaleapproach ofhighlightingforegrounds(HF).InthestandardU-Net,thetrainingis accomplishedbyminimizingtheDicelossattheoutputlayer,whichis computedbycomparingtheposteriorprobabilitymapwiththeground truthlabel.ToemphasizethelabelvoxelsonWMHboundaries,which wouldlikelylieonPV-WMHvoxels,weproposetocomputeandmin- imizetheDicelossesattheparticularintermediatedecoderlayersby comparingtheiroutputprobabilitymapswiththecorrespondinglabels generatedusingtheproposedHFmethod.TheHFapproachdownsam- plesthegroundtruthlabelssequentiallybyapplying2×2max-pooling withstride2,resizingthelabelstothesizeofeachofthedecoderlayers.
Theapproachemphasizestheinfluenceofthevoxelslyingonthelesion boundariesorconsistingofsmalllesionsonthetrainingofanetwork.
WeusedthelabelsgeneratedusingtheproposedHFmethodtotrain auxiliaryclassifiersintheintermediatedecoderlayers.Trainingbyin- sertingauxiliaryclassifiersintheintermediatelayersisknownasdeep supervision.Innaturalimageclassification,GoogLeNet(Szegedyetal., 2015) is based on this method. However, this study did not sys- tematically evaluate the use of auxiliary classifiers. Independently, Leeetal.(2015)proposeddeeply-supervisednetworksthatwerecom- binedwithauxiliaryclassifiersatallintermediatelayersforimageclas- sification.They showed thatthe deepsupervision method improved theconvergencerateofnetworks byalleviatingthevanishingorex- plodinggradientproblem.Italsoimprovedthediscriminativeability
ofthefeatureslearnedbydirectlydrivingthelow-andmid-levelfea- turesinintermediatelayerstoveryhigh-levelfeatures(i.e.,targetout- put).Wangetal.(2015)appliedthedeepsupervisionmethodtodeeper convolutionalnetworks.Theyproposedtoaddauxiliaryclassifiersaf- tercertainintermediatelayersforbetterclassificationperformancein thedeeperconvolutionalnetworks.Chenetal.(2016a)usedthedeep supervisionframeworkforneuronalstructuresegmentationonelectron microscopyimages.Theirmethodaddeddeconvolutionallayerstothe intermediateencoderlayerstotrainauxiliaryclassifiersandfusedthe outputs.Severalvariantsofthedeeply-supervisednetworkswerefurther introducedtosegmentbraintissue(Chenetal.,2018),liver,wholeheart andgreatvessel(Douetal.,2017),andretinalvessels(Linetal.,2018).
Zhuetal.(2017)proposedtouseaU-Netwithdeepsupervisionfor prostatesegmentationinMRimages.Thesepreviousworks(Chenetal., 2018;Chenetal.,2016a;Douetal.,2017;Linetal.,2018;Zhuetal., 2017)reportedthatdeepsupervisioncouldimprovesegmentationac- curacy.
Though theproposed method is similar tothese previous works (Chenetal.,2018;Chenetal.,2016a;Douetal.,2017;Linetal.,2018; Zhuetal.,2017),ourapproachhasseveraldifferencesintheprocessing.
Wegeneratelabelimagesatdifferentresolutionsfromthegroundtruth labelingwhileemphasizingtheforegroundvoxelsusingtheproposed highlightforeground(HF)method.Thegeneratedmulti-scalelabelim- ageisusedforthenetworktolearnthefeaturesfocusingonthefore- groundareaatdifferentresolutions.Ontheotherhand,theseprevious worksusedthegroundtruthlabelswithoutmodificationattheoriginal imageresolution.Duringtraining,theyupsampledfeaturemapstothe originalimageresolutioninordertogeneratetheoutput.Instead,the proposedmethodcomputesthelossfunctionswithouttheneedforthe upsamplingofnetworkoutputs,mitigatingGPUmemoryusage.
TheMICCAI WMHSegmentationChallenge continuestohostfur- therstudiessinceitsinitialopening.Ofthe39methodssubmittedthe challenge asofMarch1st,2020,ourteamthatproposedtheensem- bleU-Netwithmulti-scaleHFcurrentlyachievesthebestperformance.
Inthefollowingsections,weoutlinethemethodsweusedtoachieve state-of-the-artperformanceonthistask.WefirstoutlinetheU-Netwith HFsmodel.Then,weshowthattheproposedmethodsignificantlyim- provedtheWMHssegmentationperformancecomparedtothestandard U-Net.Intheresultssection,weevaluatetheproposedmethodincom- parison toothermethodsandassessthe influenceofthe multi-scale HFsandtheeffectaccordingtotheirorganization.Finally,using the Alzheimer’sDiseaseNeuroimagingInitiative(ADNI)dataset,weinves- tigatethepotentialclinicalutilityoftheHFmethodbyautomatically segmenting WMHsandassociatingtheWMH volumeswithcognitive performancescores thatareusedforthediagnosis ofmild cognitive impairmentandAlzheimer’sDisease(i.e.,Mini-MentalStateExamina- tion(MMSE),Alzheimer’sDiseaseAssessmentScale-CognitiveSubscale (ADAS-Cog),andClinicalDementiaRatingScalesum(CDRsum)).
2. Materialsandmethods
2.1. Challengedatasetandpre-processing
Wevalidatedourmethodbasedonthedatasetandevaluationframe- workin WMHSegmentation Challenge2017becausethisprovidesa standardized assessment of thesegmentation performance of WMHs (Kuijf et al., 2019).The challenge organizationprovided a training datasetandatestdatasetconsistingof170subjectsintotal.Detailsof thedatasetsaregiveninTable1.Thetrainingdatasetincluded60sub- jectsandwaspubliclyavailableanddownloadableafterregistrationat https://wmh.isi.uu.nl/data/.Weusedthisdatasettotrainournetwork andinvestigatetheeffectoftheHFmethod.Thetestdatasetincluding therestofthe110subjectswasonlyavailableforevaluatingpredictions whensubmittedtothechallenge.
Each subjectincludedthebrainMR imagesbefore andafterpre- processing for T1w andFLAIR images anda manual delineation of
Table1
TheoverviewofthecharacteristicsoftheWMHSegmentationChallenge2017dataset.
Institute Scanner T1 voxel size(TR/TE(/TI)) FLAIR voxel size(TR/TE/TI) Train Test
UMC Utrecht 3T Philips Achieva 1 . 00 × 1 . 00 × 1 . 00 m m 3(7.9/4.5 ms) 0 . 96 × 0 . 95 × 3 . 00 m m 3(11,000/125/2800 ms) 20 30 NUHS Singapore 3T Siemens TrioTim 1 . 00 × 1 . 00 × 1 . 00 m m 3(2300/1.9/900 ms) 1 . 00 × 1 . 00 × 3 . 00 m m 3(9000/82/2500 ms) 20 30 VU Amsterdam 3T GE Signa HDxt 0 . 94 × 0 . 94 × 1 . 00 m m 3(7.8/3.0 ms) 0 . 98 × 0 . 98 × 1 . 20 m m 3(8000/126/2340 ms) 20 30 1.5T GE Signa HDxt 0 . 98 × 0 . 98 × 1 . 50 m m 3(12.3/5.2 ms) 1 . 21 × 1 . 21 × 1 . 30 m m 3(6500/117/1987 ms) 0 10 3T Philips Ungenuity 0 . 87 × 0 . 87 × 1 . 00 m m 3(9.9/4.6 ms) 1 . 04 × 1 . 04 × 0 . 56 m m 3(4800/279/1650 ms) 0 10
∗Abbreviations:TR:repetitiontime;TE:echotime;TI:inversiontime.
WMHs.TheimageswereacquiredfromfivedifferentMRscannersin threedifferent institutes.Theimagesacquiredinthethree MRscan- ners(i.e.,3TPhilipsAchieva,3TSiemensTrioTim,and3TGESigna HDxt)wereusedforbothtrainingandtesting.Theimagesacquiredin theothertwoMR scanners(i.e.,1.5TGESignaHDxtand3TPhilips Ingenuity)wereusedonlyfortesting.Allthe3DFLAIRimagesfrom VUAmsterdaminstitutewereresampledintotheaxialdirectionwith 3mmslicethickness.TheWMHswerelabeledontheFLAIRimagesby twoexperts,basedonStandardsforReportIngVascularchangesonnEu- roimaging(STRIVE)criteria(Wardlawetal.,2013).Theorganizerspro- videdthepre-processeddatalike1)T1wimagesthatwereregisteredto theFLAIRimagesusingtheElastixtoolbox(Kleinetal.,2009);2)the T1wandFLAIRimagesthatunderwentcorrectionfortheintensitynon- uniformityusingSPM12(AshburnerandFriston,2000).
We further pre-processed these data for training or testing our method.First,toreducefalsepositives,weremovednon-braintissue, using ROBEX(Iglesiasetal., 2011).Second, we performedintensity normalizationtomatchtheintensitydistributionamongthetraining data.Foreachimage,wecalculatedmeansandvariancesusingintensi- tiesrangingfrom2ndto98thpercentilesinthebrainregion.Wethen normalizedtheintensitieswithinthebrainineachimageusingz-score transformation.Finally,toequalizethesizeoftheinputdatatothenet- work,theaxialslicesineach3Dimagewerecroppedorpaddedtoa sizeof200×200.Weused2Dslicestotrainour2DCNNmodel.
2.2. Networkarchitecture
Inthecurrentstudy,weproposetocombinemulti-scaleHFswith aU-Netarchitecture(Ronnebergeretal.,2015).Themainideaofthe U-Netistheskipconnectionsbetweentheencoderanddecodertoal- lowthenetworktoreusethefeaturemapsintheencoder.Thisgener- allyhelpsthenetworktopredictdensesegmentationresultsandalle- viatesthevanishinggradientproblem.VariantsoftheU-Nethavebeen usedindiversemedicalimagesegmentationproblemsandhavedemon- stratedoutstanding performance(Çiçeket al.,2016; Drozdzalet al., 2018;Guerreroetal.,2018).IntheoriginalChallengein 2017,Four amongthetopteams,includingthetopteam(Lietal.,2018),exploited theU-Netarchitecture(Kuijfetal.,2019).Here,weadvancea2DU- NetforWMHssegmentation(Fig.1)withtwonovelcomponents:useof differentdetailsofthenetworkarchitectureandinclusionofthemulti- scaleHFs.
2.2.1. Detailsofthenetworkconfiguration
Asseenin Fig.1, themodified networkis basedontheencoder- decoderstructure.Intheencoderduringbothtrainingandtesting,the axialslicesofFLAIRandT1wmodalitiesarefedintothenetworkas atwo-channelinput(i.e.,theinputsizeis200×200×2).Wechose theaxialsliceforthe2Dinputdatabecausethebestimageresolution wasfoundontheFLAIRaxialslice.Inournetwork,weadoptthefol- lowingconfigurationsassuggestedinrecentworks:1)Insteadof3×3 kernelconvolutionsinthestandardU-Net,weuse5×5kernelconvo- lutionsinthefirsttwolayersforhandlingdifferenttransformationsas inLietal.(2018);2)Batchnormalizationisaddedtothe18convolu- tionallayerseach,whichacceleratesthetrainingprocessandimprove thenetworkperformancebyreducinginternalcovariateshift(Ioffeand
Szegedy, 2015).3) Finally,we use anexponential linearunit (ELU) (Clevertetal.,2015)astheactivationfunctionfornon-linearitycapac- ityinsteadofarectifiedlinearunit(ReLU)inthestandardU-Net.ReLU isneitheractivatednorupdatedatanegativevalue,whileELUdoesnot onlyhaveallthestrengthsofReLUbutalsoisactivatedandupdatedat negativevalues,improvingthelearningcharacteristics(Clevertetal., 2015).Theencodercontainsfour2×2max-poolinglayerswithastride 2aftereverytwoconvolutionlayersfordownsampling(Fig.1).Upsam- plinglayersbasedonnearest-neighborinterpolationareappliedafter everytwoconvolutionallayersinthedecoder(Fig.1).Priortodown- sampling,thefeaturemapsintheencoderareconcatenatedtothefea- ture mapsrightafterupsampling in thedecoder.At theoutputcon- volutionallayers,a1×1convolutionwithsoftmaxfunctionisusedto convertthefeaturemapsintothelabelspacewiththedepthoftwo(i.e., twoclasses;WMHsandnon-WMHs).
2.2.2. Multi-scalehighlightingforegrounds
Weaddmodifiedlabelimagesbythemulti-scaleHFapproachtothe intermediatelayersinthedecoderinournetwork(Fig.1).Theinter- mediateoutputconvolutionallayersconvertthefeaturemapsintothe multi-scalesegmentationprobabilitymaps(blackarrowsinFig.1).The multi-scaleHFsmax-poolthelabelimage(Fig.2).Giventheforeground pixelsinonelabelimage,thelabelimagemax-pooledbyHFisdefined asfollow:
𝑙𝐻𝐹𝑘 =𝑓𝑀𝑃( 𝑙𝐻𝐹𝑘−1)
(𝑘=1,2, …,𝑛), (1)
Where𝑓𝑀𝑃 isa2 × 2max-poolingoperatorwithstride2and𝑙𝑘𝐻𝐹 is thelabelimagemax-pooled,whichgeneratedbyapplying𝑓𝑀𝑃 𝑘times and𝑙0𝐻𝐹istheoriginalforegroundimage(i.e.,groundtruthlabel).The backgroundimage,𝑙𝐵𝐺𝑘 ,isdefinedasfollow:
𝑙𝐵𝐺𝑘 =1−𝑙𝐻𝐹𝑘 (𝑘=0, 1,2, …,𝑛), (2) Where𝑙𝐵𝐺0 istheoriginalbackgroundimageand𝑙𝐵𝐺𝑘 isabackgroundof inverting𝑙𝐻𝐹𝑘 .
Theforeground/backgroundimagesbymulti-scaleHFsareusedto generatelossesthroughcomparisonwiththecorrespondingoutputs.We used asoft Dice score asalossfunction. Let 𝐿𝑐=(𝑙𝑐0, 𝑙𝑐1,…,𝑙𝑐𝑀)be ground truth labelimage (𝑙𝑐0) and𝑀 represents different scalesfor multi-scaleHF.The𝑐 representsthetypeeitherHFor BG.Then,let 𝑆𝑐=(𝑠𝑐0, 𝑠𝑐1, …,𝑠𝑐𝑀)bethesegmentationresultingfromthenetwork.
𝑠𝑐0and(𝑠𝑐1, …, 𝑠𝑐𝑀)istheoutputsegmentationatthesizeoftheorigi- nallabelimage𝑙𝑐0andmulti-scalelabelimages(𝑙𝑐1, …, 𝑙𝑐𝑀)each.The Multi-scaleLossFunctions(MLF)canbewrittenas
MLF=
∑C
c=1
∑M
m=0
wm2(lcm◦scm)+𝜖
lcm+scm+𝜖, (3)
where𝐶istheforegroundorbackground,◦istheelement-wiseprod- uct,𝑤𝑚istheweightfor𝑚thscalelossfunction,and𝜖isasmoothing constanttopreventMLFfromdivisionby0,whichwesetas0.00001for currentnetworktraining.Thesumofallof𝑤𝑚isone.Weevaluatedvar- ioussetsof𝑤𝑚andfoundthebestsegmentationperformancewhen𝑤𝑚 wasthesameforallofthelossesasdescribedinthefollowingsection.
Fig.1.TheworkflowoftheproposedU-Netwithmulti-scalehighlightingforegrounds.
Fig.2.Illustrationofthemulti-scalehighlightingforegrounds.
3. Experiments
3.1. Evaluationmetricsandrankingsystem
WeusedthefiveevaluationmetricsthattheWMHchallengeadopted tocomparethemethodsthatparticipatingteamsdevelopedquantita- tively(detailsinTable3).LetMLbetheWMHsmanuallylabeledby expertandALbetheWMHsautomaticallylabeledbytheproposedap- proach.Thefiveevaluationmetricsareasfollow:(1)theDicesimilarity coefficient(DSC)astheoverlapindexbetweenMLandAL,(2)amod- ifiedHausdorff distance(95thpercentile;H95)astheoveralldistance betweenMLandALboundaries,(3)theabsolutepercentagevolumedif- ference(AVD)betweenthetotalWMHsofMLandAL,(4)recallasthe sensitivityindetectingindividuallesions,and(5)F1-scoreastheaver- ageofprecisionandrecallindetectingindividuallesions.Thechallenge
organizationdefinedtheindividuallesionsinbothrecallandF1as3D connectedcomponentswithinanimage.Allfivemeasurementsarepos- itivereal-valuedandthecloserthemeasuredvaluesofDSC,recall,and F1approachone,thehigherthesimilaritybetweenMLandAL.Onthe contrary,thecloserthemeasuredvaluesofH95andAVDaretozero, thehigherthesimilaritybetweenMLandAL.Table3detailsthedef- initionofthesefivemetrics.Thesemetricsforourtestingresultswere computedbythechallengeorganization.
Thechallengeorganizationproposedasystemforrankingtheover- all performanceof participatingteams.Thissystemconsistedoffour steps.First,themeanofeachmetricwascomputedoveralltestdatafor eachteam’smethod.Second,foreachevaluationmetric,theorganiza- tionsortedalloftheteamsfrombesttoworst.Next,thebestandworst teamsreceivedarankscoreofzeroandone,respectively,forthatmetric.
Otherteamswereassignedarankscorebetweenzeroandonefollowing
theirresultswithintherangeofthatmetric.Finally,thefiverankscores wereaveragedintotheoverallrankscoreindicatingtheoverallperfor- manceofthatteam.Theoverallrankscorewasusedtodeterminethe rankingofeachteamontheresultboardonthechallengehomepage (https://wmh.isi.uu.nl/results/).
3.2. Implementationdetails
The proposed network and experimental networks were imple- mentedinPythonusingTensorflow(Abadietal.,2016).Thenetworks weretrainedonfourNVIDIATitan-XpGPUswith12GBRAM.Thehyper- parametersof thenetworks weresetasfollows: mini-batchsize=30, optimizer=Adam (Kingma andBa, 2014),learning rate=0.0002, the numberofepochs=1000,andHeinitialization(Heetal.,2015).Early stoppingbasedonavalidationdatasetwasusedtoavoidoverfittingin trainingdata.Theperformanceofallnetworksconvergedwithin1000 epochs.Dataaugmentationwasappliedduringtrainingtoenhancero- bustnessin thefaceof limitedtrainingdata. Tothis end,flipping of axial-slicedimagestoeachaxisandvariousaffinetransformsincluding translation,rotation,scaling,andshearingwererandomlyapplied.The detailsoftheparametersofdataaugmentationwereasfollows:theprob- abilityofflippingeachaxis=0.5,therangeoftranslationratio=(−0.1, 0.1),therangeofrotation=(−15◦,15◦),therangeofscalingratio=(0.9, 1.1),therangeofshearing=(−18◦,18◦).Thenetworksweretrainedon an1:3ratiooforiginaldatatoaugmenteddataateachepoch.
Toachieve robust segmentation results, we applied anensemble methodandaflipaveragingtotheproposedmethod.Inatrainingstep, weperformed5-fold cross-validationwhereeachfold(n=12)images wererandomlyandequallysampledfromeachsitedatasetinthewhole trainingdataset(n=60).Then,validatingeachfold,wetrainedthepro- posednetworkusingtheotherfourfolds(n=48),resultingin5networks trainedseparately.Inatestingstep,wefirstflippedeachindividualim- agewithrespect tothex-axis,y-axis,andxy-axis,whichgenerated3 flippedimagesperindividual.Then,eachofthefourimageswasused asinputtoanetworkgeneratedfrom5-foldcross-validation.Theflip averagingwasthemajorvotingoffouroutputsfromanoriginalinput andthreeinputsflippedtothex-axis,y-axis,andxy-axisinanetwork.As aresult,thefinaloutputwasthemajorvotingoffiveoutputsgenerated fromtheflipaveragingofeachnetwork.
3.3. Theevaluationoftheimportanceofmulti-scalehighlighting foregrounds
Toinvestigatetheimportanceofmulti-scaleHFinWMHssegmen- tation,weevaluatedourmodelwithvariousparametersofmulti-scale HF,whichincluded1)thetypeofpoolingforHF,and2)theweightsof lossesatoutputlayers.Weperformedthisevaluationthroughacross- scannervalidationusing thetrainingdatasets(60subjects)of WMHs segmentationchallenge.Thedatasetfortheevaluationwassplitintoa trainingdataset(1stand2ndsites:40subjects)andavalidationdataset (3rdsite:20subjects).Ourevaluationresultsarebasedontherawnet- workoutput(i.e.,withoutensemblingandflipaveraging).
ThetypeofpoolingforHF:Wecomparedthenetworkusingthepro- posedmax-pooledlabelimageswiththecorrespondingaverage-pooling version.Wedownsampledthegroundtruthlabelimageintothelower resolutionsattheintermediate outputlayers byrepeatedlyusinga2
×2averagepoolingoperationwithstride2.Wethengeneratedtheir hard labelsbythresholdingthedownsampled softlabels at 0.5.The networkforthisexperimentwasequippedwitheitherfourmax-pooled orfouraverage-pooledlabelimagesandthesamelossweightsatall oftheoutputlayers.Toavoidcherry-pickingresultsbyrandomseed, weperformedacross-scannervalidation(3-foldcrossvalidation)with 5runsof randominitialization(resultingin15 trainednetworks for eachtestedmethod)fortheoriginalU-Net,U-NetwithHF,andU-Net withaverage-pooledlabels(AVG).Eachfoldrepresentedeachscanner dataset:i.e.,theUMCdataset,NUHSdataset,orVUdataset.Wetreated
theresultsofeachfoldasanindividualsample.Weperformedpairedt- testsforthecomparisonamongthetestedmethodsandcalculated95%
confidenceinterval(CI)accordingtothepairedsampletest.
The weights oflosses about allof the outputlayers: In this experi- ment,weinvestigatedtheimpactofthesetof𝑤𝑚 inEq.(3).Thenet- workwithfourmax-pooledlabelimageswasusedforthisexperiment.
Three setsof𝑤𝑚 weretested:1)𝑤0=155,𝑤1=154,𝑤2=153,𝑤3=152, and𝑤4= 1
15;2)𝑤0= 3
15,𝑤1= 3
15,𝑤2= 3
15,𝑤3= 3
15,and𝑤4= 3
15;3)Fi- nally,𝑤0= 1
15,𝑤1= 2
15,𝑤2= 3
15,𝑤3= 4
15,and𝑤4= 5
15;Forthisexper- iments,wetrainedthreenetworks(onlyonerunforeachfold;atotal ofthreeruns)andaveragedtheresultsonapatient-leveltoinvestigate statisticaldifferences.
3.4. Clinicalutilityevaluation
Theaimherewastoassesstheclinicalutilityoftheproposedmethod byevaluatingwhethertheproposedHF-basedautomatedvolumetryis abiomarkerof thedementiaseverity(i.g.,cognitiveperformancede- cline)oradiagnosticmeasurementofdementia.Accordingly,weana- lyzedtheassociationoftheWMHvolumeswithcognitiveperformance scoresorwiththediagnosisamongcognitivelynormal(CN),mildcog- nitiveimpairment(MCI),andearlyAlzheimer’sDisease(AD).Also,to analyzetheeffectofWMHvolumeonsubjectdiagnosis(CN,MCI,and AD),wecomparedclassificationperformancesusinglogisticregression underthreedifferentconditions(WMHvolumeonly,clinicalvariables only,andWMHvolumeandclinicalvariablescombined).
3.4.1. ADNIdatasetandpreprocessing
WeemployedthedatasetsfromtheAlzheimer’sDiseaseNeuroimag- ingInitiative(ADNI)database(http://adni.loni.usc.edu).Werandomly selected243ADNIsubjectswhocompletedcognitiveevaluationsand MRIscansattheirbaselinevisits.Thedatainclude73CN,115MCI, and55ADsubjectsatbaselinediagnosis.WeusedMini-MentalState Examination(MMSE),Alzheimer’sDiseaseAssessmentScale-Cognitive Subscale (ADAS-Cog),andClinical DementiaRatingScalesum(CDR sum)ascognitiveperformanceevaluationsthatfollowedastandardized protocol(Petersenetal.,2010).ThescorerangesofMMSE,ADAS-Cog, andCDRsumwere0–30,0–70,and0–18,respectively.AhigherADAS- CogoraCDRsumscoreindicateslowercognitiveperformance.Onthe otherhand,alowerMMSEmeanslowercognitiveperformance.Each individualMRIscanconsistedofasetofaT1wimageandFLAIRimage acquiredaxial-plane.Table2detailsdemographicandclinicalinforma- tion.
We pre-processedall imagesthrough thesamesteps thatarede- scribedin Section2.1. Challengedatasetandpre-processingfor con- sistentdataprocessing.Wedidnotperformthecroppingorpaddingof theimagesto200×200axialplanesbecauseinputtinganarbitrarysize imageisacceptedinfullyconvolutionalnetworksattesttime.
3.4.2. Statisticalanalysis
Weusedagenerallinearmodel(GLM)toassesswhetherWMHvol- umescomputedusingourmethodwereassociatedwiththecognitive performancescoresorthediagnosisofsubjects.Weincludedeachtype ofthecognitivescoresorthediagnosis(i.e.,CN,MCI,andAD)asade- pendentvariableandtheWMHsvolumeasanindependentvariable.In theGLM,wealsoincludedage,cardiovascularrisk,education,gender, ApoE4genotype,andraceascovariatestocollectfortheirconfound- ingeffects.Thecardiovascularriskscorerangedfrom0to5bycount- ingthefollowingdiseasesorcharacteristicsindividually:hypertension, stroke,smoking,diabetesmellitus,andcardiovasculardisease.Tomiti- gatethepossibleissueoftheskeweddistributionofthecognitivescores andWMHvolumes,weappliedthesquareroottransformationtoMMSE andCDRsum,andthelog-transformationtoWMHvolumesassuggested inCarmichaeletal.(2010).Then,allthetransformedvaluesandADAS- Cogwerenormalizedusingthez-scoretransformation.Allthestatistical testswereimplementedusingMatlab2019b.
Table2
Theoverviewofthecharacteristicsof243ADNIsubjects.
Characteristic
Total,Mean (SD)
Diagnosis at baseline, Mean (SD) Cognitively normal MCI AD
Sample, No. 243 73 115 55
Age, y 73 (7.5) 75 (6.5) 71 (7.9) 76 (7.1)
CV risk 1.7 (1.0) 1.7 (1.1) 1.7 (1.0) 1.6 (1.0) Education, y 16 (2.7) 16 (2.5) 17 (2.7) 16 (2.7) Male, No. (%) 123 (51) 36 (49.3) 56 (48.7) 31 (56.4) ApoE4 genotype, 0/1/2 114/100/29 48/23/2 50/49/16 16/28/11 Race
White 218 63 105 50
Other 25 10 10 5
MMSE score 27 (2.8) 29 (1.3) 28 (2.1) 23 (2.1) ADAS-Cog score 11 (6.8) 6 (3.1) 10 (4.8) 20 (5.8) CDR sum 1.75 (1.99) 0.12 (0.53) 1.4 (1.1) 4.6 (1.7) WMHs, c m 3 9.94 (11.74) 9.13 (14.33) 8.72 (9.04) 13.58 (12.41)
Table3
Listofevaluationmetricsandtheirdefinitions.Theorganizationofthechallengedefined individuallesionsforrecallandF1as3Dconnectedcomponents.𝑁meansthenumber of3Dconnectedcomponents. ̂𝑑(𝑋,𝑌)meansmax
𝑥∈𝑋min
𝑦∈𝑌𝑑(𝑥,𝑦),where𝑑(𝑥,𝑦)isthedistance betweenxandypoints.𝑉𝑀𝐿and𝑉𝐴𝐿meantheWMHvolumecomputedbymanual segmentationandourmethod,respectively.
DSC H95 AVD Recall F 1
Equation 𝐹𝑃+2𝐹𝑇 𝑃𝑁+2𝑇 𝑃 max { ̂𝑑 ( 𝑋, 𝑌 ) , ̂𝑑 ( 𝑌 , 𝑋 )} |𝑉𝑀𝐿𝑉−𝑉𝐴𝐿|
𝑀𝐿
𝑁𝑇 𝑃 𝑁𝑇 𝑃+𝑁𝐹 𝑁
2𝑁𝑇 𝑃 2𝑁𝑇 𝑃+𝑁𝐹 𝑁+𝑁𝐹 𝑃
∗Abbreviations–DSC:theDiceSimilarityCoefficient;H95:amodifiedHausdorff dis- tance(95thpercentile);AVD:theabsolutepercentagevolumedifference;Recall:the sensitivityfordetectingindividuallesions;F1:F1-scoreforindividuallesions;TP:true positive;FN:falsenegative;FP:falsepositive.
3.4.3. Classificationanalysis
WeassessedwhetheralterationsinWMHvolumewereusedtoclas- sifyanunseenindividualintoCN, MCI,orAD.Tothisend,weused logisticregressionasaclassifierandperformedtheclassificationunder threedifferentconditionswhereinputfeaturesvaried:1)WMHvolume only;2)clinicalvariablesonly(age,cardiovascularrisk,education,gen- der,ApoE4genotype,andrace;thesearementionedinSection3.4.2), and3)WMHvolumeandclinicalvariablescombined.Theclassification wasevaluatedusingaleave-one-outstrategy.Wecalculatedthereceiver operatingcharacteristic(ROC)curvesfromtheclassificationresulting infromeachofthethreeconditionsandcomparedtheirareaunderthe curve(AUC)values.ClassificationanalysiswasprocessedusingMatlab 2019b.
4. Results
Inthissection,wecompareourresultswiththeresultsofthetop 2nd–5thalgorithmslistedintheWMHSegmentationChallenge asof April30,2020.Wealsoshowtheinfluenceofmulti-scaleHFonthepro- posednetwork’ysperformance.Finally,theresultsoftheclinicalanaly- sisarepresented.
4.1. ResultsoftheWMHsegmentationchallenge
As of March 1st2020, the following four teams, as well as our team, were listed as the top five in the challenge leaderboard: 1) sysu_media_2– deep2Dmulti-scalestackedU-Netandensemblelearning;
2)sysu_media– fullyconvolutionalensembleneuralnetworks(Lietal., 2018);3)anonymous_20200413– brainatlasguidedattentionU-Net;
and4)coroflo– multi-dimensionalconvolutionalgatedrecurrentunits andensemblelearning.Moredescriptionofeachmethodisavailableon thechallengewebsite.
Table4showstheresultsofthetopfiveteamsaboutfivemetrics(i.e., DSC,H95,AVD,recall,andF1).Boldtextindicatesthebestperformance
amongallalgorithmsforthegivenmetric.Ourmethodpgsachievedthe bestoverallperformance.
TheresultsofourmethodaredetailedinTable5.Partofthetest dataset(n=90)wasfromthreesites(Utrecht,Singapore,andAMSGE3T) thatmatchedthesitesfromwhichthetrainingdatasetswereacquired.
Whereastheother20subjectswerefromAMSGE1.5TandAMSPETMR whichwerecompletelyunseen.However,theproposedmethoddemon- stratedsimilarperformanceinthisunseendatasetrelativetothethree- sitedataset.
4.2. Theinfluenceofthemulti-scalehighlightingforegroundsandother parametersonsegmentationaccuracy
Weevaluatedthesegmentationaccuracyoftheproposednetwork underdifferentconfigurationsofthemulti-scaleHF.Table6showsthe segmentationresultsand95%confidenceintervalsfor15multipleruns oforiginalU-Net,U-NetwithHF,andU-NetwithAVG.TheU-Netwith HFachievedthebestperformanceacrossallmetricscomparedtothe othertwomethods.Table7showstheeffectoftheweightsoflossesat alltheoutputlayersonsegmentationaccuracy.Thenetworkwithequal weightsamongall thefiveoutputlayersoutperformed thenetworks usingotherarrangementsoftheweights.
4.3. Clinicalutilityoftheproposedsegmentationapproach
WMHvolumescomputedusingourmethodweresignificantlyasso- ciatedwithcognitivescoresandgroupdiagnosisoftheANDIsubjects (Table8;MMSE,CDRsum,andgroupdiagnosis:Bonferronicorrection p-value <0.05; ADAS-Cog:FDRcorrectionp-value< 0.05). Inother words,thebiggerWMHwas,themorewascognitivedeclineandthe moreseverewasthediagnosisofapatientobserved.Furthermore,our linearmodelshowedthata1-standarddeviation(SD)increaseinWMH volumecorresponded0.133and0.165SDincreasesinADAS-Cogand CDRsumscores,respectively,anda0.176SDdecreaseinMMSEscore.
Table4
ComparisonofourmethodwithothermethodsintheMICCAIWMHsegmentationchal- lenge.Boldfontsindicatethebestresultamongallteams.
Team Rank DSC H95 (mm) AVD (%) Recall F 1
1 pgs (ours) 0.0185 0.81 5.63 18.58 0.82 0.79 2 sysu_media_2 0.0187 0.80 5.76 28.73 0.87 0.76 3 sysu_media 0.0288 0.80 6.30 21.88 0.84 0.76 4 anonymous_20200413 0.0314 0.79 6.17 22.99 0.83 0.77
5 coroflo 0.0493 0.79 5.46 22.53 0.76 0.77
∗Abbreviations–DSC:theDiceSimilarityCoefficient;H95:amodifiedHausdorff distance (95thpercentile);AVD:theabsolutepercentagevolumedifference;Recall:thesensitivity fordetectingindividuallesions;F1:F1-scoreforindividuallesions.
Fig.3. Thereceiveroperatingcharacteristic(ROC)curvesforlogisticregressionabouttwotypesoftheclassification(CNvsADandMCIvsAD).
Table5
Resultsaboutfivemetricsoftheproposedmethodinthetestdatasetofthe challenge.
DSC H95 (mm) AVD (%) Recall F 1 Utrecht ( n = 30) 0.81 6.76 18.62 0.81 0.75 Singapore ( n = 30) 0.84 4.71 15.69 0.83 0.80 AMS GE3T ( n = 30) 0.80 3.74 22.03 0.83 0.82 AMS GE1.5T ( n = 10) 0.74 9.30 22.09 0.75 0.77 AMS PETMR ( n = 10) 0.80 7.00 13.25 0.82 0.81 Weighted average 0.81 5.63 18.58 0.82 0.79 rank [0…1] 0.000 0.004 0.003 0.086 0.000
EventhoughfeedingWMH volumealonetotheclassifierdid not achieveabetterclassificationcomparedtousingclinicalvariablesonly (Fig.3),WeobservedthatcombiningWMHvolumewithclinicalvari-
ablesresultedinthebestclassificationperformancesbetweenCNand AD(AUC=0.75)andbetweenMCIandAD(AUC=0.67).Intheclassifica- tionofCNvs.MCI,theclassificationperformancewasnotimprovedby combiningWMHvolumeandclinicalvariables(AUC=0.58)compared tousingclinicalvariablesonly(AUC=0.59).
5. Discussion
WeproposedanewU-Netvariantwithmulti-scalehighlightingfore- grounds(HF)inthispaper.Ournetworkframeworkwasdesignedto improvethedetectionoftheWMHvoxelsinvolvingadegreeofpar- tial volumeeffects.Weaddedthemulti-scalelabelimagesthatwere max-pooledbyHFtoaU-NetforWMHssegmentation.Theproposed methodhasbeenplacedatthetoprankfortheoverallscoreintheMIC- CAIWMHSegmentationChallenge.TheWMHvolumecomputedusing ourautomatedapproachwassignificantlyassociatedwithcognitiveper-
Table6
Thesegmentationresultsand95%confidenceintervalsforthe15multiplerunsoforiginalU-Net,U-NetwithHF,andU-Netwith average-pooledlabels(AVG).Boldfontsindicatethebestresult.
DSC HD95 (mm) AVD (%) recall F 1-score
Original U-Net Mean ± SD 0.8033 ± 0.0204 ∗ 6.2509 ± 1.6870 ∗ 18.8337 ± 6.086 ∗ 0.7261 ± 0.0538 ∗ 0.7273 ± 0.0466 ∗ 95% CI [0.0060, 0.0089] [ − 0.7435, − 0.3917] [ − 2.8959, − 1.2721] [0.0212, 0.0440] [0.0228, 0.0298]
Original + AVG Mean ± SD 0.8037 ± 0.0191 ∗ 6.9537 ± 2.5907 ∗ 17.6740 ± 6.4449 0.7373 ± 0.0552 ∗ 0.7340 ± 0.0513 ∗ 95% CI [0.0055, 0.0085] [ − 1.7264, − 0.8143] [ − 2.0247, − 0.1762] [0.0086, 0.0341] [0.0126, 0.0266]
Original + HF Mean ± SD 0.8107 ± 0.0203 5.6833 ± 1.8080 16.7497 ± 6.9472 0.7587 ± 0.0456 0.7536 ± 0.0436
∗p-valuecorrectedbyFDR<0.05betweenU-NetwithHFandother(pairedt-test).
Abbreviation– SD:standarddeviation;CI:confidenceinterval.
Table7
Influenceofvariousweightingamongalltheoutputlayerlossesonthesegmentationaccuracy.Boldfontsindicatethebest result.
DSC H95 (mm) AVD (%) Recall F 1
Original + HF (all 0.2) 0.8139 ± 0.0878 5.0383 ± 4.3894 15.893 ± 24.845 0.7725 ± 0.1033 0.7575 ± 0.0903 Original + HF (1/15 →5/15) 0.8095 ± 0.0952 5.4751 ± 4.9443 17.890 ± 29.337 0.7707 ± 0.1055 0.7523 ± 0.0956 Original + HF(5/15 →1/15) 0.8093 ± 0.0912 6.1427 ± 5.7800 16.709 ± 18.847 0.7496 ± 0.1020 0.7595 ± 0.0746
Table8
SummaryoftheresultsofthegenerallinearmodelusingWMHvolumesandcovariatesasindependentvariables andcognitivescoresandgroupdiagnosisasdependentvariables.
Cognitive scores
Group(CN/MCI/AD)
MMSE ADAS-Cog CDR sum
β p-value β p-value β p-value β p-value
WMH volume − 0.176 < 0.001 ∗∗ 0.133 0.017 ∗ 0.165 0.009 ∗∗ 0.131 0.008 ∗∗
Age – – – – – – – –
CV risk – – – – – – – –
Education 0.062 < 0.001 ∗∗ − 0.061 0.001 ∗∗ − 0.047 0.031 ∗ – –
Gender – – – – – – – –
ApoE4 genotype − 0.296 < 0.001 ∗∗ 0.184 < 0.001 ∗∗ 0.421 < 0.001 ∗∗ 0.299 < 0.001 ∗∗
Race – – – – – – – –
∗p-valuecorrectedbyFDR<0.05.
∗∗p-valuecorrectedbyBonferronimethod<0.05.
formancescores(MMSE,ADAS-Cog,andCDR-sum)andthedementia diagnosis(CN,MCI,andAD).Furthermore,theautomatedWMHvol- umetryimprovedtheclassificationofunseensubjectsintoCN,MCI,or AD.
5.1. ComparisontoothermethodsevaluatedintheWMHsegmentation challenge
Ourmethodhasachievedthebestoverallevaluationscores,thehigh- estdicesimilarityindex,andthebestF1-scoreintheMICCAIWMHSeg- mentationChallengeamongallthelisted39methods.Ourmethodhas alsoachievedthetop5forotherevaluationmetrics(Hausdorff distance 95:2nd,averagevolumedifference:3rd, andrecall:5th).Giventhat theDicesimilarityindexrepresentstheoverlapbetweentheautomated andmanualsegmentationandtheHausdorff distancerepresentstheir boundarygap,ourachievementofhighaccuracyinthesetwoindices demonstratesthattheproposedmethodsuccessfullydetectstheWMH voxelsthatarelocatedeitherattheboundaryoftheWMHorinsmall WMHvolumesandconsequentlyexposedtoPVE.Indeed,avisualin- spectionofindividualsegmentationsshowsthesuperiorsegmentation ofsuchpartialvolumevoxelsintheproposedU-Netwithmulti-scaleHF comparedtothestandardU-Net(Fig.4).InFig.4,theproposedmethod moreaccuratelysegmentedWMHsontheboundaryofmanualWMHs (Subject1)aswellassmallclustersofWMHs(Subjects2-4)compared tothestandardU-Net.Despitearelativelylowrecall(5thrank,meaning relativelymorefalse-negativevoxelsdetected),weachievedthebestF1- scorewhichistheharmonicmeanofrecallandprecision.Thissuggests twothings.First,ourmethodyieldedahighertrue-positiverateanda lowerfalse-positiveratethanothermethods.Second,ourmethodmay havedifficultyin detectingsomeWMHvoxelseventhoughitdetects thesmallclusterandtheborderofWMHsbetterthanotherapproaches.
5.2. Configurationofthemulti-scalehighlightingforegrounds
Thesegmentationperformancewasthebestwhensettingequalall theweightsofdifferentscaleHFssuggestingthatthefeaturemapsgen- eratedfromallscalesareequallyimportantfordeeplearningofWMHs segmentation.Our method achievedstatistically significantimprove- mentsin allmetricscompared totheoriginalU-Net, andsignificant improvementsinallmetricsexceptedaveragevolumedifferencecom- paredtotheU-NetwithAVG(Table6).AlthoughU-NetwithAVGtends
tohavebetterperformanceinallmetricsthantheoriginalU-Net,these differences didnotreachthestatisticalsignificance.These resultsin- dicatethatthesuperiorperformanceofU-NetwithHFdidnotmerely resultfromaluckyrandomseed.ThesignificantimprovementsinDice scoreandF1-scorebyourmethodsuggestthattheproposedmethodcan moreaccuratelydetectWMHclustersthatarehardtodetectbyother methods,suchassmallWMHsinvolvinglargepartialvolumeeffects.
Basedontheseresults,therefore,weconfirmthattheproposedHF approachishighlyadvantageoustoWMHssegmentationandlikelyto segmentationofbrainlesionswhichhavesimilarcharacteristics(size, shape,orintensity),suchasmultiplesclerosis(Weedaetal.,2019),mi- crobleeds(Seghieretal.,2011),andperivascularspace(Ballerinietal., 2018).Sinceourmethodusesmulti-scalelabelimagesemphasizingfore- groundvoxels(i.e.,unbalanceddata),itcanbeusedincombinationwith variouslossfunctions(boundaryloss,(Kervadecetal.,2019);focalloss, (Linetal.,2017);Tverskyloss,(Salehietal.,2017);andfocalTversk loss,(AbrahamandKhan,2019))toovercometheproblemofunbal- anceddata. Itmaybeworth tryingtocombinetheabove-mentioned lossfunctionsandHFinvariouswaysandcomparetheresultstofinda lossfunctionthatfitswellwithHF.Additionally,ourmethodcanalso simplybe appliedtovariousnetworks basedontheencoder-decoder structuresuchasU-Netvariantsorothervariantsofdeepsupervision methods.
5.3. Clinicalevaluation
PreviousstudiesshowedthattheincreaseinWMHsvolumerelatesto cognitivedecline(Barberetal.,1999;Duboisetal.,2014;Habesetal., 2016;Leeetal.,2016;PrinsandScheltens,2015).Onthebasisoffind- ingsinthesestudies,weevaluatedtheclinicalutilityoftheproposed approachbyinvestigatingwhetherautomatedWMHvolumetrycanpre- dictcognitiveperformancedeclinesorthediagnosisofasubject(CN, MCI,andAD).Ourresultsdemonstratethattheautomaticallycomputed WMHvolumeissignificantlyassociatedwithcognitiveperformancein thedirectionwehypothesized(Table8).
Furthermore,wehypothesizedthatfeedingtheautomaticallycom- putedWMHvolumestoaclassifierindividuallycandiagnosesubjects.
Theresultsofthisexperimentshowedthatclassificationperformancefor CNvs.ADandMCIvs.ADcanbeimprovedusingthecombinedfeature- setofWMHvolumesandclinicalvariables.Theseresultsareconsistent
Fig.4.Segmentationresultsinthetrainingdataset.
Fromtoptobottomareaxialslicesoffourdifferent subjects.FromlefttorightareFLAIRimage,ground truth,theresults fromthestandard U-Net,andthe resultsfromtheproposedU-Netwithmulti-scaleHF method.YellowvoxelsindicateWMHssegmentedus- ingeachmethod.Colorboxesshowourapproach’sim- provementoffalse-positives(orangeborder)orfalse- negatives(greenborder)observedinthestandardU- Net.
withpreviousfindingsthatWMHsprovideanimagingmarkerforAD (Habesetal.,2016;Leeetal.,2016;PrinsandScheltens,2015).
6. Conclusions
Inthecurrentstudy,weproposedaU-Netwithmulti-scalehighlight- ingforegrounds(HF).Ourvariousevaluationsshowthattheproposed method improvesdetectingWMH voxels withpartialvolume effects asintended.However,itstillremainschallengingforourmodeltore- tainbothhighprecisionandrecall.Attention-basedmodels(Chenetal., 2016b;Wooetal.,2018)thateffectivelylearnimportantcharacteristics ofatargetstructureforsegmentationcanpotentiallybeawaytosolve thisissue.ToimproveWMHssegmentation,integratinganattention- basedmodelintodeepneuralnetworksisthussuggestedinthefuture.
Ourclinicalevaluationdemonstratestheclinicalutilityofourmethod.
Yet,theindividualdiagnosisof unseensubjectsusingWMH volumes aloneisbelowtheclinicalstandard.Inafurtherstudy,otherinforma- tionofWMHssuchasthelocationordistributionofWMHvolumesand thelongitudinaltrajectory ofWMH volumechangeswouldbe incor- poratedfortheimprovementofindividualdiagnosis.Theimplementa- tionofourproposedmethodisavailableatDockerhub(Merkel(2014); https://hub.docker.com/r/wmhchallenge/pgs).
Dataandcodeavailabilitystatements
The data that support the findings of this study are openly available in 2017 MICCAI WMH Segmentation Chal- lenge homepage at https://wmh.isi.uu.nl/data/, reference number (Kuijf, 2019). Our code also is openly available at https://hub.docker.com/r/wmhchallenge/pgs.
Creditauthorshipcontributionstatement
GilsoonPark:Methodology,Software,Formalanalysis,Writing– originaldraft,Visualization.JinwooHong:Methodology,Formalanal- ysis. BenA. Duffy:Methodology, Software. Jong-Min Lee:Concep- tualization,Methodology,Supervision.HosungKim:Formalanalysis, Methodology,Writing– originaldraft.
Acknowledgments
ThisresearchwassupportedbytheBio&MedicalTechnologyDevel- opmentProgramoftheNationalResearchFoundation(NRF)&funded by theKoreangovernment(MSIT) (No.2020M3E5D9080788)anda grantoftheKoreaHealthTechnologyR&DProjectthroughtheKorea HealthIndustryDevelopmentInstitute(KHIDI),fundedbytheMinistry of Health & Welfare, RepublicofKorea (grantnumber: HI19C0745) andagrantoftheNationalInstitutesofHealthgrants(P41EB015922, U54EB020406,U19AG024904),BrightFocus(A2019052S).
References
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., 2016. Tensorflow: Large-Scale Machine Learning on Hetero- geneous Distributed Systems. arXiv preprint arXiv:1603.04467.
Abraham, N. , Khan, N.M. , 2019. A novel focal tversky loss function with improved atten- tion u-net for lesion segmentation. IEEE, pp. 683–687 .
Abraham, H.M. , Wolfson, L. , Moscufo, N. , Guttmann, C.R. , Kaplan, R.F. , White, W.B. , 2016. Cardiovascular risk factors and small vessel disease of the brain: Blood pressure, white matter lesions, and functional decline in older persons. J. Cereb. Blood Flow Metab. 36, 132–142 .
Ashburner, J. , Friston, K.J. , 2000. Voxel-based morphometry —the methods. Neuroimage 11, 805–821 .