
F. Evaluation Metrics

3. Area under the Precision-Recall Curve (AUPRC)

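AUPRC is the area under the precision-recall curve (PRC). The PRC plots recall on the x-axis and precision on the y-axis, and shows how the two values change as the parameter (the classification threshold) is adjusted. Precision and recall are defined as follows.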

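$$\text{Precision} = \frac{\text{number of target patients correctly classified by the model (TP)}}{\text{number of patients the model predicted as target patients (TP + FP)}}$$

$$\text{Recall} = \frac{\text{number of target patients correctly classified by the model (TP)}}{\text{actual number of target patients (TP + FN)}}$$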

The shape of a PRC is shown below.

Figure 21. Example of a PRC
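The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used in the study: the function names, the toy labels and scores, and the step-wise (average precision) way of computing the area are all assumptions, since the text does not state how the area was integrated.

```python
from typing import List, Tuple


def precision_recall_points(y_true: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (recall, precision) points from sweeping the threshold
    over every distinct score, from the highest to the lowest."""
    total_pos = sum(y_true)
    # Sort by descending score: lowering the threshold admits more
    # samples as predicted positives.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = []
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only when the next score differs (ties share a threshold).
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((tp / total_pos, tp / (tp + fp)))
    return points


def average_precision(y_true: List[int], scores: List[float]) -> float:
    """Step-wise area under the PR curve: sum of (R_n - R_{n-1}) * P_n."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in precision_recall_points(y_true, scores):
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Hypothetical toy data: 1 = target patient, 0 = non-target patient.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
points = precision_recall_points(y_true, scores)
auprc = average_precision(y_true, scores)
```

A trapezoidal rule over the same points would give a slightly different area; the step-wise sum is used here only because it avoids interpolating between thresholds.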


4. F1-score

The F1-score is the harmonic mean of precision and recall and can be interpreted as a weighted average of the two. It is mainly used when the class distribution of the data is imbalanced.

The formula is as follows.

$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

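The F1-score has three averaging variants: macro, weighted, and micro. Macro averaging applies no weight for class size, so every class counts equally regardless of how many samples it has. Weighted averaging weights each class by its number of samples. Micro averaging first sums TP, FP, and FN over all classes and then computes the score from those totals. Because the number of samples differs across classes in this study, the weighted variant was used for performance evaluation.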
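The three averaging schemes can be compared on a small imbalanced example. The sketch below is illustrative only: the helper names and the toy labels are hypothetical, and it uses the identity F1 = 2TP / (2TP + FP + FN), which follows from the harmonic-mean definition of the F1-score.

```python
from collections import Counter
from typing import Dict, List


def per_class_counts(y_true: List[int], y_pred: List[int]) -> Dict[int, Dict[str, int]]:
    """Count TP, FP, and FN separately for every class label."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = {c: {"tp": 0, "fp": 0, "fn": 0} for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            counts[t]["tp"] += 1
        else:
            counts[p]["fp"] += 1  # predicted class gains a false positive
            counts[t]["fn"] += 1  # true class gains a false negative
    return counts


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_scores(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    counts = per_class_counts(y_true, y_pred)
    support = Counter(y_true)  # number of true samples per class
    per_class = {c: f1_from_counts(**k) for c, k in counts.items()}
    # Macro: plain mean over classes, ignoring class size.
    macro = sum(per_class.values()) / len(per_class)
    # Weighted: mean over classes weighted by the number of true samples.
    weighted = sum(per_class[c] * support[c] for c in per_class) / len(y_true)
    # Micro: pool TP, FP, FN over all classes, then compute one score.
    micro = f1_from_counts(
        sum(k["tp"] for k in counts.values()),
        sum(k["fp"] for k in counts.values()),
        sum(k["fn"] for k in counts.values()),
    )
    return {"macro": macro, "weighted": weighted, "micro": micro}


# Hypothetical imbalanced labels: two positives, six negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
result = f1_scores(y_true, y_pred)
```

On this toy data the minority class has the lower per-class F1, so the macro average falls below the weighted average, which is the effect of weighting by class size described above.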

Because the F1-score is a combined metric, it has no fixed class-discrimination threshold. It is instead used to compare algorithms when precision and recall need to be considered together.

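Ⅳ. Experimental Results

A. Performance by Prediction Model

For each model, hyperparameter tuning experiments were run and the best-performing result was selected for comparison. Before the performance comparison, the structure of each model is described below.

Table 9. Description of model structures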

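| Model name | Structure |
|---|---|
| RNN | Single RNN |
| BRNN | Bidirectional RNN |
| CNN | Single CNN |
| Series CRNN | Diagnosis + prescription information learned by a CNN and an RNN in series |
| Parallel CRNN-All | Diagnosis + prescription information learned by a CNN and an RNN in parallel |
| Parallel CRNN-Dx,Rx | Diagnosis information learned by a CNN and prescription information by an RNN, in parallel |
| Parallel CRNN-Rx,Dx | Prescription information learned by a CNN and diagnosis information by an RNN, in parallel |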


Parallel CRNN-Dx,Rx with Attention | 0.7775 | 0.8678 | 0.6047 | 0.7923

Parallel CRNN-Rx,Dx | 0.8207 | 0.8824 | 0.6361 | 0.8293

Parallel CRNN-Rx,Dx with Attention | 0.8047 | 0.8796 | 0.6139 | 0.8135

Based on the weighted F1-score, the Parallel CRNN-Rx,Dx with Attention model performed best on the National Health Insurance Service sample research DB dataset, and the Parallel CRNN-Rx,Dx model performed best on the Ajou University Hospital dataset.


1 | 376112 | Diabetic polyneuropathy | 1.9

2 | 319835 | Congestive heart failure | 1.8

3 | 4319447 | Urolithiasis | 1.7

4 | 133637 | Second degree burn of lower limb | 1.6

5 | 443729 | Peripheral circulatory disorder associated with type 2 diabetes mellitus | 1.5

6 | 4174977 | Diabetic retinopathy | 1.4

7 | 4091164 | Tuberculosis of intrathoracic lymph nodes, confirmed bacteriologically and histologically | 1.1

7 | 4313846 | Granulomatous hepatitis | 1.1

7 | 381854 | Disorder of conjunctiva | 1.1

7 | 381252 | Benign neoplasm of eye | 1.1


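Patients with stones have twice the risk of chronic kidney disease, and the risk is higher in women and in overweight patients (Gambaro et al., 2016). This suggests that urolithiasis, ranked 3rd, is closely related to chronic kidney disease. Patients with a second degree burn of lower limb, ranked 4th, have a high probability of developing renal dysfunction, which indicates a relationship with chronic kidney disease (Ibrahim et al., 2013).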


1 | 432585 | Blood coagulation disorder | 30

2 | 134569 | Erythema infectiosum | 12

3 | 443731 | Renal disorder due to type 2 diabetes mellitus | 11.3

4 | 4174977 | Diabetic retinopathy | 9.4

5 | 197320 | Acute renal failure syndrome | 9.0

6 | 444078 | Inflammation of cervix | 8.8

7 | 376065 | Neurologic disorder associated with type 2 diabetes mellitus | 8.8

8 | 381849 | Degenerated eye | 8.1

9 | 4166909 | Superficial injury of lower limb | 7.9

10 | 442752 | Muscle pain | 7.4


introduced as a risk factor (Wong et al., 2013). Muscle pain, ranked 10th, is known to be a common symptom among patients with chronic kidney disease (Caravaca et al., 2016).


mellitus), eye disease, blood coagulation disorder, renal disease, muscle pain, congestive heart failure, urolithiasis, and second degree burn of lower limb, and the literature confirmed that these are important risk factors for chronic kidney disease. For erythema infectiosum, inflammation of cervix, superficial injury of lower limb, tuberculosis, disorder of conjunctiva, and benign neoplasm of eye, no supporting literature was found, so their association with chronic kidney disease could not be confirmed. However, because previously unrecognized risk factors for chronic kidney disease continue to be discovered clinically, the unconfirmed features are not wrongly extracted but rather hidden, not yet identified


76.3%, and the F1-scores were 19.2% and 21.3% (Vijayarani and Dhayanand, 2015). The study by Ramya and Radha used age, sex, and laboratory test data to


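needs to be devised. In addition, only the one-dimensional output of the attention mechanism was extracted, so the flow of pattern information could not be examined. Rather than looking only at the highest weights, future work should visualize the weights at each time point to examine the two-dimensional flow of the patterns.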

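Ⅵ. Conclusion

This study predicted the risk of chronic kidney disease with deep learning using EHR data. Using only diagnosis and prescription information, it predicted whether chronic kidney disease would be diagnosed, applied deep learning models that are effective at handling time-series information, and compared performance across a variety of model structures. In addition, rather than feeding the EHR data directly into the neural network, embedding was used to mitigate data sparsity, and an attention mechanism was applied to analyze which time points had a decisive influence on the prediction of chronic kidney disease risk.

By predicting chronic kidney disease, which is difficult to detect early, more effectively from diagnosis and prescription information alone, this work may contribute to delaying the onset and reducing the prevalence of chronic kidney disease. It is further hoped that the model will serve as an early prediction strategy for chronic kidney disease that can be applied broadly across multiple institutions.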

References

1. Kopple JD: National Kidney Foundation K/DOQI clinical practice guidelines for nutrition in chronic renal failure. Am J Kidney Dis 37: S66–S70, 2001

2. Levey AS, Coresh J: Chronic kidney disease. Lancet 379:165-180, 2012

3. 이형민, 오경원: 만성콩팥병 유병현황 (Prevalence of chronic kidney disease), 2013

4. 김영훈: 만성 콩팥병 단계에 따른 치료의 최신 경향 (Recent trends in treatment by stage of chronic kidney disease). 대한내과학회지 (Korean Journal of Medicine), vol. 76, no. 5, 2009

5. Vijayarani S, Dhayanand S: Data mining classification algorithms for kidney disease prediction. International Journal on Cybernetics and Informatics (IJCI). Aug; 4(4), 2015

6. Parul Sinha, Poonam Sinha: Comparative Study of Chronic Kidney Disease Prediction using KNN and SVM. International Journal of Engineering Research & Technology (IJERT) ISSN: 2278-0181 Vol. 4 Issue 12, December, 2015

7. Ying Sha and May D Wang: Interpretable Predictions of Clinical Outcomes with An Attention-based Recurrent Neural Network. In Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. ACM, 233–240, 2017

8. Brett K. Beaulieu-Jones, Casey S. Greene, the Pooled Resource Open-Access ALS Clinical Trials Consortium: Semi-supervised learning of the electronic health record for phenotype stratification. Journal of Biomedical Informatics 64 168-178, 2016

9. E Choi, A Schuetz, WF Stewart, and Jimeng Sun: Using recurrent neural network models for early detection of heart failure onset. Journal of the American Medical Informatics Association 24, 2, 361–370, 2016

10. Bengio, Y., Goodfellow, I. J., & Courville, A: Deep Learning. Book in preparation for MIT Press, 2015

11. Goodfellow, I.; Bengio, Y.; and Courville, A: 2016. Deep learning. 2015.

12. M. Mozaffari-Kermani, S. Sur-Kolay, A. Raghunathan, and N. K. Jha: Systematic poisoning attacks on and defenses for machine learning in healthcare, IEEE journal of biomedical and health informatics, vol. 19, no. 6, pp. 1893–1905, 2015

13. Cho, K., Courville, A., and Bengio, Y: Describing multimedia content using attention-based encoder-decoder networks. IEEE Transactions on Multimedia 17, 11, 1875-1886, 2015

14. Zhang, D., Yao, L., Zhang, X., Wang, S., Chen, W., Boots, R.: EEG-based intention recognition from spatio-temporal representations via cascade and parallel convolutional recurrent neural networks. arXiv preprint arXiv:1708.06578, 2017

15. Liang, M., Hu, X.: Recurrent convolutional neural network for object recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3367–3375, 2015

16. Donahue, Jeff, Hendricks, Lisa Anne, Guadarrama, Sergio, Rohrbach, Marcus, Venugopalan, Subhashini, Saenko, Kate, and Darrell, Trevor: Long-term recurrent convolutional networks for visual recognition and description. arXiv preprint arXiv:1411.4389, 2014

17. LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D: Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, 1989

18. Y. Kim: Convolutional Neural Networks for Sentence Classification. arXiv:1408.5882, 2014

19. Lipton, Zachary C., Berkowitz, John, and Elkan, Charles: A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019, 2015

20. S. Hochreiter and J. Schmidhuber: Long short-term memory. Neural computation, 9(8):1735–1780, 1997

21. Cho, K., van Merrienboer, B., Gulcehre, C., Bougares, F., Schwenk, H., and Bengio, Y: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the Empirical Methods in Natural Language Processing, 2014a

22. Schuster, M. and Paliwal, K. K: Bidirectional recurrent neural networks. Signal Processing, IEEE Transactions on, 45(11), 2673–2681, 1997

23. Sutskever, I., Vinyals, O., and Le, Q: Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, 2014

24. T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean: Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111-3119. 2013b

25. E. Choi, A. Schuetz, W. F. Stewart, and J. Sun: Medical concept representation learning from electronic health records and its application on heart failure prediction. arXiv:1602.03686, Feb, p. 45, 2016

26. Hripcsak G, Duke JD, Shah NH, et al: Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers. Stud Health Technol Inform;216:574–578, 2015

27. D. Bahdanau, K. Cho, and Y. Bengio: Neural machine translation by jointly learning to align and translate. Technical report, arXiv:1409.0473, 2014

28. Huang M-J, Wei R-bao, Wang Y, Su T, Di P, Li Q, Yang X, Li P, Chen X: Blood coagulation system in patients with chronic kidney disease: a prospective observational study. BMJ Open 2017;7:e014294, 2016

29. El Nahas AM, Bello AK: Chronic kidney disease: the global challenge. Lancet. 365: 331–40, 2005

30. Wong CW, Wong TY, Chen CY, Sabanayagam C: Kidney and eye diseases: common risk factors, etiological mechanisms, and pathways. Kidney Int 85: 1290–1302, 2014

31. Caravaca, F., Gonzales, B., Bayo, M. Á., and Luna, E: Musculoskeletal pain in patients with chronic kidney disease. Nefrología (English Edition), 36, 433-440, 2016

32. Pop-Busui R, Roberts L, Pennathur S, Kretzler M, Brosius FC, Feldman EL: The management of diabetic neuropathy in CKD. Am J Kidney Dis. 55: 365–385, 2010


33. Silverberg D., Wexler D., Blum M., Schwartz D., Iaina A: The association between congestive heart failure and chronic renal disease. Curr Opin Nephrol Hypertens 13:163–170. 2004

34. Gambaro, G.; Croppi, E.; Bushinsky, D.; Jaeger, P.; Cupisti, A.; Ticinesi, A.; Mazzaferro, S.; D’Addessi, A.; Ferraro, P.M: The risk of chronic kidney disease associated with urolithiasis and its urological treatments: A review. J. Urol, 198, 268–273, 2017

35. Ibrahim A.E, Sarhane K.A., Fagan S.P, Goverman J: Renal dysfunction in burns: a review, Ann Burns Fire Disasters, v.26(1), 2013

36. Andrew OT, Schoenfeld PY, Hopewell PC, et al: Tuberculosis in patients with end-stage renal disease. Am J Med. 68:59e65, 1980

37. Kazancioğlu R: Risk factors for chronic kidney disease: an update. Kidney Int Suppl (2011). 3: 368–371, 2013

38. Hripcsak G, Duke JD, Shah NH, Reich CG, Huser V, Schuemie MJ, Suchard MA, Park RW, Wong CK, Rijnbeek PR, Lei J, Pratt N, Noren GN, Li Y, Stang PE, Madigan D, Ryan PB: Observational Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers, 2015

39. A. N. Jagannatha and H. Yu: Bidirectional recurrent neural networks for medical event detection in electronic health records. In Proc. conf. Assoc. Comput. Linguistics North American Chapter Meeting, pp. 473–482, 2016

40. E Choi, MT Bahadori, L Song, WF Stewart, and J Sun: GRAM: Graph-based Attention model for healthcare representation learning. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 787–795, 2017

41. Sanne M. Schreuder, Jaap Stoker, Shandra Bipat: Prediction of presence of kidney disease in patients undergoing intravenous iodinated contrast enhanced computed tomography: a validation study. Eur Radiol 27:1613-1621, 2017

42. Navdeep Tangri, Lesley A. Stevens, John Griffith, Hocine Tighiouart, Ognjenka Djurdjev, David Naimark, Adeera Levin, Andrew S. Levey: A Predictive Model for Progression of Chronic Kidney Disease to Kidney Failure. JAMA, April 20, Vol 305, No. 15, 2011

43. Akhter Mohiuddin Rather, Arun Agarwal, V.N. Sastry: Recurrent neural network and a hybrid model for prediction of stock returns. Expert Systems with Applications 42, 3234–324, 2015

44. M. T. Ribeiro, S. Singh, and C. Guestrin: "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In SIGKDD, 2016.

45. F.Naeymi-Rad: A Feature Dictionary Supporting a Multi-Domain Medical Knowledge Base. Computer Methods and Programs in Biomedicine, November, 217-228, 1989

46. Edward Choi, Mohammad Taha Bahadori, Elizabeth Searles, Catherine Coffey, and Jimeng Sun: Multi-layer representation learning for medical concepts. In KDD, 2016a.

47. Lambodar Jena, Narendra Ku. Kamila: Distributed data mining classification algorithms for prediction of chronic kidney disease. IJERMT 4 (11) 110-118, 2015

48. Manish Kumar: Prediction of Chronic Kidney Disease Using Random Forest Machine Learning Algorithm. International Journal of Computer Science and Mobile Computing, Vol.5 Issue.2, February, pg. 24-33. 2016

49. S. Ramya, N. Radha: Diagnosis of Chronic Kidney Disease Using Machine Learning Algorithms. International Journal of Innovative Research in Computer and Communication Engineering. Vol. 4, Issue 1, January, 2016

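Appendix

<Appendix 1. Embedding Visualization Using Med2Vec>

Figure 22. Visualization of the Med2Vec embeddings reconstructed with PCA. (A) Embeddings shown in 2 dimensions. (B) Embeddings shown in 3 dimensions.

Figure 23. Visualization of the Med2Vec embeddings reconstructed with PCA and normalized onto a spherical form. (A) Embeddings shown in 2 dimensions. (B) Embeddings shown in 3 dimensions.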


Figure 24. Visualization of the concept IDs closest in cosine distance to OMOP CDM concept ID 2801

<Appendix 2. Activation Function Formulas and Graphs>

1. Sigmoid function

The sigmoid function is synonymous with the logistic function; its formula and graph are given below.

$$f(x) = \sigma(x) = \frac{1}{1 + e^{-x}}$$

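Figure 25 shows the graph of the sigmoid function and the graph of its first derivative.

Figure 25. The sigmoid function and its derivative. (A) Sigmoid function. (B) First derivative of the sigmoid function.

The range of the sigmoid function is (0, 1), so its output is always positive; this is a problem because the output is not centered on zero. Because the curve is S-shaped, the gradient becomes extremely small for inputs smaller than -5 or larger than 5, so training progresses poorly and slowly.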
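The saturation described above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the derivative uses the standard identity σ'(x) = σ(x)(1 − σ(x)).

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """First derivative of the sigmoid: s * (1 - s), where s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1.0 - s)


# Gradient at the center of the curve and at the edges of the [-5, 5] region.
grads = {x: sigmoid_grad(x) for x in (-5.0, 0.0, 5.0)}
```

The gradient peaks at 0.25 for x = 0 and is already below 0.01 at x = ±5, which is why inputs outside that interval contribute very little to weight updates.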

2. Hyperbolic tangent function

The hyperbolic tangent function is a scaled and shifted version of the sigmoid function,

