* JTPARK@yuhs.ac(JTP); JAYKWON@yuhs.ac(JYK)
Abstract
Preeclampsia is one of the leading causes of maternal and fetal morbidity and mortality. Due to the lack of effective preventive measures, its prediction is essential to its prompt management. This study aimed to develop models using machine learning to predict late-onset preeclampsia using hospital electronic medical record data. The performance of the machine learning based models and of models using conventional statistical methods was also compared. A total of 11,006 pregnant women who received antenatal care at Yonsei University Hospital were included. Maternal data were retrieved from electronic medical records during the early second trimester to 34 weeks. The prediction outcome was late-onset preeclampsia occurrence after 34 weeks’ gestation. Pattern recognition and cluster analysis were used to select the parameters included in the prediction models. Logistic regression, decision tree model, naïve Bayes classification, support vector machine, random forest algorithm, and stochastic gradient boosting method were used to construct the prediction models. The C-statistic was used to assess the performance of each model. The overall preeclampsia development rate was 4.7% (474 patients). Systolic blood pressure, serum blood urea nitrogen and creatinine levels, platelet counts, serum potassium level, white blood cell count, serum calcium level, and urinary protein were the most influential variables included in the prediction models.
Keywords: machine learning, data mining, prediction model, cholesterol, hypercholesterolemia, high cholesterol, body fat
Abstract The purpose of the present study is to develop a model for predicting hypercholesterolemia using an integrated set of body fat mass variables based on machine learning techniques, beyond the study of the association between body fat mass and hypercholesterolemia. For this study, a total of six models were created using two variable subset selection methods and machine learning algorithms based on the Korea National Health and Nutrition Examination Survey (KNHANES) data. Among the various body fat mass variables, we found that trunk fat mass was the best variable for predicting hypercholesterolemia. Furthermore, we obtained the area under the receiver operating characteristic curve value of 0.739 and the Matthews correlation coefficient value of 0.36 in the model using the correlation-based feature subset selection and naive Bayes algorithm. Our findings are expected to be used as important information in the field of disease prediction in large-scale screening and public health research.
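For readers unfamiliar with the Matthews correlation coefficient reported above, the following is a minimal sketch of how it is computed from a binary confusion matrix. The function name `mcc_from_counts` and the example counts are hypothetical illustrations and are not taken from the study.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp, tn, fp, fn are the numbers of true positives, true negatives,
    false positives, and false negatives. Returns 0.0 when the
    denominator is zero (a common convention, assumed here).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts, for illustration only.
perfect = mcc_from_counts(tp=50, tn=50, fp=0, fn=0)
chance = mcc_from_counts(tp=25, tn=25, fp=25, fn=25)
print(perfect, chance)
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which makes it informative for imbalanced outcomes.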
Keywords: deep learning, DO (dissolved oxygen), LSTM (long short-term memory), machine learning, water quality prediction
Introduction
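Rapid urbanization and economic development have increased wastewater discharge, causing river water pollution, and various efforts are being made to improve the degraded water quality. Among the water quality parameters used to assess the degree of river pollution, changes in DO (dissolved oxygen) have historically been a major subject of river research ...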
Accurate prediction for individual patients is still difficult with conventional statistical methods because most clinical characteristics show multidimensional and non-linear relationships [16]. The GB method is a machine learning method that is highly flexible in detecting and recognizing complex non-linear relationships between variables. Compared with FIGO staging, which predicts using only tumor-related factors, the GB model can provide more individualized prediction based on various factors. To the best of our knowledge, the current study is the first to use a machine learning classifier for the prognostic prediction of the OS endpoint in EOC patients using factors including sequential CA-125 level during
Keywords: machine learning, groundwater level, time series prediction model, National Groundwater Monitoring Network
Heesung Yoon, Pilsun Yoon, Eunhee Lee, Gyoo-Bum Kim and Sang-Ho Moon, 2016, Application of machine learning technique-based time series models for prediction of groundwater level fluctuation to national groundwater monitoring network data. Journal of the Geological Society of Korea. v. 52, no. 3, p. 187-199
ABSTRACT: In the present study, we developed artificial neural network (ANN) and support vector machine (SVM) based time series models and applied them to groundwater level time series data of 216 observatories in the National Groundwater Monitoring Network. The purpose of developing and applying the time series models is to evaluate their applicability to simulating groundwater level fluctuation due to rainfall by forecasting missing and abnormal data and filtering out the effect of groundwater pumping and stream water stage fluctuation. First, a direct prediction model with a 1-day lead time was built for each station and used to establish a recursive prediction model. Results of the time series modeling of groundwater level show that the models can fill in missing data and effectively filter out the effect of pumping and stream water fluctuation on groundwater level. Results of the error index analysis show that ANN models are slightly superior to SVM models in direct prediction; however, SVM models are more stable in recursive prediction. Based on the result of the model parameter selection process using the trial and error method, the present study suggests an appropriate range of model parameter values for the given time series data of the National Groundwater Monitoring Network. We expect that the applied method and results of this study can be useful for managing the groundwater monitoring network by detecting abnormal groundwater level data, and for managing groundwater resources effectively by applying the method to groundwater recharge estimation.
Abstract Although various efforts, such as accident analysis and inspection, have been made every year to reduce electric fire accidents, there is no effective countermeasure because of the lack of an effective decision support system and of methods for using the existing cumulative data. The purpose of this study is to develop an algorithm for predicting electric fire based on data such as electric safety inspection data, electric fire accident information, building information, and weather information. Through pre-processing of the data collected from each institution (Korea Electrical Safety Corporation, Meteorological Administration, Ministry of Land, Infrastructure, and Transport, and Fire Defense Headquarters), followed by convergence, analysis, modeling, and verification, we derive the factors influencing electric fire and develop prediction models. The results identified insulation resistance value, humidity, wind speed, building deterioration (aging), floor space ratio, building coverage ratio, and building use as influencing factors. The accuracy of the prediction model using the random forest algorithm was 74.7%.
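recorded the lowest expected win percentage of. Team A1 was predicted to have the lowest expected points allowed, 1.57, but its expected points scored was also predicted to be the lowest, 1.63, giving an expected win percentage of 52%.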
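2.5 Win percentage prediction experimental results
As shown in Table 6, the difference between the Pythagorean win percentage and the actual win percentage averaged 12.5%. The difference between the win percentage predicted by the proposed model and the actual win percentage averaged 8.3%. The proposed model was therefore about 4.2 percentage points more accurate than the Pythagorean win percentage.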
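As a rough illustration of the Pythagorean win percentage used as the baseline here, the sketch below applies the classic form of the formula to the expected points scored and allowed quoted for team A1. The exponent of 2 and the function name `pythagorean_win_pct` are assumptions; the excerpt does not state which exponent or exact formula the authors used.

```python
def pythagorean_win_pct(scored, allowed, exponent=2.0):
    """Pythagorean expectation: scored^k / (scored^k + allowed^k).

    The exponent k = 2 is the classic choice and is an assumption here;
    the paper may use a different value.
    """
    s = scored ** exponent
    a = allowed ** exponent
    return s / (s + a)


# Team A1 values quoted in the text: 1.63 expected scored, 1.57 expected allowed.
a1 = pythagorean_win_pct(1.63, 1.57)
print(f"Team A1 expected win percentage: {a1:.1%}")

# Gap between the two mean errors reported in this section, in percentage points.
improvement = round(12.5 - 8.3, 1)
print(f"Improvement: {improvement} percentage points")
```

Under these assumptions the A1 figure comes out close to the 52% quoted in the text, which suggests, but does not confirm, that a formula of this form underlies the expected win percentage.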
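Comparing the RSR and RMSE of the GRU and GBDT model simulation results, the GRU model performed well when the measurement frequency was high, whereas GBDT showed somewhat better predictive performance when the measurement frequency was low. Overall, GBDT, an ensemble model, showed a smaller variation in RSR with measurement frequency. Fig. 5 compares the distributions of the turbidity predictions from GRU and GBDT. For GBDT, the observations and predictions showed similar distributions regardless of measurement frequency, whereas for GRU the error between observations and predictions decreased at the 2-hour measurement frequency compared with the 24-hour frequency, so that the points lay closer to the 1:1 line.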
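The two error measures compared above can be sketched as follows. This is a minimal illustration that assumes the common definition of RSR as RMSE divided by the standard deviation of the observations; the excerpt does not show the authors' exact formula, and the function names and sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_error / len(observed))


def rsr(observed, predicted):
    """RMSE-observations standard deviation ratio (assumed definition).

    Computed as sqrt(sum (o - p)^2) / sqrt(sum (o - mean_o)^2).
    Lower is better; 0 means a perfect fit.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    squared_dev = sum((o - mean_obs) ** 2 for o in observed)
    return math.sqrt(squared_error) / math.sqrt(squared_dev)


# Hypothetical turbidity values, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 3.0, 4.0, 5.0]
print(rmse(obs, pred), rsr(obs, pred))
```

Because RSR normalizes the error by the spread of the observations, it allows comparison across series with different scales, which is useful when the measurement frequency changes the variability of the data.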
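In particular, we will show that the total margin support vector machine proposed by the authors provides good performance in corporate bankruptcy prediction. The support vector machine was originally developed for pattern classification problems with two classes (Bartlett and Mangasarian, 1992; Cherkassky and Mulier, 1998; Cr[r]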
Another issue in efficient implementation of ML-OPC is model training. Given a set of training segments with their reference mask bias values (through MB-OPC runs, for example), the goal is to train the model, i.e., determine the network structure and network parameters such as edge weights and node biases, such that the prediction of mask bias is as accurate as possible. A key in this process is sampling training segments, because using all segments from sample layouts is a waste of time and may cause overfitting of the model toward the segments which occur more frequently. Since sample layouts may not contain all segments that may arise in actual OPC process, generation of synthetic patterns, discussed in Section 5, may help extend the coverage of machine learning model.
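Abstract: In social science research that uses mathematical or quantitative models, the analysis has so far focused mainly on revealing the relationship between dependent and explanatory variables, that is, on explanatory modeling. Analyses focused on improving predictive ability, by contrast, have been rare. This study examines prediction-oriented non-parametric models, which reduce the prediction error for new observations, rather than explanation-oriented models, which test theories and hypotheses or reveal relationships among variables. Gangnam-gu, Seoul was selected as the case area, and housing prices were estimated using the actual transaction prices of detached houses reported from 2011 to 2014 as the base data. The non-parametric models applied were those proposed in the machine learning field, including the generalized additive model, random forest, MARS (multivariate adaptive regression splines), and SVM (support vector machines), and we confirmed that the relatively recently developed MARS and SVM have excellent predictive power. Finally, additionally incorporating spatial autocorrelation into these non-parametric models further improved their price prediction power. We hope that this study will prompt real estate price estimation methodology, which has so far concentrated on parametric models, to expand and diversify toward non-parametric models.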
23 Department of Statistics, Pukyong National University
Received 22 March 2013, revised 17 April 2013, accepted 2 May 2013
Abstract
Evolutionary algorithms have been applied to multi-objective optimization problems by approximation methods using computational intelligence. Those methods have been improved gradually in order to generate many approximate Pareto optimal solutions more exactly. The paper introduces a new method using a support vector machine to find an approximate Pareto frontier in multi-objective optimization problems. Moreover, this paper applies an evolutionary algorithm to the proposed method in order to generate approximate Pareto frontiers more exactly. Decision making with two or three objective functions can then be easily performed on the basis of the Pareto frontiers visualized by the proposed method. Finally, a few examples demonstrate the effectiveness of the proposed method.
Abstract As pet dogs rapidly increase in number, abandoned and lost dogs are also increasing in number.
In Korea, animal registration has been in force since 2014, but the registration rate is not high owing to safety and effectiveness issues. Biometrics is attracting attention as an alternative. In order to increase the recognition rate from biometrics, it is necessary to collect biometric images in the same form as much as possible, from the face. This paper proposes a method to determine whether a dog is facing front or not in a real-time video. The proposed method detects the dog's eyes and nose using deep learning, and extracts five types of directional face information through the relative size and position of the detected face. Then, a machine learning classifier determines whether the dog is facing front or not. We used 2,000 dog images for learning, verification, and testing. YOLOv3 and YOLOv4 were used to detect the eyes and nose, and Multi-layer Perceptron (MLP), Random Forest (RF), and the Support Vector Machine (SVM) were used as classifiers. When YOLOv4 and the RF classifier were used with all five types of the proposed face orientation information, the face recognition rate was best, at 95.25%, and we found that real-time processing is possible.
III. Software Design
1. DLZ algorithm using Big data
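The JDAM DLZ algorithm applied to the FA-50 is implemented as a library, but big data can be collected using the ground simulation equipment (AHB, Avionics Hot Bench), which provides the same environment as the FA-50 aircraft. After building the same environment as the actual aircraft on the AHB, we analyzed how the output values change with the input values of the FA-50 JDAM DLZ and found that most of the patterns are linear. These input and output values are defined in Section 2.1. Fig. 3 shows the maximum release range when only the aircraft speed (Mach number) is varied among the inputs of the FA-50 JDAM DLZ algorithm while the other conditions are held constant. Tabulating the input and output values of this graph and fitting a regression equation in Excel gives Eq. 1 below.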
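The regression step described above, fitting a line to tabulated input and output values, can be sketched with ordinary least squares. The sample values below are hypothetical placeholders and are not FA-50 or JDAM data; the function name `fit_line` is likewise an assumption, and the paper itself reports using Excel rather than code.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical placeholder table: Mach number vs. maximum release range.
mach = [0.5, 0.6, 0.7, 0.8, 0.9]
max_range = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(mach, max_range)
print(f"range = {slope:.3f} * Mach + {intercept:.3f}")
```

A single-variable fit like this matches the setup in which only Mach is varied; covering the other DLZ inputs would need a multivariate fit, which is outside this sketch.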
The Bed and the Rotary Dia Spindle head were found to have relatively low failure rates.
Fig. 7 MTBF of part for grinding wheel head
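In future design changes and in the selection of system components, taking reliability into account when choosing couplings, bearings, seals, and similar parts should raise reliability above the predicted value, and reliability improvement for the most influential parts, such as the grinding wheel head, Regulating wheel table, and RW Spindle mount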
e-mail:{leetaeho, kimmw95, byungjun}@skku.edu O* , kyungtaekim76@gmail.com ** , youn7147@skku.edu *
Daily maximum power demand analysis using machine learning model
Tae-Ho Lee O , Min-Woo Kim * , Byung-Jun Lee * , Kyung-Tae Kim ** , Hee-Yong Youn * Dept. of Electrical and Computer Engineering, Sungkyunkwan University O Dept. of Electrical and Computer Engineering, Sungkyunkwan University *