Conclusion and Future Work - Kookmin University

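Malicious comments have become a problem in the comment sections of Internet news. Their writers use aggressive words that show no consideration for others and so provoke the people who read them. Rather than attempting a reasoned critique, they slander those who hold different opinions and attack them with profanity. Site operators can delete such comments to discourage them, but to preserve the autonomy of Internet newspapers, the best approach is to enable comment writers to regulate themselves.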

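In this thesis, we proposed and implemented a system that identifies malicious comments on Internet news. In the evaluation of the proposed system, the experiments whose feature selection in the document-processing stage used the eojeol (space-delimited word unit) itself together with nouns outperformed the experiments based on other parts of speech.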

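One topic for future work is stop-word removal. Some words that occur frequently in documents do not affect document categorization, yet because of their high frequency they strongly influence the weight calculation. Methods for removing such stop words should be studied.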
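One simple way to approach this future-work item is to flag words by document frequency. The following is a minimal sketch, not the method used in this thesis: the function names, the threshold `max_df_ratio`, and the toy English tokens are all assumptions made for illustration, and the input is assumed to be already tokenized (for example by a morphological analyzer).

```python
from collections import Counter


def stopword_candidates(tokenized_docs, max_df_ratio=0.8):
    """Return tokens that appear in more than max_df_ratio of the documents.

    Hypothetical helper: the threshold is an assumed tuning parameter.
    """
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each token once per document
    return {tok for tok, count in doc_freq.items() if count / n_docs > max_df_ratio}


def remove_stopwords(tokenized_docs, stopwords):
    """Drop the flagged tokens before any weights are computed."""
    return [[tok for tok in tokens if tok not in stopwords] for tokens in tokenized_docs]


# Toy data (illustrative only, not from the thesis corpus).
docs = [
    ["the", "article", "is", "wrong"],
    ["the", "reporter", "is", "biased"],
    ["the", "policy", "is", "good"],
    ["the", "comment", "was", "rude"],
]
stopwords = stopword_candidates(docs, max_df_ratio=0.7)
filtered = remove_stopwords(docs, stopwords)
```

A frequency threshold only approximates "words that do not affect categorization"; a word that is frequent but concentrated in one class would still be informative, so a class-aware criterion may be preferable in practice.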

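In addition, the banned-word dictionary collected so far could be used to filter once more the comments that the classifier misjudged, or research could be applied on expressions characteristic of malicious comments, such as “극도로 싫어한다” (“I absolutely hate it”), “쌤통이다” (“serves you right”), and “얼마나 잘되나” (“let's see how well that goes”). With these additions, it should be possible to present an optimized system.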
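The second-pass filtering idea can be sketched as follows. This is an illustrative sketch under stated assumptions, not the thesis implementation: the function names, the label strings `"normal"` and `"malicious"`, and plain substring matching are all hypothetical choices, and the dictionary entries are simply the example expressions quoted in the text rather than the actual banned-word dictionary.

```python
def contains_banned_word(comment, banned_words):
    """True if any dictionary entry occurs as a substring of the comment."""
    return any(word in comment for word in banned_words)


def second_pass(comment, classifier_label, banned_words):
    """Override a 'normal' prediction when a dictionary entry is present.

    classifier_label is assumed to come from an upstream classifier
    (an SVM in this thesis); predictions of 'malicious' are left as they are.
    """
    if classifier_label == "normal" and contains_banned_word(comment, banned_words):
        return "malicious"
    return classifier_label


# Placeholder dictionary built from the expressions quoted in the text.
banned_words = {"극도로 싫어한다", "쌤통이다", "얼마나 잘되나"}

relabeled = second_pass("정말 쌤통이다", "normal", banned_words)
unchanged = second_pass("좋은 기사입니다", "normal", banned_words)
```

Because this pass only moves comments from normal to malicious, it can recover missed malicious comments but cannot correct false alarms; whether that trade-off is acceptable would need to be checked against labeled data.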

Abstract

A Design and Implementation of Malicious Web Log Identification System by Using SVM

by Kim, Myo-Sil

Major in Computer Science Education, Graduate School of Education

Kookmin University, Seoul, Korea

Posting one's own opinion about a news article online is easy, and online comments let readers share the opinions of others and quickly obtain the information they need. In this paper, we present a system that classifies malicious Web Logs (comments) on news articles, namely those that slander a specific person without grounds or damage that person's reputation. The system gathers and analyzes Web Logs. It extracts features, experiments with six data models, and calculates the feature weights by TF*IDF. These features are used as input parameters to a Support Vector Machine that classifies malicious Web Logs.
