
Text Data Mining to build a Dataset for Clothing Recommendation System


Academic year: 2021


Text Data Mining to build a Dataset for Clothing Recommendation System
(Korean title: 추천 시스템 데이터 셋 구축을 위한 텍스트 데이터 마이닝)

Ju-Sang Lee (이주상)*, Sun-Tae Chung (정선태)**, Jun-Yup Cha (차준엽)*

*Dept. of Information and Telecommunication Eng., Graduate School, Soongsil University
**Dept. of Smart Systems Software, Soongsil University

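A recommendation system uses large amounts of information to recommend a list of products that a particular user is likely to prefer. Well-known recommendation systems such as those of Netflix, Amazon, and Youtube are built on each company's internal product and user data, whereas startups and small businesses that want to build one have no dataset to start from and are limited in how much data they can collect. This paper proposes a method for building a dataset for a clothing recommendation system that any clothing store, not just one particular company, can use, and describes the text data mining process used to build the customer dataset together with its results.

1. Introduction

A recommendation system is an information filtering system that finds and recommends, out of many products, only the list a particular user is most likely to prefer. Building an effective one requires a well-performing recommendation algorithm and a sufficient amount of customer and product data. Algorithm performance is not only a matter of computational cost or speed; whether the environment can handle big data must also be considered. For example, if a big data system is in place, CF (Collaborative Filtering) or DL (Deep Learning) can be used, but without a large amount of information these methods may actually perform worse [1]. Customer data and product data are therefore the prerequisite.

Global companies that use recommendation systems heavily, such as Netflix, Amazon, and Youtube, apply recommendation algorithms on top of their own customer and product databases. Startups and small and medium-sized businesses without sufficient data run into the cold-start problem before they can even build a recommendation system. If a public dataset for recommendation systems existed, startups without data could overcome this limitation, and companies that already use recommendation systems could use it for research and development.

To build a product and customer information database for a clothing recommendation system, we collected product data and review data from the clothing pages of Amazon, Yoox, Shopbop, and others with a Scrapy-based web crawler. We then analyzed the reviews with natural language processing to infer information about each reviewer and their preference for the product. This paper describes that process of building product and customer datasets suitable for a recommendation system.

2. Proposed Method

Building a dataset for a clothing recommendation system requires considering which factors influence consumers up to the point of deciding on a purchase, and identifying the kind of data a recommendation system needs. Related work on the factors behind consumers' clothing purchase decisions reports that product attributes have a significant effect on the purchase decision.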


Proceedings of the 2020 Online Spring Conference, Vol. 27, No. 1 (May 2020)


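According to the same study, the remaining factors rank, in order of influence, as the product's expressive attributes, physical attributes, photo effects, and age [2]. A recommendation algorithm basically computes how high a rating (preference) a particular user has given to particular attributes, and from that decides how much the user will prefer a product they have not yet bought. More recently, latent-model-based recommendation algorithms, which algorithmically discover how particular product features relate to preference, have come into wide use [3]. The clothing recommendation attributes we selected with the recommendation algorithm in mind are listed in the table in Figure 1.

<Figure 1> Attributes that influence recommendation

If purchase histories of existing users exist for each product attribute, and profiles of those users exist, the products a new customer wants can also be predicted from the existing users' data.

<Figure 2> Process for building the clothing recommendation system dataset

Figure 2 shows the process of building the dataset. The web crawler collects, from the clothing pages of Amazon, Yoox, Shopbop, and others, the attributes identified above as product data and builds the product data table. It then analyzes the review data for each product to extract the user's age, product preference, and so on, producing a user data table, which holds user profiles and the ratings users gave to each product attribute, and a review data table. The relationships among the product, user, and review data tables represent users' purchase records and product ratings. This yields the basic dataset for a CF-based recommendation algorithm: as in Table 1, to predict customer D's preference for product 1, the similarity between customers is computed from their purchase histories and ratings, and customer D's preference for product 1 is predicted from it.

Columns: Product 1, Product 2, Product 3, Product 4
Customer A: 5 3 2
Customer B: 4 4 5
Customer C: 2 4 3
Customer D: ? 4 3 3
<Table 1> Example rating table of a recommendation system

2-1. Data Collection

The initial data, about 300,000 items of product data and review data, was collected from Amazon [5], Yoox [6], Shopbop [7], and others using a web crawler written on top of Scrapy [4]. Product data does not go through a separate analysis step, but, as Table 2 shows, because it comes from different web pages the same data is written in different ways. In addition, some product information is uploaded according to the poster's subjective judgment, so the type may not be stated specifically, as in 't shirt midi dress', and many near-duplicate values exist, such as 'navy blue' and 'cobalt blue' for the same color 'blue'. So that the recommendation system can later analyze and search the data easily, the data is cleaned before being stored in the database: similar values are merged and unspecific values are corrected to a single form.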
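The cleaning step described above can be illustrated with a small sketch. It is not the authors' code: the lookup tables, the function names `normalize_price` and `normalize_record`, and the choice of canonical values (for example folding every blue shade into `blue`) are assumptions made for illustration, using raw values of the kind quoted in the text.

```python
import re

# Hypothetical lookup tables; a real pipeline would need far larger ones.
COLOR_FAMILIES = {"navy blue": "blue", "cobalt blue": "blue", "blue": "blue"}
SIZE_ALIASES = {"small": "S", "medium": "M", "large": "L", "s": "S", "m": "M", "l": "L"}
TYPE_ALIASES = {"t-shirts": "t-shirt", "t-shirt": "t-shirt", "t shirt": "t-shirt"}


def normalize_price(raw):
    """Extract the numeric amount from strings such as '$ 90.00' or 'US$90.00'."""
    match = re.search(r"\d+(?:\.\d+)?", raw)
    return float(match.group()) if match else None


def normalize_record(record):
    """Return a copy of a scraped record with merged, canonical field values."""
    color = record.get("color", "").strip().lower()
    size = record.get("size", "").strip().lower()
    type_ = record.get("type", "").strip().lower()
    return {
        "color": COLOR_FAMILIES.get(color, color),   # merge similar colors
        "size": SIZE_ALIASES.get(size, size.upper()),  # unify size notation
        "type": TYPE_ALIASES.get(type_, type_),      # unify type spelling
        "price": normalize_price(record.get("price", "")),
    }


# Two records written differently by different sites (illustrative values).
raw_records = [
    {"color": "Navy Blue", "size": "Small", "type": "T-Shirts", "price": "$ 90.00"},
    {"color": "Cobalt blue", "size": "S", "type": "t-shirt", "price": "US$90.00"},
]
cleaned = [normalize_record(r) for r in raw_records]
```

After cleaning, both records share the same color, size, type, and price values, which is what makes later search and similarity computation straightforward. Values missing from the lookup tables pass through in lowercase, so unknown notations are kept rather than silently dropped.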

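Attribute | Amazon | Yoox | Shopbop
Size | Small, Medium, Large | 46, 48, 50 | S, M, L
Price | $ 90.00 | $90.00 | US$90.00
Type | T-Shirts | t-shirt | Tops
Color | Navy Blue | Blue | Cobalt blue
Clothing name | High waist T shirt midi dress with pockets | PRADA | SUNDRESSES
<Table 2> Form of the data immediately after collection

2-2. Review Data Analysis

For recommendation, customer data consists of the purchase history so far and the ratings (preferences) given to purchased products. Product ratings can go further and cover the preference for each product attribute, so that which attributes influence preference can also be taken into account. Information such as age, gender, and body shape, defined earlier as attributes to feed into the recommendation algorithm, can be considered as well. Of these three attributes, gender could be extracted from the purchased-product history in this study, and body shape was excluded because it cannot be analyzed unless the customer uploads an image directly.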


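The reviewer's age and the preference for the product were analyzed using natural language processing. A deep-learning-based natural language processing model extracts the desired information from the review text, following the process in Figure 3.

<Figure 3> Natural language processing pipeline for customer data analysis

As training datasets, this study used the Women’s E-Commerce review data [8], which contains 23,485 clothing product reviews with ratings, categories, reviewer ages, and other information, and the Sentiment 140 dataset with 1.6million tweets [9], which contains 1,600,000 tweets with a sentiment label for each tweet.

2-2-1. Data Preprocessing

The text is preprocessed to suit the intended analysis. For preprocessing we used Keras, a Python library for deep learning, and NLTK, a library that provides various tools for natural language processing.

Stop-word removal: Words such as 'I', 'You', and 'it' occur frequently in sentences but carry no meaning for the text analysis in this study. These words are removed using the stop words defined by NLTK.
Cleaning/normalization: Special characters, punctuation, and the like, which do not affect the analysis, are removed. Records with missing or abnormal values can distort training, so they are removed as well.
Tokenization: Tokenization splits the given text into the smallest units that are still meaningful for analysis. Here the word is used as the token unit.
Word embedding: A step required for natural language processing, in which the tokenized words are assigned real numbers and vectorized.

2-2-2. Natural Language Processing Model

For a computer to analyze text, a word embedding step that turns words into numbers is needed. The Word2Vec model vectorizes the meaning of every word in the given sentences so that similarity between words is reflected [10]. The vectorized data enters the analysis model as the embedding layer.

<Figure 4> Age analysis and preference analysis models and their compilation

The text analysis model is structured as in Figure 4. embedding_layer is built from the vectorized data prepared above and is added as one of the layers of the artificial neural network. Dense() adds a fully connected layer. Its first argument is the number of output neurons and its second argument is the function used in the output layer. For age analysis, ages were divided into labels in 5-year units to produce a more concise and definite result, so the first argument is the number of labels. The preference analysis model outputs a real number between 0 and 1, where a value closer to 1 means a higher preference; since there is a single output, 1 is passed. The Softmax function and the sigmoid function, commonly used for multi-class and binary classification respectively, were applied to the two models. model.compile() and model.fit() compile and train the modeled neural network, respectively. Considering the size of the data, and to prevent overfitting, training ran for 5 and 8 epochs, respectively.

3. Results

The dataset was split 8:2, with 80% used for training and 20% used as test data to verify accuracy. Figure 5 shows the evaluation results of the trained models.

<Figure 5> Test accuracy of preference analysis (left) and age analysis (right)

The evaluation accuracy of preference analysis and age analysis was 84% and 55%, respectively. The large gap is attributed to the difference in dataset size. The preference analysis drew on a sufficiently large dataset of 1,600,000 samples.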
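The 8:2 split and the accuracy measurement described in the Results section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the helper names `split_80_20` and `accuracy`, the fixed random seed, and the toy data are all hypothetical.

```python
import random


def split_80_20(samples, seed=0):
    """Shuffle the samples and return (train, test) in an 8:2 ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy (text, label) pairs standing in for labeled reviews (hypothetical data).
data = [(f"review {i}", i % 2) for i in range(10)]
train, test = split_80_20(data)

# A placeholder "model" that always predicts label 1, only to exercise accuracy().
predictions = [1 for _ in test]
test_accuracy = accuracy(predictions, [label for _, label in test])
```

Holding out the test portion before any training is what makes the reported accuracy a measure of performance on unseen text rather than on memorized examples.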


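The 84% accuracy of the preference analysis is attributed to its large training set, whereas for age analysis roughly 23,000 samples appear not to have been enough to compute, from the text, the age-group-specific characteristics and similarities that words carry. Figure 6 shows the output of the implemented models for input text.

<Figure 6> Output of the age analysis and preference analysis

4. Conclusion

This paper collected data for a clothing recommendation system, built a product dataset, and proposed a method of building a customer profile dataset through text analysis as a way of building the base dataset for a recommendation system. The age analysis from text did not reach sufficient accuracy, so the customer data could not be built completely. However, given that the age analysis model's accuracy was obtained with a small dataset, more meaningful results are expected once a sufficient training dataset is secured. Going forward, we plan to continue collecting high-quality data for recommendation systems by analyzing images uploaded to SNS. If the research continues to produce results, it should become possible to build a public recommendation-system dataset that even startups facing the cold-start problem can use.

References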

[1] Sanjeevan Sivapalan, Alireza Sadeghian, Hossein Rahnama, Asad M. Madni, Recommender systems in e-commerce, 2014 World Automation Congress (WAC), pp. 179-184, 2014
[2] Hye-Kyung Ji (지혜경), Purchase decision factors for clothing products in Internet shopping malls (인터넷 쇼핑몰에서의 의류제품 구매결정 요인), 한국의상디자인학회지, 14(2), pp. 185-189
[3] Korea Creative Content Agency (한국콘텐츠진흥원), <방송 트렌드 & 인사이트> (Broadcasting Trend & Insight), April/May 2016 issue (vol. 05): The evolution of content recommendation algorithms (콘텐츠 추천 알고리즘의 진화)
[4] Scrapy, https://docs.scrapy.org/en/latest/
[5] Amazon, Men’s Fashion, Women’s Fashion, https://www.amazon.com/ref=nav_logo
[6] Yoox, https://www.yoox.com/kr
[7] Shopbop, https://www.shopbop.com/
[8] Women’s E-commerce Clothing Reviews, https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews
[9] Sentiment140 dataset with 1.6million tweets, https://www.kaggle.com/kazanova/sentiment140
[10] Justin Garten, Kenji Sagae, Volkan Ustun, Morteza Dehghani, Combining Distributed Vector Representations for Words, Association for Computational Linguistics, Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, pp. 95-101, 2015
