Coreference Resolution for Korean using Mention Pair with SVM
Kyoungho Choi, Cheoneum Park, Changki Lee
Department of Computer Science, Kangwon National University
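Abstract
For English, a variety of machine learning approaches to coreference resolution have been discussed. For Korean, however, it is hard to find prior work that approaches coreference resolution with machine learning, and there is not even a publicly released coreference corpus for research. In this paper, we analyzed 197 documents drawn from newspaper articles and Wiki articles to build a corpus annotated with coreference information. On this corpus, which includes dependency parse trees, we built a Korean coreference resolution system that extracts mentions and applies an SVM-based Mention Pair model. Measured with MUC, B-cube, and CEAFE, the system achieved F1 scores of 56.47%, 57.08%, and 61.67%, respectively.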
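1. Introduction
Coreference is the phenomenon in which different noun phrases in a document refer to the same concept or object. In particular, for pronouns that appear very frequently in a document, it is difficult to determine at the level of syntactic analysis what the pronoun refers to, and this ambiguity can cause critical errors in other natural language processing tasks such as event extraction and semantic role labeling. Coreference resolution is the process of grouping noun phrases that refer to the same object into a single Entity. It reduces the problems caused by ambiguous noun phrases such as pronouns, and it reduces errors in other natural language processing tasks. For example, consider the passage "When we think of [an elephant], we usually picture [an elephant with big [ears]]. However, [the Indian elephant] has small [ears]. [The big-eared elephant that [we] picture] is [the African elephant]." When this passage passes through a coreference resolution system, the system groups [an elephant], [an elephant with big ears], and [the African elephant] into a single Entity and thereby reveals that they refer to the same object.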
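In this paper, we build a Korean coreference resolution system based on a Mention Pair model with an SVM [4]. To train and evaluate the system, we constructed a corpus of 197 documents annotated with coreference information and measured performance with MUC [1], B-cube [2], and CEAFE [3].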
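2. Related Work
In prior work on English, the multi-pass sieve system shows strong performance: using a deterministic model that consults syntactic structure and lexical information, it achieved a B-cube F1 of 70.8% [5]. A machine-learning-based English coreference resolution system using the Mention Pair model achieved a B-cube F1 of 54.1% [6].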
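3. Korean Coreference Resolution Using the Mention Pair Model
In this paper, we built a Korean coreference resolution system based on the Mention Pair model, with an SVM as the machine learning method. The system takes as input natural language documents annotated with morpheme, named entity, and dependency parse information, and extracts mentions. It then forms Mention Pairs from the extracted mentions, adds features to each pair, and uses a pre-trained SVM to decide whether each Mention Pair is coreferent. Finally, mentions that are not in any coreference relation are removed, and Entities are constructed.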
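The final step of this pipeline can be sketched as follows. The paper does not say how pairwise decisions are merged into Entities, so this minimal sketch assumes transitive closure over the positive pairs (a union-find structure) and then drops mentions left on their own. The function name `build_entities` and the mention identifiers are hypothetical.

```python
def build_entities(mentions, coreferent_pairs):
    """Group mentions into entities from pairwise decisions.

    Assumption: pairs judged coreferent are merged transitively;
    mentions that end up alone (singletons) are removed.
    """
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent links to the representative, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in coreferent_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    # Keep only clusters with at least two mentions.
    return [c for c in clusters.values() if len(c) > 1]


entities = build_entities(["a", "b", "c", "d"], [("a", "b"), ("b", "c")])
print(entities)
```

Here "d" takes part in no positive pair, so it is discarded, while "a", "b", and "c" form one Entity even though the pair ("a", "c") was never judged directly.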
3.1 Mention
A Mention basically denotes every noun phrase that appears in the parse tree. In this study, we used a corpus built on dependency grammar.
Proceedings of the Korea Computer Congress 2014
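To extract mentions from a dependency parse tree, every node is visited. If a node is a noun phrase, its child nodes are traced down to the leaves, and the two eojeols (space-delimited word units) that lie farthest apart become the two ends of the mention. To prevent incorrect mention extraction caused by parsing errors, this study uses named entity information as follows: when one end of a mention extracted from the parse result falls on an eojeol that is part of a named entity, that end of the mention is moved to the end eojeol of the named entity.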
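The extraction rule above can be illustrated with a small sketch. It assumes a toy tree representation in which each eojeol records its position, the position of its head, and a noun-phrase flag. The `Node` class, the function names, and the head assignments in the example are hypothetical, and the named-entity adjustment reflects one reading of the rule, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Node:
    index: int    # position of the eojeol in the sentence
    head: int     # index of the governing eojeol (-1 for the root)
    is_np: bool   # True when the node is a noun phrase


def subtree(nodes, root):
    """Collect the indices of `root` and all nodes below it."""
    children = {}
    for n in nodes:
        children.setdefault(n.head, []).append(n.index)
    stack, covered = [root], []
    while stack:
        cur = stack.pop()
        covered.append(cur)
        stack.extend(children.get(cur, []))
    return covered


def extract_mentions(nodes, ne_spans=()):
    """Return sorted (start, end) eojeol spans, one per noun-phrase node."""
    mentions = set()
    for n in nodes:
        if not n.is_np:
            continue
        covered = subtree(nodes, n.index)
        start, end = min(covered), max(covered)
        # Assumed reading of the named-entity rule: a boundary that falls
        # inside a named entity is pushed out to that entity's edge.
        for ne_start, ne_end in ne_spans:
            if ne_start <= start <= ne_end:
                start = ne_start
            if ne_start <= end <= ne_end:
                end = ne_end
        mentions.add((start, end))
    return sorted(mentions)


# Hypothetical parse of an eight-eojeol sentence (heads are assumed).
heads = [1, 4, 3, 4, 7, 6, 7, -1]
is_np = [True, True, True, True, False, True, True, False]
nodes = [Node(i, h, flag) for i, (h, flag) in enumerate(zip(heads, is_np))]
spans = extract_mentions(nodes)
print(spans)
```

With these assumed heads, the six noun-phrase nodes yield six spans, three of one eojeol and three of two eojeols, which matches the shape of the example sentence discussed with Figure 1.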
Figure 1. Example of extracted mentions
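As shown in Figure 1, extracting mentions from the sentence "꽃의 모양이 수탉의 볏과 비슷하여 한자로는 계관화(鷄冠花)라고 쓴다." ("The shape of the flower resembles a rooster's comb, so in Chinese characters it is written 鷄冠花.") yields a total of six noun phrases, that is, six mentions: "꽃의" (the flower's), "꽃의 모양이" (the shape of the flower), "수탉의" (the rooster's), "수탉의 볏과" (the rooster's comb), "한자로는" (in Chinese characters), and "한자로는 계관화(鷄冠花)라고" (in Chinese characters, as 鷄冠花).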
3.2 Mention Pair
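The Mention Pair approach was proposed by Soon et al. to solve coreference with machine learning [7]. It pairs up all mentions that appear in a document, two at a time, and treats coreference resolution as a classification problem. Under this approach, each pair is one instance, whether the pair is coreferent serves as the label, and information about each mention together with information about the relation between the two mentions serves as the features.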
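As a minimal sketch of this formulation, the snippet below builds one labeled instance per pair of mentions. It assumes gold entity identifiers are available for training. The function name `make_pairs`, the +1/-1 label encoding, and the example mentions are illustrative choices, not details from the paper.

```python
from itertools import combinations


def make_pairs(mentions, entity_of):
    """Build (antecedent, anaphor, label) instances for every mention pair.

    mentions:  mention ids in document order
    entity_of: mention id -> gold entity id (absent means no entity)
    """
    pairs = []
    for antecedent, anaphor in combinations(mentions, 2):
        ent_a = entity_of.get(antecedent)
        ent_b = entity_of.get(anaphor)
        coreferent = ent_a is not None and ent_a == ent_b
        pairs.append((antecedent, anaphor, 1 if coreferent else -1))
    return pairs


mentions = ["elephant", "big-eared elephant", "Indian elephant", "African elephant"]
entity_of = {
    "elephant": "E1",
    "big-eared elephant": "E1",
    "Indian elephant": "E2",
    "African elephant": "E1",
}
pairs = make_pairs(mentions, entity_of)
positives = [p for p in pairs if p[2] == 1]
print(len(pairs), len(positives))
```

Pairing every mention with every other gives n(n-1)/2 instances, so four mentions produce six pairs, three of which are positive in this toy labeling.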
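3.3 Feature Extraction for Korean Coreference Resolution
Features for each Mention Pair are extracted using the dependency parse information and named entity information received as input along with the natural language sentences. Dependency information is used to check whether a mention is a subject and whether it is an embedded phrase. Named entity information and the noun dictionary of the Sejong corpus are consulted to determine each mention's number, animacy, and whether it is a pronoun or a proper noun, and these are turned into features.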
Table 1 lists the names and brief descriptions of the features used directly in the experiments.
Table 1. Features used for Mention Pair
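4. Experiments
Natural language documents collected from newspaper articles and Wiki articles were processed with ETRI's language analyzer to obtain morphological analysis, named entity recognition, and dependency parsing information. Researchers then manually annotated mentions and coreference links according to the definitions above to create the corpus. Built from 16 newspaper articles and 181 Wiki documents in total, the corpus contains 1,206 mentions grouped into 456 Entities.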
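For the experiments, morphological analysis, dependency parsing, and named entity recognition were performed with ETRI's Korean language processor, and mentions were then extracted from the named entity information and the dependency parse trees. Mention Pairs were formed from the extracted mentions and features were added; one fifth of the data was used as the test set and four fifths as the training set. An SVM implemented with the Pegasos algorithm [4] was trained on the training set and used to predict on the test set. Finally, mentions that were not in any coreference relation were deleted so that they would not introduce errors into the evaluation metrics.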
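To make the training step concrete, here is a minimal sketch of a Pegasos-style linear SVM: at step t it samples one example, uses step size 1/(λt), shrinks the weights, and adds the example when the hinge loss is active. The optional projection step of Pegasos is omitted. The feature layout, hyperparameters, and toy data are assumptions for illustration and do not reproduce the authors' settings.

```python
import random


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def pegasos_train(X, y, lam=0.01, steps=5000, seed=0):
    """Train a linear SVM with Pegasos stochastic sub-gradient steps."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, steps + 1):
        i = rng.randrange(len(X))                  # sample one training pair
        eta = 1.0 / (lam * t)                      # step size 1 / (lambda * t)
        violated = y[i] * dot(w, X[i]) < 1.0       # is the hinge loss active?
        w = [(1.0 - eta * lam) * wj for wj in w]   # shrink (regularization)
        if violated:
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


# Toy pairs with hypothetical features: [head_match, bias].
X = [[1.0, 1.0], [0.0, 1.0], [1.0, 1.0], [0.0, 1.0]]
y = [1, -1, 1, -1]
w = pegasos_train(X, y)
print(predict(w, [1.0, 1.0]), predict(w, [0.0, 1.0]))
```

In this toy setup a pair whose heads match is labeled coreferent and the bias feature lets the model reject the rest; a real feature vector would hold the features described in Section 3.3.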
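Table 2 shows the mention extraction performance, obtained by comparing the mentions that the system extracted automatically, after deleting those with no coreference relation, against the gold mentions. Table 3 shows the system's coreference performance under MUC [1], B-cube [2], and CEAFE [3], which are evaluation metrics for coreference resolution systems.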
Table 2. Mention extraction performance of the system
            Recall    Precision    F1
Mention     56.59%    75.24%       64.60%
Table 3. Coreference resolution performance of the system
            MUC       B-cube    CEAFE
Recall      48.00%    48.00%    56.61%
Precision   68.57%    70.40%    67.73%
F1          56.47%    57.08%    61.67%
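The F1 rows in Tables 2 and 3 can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below recomputes them; the function and variable names are illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in Tables 2 and 3.
reported = {
    "Mention": (75.24, 56.59),
    "MUC": (68.57, 48.00),
    "B-cube": (70.40, 48.00),
    "CEAFE": (67.73, 56.61),
}
scores = {name: round(f1(p, r), 2) for name, (p, r) in reported.items()}
for name, score in scores.items():
    print(f"{name}: F1 = {score:.2f}%")
```

The recomputed values agree with the tables to two decimal places (64.60, 56.47, 57.08, and 61.67).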
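The lower mention extraction performance is largely due to the deletion of mentions that the system did not judge to be coreferent. Precision is also higher than recall on every metric, which indicates that the system's misclassification rate is low. Compared with English coreference resolution using Mention Pair, the F1 scores are higher by 2.1 percentage points on MUC, 2.98 on B-cube, and 8.27 on CEAFE. The gain on CEAFE is especially pronounced relative to MUC, which indicates that mentions are less likely to be assigned to the wrong Entity.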
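5. Conclusion
We extracted mentions and 21 features from dependency parse trees and evaluated a coreference resolution system built with an SVM-based Mention Pair model. Although it uses fewer features overall than the Mention Pair system for English, it produced better results on all three metrics. This is probably because Korean uses pronouns less than English does and because the heads of coreferring noun phrases vary less in form. In future work, we plan to improve performance by adding features better suited to Korean and to apply Mention-ranking and Cluster-ranking models to Korean coreference resolution.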
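Acknowledgments
This work was carried out as part of the Industrial Convergence Core Technology Development Program (Information and Communications) of the Ministry of Science, ICT and Future Planning and the Korea Evaluation Institute of Industrial Technology [10044577, Development of Knowledge Evolutionary WiseQA Platform Technology for Human Knowledge Augmented Services].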
6. References
[1] Marc Vilain, John Burger, John Aberdeen, Dennis Connolly, and Lynette Hirschman. "A model-theoretic coreference scoring scheme." Proceedings of the 6th conference on Message understanding. Association for Computational Linguistics, 1995.
[2] Amit Bagga and Breck Baldwin. "Algorithms for scoring coreference chains." The first international conference on language resources and evaluation workshop on linguistics coreference. Vol. 1. 1998.
[3] Xiaoqiang Luo. "On coreference resolution performance metrics." Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2005.
[4] Changki Lee and Myung-Gil Jang. "Fast training of structured SVM using fixed-threshold sequential minimal optimization." ETRI journal 31.2, 121-128, 2009.
[5] Heeyoung Lee, Yves Peirsman, Angel Chang, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky. "Stanford's multi-pass sieve coreference resolution system at the CoNLL-2011 shared task." Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task. Association for Computational Linguistics, 2011.
[6] Altaf Rahman and Vincent Ng. "Supervised models for coreference resolution." Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2, 2009.
[7] Wee Meng Soon, Hwee Tou Ng, and Daniel Chung Yong Lim. "A machine learning approach to coreference resolution of noun phrases." Computational linguistics 27.4, 521-544, 2001.