Discourse
- Discourse covers linguistic expression beyond the boundary of the sentence -> meaning is also carried outside the single sentence
1) Dialogues : the structure of turns in conversation
2) Monologues : the structure of entire passages, documents -> one continuous stretch of text from a single speaker or writer
Coreference
data:image/s3,"s3://crabby-images/7e9d1/7e9d1688a1fc86d33970c1169fe6cc3b137dcb91" alt=""
=> Coreference is the task of working out who each of "You!", "your father", "you", "him", "I", and "your father" refers to.
data:image/s3,"s3://crabby-images/f345f/f345f9a61995eefe1d848d4d859b478869369fd2" alt=""
data:image/s3,"s3://crabby-images/8e615/8e6156cffcb717ddd4b1ca1a573deb7d759e84c9" alt=""
data:image/s3,"s3://crabby-images/afd28/afd2883501985ad128be048b77bb8f2d15fea4ca" alt=""
=> Which of the proper-noun entities (VICTORIA CHEN, MEGABUCKS, LOTSABUCKS) do she, her, it, that, etc. refer to?
=> What do company, 37-year-old, president, ... refer to?
Event Coreference
data:image/s3,"s3://crabby-images/9ed8f/9ed8fb61eb9f98c9a7efce3899c27f5869d774af" alt=""
Verb semantics
data:image/s3,"s3://crabby-images/46582/4658232c69d673109d0d2750046a2dda6e19d180" alt=""
=> The entity being referred to may differ.
Selectional restrictions
data:image/s3,"s3://crabby-images/57981/57981f80e2b5d4ca95e87e9b3a04b38fbba39d21" alt=""
=> Blue circle = mention
Mention Detection
- Extract every mention candidate first (pull out the proper-noun candidates)
- All NPs, possessive pronouns, and named entity mentions are candidate mentions. Recall is more important than precision.
Mention method : rule-based
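The candidate rule above can be sketched as a plain union of three span sources, with no filtering, so recall stays high. This is only an illustration: the span lists are assumed to come from some upstream parser and NER tagger, and the function name `candidate_mentions` and the toy sentence are made up here.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive

def candidate_mentions(np_spans: List[Span],
                       pronoun_spans: List[Span],
                       entity_spans: List[Span]) -> List[Span]:
    """Union of all three sources; duplicates collapse, nothing is filtered out."""
    candidates = set(np_spans) | set(pronoun_spans) | set(entity_spans)
    return sorted(candidates)

# Toy sentence (hypothetical): "Victoria Chen said her company grew"
tokens = ["Victoria", "Chen", "said", "her", "company", "grew"]
nps = [(0, 2), (3, 5)]   # "Victoria Chen", "her company"
pronouns = [(3, 4)]      # "her"
entities = [(0, 2)]      # "Victoria Chen"

mentions = candidate_mentions(nps, pronouns, entities)
for start, end in mentions:
    print(" ".join(tokens[start:end]))
```

Nested spans such as "her" inside "her company" are both kept on purpose; wrong candidates can be left unlinked later, while a missed candidate can never be recovered.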
data:image/s3,"s3://crabby-images/7d745/7d745387068aab5f5de04fad6561c00a98665fd4" alt=""
=> The result is produced by passing the mentions through several stages of filters (sieves).
=> Speaker Sieve : speaker, String Match : John-John, Relaxed String Match : nicknames, Strict Head Match B,C : same sentence structure
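A minimal sketch of the multi-stage filtering idea, assuming only two toy sieves (exact string match, then head-word match) applied from most to least precise. The real sieves in the figure are richer; the mention dictionaries, the `head` field, and the function names below are illustrative assumptions.

```python
def exact_match(mention, antecedent):
    """Highest-precision toy sieve: identical strings (case-insensitive)."""
    return mention["text"].lower() == antecedent["text"].lower()

def head_match(mention, antecedent):
    """Looser toy sieve: identical head words."""
    return mention["head"].lower() == antecedent["head"].lower()

SIEVES = [exact_match, head_match]  # run in order, most precise first

def resolve(mentions):
    """Return one cluster id per mention (the index of the cluster's first mention)."""
    cluster = list(range(len(mentions)))  # every mention starts in its own cluster
    for sieve in SIEVES:
        for j, mention in enumerate(mentions):
            if cluster[j] != j:
                continue  # an earlier, more precise sieve already linked this mention
            for i in range(j - 1, -1, -1):  # nearest preceding mention first
                if sieve(mention, mentions[i]):
                    old, new = cluster[j], cluster[i]
                    cluster = [new if c == old else c for c in cluster]
                    break
    return cluster

mentions = [
    {"text": "John Smith", "head": "Smith"},
    {"text": "the company", "head": "company"},
    {"text": "John Smith", "head": "Smith"},
    {"text": "Mr. Smith", "head": "Smith"},
]
clusters = resolve(mentions)
```

Here the first pass links the two "John Smith" mentions, and the second pass attaches "Mr. Smith" to the same cluster through the shared head word.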
Mention-ranking models
data:image/s3,"s3://crabby-images/d6d07/d6d07472f1d97e28719b3ddd071ed40394d18794" alt=""
=> Go through the mentions from start to end, classifying each candidate pair as link or not-link.
data:image/s3,"s3://crabby-images/58674/586747e3489171660b9bc96a00125c60def4ef99" alt=""
- The core machinery in a mention-ranking model is parameterizing the probability of a link between two mentions
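One common way to turn link scores into a probability is a softmax over all candidate antecedents of a mention. The sketch below assumes that form, plus a dummy "no antecedent" option fixed at score 0; both the dummy and the example scores are assumptions, not taken from the figure.

```python
import math

def antecedent_probs(scores):
    """Softmax over candidate scores; index 0 is a dummy 'no antecedent' option."""
    all_scores = [0.0] + list(scores)  # assumed: the dummy always scores 0
    top = max(all_scores)              # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in all_scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three earlier mentions
probs = antecedent_probs([-1.0, 2.0, 0.5])
best = probs.index(max(probs))  # 0 would mean "start a new entity"
```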
Featurized
data:image/s3,"s3://crabby-images/95376/953765c583790d8e26c7be5b6938defe4e95ae71" alt=""
=> i : feature, ai : mention, x : input
- Features use information about the mention type(nominal, proper, pronoun), first/last word of mention, complete mention string, words immediately to left/right of mention, distance between mentions.
- Decision to link to antecedent ai is based on a linear scoring function involving a set of learned weights w and a feature function f.
- To measure how strongly a mention relates to the input, we feed in several features and adjust (learn) their weights.
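The linear scoring described above can be sketched as a dot product between a weight table and a sparse feature dictionary. The feature names, the hand-set weights (standing in for learned ones), and the toy mentions are all made up for illustration.

```python
def features(mention, antecedent):
    """Hypothetical sparse feature function f(x, a_i): feature name -> value."""
    same = mention["text"].lower() == antecedent["text"].lower()
    return {
        "type_pair=%s_%s" % (mention["type"], antecedent["type"]): 1.0,
        "same_string": 1.0 if same else 0.0,
        "distance": float(mention["index"] - antecedent["index"]),
    }

def score(weights, mention, antecedent):
    """Linear score w . f(x, a_i); features without a weight contribute 0."""
    f = features(mention, antecedent)
    return sum(weights.get(name, 0.0) * value for name, value in f.items())

# Hand-set weights standing in for learned ones (illustrative values only)
weights = {"type_pair=pronoun_proper": 1.5, "same_string": 3.0, "distance": -0.2}

antecedents = [
    {"text": "Victoria Chen", "type": "proper", "index": 0},
    {"text": "the company", "type": "nominal", "index": 1},
]
mention = {"text": "she", "type": "pronoun", "index": 2}

scores = [score(weights, mention, a) for a in antecedents]
best = antecedents[scores.index(max(scores))]
```

With these weights the pronoun prefers the proper-noun antecedent even though it is further away, because the type-pair weight outweighs the distance penalty.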
Neural coref
data:image/s3,"s3://crabby-images/a4537/a4537c8b6d6f2c171f74299dd38ebadf957cf2ba" alt=""
=> LSTM : reads the text sequentially and checks whether two mentions are linked or not.
data:image/s3,"s3://crabby-images/b76f2/b76f2a7af525da4b57be2c420f5f4f09acc61308" alt=""
- Representation for mention =
- BiLSTM output for first token in mention
- BiLSTM output for last token in mention
- Attention over BiLSTM output for all tokens in mention
- Features : size of the mention
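The four parts listed above can be concatenated into one vector per mention. This sketch uses plain Python lists in place of real BiLSTM outputs, and it appends the raw span width where a real model would more likely use a learned embedding; the attention scores are also assumed to be given.

```python
import math

def attention_pool(states, attn_scores):
    """Softmax-weighted average of the token vectors in a span."""
    top = max(attn_scores)
    exps = [math.exp(a - top) for a in attn_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    return [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]

def mention_representation(states, attn_scores, start, end):
    """states: one vector per token; the span covers tokens start..end inclusive."""
    span = states[start:end + 1]
    pooled = attention_pool(span, attn_scores[start:end + 1])
    width = [float(end - start + 1)]  # raw size feature (an embedding in practice)
    return span[0] + span[-1] + pooled + width

# Toy stand-ins for BiLSTM outputs (3 tokens, 2 dimensions each)
states = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
attn = [0.0, 0.0, 0.0]  # equal attention scores -> plain average
g = mention_representation(states, attn, 0, 1)
```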
data:image/s3,"s3://crabby-images/4e4b6/4e4b6666337503f5be8a5639b03fa860900578b4" alt=""
- Representation for mention pair (mi, mj) :
- mi representation gi
- mj representation gj
- elementwise product of gi and gj
- Features scoped over pair : distance between mi and mj
=> The LSTM-based representations are scored, and a softmax over the scores performs the classification.
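The pair representation above is again a concatenation. A minimal sketch, assuming gi and gj are plain lists of equal length and the distance enters as a single raw number (the function name is hypothetical; a real model would pass the result through a feed-forward scorer):

```python
def pair_representation(g_i, g_j, distance):
    """Concatenate g_i, g_j, their elementwise product, and the pair distance."""
    product = [a * b for a, b in zip(g_i, g_j)]
    return g_i + g_j + product + [float(distance)]

pair = pair_representation([1.0, 2.0], [3.0, 4.0], 5)
```

The elementwise product gives the scorer a direct signal about how similar the two mention vectors are, dimension by dimension.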
data:image/s3,"s3://crabby-images/22947/229477d971a12d3c385354f06962ca1f665e055a" alt=""
data:image/s3,"s3://crabby-images/61e4a/61e4afd0e05d780f6bd704a350a48bfd37dc3c62" alt=""
=> Going from 0 to 8, the distance between mentions increases; each bucket has its own weight parameter.
[For reference]
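The distance buckets can be sketched as a lookup from a raw distance to an index 0 to 8, each index owning one weight. The bucket boundaries below (1, 2, 3, 4, 5-7, 8-15, 16-31, 32-63, 64+) are an assumption, chosen because that scheme happens to give nine buckets; the figure may use different ones, and the weight values are made up.

```python
BUCKET_STARTS = [1, 2, 3, 4, 5, 8, 16, 32, 64]  # assumed lower bound of each bucket

def distance_bucket(distance):
    """Index (0..8) of the last bucket whose lower bound is <= distance."""
    bucket = 0
    for index, start in enumerate(BUCKET_STARTS):
        if distance >= start:
            bucket = index
    return bucket

# One weight per bucket (illustrative values, not learned)
bucket_weights = [0.8, 0.6, 0.4, 0.3, 0.1, -0.1, -0.3, -0.6, -1.0]

def distance_weight(distance):
    return bucket_weights[distance_bucket(distance)]
```

Bucketing lets nearby distances be told apart exactly, while large distances share a parameter, so the model does not need a separate weight for every possible gap.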
data:image/s3,"s3://crabby-images/c31bd/c31bd1681c5fefea39452c02ad7ffcc6ea597ce0" alt=""
Evaluation
data:image/s3,"s3://crabby-images/790a5/790a57c8a5d9f4c3f764b4cc1d49f7676954b9b5" alt=""
data:image/s3,"s3://crabby-images/cd244/cd244d2a45f6c6e0ce56991faf98dbcf5bd3250c" alt=""
data:image/s3,"s3://crabby-images/e059c/e059cc0e78a844406ebee9f5670a646c1862c96e" alt=""
=> Left is the prediction, right is the gold answer -> go down the mentions one by one and evaluate accuracy, precision, etc.
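One metric that works mention by mention in this way is B-cubed. Whether the figures above show this exact metric is an assumption; the sketch also assumes the predicted and gold clusterings cover the same set of mentions, and the example clusters are made up.

```python
def b_cubed(predicted, gold):
    """predicted, gold: lists of clusters (sets of mention ids) over the same mentions."""
    pred_of = {m: cluster for cluster in predicted for m in cluster}
    gold_of = {m: cluster for cluster in gold for m in cluster}
    mentions = list(gold_of)

    # Per mention: how much of its predicted (gold) cluster is actually correct
    precision = sum(len(pred_of[m] & gold_of[m]) / len(pred_of[m])
                    for m in mentions) / len(mentions)
    recall = sum(len(pred_of[m] & gold_of[m]) / len(gold_of[m])
                 for m in mentions) / len(mentions)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

predicted = [{1, 2, 3}, {4, 5}]
gold = [{1, 2}, {3, 4, 5}]
p, r, f1 = b_cubed(predicted, gold)
```

In this toy case mention 3 sits in the wrong predicted cluster, which lowers precision for mentions 1 to 3 and recall for mentions 3 to 5.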