Text -> triple with Stanford-OpenIE (1)

이환희 · May 12, 2021

Capstone Design


Stanford-OpenIE

https://github.com/philipperemy/Stanford-OpenIE-Python

https://nlp.stanford.edu/software/openie.html

  • The Python version kept failing on pip install
  • Went with the Java version instead

Download CoreNLP from the link below

https://stanfordnlp.github.io/CoreNLP/

and add it to the project as a library.



Tested with the first six sentences of the Wikipedia article on Korea.

text = "Korea (officially the \"Korean Peninsula\") is a region in East Asia. Since 1945 it has been divided into the two parts which soon became the two sovereign states: North Korea (officially the \"Democratic People's Republic of Korea\") and South Korea (officially the \"Republic of Korea\"). Korea consists of the Korean Peninsula, Jeju Island, and several minor islands near the peninsula. It is bordered by China to the northwest and Russia to the northeast. It is separated from Japan to the east by the Korea Strait and the Sea of Japan (East Sea). During the first half of the 1st millennium, Korea was divided between the three competing states of Goguryeo, Baekje, and Silla, together known as the Three Kingdoms of Korea.";



Sentence #1: Korea (officially the "Korean Peninsula") is a region in East Asia.
1.0	<region>	<is in>	<East Asia> 
1.0	<Korea>	<is region in>	<East Asia>
1.0	<Korea>	<is>	<region> 

Sentence #2: Since 1945 it has been divided into the two parts which soon became the two sovereign states: North Korea (officially the "Democratic People's Republic of Korea") and South Korea (officially the "Republic of Korea").
1.0	<South Korea>	<Republic of>	<Korea>
1.0	<it>	<has>	<Since 1945 has divided into two parts>
1.0	<it>	<has>	<has divided>
1.0	<it>	<has>	<has divided into two parts>
1.0	<North Korea>	<Republic 's>	<Democratic People>
1.0	<it>	<has>	<Since 1945 has divided> 
1.0	<North Korea>	<Republic of>	<Korea>

Sentence #3: Korea consists of the Korean Peninsula, Jeju Island, and several minor islands near the peninsula.
1.0	<Korea>	<consists of>	<Korean Peninsula>

Sentence #4: It is bordered by China to the northwest and Russia to the northeast.
1.0	<It>	<is>	<bordered by China to northwest to northeast>
1.0	<It>	<is>	<bordered by China to northeast>
1.0	<It>	<is bordered by>	<China>
1.0	<It>	<is>	<bordered by China to northwest>

Sentence #5: It is separated from Japan to the east by the Korea Strait and the Sea of Japan (East Sea).
1.0	<It>	<is>	<separated from Japan by Korea Strait>
1.0	<It>	<is>	<separated to east by Korea Strait>
1.0	<It>	<is separated by>	<East Sea>
1.0	<It>	<is separated by>	<Korea Strait>
1.0	<It>	<is>	<separated from Japan to east by Korea Strait>

Sentence #6: During the first half of the 1st millennium, Korea was divided between the three competing states of Goguryeo, Baekje, and Silla, together known as the Three Kingdoms of Korea.
1.0	<Korea>	<was>	<During half of millennium divided between three states of Goguryeo together known as Three Kingdoms of Korea>
1.0	<Korea>	<was divided between>	<three competing states of Goguryeo together known as Three Kingdoms>
1.0	<Korea>	<was>	<During half of 1st millennium divided between three competing states>
1.0	<Korea>	<was>	<During first half of millennium divided between three states>
1.0	<Korea>	<was>	<During half of 1st millennium divided between three states of Goguryeo known>
1.0	<Korea>	<was>	<During first half divided between three competing states of Goguryeo together known as Three Kingdoms>
1.0	<Korea>	<was>	<During first half divided between three competing states of Goguryeo together known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During half divided between three competing states of Goguryeo together known>
1.0	<Korea>	<was>	<During first half divided between three competing states of Goguryeo together known>
1.0	<Korea>	<was>	<During first half of millennium divided between three competing states of Goguryeo>
1.0	<Korea>	<was>	<During half of 1st millennium divided between three states of Goguryeo together known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During half of millennium divided between three competing states of Goguryeo known as Three Kingdoms>
1.0	<Korea>	<was>	<During first half of 1st millennium divided between three competing states>
1.0	<Korea>	<was divided During>	<half>
1.0	<Korea>	<was>	<During half of 1st millennium divided between three competing states of Goguryeo together known>
1.0	<Korea>	<was divided between>	<three competing states>
1.0	<Korea>	<was>	<During first half divided between three states of Goguryeo known as Three Kingdoms>
1.0	<Korea>	<was>	<During half of 1st millennium divided between three competing states of Goguryeo together known as Three Kingdoms>
1.0	<Korea>	<was>	<During first half divided between three competing states>
1.0	<Korea>	<was>	<During half divided between three states of Goguryeo>
1.0	<Korea>	<was>	<During first half of 1st millennium divided between three competing states of Goguryeo together known>
1.0	<Korea>	<was>	<During half of millennium divided between three competing states of Goguryeo known>
1.0	<Korea>	<was>	<During first half divided between three states of Goguryeo together known as Three Kingdoms>
1.0	<Korea>	<was divided between>	<three states of Goguryeo together known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During first half divided between three competing states of Goguryeo known>
1.0	<Korea>	<was>	<During first half of millennium divided between three states of Goguryeo together known as Three Kingdoms of Korea>
1.0	<Korea>	<was divided between>	<three competing states of Goguryeo together known>
1.0	<Korea>	<was>	<During first half of 1st millennium divided between three competing states of Goguryeo together known as Three Kingdoms>
1.0	<Korea>	<was>	<During first half divided between three states of Goguryeo known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During half divided between three competing states of Goguryeo together known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During half divided between three competing states of Goguryeo together known as Three Kingdoms>
1.0	<Korea>	<was>	<During first half of millennium divided between three states of Goguryeo>
1.0	<Korea>	<was>	<During first half of millennium divided between three competing states of Goguryeo together known as Three Kingdoms>
1.0	<Korea>	<was>	<During half divided between three states>
1.0	<Korea>	<was>	<During first half of millennium divided between three competing states of Goguryeo together known>
1.0	<Korea>	<was>	<During half of 1st millennium divided between three competing states of Goguryeo known as Three Kingdoms of Korea>
1.0	<Korea>	<was divided During>	<first half>
1.0	<Korea>	<was>	<During first half of 1st millennium divided between three states of Goguryeo>
1.0	<Korea>	<was>	<During first half of 1st millennium divided between three competing states of Goguryeo known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During half of millennium divided between three competing states of Goguryeo together known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During first half of 1st millennium divided between three states of Goguryeo known>
1.0	<Korea>	<was>	<During first half of millennium divided between three states of Goguryeo known>
1.0	<Korea>	<was>	<During half divided between three competing states of Goguryeo known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During first half of millennium divided between three states of Goguryeo together known as Three Kingdoms>
1.0	<Korea>	<was>	<During half of 1st millennium divided between three states of Goguryeo known as Three Kingdoms>
1.0	<Korea>	<was>	<During half of millennium divided between three competing states>
1.0	<Korea>	<was divided During>	<first half of 1st millennium>
1.0	<Korea>	<was>	<During first half of millennium divided between three states of Goguryeo known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During first half divided between three states of Goguryeo together known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During half of millennium divided between three states of Goguryeo together known>
1.0	<Korea>	<was divided between>	<three states of Goguryeo>
1.0	<Korea>	<was divided During>	<half of 1st millennium>
1.0	<Korea>	<was>	<During half of millennium divided between three states of Goguryeo known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During half of 1st millennium divided between three states of Goguryeo>
1.0	<Korea>	<was>	<During first half divided between three states of Goguryeo together known>
1.0	<Korea>	<was>	<During half of 1st millennium divided between three states of Goguryeo known as Three Kingdoms of Korea>
1.0	<Korea>	<was divided During>	<first half of millennium>
1.0	<Korea>	<was divided between>	<three states>
1.0	<Korea>	<was>	<During first half of 1st millennium divided between three competing states of Goguryeo>
1.0	<Korea>	<was divided between>	<three competing states of Goguryeo known as Three Kingdoms>
1.0	<Korea>	<was>	<During first half of millennium divided between three competing states of Goguryeo together known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During half divided between three competing states>
1.0	<Korea>	<was>	<During half of 1st millennium divided between three states>
1.0	<Korea>	<was>	<During first half of 1st millennium divided between three competing states of Goguryeo known>
1.0	<Korea>	<was>	<During first half divided between three competing states of Goguryeo>
1.0	<Korea>	<was>	<During first half of millennium divided between three competing states of Goguryeo known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During half of millennium divided between three states of Goguryeo together known as Three Kingdoms>
1.0	<Korea>	<was>	<During half divided between three competing states of Goguryeo known as Three Kingdoms>
1.0	<Korea>	<was>	<During half divided between three states of Goguryeo known>
1.0	<Korea>	<was>	<During half of 1st millennium divided between three states of Goguryeo together known as Three Kingdoms>
1.0	<Korea>	<was>	<During half of 1st millennium divided between three competing states of Goguryeo together known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During first half of millennium divided between three competing states>
1.0	<Korea>	<was>	<During first half divided between three competing states of Goguryeo known as Three Kingdoms of Korea>
1.0	<Korea>	<was divided between>	<three competing states of Goguryeo known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During first half of millennium divided between three states of Goguryeo together known>
1.0	<Korea>	<was>	<During first half of 1st millennium divided between three competing states of Goguryeo together known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During half of 1st millennium divided between three states of Goguryeo together known>
1.0	<Korea>	<was>	<divided>
1.0	<Korea>	<was>	<During half divided between three states of Goguryeo together known as Three Kingdoms of Korea>
1.0	<Korea>	<was divided between>	<three states of Goguryeo together known as Three Kingdoms>
1.0	<Korea>	<was divided During>	<half of millennium>
1.0	<Korea>	<was divided between>	<three states of Goguryeo together known>
1.0	<Korea>	<was>	<During first half of 1st millennium divided between three states of Goguryeo together known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During first half of 1st millennium divided between three states of Goguryeo known as Three Kingdoms>
1.0	<Korea>	<was>	<During first half of millennium divided between three competing states of Goguryeo known as Three Kingdoms>
1.0	<Korea>	<was>	<During half of millennium divided between three competing states of Goguryeo together known>
1.0	<Korea>	<was>	<During half of 1st millennium divided between three competing states of Goguryeo known as Three Kingdoms>
1.0	<Korea>	<was divided between>	<three competing states of Goguryeo known>
1.0	<Korea>	<was>	<During half divided between three states of Goguryeo known as Three Kingdoms>
1.0	<Korea>	<was>	<During half of millennium divided between three states of Goguryeo>
1.0	<Korea>	<was>	<During half of millennium divided between three competing states of Goguryeo>
1.0	<Korea>	<was divided between>	<three competing states of Goguryeo together known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During first half of millennium divided between three states of Goguryeo known as Three Kingdoms>
1.0	<Korea>	<was divided between>	<three states of Goguryeo known as Three Kingdoms>
1.0	<Korea>	<was>	<During half of 1st millennium divided between three competing states of Goguryeo known>
1.0	<Korea>	<was>	<During first half of 1st millennium divided between three states of Goguryeo together known as Three Kingdoms>
1.0	<Korea>	<was>	<During first half divided between three states of Goguryeo known>
1.0	<Korea>	<was>	<During first half divided between three states>
1.0	<Korea>	<was>	<During first half divided between three states of Goguryeo>
1.0	<Korea>	<was>	<During half divided between three states of Goguryeo together known>
1.0	<Korea>	<was>	<During half of millennium divided between three competing states of Goguryeo together known as Three Kingdoms>
1.0	<Korea>	<was>	<During half of 1st millennium divided between three competing states of Goguryeo>
1.0	<Korea>	<was>	<During first half of 1st millennium divided between three states of Goguryeo known as Three Kingdoms of Korea>
1.0	<Korea>	<was divided between>	<three competing states of Goguryeo>
1.0	<Korea>	<was>	<During first half of 1st millennium divided between three states>
1.0	<Korea>	<was>	<During half divided between three competing states of Goguryeo known>
1.0	<Korea>	<was>	<During half of millennium divided between three states of Goguryeo known>
1.0	<Korea>	<was divided between>	<three states of Goguryeo known>
1.0	<Korea>	<was>	<During first half of 1st millennium divided between three competing states of Goguryeo known as Three Kingdoms>
1.0	<Korea>	<was>	<During half divided between three states of Goguryeo together known as Three Kingdoms>
1.0	<Korea>	<was>	<During first half of 1st millennium divided between three states of Goguryeo together known>
1.0	<Korea>	<was>	<During first half of millennium divided between three competing states of Goguryeo known>
1.0	<Korea>	<was divided between>	<three states of Goguryeo known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During half divided between three states of Goguryeo known as Three Kingdoms of Korea>
1.0	<Korea>	<was>	<During half of millennium divided between three states of Goguryeo known as Three Kingdoms>
1.0	<Korea>	<was>	<During first half divided between three competing states of Goguryeo known as Three Kingdoms>
1.0	<Korea>	<was>	<During half divided between three competing states of Goguryeo>
1.0	<Korea>	<was>	<During half of millennium divided between three states>
1.0	<Korea>	<was>	<During half of millennium divided between three competing states of Goguryeo known as Three Kingdoms of Korea>

Many triples come out for each sentence,

and I am still working out which of them should be selected as the triple to match.
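
One possible way to narrow the candidates, sketched here as an assumption and not as the approach this project settled on: for each (subject, relation) pair, keep only the triple with the longest object, so the many near-duplicate variants collapse into the most specific one. `TripleSelector`, `Triple`, and `selectMostSpecific` are hypothetical names, and the parser only assumes the tab-separated `confidence <subject> <relation> <object>` layout printed above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Stanford OpenIE.
final class TripleSelector {

    /** One extraction in the printed format: confidence, subject, relation, object. */
    static final class Triple {
        final double confidence;
        final String subject;
        final String relation;
        final String object;

        Triple(double confidence, String subject, String relation, String object) {
            this.confidence = confidence;
            this.subject = subject;
            this.relation = relation;
            this.object = object;
        }
    }

    /** Parses a line such as "1.0\t<Korea>\t<is>\t<region>"; returns null for any other line. */
    static Triple parse(String line) {
        String[] cols = line.trim().split("\t");
        if (cols.length != 4) {
            return null; // "Sentence #n: ..." headers and blank lines end up here
        }
        try {
            return new Triple(Double.parseDouble(cols[0].trim()),
                    unwrap(cols[1]), unwrap(cols[2]), unwrap(cols[3]));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Removes the surrounding angle brackets from one field. */
    private static String unwrap(String field) {
        String s = field.trim();
        if (s.startsWith("<") && s.endsWith(">")) {
            s = s.substring(1, s.length() - 1);
        }
        return s;
    }

    private static int wordCount(String text) {
        String t = text.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    /** Keeps, per (subject, relation), the triple whose object has the most words. */
    static List<Triple> selectMostSpecific(List<String> lines) {
        Map<String, Triple> best = new LinkedHashMap<>();
        for (String line : lines) {
            Triple t = parse(line);
            if (t == null) {
                continue;
            }
            String key = t.subject + "\t" + t.relation;
            Triple current = best.get(key);
            if (current == null || wordCount(t.object) > wordCount(current.object)) {
                best.put(key, t); // ties keep the first triple seen
            }
        }
        return new ArrayList<>(best.values());
    }
}
```

Preferring the longest object favors specificity. If short, atomic triples are a better fit for the matching step, flipping the comparison to `<` would keep the shortest object instead.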



Changing the property values

The property values can be adjusted like this:

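Properties props = PropertiesUtils.asProperties(
        "annotators", "tokenize,ssplit,pos,lemma,parse,natlog,openie"
);
// Upper bound on the entailments generated per extracted clause.
props.setProperty("openie.max_entailments_per_clause", "100");
// Also keep triples that do not consume the entire clause fragment.
props.setProperty("openie.triple.strict", "false");

Output with these properties: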
Sentence #1: Korea (officially the "Korean Peninsula") is a region in East Asia.
1.0	<region>	<is in>	<East Asia>
1.0	<Korea>	<is region in>	<East Asia>
1.0	<Korea>	<is>	<region>

Sentence #2: Since 1945 it has been divided into the two parts which soon became the two sovereign states: North Korea (officially the "Democratic People's Republic of Korea") and South Korea (officially the "Republic of Korea").
1.0	<it>	<has>	<has divided>
1.0	<it>	<has divided Since>	<1945>
1.0	<it>	<has divided into>	<two parts>

Sentence #3: Korea consists of the Korean Peninsula, Jeju Island, and several minor islands near the peninsula.
1.0	<Korea>	<consists of>	<Korean Peninsula>

Sentence #4: It is bordered by China to the northwest and Russia to the northeast.
1.0	<It>	<is bordered by>	<China>
1.0	<It>	<is bordered to>	<northeast>

Sentence #5: It is separated from Japan to the east by the Korea Strait and the Sea of Japan (East Sea).
1.0	<It>	<is separated from>	<Japan>
1.0	<It>	<is separated to>	<east by Korea Strait>
1.0	<It>	<is separated to>	<east>
1.0	<It>	<is>	<separated>

Sentence #6: During the first half of the 1st millennium, Korea was divided between the three competing states of Goguryeo, Baekje, and Silla, together known as the Three Kingdoms of Korea.
1.0	<Korea>	<was divided During>	<first half>
1.0	<Korea>	<was>	<divided>
1.0	<Korea>	<was divided between>	<three competing states of Goguryeo>
1.0	<Korea>	<was divided During>	<half of millennium>
1.0	<Korea>	<was divided During>	<half>
1.0	<Korea>	<was divided During>	<first half of 1st millennium>
1.0	<Korea>	<was divided between>	<three states of Goguryeo>
1.0	<Korea>	<was divided between>	<three competing states>
1.0	<Korea>	<was divided During>	<half of 1st millennium>
1.0	<Korea>	<was divided During>	<first half of millennium>
1.0	<Korea>	<was divided between>	<three states>

After adjusting several properties, the extraction seems to work reasonably well, but some problems remain:



Korea consists of the Korean Peninsula, Jeju Island, and several minor islands near the peninsula.
1.0	<Korea>	<consists of>	<Korean Peninsula>

Comma-separated lists are not handled: only the first item (Korean Peninsula) is extracted.
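
As a rough idea for the preprocessing side, a comma-separated list could be expanded into one short sentence per item before the text reaches OpenIE. The sketch below is only an assumption about how that might look: `ListExpander` and `expand` are hypothetical names, the caller has to supply where the list starts (the prefix), and the split on commas and "and" is deliberately naive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical preprocessing helper, not part of CoreNLP.
final class ListExpander {

    /**
     * Turns prefix + "A, B, and C." into ["prefix A.", "prefix B.", "prefix C."].
     * Splits on commas (optionally followed by "and") and on a standalone "and".
     */
    static List<String> expand(String prefix, String listPart) {
        // Drop the final period so it does not stick to the last item.
        String body = listPart.trim().replaceAll("\\.$", "");
        List<String> sentences = new ArrayList<>();
        for (String item : body.split("\\s*,\\s*(?:and\\s+)?|\\s+and\\s+")) {
            if (!item.trim().isEmpty()) {
                sentences.add(prefix.trim() + " " + item.trim() + ".");
            }
        }
        return sentences;
    }
}
```

Called as `ListExpander.expand("Korea consists of", "the Korean Peninsula, Jeju Island, and several minor islands near the peninsula.")`, this yields three separate "Korea consists of ..." sentences. It will mis-split items that themselves contain a comma or "and", so it is a starting point at best.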



Sentence #5: It is separated from Japan to the east by the Korea Strait and the Sea of Japan (East Sea).
1.0	<It>	<is separated from>	<Japan>
1.0	<It>	<is separated to>	<east by Korea Strait>
1.0	<It>	<is separated to>	<east>
1.0	<It>	<is>	<separated>

Pronouns such as "it" are not resolved to what they refer to.
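
One thing that may be worth trying here: as far as I know, CoreNLP's OpenIE has an `openie.resolve_coref` option that replaces pronouns with their canonical mention when the coref annotator is in the pipeline. The sketch below only builds the configuration with plain `java.util.Properties`; the exact annotator list is an assumption to check against the CoreNLP documentation for the installed version (some releases may also need `mention` before `coref`), and `CorefOpenIEConfig` is a hypothetical name. I have not verified what it does to the output above.

```java
import java.util.Properties;

// Hypothetical configuration holder; only java.util is used here.
final class CorefOpenIEConfig {

    /** Builds pipeline properties that ask OpenIE to resolve pronouns via coref. */
    static Properties build() {
        Properties props = new Properties();
        // Assumed annotator list: coref needs ner and a parse before it.
        props.setProperty("annotators",
                "tokenize,ssplit,pos,lemma,ner,parse,coref,natlog,openie");
        // Replace pronominal mentions such as "it" with their antecedent.
        props.setProperty("openie.resolve_coref", "true");
        // Same settings as used earlier in this post.
        props.setProperty("openie.max_entailments_per_clause", "100");
        props.setProperty("openie.triple.strict", "false");
        return props;
    }
}
```

The returned object can be passed to `new StanfordCoreNLP(props)` in place of the properties shown earlier. Coreference resolution loads extra models, so expect slower startup and higher memory use.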



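Homonym handling and several other issues also still need to be solved.

The input text should probably be cleaned up in a preprocessing step first,

and then the Stanford OpenIE code analyzed and modified so that triples can be matched more accurately.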
