Chalkidis, I., Fergadiotis, M., Malakasiotis, P., Aletras, N., & Androutsopoulos, I. (2020). LEGAL-BERT: The muppets straight out of law school. arXiv preprint arXiv:2010.02559.
Hwang, W., Lee, D., Cho, K., Lee, H., & Seo, M. (2022). A Multi-Task Benchmark for Korean Legal Language Understanding and Judgement Prediction. arXiv preprint arXiv:2206.05224.
LEGAL-BERT
- BERT reduced the cost of pretraining, but because it is trained on generic corpora (e.g., Wikipedia, Children's Books), it underperforms on specialized domains (e.g., biomedical, scientific)
- Remedies: (1) further pretrain BERT on domain-specific corpora (FP), (2) pretrain BERT from scratch on domain-specific corpora (SC, with a new vocabulary of sub-word units); see the sketch after this list
- Notably, in specialized domains a smaller BERT can record performance comparable to BERT-base
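A minimal sketch of how the two routes (FP vs. SC) differ in setup, assuming the HuggingFace transformers/tokenizers libraries rather than the authors' original training code; `legal_corpus.txt` is a placeholder file and the MLM pretraining loop itself is omitted.

```python
# Sketch of the two adaptation routes (not the authors' training code).
# "legal_corpus.txt" is a placeholder for the legal corpus; the MLM
# pretraining loop over that corpus is omitted.
from tokenizers import BertWordPieceTokenizer
from transformers import BertConfig, BertForMaskedLM

# (1) FP: load the released BERT-base checkpoint and continue MLM
#     pretraining on legal text, reusing the original generic vocabulary.
model_fp = BertForMaskedLM.from_pretrained("bert-base-uncased")

# (2) SC: learn a new sub-word vocabulary on the legal corpus, then train
#     the same architecture from randomly initialized weights.
legal_tokenizer = BertWordPieceTokenizer(lowercase=True)
legal_tokenizer.train(files=["legal_corpus.txt"], vocab_size=30000)
model_sc = BertForMaskedLM(BertConfig(vocab_size=30000))
```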
Model Architecture
- training corpora: English legal text (12 GB) from legislation, court cases, and contracts
- EU legislation, UK legislation, European Court of Justice (ECJ) cases, European Court of Human Rights (ECHR) cases, US court cases, US contracts
- LEGAL-BERT-FP: starts from BERT-base, but is further pretrained on the domain-specific corpora for up to 500k steps
- LEGAL-BERT-SC: same architecture as BERT-base (12 layers, 12 attention heads, 768 hidden units, 110M parameters) and the same vocabulary size (30,000)
- LEGAL-BERT-SMALL: 6 layers, 8 attention heads, 512 hidden units, 35M parameters (both parameter counts are sanity-checked in the sketch below)
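A quick sanity check of the parameter counts above; the FFN (intermediate) sizes are assumptions since the notes don't list them: 3072 for the base-sized model (the BERT-base default) and 2048 for the small one.

```python
# Build the two configurations listed above and count parameters.
# The intermediate (FFN) sizes are assumptions: 3072 for the base-sized
# model (the BERT-base default) and 2048 for the small one.
from transformers import BertConfig, BertForMaskedLM

sc_config = BertConfig(      # LEGAL-BERT-SC: same shape as BERT-base
    vocab_size=30000, hidden_size=768,
    num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072,
)
small_config = BertConfig(   # LEGAL-BERT-SMALL
    vocab_size=30000, hidden_size=512,
    num_hidden_layers=6, num_attention_heads=8, intermediate_size=2048,
)

for name, cfg in [("LEGAL-BERT-SC", sc_config), ("LEGAL-BERT-SMALL", small_config)]:
    n_params = BertForMaskedLM(cfg).num_parameters()
    print(f"{name}: ~{n_params / 1e6:.0f}M parameters")  # prints roughly 109M and 35M
```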
Training Setup & Evaluation
- LEGAL-BERT: 1M training steps, batch size 256
- Legal NLP tasks: text classification & sequence tagging (EURLEX57K, ECHR-CASES, Contracts-NER)
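For the downstream evaluation, a minimal text-classification sketch on top of the released base checkpoint; the `nlpaueb/legal-bert-base-uncased` hub ID, the two-label head, and the example sentences are illustrative assumptions, not the paper's exact EURLEX57K / ECHR-CASES setup.

```python
# Placeholder classification setup with the released checkpoint; the hub ID,
# label count, and example sentences are assumptions for illustration only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "nlpaueb/legal-bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

texts = [
    "The applicant alleged a violation of Article 6 of the Convention.",
    "The supplier shall deliver the goods within thirty (30) days.",
]
enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits  # shape (2, num_labels); the head is still untrained
print(logits.softmax(dim=-1))
```

Fine-tuning this head on EURLEX57K or ECHR-CASES follows the usual sequence-classification recipe; Contracts-NER, being a sequence-tagging task, would use a token-classification head instead.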
Results
- LEGAL-BERT-SMALL achieves performance comparable to LEGAL-BERT
- Further pretraining (BERT-BASE-FP) and pretraining from scratch (BERT-BASE-SC) worked better than simply fine-tuning BERT-BASE
LCube
Model Architecture
- based on GPT-2
- training corpora: LBox-Open precedent corpus, Modu (book corpus), Wiki (Wikipedia)
- tokenization: BPE
- LCube-base: 50k training steps, 12 layers, 768 hidden units, 12 attention heads, batch size 512, 124M parameters
- tasks: classification, summarization
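A rough sketch of a GPT-2 configuration matching the LCube-base numbers above; the vocabulary size is an assumption (GPT-2's default 50,257), since the Korean BPE vocabulary size isn't listed here.

```python
# GPT-2-style configuration matching the LCube-base numbers listed above
# (12 layers, 768 hidden units, 12 attention heads, ~124M parameters).
# The vocabulary size is an assumption: GPT-2's default 50,257.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(n_layer=12, n_head=12, n_embd=768)
model = GPT2LMHeadModel(config)
print(f"~{model.num_parameters() / 1e6:.0f}M parameters")  # prints roughly 124M
```

If the authors' released checkpoint is available on the HuggingFace hub, it can be loaded with `AutoModelForCausalLM.from_pretrained(...)` instead of initializing from this random configuration.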