Chalkidis, I., Fergadiotis, M., Malakasiotis, P., Aletras, N., & Androutsopoulos, I. (2020). LEGAL-BERT: The muppets straight out of law school. arXiv preprint arXiv:2010.02559.
Hwang, W., Lee, D., Cho, K., Lee, H., & Seo, M. (2022). A Multi-Task Benchmark for Korean Legal Language Understanding and Judgement Prediction. arXiv preprint arXiv:2206.05224.
LEGAL-BERT
- BERT reduced the cost of pretraining, but because it is trained on generic corpora (e.g., Wikipedia, Children's Books), it underperforms on specialized domains (e.g., biomedical, scientific)
- Remedies: (1) further pretrain BERT on domain-specific corpora (FP), (2) pretrain BERT from scratch on domain-specific corpora (SC, with a new vocabulary of sub-word units); see the sketch after this list
- Notably, in specialized domains a smaller BERT can record performance comparable to BERT-base
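A minimal sketch of how the two routes (FP vs. SC) differ in setup, assuming the HuggingFace transformers/tokenizers libraries rather than the authors' original training code; `legal_corpus.txt` is a placeholder file and the MLM pretraining loop itself is omitted.

```python
# Sketch of the two adaptation routes (not the authors' training code).
# "legal_corpus.txt" is a placeholder for the legal corpus; the MLM
# pretraining loop over that corpus is omitted.
from tokenizers import BertWordPieceTokenizer
from transformers import BertConfig, BertForMaskedLM

# (1) FP: load the released BERT-base checkpoint and continue MLM
#     pretraining on legal text, reusing the original generic vocabulary.
model_fp = BertForMaskedLM.from_pretrained("bert-base-uncased")

# (2) SC: learn a new sub-word vocabulary on the legal corpus, then train
#     the same architecture from randomly initialized weights.
legal_tokenizer = BertWordPieceTokenizer(lowercase=True)
legal_tokenizer.train(files=["legal_corpus.txt"], vocab_size=30000)
model_sc = BertForMaskedLM(BertConfig(vocab_size=30000))
```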
Model Architecture
- training corpora: English legal text (12 GB) from legislation, court cases, and contracts
- EU legislation, UK legislation, European Court of Justice (ECJ) cases, European Court of Human Rights (ECHR) cases, US court cases, US contracts
- LEGAL-BERT-FP: starts from BERT-base, but is further pretrained on the domain-specific corpora for up to 500k steps
- LEGAL-BERT-SC: same architecture as BERT-base (12 layers, 12 attention heads, 768 hidden units, 110M parameters) and the same vocabulary size (30,000)
- LEGAL-BERT-SMALL: 6 layers, 8 attention heads, 512 hidden units, 35M parameters (both parameter counts are sanity-checked in the sketch below)
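A quick sanity check of the parameter counts above; the FFN (intermediate) sizes are assumptions since the notes don't list them: 3072 for the base-sized model (the BERT-base default) and 2048 for the small one.

```python
# Build the two configurations listed above and count parameters.
# The intermediate (FFN) sizes are assumptions: 3072 for the base-sized
# model (the BERT-base default) and 2048 for the small one.
from transformers import BertConfig, BertForMaskedLM

sc_config = BertConfig(      # LEGAL-BERT-SC: same shape as BERT-base
    vocab_size=30000, hidden_size=768,
    num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072,
)
small_config = BertConfig(   # LEGAL-BERT-SMALL
    vocab_size=30000, hidden_size=512,
    num_hidden_layers=6, num_attention_heads=8, intermediate_size=2048,
)

for name, cfg in [("LEGAL-BERT-SC", sc_config), ("LEGAL-BERT-SMALL", small_config)]:
    n_params = BertForMaskedLM(cfg).num_parameters()
    print(f"{name}: ~{n_params / 1e6:.0f}M parameters")  # prints roughly 109M and 35M
```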
Training Setup & Evaluation
- LEGAL-BERT: 1M training steps, batch size 256
- Legal NLP tasks: text classification & sequence tagging (EURLEX57K, ECHR-CASES, Contracts-NER)
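For the downstream evaluation, a minimal text-classification sketch on top of the released base checkpoint; the `nlpaueb/legal-bert-base-uncased` hub ID, the two-label head, and the example sentences are illustrative assumptions, not the paper's exact EURLEX57K / ECHR-CASES setup.

```python
# Placeholder classification setup with the released checkpoint; the hub ID,
# label count, and example sentences are assumptions for illustration only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "nlpaueb/legal-bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

texts = [
    "The applicant alleged a violation of Article 6 of the Convention.",
    "The supplier shall deliver the goods within thirty (30) days.",
]
enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits  # shape (2, num_labels); the head is still untrained
print(logits.softmax(dim=-1))
```

Fine-tuning this head on EURLEX57K or ECHR-CASES follows the usual sequence-classification recipe; Contracts-NER, being a sequence-tagging task, would use a token-classification head instead.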
Results
- LEGAL-BERT-SMALL achieves performance comparable to LEGAL-BERT
- Further pretraining (BERT-BASE-FP) and pretraining from scratch (BERT-BASE-SC) worked better than simply fine-tuning BERT-BASE
LCube
Model Architecture
- based on GPT-2
- training corpora: LBox-Open precedent corpus, Modu (book corpus), Wiki (Wikipedia)
- tokenization: BPE
- LCube-base: 50k training steps, 12 layers, 768 hidden units, 12 attention heads, batch size 512, 124M parameters
- tasks: classification, summarization
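A rough sketch of a GPT-2 configuration matching the LCube-base numbers above; the vocabulary size is an assumption (GPT-2's default 50,257), since the Korean BPE vocabulary size isn't listed here.

```python
# GPT-2-style configuration matching the LCube-base numbers listed above
# (12 layers, 768 hidden units, 12 attention heads, ~124M parameters).
# The vocabulary size is an assumption: GPT-2's default 50,257.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(n_layer=12, n_head=12, n_embd=768)
model = GPT2LMHeadModel(config)
print(f"~{model.num_parameters() / 1e6:.0f}M parameters")  # prints roughly 124M
```

If the authors' released checkpoint is available on the HuggingFace hub, it can be loaded with `AutoModelForCausalLM.from_pretrained(...)` instead of initializing from this random configuration.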