A Multi-Task Benchmark for Korean Legal Language Understanding and Judgement Prediction

임재석 · January 2, 2025


Structure of Korean Precedent
- meta information
- gist of claim from plaintiffs in a civil case
- ruling
- reasoning
  - facts
  - claims
  - reasoning
  - decisions
The Redaction Process
- Anonymizing
Precedent Disclosure Status
- Courts' decisions should be published via an online service

Precedent Corpus
- AI Hub 6k + LAW OPEN DATA 82k + Internal 65k
- 57% of LAW OPEN DATA consists of Supreme Court trials (no factual issues)
Case Name
- 10k facts + case name
Statute
- facts + statute
LJP-Criminal
- facts + punishments (fine, imprisonment with labor, imprisonment without labor)
- Level 0 (type of punishment)
- Level 1 (degree of punishment in 3-scale, null/low/high)
- Level 2 (5-scale for fine, 6-scale for imprisonment)
- Level 3 (exact number) $\rightarrow$ Regression!
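
The label levels above can be sketched as functions over one punishment record. This is only an illustration: the record layout, the units, and the low/high cut-offs in `HYPOTHETICAL_THRESHOLDS` are assumptions, not the values used in the paper, and Level 2 is omitted because the notes do not give its bin edges.

```python
# Hypothetical low/high cut-offs; the notes do not state the real ones.
HYPOTHETICAL_THRESHOLDS = {
    "fine": 1_000_000,                 # assumed unit: KRW
    "imprisonment_with_labor": 12,     # assumed unit: months
    "imprisonment_without_labor": 12,  # assumed unit: months
}


def level0(punishment: dict) -> list:
    """Level 0: which punishment types were imposed (non-zero amount)."""
    return [kind for kind in HYPOTHETICAL_THRESHOLDS if punishment.get(kind, 0) > 0]


def level1(punishment: dict) -> dict:
    """Level 1: 3-scale degree (null / low / high) for each punishment type."""
    labels = {}
    for kind, cutoff in HYPOTHETICAL_THRESHOLDS.items():
        amount = punishment.get(kind, 0)
        if amount == 0:
            labels[kind] = "null"
        elif amount < cutoff:
            labels[kind] = "low"
        else:
            labels[kind] = "high"
    return labels


def level3(punishment: dict) -> dict:
    """Level 3: the exact numbers, i.e. regression targets."""
    return {kind: punishment.get(kind, 0) for kind in HYPOTHETICAL_THRESHOLDS}


example = {"fine": 500_000, "imprisonment_with_labor": 0, "imprisonment_without_labor": 0}
print(level0(example))
print(level1(example))
print(level3(example))
```

The same record yields a coarse classification target at Levels 0 and 1 and a regression target at Level 3, which is why the difficulty rises with the level.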
LJP-Civil
- fact + gist of claim + degrees of claim acceptance
- claim acceptance degree
  - claimed money from the gist of claim
  - approved money from ruling section
  - approved money / claimed money
- Level 1 (rejection / partial approval / full approval)
- Level 2 (13 categories)
- mt5-small + prompt-tuning to parse the money expressions (money provider / receiver / amount / litigation cost)
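
A minimal sketch of the claim acceptance degree and the Level 1 label, assuming the claimed and approved amounts have already been parsed into numbers. The function names are hypothetical, and Level 2 is left out because the notes do not list the boundaries of its 13 categories.

```python
def acceptance_degree(claimed: float, approved: float) -> float:
    """Claim acceptance degree = approved money / claimed money."""
    if claimed <= 0:
        raise ValueError("claimed money must be positive")
    return approved / claimed


def level1_label(degree: float) -> str:
    """Level 1: rejection / partial approval / full approval."""
    if degree == 0:
        return "rejection"
    if degree >= 1:
        return "full approval"
    return "partial approval"


# Example: 10,000,000 claimed in the gist of claim, 4,000,000 approved in the ruling.
degree = acceptance_degree(claimed=10_000_000, approved=4_000_000)
print(degree, level1_label(degree))
```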
Summarization
- Supreme Court Decisions Report + Summary of Decision
- Ruling and Reasoning section

A domain-specific corpus is critical in the classification and summarization tasks
- pretraining on the Precedent Corpus alone also performed well in domain adaptation
- in the summarization task, LCUBE doesn't have an advantage over other models
  - this might come from the architectural difference between encoder-decoder and decoder-only models
  - LCUBE generated ~40% fewer tokens $\rightarrow$ ROUGE score is low
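
Why shorter outputs hurt ROUGE can be seen with a simplified ROUGE-1 over whitespace tokens. This is an assumption-laden sketch: the paper's actual ROUGE variant and tokenization may differ, and the sentences are made-up examples.

```python
from collections import Counter


def rouge1(reference: str, candidate: str) -> dict:
    """Simplified ROUGE-1: unigram overlap on whitespace tokens."""
    ref = Counter(reference.split())
    cand = Counter(candidate.split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


reference = "the court dismissed the appeal of the defendant"
full = rouge1(reference, reference)
short = rouge1(reference, "the court dismissed the appeal")  # about 40% fewer tokens

print(full)
print(short)  # recall drops to 0.625 while precision stays at 1.0
```

Every generated token is correct in the short output, yet recall and F1 fall because part of the reference is never covered.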
Domain adaptation is not helpful for legal judgement prediction tasks
- In LJP-Civil, without the facts, the model performance is close to a dummy baseline
Legal judgement prediction is challenging
- No single model is consistently superior

the first large-scale Korean legal AI benchmark and legal language model LCUBE
only considered precedents from the first-level courts
- for simplicity in legal reasoning
didn't use the plaintiffs' and defendants' claims
- difficult to separate the claims from the reasoning sections without error
didn't consider many important legal applications of AI