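h tags: how the number changes the text size, and what it means
b tag vs. strong tag: the functional difference
What do the a tag and the href attribute do in HTML?
<a href="url">: the tag used to create a hyperlink
Example: <a href="https://www.naver.com">Go to Naver</a>
Live Server shortcut: Alt+L, Alt+O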

URL used as the link target in practice: https://ko.wikipedia.org/wiki/%EA%B4%91%EC%A3%BC%EA%B4%91%EC%97%AD%EC%8B%9C
Typing p*2 (an Emmet abbreviation) generates two <p></p> elements.
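Period 1 summary
- Key concepts
  - HTML is a markup language that defines the structure of a web page
  - It is built from tags and attributes
  - The a tag creates a hyperlink
  - It needs an href attribute (holding the URL) to work as a link
- Key terms
  - a tag
  - href
  - hyperlink
- Summary
  - Review of the basic HTML structure and the main tags
  - Hyperlinks: the a tag and the href attribute
  - A URL pointing to an external site must start with the http:// or https:// scheme; without it the browser treats the value as a relative path
  - A readable style for writing HTML documents
  - How to lay out a basic page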
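The scheme requirement for URLs can be checked with Python's standard `urllib.parse.urljoin`, which resolves an href value against the page that contains it in the same way a browser does. This is a minimal sketch; the page address `https://example.com/docs/page.html` is a made-up assumption.

```python
from urllib.parse import urljoin

# Hypothetical address of the page that contains the links (assumption)
page_url = "https://example.com/docs/page.html"

# href with a scheme: resolves to the external site itself
with_scheme = urljoin(page_url, "https://www.naver.com")

# href without a scheme: treated as a path relative to the current page
without_scheme = urljoin(page_url, "www.naver.com")

print(with_scheme)     # https://www.naver.com
print(without_scheme)  # https://example.com/docs/www.naver.com
```

The second result shows why a link written as `www.naver.com` ends up pointing inside the current site instead of at Naver.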
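Difference between the ol tag and the ul tag
Absolute vs. relative paths in the image tag
Parent, child, and sibling tag relationships
<ol>: ordered list
<li>: list item
<ul>: unordered list
<li>: list item
Both <ol> and <ul> must be used together with <li>.
Extra: tag readability - coding style and readability
- Recommended approach: create the regions first, then fill in the content
- Use tabs and indentation to make the code easier to read
- Skipping indentation causes no error, but writing everything on one line makes the code very hard to read
- Indenting arbitrarily causes no error either, but hurts readability just as much
- So consistent indentation is strongly recommended to make tag nesting clear
Extra: parent-child-sibling tag relationships
- A tag's parent is the tag that directly encloses it; its children are the tags it directly contains
- Tags that share the same parent are siblings
- In the list above, ul is the parent, li is the child, and the li tags are siblings of each other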
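The parent/child/sibling relationships can be made visible by walking a small snippet with Python's standard `html.parser`. This is only a sketch: the class name `ParentTracker` and the sample markup are assumptions made for illustration, and the parser assumes well-nested tags.

```python
from html.parser import HTMLParser

class ParentTracker(HTMLParser):
    """Records (tag, parent) pairs while walking well-nested HTML."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags, outermost first
        self.relations = []  # (tag, parent tag or None)

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        self.relations.append((tag, parent))
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Close the innermost open tag when it matches
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

tracker = ParentTracker()
tracker.feed("<ul><li>first</li><li>second</li></ul>")
relations = tracker.relations
print(relations)  # [('ul', None), ('li', 'ul'), ('li', 'ul')]
```

Both li entries report ul as their parent, which is exactly what makes them siblings.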



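Extra: web design tips
- Simplicity matters in web design
- Use a white or off-white background
- Use black text
- Black text on a white background is the cleanest, safest combination
- Limit color to a few accent colors
- Background images are not recommended
- Prefer a solid background (white or a pale color)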

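Period 2 summary
- Key concepts
  - ol and ul are list tags; they differ in whether the items are ordered
  - The image tag needs a src attribute that gives the file path
- Key terms
  - ol, ul, li
  - parent-child relationship
  - img, src
  - absolute path, relative path
  - CDN (Content Delivery Network)
- Lecture summary
  - Difference between the ol tag and the ul tag
  - Why indentation matters when tags are nested inside tags
  - Structure of the image tag and the src attribute
  - How files are loaded, and CDNs
  - The concepts of absolute and relative paths
  - How to write a path for a given folder structure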
With folders nested as HTML > HTML2 > HTML3, a file in HTML3 reaches an image stored in HTML by going up two levels: ../../image.jpg
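Relative paths like this can be resolved by hand with Python's standard `posixpath` module. The sketch below assumes the referencing file sits in `HTML/HTML2/HTML3`; each `..` climbs one folder.

```python
import posixpath

# Folder that contains the HTML file doing the referencing (assumed layout)
current_dir = "HTML/HTML2/HTML3"

# Two levels up from HTML3 lands in the HTML folder
two_up = posixpath.normpath(posixpath.join(current_dir, "../../image.jpg"))

# Adding another HTML segment after climbing looks for HTML/HTML instead
extra_segment = posixpath.normpath(posixpath.join(current_dir, "../../HTML/image.jpg"))

print(two_up)         # HTML/image.jpg
print(extra_segment)  # HTML/HTML/image.jpg
```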
Visual Studio Code setting: change how folder paths are displayed

<table>: the table itself; row → <tr>; column header cell → <th>; column data cell → <td>
Nesting order: <table> > <tr> > <th> / <td>
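To see how the four tags nest, the markup can be generated from plain data. This is a minimal sketch: the helper name `build_table` and the sample rows are made up for illustration.

```python
from html import escape

def build_table(headers, rows):
    """Return HTML for a table with one header row and one <tr> per data row."""
    lines = ["<table>"]
    # Header row: every column name goes in a <th>
    header_cells = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    lines.append(f"  <tr>{header_cells}</tr>")
    # Data rows: every value goes in a <td>
    for row in rows:
        data_cells = "".join(f"<td>{escape(str(c))}</td>" for c in row)
        lines.append(f"  <tr>{data_cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)

# Made-up sample data
html_table = build_table(["Name", "Score"], [["A", 90], ["B", 85]])
print(html_table)
```

Each `<tr>` is a child of `<table>`, and each `<th>` or `<td>` is a child of its `<tr>`.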

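Period 3 summary
- Key concepts
  - A relative path starts from the current file's location and uses dots and slashes to go into (./folder/) and out of (../) folders
  - An absolute path is a fixed path written from the root folder
  - An HTML table is built from four tags, <table>, <tr>, <th>, <td>, and structures data into rows and columns
  - Attributes that control table design, such as borders and background color, are the old approach; the modern approach is to handle design in CSS
- Key terms
  - relative path
  - absolute path
  - root folder
  - HTML table
- Summary
  - A relative path starts from the current file's location and uses dots and slashes to go into and out of folders
  - An absolute path is a fixed path written from the root folder
  - An HTML table is built from four tags, <table>, <tr>, <th>, <td>, and structures data into rows and columns
  - For table design, CSS is recommended over the older HTML attributes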
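start_positions, end_positions
offset_mapping: tells which character range of the original text each token covers
| Item | Description |
|---|---|
| 🎯 Goal | Tokenize the question + passage (context) and convert the answer's start/end positions to token indices |
| 📎 Input layout | [CLS] question [SEP] context [SEP] |
| 🧭 Position conversion | Character position of the answer → token position (using offset_mapping) |
| ⚠️ Caveat | If the context is truncated, the answer may be cut off → set start=0, end=0 in that case |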
# offset_mapping (end index is exclusive; question and context are each counted from 0)
[
    (0, 0),    # [CLS] → special token
    (0, 4),    # "대한민국"
    (4, 5),    # "의"
    (6, 8),    # "수도"
    (8, 9),    # "는"
    (10, 12),  # "어디"
    (12, 14),  # "인가"
    (14, 15),  # "?"
    (0, 0),    # [SEP] → special token
    (0, 4),    # "대한민국"
    (4, 5),    # "의"
    (6, 8),    # "수도"
    (8, 9),    # "는"
    (10, 12),  # "서울"
    (12, 14),  # "이다"
    (14, 15),  # "."
    (0, 0),    # [SEP]
    (0, 0), (0, 0), ...  # padding
]
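What an offset mapping records can be reproduced without a tokenizer. The sketch below assumes the context has already been split into the tokens shown (a real subword tokenizer may split differently) and simply locates each token in the text; `char_offsets` is a hypothetical helper, not a library function.

```python
def char_offsets(text, tokens):
    """Return the (start, end) character span of each token, scanning left to right."""
    offsets = []
    cursor = 0
    for tok in tokens:
        start = text.index(tok, cursor)  # first occurrence at or after the cursor
        end = start + len(tok)           # end index is exclusive
        offsets.append((start, end))
        cursor = end
    return offsets

context = "대한민국의 수도는 서울이다."
# Assumed token split, for illustration only
tokens = ["대한민국", "의", "수도", "는", "서울", "이다", "."]

offsets = char_offsets(context, tokens)
print(offsets)
# [(0, 4), (4, 5), (6, 8), (8, 9), (10, 12), (12, 14), (14, 15)]
```

Slicing the text with any pair, for example `context[10:12]`, gives back the token ("서울"), which is what lets a character-level answer position be mapped to a token index.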
| Key | Description |
|---|---|
| input_ids | Token IDs of the tokenized text |
| token_type_ids | Segment marker: 0 (question), 1 (context) |
| attention_mask | 1 for real tokens, 0 for padding |
| offset_mapping | Character span (start, end index) of each token in the original text |
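The two mask-like keys can be rebuilt by hand for a padded question/context pair. This sketch assumes a BERT-style layout, `[CLS] question [SEP] context [SEP]` followed by padding, in which the final [SEP] is counted with the context segment; `build_masks` is a hypothetical helper.

```python
def build_masks(num_question_tokens, num_context_tokens, max_length):
    """Return (token_type_ids, attention_mask) for a padded question/context pair."""
    first_segment = 1 + num_question_tokens + 1  # [CLS] + question + [SEP]
    second_segment = num_context_tokens + 1      # context + final [SEP]
    used = first_segment + second_segment
    if used > max_length:
        raise ValueError("sequence is longer than max_length")
    padding = max_length - used
    token_type_ids = [0] * first_segment + [1] * second_segment + [0] * padding
    attention_mask = [1] * used + [0] * padding
    return token_type_ids, attention_mask

token_type_ids, attention_mask = build_masks(7, 7, 20)
print(token_type_ids)  # 9 zeros, 8 ones, 3 zeros
print(attention_mask)  # 17 ones, 3 zeros
```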
def preprocess_function(example):
    # Strip leading/trailing whitespace from each question
    question = [q.strip() for q in example["question"]]
    inputs = tokenizer(
        question                        # question
        , example["context"]            # passage
        , max_length=384
        , truncation="only_second"      # truncate only the context, never the question
        , padding="max_length"
        , return_offsets_mapping=True   # return each token's character range in the source text
    )
    # offset_mapping: character positions of each token in the original text
    # It is not a model input, so pop it to avoid errors during training
    offset_mapping = inputs.pop("offset_mapping")
    answers = example["answers"]
    # Lists that hold the answer's start/end token positions for training
    start_positions = []
    end_positions = []
    # Fill in the start and end positions
    for i, offset in enumerate(offset_mapping):  # i: index of the sample within the batch
        answer = answers[i]  # answer for the i-th sample
        # Character position where the answer starts
        start_char = answer["answer_start"][0]
        # Character position where the answer ends (exclusive)
        end_char = start_char + len(answer["text"][0])
        # sequence_ids tells whether each token belongs to the question (0) or the context (1)
        # sequence_ids = [None, 0, 0, 0, 0, 0, 0, 0, None, 1, 1, 1, 1, 1, 1, 1, 1, 1, None, None, ..., None]
        sequence_ids = inputs.sequence_ids(i)
        idx = 0
        # Find where the context starts
        while sequence_ids[idx] != 1:  # skip special tokens and the question
            idx += 1
        context_start = idx
        # Find where the context ends
        while sequence_ids[idx] == 1:  # context
            idx += 1
        context_end = idx - 1
        # Find the token indices of the answer (start_positions, end_positions).
        # This only works if the whole answer is still inside the context after truncation,
        # so check that first and label the sample (0, 0) when it is not.
        # offset: list of (start, end) character tuples, one per token of the i-th sample
        if offset[context_start][0] > start_char or offset[context_end][1] < end_char:
            start_positions.append(0)
            end_positions.append(0)
        else:  # the answer lies inside the context
            idx = context_start
            # Find the start token (start_positions)
            while idx <= context_end and offset[idx][0] <= start_char:
                idx += 1
            start_positions.append(idx - 1)  # the loop stops one token past the start, so step back
            # Find the end token (end_positions)
            idx = context_end  # scan backward from the end
            while idx >= context_start and offset[idx][1] >= end_char:
                idx -= 1
            end_positions.append(idx + 1)  # the backward loop stops one token before the end, so step forward
    # Store the labels in the dictionary used for Hugging Face training
    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    return inputs
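The span search can be traced on a toy example without loading a tokenizer. The sketch below hard-codes offsets and sequence ids for the question "대한민국의 수도는 어디인가?" and the context "대한민국의 수도는 서울이다." (assumed token split), and uses a containment check: the context must cover the whole answer, otherwise the label is (0, 0). `find_answer_span` is a hypothetical helper.

```python
def find_answer_span(offsets, sequence_ids, start_char, end_char):
    """Map a character-level answer span to (start_token, end_token) indices."""
    idx = 0
    while sequence_ids[idx] != 1:  # skip special tokens and the question
        idx += 1
    context_start = idx
    while idx < len(sequence_ids) and sequence_ids[idx] == 1:
        idx += 1
    context_end = idx - 1

    # Answer not fully inside the (possibly truncated) context
    if offsets[context_start][0] > start_char or offsets[context_end][1] < end_char:
        return (0, 0)

    idx = context_start
    while idx <= context_end and offsets[idx][0] <= start_char:
        idx += 1
    start_token = idx - 1

    idx = context_end
    while idx >= context_start and offsets[idx][1] >= end_char:
        idx -= 1
    end_token = idx + 1
    return (start_token, end_token)

offsets = [(0, 0), (0, 4), (4, 5), (6, 8), (8, 9), (10, 12), (12, 14), (14, 15), (0, 0),
           (0, 4), (4, 5), (6, 8), (8, 9), (10, 12), (12, 14), (14, 15), (0, 0), (0, 0)]
sequence_ids = [None] + [0] * 7 + [None] + [1] * 7 + [None, None]

# The answer "서울" covers characters 10..12 of the context
span = find_answer_span(offsets, sequence_ids, start_char=10, end_char=12)
print(span)  # (13, 13)

# Simulated truncation: the context is cut off after "는", so the answer is lost
truncated = find_answer_span(offsets[:13] + [(0, 0)], sequence_ids[:13] + [None], 10, 12)
print(truncated)  # (0, 0)
```

Token 13 is "서울", so the start and end labels coincide for this single-token answer.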
Extra: besides the while loop, Python offers simpler, more direct ways to find these indices
Using a list comprehension + enumerate
Collect the positions of every 1 in an indices list, then take the first and the last.
sequence_ids = [None, 0, 0, 0, 0, 0, 0, 0, None, 1, 1, 1, 1, 1, 1, 1, 1, 1, None, None]
indices = [i for i, x in enumerate(sequence_ids) if x == 1]
context_start = indices[0]
context_end = indices[-1]
Using index() on the list and on its reverse
index() returns the first occurrence. Python lists have no rindex(), so get the last index with list.reverse() + index() on a copy, or with len - 1 - list[::-1].index(1).
context_start = sequence_ids.index(1)
context_end = len(sequence_ids) - 1 - sequence_ids[::-1].index(1)
Using numpy
np.where works too.
import numpy as np
arr = np.array(sequence_ids)
indices = np.where(arr == 1)[0]
context_start = indices[0]
context_end = indices[-1]
Using next() with a generator expression
# start index
context_start = next(i for i, x in enumerate(sequence_ids) if x == 1)
# end index
context_end = len(sequence_ids) - 1 - next(i for i, x in enumerate(reversed(sequence_ids)) if x == 1)
Summary: pick whichever style you prefer
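A for loop that steps the [0] index in start_char = answer["answer_start"][0] through 0, 1, 2!
Why is offset_mapping not used in training?
When locating the answer's start and end positions, how do we decide whether the answer lies inside the context?
In the while loop conditions, how is idx related to the answer position?
Strip leading/trailing whitespace from the input, then pass it to the tokenizer: question = [q.strip() for q in example["question"]]
Offset mapping and answer position handling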
| Input token | offset_mapping | idx | sequence_ids |
|---|---|---|---|
| [CLS] → special token | (0, 0) | 0 | None |
| "대한민국" | (0, 4) | 1 | 0 |
| "의" | (4, 5) | 2 | 0 |
| "수도" | (6, 8) | 3 | 0 |
| "는" | (8, 9) | 4 | 0 |
| "어디" | (10, 12) | 5 | 0 |
| "인가" | (12, 14) | 6 | 0 |
| "?" | (14, 15) | 7 | 0 |
| [SEP] → special token | (0, 0) | 8 | None |
| "대한민국" | (0, 4) | 9 | 1 |
| "의" | (4, 5) | 10 | 1 |
| "수도" | (6, 8) | 11 | 1 |
| "는" | (8, 9) | 12 | 1 |
| "서울" | (10, 12) | 13 | 1 |
| "이다" | (12, 14) | 14 | 1 |
| "." | (14, 15) | 15 | 1 |
| [SEP] | (0, 0) | 16 | None |
| padding | (0, 0) | 17 | None |
Stored under the start_positions and end_positions keys
Going further:
Why inputs is a dictionary, and what an assignment like inputs["start_positions"] = start_positions means
1. The Hugging Face preprocessing result (inputs)
Calling a Hugging Face tokenizer returns a BatchEncoding object, which behaves like a Python dictionary.
For example:
inputs = tokenizer(
    question
    , example["context"]
    , max_length=384
    , truncation="only_second"
    , padding="max_length"
    , return_offsets_mapping=True
)
print(inputs.keys())
# dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'offset_mapping'])
Each key holds:
- input_ids: the numeric IDs of the tokens
- token_type_ids: segment marker (question/context)
- attention_mask: masking information
- offset_mapping: character positions in the original text
In short, the tokenizer output has a dictionary structure.
2. Why start_positions / end_positions go into the dictionary
inputs["start_positions"] = start_positions adds the answer's start and end token indices to the dictionary the tokenizer returned.
Why do this?
- Automated training tools such as the Hugging Face Trainer expect a dictionary structure with fields like "start_positions" and "end_positions".
- That is, the model needs both
  - "input_ids", "attention_mask", … (inputs)
  - "start_positions", "end_positions" (target labels)
- Summary
  - The tokenizer output is already a dictionary, so the extra information (answer positions) is added as key-value pairs as well.
  - To pass several fields (inputs/labels) to the model at once, put them all in one dictionary and hand it to the Trainer/model.
In short, inputs is a dictionary holding the tokenizer output and the later processing results, so adding start_positions and end_positions as key-value pairs is the natural choice.
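The same idea can be shown with a plain dictionary standing in for the tokenizer output. All values below are made up for illustration; the point is only that labels are added as extra columns with one entry per example, next to the input columns.

```python
# Plain dict standing in for a batched tokenizer output (made-up token IDs)
inputs = {
    "input_ids": [[2, 10, 11, 3, 20, 21, 3], [2, 12, 13, 3, 22, 23, 3]],
    "attention_mask": [[1] * 7, [1] * 7],
}

# One label per example in the batch (made-up positions)
start_positions = [4, 5]
end_positions = [5, 5]

# Labels go in as ordinary key-value pairs, next to the inputs
inputs["start_positions"] = start_positions
inputs["end_positions"] = end_positions

# Every column must have exactly one entry per example
batch_size = len(inputs["input_ids"])
same_length = all(len(column) == batch_size for column in inputs.values())
print(list(inputs.keys()), same_length)
```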
tokenized_sq = dataset.map(preprocess_function, batched=True)
# Prepare for training
# Load the model
from transformers import AutoModelForQuestionAnswering
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)
# Create the trainer and set the training parameters
from transformers import TrainingArguments, Trainer
# Training parameters
training_args = TrainingArguments(
    output_dir="./results/klue-mrc_koelectra_qa_model"
    , eval_strategy="epoch"
    , learning_rate=2e-5
    , per_device_train_batch_size=16
    , per_device_eval_batch_size=16
    , num_train_epochs=3
    , weight_decay=0.01
    , push_to_hub=False
)
# Trainer
trainer = Trainer(
    model=model
    , args=training_args
    , train_dataset=tokenized_sq["train"]
    , eval_dataset=tokenized_sq["test"]
    , processing_class=tokenizer
)
# Train
trainer.train()

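Period 5 summary
- Key concepts
  - Offset mapping gives each token's start and end character positions, so the answer can still be located precisely in the original text after tokenization
- Key terms
  - offset mapping
  - context position and answer position
- Summary
  - Offset mapping lets us pinpoint the answer in the original text even after tokenization and store the start and end positions needed for training
  - First check whether the answer lies inside the context; if it does, search for the start and end positions with loops and store them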
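# Upload the trained model to Hugging Face
%cd /content/drive/MyDrive/Colab Notebooks/NLP
# Log in to Hugging Face
from huggingface_hub import login
# Read the API key from a file
with open("./key/huggingface_api_key", 'r') as f:
    api_key = f.read().strip()
login(token=api_key)
# Upload to Hugging Face
repo_id = "username/klue-mrc-koelectra-qa-mode"  # replace username with your own account name
# Save locally first (these calls write to a local folder named after repo_id)
trainer.save_model(repo_id)
model.save_pretrained(repo_id)
tokenizer.save_pretrained(repo_id)
# Trainer.push_to_hub() takes a commit message as its first argument, not a repo id,
# so push the model and tokenizer to repo_id directly
model.push_to_hub(repo_id)
tokenizer.push_to_hub(repo_id)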
# task="question-answering"
from transformers import pipeline
checkpoint_mymodel="be2be2/klue-mrc-koelectra-qa-mode"
question_answerer = pipeline(
    task="question-answering"
    , model=checkpoint_mymodel
    , tokenizer=checkpoint_mymodel
)
question = "RAG의 장점은 ?"
context = """RAG는 또한 새로운 데이터로 LLM을 재훈련할 필요성을 줄여 계산 및 재정 비용을 절감한다.
효율성 향상 외에도 RAG는 LLM이 응답에 출처를 포함할 수 있도록 하여 사용자가 인용된 출처를 확인할 수
있도록 한다. 이는 사용자가 검색된 콘텐츠를 교차 확인하여 정확성과 관련성을 확인할 수 있으므로
투명성이 향상된다."""
result = question_answerer(question=question, context=context)
result
{'score': 0.0010380235617049038, 'start': 64, 'end': 73, 'answer': 'RAG는 LLM이'}
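The `start` and `end` values in the pipeline result are character offsets into the context, so the answer can be recovered by slicing. The sketch below reuses the context and the result shown above without calling the model; the low `score` suggests the model is not confident in this span.

```python
context = """RAG는 또한 새로운 데이터로 LLM을 재훈련할 필요성을 줄여 계산 및 재정 비용을 절감한다.
효율성 향상 외에도 RAG는 LLM이 응답에 출처를 포함할 수 있도록 하여 사용자가 인용된 출처를 확인할 수
있도록 한다. 이는 사용자가 검색된 콘텐츠를 교차 확인하여 정확성과 관련성을 확인할 수 있으므로
투명성이 향상된다."""

# Result copied from the pipeline output above
result = {'score': 0.0010380235617049038, 'start': 64, 'end': 73, 'answer': 'RAG는 LLM이'}

# start is inclusive, end is exclusive
answer_from_offsets = context[result["start"]:result["end"]]
print(answer_from_offsets == result["answer"])  # True
```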
| Split name | Description |
|---|---|
| "train" | Training data made up of US federal bills (used for most model training) |
| "ca_test" | Test data made up of California state bills (for evaluating cross-domain generalization) |
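from datasets import load_dataset
load_dataset("billsum")  # a good dataset for checking how well a model generalizes
# Use only the California state bills and split them into train/test
billsum = load_dataset("billsum", split="ca_test")  # 1,237 examples
billsum = billsum.train_test_split(test_size=0.2)  # test_size=0.2 is an assumed ratio
billsum["train"][0]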
{'text': 'The people of the State of California do enact as follows:\n\n\nSECTION 1.\n(a) The Legislature finds and declares that the oversight boards to individual successor agencies were established pursuant to the Redevelopment Agency Dissolution Act, which prescribes that all oversight boards in the County of Los Angeles will be consolidated into a single countywide oversight board by July 1, 2016.\n(b) The Legislature further finds that collapsing all functions of the 71 oversight boards in the County of Los Angeles into a single countywide oversight board would create administrative gridlock and be a severe impediment to the expeditious disposition of properties owned by former redevelopment agencies.\n(c) In recognition of these findings and to ensure that the duties of the 71 oversight boards and successor agencies in the County of Los Angeles will be met in a timely manner, it is the intent of the Legislature to continue all oversight boards in the County of Los Angeles in existence until the respective successor agency requests dissolution of its oversight board and transfer of fiduciary duties to the countywide oversight board.\nSEC. 2.\nSection 34179 of the Health and Safety Code is amended to read:\n34179.\n(a) Each successor agency shall have an oversight board composed of seven members. The members shall elect one of their members as the chairperson and shall report the name of the chairperson and other members to the Department of Finance on or before May 1, 2012. 
Members shall be selected as follows:\n(1) One member appointed by the county board of supervisors.\n(2) One member appointed by the mayor for the city that formed the redevelopment agency.\n(3) (A) One member appointed by the largest special district, by property tax share, with territory in the territorial jurisdiction of the former redevelopment agency, that is of the type of special district that is eligible to receive property tax revenues pursuant to Section 34188.\n(B) On or after the effective date of this subparagraph, the county auditor-controller may determine which is the largest special district for purposes of this section.\n(4) One member appointed by the county superintendent of education to represent schools, if the superintendent is elected. If the county superintendent of education is appointed, then the appointment made pursuant to this paragraph shall be made by the county board of education.\n(5) One member appointed by the Chancellor of the California Community Colleges to represent community college districts in the county.\n(6) One member of the public appointed by the county board of supervisors.\n(7) One member representing the employees of the former redevelopment agency appointed by the mayor or chair of the board of supervisors from the recognized employee organization representing the largest number of former redevelopment agency employees employed by the successor agency at that time. If city or county employees performed administrative duties of the former redevelopment agency, the appointment shall be made from the recognized employee organization representing those employees. If a recognized employee organization does not exist for either the employees of the former redevelopment agency or the city or county employees performing administrative duties of the former redevelopment agency, the appointment shall be made from among the employees of the successor agency. 
In voting to approve a contract as an enforceable obligation, a member appointed pursuant to this paragraph shall not be deemed to be interested in the contract by virtue of being an employee of the successor agency or community for purposes of Section 1090 of the Government Code.\n(8) If the county or a joint powers agency formed the redevelopment agency, the largest city by acreage in the territorial jurisdiction of the former redevelopment agency may select one member. If there are no cities with territory in a project area of the redevelopment agency, the county superintendent of education may appoint an additional member to represent the public.\n(9) If there are no special districts of the type that are eligible to receive property tax pursuant to Section 34188 within the territorial jurisdiction of the former redevelopment agency, the county may appoint one member to represent the public.\n(10) If a redevelopment agency was formed by an entity that is both a charter city and a county, the oversight board shall be composed of seven members selected as follows: three members appointed by the mayor of the city, if that appointment is subject to confirmation by the county board of supervisors; one member appointed by the largest special district, by property tax share, with territory in the territorial jurisdiction of the former redevelopment agency, that is the type of special district that is eligible to receive property tax revenues pursuant to Section 34188; one member appointed by the county superintendent of education to represent schools; one member appointed by the Chancellor of the California Community Colleges to represent community college districts; and one member representing employees of the former redevelopment agency appointed by the mayor of the city, if that appointment is subject to confirmation by the county board of supervisors, to represent the largest number of former redevelopment agency employees employed by the successor agency at that 
time.\n(b) The Governor may appoint individuals to fill any oversight board member position described in subdivision (a) that has not been filled by May 15, 2012, or any member position that remains vacant for more than 60 days.\n(c) The oversight board may direct the staff of the successor agency to perform work in furtherance of the oversight board’s duties and responsibilities under this part. The successor agency shall pay for all of the costs of meetings of the oversight board and may include those costs in its administrative budget. Oversight board members shall serve without compensation or reimbursement for expenses.\n(d) Oversight board members are protected by the immunities applicable to public entities and public employees governed by Part 1 (commencing with Section 810) and Part 2 (commencing with Section 814) of Division 3.6 of Title 1 of the Government Code.\n(e) A majority of the total membership of the oversight board shall constitute a quorum for the transaction of business. A majority vote of the total membership of the oversight board is required for the oversight board to take action. The oversight board shall be deemed to be a local entity for purposes of the Ralph M. Brown Act, the California Public Records Act, and the Political Reform Act of 1974. All actions taken by the oversight board shall be adopted by resolution.\n(f) All notices required by law for proposed oversight board actions shall also be posted on the successor agency’s Internet Web site or the oversight board’s Internet Web site.\n(g) Each member of an oversight board shall serve at the pleasure of the entity that appointed that member.\n(h) The Department of Finance may review an oversight board action taken pursuant to this part. Written notice and information about all actions taken by an oversight board shall be provided to the department by electronic means and in a manner of the department’s choosing. 
An action shall become effective five business days after notice in the manner specified by the department is provided unless the department requests a review. Each oversight board shall designate an official to whom the department may make those requests and who shall provide the department with the telephone number and email contact information for the purpose of communicating with the department pursuant to this subdivision. Except as otherwise provided in this part, if the department requests a review of a given oversight board action, it shall have 40 days from the date of its request to approve the oversight board action or return it to the oversight board for reconsideration and the oversight board action shall not be effective until approved by the department. If the department returns the oversight board action to the oversight board for reconsideration, the oversight board shall resubmit the modified action for department approval and the modified oversight board action shall not become effective until approved by the department. If the department reviews a Recognized Obligation Payment Schedule, the department may eliminate or modify any item on that schedule prior to its approval. The county auditor-controller shall reflect the actions of the department in determining the amount of property tax revenues to allocate to the successor agency. The department shall provide notice to the successor agency and the county auditor-controller as to the reasons for its actions. To the extent that an oversight board continues to dispute a determination with the department, one or more future recognized obligation schedules may reflect any resolution of that dispute. 
The department may also agree to an amendment to a Recognized Obligation Payment Schedule to reflect a resolution of a disputed item, however, this shall not affect a past allocation of property tax or create a liability for any affected taxing entity.\n(i) Oversight boards shall have fiduciary responsibilities to holders of enforceable obligations and the taxing entities that benefit from distributions of property tax and other revenues pursuant to Section 34188. Further, the provisions of Division 4 (commencing with Section 1000)\nof Title 1\nof the Government Code shall apply to oversight boards. Notwithstanding Section 1099 of the Government Code, or any other law, any individual may simultaneously be appointed to up to five oversight boards and may hold an office in a city, county, city and county, special district, school district, or community college district.\n(j)\nCommencing\nExcept as specified in subdivision (q), commencing\non and after July 1, 2016, in each county where more than one oversight board was created by operation of the act adding this part, there shall be\nonly\none oversight board appointed as follows:\n(1) One member may be appointed by the county board of supervisors.\n(2) One member may be appointed by the city selection committee established pursuant to Section 50270 of the Government Code. In a city and county, the mayor may appoint one member.\n(3) One member may be appointed by the independent special district selection committee established pursuant to Section 56332 of the Government Code, for the types of special districts that are eligible to receive property tax revenues pursuant to Section 34188.\n(4) One member may be appointed by the county superintendent of education to represent schools if the superintendent is elected. 
If the county superintendent of education is appointed, then the appointment made pursuant to this paragraph shall be made by the county board of education.\n(5) One member may be appointed by the Chancellor of the California Community Colleges to represent community college districts in the county.\n(6) One member of the public may be appointed by the county board of supervisors.\n(7) One member may be appointed by the recognized employee organization representing the largest number of successor agency employees in the county.\n(k) The Governor may appoint individuals to fill any oversight board member position described in subdivision (j) that has not been filled by July 15, 2016, or any member position that remains vacant for more than 60 days.\n(l) Commencing on and after July 1, 2016, in each county where only one oversight board was created by operation of the act adding this part,\nthen\nthere will be no change to the composition of that oversight board as a result of the operation of subdivision (b).\n(m) Any oversight board for a given successor agency shall cease to exist when all of the indebtedness of the dissolved redevelopment agency has been repaid or a successor agency has dissolved the oversight board pursuant to subdivision (q).\n(n) An oversight board may direct a successor agency to provide legal or financial advice in addition to that provided by agency staff.\n(o) An oversight board is authorized to contract with the county or other public or private agencies for administrative support.\n(p) On matters within the purview of the oversight board, decisions made by the oversight board supersede those made by the successor agency or the staff of the successor agency.\n(q) Notwithstanding subdivision (j), an oversight board within the County of Los Angeles shall continue to independently operate until its successor agency adopts a resolution dissolving its oversight board and its oversight board approves that resolution, after which time the 
successor agency shall be overseen by the oversight board established pursuant to subdivision (j).\nSEC. 3.\nThe Legislature finds and declares that a special law is necessary and that a general law cannot be made applicable within the meaning of Section 16 of Article IV of the California Constitution because of the unique circumstances of the County of Los Angeles.',
'summary': 'Existing law dissolved redevelopment agencies and community development agencies as of February 1, 2012, and provides for the designation of successor agencies to wind down the affairs of the dissolved redevelopment agencies, subject to review by oversight boards, and to, among other things, make payments due for enforceable obligations and to perform obligations required pursuant to any enforceable obligation. Existing law authorizes, in each county where more than one oversight board was created, only one oversight board to be appointed on and after July 1, 2016.\nThis bill would require an oversight board within the County of Los Angeles to continue to independently operate past the July 1, 2016, consolidation date, until its successor agency adopts a resolution dissolving the board and the board approves that resolution, as provided.\nThis bill would make legislative findings and declarations as to the necessity of a special statute for the County of Los Angeles.',
'title': 'An act to amend Section 34179 of the Health and Safety Code, relating to redevelopment.'}
from transformers import AutoTokenizer
checkpoint = "google-t5/t5-small"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# T5 needs the task stated up front as a prompt
# T5 convention: prepend a task prefix to the input before tokenizing
prefix = "summarize: "  # summarization prefix used by T5 (note the trailing space)
# Preprocessing
def preprocess_function(examples):
    # In summarization the source text and the summary have different max_length values, so tokenize them separately
    # Prepend the prefix to every document in the list
    inputs = [prefix + doc for doc in examples["text"]]
    model_inputs = tokenizer(
        inputs
        , max_length=1024
        , truncation=True
    )
    # Tokenize the summaries
    # summary == target → pass the labels through the text_target argument
    labels = tokenizer(
        text_target=examples["summary"]
        , max_length=128
        , truncation=True
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
tokenized_billsum = billsum.map(preprocess_function, batched=True)
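Period 6 summary
- Key concepts
  - Abstractive summarization is the task of generating new text from the relevant information; it uses sequence-to-sequence (Seq2Seq) models
- Key terms
  - summarization
  - extractive summarization
  - abstractive summarization
  - T5 model
  - ROUGE
- Summary
  - The idea of abstractive summarization: generate 'new text' from the relevant information
  - The BillSum dataset suits evaluating generalization and checking how well a model adapts to real-world conditions
  - The T5 model and the ROUGE evaluation metric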
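ROUGE scores a generated summary by its word overlap with a reference summary. The sketch below is a simplified ROUGE-1 F1 (unigram overlap, lowercase whitespace tokenization, no stemming), so its numbers will not match an official ROUGE implementation exactly; the function name and both sentences are made up for illustration.

```python
from collections import Counter

def rouge1_f1(reference, candidate):
    """Simplified ROUGE-1 F1: unigram overlap between reference and candidate."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# Made-up example sentences
reference = "the bill extends oversight boards in los angeles"
candidate = "the bill extends oversight boards"

score = rouge1_f1(reference, candidate)
print(round(score, 4))  # 0.7692
```

Here every candidate word appears in the reference (precision 1.0), but only 5 of the 8 reference words are covered (recall 0.625), which gives an F1 of about 0.77.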