Cost Function
Understanding the Cost Function Through Regression
Suppose we have data on house size and price, and we want to predict house prices.
![](https://velog.velcdn.com/images/tim0902/post/0ccbf369-9a8f-4810-b646-62ab0bb95a62/image.png)
Building a machine learning model
![](https://velog.velcdn.com/images/tim0902/post/7b2c0427-9a76-407d-81b4-1e3f0c66aa49/image.png)
![](https://velog.velcdn.com/images/tim0902/post/13a5a80a-dd76-494c-aaa0-d0ef26164c53/image.png)
If the model is a first-degree (linear) function
![](https://velog.velcdn.com/images/tim0902/post/37c3acf0-81f0-4990-a3a4-6c19e0139952/image.png)
Linear regression
![](https://velog.velcdn.com/images/tim0902/post/c4d1b11c-3093-4c95-9636-ee8188239788/image.png)
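For reference, a linear model with one input is usually written with an intercept θ₀ and a slope θ₁, the two parameters to be learned; the notation in the figures may differ. Later in this post the cost depends on a single θ (for example J(θ) = 38θ² - 94θ + 62), which matches the simplified model without an intercept. Reading it that way is an assumption based on the one-parameter cost.

```latex
h_\theta(x) = \theta_0 + \theta_1 x
\qquad \text{(simplified, no intercept: } h_\theta(x) = \theta x \text{)}
```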
The parameters that make up the model
![](https://velog.velcdn.com/images/tim0902/post/c2f504fc-0c45-4912-aed3-e647be3296f1/image.png)
![](https://velog.velcdn.com/images/tim0902/post/f9182758-4de4-4420-8eee-770d17fb1b0f/image.png)
![](https://velog.velcdn.com/images/tim0902/post/0e175fd7-dc83-4171-ba03-2758c28a5f55/image.png)
![](https://velog.velcdn.com/images/tim0902/post/687ab992-a0ca-41c7-ba70-735cab203ddf/image.png)
![](https://velog.velcdn.com/images/tim0902/post/019fd189-5c3e-4f6c-8c15-2d16e515be59/image.png)
![](https://velog.velcdn.com/images/tim0902/post/311cbf96-08de-4626-ad3c-617a5eeae735/image.png)
![](https://velog.velcdn.com/images/tim0902/post/cefcaab3-3826-498a-92eb-ea36082cb3b5/image.png)
![](https://velog.velcdn.com/images/tim0902/post/9ae42cba-c91d-4c88-be5c-0d5cc320ad39/image.png)
![](https://velog.velcdn.com/images/tim0902/post/ce52dbae-a4f9-45fd-918b-53bc0433a024/image.png)
The actual data (three points) and the model we need to find (h)
![](https://velog.velcdn.com/images/tim0902/post/14f2ffdc-3165-4bdd-b462-65373030ebaa/image.png)
First, compute the error at each point
![](https://velog.velcdn.com/images/tim0902/post/5812abcc-89ef-4ec9-855c-c56813ebd15d/image.png)
Square each error and take the mean
![](https://velog.velcdn.com/images/tim0902/post/dcc25e9d-1f97-4d6c-8293-14a0d64b663c/image.png)
![](https://velog.velcdn.com/images/tim0902/post/d3434526-5e04-4cfa-9ee0-6e0af665c0e3/image.png)
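The two steps above (compute each error, then square and average) can be sketched in plain Python. The three data points and the model h(x) = θx below are hypothetical placeholders, not the values shown in the figures.

```python
def errors(theta, xs, ys):
    """Error of the hypothetical model h(x) = theta * x at each data point."""
    return [theta * x - y for x, y in zip(xs, ys)]

def mean_squared_error(theta, xs, ys):
    """Square each error and take the mean."""
    errs = errors(theta, xs, ys)
    return sum(e ** 2 for e in errs) / len(errs)

# Hypothetical data: three points, not the ones in the figure
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.5, 2.5]

print(errors(1.0, xs, ys))              # [0.0, -0.5, 0.5]
print(mean_squared_error(1.0, xs, ys))  # mean of 0, 0.25, 0.25
```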
Cost Function
![](https://velog.velcdn.com/images/tim0902/post/e060527a-79ac-4237-95b4-ac7e418e6d1d/image.png)
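Written out, the "square each error and take the mean" recipe is the mean squared error cost. Here m is the number of data points and (x⁽ⁱ⁾, y⁽ⁱ⁾) is the i-th point. Some texts use 1/(2m) instead of 1/m; that rescales J but does not move its minimum.

```latex
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\big(x^{(i)}\big) - y^{(i)} \right)^2
```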
If we can minimize the cost function, we can find the best-fitting line.
![](https://velog.velcdn.com/images/tim0902/post/314f899f-c3d1-4ca9-894c-419e7e34352b/image.png)
Working it out
![](https://velog.velcdn.com/images/tim0902/post/12a70b1f-468f-4ad9-aa9d-cdd223034fde/image.png)
Making J as small as possible
![](https://velog.velcdn.com/images/tim0902/post/14700501-f2df-4ff2-8a18-2dc6dbaaa776/image.png)
![](https://velog.velcdn.com/images/tim0902/post/22d6f328-4036-464f-a15b-1865651e7721/image.png)
Finding the minimum
![](https://velog.velcdn.com/images/tim0902/post/2d3e8cc2-0f56-41cf-8e61-80c294cf38f2/image.png)
Locating the point where the minimum occurs
![](https://velog.velcdn.com/images/tim0902/post/2645c9b4-a453-4f32-9ec8-37dd47c1a608/image.png)
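```python
import sympy as sym

# Differentiate the cost J(theta) = 38*theta**2 - 94*theta + 62 with respect to theta
theta = sym.Symbol('theta')
diff_th = sym.diff(38*theta**2 - 94*theta + 62, theta)
print(diff_th)  # 76*theta - 94
```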
![](https://velog.velcdn.com/images/tim0902/post/0e07f5ca-ab4e-4150-b627-a239fbf36d1d/image.png)
![](https://velog.velcdn.com/images/tim0902/post/32d503d2-d5f3-4455-8079-215f245cdc57/image.png)
![](https://velog.velcdn.com/images/tim0902/post/dd7ae6a2-a3b2-46de-9fd6-36d945374148/image.png)
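Setting the derivative 76θ - 94 to zero gives the minimizer. As a cross-check that needs only the standard library: the vertex of a quadratic aθ² + bθ + c with a > 0 lies at θ = -b/(2a). The coefficients below are the ones passed to `sym.diff` above; exact fractions avoid rounding.

```python
from fractions import Fraction

# Coefficients of J(theta) = 38*theta**2 - 94*theta + 62 (from the sym.diff call)
a, b, c = Fraction(38), Fraction(-94), Fraction(62)

def J(theta):
    return a * theta ** 2 + b * theta + c

def dJ(theta):
    # Derivative 2*a*theta + b, i.e. 76*theta - 94
    return 2 * a * theta + b

theta_min = -b / (2 * a)   # vertex of the parabola
print(theta_min)           # 47/38
print(dJ(theta_min))       # 0
print(J(theta_min))        # 147/38
```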
Cost function: when the model matches the data exactly
![](https://velog.velcdn.com/images/tim0902/post/852130cc-48bd-4c29-9f3d-ccdd47fbbd76/image.png)
Cost function: when the model is slightly off
![](https://velog.velcdn.com/images/tim0902/post/54778ffd-315a-48ca-b4c6-70796e9c6f07/image.png)
Cost function: when the model is further off
![](https://velog.velcdn.com/images/tim0902/post/fdb9d314-b57f-4af6-ba62-c85675cd0aa6/image.png)
![](https://velog.velcdn.com/images/tim0902/post/e17340a0-63b4-415e-b5cd-379e7505a17d/image.png)
![](https://velog.velcdn.com/images/tim0902/post/dfa9cb8b-483c-4646-9248-5bde6505554a/image.png)
![](https://velog.velcdn.com/images/tim0902/post/f81e0fdb-5807-428e-89c8-d954aaa2cf38/image.png)
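The three cases above (exact match, slightly off, further off) can be reproduced numerically. This sketch assumes hypothetical data lying exactly on y = x and the model h(x) = θx, so θ = 1 is the exact match; J grows as θ moves away from 1, tracing out the bowl-shaped curve.

```python
def cost(theta, xs, ys):
    """Mean squared error of the model h(x) = theta * x."""
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data lying exactly on y = x
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

for theta in [1.0, 0.5, 0.0]:   # exact match, slightly off, further off
    print(theta, cost(theta, xs, ys))
```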
Gradient Descent
Pick a random starting point
![](https://velog.velcdn.com/images/tim0902/post/9d8548c2-fa1d-4bb6-9840-2352b882f373/image.png)
Compute the derivative (or partial derivatives) at that point and update the parameter
![](https://velog.velcdn.com/images/tim0902/post/1af67c34-f070-48a9-80b6-cad4d8e99b5e/image.png)
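A minimal sketch of the update rule θ ← θ - α·dJ/dθ, applied to the cost J(θ) = 38θ² - 94θ + 62 from the minimum-finding step. The start range, learning rate α = 0.01, and step count are arbitrary illustrative choices; the start is random as described above, with a fixed seed so the run is repeatable.

```python
import random

def dJ(theta):
    # Derivative of J(theta) = 38*theta**2 - 94*theta + 62
    return 76 * theta - 94

def gradient_descent(theta, lr=0.01, steps=200):
    for _ in range(steps):
        theta = theta - lr * dJ(theta)   # step against the slope
    return theta

random.seed(0)
start = random.uniform(-10, 10)          # random starting point
theta = gradient_descent(start)
print(start, theta)                      # theta ends near 47/38 ≈ 1.2368
```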
If the point is to the right of the target (the minimum)
![](https://velog.velcdn.com/images/tim0902/post/f5458109-6582-4c1b-8d7e-991b71d77b2f/image.png)
If the point is to the left of the target
![](https://velog.velcdn.com/images/tim0902/post/c7d25044-7954-4f48-8f98-bd4b48dd92c2/image.png)
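The same update works on both sides: to the right of the minimum the slope is positive, so subtracting α times the slope moves θ left; to the left the slope is negative, so θ moves right. A quick check with the same J(θ); the points 3 and 0 and α = 0.01 are arbitrary choices.

```python
def dJ(theta):
    # Derivative of J(theta) = 38*theta**2 - 94*theta + 62; zero at 47/38
    return 76 * theta - 94

lr = 0.01
for theta in [3.0, 0.0]:                 # right of the minimum, then left of it
    slope = dJ(theta)
    new_theta = theta - lr * slope
    print(f"theta={theta}: slope={slope:+.0f}, next theta={new_theta:.2f}")
```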
Learning rate
![](https://velog.velcdn.com/images/tim0902/post/9f698729-2b1a-4178-b19a-4d711c6c12d5/image.png)
If the learning rate is small
![](https://velog.velcdn.com/images/tim0902/post/3f745bfa-3162-41af-99f2-d6e2fcf9817d/image.png)
If the learning rate is large
![](https://velog.velcdn.com/images/tim0902/post/26d40caf-5b69-4ffb-8ecf-6b80a97bf198/image.png)
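The effect of α can be measured on the same J(θ). For this quadratic, each step multiplies the distance to the minimum by |1 - 76α|, so the iteration converges only when α < 2/76 ≈ 0.026. The three rates below are arbitrary examples of too small, reasonable, and too large.

```python
THETA_STAR = 47 / 38   # minimizer of J(theta) = 38*theta**2 - 94*theta + 62

def dJ(theta):
    return 76 * theta - 94

def distance_after(lr, theta=5.0, steps=50):
    """Distance from the minimum after `steps` gradient-descent updates."""
    for _ in range(steps):
        theta -= lr * dJ(theta)
    return abs(theta - THETA_STAR)

for lr in [0.001, 0.01, 0.03]:
    print(lr, distance_after(lr))
# 0.001: still noticeably far from the minimum (slow progress)
# 0.01:  essentially at the minimum
# 0.03:  overshoots on every step and diverges
```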
Regression on multivariate data
Multiple features
![](https://velog.velcdn.com/images/tim0902/post/0e2d8944-04a5-4947-bae7-dab7ceaa4d42/image.png)
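With n features, the single-input hypothesis generalizes to one weight per feature plus an intercept. This is the usual textbook notation and may differ in detail from the figure.

```latex
h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n
```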
Expressing it in matrix form
![](https://velog.velcdn.com/images/tim0902/post/3642c65e-9f69-49fe-accb-3e101e52c4b4/image.png)
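In matrix form, stacking the m examples as rows of X with a leading column of ones (for θ₀) gives all predictions at once as Xθ, the cost J = (1/m)‖Xθ - y‖², and its gradient (2/m)Xᵀ(Xθ - y). A NumPy sketch of vectorized gradient descent under those definitions; the data, learning rate 0.1, and iteration count are hypothetical choices.

```python
import numpy as np

def add_bias(X):
    """Prepend a column of ones so theta[0] acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def cost(theta, Xb, y):
    """J(theta) = (1/m) * ||Xb @ theta - y||^2"""
    r = Xb @ theta - y
    return r @ r / len(y)

def gradient(theta, Xb, y):
    """Gradient of J: (2/m) * Xb^T (Xb @ theta - y)"""
    return 2 / len(y) * Xb.T @ (Xb @ theta - y)

# Hypothetical data: 2 features, targets generated from known parameters
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_theta = np.array([1.0, 2.0, -3.0])   # intercept, weight for x1, weight for x2
Xb = add_bias(X)
y = Xb @ true_theta

theta = np.zeros(3)
for _ in range(2000):
    theta -= 0.1 * gradient(theta, Xb, y)  # one vectorized update of all parameters

print(theta)                 # close to [1, 2, -3]
print(cost(theta, Xb, y))    # close to 0
```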
Predicting Boston house prices
Boston house price data
![](https://velog.velcdn.com/images/tim0902/post/6b0ebd1d-7c9f-498b-afab-1806de473770/image.png)
Loading the data
![](https://velog.velcdn.com/images/tim0902/post/1eb35290-8df8-4845-a70d-79910323b07f/image.png)
What each feature means
![](https://velog.velcdn.com/images/tim0902/post/42c19e5c-7673-4ce1-a92e-7d365f004d58/image.png)
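Organizing the data with pandas to get an overview
- The PRICE column is the label (the value to predict), so it must be handled carefully in later steps, e.g. kept separate from the input features.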
![](https://velog.velcdn.com/images/tim0902/post/ca5197ac-aba1-45a7-8552-8d41c273d7f7/image.png)
Histogram of PRICE
![](https://velog.velcdn.com/images/tim0902/post/94cb47ac-5628-494b-b352-2d1944f5e003/image.png)
Histogram of house prices
![](https://velog.velcdn.com/images/tim0902/post/0390fbd5-e48b-47f9-a2a3-3805dc0dad4c/image.png)
Checking the correlation coefficients between features
![](https://velog.velcdn.com/images/tim0902/post/93c0f001-f55a-4cba-bfde-97ed9bf507e5/image.png)
![](https://velog.velcdn.com/images/tim0902/post/bb337c5d-05d7-4a91-b862-8e5c4922de14/image.png)
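Each entry of a correlation table like the one above is typically a Pearson correlation coefficient (if the table came from pandas' `DataFrame.corr()`, Pearson is its default method). A sketch that computes it from the definition and checks it against `np.corrcoef`; the arrays are hypothetical stand-ins, not Boston columns.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da, db = a - a.mean(), b - b.mean()
    return (da @ db) / np.sqrt((da @ da) * (db @ db))

# Hypothetical columns (not Boston data)
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
price = [12.0, 18.0, 21.0, 30.0, 33.0]
low_status = [30.0, 22.0, 15.0, 9.0, 5.0]

print(pearson(rooms, price))            # positive: they rise together
print(pearson(low_status, price))       # negative: one rises as the other falls
print(np.corrcoef(rooms, price)[0, 1])  # same value as pearson(rooms, price)
```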
A closer look at the relationships between RM, LSTAT, and PRICE
![](https://velog.velcdn.com/images/tim0902/post/b6eb4ef0-5f51-4ccc-81d2-4d3afcdf6003/image.png)
Are house prices higher where the share of low-income residents is lower and the number of rooms is larger?
![](https://velog.velcdn.com/images/tim0902/post/a3b4f20c-c0c8-4219-8c95-29e94841b64c/image.png)
Splitting the data
![](https://velog.velcdn.com/images/tim0902/post/5a03ba64-e105-4ecf-9e6f-61efc0544670/image.png)
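Splitting holds out part of the data so performance is measured on examples the model has not seen during fitting. A common tool for this is scikit-learn's `train_test_split`; the sketch below shows the same idea with NumPy only. The 20% test fraction, the seed, and the helper name `split` are arbitrary choices for illustration.

```python
import numpy as np

def split(X, y, test_ratio=0.2, seed=13):
    """Shuffle row indices, then hold out test_ratio of the rows as a test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(len(y) * test_ratio))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    # The same indices select features and labels, so rows stay aligned
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Hypothetical data: 10 rows, 2 features
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)

X_train, X_test, y_train, y_test = split(X, y)
print(X_train.shape, X_test.shape)   # (8, 2) (2, 2)
```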
Fitting with LinearRegression
![](https://velog.velcdn.com/images/tim0902/post/2ab6a4e1-3952-47d5-9a0b-ab2005edc5cd/image.png)
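scikit-learn's `LinearRegression` fits ordinary least squares and includes an intercept by default (`fit_intercept=True`). The sketch below performs the same kind of least-squares fit with `np.linalg.lstsq` on a design matrix with a column of ones; the data is hypothetical and noise-free so the recovered coefficients can be checked.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with intercept; returns (intercept, coefficients)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # column of ones for the intercept
    theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return theta[0], theta[1:]

def predict(intercept, coef, X):
    return intercept + X @ coef

# Hypothetical data generated as y = 1 + 2*x1 - 3*x2
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]

intercept, coef = fit_ols(X, y)
print(intercept, coef)    # approximately 1.0 and [ 2. -3.]
```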
Evaluating the model with RMSE (root mean squared error)
![](https://velog.velcdn.com/images/tim0902/post/1434c54f-851f-4d80-829a-9130a985f132/image.png)
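RMSE is the square root of the mean squared error, so it is expressed in the same units as the target (here, the price). With scikit-learn it is often computed as the square root of `sklearn.metrics.mean_squared_error`. A minimal plain-Python sketch with hypothetical values:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the mean of squared differences."""
    sq = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical true prices and predictions
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
print(rmse(y_true, y_pred))   # sqrt(5/3) ≈ 1.291
```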
Checking performance
![](https://velog.velcdn.com/images/tim0902/post/d516b74c-fe5d-4ea6-93d6-ef1d476316d2/image.png)
Results
![](https://velog.velcdn.com/images/tim0902/post/398f19b2-86e6-4e2f-918d-b233d150d2d6/image.png)
Accuracy when using LSTAT
![](https://velog.velcdn.com/images/tim0902/post/76269bcc-789d-4b1d-8889-3400cd361510/image.png)
Performance degradation
![](https://velog.velcdn.com/images/tim0902/post/ed203261-e1ee-45f4-9807-46110ccf95dc/image.png)
![](https://velog.velcdn.com/images/tim0902/post/cadffefa-70a3-492c-8581-f957fd40d2dd/image.png)
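One way to see why removing an informative feature can hurt: when the target depends on two features, a model that sees only one of them cannot explain the rest, and its RMSE goes up. The data here is hypothetical (noise-free, generated from two features) and RMSE is measured on the training data for simplicity, so this illustrates the mechanism rather than reproducing the Boston result.

```python
import numpy as np

def fit_and_rmse(X, y):
    """Least-squares fit with intercept, returning the training RMSE."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    residuals = Xb @ theta - y
    return float(np.sqrt(np.mean(residuals ** 2)))

# Hypothetical data: the target depends on both features
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]

rmse_both = fit_and_rmse(X, y)          # uses both features
rmse_one = fit_and_rmse(X[:, :1], y)    # drops the second feature
print(rmse_both, rmse_one)              # near 0 vs. clearly larger
```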