Series

[Coursera] How to win a data science competition

1. [Coursera] How to win a data science competition - Week 1, Lecture 1

(1) Data (2) Model (3) Submission (4) Evaluation (5) Leaderboard. Data is what the organizers give us as training material: csv, txt, archive with pictures, d

July 5, 2021

2. [Coursera] How to win a data science competition - Week 1, Lecture 2

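Finding a line that divides the space into two parts. In two dimensions, finding a line that separates the two parts is very intuitive. This approach generalizes to higher-dimensional spaces, which is the main idea of linear models. Example: Logistic Regression, Support Vector Ma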

July 6, 2021

3. [Coursera] How to win a data science competition - Week 1, Lecture 3

Feature Preprocessing: preprocessing the data. Feature Generation: creating features. Their dependence on a model type: both preprocessing and feature generation depend on the model being used. numeric, categorical, ordinal, dat

July 6, 2021

4. [Coursera] How to win a data science competition - Week 2, Lecture 1

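(1) Getting domain knowledge: We usually do not have domain knowledge about the given topic. Very deep knowledge is not required, but we do need to understand our goal, the data, and how people approach the problem and establish a baseline. Search Wikipedia, Google it. (2) C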

July 6, 2021

5. [Coursera] How to win a data science competition - Week 2, Lecture 2

(1) Private Leaderboard: Rankings can differ between the public and private leaderboards. 1) Competitors ignore validation and submit whatever scores best on the public leaderboard. 2) Competitors did not match the public and private data

July 7, 2021

6. [Coursera] How to win a data science competition - Week 2, Lecture 3

The result of unintended mistakes or accidents. (1) Leaks in time series: the split should be done on time. Even when split by time, features may contain information about the future. (2) Unexpe

July 7, 2021

7. [Coursera] How to win a data science competition - Week 3, Lecture 1

Text

July 8, 2021

8. [Coursera] How to win a data science competition - Week 3, Lecture 2

1) Why does it work? Label encoding gives a random order with no correlation with the target. Mean encoding helps to separate zeros from ones, reaching a better loss w

July 8, 2021

9. [Coursera] How to win a data science competition - Week 4, Lecture 1

1) Select the most influential parameters. 2) Understand how exactly they influence the training. 3) Tune them, either manually or automatically: Hyperopt 1)

July 9, 2021

10. [Coursera] How to win a data science competition - Week 4, Lecture 2

Compared with algorithms such as LightGBM and XGBoost, except for cases where LightGBM beats untuned CatBoost, CatBoost with tuned parameters outperforms all other libraries in terms of quality on the datasets. Problems (1) Categ

July 21, 2021

11. [Coursera] How to win a data science competition - Week 4, Lecture 3

1. Ensemble Method: combining different machine learning models to obtain stronger predictions. Methods range from simple averaging to various weighted averaging schemes. 2. Bagging: Means averaging slightly different versions of

July 26, 2021