[Ensemble] Theoretical Background

임혜림 · January 3, 2024

No Free Lunch Theorem

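No single algorithm can be concluded to be superior to every other algorithm in all situations.
The best algorithm has to be chosen by weighing the goal of the problem, the form of the data, and similar factors together.
Conclusion of the paper "Do we need hundreds of classifiers to solve real world classification problems?":
In experiments on 121 public datasets, the Random Forest and SVM families showed comparatively high classification performance.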

Experiments have shown that "models combined into an ensemble perform better than a single model used alone."

Goal: Reduce the error through constructing multiple learners to
- Reduce the variance: Bagging, Random Forests
- Reduce the bias: AdaBoost
- Both: Mixture of experts
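
The variance-reduction side of this goal can be illustrated with a small simulation. The sketch below is only a toy, not a reference implementation: the data-generating function, the noise level, the 1-nearest-neighbor base learner, and all function names are assumptions chosen for illustration. It trains on many independently drawn training sets and compares how much the prediction at one point fluctuates for a single learner versus a bagged average of learners fit on bootstrap resamples.

```python
import random
import statistics

def make_data(rng, n=30):
    """Noisy samples of y = x^2 on [0, 1] (a toy assumption)."""
    xs = [rng.random() for _ in range(n)]
    return [(x, x * x + rng.gauss(0, 0.3)) for x in xs]

def one_nn_predict(train, x0):
    """High-variance base learner: return the target of the nearest point."""
    return min(train, key=lambda p: abs(p[0] - x0))[1]

def bagged_predict(train, x0, rng, n_models=25):
    """Average 1-NN predictions over bootstrap resamples of the training set."""
    preds = []
    for _ in range(n_models):
        boot = [rng.choice(train) for _ in train]  # sample with replacement
        preds.append(one_nn_predict(boot, x0))
    return statistics.fmean(preds)

def compare_variance(n_trials=300, x0=0.5, seed=0):
    """Variance of the prediction at x0 across independently drawn training sets."""
    rng = random.Random(seed)
    single, bagged = [], []
    for _ in range(n_trials):
        train = make_data(rng)
        single.append(one_nn_predict(train, x0))
        bagged.append(bagged_predict(train, x0, rng))
    return statistics.variance(single), statistics.variance(bagged)

if __name__ == "__main__":
    var_single, var_bagged = compare_variance()
    print(f"single 1-NN variance: {var_single:.4f}")
    print(f"bagged 1-NN variance: {var_bagged:.4f}")
```

In this setup the bagged prediction should fluctuate less than the single learner's, because each bootstrap model may pick a different nearby point and the noise partly averages out. Bias is essentially unchanged here, which is why boosting-style methods are listed separately for bias reduction.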
Two key questions on the ensemble construction
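- Q1: How to generate the individual components of the ensemble system to achieve a sufficient degree of diversity? That is, how do we build models that differ from one another? This is the more important question.
- Q2: How to combine the outputs of the individual classifiers? That is, how do we combine them well?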
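
For Q2, two common combination rules are hard (majority) voting and soft voting, which averages class probabilities. The sketch below is a minimal illustration under assumptions: the function names, the probability values, and the class labels are all made up, and the notes themselves do not prescribe a particular rule.

```python
from collections import Counter

def hard_vote(labels):
    """Majority voting over predicted class labels."""
    return Counter(labels).most_common(1)[0][0]

def soft_vote(probas, weights=None):
    """Weighted average of per-class probabilities, then pick the top class."""
    if weights is None:
        weights = [1.0] * len(probas)
    total = sum(weights)
    classes = probas[0].keys()
    avg = {c: sum(w * p[c] for w, p in zip(weights, probas)) / total
           for c in classes}
    return max(avg, key=avg.get)

# Hypothetical outputs of three classifiers for one sample.
probas = [
    {"cat": 0.90, "dog": 0.10},
    {"cat": 0.40, "dog": 0.60},
    {"cat": 0.45, "dog": 0.55},
]
labels = [max(p, key=p.get) for p in probas]  # each model's own top class

print(hard_vote(labels))  # two of the three models say "dog"
print(soft_vote(probas))  # the averaged probability favors "cat"
```

The two rules disagree on this sample: hard voting counts heads, while soft voting lets one very confident model outweigh two marginal ones. Which behavior is preferable depends on how well calibrated the individual models are.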
Ensemble Diversity
- An ensemble gains nothing from combining a set of perfectly identical models.
- What is needed is a certain element of diversity, while each model still retains good individual performance.
- Independent (implicit): can be run in parallel, but run time is long. vs. Model guided (explicit): sequential only, but run time is short.
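
Why identical models bring no gain can be checked with a short calculation. The sketch below rests on an idealized assumption that is not in the notes: every model is correct with the same probability p, and the diverse models err independently of one another. Real ensemble members are correlated, so this is an upper-bound illustration rather than a prediction. The function names are hypothetical.

```python
from math import comb

def majority_vote_accuracy(n_models, p):
    """P(majority is correct) for n independent models, each correct w.p. p.

    Assumes n_models is odd so that ties cannot occur.
    """
    need = n_models // 2 + 1
    return sum(comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
               for k in range(need, n_models + 1))

def identical_models_accuracy(n_models, p):
    """n copies of one model always agree, so voting just returns its answer."""
    return p

if __name__ == "__main__":
    p = 0.7
    for n in (1, 5, 11, 21):
        print(n,
              round(identical_models_accuracy(n, p), 3),
              round(majority_vote_accuracy(n, p), 3))
```

Under these assumptions the accuracy of the identical ensemble stays at p however many copies are added, while the independent ensemble improves as n grows, provided p is above 0.5. The second bullet matters for the same reason: if p falls below 0.5, adding more independent voters makes the majority worse.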
