

train.head()
| | user_id | subscription_duration | recent_login_time | average_login_time | average_time_per_learning_session | monthly_active_learning_days | total_completed_courses | recent_learning_achievement | abandoned_learning_sessions | community_engagement_level | preferred_difficulty_level | subscription_type | customer_inquiry_history | payment_pattern | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b919c29d | 13 | 14 | 14.946163 | 8.427187 | 18 | 16 | 68.360455 | 3 | 4 | Low | Basic | 4 | 5 | 0 |
| 1 | a0a60abb | 16 | 18 | 18.453224 | 72.646087 | 16 | 13 | 97.567322 | 2 | 3 | Medium | Basic | 1 | 6 | 1 |
| 2 | b9f171ae | 22 | 1 | 16.195228 | 21.774492 | 13 | 14 | 94.358763 | 3 | 4 | Medium | Premium | 0 | 7 | 1 |
| 3 | 5dc0ba8b | 1 | 19 | 17.628656 | 42.659066 | 19 | 18 | 70.153228 | 0 | 3 | Low | Basic | 1 | 0 | 1 |
| 4 | 65c83654 | 4 | 5 | 21.390656 | 30.744287 | 19 | 10 | 81.917908 | 2 | 4 | Medium | Basic | 3 | 0 | 1 |
train.info()

10,000 rows, no null values; 2 columns are of object dtype and the rest are numeric.
train.target.value_counts()

6,199 users keep their subscription; 3,801 users cancel.
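
To read the class balance as proportions rather than raw counts, a minimal sketch follows. It rebuilds a stand-in `target` column from the two counts reported above instead of loading the real `train` frame. The encoding (1 = still subscribed, 0 = cancelled) is an assumption, since the notebook does not state it.

```python
import pandas as pd

# Stand-in for train["target"], rebuilt from the counts reported above.
# Assumption: 1 = still subscribed, 0 = cancelled (encoding not stated in the notebook).
target = pd.Series([1] * 6199 + [0] * 3801, name="target")

# normalize=True returns each class's share of the rows instead of its count.
ratio = target.value_counts(normalize=True)
print(ratio)
```

On the real data the same call would be `train.target.value_counts(normalize=True)`. A split of roughly 62% to 38% is only mildly imbalanced.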
columns = [
'subscription_duration', 'recent_login_time', 'average_login_time',
'average_time_per_learning_session', 'monthly_active_learning_days',
'total_completed_courses', 'recent_learning_achievement', 'abandoned_learning_sessions',
'community_engagement_level', 'customer_inquiry_history'
]
train[columns].describe()
| | subscription_duration | recent_login_time | average_login_time | average_time_per_learning_session | monthly_active_learning_days | total_completed_courses | recent_learning_achievement | abandoned_learning_sessions | community_engagement_level |
|---|---|---|---|---|---|---|---|---|---|
| count | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 |
| mean | 11.897400 | 15.013200 | 14.994076 | 54.917720 | 12.545400 | 12.227500 | 75.029513 | 3.043600 | 3.886100 |
| std | 6.600896 | 8.362573 | 3.001869 | 56.024310 | 6.932239 | 3.634125 | 9.968529 | 1.755052 | 1.262175 |
| min | 1.000000 | 1.000000 | 2.366189 | 0.011515 | 1.000000 | 1.000000 | 35.941755 | 0.000000 | 1.000000 |
| 25% | 6.000000 | 8.000000 | 13.025597 | 15.276611 | 7.000000 | 10.000000 | 68.278054 | 2.000000 | 3.000000 |
| 50% | 12.000000 | 15.000000 | 14.979228 | 37.578818 | 13.000000 | 12.000000 | 75.126061 | 3.000000 | 4.000000 |
| 75% | 18.000000 | 22.000000 | 16.995340 | 75.584200 | 19.000000 | 15.000000 | 81.718976 | 4.000000 | 5.000000 |
| max | 23.000000 | 29.000000 | 26.998490 | 503.372616 | 24.000000 | 27.000000 | 112.643828 | 12.000000 | 5.000000 |
Example: average_time_per_learning_session
Mean: 54.92, median: 37.58. The mean sits well above the median, so the distribution is expected to be right-skewed (long right tail).
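
The mean-versus-median comparison can be backed up with a sample skewness value. The sketch below is illustrative only: `session_time` is a synthetic log-normal sample standing in for `train["average_time_per_learning_session"]`, and its parameters are assumptions chosen to give a similar right-skewed shape, not values fitted to the data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for train["average_time_per_learning_session"].
# Assumption: a log-normal sample, used only because it is right-skewed.
rng = np.random.default_rng(0)
session_time = pd.Series(rng.lognormal(mean=3.6, sigma=1.0, size=10_000))

summary = {
    "mean": session_time.mean(),
    "median": session_time.median(),
    "skew": session_time.skew(),  # positive value means a longer right tail
}
print(summary)
```

On the real column, `train["average_time_per_learning_session"].skew()` gives the same diagnostic. A clearly positive value is consistent with the long right tail inferred from the summary table.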
Solution: