빅데이터분석기사 (Big Data Analysis Engineer) Practical Exam: Example Problem Walkthrough

hyereen · November 18, 2021

Today I Learned


Example problem source: Korea Data Agency (한국데이터산업진흥원) announcements, https://www.dataq.or.kr/www/board/view.do

Task Type 1

Apply min-max scaling to the dataset's qsec column, then count the records whose scaled value is greater than 0.5.

  • Code
# Read the data file
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Load the data
df = pd.read_csv('./mtcars.csv', index_col=0)

# Min-max scaling
scaler = MinMaxScaler()
df[['qsec']] = scaler.fit_transform(df[['qsec']])

# Answer
print(len(df[df['qsec'] > 0.5]))

  • Result
9
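The same count can be reproduced without sklearn, since min-max scaling is just (x - min) / (max - min). A minimal sketch on made-up numbers (hypothetical values, not the real mtcars qsec column):

```python
import pandas as pd

# Hypothetical stand-in values for a qsec-like column
qsec = pd.Series([14.5, 16.4, 17.0, 18.6, 20.0, 22.9])

# Min-max scaling by hand: (x - min) / (max - min)
scaled = (qsec - qsec.min()) / (qsec.max() - qsec.min())

# Count the records above 0.5, as in the sklearn-based answer
count = (scaled > 0.5).sum()
print(count)  # prints 2
```

After scaling, the minimum maps to 0 and the maximum to 1, so the threshold 0.5 splits the range, not the original units.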

Task Type 2

  • Code
# Read the data files
import pandas as pd
x_test = pd.read_csv("./X_test.csv", encoding="cp949")
x_train = pd.read_csv("./X_train.csv", encoding="cp949")
y_train = pd.read_csv("./y_train.csv", encoding="cp949")

# Explore the data
#print(x_test.head())
#print(x_train.head())
#print(y_train.head())
#print(x_test.describe())
#print(x_train.describe())

# Count missing values
# print(x_train.isnull().sum())
# print(x_test.isnull().sum())

# Fill missing values with 0
x_train.fillna(0, inplace=True)
x_test.fillna(0, inplace=True)

#print(x_train.isnull().sum())
#print(x_test.isnull().sum())

# One-hot encode 주구매상품 and 주구매지점
item = pd.get_dummies(x_train['주구매상품'], prefix='주구매상품')
store = pd.get_dummies(x_train['주구매지점'], prefix='주구매지점')
x_train = pd.concat([x_train, item, store], axis=1)
x_train.drop(['주구매상품', '주구매지점'], axis=1, inplace=True)

item = pd.get_dummies(x_test['주구매상품'], prefix='주구매상품')
store = pd.get_dummies(x_test['주구매지점'], prefix='주구매지점')
x_test = pd.concat([x_test, item, store], axis=1)
x_test.drop(['주구매상품', '주구매지점'], axis=1, inplace=True)

# Drop '주구매상품_소형가전', which appears only in train,
# so that train and test end up with the same columns
x_train.drop(['주구매상품_소형가전'], axis=1, inplace=True)
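Dropping the train-only column by hand works, but it relies on knowing in advance which categories differ between the two files. A more general way is to reindex the test dummies to the train columns, which zero-fills train-only categories and discards test-only ones. A sketch on toy data, not the exam files:

```python
import pandas as pd

# Toy train/test frames with mismatched categories (hypothetical data)
train = pd.DataFrame({'주구매상품': ['기타', '소형가전', '농산물']})
test = pd.DataFrame({'주구매상품': ['기타', '가공식품']})

train_d = pd.get_dummies(train['주구매상품'], prefix='주구매상품')
test_d = pd.get_dummies(test['주구매상품'], prefix='주구매상품')

# Align test to train's columns: categories unseen in test become
# all-zero columns, test-only categories are dropped
test_d = test_d.reindex(columns=train_d.columns, fill_value=0)

print(list(test_d.columns))
```

With this approach no category needs to be deleted from the training data by name.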

# Standardize the features. Note: the scaled arrays x_train_sc and
# x_test_sc are never used below; every model is fit on the unscaled x_train.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit(x_train)
x_train_sc = sc.transform(x_train)
x_test_sc = sc.transform(x_test)
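For reference, StandardScaler shifts each column to mean 0 and rescales it to unit variance; a small sketch on toy numbers (not the exam data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy 2-feature matrix (hypothetical values)
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

sc = StandardScaler()
X_sc = sc.fit_transform(X)

# Each column now has mean 0 and unit (population) standard deviation
print(X_sc.mean(axis=0))  # ~[0. 0.]
print(X_sc.std(axis=0))   # ~[1. 1.]
```

Distance-based models such as KNN are sensitive to feature scale, so they would normally be fit on the scaled arrays rather than the raw ones.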

from sklearn.linear_model import LogisticRegression

# Fit on the target column only; y_train also contains cust_id,
# so passing the whole DataFrame as y would fail
model = LogisticRegression()
model.fit(x_train, y_train['gender'])
print('Logistic score: ', model.score(x_train, y_train['gender']))

# KNN
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=4, metric='euclidean')
model.fit(x_train, y_train['gender'])
print('KNN score: ', model.score(x_train, y_train['gender']))

# XGBoost
from xgboost import XGBClassifier
model = XGBClassifier()
model.fit(x_train, y_train['gender'])
print('XGB score:', model.score(x_train, y_train['gender']))

# Decision tree
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(random_state=1, max_depth=10)
model.fit(x_train, y_train['gender'])
print('DTree score: ', model.score(x_train, y_train['gender']))

# Random forest
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(max_depth=10, n_estimators=100)
model.fit(x_train, y_train['gender'])
print('RF score: ', model.score(x_train, y_train['gender']))

# Predict with the last model fit (the random forest) and write the answer.
# predict_proba(...)[:, 1] is the probability of the positive class (gender = 1)
predict = model.predict_proba(x_test)
output = pd.DataFrame({'cust_id': x_test['cust_id'], 'gender': predict[:, 1]})
output.to_csv('1234.csv', index=False)
  • Result
Logistic score:  0.6237142857142857
KNN score:  0.624
XGB score: 0.7114285714285714
DTree score:  0.7222857142857143
RF score:  0.7605714285714286
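The scores above are training-set accuracy, which is an optimistic measure; this exam task is scored by ROC-AUC on held-out data. A sketch of checking ROC-AUC on a validation split, using synthetic stand-in data rather than the exam files:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the preprocessed features and binary target
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Hold out 20% of the rows for validation
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=1)

model = RandomForestClassifier(max_depth=10, n_estimators=100, random_state=1)
model.fit(X_tr, y_tr)

# ROC-AUC is computed from predicted probabilities of the positive class,
# not from hard labels
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print('validation AUC:', auc)
```

Comparing validation AUC across the candidate models is a sounder basis for picking one than the training accuracies printed above.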

Sources
https://5ohyun.tistory.com/108
https://hobby-weighted.tistory.com/156
https://blog.naver.com/PostView.naver?blogId=da0097&logNo=222390408292&categoryNo=0&parentCategoryNo=0

https://yogyui.tistory.com/entry/%EB%B9%85%EB%8D%B0%EC%9D%B4%ED%84%B0%EB%B6%84%EC%84%9D%EA%B8%B0%EC%82%AC-%EC%8B%A4%EA%B8%B0-%EC%98%88%EC%8B%9C%EB%AC%B8%EC%A0%9C-%EC%9C%A0%ED%98%95-%EB%B6%84%EC%84%9D?category=972551
