# Logistic Regression

jesy0412 · June 18, 2021

# 1. Data Exploration

Before applying a logistic regression classifier to R's built-in `iris` dataset, we first explore the basic structure of the data.
The target variable is `Species`, recoded into two classes: `setosa` and `non setosa`.

```r
# Collapse the three species into a binary target: setosa vs. non setosa
iris$Species <- as.character(iris$Species)
iris$Species[iris$Species != "setosa"] <- "non setosa"
iris$Species <- as.factor(iris$Species)
```
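With `family = binomial`, `glm()` treats the first factor level as failure (0) and every other level as success (1). Because `as.factor()` orders levels alphabetically, `"non setosa"` comes first, so the model estimates the probability of `setosa`. A minimal check, rebuilding the recoded data on a copy under the hypothetical name `iris_bin`:

```r
# Rebuild the binary target on a copy (iris_bin is a hypothetical name)
iris_bin <- iris
iris_bin$Species <- factor(ifelse(iris_bin$Species == "setosa", "setosa", "non setosa"))

# Level order decides which class glm() models as "success"
print(levels(iris_bin$Species))   # "non setosa" "setosa"
print(table(iris_bin$Species))    # 100 non setosa, 50 setosa
```

If you would rather model `non setosa`, `relevel(iris_bin$Species, ref = "setosa")` moves `setosa` to the first (failure) position.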

# 2. Fitting the Logistic Regression Model and Computing the Error Rate

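```r
# Reproducible split: 100 rows for training, the remaining 50 for testing
set.seed(150)
train_sample = sample(150, 100)
str(train_sample)

iris_train = iris[train_sample, ]
iris_test  = iris[-train_sample, ]

# Check that both sets keep similar class proportions
prop.table(table(iris_train$Species))
prop.table(table(iris_test$Species))
```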
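With only 150 rows, a plain random split can leave the training and test sets with noticeably different setosa shares. One option, sketched here with a hypothetical helper `stratified_index()`, is to sample the same fraction from each class so the proportions match by construction (the 2/3 fraction is an assumption chosen to give 100 training rows):

```r
# Binary target on a copy of iris (iris_bin is a hypothetical name)
iris_bin <- iris
iris_bin$Species <- factor(ifelse(iris_bin$Species == "setosa", "setosa", "non setosa"))

# Hypothetical helper: draw round(frac * n_k) row indices from each class k
stratified_index <- function(y, frac) {
  by_class <- split(seq_along(y), y)
  picked <- lapply(by_class, function(i) i[sample.int(length(i), round(frac * length(i)))])
  unlist(picked, use.names = FALSE)
}

set.seed(150)
train_idx   <- stratified_index(iris_bin$Species, 2/3)
strat_train <- iris_bin[train_idx, ]
strat_test  <- iris_bin[-train_idx, ]

# setosa share: 33/100 in train, 17/50 in test
prop.table(table(strat_train$Species))
prop.table(table(strat_test$Species))
```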

```r
# Fit the logistic regression model on all four predictors
iris_model_logistic = glm(Species ~ ., family = binomial, data = iris_train)
summary(iris_model_logistic)
```
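Because setosa is perfectly separable from the other two species, `glm()` on this data will likely warn that fitted probabilities numerically 0 or 1 occurred (and possibly that the algorithm did not converge), and `summary()` reports very large coefficients and standard errors. Under perfect separation the likelihood keeps increasing as the coefficients grow, so no finite maximum exists. A quick base-R check that petal length alone already separates the classes:

```r
# Binary label for every row of the iris data
is_setosa <- iris$Species == "setosa"

# Largest setosa petal length vs. smallest non-setosa petal length
max_setosa <- max(iris$Petal.Length[is_setosa])
min_other  <- min(iris$Petal.Length[!is_setosa])

# TRUE means one threshold on Petal.Length classifies every row correctly
separable <- max_setosa < min_other
print(c(max_setosa = max_setosa, min_other = min_other))
print(separable)   # TRUE
```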
```r
# Backward stepwise selection by AIC (step()'s behavior when no scope is given)
iris_model_logistic_step = step(iris_model_logistic)
summary(iris_model_logistic_step)
```
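`step()` removes terms one at a time for as long as a removal lowers the AIC, AIC = -2 log L + 2k, where L is the maximized likelihood and k the number of estimated coefficients. The sketch below checks that identity on a deliberately small model; the single predictor `Sepal.Width` is an assumption, chosen only because it does not separate the classes, so the fit converges cleanly:

```r
# Binary target on a copy of iris (iris_bin is a hypothetical name)
iris_bin <- iris
iris_bin$Species <- factor(ifelse(iris_bin$Species == "setosa", "setosa", "non setosa"))

# Small, non-separable model used only for illustration
fit <- glm(Species ~ Sepal.Width, family = binomial, data = iris_bin)

ll <- logLik(fit)
k  <- attr(ll, "df")                    # number of estimated coefficients (2 here)
aic_manual <- -2 * as.numeric(ll) + 2 * k

# All three values should agree
print(c(manual = aic_manual, AIC = AIC(fit), glm = fit$aic))
```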

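```r
# Predicted probability of "setosa" (the second factor level) for each test row
iris_logistic_pred = predict(iris_model_logistic, iris_test, type = "response")

# The same probabilities by hand: inverse logit of the linear predictor (log-odds)
iris_logistic_tmp = predict(iris_model_logistic, iris_test)
exp(iris_logistic_tmp) / (1 + exp(iris_logistic_tmp))
```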
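`type = "response"` applies the inverse logit p = exp(η) / (1 + exp(η)) to the linear predictor η that the default `type = "link"` returns. Base R's `plogis()` computes the same function and, unlike the `exp()` form, does not overflow for very large η, which can occur when the classes are perfectly separated. A minimal sketch on an illustrative non-separable model (`Sepal.Width` only, an assumption):

```r
# Binary target on a copy of iris (iris_bin is a hypothetical name)
iris_bin <- iris
iris_bin$Species <- factor(ifelse(iris_bin$Species == "setosa", "setosa", "non setosa"))
fit <- glm(Species ~ Sepal.Width, family = binomial, data = iris_bin)

eta  <- predict(fit, type = "link")       # log-odds
prob <- predict(fit, type = "response")   # probabilities

# Both manual inverse-logit forms reproduce type = "response"
manual_exp    <- exp(eta) / (1 + exp(eta))
manual_plogis <- plogis(eta)
print(all.equal(manual_exp, prob))
print(all.equal(manual_plogis, prob))

# For very large log-odds the exp() form overflows, plogis() does not
print(exp(800) / (1 + exp(800)))   # NaN
print(plogis(800))                 # 1
```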

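```r
# Classify with a 0.5 cutoff, using class names so labels match iris_test$Species
iris_logistic_pred = ifelse(iris_logistic_pred > 0.5, "setosa", "non setosa")
```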
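Once the probabilities are turned into class labels, the test error rate is simply the share of rows whose predicted class differs from the actual one, that is, 1 - accuracy. A minimal base-R sketch with a hypothetical helper `error_rate()` and toy vectors (illustrative labels, not results from the iris model):

```r
# Hypothetical helper: fraction of mismatched predictions
error_rate <- function(actual, predicted) mean(actual != predicted)

# Toy labels for illustration only
actual    <- c("setosa", "setosa", "non setosa", "non setosa")
predicted <- c("setosa", "non setosa", "non setosa", "non setosa")

# Confusion matrix: rows = actual, columns = predicted
print(table(actual = actual, predicted = predicted))

err <- error_rate(actual, predicted)   # 1 of 4 wrong -> 0.25
acc <- 1 - err                         # 0.75
print(c(accuracy = acc, error_rate = err))
```

The same helper accepts a factor and a character vector of equal length, since `!=` on a factor compares its labels.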

```r
# Test error: confusion matrix of actual vs. predicted species
library(gmodels)  # provides CrossTable()
CrossTable(iris_test$Species, iris_logistic_pred,
           prop.chisq = FALSE, prop.c = FALSE, prop.r = FALSE,
           dnn = c('actual Species', 'predicted Species'))
# accuracy: 1.0, error rate: 0
```

Running logistic regression classifies every test observation correctly: accuracy = 1.0 and error rate = 0.

# 3. ROC, AUC

```r
library(ROCR)

# iris_logistic_pred now holds class labels, so recompute the predicted probabilities
iris_logistic_pred = predict(iris_model_logistic, iris_test, type = "response")
pred = prediction(iris_logistic_pred, iris_test$Species)

# ROC curve: true positive rate against false positive rate
irislg_roc = performance(pred, measure = 'tpr', x.measure = 'fpr')
plot(irislg_roc, col = 'red', lty = 1, lwd = 3, main = 'ROC curve')

# Area under the ROC curve
iris_auc <- performance(pred, measure = "auc")
unlist(iris_auc@y.values)
```
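`performance(pred, 'tpr', 'fpr')` traces the ROC curve by sweeping the decision threshold from high to low and recording the true and false positive rates at each step. A minimal base-R sketch of that sweep, using a hypothetical helper `roc_points()` and toy scores in which every positive outranks every negative:

```r
# Hypothetical helper: TPR/FPR with each distinct score used as a threshold
roc_points <- function(scores, labels, positive) {
  is_pos <- labels == positive
  thresholds <- c(Inf, sort(unique(scores), decreasing = TRUE))
  tpr <- sapply(thresholds, function(t) sum(scores >= t & is_pos) / sum(is_pos))
  fpr <- sapply(thresholds, function(t) sum(scores >= t & !is_pos) / sum(!is_pos))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Toy scores (illustration only): both positives score above both negatives
scores <- c(0.9, 0.8, 0.3, 0.1)
labels <- c("setosa", "setosa", "non setosa", "non setosa")

roc <- roc_points(scores, labels, positive = "setosa")
print(roc)

# Trapezoidal area under the traced curve
auc_trapezoid <- sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
print(auc_trapezoid)   # 1
```

With perfectly ranked scores, TPR reaches 1 before FPR leaves 0, which produces the right-angled curve.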

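The logistic regression model's error rate is 0, the ROC curve plotted above is right-angled (it climbs straight to a true positive rate of 1 while the false positive rate is still 0), and AUC = 1. This perfect score is expected here, since setosa is linearly separable from versicolor and virginica.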
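AUC also has a ranking interpretation: it equals the probability that a randomly chosen positive (setosa) receives a higher score than a randomly chosen negative, with ties counted as one half. When every setosa probability exceeds every non-setosa probability, that probability is 1. A small sketch with a hypothetical helper `auc_rank()` and toy scores:

```r
# Hypothetical helper: AUC as P(score of a positive > score of a negative)
auc_rank <- function(scores, labels, positive) {
  pos <- scores[labels == positive]
  neg <- scores[labels != positive]
  diffs <- outer(pos, neg, "-")            # every positive vs. every negative
  mean((diffs > 0) + 0.5 * (diffs == 0))   # ties count as half
}

labels <- c("setosa", "setosa", "non setosa", "non setosa")

# Perfect ranking (toy scores): AUC = 1
print(auc_rank(c(0.9, 0.8, 0.3, 0.1), labels, "setosa"))

# One positive ranked below one negative: 3 of 4 pairs correct, AUC = 0.75
print(auc_rank(c(0.9, 0.2, 0.3, 0.1), labels, "setosa"))
```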
