Logistic Regression

Jesy · June 18, 2021

1. Data exploration

Before applying a Logistic Regression classifier to R's built-in 'Iris' data, we explore the basic structure of the data.
The target variable is Species, recoded into two classes: setosa and non setosa.

# Collapse the three species into a binary target: "setosa" vs. "non setosa"
iris$Species <- as.character(iris$Species)
iris$Species[iris$Species != "setosa"] <- "non setosa"
iris$Species <- as.factor(iris$Species)
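
The exploration step itself is not shown in the post. A minimal sketch, assuming only base R and the built-in iris data, might look like the following. It works on a copy (iris_bin, a name introduced here for illustration) so the original data frame stays untouched.

```r
# Work on a copy; `iris_bin` is a hypothetical name used only in this sketch
iris_bin <- iris
iris_bin$Species <- factor(ifelse(iris_bin$Species == "setosa",
                                  "setosa", "non setosa"))

str(iris_bin)            # 150 observations: 4 numeric predictors + the target
summary(iris_bin)        # ranges and quartiles of each predictor
table(iris_bin$Species)  # class counts: 100 non setosa, 50 setosa

# Mean of each predictor by class, to see which variables separate the groups
aggregate(. ~ Species, data = iris_bin, FUN = mean)
```

The classes are unbalanced 2:1, which is worth keeping in mind when reading the class proportions of the train/test split later on.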

2. Fitting the Logistic Regression model and error rate

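set.seed(150)                       # fix the seed so the split is reproducible
train_sample = sample(150, 100)     # 100 random row indices for training
str(train_sample)

iris_train = iris[train_sample, ]   # 100 training rows
iris_test  = iris[-train_sample, ]  # remaining 50 test rows

# Check that the class proportions are similar in both sets
prop.table(table(iris_train$Species))
prop.table(table(iris_test$Species))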


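# Fit the Logistic Regression model
# With a factor response, glm() models the probability of the second level ("setosa")

iris_model_logistic = glm(Species~., family=binomial, data=iris_train)
summary(iris_model_logistic)
# Stepwise variable selection by AIC
iris_model_logistic_step = step(iris_model_logistic)
summary(iris_model_logistic_step)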


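# Predicted probabilities of "setosa" on the test set
iris_logistic_pred = predict(iris_model_logistic, iris_test,type="response")

# The same probabilities by hand: inverse logit of the linear predictor
iris_logistic_tmp = predict(iris_model_logistic, iris_test)
exp(iris_logistic_tmp)/(1+exp(iris_logistic_tmp))

# Classify with a 0.5 cutoff, using the same labels as Species
iris_logistic_pred = ifelse(iris_logistic_pred>0.5,"setosa","non setosa")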
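
The expression exp(x)/(1+exp(x)) used above is the inverse logit applied to the linear predictor. As a small check, assuming the same seed and split as in the post, base R's plogis() should agree with type = "response". The names iris_bin, train_idx, train, test and fit are introduced here for illustration only.

```r
# Rebuild the binary target and the train/test split used in the post
iris_bin <- iris
iris_bin$Species <- factor(ifelse(iris_bin$Species == "setosa",
                                  "setosa", "non setosa"))
set.seed(150)
train_idx <- sample(150, 100)
train <- iris_bin[train_idx, ]
test  <- iris_bin[-train_idx, ]

# Separable classes make glm() emit warnings; silenced only for this sketch
fit <- suppressWarnings(glm(Species ~ ., family = binomial, data = train))

eta    <- predict(fit, test)                     # linear predictor (log-odds)
p_resp <- predict(fit, test, type = "response")  # probabilities from predict()
p_man  <- plogis(eta)                            # inverse logit, 1 / (1 + exp(-eta))

# Compare the two up to numerical tolerance
all.equal(unname(p_resp), unname(p_man))
```

plogis() is also the safer choice for extreme log-odds, where exp(x) in the hand-written formula can overflow.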

# Compute the test error
library(gmodels)  # provides CrossTable()
CrossTable(iris_test$Species, iris_logistic_pred,
           prop.chisq = FALSE, prop.c = FALSE, prop.r = FALSE,
           dnn = c('actual Species', 'predicted Species'))

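# accuracy: 1.0, error rate: 0


The Logistic Regression model classifies the test set with accuracy = 1.0 and error rate = 0. This is expected, since setosa is linearly separable from the other two species in the iris measurements. For the same reason glm() typically warns that fitted probabilities are numerically 0 or 1, so the coefficient estimates and standard errors shown by summary() should not be interpreted.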
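
Accuracy and error rate can also be computed directly, without reading them off the cross table. The following is a sketch in base R under the same seed and split as the post; the names iris_bin, train_idx, train, test, fit, pred_class and conf_mat are introduced here for illustration.

```r
# Rebuild the binary target and the train/test split used in the post
iris_bin <- iris
iris_bin$Species <- factor(ifelse(iris_bin$Species == "setosa",
                                  "setosa", "non setosa"))
set.seed(150)
train_idx <- sample(150, 100)
train <- iris_bin[train_idx, ]
test  <- iris_bin[-train_idx, ]

# Separable classes make glm() emit warnings; silenced only for this sketch
fit <- suppressWarnings(glm(Species ~ ., family = binomial, data = train))

# Probability of "setosa", then a 0.5 cutoff
p <- predict(fit, test, type = "response")
pred_class <- ifelse(p > 0.5, "setosa", "non setosa")

# Confusion matrix: rows are actual classes, columns are predicted classes
conf_mat <- table(actual = test$Species, predicted = pred_class)
print(conf_mat)

# Share of correct predictions and its complement (the post reports 1.0 and 0)
accuracy   <- mean(pred_class == as.character(test$Species))
error_rate <- 1 - accuracy
c(accuracy = accuracy, error_rate = error_rate)
```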

3. ROC, AUC

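library(ROCR)

# Predicted probabilities of "setosa" on the test set
iris_logistic_pred = predict(iris_model_logistic, iris_test,type="response")
# ROCR treats the second factor level ("setosa") as the positive class
pred = prediction(iris_logistic_pred, iris_test$Species)
# ROC curve: true positive rate against false positive rate
irislg_roc = performance(pred, measure = 'tpr', x.measure = 'fpr')
plot(irislg_roc, col='red', lty = 1, lwd = 3, main = 'ROC curve')
# Area under the ROC curve
iris_auc <- performance(pred, measure = "auc")
unlist(iris_auc@y.values)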


The Logistic Regression model has an error rate of 0 on the test set, so the ROC curve plotted by the code above is a right angle through the top-left corner, and AUC = 1.
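
As a cross-check that does not depend on ROCR, AUC can be computed from ranks: it equals the probability that a randomly chosen setosa flower receives a higher score than a randomly chosen non setosa flower. This is a sketch in base R with the same seed and split as the post; the names iris_bin, train_idx, train, test, fit, pos and auc are introduced here for illustration.

```r
# Rebuild the binary target and the train/test split used in the post
iris_bin <- iris
iris_bin$Species <- factor(ifelse(iris_bin$Species == "setosa",
                                  "setosa", "non setosa"))
set.seed(150)
train_idx <- sample(150, 100)
train <- iris_bin[train_idx, ]
test  <- iris_bin[-train_idx, ]

# Separable classes make glm() emit warnings; silenced only for this sketch
fit <- suppressWarnings(glm(Species ~ ., family = binomial, data = train))
p   <- predict(fit, test, type = "response")  # score: probability of "setosa"

# Rank-based (Mann-Whitney) AUC with "setosa" as the positive class
pos   <- test$Species == "setosa"
n_pos <- sum(pos)
n_neg <- sum(!pos)
r     <- rank(p)  # average ranks handle tied scores
auc   <- (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
auc
```

When every setosa score is higher than every non setosa score, the rank sum reaches its maximum and the formula returns exactly 1, which matches the right-angle ROC curve.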
