Regression Analysis
Linear Regression
Regression Assumptions
- No Multicollinearity
- Variance Inflation Factor (VIF) < 10
- Homoskedasticity
- Linearity: check the residual distribution (residuals vs. fitted values should show no pattern)
- Breusch-Pagan Test: if p-value > 0.05, fail to reject homoskedasticity (residual variance is constant).
- Correction: heteroskedasticity-robust (HC3) standard errors
sm.OLS(y, x).fit(cov_type="HC3")  # statsmodels OLS with robust covariance
- Normality of Error
- QQ Plot
- Normality tests (Kolmogorov-Smirnov, Shapiro-Wilk, Jarque-Bera, etc.): if p-value > 0.05, fail to reject normality of the residuals.
Non-Linear Regression
- Logistic Regression
- Probit Regression
Machine Learning
Decision Tree
Confusion Matrix
|              | Predicted (y=1)               | Predicted (y=0)                |
|--------------|-------------------------------|--------------------------------|
| Actual (y=1) | True Positive                 | False Negative (Type II Error) |
| Actual (y=0) | False Positive (Type I Error) | True Negative                  |
- Accuracy = (TP + TN) / Total
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- F1 Score = 2 x (Precision x Recall) / (Precision + Recall)
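These formulas can be checked against scikit-learn on a small hand-made label set (the example data is illustrative):

```python
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# For binary labels, ravel() yields counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```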
Random Forest
- Ensemble learning method: aggregates the predictions of many decision trees
- Data Preprocessing (Encoding, Categorizing, Normalizing, Scaling)
- Balancing Dataset (Up/Down Sampling)
- Defining Variables (Dependent/Independent)
- Modeling (Supervised Learning) & Cross Validation
- Evaluation (Accuracy Scores, Feature Importances)
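The workflow above, minus the encoding and balancing steps, can be sketched end-to-end with scikit-learn on a synthetic dataset (all names and parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic, mildly imbalanced dataset standing in for a preprocessed table
X, y = make_classification(n_samples=500, n_features=8,
                           weights=[0.7, 0.3], random_state=0)

# Defining variables: X (independent), y (dependent); stratified split
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

# Modeling (supervised learning) with 5-fold cross validation
clf = RandomForestClassifier(n_estimators=100, random_state=0)
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)

# Evaluation: held-out accuracy and feature importances
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
importances = clf.feature_importances_
```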
Neural Networks
MLPClassifier(activation='relu', hidden_layer_sizes=(10,), max_iter=100)
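As a runnable sketch (the dataset and pipeline are illustrative; MLPs are sensitive to feature scale, hence the scaler, and the iteration count is raised so training converges):

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# Scale inputs, then fit a single hidden layer of 10 ReLU units
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(activation='relu', hidden_layer_sizes=(10,),
                  max_iter=1000, random_state=0))
mlp.fit(X, y)
acc = mlp.score(X, y)   # training accuracy
```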
Support Vector Machine
- Linear SVM
SVC(kernel='linear')
- Non-linear SVM
- Kernels: Polynomial ('poly'), Gaussian / Radial Basis Function ('rbf'), Sigmoid ('sigmoid')
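Comparing the kernels on a dataset that is not linearly separable shows why non-linear kernels matter (the concentric-circles dataset is an illustrative choice):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: no straight line separates the two classes
X, y = make_circles(n_samples=300, factor=0.3, noise=0.1, random_state=0)

scores = {}
for kernel in ('linear', 'poly', 'rbf', 'sigmoid'):
    scores[kernel] = SVC(kernel=kernel).fit(X, y).score(X, y)
```

The RBF kernel implicitly maps the points into a space where the circles become separable, so it far outperforms the linear kernel here.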
Naive Bayes
GaussianNB()
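A minimal sketch on the Iris dataset (the dataset choice is illustrative; GaussianNB assumes each feature is normally distributed within each class):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nb = GaussianNB().fit(X_train, y_train)
acc = nb.score(X_test, y_test)
```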
K-Nearest Neighbor
KNeighborsClassifier(n_neighbors=10)
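The same Iris setup works for KNN (illustrative; since KNN relies on distances, scaling the features first is often advisable with mixed-unit data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Classify each point by majority vote among its 10 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=10).fit(X_train, y_train)
acc = knn.score(X_test, y_test)
```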