- Nominal -- categories have no inherent order (e.g. colors)
- Ordinal -- categories have a meaningful order (e.g. small < medium < large)
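The difference can be sketched in a few lines of plain Python. The `sizes` and `colors` lists and the `size_rank` mapping are made-up illustrations, not data from these notes.

```python
# Ordinal: the categories carry a rank, so sorting by that rank is meaningful.
sizes = ["medium", "small", "large"]
size_rank = {"small": 0, "medium": 1, "large": 2}  # hypothetical ordering
sorted_sizes = sorted(sizes, key=size_rank.get)

# Nominal: there is no rank; we can only ask whether two values are the same.
colors = ["red", "green", "red"]
distinct_colors = set(colors)
```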
One-Hot Encoding converts categorical (qualitative) data into numeric form by creating one 0/1 column per category.
Cardinality - the number of elements in a set; for a categorical variable, the number of distinct categories.
When One-Hot Encoding is performed, a new feature is added for every category within the variable. Therefore, if a variable has too many categories (high cardinality), the number of columns grows with it, and One-Hot Encoding becomes inappropriate for that variable.
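To make the link between cardinality and column count concrete, here is a minimal plain-Python sketch. The `one_hot` helper and the sample data are hypothetical and only illustrate the mechanism; they are not how any particular library implements it.

```python
def one_hot(values, name):
    """One-hot encode a list of categorical values into rows of 0/1 flags."""
    categories = sorted(set(values))  # distinct categories in a fixed order
    columns = [f"{name}_{c}" for c in categories]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return columns, rows

colors = ["red", "green", "blue", "green"]
columns, rows = one_hot(colors, "color")
cardinality = len(set(colors))  # 3 distinct categories, so 3 new columns

# A high-cardinality variable (e.g. an ID column) creates one column per value.
user_ids = [f"user_{i}" for i in range(1000)]
id_columns, _ = one_hot(user_ids, "id")
```

Each row contains exactly one `1`, and `len(id_columns)` equals the cardinality of `user_ids`, which is why such columns are usually poor candidates for One-Hot Encoding.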
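!pip install category_encoders  # install the package that provides OneHotEncoder
from category_encoders import OneHotEncoder
# use_cat_names=True names each new column after its category value
encoder = OneHotEncoder(use_cat_names=True)
X_train = encoder.fit_transform(X_train)  # learn the categories from the training set, then encode it
X_test = encoder.transform(X_test)        # encode the test set with the categories learned from training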
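The reason for calling `fit_transform` on the training set but only `transform` on the test set can be sketched with a tiny stand-in encoder. `TinyOneHot` is a hypothetical class written for this note, not part of `category_encoders`, and mapping unseen categories to an all-zero row is an assumption of the sketch.

```python
class TinyOneHot:
    """Hypothetical stand-in for a one-hot encoder with fit/transform."""

    def fit(self, values):
        # Learn the category list from the training data only.
        self.categories_ = sorted(set(values))
        return self

    def transform(self, values):
        # Reuse the learned categories; an unseen category becomes an
        # all-zero row (a choice made for this sketch).
        return [[1 if v == c else 0 for c in self.categories_] for v in values]

train = ["cat", "dog", "cat"]
test = ["dog", "bird"]  # "bird" never appears in the training data

encoder = TinyOneHot().fit(train)
train_encoded = encoder.transform(train)
test_encoded = encoder.transform(test)
```

Because the categories are fixed at fit time, the training and test sets end up with the same columns in the same order, even when the test set contains a category the training set did not.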
Feature Engineering is the process of constructing, from the existing data, new features that are better suited to the given project.
- Dimension Reduction - in the narrow sense, feature extraction; in the broader sense, it also covers feature selection.
- Feature Selection - selecting the subset of existing features best suited for use.
- Feature Extraction - not a mere selection, but the creation of new features by combining existing ones (e.g. PCA).
- Scaling - Transforming the values of a variable in order to reveal the correlation/relationship between variables (e.g. applying the square root to every value)
- Transformation - Using the characteristics of existing variables to create new variables.
- Binning - Transforming continuous variables into categorical variables.
- Dummy - Transforming categorical variables into numeric 0/1 indicator variables.
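A few of the techniques in this list can be sketched with the standard library alone. The `ages` data, the bin `edges`, and the `labels` are invented for illustration; in practice the boundaries would come from the project.

```python
import math
from bisect import bisect_right

ages = [3, 17, 25, 64, 80]  # hypothetical continuous variable

# Square-root example from the Scaling item: apply it to every value.
sqrt_ages = [math.sqrt(a) for a in ages]

# Binning: continuous -> categorical, using assumed bin boundaries.
edges = [18, 65]                       # ages < 18, 18 to 64, 65 and over
labels = ["child", "adult", "senior"]
age_groups = [labels[bisect_right(edges, a)] for a in ages]

# Dummy: categorical -> one 0/1 indicator per category.
dummies = [{f"is_{lab}": int(g == lab) for lab in labels} for g in age_groups]
```

`bisect_right` returns how many boundaries are less than or equal to the value, which doubles as the index of the bin label, so a value equal to a boundary (such as 18) falls into the upper bin.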