Encoding/Feature Selection

kobeisfree94 · June 13, 2022

One-Hot Encoding

Categorical/Qualitative Data:

  • Nominal -- has no order
  • Ordinal -- has order

One-Hot Encoding converts categorical/qualitative data into numbers. Each category of a variable becomes its own column, which holds 1 for rows in that category and 0 otherwise.
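To make the idea concrete, here is a minimal pure-Python sketch of one-hot encoding a single column. It is not how category_encoders implements it; the function name `one_hot_encode` and the `colors` data are hypothetical.

```python
def one_hot_encode(values):
    """Return (categories, rows): one 0/1 column per distinct category."""
    # Sort the distinct categories so the column order is deterministic.
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


colors = ["red", "green", "blue", "green"]
categories, rows = one_hot_encode(colors)
print(categories)  # ['blue', 'green', 'red']
for color, row in zip(colors, rows):
    print(color, row)  # e.g. red [0, 0, 1]
```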

Cardinality - the number of elements in a set; for a categorical feature, the number of distinct categories it contains.

Because One-Hot Encoding adds a new column for every category within a variable, a feature with too many categories (high cardinality) produces a very large number of columns, which makes One-Hot Encoding inappropriate for that feature.
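A rough way to spot this before encoding is to count the distinct values in each feature. The sketch below assumes the data is a dict of lists; the helper names, the toy data, and the `max_categories` cut-off are all hypothetical, since what counts as "too many" depends on the dataset.

```python
def cardinality(column):
    """Number of distinct values (categories) in a column."""
    return len(set(column))


def high_cardinality_features(table, max_categories):
    """Names of features whose cardinality exceeds max_categories."""
    return [name for name, column in table.items()
            if cardinality(column) > max_categories]


# Hypothetical toy data: each key is a feature, each list holds its values.
data = {
    "color": ["red", "green", "blue", "green", "red"],
    "customer_id": ["c01", "c02", "c03", "c04", "c05"],
}

# One-hot encoding creates one column per category, so the number of
# new columns is the sum of the cardinalities.
new_columns = sum(cardinality(col) for col in data.values())
print(new_columns)  # 8
print(high_cardinality_features(data, max_categories=4))  # ['customer_id']
```

With only five rows, `customer_id` already produces one column per row, so its encoded columns carry no pattern a model could generalize from.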

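!pip install category_encoders  # install the library (Jupyter/Colab shell command)
from category_encoders import OneHotEncoder

# assumes X_train and X_test are pandas DataFrames prepared earlier
# use_cat_names=True names each new column after its category value
encoder = OneHotEncoder(use_cat_names=True)
X_train = encoder.fit_transform(X_train)  # learn the categories from the training set, then encode it
X_test = encoder.transform(X_test)        # encode the test set with the categories learned from training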
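The reason for calling `fit_transform` on the training set but only `transform` on the test set is that the categories must be learned from the training data alone. The class below is a hypothetical, simplified stand-in for that fit/transform pattern on one column, not category_encoders' own code. Encoding an unseen test category as an all-zero row is an assumption made for this sketch; the real library exposes a `handle_unknown` option for that case.

```python
class SimpleOneHotEncoder:
    """Hypothetical minimal encoder illustrating fit/transform on one column."""

    def fit(self, values):
        # Learn the category list from the training data only.
        self.categories_ = sorted(set(values))
        return self

    def transform(self, values):
        # In this sketch, categories never seen during fit become all zeros.
        return [[1 if v == c else 0 for c in self.categories_] for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)


train = ["seoul", "busan", "seoul"]
test = ["busan", "daegu"]  # "daegu" never appears in train

encoder = SimpleOneHotEncoder()
X_train = encoder.fit_transform(train)  # [[0, 1], [1, 0], [0, 1]]
X_test = encoder.transform(test)        # [[1, 0], [0, 0]]
```

Because the test set is encoded with the training categories, both matrices have the same columns in the same order, which a model fitted on `X_train` needs.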

Feature Selection

Feature Engineering is the process of creating new features, or reshaping existing ones, so that they best suit the model and the given project.

  • Dimension Reduction - reducing the number of features. In the narrow sense it means feature extraction; in the broad sense it also includes feature selection.
    - Feature Selection - keeping only the subset of existing features that is most useful and discarding the rest.
    - Feature Extraction - rather than merely selecting features, creating new ones by combining existing features (ex. PCA).
  • Scaling - rescaling a variable's values so that the correlation/relationship between variables becomes easier to see (ex. applying the square root to every value).
  • Transformation - using the characteristics of existing variables to create new variables.
  • Binning - converting a continuous variable into a categorical one (ex. grouping ages into age ranges).
  • Dummy - converting a categorical variable into numeric 0/1 indicator variables, one per category (this is what One-Hot Encoding produces).
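Feature selection can be done in many ways. One simple filter-style approach, used here purely as an illustration, keeps the features whose values are most strongly correlated with the target. The sketch assumes numeric features stored as a dict of lists; the helper names, toy data, and `threshold` value are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column has no linear relationship; treat as 0 here
    return cov / (spread_x * spread_y)


def select_by_correlation(features, target, threshold):
    """Keep the features whose |correlation with target| is at least threshold."""
    return [name for name, values in features.items()
            if abs(pearson(values, target)) >= threshold]


features = {
    "rooms": [1, 2, 3, 4, 5],
    "noise": [3, 1, 4, 1, 5],
}
price = [10, 20, 30, 40, 50]
print(select_by_correlation(features, price, threshold=0.5))  # ['rooms']
```

This only drops columns; feature extraction such as PCA would instead build new columns out of combinations of the existing ones.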
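The scaling, binning, and dummy steps can also be chained on a single numeric column. In this sketch the `ages` data, the bin edges, and the labels are made up for illustration, and the square root stands in for the scaling step because that is the example the list gives.

```python
import bisect
import math

ages = [4, 17, 25, 42, 70]

# Scaling (as in the list above): apply the square root to every value.
sqrt_ages = [math.sqrt(a) for a in ages]

# Binning: continuous -> categorical, using hypothetical cut points.
edges = [13, 20, 65]                       # boundaries between the bins
labels = ["child", "teen", "adult", "senior"]
age_groups = [labels[bisect.bisect_right(edges, a)] for a in ages]

# Dummy variables: categorical -> one 0/1 indicator column per category.
dummies = {label: [1 if g == label else 0 for g in age_groups]
           for label in labels}

print(age_groups)        # ['child', 'teen', 'adult', 'adult', 'senior']
print(dummies["adult"])  # [0, 0, 1, 1, 0]
```

`bisect.bisect_right` returns how many edges are less than or equal to the value, which is exactly the index of the bin the value falls into.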

A philosopher aspiring to become an AI/ML/DL engineer and data scientist.
