Data Understanding (GitHub)

Hyungseop Lee · April 20, 2023

Dataset

  • Instance == object, record, sample, entity, observation
  • Attribute == characteristic, field, feature, dimension

Data Categorization

  • Numerical : Made of numbers
    1. Continuous : Can take any value within a range (Age, weight, blood pressure)
    2. Discrete : Takes separate, countable values (Shoe size, number of children)
  • Categorical : Made of words
    1. Ordinal : Data has a hierarchy (Satisfaction rating, mood)
    2. Nominal : Data has no hierarchy (Eye color, blood type)
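
A minimal sketch of how the ordinal/nominal distinction shows up in practice, using only the Python standard library. The category names and sample values below are hypothetical, chosen for illustration: ordinal values can be sorted by their position in the hierarchy, while nominal values only support equality checks and frequency counts.

```python
from collections import Counter

# Hypothetical ordinal scale: the order of this list encodes the hierarchy.
SATISFACTION_ORDER = ["low", "medium", "high"]


def ordinal_rank(value, order=SATISFACTION_ORDER):
    """Return the position of an ordinal category within its hierarchy."""
    return order.index(value)


# Ordinal data: sorting by rank is meaningful.
ratings = ["high", "low", "medium"]
sorted_ratings = sorted(ratings, key=ordinal_rank)

# Nominal data: there is no meaningful order, so we only count occurrences.
eye_colors = ["brown", "blue", "brown"]
eye_color_counts = Counter(eye_colors)

print(sorted_ratings)
print(eye_color_counts)
```

Sorting the eye colors alphabetically would run, but the resulting order would carry no meaning, which is the practical difference between the two categorical types.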

Data Types

  1. Record Data

    • The most widely used data type
    • Consists of a collection of records
    • Each record is composed of a fixed number of attributes
  2. Transaction data

    • Each record consists of a buyer and the list of items that buyer purchased
  3. Graph-based data
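
The three data types above could be represented in plain Python roughly as follows. This is only a sketch: the attribute names, buyers, items, and node labels are made up for illustration, and the adjacency list is just one common way to store a graph.

```python
# Record data: every record has the same fixed set of attributes.
records = [
    {"age": 34, "weight_kg": 70.5, "blood_type": "A"},
    {"age": 27, "weight_kg": 58.0, "blood_type": "O"},
]
attribute_names = set(records[0])
all_same_attributes = all(set(r) == attribute_names for r in records)

# Transaction data: each buyer has a list of purchased items,
# and the list length can differ from one transaction to the next.
transactions = [
    {"buyer": "buyer_1", "items": ["milk", "bread"]},
    {"buyer": "buyer_2", "items": ["milk", "eggs", "butter"]},
]
basket_sizes = [len(t["items"]) for t in transactions]

# Graph-based data: nodes and directed edges, stored as an adjacency list.
graph = {"A": ["B", "C"], "B": ["C"], "C": []}
edge_count = sum(len(neighbors) for neighbors in graph.values())

print(all_same_attributes, basket_sizes, edge_count)
```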


Data Preprocessing

  • An important step in the data mining process
  • Includes cleaning, transforming, and integrating data to prepare it for analysis
  • (goal) Improve the quality of data and make the data suitable for the specific task
    • Garbage in, garbage out... (Importance of data preprocessing)
  1. Data Cleaning : Remove noise and correct inconsistencies in data

  2. Data integration : Merge data from multiple sources into a coherent data store such as a data warehouse

  3. Data transformation / Discretization : Data are scaled to fall within a smaller range like 0 ~ 1 (Normalization)

  4. Data reduction : Reduce data size by aggregating, eliminating redundant features, or clustering
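
As a rough illustration of two of the steps above, the sketch below drops missing values (one simple form of data cleaning) and then applies min-max normalization to rescale the remaining values into the 0 ~ 1 range. The function names and sample ages are hypothetical, and treating `None` as the missing-value marker is an assumption of this example.

```python
def drop_missing(values):
    """Data cleaning (simplified): remove entries marked as missing (None)."""
    return [v for v in values if v is not None]


def min_max_normalize(values):
    """Data transformation: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


raw_ages = [20, None, 30, 40]          # hypothetical sample with a gap
clean_ages = drop_missing(raw_ages)    # [20, 30, 40]
scaled_ages = min_max_normalize(clean_ages)

print(clean_ages)
print(scaled_ages)
```

Min-max scaling is sensitive to outliers because a single extreme value stretches the range, so other scaling schemes may suit some datasets better.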

(Jupyter Notebook) Parts I found confusing

  1. Data Understanding Jupyter Notebook
  2. Data Understanding Jupyter Notebook
