Data Understanding (GitHub)

Hyungseop Lee · April 20, 2023

Dataset

  • Instance == object, record, sample, entity, observation
  • Attribute == characteristic, field, feature, dimension
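As a rough illustration of the two terms above, a dataset can be sketched as a list of rows, where each row is an instance and each key is an attribute. The values and attribute names below are made up for the example.

```python
# Hypothetical toy dataset: each dict is one instance (record, sample),
# and each key is one attribute (feature, field).
dataset = [
    {"age": 34, "eye_color": "brown", "children": 2},
    {"age": 27, "eye_color": "blue", "children": 0},
    {"age": 45, "eye_color": "green", "children": 3},
]

n_instances = len(dataset)            # number of rows (instances)
attributes = list(dataset[0].keys())  # column names (attributes)

print(n_instances, attributes)
```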

Data Categorization

  • Numerical : Made of numbers
    1. Continuous : Can take any value within a range (Age, weight, blood pressure)
    2. Discrete : Countable, separate values (Shoe size, number of children)
  • Categorical : Made of words
    1. Ordinal : Data has a hierarchy (Satisfaction rating, mood)
    2. Nominal : Data has no hierarchy (Eye color, blood type)
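The categories above matter mostly because they decide which operations make sense. The sketch below uses only the Python standard library. The sample values and the `RANK` ordering are assumptions made up for illustration.

```python
from statistics import mean, mode

# Hypothetical sample values, one list per kind of attribute.
weight_kg = [61.5, 72.3, 80.1]             # numerical, continuous
children = [0, 2, 4]                       # numerical, discrete
satisfaction = ["low", "high", "medium"]   # categorical, ordinal
eye_color = ["brown", "blue", "brown"]     # categorical, nominal

# Ordinal data needs an explicit ranking; this order is an assumption.
RANK = {"low": 0, "medium": 1, "high": 2}

# Arithmetic is meaningful for numerical data.
mean_children = mean(children)

# Sorting is meaningful for ordinal data once the ranking is defined.
sorted_satisfaction = sorted(satisfaction, key=RANK.get)

# For nominal data only counting is meaningful, e.g. the most frequent value.
most_common_eye_color = mode(eye_color)
```

Taking the mean of `eye_color` or sorting it by "size" would have no meaning, which is the practical difference between nominal and the other kinds.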

Data Types

  1. Record Data

    • The most widely used data type
    • Consists of a collection of records
    • Each record is composed of a fixed number of attributes
  2. Transaction data

    • Each record consists of a buyer and the list of items that buyer purchased
  3. Graph-based data
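One possible way to picture the three data types is with plain Python structures. All names and values here are hypothetical, and the adjacency-list form for the graph is just one common representation, not something the notes specify.

```python
# Record data: every record has the same fixed set of attributes.
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": 27, "income": 41000},
]

# Transaction data: each buyer maps to a variable-length list of items.
transactions = {
    "buyer_1": ["bread", "milk"],
    "buyer_2": ["bread", "diapers", "beer"],
}

# Graph-based data: an adjacency list mapping each node to its neighbors.
graph = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["A"],
}

# Record data shares one schema across all records.
same_schema = all(r.keys() == records[0].keys() for r in records)

# Transaction lengths differ from buyer to buyer.
basket_sizes = {buyer: len(items) for buyer, items in transactions.items()}

# In a graph, the relationships (edges) carry the information.
degree = {node: len(neighbors) for node, neighbors in graph.items()}
```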


Data Preprocessing

  • An important step in the data mining process
  • Involves cleaning, transforming, and integrating data to prepare it for analysis
  • (goal) Improve the quality of data and make the data suitable for the specific task
    • Garbage in, garbage out... (Importance of data preprocessing)
  1. Data Cleaning : Remove noise and correct inconsistencies in data

  2. Data integration : Merge data from multiple sources into a coherent data store such as a data warehouse

  3. Data transformation / Discretization : Data are scaled to fall within a smaller range like 0 ~ 1 (Normalization)

  4. Data reduction : Reduce data size by aggregating, eliminating redundant features, or clustering
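The four steps above can be sketched with small standard-library functions. This is a minimal illustration under stated assumptions, not a reference pipeline. The helper names (`clean`, `integrate`, `min_max_normalize`, `reduce_by_mean`) and the sample data are hypothetical, and each function shows only one simple strategy for its step (for example, cleaning here just drops rows with missing values).

```python
def clean(rows):
    """Data cleaning: drop rows that contain a missing (None) value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def integrate(left, right, key):
    """Data integration: inner-join two sources on a shared key."""
    lookup = {r[key]: r for r in right}
    return [{**l, **lookup[l[key]]} for l in left if l[key] in lookup]


def min_max_normalize(values):
    """Data transformation: scale values into the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # avoid division by zero when all values are equal
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def reduce_by_mean(rows, group_key, value_key):
    """Data reduction: aggregate rows into one mean value per group."""
    groups = {}
    for r in rows:
        groups.setdefault(r[group_key], []).append(r[value_key])
    return {g: sum(vs) / len(vs) for g, vs in groups.items()}


# Hypothetical data from two sources that share the "id" key.
people = [
    {"id": 1, "age": 20},
    {"id": 2, "age": None},  # missing value, removed by clean()
    {"id": 3, "age": 40},
    {"id": 4, "age": 30},
]
cities = [
    {"id": 1, "city": "Seoul"},
    {"id": 3, "city": "Busan"},
    {"id": 4, "city": "Seoul"},
]

cleaned = clean(people)
merged = integrate(cleaned, cities, "id")
scaled = min_max_normalize([r["age"] for r in merged])
mean_age_by_city = reduce_by_mean(merged, "city", "age")

print(scaled)            # [0.0, 1.0, 0.5]
print(mean_age_by_city)  # {'Seoul': 25.0, 'Busan': 40.0}
```

In a notebook these steps are usually done with a dataframe library; plain functions are used here only to keep each step visible.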

(Jupyter Notebook) Confusing parts

  1. Data Understanding Jupyter Notebook
  2. Data Understanding Jupyter Notebook
