Data Understanding (GitHub)

Hyungseop Lee · April 20, 2023

Dataset

Instance == object, record, sample, entity, observation
Attribute == characteristic, field, feature, dimension
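
As a rough illustration of these terms, the sketch below stores a tiny dataset as a list of dictionaries. The values and attribute names (`age`, `weight_kg`, `eye_color`) are made up for the example and do not come from any real dataset.

```python
# A tiny, made-up dataset: each dict is one instance (record, sample),
# and each key is one attribute (feature, field).
dataset = [
    {"age": 34, "weight_kg": 70.5, "eye_color": "brown"},
    {"age": 27, "weight_kg": 58.0, "eye_color": "blue"},
    {"age": 45, "weight_kg": 82.3, "eye_color": "green"},
]

num_instances = len(dataset)          # number of rows
attributes = list(dataset[0].keys())  # column names, in insertion order

print(num_instances)
print(attributes)
```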

Data Categorization

Numerical : Expressed as numbers
1. Continuous : Can take any value within a range (Age, weight, blood pressure)
2. Discrete : Takes only separate, countable values (Shoe size, number of children)
Categorical : Expressed as labels
1. Ordinal : Categories have a natural order (Satisfaction rating, mood)
2. Nominal : Categories have no natural order (Eye color, blood type)
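
The practical difference between ordinal and nominal data is which operations make sense on them. The sketch below is one way to show this in plain Python; the scale `SATISFACTION_LEVELS` and the helper `satisfaction_rank` are hypothetical names introduced only for illustration.

```python
from collections import Counter

# Ordinal: the levels have a meaningful order, so values can be ranked and sorted.
SATISFACTION_LEVELS = ["low", "medium", "high"]  # hypothetical scale, lowest first


def satisfaction_rank(level: str) -> int:
    """Return the position of a level on the ordinal scale."""
    return SATISFACTION_LEVELS.index(level)


ratings = ["high", "low", "medium"]
sorted_ratings = sorted(ratings, key=satisfaction_rank)

# Nominal: there is no order, so only equality checks and counting are meaningful.
eye_colors = ["brown", "blue", "brown"]
color_counts = Counter(eye_colors)

print(sorted_ratings)
print(color_counts["brown"])
```

Sorting the eye colors would still run (alphabetically), but the result would carry no meaning, which is exactly what "no natural order" implies.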

Data Types

Record Data
- The most widely used data type
- Consists of a collection of records
- Each record is composed of a fixed number of attributes
Transaction data
- Each transaction consists of a buyer and the list of items that buyer purchased
Graph-based data
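
The three data types above can be sketched with plain Python structures. All names and values here are invented for the example, and the adjacency-list form used for the graph is just one common representation, assumed here because the notes do not specify one.

```python
# Record data: every record has the same fixed set of attributes.
records = [
    {"id": 1, "age": 34, "blood_type": "A"},
    {"id": 2, "age": 27, "blood_type": "O"},
]

# Transaction data: each transaction pairs a buyer with a variable-length item list.
transactions = [
    {"buyer": "alice", "items": ["bread", "milk"]},
    {"buyer": "bob", "items": ["bread", "diapers", "beer", "eggs"]},
]

# Graph-based data: an adjacency list mapping each node to its neighbors.
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}

# Records share one schema; transactions can differ in length.
same_schema = all(r.keys() == records[0].keys() for r in records)
basket_sizes = [len(t["items"]) for t in transactions]

print(same_schema)
print(basket_sizes)
```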

Data Preprocessing

An important step in the data mining process
Includes cleaning, transforming, and integrating data to prepare it for analysis
(goal) Improve the quality of data and make the data suitable for the specific task
- Garbage in, garbage out... (Importance of data preprocessing)

Data cleaning : Remove noise and correct inconsistencies in data
Data integration : Merge data from multiple sources into a coherent data store such as a data warehouse
Data transformation / Discretization : Scale data to fall within a smaller range such as 0 to 1 (Normalization), or bin continuous values into intervals (Discretization)
Data reduction : Reduce data size by aggregating, eliminating redundant features, or clustering
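
The four steps above can be sketched end to end on a toy example. This is a minimal illustration under stated assumptions, not a full preprocessing pipeline: the two sources, the `id` key, and the helper `min_max_normalize` are all hypothetical, cleaning is reduced to dropping records with a missing value, and reduction is reduced to a single aggregate.

```python
def min_max_normalize(values):
    """Scale values linearly into the range 0 to 1 (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Two made-up sources that share an "id" key.
source_a = [
    {"id": 1, "age": 20},
    {"id": 2, "age": None},  # missing value
    {"id": 3, "age": 40},
    {"id": 4, "age": 30},
]
source_b = {1: {"weight_kg": 60.0}, 3: {"weight_kg": 80.0}, 4: {"weight_kg": 70.0}}

# Data cleaning: drop records whose age is missing.
cleaned = [r for r in source_a if r["age"] is not None]

# Data integration: merge the two sources on the shared id.
integrated = [{**r, **source_b[r["id"]]} for r in cleaned if r["id"] in source_b]

# Data transformation: normalize age into the range 0 to 1.
normalized_ages = min_max_normalize([r["age"] for r in integrated])

# Data reduction: aggregate the weights into a single summary value.
mean_weight = sum(r["weight_kg"] for r in integrated) / len(integrated)

print(normalized_ages)
print(mean_weight)
```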

(Jupyter Notebook) Parts I found confusing
