Exploratory data analysis (EDA) is the process of reordering and restructuring data so that it is fit for analysis. It is an essential step that helps users understand the data they are working with. It is also useful for forming hypotheses and for providing visual aids about the data at hand.
Main Processes:
1. Data Visualization -- Finding Patterns
2. Checking for Anomalies -- missing values, duplicates, etc.
3. Setting Hypotheses -- using statistics and graphical aids.
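As a minimal sketch of the anomaly check in step 2, assuming pandas is available, the snippet below counts missing values and duplicated rows. The table and its column names (`name`, `age`) are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: one missing age and one fully duplicated row.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Ben", "Cy"],
    "age": [34.0, 29.0, 29.0, None],
})

# Number of missing (NaN) entries in each column.
missing_per_column = df.isna().sum()

# Number of rows that exactly repeat an earlier row.
duplicate_rows = int(df.duplicated().sum())

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
```

Counting comes before fixing: whether to drop, fill, or keep such rows depends on the data and on the question being asked.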
Two Types of EDA:
1. Graphical - checks/explains data using charts or figures
2. Non-Graphical - checks/explains data using summary statistics
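The two types can be contrasted on a single column. This is a sketch only, assuming pandas and matplotlib are installed. The values are invented, and the `Agg` backend is chosen just so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for script/server use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical measurements with one unusually large value.
values = pd.Series([2, 3, 3, 4, 4, 4, 5, 5, 6, 15])

# Non-graphical EDA: summary statistics (count, mean, std, quartiles, ...).
summary = values.describe()
print(summary)

# Graphical EDA: a histogram of the same values.
fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(values, bins=5)
ax.set_xlabel("value")
ax.set_ylabel("frequency")
```

The summary reports a mean above the median, which hints at skew. The histogram makes the isolated large value visible directly.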
The Object of EDA:
Univariate - examines one variable at a time
Multivariate - examines relationships among multiple variables
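A rough illustration of the difference, assuming pandas: a univariate look summarizes one column on its own, while a multivariate look asks how columns relate to each other. The columns `hours` and `score` are hypothetical.

```python
import pandas as pd

# Hypothetical study data.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 55, 61, 64, 70],
})

# Univariate: describe a single variable in isolation.
score_summary = df["score"].describe()

# Multivariate: Pearson correlation between two variables.
hours_score_corr = df["hours"].corr(df["score"])

print(score_summary)
print("correlation:", round(hours_score_corr, 3))
```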
The Cycle of Data Preprocessing:
1. Data Cleaning - Handling Missing Values, Noise, Anomalies, etc.
2. Data Merging - Combining Data
3. Data Transformation - Scaling; converting from one format to another
4. Dimensionality Reduction - Keeping the most informative features or components (e.g. Principal Component Analysis)
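The four steps can be sketched end to end on a toy example, assuming pandas and NumPy. The tables, column names, median fill, and z-score scaling are illustrative choices, not the only options. PCA is computed here directly from a singular value decomposition instead of a dedicated library routine.

```python
import numpy as np
import pandas as pd

# Hypothetical tables sharing an "id" key.
measurements = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "height": [150.0, 160.0, 160.0, None, 180.0],
    "weight": [50.0, 60.0, 60.0, 70.0, 82.0],
})
groups = pd.DataFrame({"id": [1, 2, 3, 4], "group": ["a", "b", "a", "b"]})

# 1. Cleaning: drop exact duplicate rows, fill the missing height with the median.
clean = measurements.drop_duplicates().copy()
clean["height"] = clean["height"].fillna(clean["height"].median())

# 2. Merging: attach the group label to each measurement by id.
merged = clean.merge(groups, on="id", how="left")

# 3. Transformation: scale each numeric column to zero mean and unit variance.
features = merged[["height", "weight"]]
scaled = (features - features.mean()) / features.std(ddof=0)

# 4. Dimensionality reduction: PCA via SVD of the scaled (centered) data.
X = scaled.to_numpy()
_, singular_values, vt = np.linalg.svd(X, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)  # variance share per component
first_component = X @ vt[0]  # projection onto the first principal axis

print(merged)
print("explained variance ratio:", explained)
```

Scaling before PCA matters in this sketch because height and weight are in different units; without it, the column with the larger variance would dominate the first component.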