1. Why data mining? 1.1 Challenge > we are drowning in data, but starving for knowledge the key problem is not collecting data, but extracting meanin
1. Pattern Discovery in Data Finding regularities in data is an important problem that has underpinned scientific progress from classical mechanics to quantum mechanics. 1.1 Pattern Recognition automatically discover regulari
Many data mining tasks involve making predictions under uncertainty, such as in classification where the goal is to predict a class label $Y$ from an
a data analysis task learns a model classifier predicts categorical (discrete) class labels loan approval medical diagnosis spam detection autonomou
1. Decision Tree Pruning why do we need pruning? overfitting complex poor performance on unseen data a very detailed tree memorizes the training dat
weighted coordinates are combined to form a 'credit score' the resulting score is then compared to a threshold value for input $x = (x_1, ..., x_d)$, a
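The score-and-threshold idea above can be sketched in a few lines. This is a minimal illustration, not the notes' own implementation: the function name `credit_decision`, the feature values, the weights, and the convention that a score above the threshold maps to +1 (approve) are all assumptions made for the example.

```python
def credit_decision(x, w, threshold):
    """Return +1 if the weighted score exceeds the threshold, else -1.

    The +1 / -1 output convention is an assumption for this sketch.
    """
    # Weighted combination of the input coordinates: the 'credit score'.
    score = sum(w_i * x_i for w_i, x_i in zip(w, x))
    # Compare the score to the threshold value.
    return 1 if score > threshold else -1


# Hypothetical applicant features and weights, chosen only for illustration.
x = [3.0, 1.0, 0.5]
w = [0.4, -0.2, 1.0]
decision = credit_decision(x, w, threshold=1.0)
```

Here the score is roughly 1.5, which is above the threshold of 1.0, so `decision` is `1`. Raising the threshold or shrinking the weights flips the outcome without changing the input.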
1. Regression 1.1 Definition a statistical method to study the relationship between $\mathbf{x}$ and $y$ $\mathbf{x}$: covariate / predictor variable / indep
$X \in \mathbb{R}^{N \times (d+1)}$ rows: inputs $\mathbf{x}_n$ as row vectors After appending a 1 (bias) entry to each individual data vector $\mathbf{x}_n$, when building the data matrix $X$ the individual vectors
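The construction of the data matrix can be sketched as follows. The helper name `build_data_matrix` and the sample inputs are hypothetical, and placing the bias entry in the first column is an assumption; the notes only say that a 1 is added to each vector.

```python
def build_data_matrix(inputs):
    """Add a constant 1 (bias entry) to each input and stack the results as rows.

    Putting the 1 in the first position is an assumption of this sketch.
    """
    return [[1.0] + list(x_n) for x_n in inputs]


# Hypothetical data: N = 3 inputs, each with d = 2 coordinates.
inputs = [[2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
X = build_data_matrix(inputs)
# X has N rows and d + 1 columns.
```

Each row of `X` is one augmented input, so the matrix has shape $N \times (d+1)$, matching the dimensions stated above.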
1. Linear Models 1.1 Core of Linear Models signal $s = w^Tx$ combines input variables linearly we have seen two models based on this The inner product of the weights with the input, each
frequent patterns they reveal hidden regularities and relationships in data association, correlation, causality sequential patterns partial periodicity
Cluster analysis partitions a set of data objects into subsets called clusters. Objects within a cluster are similar to each other, while objects in dif
In many real scenarios, data naturally forms groups at different levels, e.g., organization charts or handwriting styles. The result is a tree structure
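To make the tree structure concrete, here is a minimal sketch of bottom-up (agglomerative) grouping on one-dimensional points. The choice of single linkage, the function name `single_linkage_tree`, and the sample points are assumptions made for illustration; the notes do not specify a particular merging rule.

```python
def single_linkage_tree(points):
    """Repeatedly merge the two closest clusters; return the merges as nested tuples."""
    # Each cluster is (tree, members): tree is a point or a pair of subtrees.
    clusters = [(p, [p]) for p in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i][1] for b in clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        # Delete j first (j > i) so that index i stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical points with groups visible at more than one level.
tree = single_linkage_tree([1.0, 1.2, 5.0, 5.1, 9.0])
```

The returned value nests tuples inside tuples: the pairs `(5.0, 5.1)` and `(1.0, 1.2)` form first, are then joined with each other, and the isolated point `9.0` is attached last. Reading the nesting from the inside out gives the grouping at each level.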