Dissecting the Practice Dataset

Ji Kim · January 5, 2021

machine learning scikit learn


This post examines how scikit-learn's built-in practice datasets are organized by inspecting their keys.

Keys

The keys normally consist of data, target, target_names, feature_names, and DESCR.

data : the feature dataset
target : the label dataset for classification, or the numeric values to predict for regression
target_names : names of the labels (classification datasets only)
feature_names : names of the features
DESCR : a description of the dataset and each of its features
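
To make the classification/regression distinction above concrete, here is a minimal sketch that compares the iris dataset with a regression dataset. Using load_diabetes as the regression example is my own choice for illustration; the original post only works with load_iris.

```python
from sklearn.datasets import load_iris, load_diabetes

iris = load_iris()          # classification dataset
diabetes = load_diabetes()  # regression dataset (chosen here only for illustration)

# Classification: target holds integer class labels, and target_names exists.
print('target_names' in iris, iris.target.dtype)

# Regression: target holds continuous values, and there is no target_names key.
print('target_names' in diabetes, diabetes.target.dtype)
```

The membership test works because the returned object behaves like a dictionary.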

Input

import sklearn
from sklearn.datasets import load_iris

iris_data = load_iris()
print(type(iris_data))

Output

<class 'sklearn.utils.Bunch'>
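
A Bunch is a dictionary-like container whose keys can also be read as attributes. The short sketch below checks both access styles. Note that the module path printed for the class may differ between scikit-learn versions.

```python
from sklearn.datasets import load_iris

iris_data = load_iris()

# Bunch subclasses dict, so the usual dictionary interface is available.
print(isinstance(iris_data, dict))

# Attribute access and key access return the same underlying array.
print(iris_data.data is iris_data['data'])
```

This is why the rest of the post can mix iris_data.data and iris_data['data'] freely.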

Input

keys = iris_data.keys()
print('Keys of Iris Data : ', keys)

Output

Keys of Iris Data :  dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
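
The exact key list depends on the installed scikit-learn version, and newer releases may return additional keys such as frame. A minimal sketch, assuming only that the five core keys described earlier are present, checks for those keys and previews the DESCR text:

```python
from sklearn.datasets import load_iris

iris_data = load_iris()

# Check for the core keys rather than comparing against a full, version-specific list.
core_keys = {'data', 'target', 'target_names', 'feature_names', 'DESCR'}
missing = core_keys - set(iris_data.keys())
print('Missing core keys : ', missing)

# DESCR is a plain string; print only its first few lines.
for line in iris_data.DESCR.splitlines()[:5]:
    print(line)
```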

Input

print('Type of Feature Names : ', type(iris_data.feature_names))
print('Shape of Feature Names : ', len(iris_data.feature_names))
print(iris_data.feature_names)

print('\n Type of Target Names : ', type(iris_data.target_names))
print('Shape of Target Names : ', len(iris_data.target_names))
print(iris_data.target_names)

print('\n Type of Data : ', type(iris_data.data))
print('Shape of Data : ', iris_data.data.shape)
print(iris_data['data'][:5])  # print the first five rows of feature data

print('Type of Target Data : ', type(iris_data.target))
print('Shape of Target Data : ', iris_data.target.shape)
print(iris_data.target)

Output

Type of Feature Names :  <class 'list'>
Shape of Feature Names :  4
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

 Type of Target Names :  <class 'numpy.ndarray'>
Shape of Target Names :  3
['setosa' 'versicolor' 'virginica']

 Type of Data :  <class 'numpy.ndarray'>
Shape of Data :  (150, 4)
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
Type of Target Data :  <class 'numpy.ndarray'>
Shape of Target Data :  (150,)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
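
The integers in target are indices into target_names, so 0, 1, and 2 correspond to setosa, versicolor, and virginica. As a small sketch of how the two keys fit together (the variable names label_names and counts are my own), the labels can be translated to names and counted per class:

```python
import numpy as np
from sklearn.datasets import load_iris

iris_data = load_iris()

# Index target_names with the integer targets to get one name per sample.
label_names = iris_data.target_names[iris_data.target]
print(label_names[:3])

# Count how many samples fall into each class.
counts = np.bincount(iris_data.target)
for name, count in zip(iris_data.target_names, counts):
    print(name, count)
```

Each of the three classes has 50 samples, which matches the 150 rows reported by the shape of data.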
