Machine Learning & Deep Learning practice code.4

AI Engineering Course Log · July 12, 2023

road to AI Engineering


read the CSV file and save it as df

import pandas as pd

df = pd.read_csv('data_v1.csv')

df

EDA (Exploratory Data Analysis)

df.head()

df.tail()

df.info()

df.index

df.columns

df.values

check if null data exists

df.isnull().sum()

get summary statistics

df.describe()

grasp the structure of the data

df.info()

delete the customerID column

df.drop('customerID', axis=1, inplace=True)

df.info()

change the column types

df['TotalCharges']

change the column type to float

df['TotalCharges'].astype(float)  # WRONG: fails while the column still contains blank strings

search with boolean indexing

(df['TotalCharges'] == '') | (df['TotalCharges'] == ' ')

cond = (df['TotalCharges'] == '') | (df['TotalCharges'] == ' ')
df[cond]

replace the blank values in df['TotalCharges'] with '0'

df['TotalCharges'] = df['TotalCharges'].replace([' '], ['0'])

convert the TotalCharges column to float, then check that no blank values remain

df['TotalCharges'] = df['TotalCharges'].astype(float)

cond = (df['TotalCharges'] == '') | (df['TotalCharges'] == ' ')
df[cond]
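
As an alternative to replacing the blanks by hand, the same conversion can be sketched with pd.to_numeric. The small DataFrame below is made-up sample data, not the course's data_v1.csv. errors='coerce' turns strings that cannot be parsed into NaN, and those are then filled with 0 to match the choice made above.

```python
import pandas as pd

# Hypothetical sample standing in for the TotalCharges column of data_v1.csv
sample = pd.DataFrame({'TotalCharges': ['29.85', ' ', '108.15']})

# errors='coerce' converts values that cannot be parsed (such as ' ') to NaN
converted = pd.to_numeric(sample['TotalCharges'], errors='coerce')

# Count how many values could not be parsed before filling them
n_blank = int(converted.isna().sum())

# Fill with 0 to mirror the replacement used in the notes above
sample['TotalCharges'] = converted.fillna(0.0)

print(n_blank)
print(sample['TotalCharges'].dtype)
```

Counting the NaN values before filling them shows how many rows were affected, which is easy to lose track of when the blanks are replaced directly.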

checking

df.info()

convert the values of the 'Churn' column to numbers

df['Churn'].value_counts()

change Yes, No in 'Churn' to 1, 0

df['Churn'] = df['Churn'].replace(['Yes', 'No'], [1, 0])

check the column's distribution

df['Churn'].value_counts()

check existence of null data

df.isnull().sum()

drop the column that has many nulls, then drop the remaining rows that contain nulls

df.drop('DeviceProtection', axis=1, inplace=True)
df.dropna(inplace=True)
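
One way to decide which columns to drop is to look at the share of nulls per column instead of the raw counts. This is only a sketch: the sample data is made up, and the 0.5 threshold is an assumption, not a value from the course.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the column names mirror the notes but the values are made up
sample = pd.DataFrame({
    'DeviceProtection': [np.nan, np.nan, np.nan, 'Yes'],
    'tenure': [1, 34, np.nan, 45],
    'Churn': [0, 0, 1, 0],
})

# Share of missing values per column (0.0 to 1.0)
null_ratio = sample.isnull().mean()

# Assumed threshold: drop columns where more than half of the values are missing
threshold = 0.5
cols_to_drop = null_ratio[null_ratio > threshold].index.tolist()

# Drop those columns first, then drop the rows that still contain nulls
cleaned = sample.drop(columns=cols_to_drop).dropna()

print(cols_to_drop)
print(cleaned.shape)
```

Dropping the mostly empty column first keeps more rows than calling dropna() on the full DataFrame would.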

check whether any nulls remain

df.isnull().sum()

df.info()

<Visualization>

import matplotlib.pyplot as plt
import seaborn as sns  # needed for the sns.* plots below
%matplotlib inline

df['gender'].value_counts()

df['gender'].value_counts().plot(kind='bar')

bar chart of the Partner column's distribution

df['Partner'].value_counts().plot(kind='bar')

make bar charts of the 'object' columns all at once using the select_dtypes() function

df.select_dtypes('O').head(3)

select only Object column names

df.select_dtypes('O').columns.values

draw a bar chart of each object column, one by one
Dependents, PhoneService -> imbalanced -> need to be deleted

object_list = df.select_dtypes('object').columns.values

for col in object_list:
    df[col].value_counts().plot(kind='bar')
    plt.title(col)
    plt.show()

delete the imbalanced column

df.drop('PhoneService', axis=1, inplace=True)
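
The imbalance read off the bar charts can also be checked numerically. The sketch below uses made-up sample data and an assumed cutoff of 0.85 for the share of the most frequent value. Neither comes from the course material.

```python
import pandas as pd

# Hypothetical sample: PhoneService is heavily skewed, gender is balanced
sample = pd.DataFrame({
    'PhoneService': ['Yes'] * 9 + ['No'],
    'gender': ['Male', 'Female'] * 5,
})

# Assumed cutoff for calling a column imbalanced
threshold = 0.85

imbalanced = []
for col in sample.columns:
    # value_counts(normalize=True) returns shares sorted in descending order,
    # so the first entry is the share of the most frequent value
    top_share = sample[col].value_counts(normalize=True).iloc[0]
    if top_share >= threshold:
        imbalanced.append(col)

print(imbalanced)
```

A numeric cutoff applies the same rule to every column, which is harder to do by eye across many charts.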

visualize the columns that have a numeric type (int, float)

df.select_dtypes('number').head(3)

checking the Churn column

df['Churn'].value_counts()

checking bar chart of 'Churn' column

df['Churn'].value_counts().plot(kind='bar')

same process for 'SeniorCitizen' Column

df['SeniorCitizen'].value_counts()

df['SeniorCitizen'].value_counts().plot(kind='bar')

df.drop('SeniorCitizen', axis=1, inplace=True)

df.info()

Histogram

sns.histplot(data=df, x='tenure')

sns.histplot(data=df, x='tenure', hue='Churn')

draw it as a density curve (KDE plot)

sns.kdeplot(data=df, x='tenure', hue='Churn')

sns.histplot(data=df, x='TotalCharges')

sns.kdeplot(data=df, x='TotalCharges', hue='Churn')

sns.countplot(data=df, x='MultipleLines', hue='Churn')

Heatmap
correlation between columns

df[['tenure', 'MonthlyCharges', 'TotalCharges']].corr()

sns.heatmap(df[['tenure', 'MonthlyCharges', 'TotalCharges']].corr(), annot=True)
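
For reference, DataFrame.corr() uses the Pearson correlation coefficient by default. The sketch below recomputes one cell of the matrix by hand on made-up sample data to show what the heatmap values mean. The numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the column names mirror the notes but the values are made up
sample = pd.DataFrame({
    'tenure': [1, 12, 24, 48, 72],
    'MonthlyCharges': [30.0, 55.0, 70.0, 85.0, 100.0],
})
sample['TotalCharges'] = sample['tenure'] * sample['MonthlyCharges']

# Correlation matrix (Pearson by default)
corr = sample.corr()

# Manual Pearson correlation between tenure and TotalCharges:
# covariance of the two columns divided by the product of their spreads
x = sample['tenure'] - sample['tenure'].mean()
y = sample['TotalCharges'] - sample['TotalCharges'].mean()
manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

print(corr.loc['tenure', 'TotalCharges'])
print(manual)
```

Both values should agree up to floating-point error. Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.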

Boxplot

sns.boxplot(data=df, x='Churn', y='TotalCharges')

save the result as a CSV file

df.to_csv('data_v1_save.csv', index=False)

pd.read_csv('data_v1_save.csv').head()
