[HUFSTUDY] Kaggle Getting Started Data Analysis - House Prices - Advanced Regression Techniques

Uomnf97 · August 21, 2022

House Prices - Advanced Regression Techniques


Predict sales prices and practice feature engineering, RFs, and gradient boosting

  • Final Goal : With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.
  • Summary :
    • This data analysis was done by Juwon Kim in preparation for ML modeling
    • It uses pandas histograms, a heatmap to check correlation, and Seaborn for visualization
    • Data pre-processing used label encoding, missing-value handling, and standardization

1. Data Analysis

  • Accurate data analysis is needed before choosing and training a suitable ML model.
  • For machine learning, the train data and test data were loaded. The number of columns, the column names, the target data, and the relationships between variables were analyzed using techniques such as histograms, heatmaps, and pair plots.

Import Libraries

  • pandas, numpy, matplotlib.pyplot, seaborn, tqdm, scikit-learn
# Data analysis
import pandas as pd
import numpy as np
# Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Track loop progress with tqdm
import tqdm.notebook as tqdm
from sklearn.preprocessing import LabelEncoder

File descriptions

  • train.csv - the training set
  • test.csv - the test set
  • data_description.txt - full description of each column, originally prepared by Dean De Cock but lightly edited to match the column names used here
  • sample_submission.csv - a benchmark submission from a linear regression on year and month of sale, lot square footage, and number of bedrooms

Data fields

Here's a brief version of what you'll find in the data description file.

  • SalePrice - the property's sale price in dollars. This is the target variable that you're trying to predict.
  • MSSubClass: The building class
  • MSZoning: The general zoning classification
  • LotFrontage: Linear feet of street connected to property
  • LotArea: Lot size in square feet
  • Street: Type of road access
  • Alley: Type of alley access
  • LotShape: General shape of property
  • LandContour: Flatness of the property
  • Utilities: Type of utilities available
  • LotConfig: Lot configuration
  • LandSlope: Slope of property
  • Neighborhood: Physical locations within Ames city limits
  • Condition1: Proximity to main road or railroad
  • Condition2: Proximity to main road or railroad (if a second is present)
  • BldgType: Type of dwelling
  • HouseStyle: Style of dwelling
  • OverallQual: Overall material and finish quality
  • OverallCond: Overall condition rating
  • YearBuilt: Original construction date
  • YearRemodAdd: Remodel date
  • RoofStyle: Type of roof
  • RoofMatl: Roof material
  • Exterior1st: Exterior covering on house
  • Exterior2nd: Exterior covering on house (if more than one material)
  • MasVnrType: Masonry veneer type
  • MasVnrArea: Masonry veneer area in square feet
  • ExterQual: Exterior material quality
  • ExterCond: Present condition of the material on the exterior
  • Foundation: Type of foundation
  • BsmtQual: Height of the basement
  • BsmtCond: General condition of the basement
  • BsmtExposure: Walkout or garden level basement walls
  • BsmtFinType1: Quality of basement finished area
  • BsmtFinSF1: Type 1 finished square feet
  • BsmtFinType2: Quality of second finished area (if present)
  • BsmtFinSF2: Type 2 finished square feet
  • BsmtUnfSF: Unfinished square feet of basement area
  • TotalBsmtSF: Total square feet of basement area
  • Heating: Type of heating
  • HeatingQC: Heating quality and condition
  • CentralAir: Central air conditioning
  • Electrical: Electrical system
  • 1stFlrSF: First Floor square feet
  • 2ndFlrSF: Second floor square feet
  • LowQualFinSF: Low quality finished square feet (all floors)
  • GrLivArea: Above grade (ground) living area square feet
  • BsmtFullBath: Basement full bathrooms
  • BsmtHalfBath: Basement half bathrooms
  • FullBath: Full bathrooms above grade
  • HalfBath: Half baths above grade
  • BedroomAbvGr: Number of bedrooms above basement level
  • KitchenAbvGr: Number of kitchens
  • KitchenQual: Kitchen quality
  • TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)
  • Functional: Home functionality rating
  • Fireplaces: Number of fireplaces
  • FireplaceQu: Fireplace quality
  • GarageType: Garage location
  • GarageYrBlt: Year garage was built
  • GarageFinish: Interior finish of the garage
  • GarageCars: Size of garage in car capacity
  • GarageArea: Size of garage in square feet
  • GarageQual: Garage quality
  • GarageCond: Garage condition
  • PavedDrive: Paved driveway
  • WoodDeckSF: Wood deck area in square feet
  • OpenPorchSF: Open porch area in square feet
  • EnclosedPorch: Enclosed porch area in square feet
  • 3SsnPorch: Three season porch area in square feet
  • ScreenPorch: Screen porch area in square feet
  • PoolArea: Pool area in square feet
  • PoolQC: Pool quality
  • Fence: Fence quality
  • MiscFeature: Miscellaneous feature not covered in other categories
  • MiscVal: Value of miscellaneous feature in dollars
  • MoSold: Month Sold
  • YrSold: Year Sold
  • SaleType: Type of sale
  • SaleCondition: Condition of sale
train_data = pd.read_csv('./dataset/train.csv')
train_data

test_data = pd.read_csv('./dataset/test.csv')
test_data

  • There are too many columns to read at a glance, so we store the column names of train_data in a list for later checks
print(train_data.columns)
columns = list(train_data.columns)

Check Distribution

  • Before handling the missing values, I checked the distribution of each column
  • The scales of the columns differ widely, and only 38 of the 81 columns are numeric (the remaining 43 are object columns).
  • I will apply standardization later in this notebook
train_data.hist(figsize=(30,20))
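
The numeric/non-numeric split can also be checked directly instead of reading it off the histogram grid. The following is a minimal sketch, assuming a small made-up frame named `toy` in place of `train_data` (the values are illustrative, not taken from the Ames data):

```python
import pandas as pd

# Toy stand-in for train_data (made-up values, for illustration only)
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "LotFrontage": [65.0, None, 68.0],
    "Street": ["Pave", "Pave", "Grvl"],
})

# Split the columns by dtype instead of inspecting them one by one
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
non_numeric_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(len(numeric_cols), "numeric columns:", numeric_cols)
print(len(non_numeric_cols), "non-numeric columns:", non_numeric_cols)
```

Applied to `train_data`, the same two calls should give the counts of numeric and object columns used in the rest of the post.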

IF_null = dict()
for column in columns:
    print(column, train_data[column].isna().sum())
    if train_data[column].isna().sum() > 0 :
        IF_null[column] = train_data[column].isna().sum()

  • Check which columns have missing values
for key in IF_null :
    print("Missing Value Column name :", key)
    print("Number of Missing Values :", IF_null[key])
    print("-------------------------------------------")
print("Number of columns with missing values :", len(IF_null))
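
The same missing-value check can be written without an explicit loop. This is a sketch under the assumption of a toy frame (`toy` and its values are made up); the `summary` table and its column names are illustrative, not part of the original notebook:

```python
import pandas as pd

# Toy stand-in for train_data (made-up values)
toy = pd.DataFrame({
    "LotFrontage": [65.0, None, 68.0, None],
    "Alley": [None, "Grvl", None, None],
    "LotArea": [8450, 9600, 11250, 9550],
})

# Count missing values per column and keep only columns that have any
missing = toy.isna().sum()
missing = missing[missing > 0].sort_values(ascending=False)

# Add the share of missing rows for context
summary = pd.DataFrame({"n_missing": missing, "ratio": missing / len(toy)})
print(summary)
```

Sorting by count makes it easier to spot columns such as PoolQC or Alley, where almost every row is missing and mean or constant filling may not be the best choice.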


dtypes = dict()
for column in columns:
    print(column, train_data[column].dtype)
    dtypes[column]=train_data[column].dtype
Id int64
MSSubClass int64
MSZoning object
LotFrontage float64
LotArea int64
Street object
Alley object
LotShape object
LandContour object
Utilities object
LotConfig object
LandSlope object
Neighborhood object
Condition1 object
Condition2 object
BldgType object
HouseStyle object
OverallQual int64
OverallCond int64
YearBuilt int64
YearRemodAdd int64
RoofStyle object
RoofMatl object
Exterior1st object
Exterior2nd object
MasVnrType object
MasVnrArea float64
ExterQual object
ExterCond object
Foundation object
BsmtQual object
BsmtCond object
BsmtExposure object
BsmtFinType1 object
BsmtFinSF1 int64
BsmtFinType2 object
BsmtFinSF2 int64
BsmtUnfSF int64
TotalBsmtSF int64
Heating object
HeatingQC object
CentralAir object
Electrical object
1stFlrSF int64
2ndFlrSF int64
LowQualFinSF int64
GrLivArea int64
BsmtFullBath int64
BsmtHalfBath int64
FullBath int64
HalfBath int64
BedroomAbvGr int64
KitchenAbvGr int64
KitchenQual object
TotRmsAbvGrd int64
Functional object
Fireplaces int64
FireplaceQu object
GarageType object
GarageYrBlt float64
GarageFinish object
GarageCars int64
GarageArea int64
GarageQual object
GarageCond object
PavedDrive object
WoodDeckSF int64
OpenPorchSF int64
EnclosedPorch int64
3SsnPorch int64
ScreenPorch int64
PoolArea int64
PoolQC object
Fence object
MiscFeature object
MiscVal int64
MoSold int64
YrSold int64
SaleType object
SaleCondition object
SalePrice int64
count=0
for column in columns:
    if dtypes[column] == 'object':
        print(column,dtypes[column])
        count += 1
print("Number of object columns:",count)
MSZoning object
Street object
Alley object
LotShape object
LandContour object
Utilities object
LotConfig object
LandSlope object
Neighborhood object
Condition1 object
Condition2 object
BldgType object
HouseStyle object
RoofStyle object
RoofMatl object
Exterior1st object
Exterior2nd object
MasVnrType object
ExterQual object
ExterCond object
Foundation object
BsmtQual object
BsmtCond object
BsmtExposure object
BsmtFinType1 object
BsmtFinType2 object
Heating object
HeatingQC object
CentralAir object
Electrical object
KitchenQual object
Functional object
FireplaceQu object
GarageType object
GarageFinish object
GarageQual object
GarageCond object
PavedDrive object
PoolQC object
Fence object
MiscFeature object
SaleType object
SaleCondition object
Number of object columns: 43

Data Preprocessing

  • We need to preprocess the 43 columns whose dtype is object
  • We need to preprocess the 19 columns that have missing values

Handling Missing Values in Numeric Columns

  • Fill missing values with the column mean
MV_numeric = list()

for key in IF_null:
    if dtypes[key] != 'object':
        print(key, dtypes[key], IF_null[key])
        MV_numeric.append(key)

train_data[MV_numeric].hist(figsize=(30,20))

train_data[MV_numeric]=train_data[MV_numeric].fillna(train_data[MV_numeric].mean())
train_data[MV_numeric].isna().sum()
LotFrontage    0
MasVnrArea     0
GarageYrBlt    0
dtype: int64
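
When the test set needs the same treatment, one common approach is to reuse the means computed on the training rows rather than recomputing them on the test rows. The sketch below assumes two small made-up frames, `toy_train` and `toy_test`; it is not the author's code, only an illustration of that idea:

```python
import pandas as pd

# Toy stand-ins for train_data and test_data (made-up values)
toy_train = pd.DataFrame({"LotFrontage": [60.0, None, 80.0],
                          "MasVnrArea": [0.0, 100.0, None]})
toy_test = pd.DataFrame({"LotFrontage": [None, 70.0],
                         "MasVnrArea": [50.0, None]})

numeric_cols = ["LotFrontage", "MasVnrArea"]

# Compute the fill values once, from the training rows only
train_means = toy_train[numeric_cols].mean()

# fillna with a Series fills each column with the value under its label
toy_train[numeric_cols] = toy_train[numeric_cols].fillna(train_means)
toy_test[numeric_cols] = toy_test[numeric_cols].fillna(train_means)

print(toy_test)
```

Using the training means on both sets keeps the two frames on the same footing and avoids letting test-set statistics influence preprocessing.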

Handling Object Data

unprocessed_object = []

for key in dtypes:
    if dtypes[key] == 'object':
        unprocessed_object.append(key)
print(unprocessed_object)
print(len(unprocessed_object))
['MSZoning', 'Street', 'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'Heating', 'HeatingQC', 'CentralAir', 'Electrical', 'KitchenQual', 'Functional', 'FireplaceQu', 'GarageType', 'GarageFinish', 'GarageQual', 'GarageCond', 'PavedDrive', 'PoolQC', 'Fence', 'MiscFeature', 'SaleType', 'SaleCondition']
43
from collections import Counter

# Mark missing categories with the placeholder string "0"
train_data[unprocessed_object] = train_data[unprocessed_object].fillna("0")
check_data = dict()

# Count how often each category appears in every object column
# (Counter keeps categories in first-seen order)
for unprocessed in tqdm.tqdm(unprocessed_object):
    check_data[unprocessed] = dict(Counter(train_data[unprocessed]))

for data in check_data :
    print(data,end=" <")
    for key in check_data[data]:
        print(key,":", check_data[data][key],end ="| ")
    print(">",)
    print("Uniques:",len(check_data[data]))
MSZoning <RL : 1151| RM : 218| C (all) : 10| FV : 65| RH : 16| >
Uniques: 5
Street <Pave : 1454| Grvl : 6| >
Uniques: 2
Alley <0 : 1369| Grvl : 50| Pave : 41| >
Uniques: 3
LotShape <Reg : 925| IR1 : 484| IR2 : 41| IR3 : 10| >
Uniques: 4
LandContour <Lvl : 1311| Bnk : 63| Low : 36| HLS : 50| >
Uniques: 4
Utilities <AllPub : 1459| NoSeWa : 1| >
Uniques: 2
LotConfig <Inside : 1052| FR2 : 47| Corner : 263| CulDSac : 94| FR3 : 4| >
Uniques: 5
LandSlope <Gtl : 1382| Mod : 65| Sev : 13| >
Uniques: 3
Neighborhood <CollgCr : 150| Veenker : 11| Crawfor : 51| NoRidge : 41| Mitchel : 49| Somerst : 86| NWAmes : 73| OldTown : 113| BrkSide : 58| Sawyer : 74| NridgHt : 77| NAmes : 225| SawyerW : 59| IDOTRR : 37| MeadowV : 17| Edwards : 100| Timber : 38| Gilbert : 79| StoneBr : 25| ClearCr : 28| NPkVill : 9| Blmngtn : 17| BrDale : 16| SWISU : 25| Blueste : 2| >
Uniques: 25
Condition1 <Norm : 1260| Feedr : 81| PosN : 19| Artery : 48| RRAe : 11| RRNn : 5| RRAn : 26| PosA : 8| RRNe : 2| >
Uniques: 9
Condition2 <Norm : 1445| Artery : 2| RRNn : 2| Feedr : 6| PosN : 2| PosA : 1| RRAn : 1| RRAe : 1| >
Uniques: 8
BldgType <1Fam : 1220| 2fmCon : 31| Duplex : 52| TwnhsE : 114| Twnhs : 43| >
Uniques: 5
HouseStyle <2Story : 445| 1Story : 726| 1.5Fin : 154| 1.5Unf : 14| SFoyer : 37| SLvl : 65| 2.5Unf : 11| 2.5Fin : 8| >
Uniques: 8
RoofStyle <Gable : 1141| Hip : 286| Gambrel : 11| Mansard : 7| Flat : 13| Shed : 2| >
Uniques: 6
RoofMatl <CompShg : 1434| WdShngl : 6| Metal : 1| WdShake : 5| Membran : 1| Tar&Grv : 11| Roll : 1| ClyTile : 1| >
Uniques: 8
Exterior1st <VinylSd : 515| MetalSd : 220| Wd Sdng : 206| HdBoard : 222| BrkFace : 50| WdShing : 26| CemntBd : 61| Plywood : 108| AsbShng : 20| Stucco : 25| BrkComm : 2| AsphShn : 1| Stone : 2| ImStucc : 1| CBlock : 1| >
Uniques: 15
Exterior2nd <VinylSd : 504| MetalSd : 214| Wd Shng : 38| HdBoard : 207| Plywood : 142| Wd Sdng : 197| CmentBd : 60| BrkFace : 25| Stucco : 26| AsbShng : 20| Brk Cmn : 7| ImStucc : 10| AsphShn : 3| Stone : 5| Other : 1| CBlock : 1| >
Uniques: 16
MasVnrType <BrkFace : 445| None : 864| Stone : 128| BrkCmn : 15| 0 : 8| >
Uniques: 5
ExterQual <Gd : 488| TA : 906| Ex : 52| Fa : 14| >
Uniques: 4
ExterCond <TA : 1282| Gd : 146| Fa : 28| Po : 1| Ex : 3| >
Uniques: 5
Foundation <PConc : 647| CBlock : 634| BrkTil : 146| Wood : 3| Slab : 24| Stone : 6| >
Uniques: 6
BsmtQual <Gd : 618| TA : 649| Ex : 121| 0 : 37| Fa : 35| >
Uniques: 5
BsmtCond <TA : 1311| Gd : 65| 0 : 37| Fa : 45| Po : 2| >
Uniques: 5
BsmtExposure <No : 953| Gd : 134| Mn : 114| Av : 221| 0 : 38| >
Uniques: 5
BsmtFinType1 <GLQ : 418| ALQ : 220| Unf : 430| Rec : 133| BLQ : 148| 0 : 37| LwQ : 74| >
Uniques: 7
BsmtFinType2 <Unf : 1256| BLQ : 33| 0 : 38| ALQ : 19| Rec : 54| LwQ : 46| GLQ : 14| >
Uniques: 7
Heating <GasA : 1428| GasW : 18| Grav : 7| Wall : 4| OthW : 2| Floor : 1| >
Uniques: 6
HeatingQC <Ex : 741| Gd : 241| TA : 428| Fa : 49| Po : 1| >
Uniques: 5
CentralAir <Y : 1365| N : 95| >
Uniques: 2
Electrical <SBrkr : 1334| FuseF : 27| FuseA : 94| FuseP : 3| Mix : 1| 0 : 1| >
Uniques: 6
KitchenQual <Gd : 586| TA : 735| Ex : 100| Fa : 39| >
Uniques: 4
Functional <Typ : 1360| Min1 : 31| Maj1 : 14| Min2 : 34| Mod : 15| Maj2 : 5| Sev : 1| >
Uniques: 7
FireplaceQu <0 : 690| TA : 313| Gd : 380| Fa : 33| Ex : 24| Po : 20| >
Uniques: 6
GarageType <Attchd : 870| Detchd : 387| BuiltIn : 88| CarPort : 9| 0 : 81| Basment : 19| 2Types : 6| >
Uniques: 7
GarageFinish <RFn : 422| Unf : 605| Fin : 352| 0 : 81| >
Uniques: 4
GarageQual <TA : 1311| Fa : 48| Gd : 14| 0 : 81| Ex : 3| Po : 3| >
Uniques: 6
GarageCond <TA : 1326| Fa : 35| 0 : 81| Gd : 9| Po : 7| Ex : 2| >
Uniques: 6
PavedDrive <Y : 1340| N : 90| P : 30| >
Uniques: 3
PoolQC <0 : 1453| Ex : 2| Fa : 2| Gd : 3| >
Uniques: 4
Fence <0 : 1179| MnPrv : 157| GdWo : 54| GdPrv : 59| MnWw : 11| >
Uniques: 5
MiscFeature <0 : 1406| Shed : 49| Gar2 : 2| Othr : 2| TenC : 1| >
Uniques: 5
SaleType <WD : 1267| New : 122| COD : 43| ConLD : 9| ConLI : 5| CWD : 4| ConLw : 5| Con : 2| Oth : 3| >
Uniques: 9
SaleCondition <Normal : 1198| Abnorml : 101| Partial : 125| AdjLand : 4| Alloca : 12| Family : 20| >
Uniques: 6
  • Label Encoding
l_encoder =  LabelEncoder()
for ele in tqdm.tqdm(unprocessed_object):
    train_data[ele] = l_encoder.fit_transform(train_data[ele])
train_data
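
Because the loop above refits a single `LabelEncoder` for every column, only the last column's mapping is still available afterwards. If the same encoding has to be applied to the test set, one option is to keep one fitted encoder per column. The sketch below assumes made-up frames `toy_train` and `toy_test` and a hypothetical `encoders` dictionary:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-ins for train_data and test_data (made-up values)
toy_train = pd.DataFrame({"Street": ["Pave", "Grvl", "Pave"],
                          "ExterQual": ["Gd", "TA", "Ex"]})
toy_test = pd.DataFrame({"Street": ["Grvl", "Pave"],
                         "ExterQual": ["TA", "Gd"]})

encoders = {}
for col in ["Street", "ExterQual"]:
    enc = LabelEncoder()
    toy_train[col] = enc.fit_transform(toy_train[col])
    encoders[col] = enc  # keep the fitted encoder for later reuse

# Apply the mapping learned on the training data to the test data
for col, enc in encoders.items():
    toy_test[col] = enc.transform(toy_test[col])

print(encoders["ExterQual"].classes_)
print(toy_test)
```

Two caveats: `LabelEncoder` assigns codes in sorted label order, so quality scales such as Ex/Gd/TA/Fa do not keep their natural ranking, and `transform` raises an error for categories that were not seen during fitting.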

for column in columns:
    print(column, train_data[column].isna().sum())
Id 0
MSSubClass 0
MSZoning 0
LotFrontage 0
LotArea 0
Street 0
Alley 0
LotShape 0
LandContour 0
Utilities 0
LotConfig 0
LandSlope 0
Neighborhood 0
Condition1 0
Condition2 0
BldgType 0
HouseStyle 0
OverallQual 0
OverallCond 0
YearBuilt 0
YearRemodAdd 0
RoofStyle 0
RoofMatl 0
Exterior1st 0
Exterior2nd 0
MasVnrType 0
MasVnrArea 0
ExterQual 0
ExterCond 0
Foundation 0
BsmtQual 0
BsmtCond 0
BsmtExposure 0
BsmtFinType1 0
BsmtFinSF1 0
BsmtFinType2 0
BsmtFinSF2 0
BsmtUnfSF 0
TotalBsmtSF 0
Heating 0
HeatingQC 0
CentralAir 0
Electrical 0
1stFlrSF 0
2ndFlrSF 0
LowQualFinSF 0
GrLivArea 0
BsmtFullBath 0
BsmtHalfBath 0
FullBath 0
HalfBath 0
BedroomAbvGr 0
KitchenAbvGr 0
KitchenQual 0
TotRmsAbvGrd 0
Functional 0
Fireplaces 0
FireplaceQu 0
GarageType 0
GarageYrBlt 0
GarageFinish 0
GarageCars 0
GarageArea 0
GarageQual 0
GarageCond 0
PavedDrive 0
WoodDeckSF 0
OpenPorchSF 0
EnclosedPorch 0
3SsnPorch 0
ScreenPorch 0
PoolArea 0
PoolQC 0
Fence 0
MiscFeature 0
MiscVal 0
MoSold 0
YrSold 0
SaleType 0
SaleCondition 0
SalePrice 0
column_sets = [0]*9
for i in range(0,81,10):
    column_sets[i//10] = columns[i:i+10]
for column_set in column_sets:
    train_data[column_set].hist(figsize=(30,20))
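
The grouping of columns into sets of ten can also be written as a list comprehension, which avoids hard-coding the number of groups. This is a sketch with hypothetical column names standing in for the 81 columns of `train_data`:

```python
# Hypothetical names standing in for the 81 columns of train_data
columns = [f"col{i}" for i in range(81)]

chunk_size = 10
# Slice the list into consecutive groups of at most chunk_size names
column_sets = [columns[i:i + chunk_size]
               for i in range(0, len(columns), chunk_size)]

print(len(column_sets), "groups; the last group holds", len(column_sets[-1]), "column")
```

With 81 columns this gives nine groups, and the last one contains only the final column, which is SalePrice in the real data.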





for column_set in column_sets:
    print(train_data[column_set].describe())
Id   MSSubClass     MSZoning  LotFrontage        LotArea  \\
count  1460.000000  1460.000000  1460.000000  1460.000000    1460.000000
mean    730.500000    56.897260     3.028767    70.049958   10516.828082
std     421.610009    42.300571     0.632017    22.024023    9981.264932
min       1.000000    20.000000     0.000000    21.000000    1300.000000
25%     365.750000    20.000000     3.000000    60.000000    7553.500000
50%     730.500000    50.000000     3.000000    70.049958    9478.500000
75%    1095.250000    70.000000     3.000000    79.000000   11601.500000
max    1460.000000   190.000000     4.000000   313.000000  215245.000000

            Street        Alley     LotShape  LandContour    Utilities
count  1460.000000  1460.000000  1460.000000  1460.000000  1460.000000
mean      0.995890     0.090411     1.942466     2.777397     0.000685
std       0.063996     0.372151     1.409156     0.707666     0.026171
min       0.000000     0.000000     0.000000     0.000000     0.000000
25%       1.000000     0.000000     0.000000     3.000000     0.000000
50%       1.000000     0.000000     3.000000     3.000000     0.000000
75%       1.000000     0.000000     3.000000     3.000000     0.000000
max       1.000000     2.000000     3.000000     3.000000     1.000000
         LotConfig    LandSlope  Neighborhood   Condition1   Condition2  \\
count  1460.000000  1460.000000   1460.000000  1460.000000  1460.000000
mean      3.019178     0.062329     12.251370     2.031507     2.008219
std       1.622634     0.276232      6.013735     0.868515     0.259040
min       0.000000     0.000000      0.000000     0.000000     0.000000
25%       2.000000     0.000000      7.000000     2.000000     2.000000
50%       4.000000     0.000000     12.000000     2.000000     2.000000
75%       4.000000     0.000000     17.000000     2.000000     2.000000
max       4.000000     2.000000     24.000000     8.000000     7.000000

          BldgType   HouseStyle  OverallQual  OverallCond    YearBuilt
count  1460.000000  1460.000000  1460.000000  1460.000000  1460.000000
mean      0.493151     3.038356     6.099315     5.575342  1971.267808
std       1.198277     1.911305     1.382997     1.112799    30.202904
min       0.000000     0.000000     1.000000     1.000000  1872.000000
25%       0.000000     2.000000     5.000000     5.000000  1954.000000
50%       0.000000     2.000000     6.000000     5.000000  1973.000000
75%       0.000000     5.000000     7.000000     6.000000  2000.000000
max       4.000000     7.000000    10.000000     9.000000  2010.000000
       YearRemodAdd    RoofStyle     RoofMatl  Exterior1st  Exterior2nd  \\
count   1460.000000  1460.000000  1460.000000  1460.000000  1460.000000
mean    1984.865753     1.410274     1.075342     9.624658    10.339726
std       20.645407     0.834998     0.599127     3.197659     3.540570
min     1950.000000     0.000000     0.000000     0.000000     0.000000
25%     1967.000000     1.000000     1.000000     8.000000     8.000000
50%     1994.000000     1.000000     1.000000    12.000000    13.000000
75%     2004.000000     1.000000     1.000000    12.000000    13.000000
max     2010.000000     5.000000     7.000000    14.000000    15.000000

        MasVnrType   MasVnrArea    ExterQual    ExterCond   Foundation
count  1460.000000  1460.000000  1460.000000  1460.000000  1460.000000
mean      2.745890   103.685262     2.539726     3.733562     1.396575
std       0.646987   180.569112     0.693995     0.731807     0.722394
min       0.000000     0.000000     0.000000     0.000000     0.000000
25%       2.000000     0.000000     2.000000     4.000000     1.000000
50%       3.000000     0.000000     3.000000     4.000000     1.000000
75%       3.000000   164.250000     3.000000     4.000000     2.000000
max       4.000000  1600.000000     3.000000     4.000000     5.000000
          BsmtQual     BsmtCond  BsmtExposure  BsmtFinType1   BsmtFinSF1  \\
count  1460.000000  1460.000000   1460.000000   1460.000000  1460.000000
mean      3.178767     3.715753      3.180137      3.637671   443.639726
std       0.998402     0.884346      1.246138      1.895727   456.098091
min       0.000000     0.000000      0.000000      0.000000     0.000000
25%       3.000000     4.000000      2.000000      2.000000     0.000000
50%       3.000000     4.000000      4.000000      3.000000   383.500000
75%       4.000000     4.000000      4.000000      6.000000   712.250000
max       4.000000     4.000000      4.000000      6.000000  5644.000000

       BsmtFinType2   BsmtFinSF2    BsmtUnfSF  TotalBsmtSF      Heating
count   1460.000000  1460.000000  1460.000000  1460.000000  1460.000000
mean       5.559589    46.549315   567.240411  1057.429452     1.036301
std        1.296332   161.319273   441.866955   438.705324     0.295124
min        0.000000     0.000000     0.000000     0.000000     0.000000
25%        6.000000     0.000000   223.000000   795.750000     1.000000
50%        6.000000     0.000000   477.500000   991.500000     1.000000
75%        6.000000     0.000000   808.000000  1298.250000     1.000000
max        6.000000  1474.000000  2336.000000  6110.000000     5.000000
         HeatingQC   CentralAir   Electrical     1stFlrSF     2ndFlrSF  \\
count  1460.000000  1460.000000  1460.000000  1460.000000  1460.000000
mean      1.538356     0.934932     4.678767  1162.626712   346.992466
std       1.739524     0.246731     1.058385   386.587738   436.528436
min       0.000000     0.000000     0.000000   334.000000     0.000000
25%       0.000000     1.000000     5.000000   882.000000     0.000000
50%       0.000000     1.000000     5.000000  1087.000000     0.000000
75%       4.000000     1.000000     5.000000  1391.250000   728.000000
max       4.000000     1.000000     5.000000  4692.000000  2065.000000

       LowQualFinSF    GrLivArea  BsmtFullBath  BsmtHalfBath     FullBath
count   1460.000000  1460.000000   1460.000000   1460.000000  1460.000000
mean       5.844521  1515.463699      0.425342      0.057534     1.565068
std       48.623081   525.480383      0.518911      0.238753     0.550916
min        0.000000   334.000000      0.000000      0.000000     0.000000
25%        0.000000  1129.500000      0.000000      0.000000     1.000000
50%        0.000000  1464.000000      0.000000      0.000000     2.000000
75%        0.000000  1776.750000      1.000000      0.000000     2.000000
max      572.000000  5642.000000      3.000000      2.000000     3.000000
          HalfBath  BedroomAbvGr  KitchenAbvGr  KitchenQual  TotRmsAbvGrd  \\
count  1460.000000   1460.000000   1460.000000  1460.000000   1460.000000
mean      0.382877      2.866438      1.046575     2.339726      6.517808
std       0.502885      0.815778      0.220338     0.830161      1.625393
min       0.000000      0.000000      0.000000     0.000000      2.000000
25%       0.000000      2.000000      1.000000     2.000000      5.000000
50%       0.000000      3.000000      1.000000     3.000000      6.000000
75%       1.000000      3.000000      1.000000     3.000000      7.000000
max       2.000000      8.000000      3.000000     3.000000     14.000000

        Functional   Fireplaces  FireplaceQu   GarageType  GarageYrBlt
count  1460.000000  1460.000000  1460.000000  1460.000000  1460.000000
mean      5.749315     0.613014     1.969178     3.097260  1978.506164
std       0.979659     0.644666     2.037956     1.890815    23.994583
min       0.000000     0.000000     0.000000     0.000000  1900.000000
25%       6.000000     0.000000     0.000000     2.000000  1962.000000
50%       6.000000     1.000000     2.000000     2.000000  1978.506164
75%       6.000000     1.000000     3.000000     6.000000  2001.000000
max       6.000000     3.000000     5.000000     6.000000  2010.000000
       GarageFinish   GarageCars   GarageArea   GarageQual   GarageCond  \\
count   1460.000000  1460.000000  1460.000000  1460.000000  1460.000000
mean       2.062329     1.767123   472.980137     4.594521     4.628082
std        0.934939     0.747315   213.804841     1.262078     1.231595
min        0.000000     0.000000     0.000000     0.000000     0.000000
25%        1.000000     1.000000   334.500000     5.000000     5.000000
50%        2.000000     2.000000   480.000000     5.000000     5.000000
75%        3.000000     2.000000   576.000000     5.000000     5.000000
max        3.000000     4.000000  1418.000000     5.000000     5.000000

        PavedDrive   WoodDeckSF  OpenPorchSF  EnclosedPorch    3SsnPorch
count  1460.000000  1460.000000  1460.000000    1460.000000  1460.000000
mean      1.856164    94.244521    46.660274      21.954110     3.409589
std       0.496592   125.338794    66.256028      61.119149    29.317331
min       0.000000     0.000000     0.000000       0.000000     0.000000
25%       2.000000     0.000000     0.000000       0.000000     0.000000
50%       2.000000     0.000000    25.000000       0.000000     0.000000
75%       2.000000   168.000000    68.000000       0.000000     0.000000
max       2.000000   857.000000   547.000000     552.000000   508.000000
       ScreenPorch     PoolArea       PoolQC        Fence  MiscFeature  \\
count  1460.000000  1460.000000  1460.000000  1460.000000  1460.000000
mean     15.060959     2.758904     0.010274     0.467123     0.107534
std      55.757415    40.177307     0.158916     1.029191     0.555437
min       0.000000     0.000000     0.000000     0.000000     0.000000
25%       0.000000     0.000000     0.000000     0.000000     0.000000
50%       0.000000     0.000000     0.000000     0.000000     0.000000
75%       0.000000     0.000000     0.000000     0.000000     0.000000
max     480.000000   738.000000     3.000000     4.000000     4.000000

            MiscVal       MoSold       YrSold     SaleType  SaleCondition
count   1460.000000  1460.000000  1460.000000  1460.000000    1460.000000
mean      43.489041     6.321918  2007.815753     7.513014       3.770548
std      496.123024     2.703626     1.328095     1.552100       1.100854
min        0.000000     1.000000  2006.000000     0.000000       0.000000
25%        0.000000     5.000000  2007.000000     8.000000       4.000000
50%        0.000000     6.000000  2008.000000     8.000000       4.000000
75%        0.000000     8.000000  2009.000000     8.000000       4.000000
max    15500.000000    12.000000  2010.000000     8.000000       5.000000
           SalePrice
count    1460.000000
mean   180921.195890
std     79442.502883
min     34900.000000
25%    129975.000000
50%    163000.000000
75%    214000.000000
max    755000.000000
  • Check Correlation
# z-score standardization: (x - mean) / standard deviation
train_data_normed = (train_data- train_data.mean())/train_data.std()
train_data_normed

target_column = train_data['SalePrice']
train_data = train_data.drop('SalePrice',axis=1)
column_sets.pop(8)
# Check linear relationships with the target
for ele in column_sets:
    analysis = pd.merge(train_data_normed[ele], target_column,
                left_index = True, right_index=True)
    plt.figure(figsize=(16,16))
    sns.heatmap(analysis.corr(), linewidths=.5, cmap = 'Blues', annot=True)





for ele in column_sets:
    analysis = pd.merge(train_data_normed[ele], target_column,
                    left_index = True, right_index=True)
    sns.pairplot(analysis,x_vars=ele[:5],y_vars=["SalePrice"],hue="SalePrice")
    sns.pairplot(analysis,x_vars=ele[5:],y_vars=["SalePrice"],hue="SalePrice")
    plt.show()
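
As a numeric complement to the heatmaps and pair plots, the features can be ranked by the strength of their correlation with the target. The sketch below assumes a small made-up frame `toy`; it also illustrates that Pearson correlation does not change under z-score standardization, so the heatmap values would be the same on the raw columns:

```python
import pandas as pd

# Toy stand-in for train_data (made-up values)
toy = pd.DataFrame({
    "OverallQual": [5, 6, 7, 8, 9],
    "YrSold": [2008, 2006, 2010, 2007, 2009],
    "SalePrice": [120000, 150000, 200000, 260000, 330000],
})

features = toy.drop(columns="SalePrice")
# z-score standardization, as in the post
features_normed = (features - features.mean()) / features.std()

# Pearson correlation of every feature with the target
corr_raw = features.corrwith(toy["SalePrice"])
corr_normed = features_normed.corrwith(toy["SalePrice"])

# Rank features by absolute correlation, strongest first
ranking = corr_normed.abs().sort_values(ascending=False)
print(ranking)
```

A ranking like this only captures linear relationships, and for label-encoded columns the value depends on the arbitrary code order, so it is best read alongside the plots rather than in place of them.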
