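Using living-population (생활인구) data that includes the 법정동 (legal-dong), create age-group and gender columns, then examine the distribution of the living population by age group and by gender for a specific area.
# Age groups: male + female counts for each ten-year band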
select_loc['10대'] = select_loc['m10'] + select_loc['f10']
select_loc['20대'] = select_loc['m20'] + select_loc['f20']
select_loc['30대'] = select_loc['m30'] + select_loc['f30']
select_loc['40대'] = select_loc['m40'] + select_loc['f40']
select_loc['50대'] = select_loc['m50'] + select_loc['f50']
select_loc['60대'] = select_loc['m60'] + select_loc['f60']
select_loc['70대'] = select_loc['m70'] + select_loc['f70']
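# Gender totals. iloc is positional, so this assumes columns 4-10 hold f10..f70 and columns 11-17 hold m10..m70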
select_loc['여성'] = select_loc.iloc[:,4:11].sum(axis=1)
select_loc['남성'] = select_loc.iloc[:,11:18].sum(axis=1)
gen_agg = select_loc.groupby('법정동')[['여성','남성']].agg(['mean','sum']).reset_index()
gen_agg
# output
법정동 여성(mean) 여성(sum) 남성(mean) 남성(sum)
0 당산동 17.404937 2584650.55 19.108124 2837575.54
1 당산동1가 30.739663 1859380.75 33.834365 2046573.07
2 당산동2가 26.394831 1235278.10 24.996037 1169814.55
3 당산동3가 30.969687 3162686.33 33.308881 3401569.57
4 당산동4가 45.533573 3868623.39 54.115329 4597746.60
5 당산동5가 30.401843 2500369.16 35.958794 2957395.02
6 당산동6가 39.533652 1878204.26 42.832377 2034923.42
7 대림동 29.375065 16829885.27 31.079543 17806433.92
8 도림동 24.259191 4035152.61 26.439045 4397738.63
9 문래동1가 25.679849 1224286.82 25.551138 1218150.49
10 문래동2가 14.732221 594003.14 13.029197 525337.21
11 문래동3가 25.510510 4913706.89 28.013079 5395739.17
12 문래동4가 18.566977 909039.17 17.839680 873430.71
13 문래동5가 17.767139 1273388.63 17.608673 1262031.17
14 문래동6가 27.349650 2316405.96 29.102588 2464872.78
15 신길동 22.338304 22728799.94 25.166831 25606772.14
16 양평동 8.547349 885838.69 8.454930 876260.53
17 양평동1가 28.300617 1885104.11 29.492957 1964525.88
18 양평동2가 27.219579 1368001.60 29.711336 1493232.32
19 양평동3가 24.889915 3231980.36 25.985540 3374248.29
20 양평동4가 27.103771 2841965.89 26.792024 2809277.72
21 양평동5가 31.103663 1993122.75 31.365771 2009918.62
22 양평동6가 24.762714 891457.72 23.403869 842539.29
23 양화동 5.292033 1105807.44 4.361001 911261.77
24 여의도동 26.672933 29676572.38 23.154221 25761618.17
25 영등포동 21.884118 3688283.52 23.681413 3991194.30
26 영등포동1가 56.608654 1263505.15 46.869292 1046122.60
27 영등포동2가 16.775238 1382598.33 14.239823 1173631.97
28 영등포동3가 38.555388 1331587.45 34.345339 1186184.99
29 영등포동4가 35.329965 1859946.03 45.929269 2417946.36
30 영등포동5가 21.485027 603299.55 19.696241 553070.45
31 영등포동6가 40.605435 1052492.87 38.897074 1008212.15
32 영등포동7가 26.469895 2118994.52 24.641619 1972635.49
33 영등포동8가 87.697098 3914798.46 101.314230 4522667.22
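The gender totals above depend on column positions, which break silently if the column order changes. A minimal sketch of the same steps with name-based selection, run on a toy frame: `toy` and its values are made up for illustration, and the `f10`..`f70` / `m10`..`m70` naming is assumed from the age-group code.

```python
import pandas as pd

ages = range(10, 80, 10)
female_cols = [f"f{a}" for a in ages]  # f10 .. f70
male_cols = [f"m{a}" for a in ages]    # m10 .. m70

# Toy stand-in for select_loc (hypothetical values)
data = {"법정동": ["A", "A", "B"]}
data.update({c: [1.0, 2.0, 3.0] for c in female_cols})
data.update({c: [2.0, 4.0, 6.0] for c in male_cols})
toy = pd.DataFrame(data)

# Age-group totals: male + female per ten-year band
for a in ages:
    toy[f"{a}대"] = toy[f"m{a}"] + toy[f"f{a}"]

# Gender totals selected by column name instead of by position
toy["여성"] = toy[female_cols].sum(axis=1)
toy["남성"] = toy[male_cols].sum(axis=1)

toy_agg = toy.groupby("법정동")[["여성", "남성"]].agg(["mean", "sum"])
print(toy_agg)
```

Selecting by name keeps the result correct even after new columns are added or reordered, at the cost of assuming the naming scheme holds for every band.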
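# Reshape to long format: one row per (법정동, gender) with 'mean' and 'sum' columns.
# After stack/reset_index the gender label lands in the auto-named column 'level_1'.
gen_male = gen_agg.set_index('법정동')[['남성']].stack(level=0).reset_index()
gen_female = gen_agg.set_index('법정동')[['여성']].stack(level=0).reset_index()
# Outer merge on every shared column: no rows match across genders, so this stacks the two frames
level_df = gen_female.merge(gen_male, on=['법정동','level_1','mean','sum'],how='outer')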
level_df
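Because the female and male rows never share the same key values, the outer merge works here as a row-wise concatenation. A sketch of the same wide-to-long reshaping with `pd.concat`, on toy data: the frame `toy`, its values, and the column name `group` are hypothetical.

```python
import pandas as pd

# Toy per-row gender totals (hypothetical values)
toy = pd.DataFrame({
    "법정동": ["A", "A", "B", "B"],
    "여성": [1.0, 3.0, 2.0, 6.0],
    "남성": [2.0, 4.0, 1.0, 3.0],
})
agg = toy.groupby("법정동")[["여성", "남성"]].agg(["mean", "sum"])

parts = []
for g in ["여성", "남성"]:
    part = agg[g].reset_index()   # columns: 법정동, mean, sum
    part.insert(1, "group", g)    # label each row with its gender
    parts.append(part)

# Stack the per-gender frames on top of each other
long_df = pd.concat(parts, ignore_index=True)
print(long_df)
```

The same loop extends to the age-group columns by iterating over their names instead of the two gender labels.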
histplot
def gen(x):
    # Map the Korean gender label to an English one for the plot legend
    return 'Female' if x == '여성' else 'Male'
level_df['level_1'] = level_df['level_1'].map(gen)
plt.figure(figsize=(15,5))
sns.histplot(level_df, x='mean', hue='level_1', kde=True)
plt.legend(title='gender',labels=level_df.level_1.unique().tolist()[::-1])
plt.savefig('hist_plot.png')
The distributions of mean living population are similar for women and men.
age_df = select_loc.groupby('법정동')[['10대','20대','30대','40대','50대','60대','70대']].agg(['mean','sum']).reset_index()
age_df.head(2)
법정동 10대(mean) 10대(sum) 20대(mean) 20대(sum) 30대(mean) 30대(sum) 40대(mean) 40대(sum) 50대(mean) 50대(sum) 60대(mean) 60대(sum) 70대(mean) 70대(sum)
0 당산동 2.261163 335784.93 8.678359 1288744.93 9.179425 1363153.73 6.345295 942282.68 4.845142 719508.37 3.189747 473680.55 2.013932 299070.9
1 당산동1가 3.306034 199975.37 10.888741 658638.15 15.119263 914533.96 10.742620 649799.57 10.846424 656078.47 8.452030 511246.40 5.218918 315681.9
age_10 = age_df.set_index('법정동')[['10대']].stack(level=0).reset_index()
# Append the remaining age groups (20대 through 70대) in the same long format
for i in range(20,80,10):
    age_range = age_df.set_index('법정동')[[f'{i}대']].stack(level=0).reset_index()
    age_10 = age_10.merge(age_range, on=['법정동','level_1','mean','sum'],how='outer')
histplot
age_10['level_1'] = age_10['level_1'].str.replace('대','')
plt.figure(figsize=(15,5))
sns.histplot(age_10, x='sum', hue='level_1', kde=True)
plt.savefig('hist_plot_age.png')
target_age = age_10[age_10['level_1'].isin(['30','40'])]
plt.figure(figsize=(15,5))
sns.histplot(target_age, x='sum', hue='level_1', kde=True)
plt.savefig('hist_plot_age3040.png')
target_age['mean'].std()
# output
6.9572093072979495
mode / median / mean
round(target_age['mean']).mode()[0],target_age['mean'].median(),target_age['mean'].mean()
# output
(9.0, 11.410605288489172, 12.355457334040697)
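The standard deviation and the mode / median / mean triple are computed the same way for each pair of age groups, so they can be wrapped in one helper. A minimal sketch follows: the function name `summarize` and the toy values are hypothetical. Note that pandas `Series.std()` returns the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

def summarize(values: pd.Series) -> dict:
    """Return spread and central-tendency figures for a numeric Series."""
    return {
        "std": values.std(),               # sample std (ddof=1), the pandas default
        "mode": values.round().mode()[0],  # mode of the values rounded to integers
        "median": values.median(),
        "mean": values.mean(),
    }

# Toy per-dong means (hypothetical values)
toy_means = pd.Series([1.2, 0.9, 3.0, 5.0])
stats = summarize(toy_means)
print(stats)
```

With the notebook's data, the call would be `summarize(target_age['mean'])` after each `target_age` filter.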
target_age = age_10[age_10['level_1'].isin(['10','70'])]
plt.figure(figsize=(15,5))
sns.histplot(target_age, x='sum', hue='level_1',palette='Accent', kde=True)
plt.savefig('hist_plot_age1070.png')
target_age['mean'].std()
# output
2.27766613014633
mode / median / mean
round(target_age['mean']).mode()[0],target_age['mean'].median(),target_age['mean'].mean()
# output
(4.0, 3.404662687708904, 3.6857111421246325)
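People in their 10s and 70s had the lowest mean living population (overall mean about 3.69), and the two groups were similarly distributed. People in their 30s and 40s had the highest mean living population (overall mean about 12.36); their distributions were also similar to each other, with a larger standard deviation (about 6.96 versus 2.28).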
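The comparison above looks at two pairs of age groups. One way to check the ranking across all groups at once is to group the long-format frame by age label. This is a sketch on toy data, assuming the same `level_1` and `mean` columns as `age_10`; the values are made up.

```python
import pandas as pd

# Toy long-format frame mimicking the columns of age_10 (hypothetical values)
toy_long = pd.DataFrame({
    "법정동": ["A", "B", "A", "B", "A", "B"],
    "level_1": ["10", "10", "30", "30", "70", "70"],
    "mean": [2.0, 4.0, 9.0, 15.0, 2.0, 3.0],
})

# Mean and sample standard deviation of the per-dong means, per age group,
# ordered from the lowest to the highest mean
by_age = (
    toy_long.groupby("level_1")["mean"]
    .agg(["mean", "std"])
    .sort_values("mean")
)
print(by_age)
```

Applied to `age_10`, the first and last rows of such a table would show which age groups have the lowest and highest mean living population, along with their spread.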