Day 7 Lecture: 🚴 Ttareungi Data Analysis and Visualization

Luis_J · September 27, 2024

MS_AI_School, 5th cohort


Summary

  1. Data analysis and visualization in Python can be surprisingly easy once you know how to use the libraries.
  2. Problem definition, data collection, and data processing can actually be the harder part.

Introduction

Instructor: Kim Su-jeong
This post covers how to analyze and visualize Seoul's Ttareungi public bike data with Python. It walks through the whole process: data collection, processing, analysis, and visualization.

📌 Table of Contents
1. Problem Definition
2. Data Collection
3. Data Processing
4. Data Analysis
5. Data Visualization

Code, Concept & Explanation

1️⃣ Problem Definition

Seoul's public bike system, Ttareungi, publishes data covering a wide range of information. We will use it to answer the following questions:

  • How does usage vary over time?
  • How does usage vary with the characteristics of a location?
  • How does usage vary when time and location are considered together?

Main Analysis Topics

Time: usage patterns by day of the week and by hour of the day.
Location: popular rental stations by district and average ride duration.
Time and location combined: usage patterns for specific hours at specific locations.

2️⃣ Data Collection

Required Data

  • Seoul public bike rental history data
  • Latitude and longitude of each rental station

Reading the Data

import pandas as pd

# Read the CSV files bike_rent_1.csv through bike_rent_6.csv
data_files = [f"bike_rent_{i}.csv" for i in range(1, 7)]
dataframes = [pd.read_csv(file) for file in data_files]

# Concatenate them into a single DataFrame with a fresh index
bikes = pd.concat(dataframes, ignore_index=True)
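Public CSV exports in Korea may be encoded as CP949 rather than UTF-8, in which case `pd.read_csv` raises `UnicodeDecodeError` with its default settings. The following is a minimal sketch of a loader that tries several encodings in turn. The helper name `read_csv_any_encoding` and the list of encodings are assumptions for illustration, not part of the lecture code.

```python
import pandas as pd


def read_csv_any_encoding(path, encodings=("utf-8", "cp949")):
    """Try each encoding in order and return the first DataFrame that decodes."""
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding)
        except UnicodeDecodeError as error:
            last_error = error  # remember the failure and try the next encoding
    # Every encoding failed, so surface the last decoding error.
    raise last_error
```

If the files turn out to need it, the list comprehension above could call `read_csv_any_encoding(file)` in place of `pd.read_csv(file)`.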

Checking the Data

# Show column names, dtypes, and non-null counts (info() prints directly and returns None)
bikes.info()
# Preview the first five rows
print(bikes.head())

3️⃣ Data Processing

Adding the Required Columns

  • Add date, hour, day-of-week, and weekend/weekday columns to extract the information needed for the analysis.

# Convert '대여일시' (rental timestamp) to datetime
bikes['대여일시'] = pd.to_datetime(bikes['대여일시'])

# Add '일자' (date), '요일' (day of week), and '시간대' (hour of day)
bikes['일자'] = bikes['대여일시'].dt.date
bikes['요일'] = bikes['대여일시'].dt.day_name()
bikes['시간대'] = bikes['대여일시'].dt.hour
# '주말구분' (weekend flag): '주말' = weekend, '평일' = weekday
bikes['주말구분'] = bikes['요일'].apply(lambda x: '주말' if x in ['Saturday', 'Sunday'] else '평일')
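The weekend flag can also be computed without a per-row `lambda`. The sketch below uses `dt.dayofweek` with `numpy.where` on a tiny synthetic sample; the sample rows are made up for illustration, and only the `대여일시` and `주말구분` column names come from the post.

```python
import numpy as np
import pandas as pd

# Tiny synthetic sample; the real data comes from the rental history files.
sample = pd.DataFrame({
    "대여일시": pd.to_datetime([
        "2024-09-27 08:15:00",  # Friday
        "2024-09-28 14:30:00",  # Saturday
        "2024-09-29 19:05:00",  # Sunday
    ])
})

# dt.dayofweek counts Monday as 0, so 5 and 6 are Saturday and Sunday.
is_weekend = sample["대여일시"].dt.dayofweek >= 5
sample["주말구분"] = np.where(is_weekend, "주말", "평일")
print(sample)
```

On a large rental history this kind of vectorized comparison usually runs faster than `apply`, though both give the same labels.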

4️⃣ Data Analysis

Rentals by Day of the Week

# Number of rentals per day of the week (sorted by count, largest first)
weekday_counts = bikes['요일'].value_counts()
print(weekday_counts)
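`value_counts()` orders its result by count, not by calendar order, which makes a Monday-to-Sunday comparison harder to read. A minimal sketch of reordering the counts with `reindex`, using a made-up Series of day names in place of `bikes['요일']`:

```python
import pandas as pd

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Made-up day names standing in for the '요일' column.
sample_days = pd.Series(["Friday", "Monday", "Friday", "Sunday", "Monday", "Friday"])

counts = sample_days.value_counts()
# Put the counts in calendar order; days with no rentals become 0.
ordered = counts.reindex(day_order, fill_value=0)
print(ordered)
```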

Usage Pattern by Hour

# Number of rentals per hour, ordered from the earliest hour to the latest
hourly_counts = bikes['시간대'].value_counts().sort_index()
print(hourly_counts)
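The location and combined time-and-location questions from the problem definition can be approached with the same tools. The sketch below is one possible way to do it, not the lecture's code: the `대여소명` (station name) and `이용시간` (ride duration in minutes) columns and all the sample values are assumptions, so the names would need to be adjusted to the real files.

```python
import pandas as pd

# Made-up sample rows; '대여소명' and '이용시간' are hypothetical column names.
sample = pd.DataFrame({
    "대여소명": ["A", "A", "B", "A", "B", "C"],
    "이용시간": [10, 20, 30, 6, 14, 50],
    "시간대": [8, 8, 18, 8, 18, 13],
})

# Location: number of rentals and mean ride duration per station, busiest first.
station_stats = (
    sample.groupby("대여소명")
    .agg(rentals=("이용시간", "size"), mean_minutes=("이용시간", "mean"))
    .sort_values("rentals", ascending=False)
)

# Time and location combined: rentals per station for each hour of the day.
station_by_hour = pd.crosstab(sample["대여소명"], sample["시간대"])

print(station_stats)
print(station_by_hour)
```

The `crosstab` result has one row per station and one column per hour, so it can be passed directly to a heatmap if a visual comparison is wanted.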

5️⃣ Data Visualization

Visualization with Seaborn

1. Rentals by Day of the Week

import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
# Fix the bar order to Monday through Sunday
sns.countplot(data=bikes, x='요일', order=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'])
plt.title('Rentals by Day of the Week')
plt.show()

2. Rental Pattern by Hour

plt.figure(figsize=(10, 6))
# hourly_counts is the Series computed in the analysis step
sns.barplot(x=hourly_counts.index, y=hourly_counts.values)
plt.title('Rentals by Hour')
plt.xlabel('Hour of day')
plt.ylabel('Rentals')
plt.show()

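3. Map Visualization

import folium

# Create a map centered on Seoul
seoul_map = folium.Map(location=[37.5665, 126.9780], zoom_start=11)

# Assumes '위도' (latitude) and '경도' (longitude) have already been merged into bikes.
# Draw each distinct coordinate once instead of one marker per rental row.
stations = bikes[['위도', '경도']].dropna().drop_duplicates()
for lat, lon in stations.itertuples(index=False):
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        color='blue',
        fill=True
    ).add_to(seoul_map)

# Save the map as an HTML file
seoul_map.save("seoul_bike_map.html")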
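The marker loop expects `위도` and `경도` columns, which the rental history on its own may not contain, since the station coordinates are listed as a separate data source. The sketch below shows one way the two could be combined before mapping. The `대여소번호` (station ID) key, the `대여건수` (rental count) column, and all the sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical miniature tables; real column names may differ.
rentals = pd.DataFrame({"대여소번호": [101, 101, 102, 103, 101]})
stations = pd.DataFrame({
    "대여소번호": [101, 102, 104],
    "위도": [37.5665, 37.5700, 37.5800],
    "경도": [126.9780, 126.9900, 127.0000],
})

# Count rentals per station, then attach coordinates with a left join.
station_counts = (
    rentals.groupby("대여소번호").size().rename("대여건수").reset_index()
    .merge(stations, on="대여소번호", how="left")
)

# Stations without coordinates cannot be drawn, so set them aside for review.
missing = station_counts[station_counts["위도"].isna()]
mappable = station_counts.dropna(subset=["위도", "경도"])

print(mappable)
print(missing)
```

Each row of `mappable` is one station, so the map needs only one marker per station, and `대여건수` could be used to scale the marker radius.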

Challenges & Solutions

1. Key Methods That Make Data Analysis Easier

Pandas (data processing and analysis)

1. Reading and Writing Data

  • read_csv(), to_csv():
import pandas as pd
df = pd.read_csv('data.csv')  # read data
df.to_csv('output.csv', index=False)  # save data without the index column

2. Exploring Data

  • head(), info(), describe():
print(df.head())      # first five rows
df.info()             # structure and dtypes (info() prints directly)
print(df.describe())  # summary statistics for numeric columns

3. Selecting Data

  • iloc[], loc[]:
print(df.iloc[0])  # select the first row by position
print(df.loc[df['column_name'] > 10])  # select rows that satisfy a condition

4. Sorting Data

  • sort_values(), sort_index():
df_sorted = df.sort_values(by='column_name', ascending=False)  # sort by a column, descending
df_by_index = df.sort_index()  # sort by the index labels

5. Handling Missing Values

  • isnull(), fillna():
print(df.isnull().sum())  # count missing values per column
df['column_name'] = df['column_name'].fillna(0)  # fill missing values with 0

6. Grouping

  • groupby():
grouped = df.groupby('column_name').mean(numeric_only=True)  # per-group mean of the numeric columns
print(grouped)

Matplotlib & Seaborn (data visualization)

1. Drawing a Basic Plot

  • plot() (Matplotlib):
import matplotlib.pyplot as plt
df['column_name'].plot(kind='line')
plt.show()

2. Enhancing the Visualization

  • barplot() (Seaborn):
import seaborn as sns
sns.barplot(x='요일', y='이용건수', data=df)  # '요일' = day of week, '이용건수' = number of rides
plt.show()

Results

What I Learned & Insights

If you know the libraries, cleaning and analyzing data in Python can be surprisingly easy.
Finding the data you need and getting it into shape can actually be the harder part.

Python data analysis hinges on how well you use the libraries. Once you understand the methods and apply them well, even a complex analysis can be handled with simple code.

It is also important to be able to define the problem according to the data and to choose suitable methods.

✨ Conclusion

Analyzing Seoul's Ttareungi data let us examine how usage varies with time and place. Data like this can serve as an important reference for improving urban transport and expanding cycling infrastructure.
