visualization - Matplotlib, Seaborn

Nam Eun-Ji·2021년 1월 20일

항상 데이터를 보고 시각화하는 것을 배워보고 싶었는데, 마침 노드에서 진행되었다👍

Matplotlib

pip install matplotlib

import

import matplotlib.pyplot as plt
%matplotlib inline  # notebook을 실행한 브라우저에서 바로 그림을 볼 수 있게 해주는 것

IPython 매직명령어 참고

축그리기

.figure() : 하나의 그림(figure)객체 생성
.add_subplot(행,열,위치) : 화면에 표시될 그래프 수

fig = plt.figure()
ax1 = fig.add_subplot(1,1,1)

fig = plt.figure()
ax1 = fig.add_subplot(1,2,1)  # 1*2 행렬에 1번째
ax2 = fig.add_subplot(1,2,2)  # 1*2 행렬에 2번째

fig, axes = plt.subplots(2, 1)
print(fig)   # figure객체가 자동으로 생성된 것을 볼 수 있다.
print(axes)  # 각 그래프에 인덱스로 접근할 수 있다.

막대그래프

.bar(x,y) : x,y순으로 데이터를 넣어준다.

#그래프 데이터 
subject = ['English', 'Math', 'Korean', 'Science', 'Computer']
points = [40, 90, 50, 60, 100]

# 그래프 그리기
ax1.bar(subject, points)

라벨, 타이틀 달기

plt.xlabel('Subject')
plt.ylabel('Points')
plt.title("Tim's Test Result")

화면 출력

plt.savefig('./barplot.png')  # 그래프를 이미지로 출력
plt.show()                    # 그래프를 화면으로 출력

plt.plot() - 막대/선 그래프

plt.plot() : 가장 최근의 figure객체와 그 subplot을 그린다. 만약 subplot이 없으면 subplot 하나를 생성한다.

인자 - x데이터, y데이터, 마커옵션, 색상 등

import os
from datetime import datetime

import pandas as pd
import matplotlib.pyplot as plt

#그래프 데이터 
csv_path = os.getenv("HOME") + "/aiffel/data_represent/data/AMZN.csv"
data = pd.read_csv(csv_path ,index_col=0, parse_dates=True)
price = data['Close']  # pandas의 series

# 축 그리기 및 좌표축 설정
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
price.plot(ax=ax, style='black')  # pandas의 plot에서 matplotlib에서 정의한 subplot 공간 ax를 사용.
plt.ylim([1600,2200])  # y좌표의 범위 설정
plt.xlim(['2019-05-01','2020-03-01'])  # x좌표의 범위 설정

# 주석달기
important_data = [(datetime(2019, 6, 3), "Low Price"),(datetime(2020, 2, 19), "Peak Price")]
for d, label in important_data:
    ax.annotate(label, xy=(d, price.asof(d)+10), 
                xytext=(d,price.asof(d)+100),
                arrowprops=dict(facecolor='red'))

# 그리드, 타이틀 달기
plt.grid()  # 격자눈금 추가
ax.set_title('StockPrice')

# 보여주기
plt.show()

import numpy as np

x = np.linspace(0, 10, 100)  # 배열생성(0부터 10까지 100개간격으로)
plt.plot(x, np.sin(x),'o')
plt.plot(x, np.cos(x),'--', color='black') 
plt.show()

x = np.linspace(0, 10, 100) 

plt.subplot(2,1,1)
plt.plot(x, np.sin(x),'orange','o')

plt.subplot(2,1,2)
plt.plot(x, np.cos(x), 'orange') 
plt.show()

linestyle, marker옵션

x = np.linspace(0, 10, 100) 

plt.plot(x, x + 0, linestyle='solid') 
plt.plot(x, x + 1, linestyle='dashed') 
plt.plot(x, x + 2, linestyle='dashdot') 
plt.plot(x, x + 3, linestyle='dotted')
plt.plot(x, x + 0, '-g') # solid green 
plt.plot(x, x + 1, '--c') # dashed cyan 
plt.plot(x, x + 2, '-.k') # dashdot black 
plt.plot(x, x + 3, ':r'); # dotted red
plt.plot(x, x + 4, linestyle='-') # solid 
plt.plot(x, x + 5, linestyle='--') # dashed 
plt.plot(x, x + 6, linestyle='-.') # dashdot 
plt.plot(x, x + 7, linestyle=':'); # dotted

pandas + 막대그래프 : kind='bar'옵션 추가

fig, axes = plt.subplots(2, 1)
data = pd.Series(np.random.rand(5), index=list('abcde'))
data.plot(kind='bar', ax=axes[0], color='blue', alpha=1)
data.plot(kind='barh', ax=axes[1], color='red', alpha=0.3)

pandas + 선그래프 : kind='line'옵션 추가

df = pd.DataFrame(np.random.rand(6,4), columns=pd.Index(['A','B','C','D']))
df.plot(kind='line')

with Pandas

pandas.plot메서드 인자

메소드	내용
label	그래프의 범례이름
ax	그래프를 그릴 matplotlib의 서브플롯 객체
style	matplotlib에 전달할 'ko--'같은 스타일의 문자열
alpha	투명도 (0 ~1)
kind	그래프의 종류: line, bar, barh, kde
logy	Y축에 대한 로그스케일
use_index	객체의 색인을 눈금 이름으로 사용할지의 여부
rot	눈금 이름을 로테이션(0 ~ 360)
xticks, yticks	x축, y축으로 사용할 값
xlim, ylim	x축, y축 한계
grid	축의 그리드 표시할 지 여부

pandas의 data가 DataFrame일때 plot 메서드 인자

메소드	내용
subplots	각 DataFrame의 칼럼을 독립된 서브플롯에 그린다.
sharex	subplots=True면 같은 X축을 공유하고 눈금과 한계를 연결한다.
sharey	subplots=True면 같은 Y축을 공유한다.
figsize	그래프의 크기, 튜플로 지정
title	그래프의 제목을 문자열로 지정
sort_columns	칼럼을 알파벳 순서로 그린다.

Seaborn

pip install seaborn

Seaborn + pandas

# sex에 따른 tip평균
sns.barplot(data=df, x='sex', y='tip')

Seaborn + Matplot : 그래프에 다양한 옵션 추가

plt.figure(figsize=(10,6))
sns.barplot(data=df, x='sex', y='tip')
plt.ylim(0, 4)
plt.title('Tip by sex')

sns.violinplot()

fig = plt.figure()
ax3 = fig.add_subplot(1,1,1)
sns.violinplot(data=df, x='day', y='tip',palette="ch:.25")

sns.catplot()

sns.catplot(x="day", y="tip", jitter=False, data=tips)

sns.scatterplot() : 산점도

# day에 따른 tip과 total_bill의 관계
sns.scatterplot(data=df , x='total_bill', y='tip', hue='day')

sns.lineplot() : 선그래프

sns.lineplot(x, np.sin(x))
sns.lineplot(x, np.cos(x))

# 연도의 각 월별 탑승객 수
csv_path = os.getenv("HOME") + "/aiffel/data_represent/data/flights.csv"
data = pd.read_csv(csv_path)
flights = pd.DataFrame(data)

sns.barplot(data=flights, x='year', y='passengers')

Nam Eun-Ji

한 줄 소개가 자연스러워지는 그날까지

이전 포스트

python comprehension

다음 포스트