🧑‍💻 [Python EDA 3] Matplotlib

김미연 · August 21, 2023
* Written with substantial reference to the course [실패 없이 완주하는 파이썬 데이터 분석 입문] ("Introduction to Python Data Analysis That You Can Finish Without Fail").

1. What is Matplotlib?

https://matplotlib.org/

  • A data visualization library for Python

2. How to use Matplotlib

1) Installation

# In a notebook cell, the leading ! runs a shell command
!pip install matplotlib

2) Import

import matplotlib.pyplot as plt

3. Passing data to Matplotlib

plt.plot([2, 3, 4, 5]) # a single list is treated as the y values (x defaults to 0, 1, 2, ...)
# tuples and NumPy arrays are accepted as well as lists
plt.show() # display the graph

plt.plot([1, 2, 3, 4], [1, 4, 9, 16]) # two lists are treated as the x and y values, respectively
plt.show()
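
The note above says that tuples and NumPy arrays work as input too. A minimal sketch of that, where the variable names (`y_tuple`, `x_arr`, `y_arr`) are placeholders chosen for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

# A tuple behaves like a list: it is treated as the y values,
# and x defaults to the indices 0, 1, 2, ...
y_tuple = (2, 3, 4, 5)
line_from_tuple, = plt.plot(y_tuple)

# NumPy arrays can be passed as x and y
x_arr = np.array([1, 2, 3, 4])
y_arr = x_arr ** 2
line_from_array, = plt.plot(x_arr, y_arr)

plt.show()
```

plot() returns a list of Line2D objects, which is why each result is unpacked with a trailing comma.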

4. Matplotlib axis labels

  • xlabel() : sets the label for the x-axis
  • ylabel() : sets the label for the y-axis
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.xlabel('X-Label')
plt.ylabel('Y-Label')
plt.show()

5. Matplotlib legend

  • Legend : text that identifies each data series in the graph
  • Call legend() to display the legend on the graph
  • Pass each series name through the label parameter of plot()
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], label = 'Square')
plt.xlabel('X-Label')
plt.ylabel('Y-Label')
plt.legend()

plt.show()

6. Matplotlib axis ranges

  • xlim() : sets the displayed x-axis range [xmin, xmax]
  • ylim() : sets the displayed y-axis range [ymin, ymax]
  • axis() : sets both ranges at once [xmin, xmax, ymin, ymax]
  • If no values are given, the range is chosen automatically to fit the data
plt.plot([1, 2, 3, 4], [3, 6, 9, 12])
plt.xlabel('X-Label')
plt.ylabel('Y-Label')
plt.xlim([0, 5])      
plt.ylim([0, 15])    

plt.show()

plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.axis([0, 6, 0, 20])

plt.show()
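
To see the automatic range mentioned above, one option is to call xlim() and ylim() with no arguments, which returns the current limits instead of setting them. A small sketch (the variable names are illustrative only):

```python
import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4], [3, 6, 9, 12])

# With no arguments, xlim() and ylim() return the current (automatic) limits
auto_xlim = plt.xlim()
auto_ylim = plt.ylim()
print('automatic:', auto_xlim, auto_ylim)

# Setting limits explicitly overrides the automatic range
plt.xlim([0, 5])
plt.ylim([0, 15])
manual_xlim = plt.xlim()
manual_ylim = plt.ylim()
print('manual:', manual_xlim, manual_ylim)

plt.show()
```

The automatic limits sit slightly outside the data because Matplotlib adds a small margin by default.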

7. Matplotlib line styles

  • Use a format string in plot()

    '-' (Solid), '--' (Dashed), ':' (Dotted), '-.' (Dash-dot)

  • Pass a value to the linestyle parameter of plot()

    solid, dashed, dotted, dashdot

  • Customize the line style with a tuple of the form (offset, (on, off, ...))

    (0, (1, 1)) [dotted], (0, (5, 5)) [dashed], (0, (3, 5, 1, 5)) [dashdotted]

# Use a format string in plot()
plt.plot([1, 2, 3], [4, 4, 4], '-', color='C0', label='Solid')
plt.plot([1, 2, 3], [3, 3, 3], '--', color='C0', label='Dashed')

# Pass a value to the linestyle parameter of plot()
plt.plot([1, 2, 3], [2, 2, 2], linestyle='dotted', color='C0', label='Dotted')
plt.plot([1, 2, 3], [1, 1, 1], linestyle='dashdot', color='C0', label='Dash-dot')

plt.xlabel('X-Label')
plt.ylabel('Y-Label')
plt.axis([0.8, 3.2, 0.5, 5.0])
plt.legend(loc='upper right', ncol=4)

plt.show()

# Customize the line style with a tuple
plt.plot([1, 2, 3], [4, 4, 4], linestyle=(0, (1, 1)), color='C0', label='(0, (1, 1))')
plt.plot([1, 2, 3], [3, 3, 3], linestyle=(0, (1, 5)), color='C0', label='(0, (1, 5))')
plt.plot([1, 2, 3], [2, 2, 2], linestyle=(0, (5, 1)), color='C0', label='(0, (5, 1))')
plt.plot([1, 2, 3], [1, 1, 1], linestyle=(0, (3, 5, 1, 5)), color='C0', label='(0, (3, 5, 1, 5))')

plt.xlabel('X-Label')
plt.ylabel('Y-Label')
plt.axis([0.8, 3.2, 0.5, 5.0])
plt.legend(loc='upper right', ncol=2)

plt.show()

8. Matplotlib markers

  • By default, plot() draws a solid line with no markers
  • Specify a marker with a plot() format string
    • 'ro' means red ('red') circle ('circle') markers
    • 'k^' means black ('black') triangle ('triangle') markers
  • Or pass a value to the marker parameter of plot()
    • 's' (square), 'D' (diamond), '$text$' (a marker drawn from the given characters)
plt.plot([4, 5, 6], "b")
plt.plot([3, 4, 5], "ro")
plt.plot([2, 3, 4], marker="s")
plt.plot([1, 2, 3], marker="D")
plt.plot([0, 1, 2], marker='$A$') # marker shaped like the letter A
plt.show()

9. Matplotlib colors

  • Specify the color with a plot() format string
  • Or pass a value to the color parameter of plot()
  • Many more named colors are available; see the list of named colors in the Matplotlib documentation
plt.plot([1, 2, 3, 4], [2.0, 3.0, 5.0, 10.0], 'r')
plt.plot([1, 2, 3, 4], [2.0, 2.8, 4.3, 6.5], color = 'violet')
plt.plot([1, 2, 3, 4], [2.0, 2.5, 3.3, 4.5], color = 'dodgerblue')

plt.xlabel('X-Label')
plt.ylabel('Y-Label')

plt.show()
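
Besides format strings and named colors, the color parameter also accepts hex strings and RGB tuples. The sketch below is an illustration rather than part of the original notes; it uses `to_hex()` from `matplotlib.colors` to show that all three notations resolve to ordinary colors:

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

x = [1, 2, 3, 4]

# The same color parameter accepts several notations
line_hex, = plt.plot(x, [2.0, 3.0, 5.0, 10.0], color='#1f77b4')       # hex string
line_rgb, = plt.plot(x, [2.0, 2.8, 4.3, 6.5], color=(1.0, 0.0, 0.0))  # RGB tuple, each value in 0-1
line_named, = plt.plot(x, [2.0, 2.5, 3.3, 4.5], color='violet')       # named color

# to_hex() converts any accepted notation to a hex string
for line in (line_hex, line_rgb, line_named):
    print(to_hex(line.get_color()))

plt.xlabel('X-Label')
plt.ylabel('Y-Label')

plt.show()
```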

10. Matplotlib title

  • Set the title with title()
  • Set its position with the loc parameter of title()

    loc parameter : {'left', 'center', 'right'}

  • Set the gap between the title and the graph (in points) with the pad parameter of title()
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.xlabel('X-Label')
plt.ylabel('Y-Label')
plt.title('Graph Title', loc='center', pad=20)

plt.show()
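
Each loc value addresses its own title slot, so left, center, and right titles can coexist on one graph. A short sketch with made-up title strings; `plt.gca()` is used only to read the titles back:

```python
import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4], [1, 4, 9, 16])

# Each loc value fills a separate title slot on the same axes
plt.title('Left Title', loc='left')
plt.title('Center Title', loc='center', pad=20)  # pad is measured in points
plt.title('Right Title', loc='right')

# gca() returns the current Axes, which can report each title back
ax = plt.gca()
print(ax.get_title(loc='left'), '|', ax.get_title(loc='center'), '|', ax.get_title(loc='right'))

plt.show()
```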

11. Matplotlib ticks

  • xticks() : sets the x-axis ticks
  • yticks() : sets the y-axis ticks
  • Pass the labels parameter of xticks() / yticks() to set the tick labels
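
A minimal sketch of the tick functions listed above. The tick positions and the 'Q1' to 'Q4' labels are invented for illustration:

```python
import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4], [1, 4, 9, 16])

# Tick positions only: the numbers themselves are used as labels
plt.yticks([0, 4, 8, 12, 16])

# Tick positions plus custom text via the labels parameter
plt.xticks([1, 2, 3, 4], labels=['Q1', 'Q2', 'Q3', 'Q4'])

# Read the x tick labels back from the current Axes
ax = plt.gca()
x_tick_labels = [t.get_text() for t in ax.get_xticklabels()]
print(x_tick_labels)

plt.show()
```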

12. Matplotlib bar chart

  • Draw a bar chart with bar()
  • Set the bar colors with the color parameter of bar()
  • Set the bar width with the width parameter of bar()
# years holds the years shown on the x-axis; values holds the bar heights (y values)
# xticks(x, years) : labels the x-axis ticks '2022', '2023', '2024' in order
# color and width configure the appearance of the bars

x = [1, 2, 3]
years = ['2022', '2023', '2024']
values = [300, 100, 700]

plt.bar(x, values, color=['r', 'g', 'b'], width=0.4)
#plt.bar(x, values, color=['r', 'g', 'b'], width=0.8)

plt.xticks(x, years)
plt.show()

13. Matplotlib scatter plot

  • Draw a scatter plot with scatter()
  • Set the marker colors with the c (or color) parameter of scatter()
  • Set the marker sizes with the s parameter of scatter() (area in points squared)
# generate random numbers with rand() from NumPy's random module
# s (size) and c (color) configure the scatter markers
import numpy as np

np.random.seed(0)

n = 50
x = np.random.rand(n)
y = np.random.rand(n)
size = (np.random.rand(n) * 20)**2
colors = np.random.rand(n)

plt.scatter(x, y, s=size, c=colors)
plt.show()

14. Matplotlib other chart types

  • matplotlib.pyplot.bar() : bar chart
  • matplotlib.pyplot.barh() : horizontal bar chart
  • matplotlib.pyplot.scatter() : scatter plot
  • matplotlib.pyplot.hist() : histogram
  • matplotlib.pyplot.errorbar() : error bars
  • matplotlib.pyplot.pie() : pie chart
  • matplotlib.pyplot.matshow() : heatmap (displays a 2D array as a matrix)
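
A quick sketch of three of the functions above: hist(), barh(), and pie(). The random sample and the years/values data are placeholders. `plt.figure()` starts a fresh figure so the charts do not overlap when this runs as a plain script instead of in a notebook:

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
data = np.random.randn(1000)  # hypothetical sample data

# Histogram: hist() returns the bin counts, the bin edges, and the drawn patches
counts, bin_edges, _ = plt.hist(data, bins=20)
plt.show()

years = ['2022', '2023', '2024']
values = [300, 100, 700]

# Horizontal bar chart: like bar(), with the categories on the y-axis
plt.figure()
plt.barh(years, values, color='dodgerblue')
plt.show()

# Pie chart: wedge sizes are normalized to fractions of the total
plt.figure()
plt.pie(values, labels=years, autopct='%1.1f%%')
plt.show()
```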

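15. Matplotlib: multiple graphs with subplot

  • subplot() divides the figure into a grid and draws one graph in each cell
  • plt.subplot(row, column, index), where index starts at 1
  • tight_layout() adjusts the padding between the figure edge and the subplot edges so that they do not overlap
import numpy as np

# sample data, assumed here for illustration (x1, y1, x2, y2 are not defined in this post)
x1 = np.linspace(0.0, 5.0)
x2 = np.linspace(0.0, 2.0)
y1 = np.cos(2 * np.pi * x1) * np.exp(-x1)
y2 = np.cos(2 * np.pi * x2)

# first subplot
# nrows=2, ncols=1, index=1
plt.subplot(2, 1, 1)
plt.plot(x1, y1, 'o-')
plt.title('1st Graph')

# second subplot
# nrows=2, ncols=1, index=2
plt.subplot(2, 1, 2)
plt.plot(x2, y2, '.-')
plt.title('2nd Graph')

plt.tight_layout()
plt.show()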

16. Matplotlib: multiple graphs with subplots

  • plt.subplots() can create several graphs at once

  • Its nrows and ncols parameters default to 1, so plt.subplots() means plt.subplots(nrows=1, ncols=1)

  • plt.subplots() returns a figure and axes

    • figure : the whole canvas
      - the overall container (and its size), no matter how many graphs it holds
    • axes : each individual graph within the figure
      ex) if the figure holds two graphs (a1, a2), the axes are a1 and a2
  • .twinx() creates a new Axes object that shares the x-axis with ax1

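# sample data, assumed here for illustration (x, y1, y2 are not defined in this post)
x = [1, 2, 3, 4, 5]
y1 = [1, 3, 7, 5, 9]
y2 = [200, 300, 800, 400, 700]

fig, ax1 = plt.subplots()

# '-s' = solid line style + square marker, alpha = opacity (0 is transparent, 1 is opaque)
ax1.plot(x, y1, '-s', color='green', markersize=7, linewidth=5, alpha=0.7)

# .twinx() creates a new Axes object that shares the x-axis with ax1
ax2 = ax1.twinx()
ax2.bar(x, y2, color='deeppink', alpha=0.7, width=0.7)

# pyplot-style alternative to the two lines above:
#plt.twinx()
#plt.bar(x, y2, color='deeppink', alpha=0.5)

plt.show()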
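
The figure/axes relationship described above becomes clearer with a grid. In this sketch (the data and titles are invented), plt.subplots(nrows=2, ncols=2) returns one Figure and a 2x2 NumPy array of Axes, and each graph is drawn by calling methods on its own Axes:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# fig is the single Figure; axes is a 2x2 NumPy array of Axes objects
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(6, 6))

axes[0, 0].plot(x, np.sin(x))
axes[0, 0].set_title('sin(x)')

axes[0, 1].plot(x, np.cos(x), color='violet')
axes[0, 1].set_title('cos(x)')

axes[1, 0].bar([1, 2, 3], [300, 100, 700])
axes[1, 0].set_title('bar')

axes[1, 1].scatter(x[::10], np.sin(x[::10]))
axes[1, 1].set_title('scatter')

fig.tight_layout()
plt.show()
```

With a single row or column, as in plt.subplots(nrows=2), the returned array is one-dimensional and is indexed as ax[0], ax[1].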

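import seaborn as sns

# df_ex2 is assumed to be a Titanic-style DataFrame with 'Sex', 'Survived', and 'Pclass' columns
# (it is not defined in this post)
fig, ax = plt.subplots(nrows=2, figsize=(5, 5), constrained_layout=True)

# top panel: passenger counts by sex, split by survival
sns.countplot(data=df_ex2, y='Sex', hue='Survived', palette='Set1', ax=ax[0])
ax[0].legend(labels=['dead', 'survivors'])
ax[0].set(xlabel='', ylabel='', title='Number of dead & survivors')

# bottom panel: mean of 'Survived' (the survival rate) by class and sex
# errorbar=None hides the error bars (requires seaborn 0.12 or later)
sns.barplot(data=df_ex2, x='Pclass', y='Survived', hue='Sex', palette='Set1', ax=ax[1], errorbar=None)
ax[1].legend(title='')
ax[1].set(title='Survival rate', ylabel='')

plt.show()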
