Visualization with Seaborn

노정훈 · August 8, 2023

  • Seaborn provides an API on top of Matplotlib that offers sane choices for plot style and color defaults, defines simple high-level functions for common statistical plot types, and integrates with the functionality provided by Pandas.
  • By convention, Seaborn is imported as sns.
# In[1]
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

sns.set_theme()  # apply Seaborn's default chart style (sns.set() is an alias)

Exploring Seaborn Plots

  • The main idea of Seaborn is that it provides high-level commands to create a variety of plot types useful for statistical data exploration, and even some statistical model fitting.
  • All of the following could be done using raw Matplotlib commands, but the Seaborn API is much more convenient.

Histograms, KDE, and Densities

  • Often in statistical data visualization, all you want is to plot histograms and joint distributions of variables.
# In[2]
data=np.random.multivariate_normal([0,0],[[5,2],[2,2]],size=2000)
data=pd.DataFrame(data,columns=['x','y'])

for col in 'xy':
    plt.hist(data[col],density=True,alpha=0.5)

  • Rather than just providing a histogram as a visual output, we can get a smooth estimate of the distribution using kernel density estimation, which Seaborn does with sns.kdeplot.
# In[3]
sns.kdeplot(data=data,fill=True);  # fill replaces the deprecated shade parameter
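
  • To make the idea behind kernel density estimation concrete, here is a minimal NumPy sketch of a one-dimensional Gaussian KDE. The helper name gaussian_kde_1d, the synthetic samples, and the choice of Scott's rule for the bandwidth are assumptions of this sketch, not a description of Seaborn's internals.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Hypothetical helper: average of Gaussian bumps centred on each sample."""
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Mean over samples, rescaled so the curve integrates to 1
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=500)  # synthetic data
grid = np.linspace(-12, 12, 1001)

# Scott's rule of thumb for the bandwidth (an assumption of this sketch)
bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)

density = gaussian_kde_1d(samples, grid, bandwidth)
area = density.sum() * (grid[1] - grid[0])  # Riemann sum, should be close to 1
```

  • A smaller bandwidth gives a bumpier curve and a larger one a smoother curve, which is the trade-off a KDE plot hides behind its defaults.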

  • If we pass x and y columns to kdeplot, we instead get a two-dimensional visualization of the joint density.
# In[4]
sns.kdeplot(data=data,x='x',y='y');
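
  • The tilt of the two-dimensional density comes from the covariance used to generate the data. As a rough numeric check (a sketch with its own seeded generator, so the numbers differ slightly from the cell above), the sample covariance should be close to [[5,2],[2,2]] and the correlation close to 2/sqrt(5*2), about 0.63.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded only to make this sketch repeatable
data = rng.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000)
data = pd.DataFrame(data, columns=['x', 'y'])

cov = data.cov()                  # 2x2 sample covariance matrix
corr = data['x'].corr(data['y'])  # Pearson correlation between x and y
```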

  • We can see the joint distribution and the marginal distribution together using sns.jointplot, which we'll explore further later in this chapter.

Pair Plots

  • When you generalize joint plots to datasets of larger dimensions, you end up with pair plots.
  • These are very useful for exploring correlations in multidimensional data, when you'd like to plot all pairs of variables against each other.

  • We'll demo this with the Iris dataset, which lists measurements of petals and sepals of three Iris species.
# In[5]
iris=sns.load_dataset('iris')
iris.head()
# Out[5]
  sepal_length	sepal_width	petal_length	petal_width	species
0	       5.1	        3.5	         1.4	        0.2	 setosa
1	       4.9	        3.0	         1.4	        0.2	 setosa
2	       4.7	        3.2	         1.3	        0.2	 setosa
3	       4.6	        3.1	         1.5	        0.2	 setosa
4	       5.0	        3.6	         1.4	        0.2	 setosa
  • Visualizing the multidimensional relationships among the samples is as easy as calling sns.pairplot.
# In[6]
sns.pairplot(iris,hue='species',height=2.5);

Faceted Histograms

  • Sometimes the best way to view data is via histograms of subsets.
  • Seaborn's FacetGrid makes this simple. We'll take a look at some data that shows the amount that restaurant staff receive in tips based on various indicator data.
# In[7]
tips=sns.load_dataset('tips')
tips.head()
# Out[7]
  total_bill	 tip	   sex	smoker	day	  time	size
0	   16.99	1.01	Female	    No	Sun	Dinner	   2
1	   10.34	1.66	  Male	    No	Sun	Dinner	   3
2	   21.01	3.50	  Male	    No	Sun	Dinner	   3
3	   23.68	3.31	  Male	    No	Sun	Dinner	   2
4	   24.59	3.61	Female	    No	Sun	Dinner	   4

# In[8]
tips['tip_pct']=100 * tips['tip'] / tips['total_bill']

grid=sns.FacetGrid(tips,row='sex',col='time',margin_titles=True)
grid.map(plt.hist,"tip_pct",bins=np.linspace(0,40,15));

  • The faceted chart gives us some quick insights into the dataset: for example, we see that it contains far more data on male servers during the dinner hour than other categories, and typical tip amounts appear to range from approximately 10% to 20%, with some outliers on either end.
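
  • To see what each panel is built from, the following sketch repeats the tip_pct calculation and the binning on just the five rows shown in Out[7] (copied by hand so that nothing needs to be downloaded). The names sample, edges, counts and panel_sizes are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# First five rows of the tips table, copied from Out[7]
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "time": ["Dinner"] * 5,
})
sample["tip_pct"] = 100 * sample["tip"] / sample["total_bill"]

# 15 evenly spaced edges from 0 to 40 define 14 bins
edges = np.linspace(0, 40, 15)
counts, _ = np.histogram(sample["tip_pct"], bins=edges)

# Rows per (sex, time) panel: the data behind each facet's histogram
panel_sizes = sample.groupby(["sex", "time"]).size()
```

  • Running the same groupby on the full tips table is a quick way to check how many rows each facet contains.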

Categorical Plots

  • Categorical plots can be useful for this kind of visualization as well.
  • These allow you to view the distribution of a parameter within bins defined by any other parameter.
# In[9]
with sns.axes_style(style='ticks'):
    g=sns.catplot(x='day',y='total_bill',hue='sex',
                  data=tips,kind='box')
    g.set_axis_labels("Day","Total Bill");
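
  • Each box summarizes a group by its quartiles. As a small sketch of those numbers, assuming the default whisker rule of 1.5 times the interquartile range, here is the calculation for the five total_bill values shown in Out[7].

```python
import numpy as np

# total_bill values from the five rows shown in Out[7]
total_bill = np.array([16.99, 10.34, 21.01, 23.68, 24.59])

# The box spans q1 to q3, with a line at the median
q1, median, q3 = np.percentile(total_bill, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn individually as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = total_bill[(total_bill < lower_fence) | (total_bill > upper_fence)]
```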

Joint Distributions

  • Similar to the pair plot we saw earlier, we can use sns.jointplot to show the joint distribution of two variables, along with the associated marginal distributions.
# In[10]
with sns.axes_style('white'):
    sns.jointplot(x='total_bill',y='tip',data=tips,kind='hex')

  • The joint plot can even do some automatic kernel density estimation and regression.
# In[11]
sns.jointplot(x='total_bill',y='tip',data=tips,kind='reg');
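
  • The line drawn by kind='reg' is a linear fit of tip against total_bill. As a rough illustration of that kind of fit (a sketch using np.polyfit on only the five rows shown in Out[7], so the coefficients will not match the full dataset):

```python
import numpy as np

# total_bill and tip from the five rows shown in Out[7]
total_bill = np.array([16.99, 10.34, 21.01, 23.68, 24.59])
tip = np.array([1.01, 1.66, 3.50, 3.31, 3.61])

# Ordinary least-squares line: tip is approximated by slope * total_bill + intercept
slope, intercept = np.polyfit(total_bill, tip, deg=1)
fitted = slope * total_bill + intercept
residuals = tip - fitted  # vertical distances from the points to the line
```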

Bar Plots

  • Counts over time can be plotted as bars using sns.catplot with kind='count' (this function was called sns.factorplot before Seaborn 0.9).
  • We'll use the Planets dataset.
# In[12]
planets=sns.load_dataset('planets')
planets.head()
# Out[12]
             method	number	orbital_period	 mass	distance	year
0	Radial Velocity	     1	       269.300	 7.10	   77.40	2006
1	Radial Velocity	     1	       874.774	 2.21	   56.95	2008
2	Radial Velocity	     1	       763.000	 2.60	   19.84	2011
3	Radial Velocity	     1	       326.030	19.40	  110.62	2007
4	Radial Velocity	     1	       516.220	10.50	  119.47	2009

# In[13]
with sns.axes_style('white'):
    g=sns.catplot(x='year',data=planets,aspect=2,
                  kind='count',color='steelblue')
    g.set_xticklabels(step=5)

  • We can learn more by looking at the method of discovery of each of these planets.
# In[14]
with sns.axes_style('white'):
    g=sns.catplot(x='year',data=planets,aspect=4.0,kind='count',
                  hue='method',order=range(2001,2015))
    g.set_ylabels('Number of Planets Discovered')
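
  • The bar heights in a count plot are simply row counts per category. The following sketch reproduces them in pandas on a few toy rows (hypothetical values, not taken from the Planets dataset); the names toy, per_year and per_year_method are introduced here for illustration.

```python
import pandas as pd

# Toy rows for illustration only (hypothetical, not taken from the dataset)
toy = pd.DataFrame({
    "year": [2008, 2008, 2009, 2009, 2009, 2010],
    "method": ["Radial Velocity", "Transit", "Radial Velocity",
               "Transit", "Transit", "Radial Velocity"],
})

# kind='count' with x='year': one bar per year, height = number of rows
per_year = toy["year"].value_counts().sort_index()

# Adding hue='method' splits each bar by method, like this cross-tabulation
per_year_method = pd.crosstab(toy["year"], toy["method"])
```

  • Applying the same two lines to the planets table gives the numbers behind the two charts above.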

For more information on plotting with Seaborn, refer to the Seaborn API reference.
