Set up the notebook
import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
print("Setup Complete")
Select a dataset
We use a dataset from the US Department of Transportation that tracks flight delays, measured in minutes.
Load the data
# Path of the file to read
flight_filepath = "../input/flight_delays.csv"
# Read the file into a variable flight_data
flight_data = pd.read_csv(flight_filepath, index_col="Month")
Because the row labels (from the 'Month' column) don't correspond to dates, we don't pass parse_dates=True to pd.read_csv().
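As a rough illustration of what index_col="Month" does, the sketch below reads a tiny in-memory CSV instead of the real file. The sample values are invented, and it assumes the months are stored as plain integers; only the column names Month and NK come from this notebook.

```python
import io
import pandas as pd

# Hypothetical three-row sample; the numbers are made up for illustration
sample_csv = io.StringIO(
    "Month,AA,NK\n"
    "1,6.9,11.4\n"
    "2,7.5,5.4\n"
    "3,6.7,3.3\n"
)

# Same call shape as above: 'Month' becomes the row index, no date parsing
sample = pd.read_csv(sample_csv, index_col="Month")

print(sample.index.dtype)    # an integer dtype, not a datetime dtype
print(sample.loc[2, "NK"])   # rows are looked up by month number
```

Since the index holds ordinary integers, there is nothing for parse_dates=True to convert.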
Examine the data
Since the dataset is small, we can easily print all of its contents.
# Print the data
flight_data
Bar chart
Create a bar chart showing the average arrival delay for Spirit Airlines (airline code: NK) flights, by month.
# Set the width and height of the figure
plt.figure(figsize=(10,6))
# Add title
plt.title("Average Arrival Delay for Spirit Airlines Flights, by Month")
# Bar chart showing average arrival delay for Spirit Airlines flights by month
sns.barplot(x=flight_data.index, y=flight_data['NK'])
# Add label for vertical axis
plt.ylabel("Arrival delay (in minutes)")
- If the index or column labels shown on the x-axis are long enough to overlap each other, it helps to put them on the y-axis instead.
- You must select the indexing column with flight_data.index. Using flight_data['Month'] raises a KeyError, because "Month" was used to index the rows when we loaded the dataset, so it is no longer an ordinary column.
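The second note can be checked with a small stand-in for flight_data. This is only a sketch: the DataFrame named sample and its values are invented, and reset_index() is shown as one possible way to get 'Month' back as a regular column, not as something the original notebook does.

```python
import pandas as pd

# Hypothetical stand-in for flight_data; values are made up
sample = pd.DataFrame(
    {"Month": [1, 2, 3], "NK": [11.4, 5.4, 3.3]}
).set_index("Month")

# 'Month' is now the index, so column-style access fails
try:
    sample["Month"]
except KeyError:
    print("KeyError: 'Month' is the index, not a column")

# The index itself is always available
months = sample.index

# reset_index() moves the index back into a normal column
restored = sample.reset_index()
print(restored["Month"].tolist())
```

Either sample.index or the restored column could then be passed as the x values of a plot.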
Heatmap
Create a heatmap to quickly visualize patterns in data. Each cell is color-coded according to its corresponding value.
# Set the width and height of the figure
plt.figure(figsize=(14,7))
# Add title
plt.title("Average Arrival Delay for Each Airline, by Month")
# Heatmap showing average arrival delay for each airline by month
sns.heatmap(data=flight_data, annot=True)
# Add label for horizontal axis
plt.xlabel("Airline")