This dataset consists of all Netflix original films released as of June 1st, 2021, along with all Netflix documentaries and specials. The data was web-scraped from this Wikipedia page and then merged with a dataset of the corresponding IMDB scores. IMDB scores are voted on by community members, and most of the films have 1,000 or more reviews.
The dataset includes the following fields:
Title of the film
Genre of the film
Original premiere date
Runtime in minutes
IMDB scores (as of 06/01/21)
Languages currently available (as of 06/01/21)
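import re
# Flag each film as "English" if its Language string mentions English anywhere (e.g. "English/Japanese"), otherwise "Others"
df["Language_en"] = df["Language"].apply(lambda x: "English" if re.search("English", x) else "Others")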
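The `re.search` call raises a `TypeError` if a `Language` entry is missing (NaN). Assuming the column may contain missing values, the same flag can be sketched with pandas' vectorized string methods; the toy rows below are made up for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical toy rows mimicking the "Language" column (not real data).
df = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Language": ["English", "English/Japanese", "Spanish", None],
})

# str.contains yields NaN for missing values; na=False treats them as "no match".
# regex=False performs a plain substring search, like re.search with a literal pattern.
has_english = df["Language"].str.contains("English", na=False, regex=False)
df["Language_en"] = has_english.map({True: "English", False: "Others"})

print(df["Language_en"].tolist())
# ['English', 'English', 'Others', 'Others']
```

Rows whose language list merely includes English (such as "English/Japanese") end up in the "English" group, which is what distinguishes this flag from an exact `== "English"` comparison.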
from scipy import stats
# Type 1 split: films whose Language contains "English", documentaries vs. all other genres
test_names = ["IMDB Score"]
english_documentary_scores = df[(df["Language_en"] == "English") & (df["Genre"] == "Documentary")][["IMDB Score"]]
english_others_scores = df[(df["Language_en"] == "English") & (df["Genre"] != "Documentary")][["IMDB Score"]]
# Levene's test, H0: the two groups have equal variances
for test_name in test_names:
    levene_stat, p_value_levene = stats.levene(english_documentary_scores[test_name], english_others_scores[test_name])
    if p_value_levene > 0.05:
        print(f"{test_name} Levene statistic: {round(levene_stat, 3)}, p-value: {round(p_value_levene, 3)}, equal variances assumed")
    else:
        print(f"{test_name} Levene statistic: {round(levene_stat, 3)}, p-value: {round(p_value_levene, 3)}, unequal variances assumed")
IMDB Score Levene statistic: 3.411, p-value: 0.065, equal variances assumed
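The decision rule in the loop above (p > 0.05 means equal variances are not rejected) can be wrapped in a small helper. This is a sketch; the function name `equal_variance_ok` and the synthetic scores are hypothetical and only illustrate the two outcomes. Note that `scipy.stats.levene` centers on the median by default (the Brown-Forsythe variant).

```python
import numpy as np
from scipy import stats

def equal_variance_ok(a, b, alpha=0.05):
    """Return (W, p, equal_var) from Levene's test.

    equal_var is True when H0 (equal variances) is not rejected at level alpha.
    """
    w_stat, p_value = stats.levene(a, b)  # default center="median"
    return w_stat, p_value, bool(p_value > alpha)

rng = np.random.default_rng(0)
# Hypothetical synthetic scores, not the Netflix data.
same_spread = rng.normal(6.5, 1.0, size=200)
shifted = same_spread + 0.5                    # same spread, different location
wider_spread = rng.normal(6.0, 3.0, size=200)  # three times the standard deviation

print(equal_variance_ok(same_spread, shifted)[2], equal_variance_ok(same_spread, wider_spread)[2])
# True False
```

Shifting a sample does not change its spread, so Levene's test does not reject; tripling the standard deviation is rejected decisively at this sample size.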
from scipy import stats
# Type 2 split: films whose Language is exactly "English", documentaries vs. all other genres
test_names = ["IMDB Score"]
english_documentary_scores = df[(df["Language"] == "English") & (df["Genre"] == "Documentary")][["IMDB Score"]]
english_others_scores = df[(df["Language"] == "English") & (df["Genre"] != "Documentary")][["IMDB Score"]]
# Levene's test, H0: the two groups have equal variances
for test_name in test_names:
    levene_stat, p_value_levene = stats.levene(english_documentary_scores[test_name], english_others_scores[test_name])
    if p_value_levene > 0.05:
        print(f"{test_name} Levene statistic: {round(levene_stat, 3)}, p-value: {round(p_value_levene, 3)}, equal variances assumed")
    else:
        print(f"{test_name} Levene statistic: {round(levene_stat, 3)}, p-value: {round(p_value_levene, 3)}, unequal variances assumed")
IMDB Score Levene statistic: 6.04, p-value: 0.014, unequal variances assumed
In step 2, the type 1 split satisfies the equal-variance assumption, so the t-test is carried out on the type 1 split.
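The choice between Student's t-test (equal variances) and Welch's t-test (unequal variances) can be driven directly by the Levene result. A minimal sketch, assuming a one-sided alternative "group a has the higher mean"; the helper name `one_sided_ttest` and the synthetic score arrays are hypothetical.

```python
import numpy as np
from scipy import stats

def one_sided_ttest(a, b, alpha=0.05):
    """Test H1: mean(a) > mean(b), choosing Student's or Welch's t-test via Levene."""
    _, p_levene = stats.levene(a, b)
    # Equal variances not rejected -> Student's t-test; otherwise Welch's t-test.
    equal_var = bool(p_levene > alpha)
    result = stats.ttest_ind(a, b, alternative="greater", equal_var=equal_var)
    return equal_var, result.statistic, result.pvalue

rng = np.random.default_rng(42)
docs = rng.normal(7.0, 0.8, size=150)    # hypothetical "documentary" scores
others = rng.normal(6.2, 1.0, size=400)  # hypothetical "other genre" scores

equal_var, t_stat, p = one_sided_ttest(docs, others)
print(equal_var, round(float(t_stat), 2), p < 0.05)
```

Welch's test remains valid when the variances happen to be equal, so some analysts skip the Levene pre-test and use `equal_var=False` unconditionally.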
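# One-sided two-sample t-test, H1: English documentaries have a higher mean IMDB score than other English films
# only_english_documentary_scores / only_english_others_scores are assumed to be defined in an earlier cell
# equal_var=False runs Welch's t-test, which does not assume equal variances
t_statistic, p_value = stats.ttest_ind(
    a=only_english_documentary_scores,
    b=only_english_others_scores,
    alternative="greater",
    equal_var=False,
)
print(f"p-value: {p_value}")
print(f"Reject null hypothesis: {p_value < 0.05}")
p-value: 1.92128393e-23
Reject null hypothesis: True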
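Because the test uses `alternative="greater"`, its p-value refers only to the direction "documentaries score higher". As a sketch on hypothetical synthetic data, when the t statistic is positive the one-sided p-value is exactly half of the two-sided one:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(7.0, 1.0, size=100)  # hypothetical group expected to score higher
b = rng.normal(6.0, 1.0, size=100)

two_sided = stats.ttest_ind(a, b, equal_var=False)
greater = stats.ttest_ind(a, b, equal_var=False, alternative="greater")

# With t > 0, the one-sided "greater" p-value equals half the two-sided p-value.
print(two_sided.statistic > 0, np.isclose(greater.pvalue, two_sided.pvalue / 2, rtol=1e-9, atol=0))
# True True
```

If the observed difference pointed the other way (t < 0), the "greater" p-value would instead be close to 1, and the null hypothesis would not be rejected no matter how large the gap.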