06_Split data by dates

Kyungtaek Oh · January 12, 2022

Machine Learning


Split data by dates

Split all of the data by date.
For example, to look at every record from 2020-01-14 at once, the data are stored separately for each date.

Step 1. Read all data from each category

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# One DataFrame per category; the first row of each CSV holds the column names
DF_chrome = pd.read_csv('dataset/chrome.csv', header=0)
DF_firefox = pd.read_csv('dataset/firefox.csv', header=0)
DF_dns2tcp = pd.read_csv('dataset/dns2tcp.csv', header=0)
DF_dnscat2 = pd.read_csv('dataset/dnscat2.csv', header=0)
DF_iodine = pd.read_csv('dataset/iodine.csv', header=0)
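The five `read_csv` calls differ only in the file name, so they could also be written as a loop. This is a minimal sketch, assuming each CSV has a header row with a `TimeStamp` column; the inline sample text, the `Value` column, and the added `Category` column are hypothetical stand-ins for the real files under `dataset/`.

```python
import io
import pandas as pd

# Hypothetical stand-ins for dataset/<category>.csv; with the real files,
# pass f"dataset/{name}.csv" to pd.read_csv instead of a StringIO buffer.
SAMPLE_CSV = {
    "chrome": "TimeStamp,Value\n2020-03-18 10:00:01,1\n2020-03-19 11:00:00,2\n",
    "iodine": "TimeStamp,Value\n2020-03-18 12:30:45,3\n",
}

def read_categories(sources):
    """Read one DataFrame per category and record its origin in a 'Category' column."""
    frames = {}
    for name, text in sources.items():
        df = pd.read_csv(io.StringIO(text), header=0)  # first row holds the column names
        df["Category"] = name  # hypothetical label column, not part of the original data
        frames[name] = df
    return frames

frames = read_categories(SAMPLE_CSV)
print({name: len(df) for name, df in frames.items()})  # {'chrome': 2, 'iodine': 1}
```

Keeping a label column makes it possible to tell the categories apart after they are merged in the next step.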

Step 2. Combine everything into one DataFrame

# DataFrame.append no longer exists in pandas 2.0; pd.concat stacks the frames instead
DF_all = pd.concat([DF_chrome, DF_firefox, DF_dns2tcp, DF_dnscat2, DF_iodine], ignore_index=True)
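`DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, and `pd.concat` is its replacement. The small frames below are hypothetical; they illustrate why `ignore_index=True` is useful: without it, the combined frame repeats each source frame's own row labels.

```python
import pandas as pd

# Two hypothetical category frames, each with its own 0-based index
df_a = pd.DataFrame({"TimeStamp": ["2020-03-18 10:00:01", "2020-03-19 11:00:00"]})
df_b = pd.DataFrame({"TimeStamp": ["2020-03-18 12:30:45"]})

stacked = pd.concat([df_a, df_b])                     # keeps the original labels
combined = pd.concat([df_a, df_b], ignore_index=True)  # renumbers rows 0..n-1

print(stacked.index.tolist())   # [0, 1, 0]
print(combined.index.tolist())  # [0, 1, 2]
```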

Step 3. Split "TimeStamp" into two parts: "Dates" and "Sec"

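# Date part and time part; n=1 splits only at the first space, so there are always two columns
DF_all[["Dates", "Sec"]] = DF_all["TimeStamp"].str.split(" ", n=1, expand=True)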
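Splitting on the space assumes every `TimeStamp` looks like `YYYY-MM-DD HH:MM:SS`. The sketch below, on hypothetical values, shows the string split next to an alternative that parses the column with `pd.to_datetime(errors="coerce")`, so malformed timestamps become `NaT` instead of yielding unexpected strings. `Dates_parsed` is a hypothetical column name.

```python
import pandas as pd

df = pd.DataFrame({"TimeStamp": ["2020-03-18 10:00:01", "2020-03-18 23:59:59", "2020-03-19 00:00:00"]})

# String approach (as in Step 3): split once at the first space
df[["Dates", "Sec"]] = df["TimeStamp"].str.split(" ", n=1, expand=True)

# Parsing approach: invalid timestamps become NaT rather than raising
parsed = pd.to_datetime(df["TimeStamp"], errors="coerce")
df["Dates_parsed"] = parsed.dt.strftime("%Y-%m-%d")

print(df[["Dates", "Sec", "Dates_parsed"]])
```

For well-formed data both columns hold the same date strings; the parsed version is stricter about bad input.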

Step 4. Count how many records exist for each date

from collections import Counter

# Keys are dates, values are the number of records on that date
duplicates_dates = dict(Counter(DF_all['Dates']))
print(duplicates_dates)
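The same per-date counts can be obtained directly in pandas with `Series.value_counts`, which returns a Series that is easy to sort by date. A minimal sketch on hypothetical dates, checking that it agrees with `Counter`:

```python
from collections import Counter
import pandas as pd

dates = pd.Series(["2020-03-18", "2020-03-19", "2020-03-18", "2019-12-10", "2020-03-18"])

counter_counts = dict(Counter(dates))              # plain dict, as in Step 4
pandas_counts = dates.value_counts().sort_index()  # Series indexed by date; ISO strings sort chronologically

# Both approaches give the same number of records per date
assert counter_counts == pandas_counts.to_dict()
print(pandas_counts)
```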


Step 5. Save the data for each date into its own DataFrame

df_2020_01_14 = DF_all.loc[DF_all['Dates'] == '2020-01-14']
df_2020_01_13 = DF_all.loc[DF_all['Dates'] == '2020-01-13']
df_2020_01_12 = DF_all.loc[DF_all['Dates'] == '2020-01-12']
df_2019_12_10 = DF_all.loc[DF_all['Dates'] == '2019-12-10']
df_2019_12_11 = DF_all.loc[DF_all['Dates'] == '2019-12-11']
df_2019_12_13 = DF_all.loc[DF_all['Dates'] == '2019-12-13']
df_2019_12_14 = DF_all.loc[DF_all['Dates'] == '2019-12-14']
df_2019_12_15 = DF_all.loc[DF_all['Dates'] == '2019-12-15']
df_2019_12_16 = DF_all.loc[DF_all['Dates'] == '2019-12-16']
df_2019_12_17 = DF_all.loc[DF_all['Dates'] == '2019-12-17']
df_2019_12_09 = DF_all.loc[DF_all['Dates'] == '2019-12-09']
df_2019_12_19 = DF_all.loc[DF_all['Dates'] == '2019-12-19']
df_2019_12_20 = DF_all.loc[DF_all['Dates'] == '2019-12-20']
df_2020_04_01 = DF_all.loc[DF_all['Dates'] == '2020-04-01']
df_2020_03_31 = DF_all.loc[DF_all['Dates'] == '2020-03-31']
df_2020_03_30 = DF_all.loc[DF_all['Dates'] == '2020-03-30']
df_2020_03_25 = DF_all.loc[DF_all['Dates'] == '2020-03-25']
df_2020_03_24 = DF_all.loc[DF_all['Dates'] == '2020-03-24']
df_2020_03_28 = DF_all.loc[DF_all['Dates'] == '2020-03-28']
df_2020_03_23 = DF_all.loc[DF_all['Dates'] == '2020-03-23']
df_2020_03_29 = DF_all.loc[DF_all['Dates'] == '2020-03-29']
df_2020_03_27 = DF_all.loc[DF_all['Dates'] == '2020-03-27']
df_2020_03_26 = DF_all.loc[DF_all['Dates'] == '2020-03-26']
df_2020_03_20 = DF_all.loc[DF_all['Dates'] == '2020-03-20']
df_2020_03_21 = DF_all.loc[DF_all['Dates'] == '2020-03-21']
df_2020_03_19 = DF_all.loc[DF_all['Dates'] == '2020-03-19']
df_2020_03_22 = DF_all.loc[DF_all['Dates'] == '2020-03-22']
df_2020_03_18 = DF_all.loc[DF_all['Dates'] == '2020-03-18']
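One `.loc` line per date works, but the list has to be edited by hand whenever a new date appears. A sketch of an alternative using `DataFrame.groupby`, which builds one DataFrame per distinct date in a dictionary keyed by the date string; `frames_by_date` is a hypothetical name and the sample rows are illustrative only.

```python
import pandas as pd

DF_all = pd.DataFrame({
    "TimeStamp": ["2020-03-18 10:00:01", "2020-03-18 12:30:45", "2019-12-10 08:15:00"],
})
DF_all[["Dates", "Sec"]] = DF_all["TimeStamp"].str.split(" ", n=1, expand=True)

# One DataFrame per distinct value of 'Dates'; each group keeps its original row index
frames_by_date = {date: group for date, group in DF_all.groupby("Dates")}

# frames_by_date["2020-03-18"] holds the same rows as
# DF_all.loc[DF_all["Dates"] == "2020-03-18"]
print(sorted(frames_by_date))  # ['2019-12-10', '2020-03-18']
```

A dictionary also makes it easy to loop over every date, for example to save each group to its own file.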

Step 6. Check the values

The 2020-03-18 data contain 6416 records in total.
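The 6416 figure is presumably the row count of `df_2020_03_18`. One way to check such a value, sketched here on hypothetical rows, is to compare `len()` of the per-date DataFrame with the count from Step 4; the two should always agree.

```python
import pandas as pd

DF_all = pd.DataFrame({"Dates": ["2020-03-18", "2020-03-18", "2020-03-19", "2020-03-18"]})
df_2020_03_18 = DF_all.loc[DF_all["Dates"] == "2020-03-18"]

n_rows = len(df_2020_03_18)  # same as df_2020_03_18.shape[0]
n_counted = DF_all["Dates"].value_counts()["2020-03-18"]

# The filtered frame and the per-date count must agree
assert n_rows == n_counted
print(f"2020-03-18: {n_rows} records")  # 2020-03-18: 3 records
```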
