01_Histogram

Kyungtaek Oh · December 14, 2021

Machine Learning


DataSet Histogram

Before starting your project, you need to know exactly what you are going to do.

  1. Read the data and make sure you understand it.
  2. Build a histogram of the data so that you can choose more representative training data.

What is the total amount of data?
How many instances are benign HTTPS, and how many are malicious DoH?

Read the data
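
As a minimal sketch of this step, the snippet below counts how many rows carry each label using only the Python standard library. The column name `Label`, the helper `count_labels`, and the in-memory sample rows are assumptions made for illustration; the real dataset files and their column names are not shown in this post.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, label_column="Label"):
    """Count how many rows carry each value of label_column.

    csv_file is any open text file (or file-like object) with a header row.
    The default column name "Label" is an assumption, not the dataset's schema.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Tiny in-memory stand-in for a real CSV file (made-up rows).
sample = io.StringIO(
    "Duration,Label\n"
    "0.12,Benign\n"
    "0.40,Malicious\n"
    "0.07,Benign\n"
)
counts = count_labels(sample)
print(counts)
```

For a file on disk, the same helper could be called inside `with open(path, newline="") as f:`, where `path` is whatever location the data was downloaded to.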

Calculate

Total instances

Total = 1,167,269 (100%)
Benign = 917,300 (78.59%)
Malicious = 249,969 (21.41%)

Benign:

  1. Google Chrome: 545,464 (59.46%)
  2. Firefox: 371,836 (40.54%)

Malicious:

  1. dns2tcp: 167,517 (67.02%)
  2. DNSCat2: 35,854 (14.34%)
  3. Iodine: 46,598 (18.64%)
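
The percentages above can be reproduced from the raw counts. The sketch below takes the counts listed in this post, derives the class totals and shares, and prints a rough text histogram. The helper `share` and the 50-character bar width are arbitrary choices for illustration.

```python
# Instance counts as listed in this post, grouped by class.
counts = {
    "Benign": {"Google Chrome": 545_464, "Firefox": 371_836},
    "Malicious": {"dns2tcp": 167_517, "DNSCat2": 35_854, "Iodine": 46_598},
}


def share(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


class_totals = {label: sum(tools.values()) for label, tools in counts.items()}
total = sum(class_totals.values())

print(f"Total: {total:,}")
for label, tools in counts.items():
    print(f"{label}: {class_totals[label]:,} ({share(class_totals[label], total)}%)")
    for tool, n in tools.items():
        # Bar length is proportional to the tool's share within its class.
        bar = "#" * round(50 * n / class_totals[label])
        print(f"  {tool:<14}{n:>9,} ({share(n, class_totals[label])}%) {bar}")
```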

How will you build the training set for your model based on the histogram?

The training set will be picked at random, but its class proportions need to stay close to those of the original dataset.
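
One common way to get a random sample that keeps the original proportions is stratified sampling: group the rows by class, then draw the same fraction at random from each group. The sketch below is one possible version using only the standard library. The function name `stratified_sample`, the 50% fraction, the fixed seed, and the toy rows are all assumptions for illustration, not the method actually used in this project.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, seed=0):
    """Randomly pick `fraction` of the rows from each group defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the pick is reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    picked = []
    for members in groups.values():
        k = round(len(members) * fraction)  # per-group sample size
        picked.extend(rng.sample(members, k))
    return picked


# Made-up toy data: 80 benign and 20 malicious rows.
rows = [("Benign", i) for i in range(80)] + [("Malicious", i) for i in range(20)]
train = stratified_sample(rows, key=lambda r: r[0], fraction=0.5)
```

Because each class is sampled separately, the 80/20 split of the toy data carries over to the sample. Grouping by tool (for example dns2tcp, DNSCat2, Iodine) instead of by class would preserve the finer-grained proportions as well.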

Outline

