The target attribute is "DoH".
Set Y to the 'DoH' column (the label).
Set X to every column except 'DoH' (the features).
This needs to be done for all 5 categories.
Y_chrome = DF_chrome['DoH']
X_chrome = DF_chrome.drop('DoH', axis=1)
Y_firefox = DF_firefox['DoH']
X_firefox = DF_firefox.drop('DoH', axis=1)
Y_dns2tcp = DF_dns2tcp['DoH']
X_dns2tcp = DF_dns2tcp.drop('DoH', axis=1)
Y_dnscat2 = DF_dnscat2['DoH']
X_dnscat2 = DF_dnscat2.drop('DoH', axis=1)
Y_iodine = DF_iodine['DoH']
X_iodine = DF_iodine.drop('DoH', axis=1)
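The ten assignments above follow one pattern, so they could also be written as a loop over a dictionary of frames. This is a minimal sketch: the `frames` dictionary and its toy columns (`feature_a`, `feature_b`) are hypothetical stand-ins for DF_chrome, DF_firefox, DF_dns2tcp, DF_dnscat2 and DF_iodine, which have many more feature columns.

```python
import pandas as pd

categories = ["chrome", "firefox", "dns2tcp", "dnscat2", "iodine"]

# Hypothetical toy frames standing in for DF_chrome, DF_firefox, and the others.
frames = {
    name: pd.DataFrame({
        "feature_a": range(10),
        "feature_b": [x * 0.5 for x in range(10)],
        "DoH": [0, 1] * 5,
    })
    for name in categories
}

# For every category: Y is the 'DoH' label column, X is every other column.
Y = {name: df["DoH"] for name, df in frames.items()}
X = {name: df.drop(columns="DoH") for name, df in frames.items()}

print(X["chrome"].columns.tolist())  # ['feature_a', 'feature_b']
print(Y["iodine"].name)              # DoH
```

Keeping the categories in one dictionary means adding a sixth category only requires adding its name to `categories`.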
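Randomly select 70% of each category's data for training and keep the remaining 30% for testing. Splitting each category separately, and stratifying on 'DoH', ensures every category and its label proportions are represented in both the training and the testing set.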
from sklearn.model_selection import train_test_split
X_chrome_training, X_chrome_testing, Y_chrome_training, Y_chrome_testing = train_test_split(X_chrome, Y_chrome, test_size=0.3, stratify=Y_chrome, random_state=1, shuffle=True)
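Repeat the same split for Firefox, dns2tcp, dnscat2 and iodine, then combine the five training sets and the five testing sets.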
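The per-category split can also be written as a single loop using the same settings as the Chrome call (70/30, stratified on 'DoH', `random_state=1`). This is a sketch under assumptions: the dictionaries `X`, `Y` and `splits` and the toy data in them are hypothetical stand-ins for X_chrome, Y_chrome and the other per-category variables.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

categories = ["chrome", "firefox", "dns2tcp", "dnscat2", "iodine"]

# Hypothetical toy data; in the notebook these would be X_<category> / Y_<category>.
X = {c: pd.DataFrame({"feature_a": range(20)}) for c in categories}
Y = {c: pd.Series([0, 1] * 10, name="DoH") for c in categories}

splits = {}
for c in categories:
    # Same arguments as the Chrome call: 30% test, stratified on the DoH label.
    splits[c] = train_test_split(
        X[c], Y[c], test_size=0.3, stratify=Y[c], random_state=1, shuffle=True
    )

X_train_c, X_test_c, Y_train_c, Y_test_c = splits["chrome"]
print(len(X_train_c), len(X_test_c))  # 14 6
```

Each `splits[c]` holds the same four objects that the original code unpacks into X_chrome_training, X_chrome_testing, Y_chrome_training and Y_chrome_testing.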
import pandas as pd
# DataFrame.append / Series.append were removed in pandas 2.0; pd.concat replaces them.
DF_X_training = pd.concat([X_chrome_training, X_firefox_training, X_dns2tcp_training, X_dnscat2_training, X_iodine_training])
DF_X_testing = pd.concat([X_chrome_testing, X_firefox_testing, X_dns2tcp_testing, X_dnscat2_testing, X_iodine_testing])
DF_Y_training = pd.concat([Y_chrome_training, Y_firefox_training, Y_dns2tcp_training, Y_dnscat2_training, Y_iodine_training])
DF_Y_testing = pd.concat([Y_chrome_testing, Y_firefox_testing, Y_dns2tcp_testing, Y_dnscat2_testing, Y_iodine_testing])
print(DF_X_training.shape[0])
print(DF_X_testing.shape[0])
print(DF_Y_training.shape[0])
print(DF_Y_testing.shape[0])
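Beyond printing the row counts, a quick consistency check could assert that features and labels line up and that the overall test fraction is close to 30%. This is a minimal sketch on hypothetical toy splits for two categories (the real notebook has five); note that `pd.concat` keeps each part's original row labels, so the combined index can contain duplicates, but X and Y stay aligned as long as they are concatenated in the same order.

```python
import pandas as pd

# Hypothetical toy per-category splits (two categories for brevity).
X_train_parts = [pd.DataFrame({"f": range(7)}), pd.DataFrame({"f": range(14)})]
Y_train_parts = [pd.Series([0] * 7, name="DoH"), pd.Series([1] * 14, name="DoH")]
X_test_parts = [pd.DataFrame({"f": range(3)}), pd.DataFrame({"f": range(6)})]
Y_test_parts = [pd.Series([0] * 3, name="DoH"), pd.Series([1] * 6, name="DoH")]

DF_X_training = pd.concat(X_train_parts)
DF_Y_training = pd.concat(Y_train_parts)
DF_X_testing = pd.concat(X_test_parts)
DF_Y_testing = pd.concat(Y_test_parts)

# Features and labels must have the same rows, in the same order.
assert DF_X_training.shape[0] == DF_Y_training.shape[0]
assert DF_X_training.index.equals(DF_Y_training.index)
assert DF_X_testing.index.equals(DF_Y_testing.index)

# The overall test fraction should be close to the requested test_size of 0.3.
total = DF_X_training.shape[0] + DF_X_testing.shape[0]
test_fraction = DF_X_testing.shape[0] / total
print(round(test_fraction, 2))  # 0.3
```

If unique row labels are needed later, passing `ignore_index=True` to every `pd.concat` call renumbers the rows from 0 while keeping X and Y aligned.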