Enable Access Logs on an Amazon S3 bucket, then analyze the aggregated log data with Pandas in Python.
https://aws.amazon.com/ko/blogs/storage/monitor-amazon-s3-activity-using-s3-server-access-logs-and-pandas-in-python/
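Server access logging records the requests made against a bucket you own, so I expect it to be very useful for security audits and similar work.
Be careful, however, to keep the bucket that stores service data separate from the bucket that receives the logs. If they are the same bucket, every log delivery is itself logged, producing a so-called logging loop.
Building on what we covered in the previous study session, I prepared the following environment.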
- ahss-mybucket : a publicly accessible bucket holding several random files
- ahss-s3-access-logging : the bucket that stores the Access Logs of ahss-mybucket
- In addition, the following flow was prepared
  - Generate random html files with python and upload them to ahss-mybucket with boto3
  - Use python on my PC to access many of the files with random User-Agent values
  - Build aggregated data with boto3 and pandas
  - Monitor Amazon S3 lifecycle management
The service bucket was configured for public access and given a policy that allows S3:GetObject.
The bucket that receives the Access Logs does not need public access, so it was created as an ordinary (private) bucket.
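The exact policy used here is not reproduced in this post, so the following is only a minimal sketch of what a public-read policy for ahss-mybucket commonly looks like. The `Sid` value and the helper names `build_public_read_policy` and `apply_policy` are assumptions made for illustration. Note that S3 rejects a public policy while Block Public Access is still enabled on the bucket.

```python
import json


def build_public_read_policy(bucket_name):
    """Return a policy document that lets anyone call s3:GetObject on the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",  # arbitrary label (assumption)
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


def apply_policy(bucket_name):
    """Attach the policy to the bucket. Needs AWS credentials and boto3."""
    import boto3  # imported lazily so the builder above works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_policy(
        Bucket=bucket_name,
        Policy=json.dumps(build_public_read_policy(bucket_name)),
    )


if __name__ == "__main__":
    # Print the document only; call apply_policy("ahss-mybucket") to attach it.
    print(json.dumps(build_public_read_policy("ahss-mybucket"), indent=2))
```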
In the settings of the public bucket that holds the service data, enable Access Logging and set the target to the bucket created in step 2-2.
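The same setting can also be applied from code instead of the console. The sketch below assumes boto3's `put_bucket_logging` call and an empty target prefix; the helper names `build_logging_status` and `enable_access_logging` are hypothetical. It refuses to run when source and target are the same bucket, which is the logging loop mentioned earlier. It also assumes the target bucket already allows the S3 logging service to write to it.

```python
def build_logging_status(source_bucket, target_bucket, prefix=""):
    """Build the BucketLoggingStatus payload, rejecting a self-logging setup."""
    if source_bucket == target_bucket:
        raise ValueError(
            "source and target bucket must differ to avoid a logging loop"
        )
    return {
        "LoggingEnabled": {
            "TargetBucket": target_bucket,
            "TargetPrefix": prefix,
        }
    }


def enable_access_logging(source_bucket, target_bucket, prefix=""):
    """Turn on server access logging. Needs AWS credentials and boto3."""
    import boto3  # imported lazily so the builder above works without boto3

    status = build_logging_status(source_bucket, target_bucket, prefix)
    boto3.client("s3").put_bucket_logging(
        Bucket=source_bucket,
        BucketLoggingStatus=status,
    )


# Example call (not executed here):
# enable_access_logging("ahss-mybucket", "ahss-s3-access-logging")
```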
Generate the html test files with python and upload them to S3 with the boto3 module.
AWS credentials must be configured beforehand.
import os

import boto3

bucket_name = "ahss-mybucket"
num_files = 5

# Uses the AWS credentials configured in the environment
s3 = boto3.client('s3')

for i in range(1, num_files + 1):
    random_file_name = f"random_{i}.html"
    html_content = f"<html><head><title>Random HTML {i}</title></head><body><h1>This is random HTML file {i}</h1></body></html>"

    # Write the HTML to a temporary local file
    with open(random_file_name, 'w') as f:
        f.write(html_content)

    # Upload the file, using the file name as the object key
    s3.upload_file(random_file_name, bucket_name, random_file_name)
    print(f"Uploaded {random_file_name} to S3 bucket {bucket_name}")

    # Remove the local copy once it has been uploaded
    os.remove(random_file_name)
Running the script produces the following output.
Uploaded random_1.html to S3 bucket ahss-mybucket
Uploaded random_2.html to S3 bucket ahss-mybucket
Uploaded random_3.html to S3 bucket ahss-mybucket
Uploaded random_4.html to S3 bucket ahss-mybucket
Uploaded random_5.html to S3 bucket ahss-mybucket
Obtain the object URL of one of the test files and try accessing it from a PC.
Access works as expected.
Next, use python to request the previously generated random files many times over the public endpoint, picking a random User-Agent for each request. The script below sends the requests one after another.
import random

import requests

bucket_url = "https://ahss-mybucket.s3.ap-northeast-2.amazonaws.com/"

# List of User-Agent strings to choose from
user_agent_list = [
    # Chrome
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36",
    # Firefox
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0",
    # Safari
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Safari/605.1.15",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Safari/605.1.15",
    # Edge
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Edge/100.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Edge/101.0.0.0 Safari/537.36",
]

# Send 10,000 requests
for _ in range(10000):
    # Pick a random User-Agent and a random file (random_1.html .. random_5.html)
    user_agent = random.choice(user_agent_list)
    random_file_url = f"{bucket_url}random_{random.randint(1, 5)}.html"
    headers = {"User-Agent": user_agent}

    # The timeout keeps a stalled connection from blocking the loop
    response = requests.get(random_file_url, headers=headers, timeout=10)

    if response.status_code == 200:
        print(f"Successfully fetched file with User-Agent: {user_agent}")
    else:
        print(f"Failed to fetch file with User-Agent: {user_agent}, Status code: {response.status_code}")
Execution result
Successfully fetched file with User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36
Successfully fetched file with User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36
Successfully fetched file with User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Edge/101.0.0.0 Safari/537.36
Successfully fetched file with User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36
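The loop above sends its requests one at a time. If the goal is to issue them concurrently, a thread pool is one option. The following is only a sketch under that assumption: `build_jobs`, `fetch_status` and `run_concurrently` are hypothetical helper names, and the worker count of 8 is arbitrary. The fetch function is passed in as a parameter so the pool logic can be tried without touching the network.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def build_jobs(bucket_url, user_agents, num_requests, num_files=5, seed=None):
    """Return (url, user_agent) pairs, each chosen at random."""
    rng = random.Random(seed)
    return [
        (f"{bucket_url}random_{rng.randint(1, num_files)}.html",
         rng.choice(user_agents))
        for _ in range(num_requests)
    ]


def fetch_status(job):
    """Send one GET request and return its HTTP status code."""
    import requests  # imported lazily; only needed for real requests

    url, user_agent = job
    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=10)
    return response.status_code


def run_concurrently(jobs, fetch=fetch_status, max_workers=8):
    """Run all jobs on a thread pool and count the returned status codes."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return Counter(pool.map(fetch, jobs))


# Example call (not executed here):
# jobs = build_jobs(bucket_url, user_agent_list, 10000)
# print(run_concurrently(jobs))
```

Counting status codes instead of printing every response keeps the output readable when thousands of requests are sent.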
The access logs are accumulating as expected in the ahss-s3-access-logging bucket.
https://aws.amazon.com/ko/blogs/storage/monitor-amazon-s3-activity-using-s3-server-access-logs-and-pandas-in-python/
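The study and tests below follow this post.
Install the required external python modules (matplotlib and s3fs are included because the later script draws graphs and reads s3:// paths through pandas)
pip3 install boto3 pandas matplotlib s3fs
Testing the boto3 library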
import os
import boto3
import pandas as pd

bucket = 'demo-access-logs-bucket'
s3_client = boto3.client('s3')
print(s3_client)

<botocore.client.S3 object at 0x7efecf37cfd0>
The library works as expected, so let's extend the example a little.
import boto3
import pandas as pd
import matplotlib.pyplot as plt


class S3AccessLogAnalyzer:
    def __init__(self, bucket_name):
        self.bucket_name = bucket_name
        self.s3_client = boto3.client('s3')
        self.log_objects = []

    def list_log_objects(self):
        # Collect the key of every log object in the bucket, page by page
        paginator = self.s3_client.get_paginator('list_objects_v2')
        result = paginator.paginate(Bucket=self.bucket_name, Prefix='')
        for each in result:
            # 'Contents' is missing when a page has no objects
            key_list = each.get('Contents', [])
            for key in key_list:
                self.log_objects.append(key['Key'])

    def load_log_data(self):
        # Read each space-separated log object into a DataFrame and concatenate them
        log_data = []
        for log_key in self.log_objects:
            log_data.append(pd.read_csv(
                f's3://{self.bucket_name}/{log_key}', sep=" ",
                names=['Bucket_Owner', 'Bucket', 'Time', 'Time_Offset', 'Remote_IP', 'Requester_ARN/Canonical_ID',
                       'Request_ID', 'Operation', 'Key', 'Request_URI', 'HTTP_status', 'Error_Code', 'Bytes_Sent', 'Object_Size',
                       'Total_Time', 'Turn_Around_Time', 'Referrer', 'User_Agent', 'Version_Id', 'Host_Id', 'Signature_Version',
                       'Cipher_Suite', 'Authentication_Type', 'Host_Header', 'TLS_version'],
                usecols=range(25)))
        return pd.concat(log_data)

    def analyze_and_visualize(self):
        df = self.load_log_data()

        # Pie chart: the five most requested files
        top_five_objects = df[(df['Operation'] == 'REST.GET.OBJECT')]['Key'].value_counts().nlargest(5)
        top_five_objects.plot.pie(label='')
        plt.savefig("/home/ec2-user/my_graph.png")
        plt.show()

        # Bar chart: number of requests per response status code
        response_codes = df['HTTP_status'].value_counts()
        response_codes.plot.bar()
        plt.savefig("/home/ec2-user/my_graph2.png")

        # IPs that issued GET object requests
        access_ip_list = df[(df['Operation'] == 'REST.GET.OBJECT')]['Remote_IP'].value_counts()
        print("=== access ip list ===")
        print(access_ip_list)

        # IPs whose requests were denied (403)
        deny_ip_list = df[(df['HTTP_status'] == 403)]['Remote_IP'].value_counts()
        print("=== deny ip list ===")
        print(deny_ip_list)

        # Keys whose requests were denied (403)
        deny_key_list = df[(df['HTTP_status'] == 403)]['Key'].value_counts()
        print("=== deny key list ===")
        print(deny_key_list)


if __name__ == "__main__":
    bucket_name = 'ahss-s3-access-logging'
    analyzer = S3AccessLogAnalyzer(bucket_name)

    # Fetch the list of S3 log objects
    analyzer.list_log_objects()

    # Analyze and visualize the data
    analyzer.analyze_and_visualize()
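To make it easier to see how `pd.read_csv` splits an access log record into the 25 columns used above, here is a small sketch that parses log text held in memory. The two log lines are synthetic values made up for illustration (they are not taken from the real bucket), and `parse_access_log` and the added `Timestamp` column are hypothetical additions.

```python
import io

import pandas as pd

COLUMNS = ['Bucket_Owner', 'Bucket', 'Time', 'Time_Offset', 'Remote_IP',
           'Requester_ARN/Canonical_ID', 'Request_ID', 'Operation', 'Key',
           'Request_URI', 'HTTP_status', 'Error_Code', 'Bytes_Sent',
           'Object_Size', 'Total_Time', 'Turn_Around_Time', 'Referrer',
           'User_Agent', 'Version_Id', 'Host_Id', 'Signature_Version',
           'Cipher_Suite', 'Authentication_Type', 'Host_Header', 'TLS_version']

# Synthetic records (assumption): one successful GET and one denied GET
SAMPLE_LOG = (
    'ownerid ahss-mybucket [10/Jan/2024:01:02:03 +0000] 203.0.113.10 - REQ1 '
    'REST.GET.OBJECT random_1.html "GET /random_1.html HTTP/1.1" 200 - 120 120 '
    '12 11 "-" "Mozilla/5.0 (X11) Test" - hostid - ECDHE-RSA-AES128-GCM-SHA256 '
    '- ahss-mybucket.s3.ap-northeast-2.amazonaws.com TLSv1.2\n'
    'ownerid ahss-mybucket [10/Jan/2024:01:02:04 +0000] 203.0.113.10 - REQ2 '
    'REST.GET.OBJECT favicon.ico "GET /favicon.ico HTTP/1.1" 403 AccessDenied '
    '243 - 5 - "-" "Mozilla/5.0 (X11) Test" - hostid - '
    'ECDHE-RSA-AES128-GCM-SHA256 - '
    'ahss-mybucket.s3.ap-northeast-2.amazonaws.com TLSv1.2\n'
)


def parse_access_log(text):
    """Parse space-separated access log text into a DataFrame."""
    # Double-quoted fields (request URI, referrer, User-Agent) stay intact
    # because read_csv treats '"' as the quote character by default.
    df = pd.read_csv(io.StringIO(text), sep=" ", names=COLUMNS,
                     usecols=range(25))
    # The timestamp is split across two columns: "[10/Jan/2024:01:02:03"
    # and "+0000]". Join them again and convert to a real datetime.
    df['Timestamp'] = pd.to_datetime(
        df['Time'] + ' ' + df['Time_Offset'],
        format='[%d/%b/%Y:%H:%M:%S %z]', utc=True)
    return df


if __name__ == "__main__":
    print(parse_access_log(SAMPLE_LOG)[['Timestamp', 'Key', 'HTTP_status']])
```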
I ran the test over ssh in a terminal rather than in Jupyter, so the graphs could not be displayed directly. Instead I saved them to files and downloaded them to my local PC to view them.
Checking the most frequently accessed objects
❯ scp -i test.pem ec2-user@$EC2_Public_IP:/home/ec2-user/my_graph.png ./my_graph.png
Viewing the bucket's response codes
❯ scp -i test.pem ec2-user@$EC2_Public_IP:/home/ec2-user/my_graph2.png ./my_graph2.png
One oddity: my_graph2.png does not look right, most likely because the existing plt figure was not reset before the second graph was drawn. I plan to fix this later.
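One possible way to address this, assuming the cause really is the shared figure, is to draw each chart on its own figure and close it after saving. The sketch below uses the non-interactive Agg backend, which suits an ssh session without a display. The helper names `save_pie` and `save_bar` are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no display required

import matplotlib.pyplot as plt
import pandas as pd


def save_pie(series, path):
    """Draw a pie chart of the series on a fresh figure and save it."""
    fig, ax = plt.subplots()
    series.plot.pie(ax=ax)
    ax.set_ylabel("")  # hide the series name next to the pie
    fig.savefig(path)
    plt.close(fig)  # release the figure so later plots start clean


def save_bar(series, path):
    """Draw a bar chart of the series on a fresh figure and save it."""
    fig, ax = plt.subplots()
    series.plot.bar(ax=ax)
    fig.savefig(path)
    plt.close(fig)


# Example calls (not executed here), using the names from the script above:
# save_pie(top_five_objects, "/home/ec2-user/my_graph.png")
# save_bar(response_codes, "/home/ec2-user/my_graph2.png")
```

Because each helper creates and closes its own figure, the bar chart no longer inherits whatever was drawn for the pie chart.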
The following results can also be derived.
Keep in mind that the tests were run from only a limited number of public IPs.
=== access ip list ===
Remote_IP
%client_public_ip% 200
Name: count, dtype: int64
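As shown above, we can list the IPs that requested objects. Note that the script filters this list on the REST.GET.OBJECT operation, not on an http_status_code of 200.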
=== deny ip list ===
Remote_IP
%client_public_ip% 40
Name: count, dtype: int64
As shown above, we can list the IPs whose requests were denied with an http_status_code of 403.
=== deny key list ===
Key
favicon.ico 38
random_1.html 2
Name: count, dtype: int64
As shown above, we can list the URIs (object keys) whose requests were denied with an http_status_code of 403.
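To separate successful reads from denied ones more strictly, the operation filter can be combined with the status code. The following sketch runs on a tiny hand-made DataFrame: the rows are hypothetical values for illustration, and `successful_get_ips` and `denied_keys` are assumed helper names, not part of the script above.

```python
import pandas as pd

# Hypothetical rows that mimic the relevant access log columns
sample = pd.DataFrame({
    "Remote_IP": ["203.0.113.10", "203.0.113.10", "203.0.113.10", "198.51.100.7"],
    "Operation": ["REST.GET.OBJECT"] * 4,
    "Key": ["random_1.html", "random_2.html", "favicon.ico", "favicon.ico"],
    "HTTP_status": [200, 200, 403, 403],
})


def successful_get_ips(df):
    """Count GET object requests per IP that returned HTTP 200."""
    mask = (df["Operation"] == "REST.GET.OBJECT") & (df["HTTP_status"] == 200)
    return df[mask]["Remote_IP"].value_counts()


def denied_keys(df):
    """Count requests per key that returned HTTP 403."""
    return df[df["HTTP_status"] == 403]["Key"].value_counts()


if __name__ == "__main__":
    print(successful_get_ips(sample))
    print(denied_keys(sample))
```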
For lifecycle management, I followed the AWS Workshop, but testing it requires files that have already passed their expiration date, so this part is still in preparation.