Monitor Amazon S3 activity using S3 server access logs and Pandas in Python

지뚜 · September 2, 2023

aws

Learning fun: 😀😀

This post enables access logs on an Amazon S3 bucket and analyzes the aggregated data with Python's pandas.
https://aws.amazon.com/ko/blogs/storage/monitor-amazon-s3-activity-using-s3-server-access-logs-and-pandas-in-python/

1. Amazon S3 Access Logs?


This feature logs the requests made to buckets you own, and I expect it to be very useful for activities such as security audits.

Note, however, that if the bucket holding the service data and the bucket receiving the logs are not kept separate, a so-called looping structure can arise (each delivered log object triggers further log records), so care is needed.

2. Summary

Based on what was covered in the previous study session, prepare the following setup.

  • ahss-mybucket : a publicly accessible bucket holding several random files
  • ahss-s3-access-logging : a bucket that stores the access logs of ahss-mybucket
  • In addition, prepare the following flow
    • Generate random HTML files with Python and upload them to ahss-mybucket with boto3
    • From my PC, use Python to request multiple files with random User-Agents
    • Generate aggregated data with boto3 and pandas
    • Monitor Amazon S3 lifecycle management

2-1. Create the public bucket

The bucket is configured for public access and given an S3:GetObject policy.
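
A minimal sketch of what such a policy could look like, assuming anonymous read access to every object in ahss-mybucket is intended. The helper name public_read_policy and the Sid value are hypothetical; the actual policy used in the post is not shown. The boto3 call is left commented out because it needs credentials and requires Block Public Access to be relaxed on the bucket.

```python
import json


def public_read_policy(bucket_name):
    """Return a bucket policy document allowing anonymous s3:GetObject."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",  # hypothetical statement id
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # Applies to every object in the bucket, not the bucket itself
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


policy_json = json.dumps(public_read_policy("ahss-mybucket"), indent=2)
print(policy_json)

# Applying the policy (requires AWS credentials):
# import boto3
# boto3.client("s3").put_bucket_policy(Bucket="ahss-mybucket", Policy=policy_json)
```

Keeping the policy limited to s3:GetObject means anonymous clients can read objects but cannot list or modify them.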

2-2. Create the bucket for access logging

The bucket that receives the access logs does not need public access, so it is created like any ordinary bucket.

2-3. Configure access logging on the public bucket

In the settings of the public bucket that holds the service data, enable Access Logging and set the target to the bucket created in 2-2.
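
The same setting can also be expressed in code. The following is a sketch, assuming the console step maps to the boto3 put_bucket_logging call with an empty target prefix. The helper build_logging_status is a hypothetical name, and it also guards against the looping setup mentioned in section 1. The real call is commented out because it needs credentials, and the target bucket must allow the S3 logging service to write to it.

```python
def build_logging_status(source_bucket, target_bucket, target_prefix=""):
    """Build the BucketLoggingStatus payload, refusing a self-logging setup."""
    if source_bucket == target_bucket:
        raise ValueError("source and target bucket must differ to avoid a logging loop")
    return {
        "LoggingEnabled": {
            "TargetBucket": target_bucket,
            "TargetPrefix": target_prefix,
        }
    }


status = build_logging_status("ahss-mybucket", "ahss-s3-access-logging")
print(status)

# Applying the configuration (requires AWS credentials):
# import boto3
# boto3.client("s3").put_bucket_logging(
#     Bucket="ahss-mybucket", BucketLoggingStatus=status
# )
```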

2-4. Generate random HTML files

Generate random HTML files with Python and upload them to S3 with the boto3 module.
AWS credentials must be configured beforehand.

import os

import boto3

bucket_name = "ahss-mybucket"
num_files = 5
s3 = boto3.client('s3')  # uses the AWS credentials configured beforehand

for i in range(1, num_files + 1):
    random_file_name = f"random_{i}.html"
    html_content = f"<html><head><title>Random HTML {i}</title></head><body><h1>This is random HTML file {i}</h1></body></html>"
    # Write the HTML to a temporary local file
    with open(random_file_name, 'w') as f:
        f.write(html_content)
    # Upload it under the same key, then remove the local copy
    s3.upload_file(random_file_name, bucket_name, random_file_name)
    print(f"Uploaded {random_file_name} to S3 bucket {bucket_name}")
    os.remove(random_file_name)

Running it produces the following output.

Uploaded random_1.html to S3 bucket ahss-mybucket
Uploaded random_2.html to S3 bucket ahss-mybucket
Uploaded random_3.html to S3 bucket ahss-mybucket
Uploaded random_4.html to S3 bucket ahss-mybucket
Uploaded random_5.html to S3 bucket ahss-mybucket

2-5. Test a request to the S3 URL from the PC

Obtain the URL of one test file and try to access it from the PC.

Access works as expected.

2-6. Request multiple files from the PC with random User-Agents

In Python, request the previously created random files many times over their public URLs, picking a random User-Agent for each request. Note that the script below sends the requests sequentially with the requests library rather than asynchronously.

import requests
import random

bucket_url = "https://ahss-mybucket.s3.ap-northeast-2.amazonaws.com/"

# List of User-Agent strings
user_agent_list = [
    # Chrome
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36",
    
    # Firefox
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0",

    # Safari
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Safari/605.1.15",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Safari/605.1.15",

    # Edge
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Edge/100.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Edge/101.0.0.0 Safari/537.36",
]

# Send 10,000 requests
for _ in range(10000):
    user_agent = random.choice(user_agent_list)
    random_file_url = f"{bucket_url}random_{random.randint(1, 5)}.html"
    headers = {"User-Agent": user_agent}
    response = requests.get(random_file_url, headers=headers)
    if response.status_code == 200:
        print(f"Successfully fetched file with User-Agent: {user_agent}")
    else:
        print(f"Failed to fetch file with User-Agent: {user_agent}, Status code: {response.status_code}")

Execution result
Successfully fetched file with User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36
Successfully fetched file with User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36
Successfully fetched file with User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Edge/101.0.0.0 Safari/537.36
Successfully fetched file with User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36
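
The loop above issues its requests one at a time. If concurrent requests are wanted, one possible variant is sketched below with a thread pool. This is an assumption-laden sketch rather than the script used in the post: it swaps requests for the standard-library urllib so it has no third-party dependency, trims the User-Agent list to two entries, and the names fetch_status, run_requests, and the worker count of 8 are hypothetical.

```python
import random
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

BUCKET_URL = "https://ahss-mybucket.s3.ap-northeast-2.amazonaws.com/"
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0",
]


def fetch_status(url, user_agent):
    """Send one GET request and return its HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 403/404 responses still count as requests made against the bucket
        return err.code


def run_requests(total, fetch=fetch_status, workers=8):
    """Issue `total` requests through a thread pool and collect the status codes."""
    def one(_):
        url = f"{BUCKET_URL}random_{random.randint(1, 5)}.html"
        return fetch(url, random.choice(USER_AGENTS))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one, range(total)))


# Example call (performs real network requests against the bucket):
# statuses = run_requests(100)
# print(statuses.count(200), "succeeded out of", len(statuses))
```

Passing the fetch function as a parameter keeps the pool logic testable without touching the network.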

The access logs accumulate as expected in the ahss-s3-access-logging bucket.

3. Generate aggregated data with boto3 and pandas

https://aws.amazon.com/ko/blogs/storage/monitor-amazon-s3-activity-using-s3-server-access-logs-and-pandas-in-python/
The learning and testing below follow this post.

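Install the required external Python modules (matplotlib and s3fs are needed by the later script, for plotting and for reading s3:// paths with pandas)

pip3 install boto3 pandas matplotlib s3fs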

Test the boto3 library

import os
import boto3
import pandas as pd

bucket = 'demo-access-logs-bucket'
s3_client = boto3.client('s3')
print(s3_client)

<botocore.client.S3 object at 0x7efecf37cfd0>

Having confirmed that the library works, let's extend it a bit further.

import boto3
import pandas as pd
import matplotlib.pyplot as plt

class S3AccessLogAnalyzer:
    def __init__(self, bucket_name):
        self.bucket_name = bucket_name
        self.s3_client = boto3.client('s3')
        self.log_objects = []

    def list_log_objects(self):
        paginator = self.s3_client.get_paginator('list_objects_v2')
        result = paginator.paginate(Bucket=self.bucket_name, Prefix='')
        for each in result:
            # A page with no objects has no 'Contents' key, so default to an empty list
            for key in each.get('Contents', []):
                self.log_objects.append(key['Key'])

    def load_log_data(self):
        # Column names for the space-delimited S3 server access log format
        columns = [
            'Bucket_Owner', 'Bucket', 'Time', 'Time_Offset', 'Remote_IP',
            'Requester_ARN/Canonical_ID', 'Request_ID', 'Operation', 'Key',
            'Request_URI', 'HTTP_status', 'Error_Code', 'Bytes_Sent',
            'Object_Size', 'Total_Time', 'Turn_Around_Time', 'Referrer',
            'User_Agent', 'Version_Id', 'Host_Id', 'Signature_Version',
            'Cipher_Suite', 'Authentication_Type', 'Host_Header', 'TLS_version',
        ]
        log_data = []
        for log_key in self.log_objects:
            log_data.append(pd.read_csv(f's3://{self.bucket_name}/{log_key}',
                                        sep=" ", names=columns, usecols=range(25)))

        return pd.concat(log_data)

    def analyze_and_visualize(self):
        df = self.load_log_data()
        
        # Pie chart: the 5 most requested files
        top_five_objects = df[(df['Operation'] == 'REST.GET.OBJECT')]['Key'].value_counts().nlargest(5)
        top_five_objects.plot.pie(label='')
        plt.savefig("/home/ec2-user/my_graph.png")
        plt.show()

        # Bar chart: number of requests per response status code
        response_codes = df['HTTP_status'].value_counts()
        response_codes.plot.bar()
        plt.savefig("/home/ec2-user/my_graph2.png")

        # List of IPs that accessed objects
        access_ip_list = df[(df['Operation'] == 'REST.GET.OBJECT')]['Remote_IP'].value_counts()
        print("=== access ip list ===")
        print(access_ip_list)

        # List of denied IPs
        deny_ip_list = df[(df['HTTP_status'] == 403)]['Remote_IP'].value_counts()
        print("=== deny ip list ===")
        print(deny_ip_list)

        # List of denied files
        deny_key_list = df[(df['HTTP_status'] == 403)]['Key'].value_counts()
        print("=== deny key list ===")
        print(deny_key_list)

if __name__ == "__main__":
    bucket_name = 'ahss-s3-access-logging'
    analyzer = S3AccessLogAnalyzer(bucket_name)
    
    # Fetch the list of S3 log objects
    analyzer.list_log_objects()
    
    # Analyze and visualize the data
    analyzer.analyze_and_visualize()
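
To see why the column list contains both Time and Time_Offset, it helps to split a single log record by hand. The sketch below uses the standard-library csv module, which treats spaces and double quotes roughly the way the space-separated read above does. The sample record is invented for illustration (placeholder owner, request, and host IDs, and a documentation IP address) and is not taken from the real logs.

```python
import csv

COLUMNS = [
    'Bucket_Owner', 'Bucket', 'Time', 'Time_Offset', 'Remote_IP',
    'Requester_ARN/Canonical_ID', 'Request_ID', 'Operation', 'Key',
    'Request_URI', 'HTTP_status', 'Error_Code', 'Bytes_Sent',
    'Object_Size', 'Total_Time', 'Turn_Around_Time', 'Referrer',
    'User_Agent', 'Version_Id', 'Host_Id', 'Signature_Version',
    'Cipher_Suite', 'Authentication_Type', 'Host_Header', 'TLS_version',
]

# Hypothetical record in the server access log layout (values are made up)
SAMPLE_LINE = (
    'EXAMPLEOWNERID ahss-mybucket [02/Sep/2023:05:12:44 +0000] 203.0.113.10 - '
    'EXAMPLEREQUESTID REST.GET.OBJECT random_1.html "GET /random_1.html HTTP/1.1" '
    '200 - 104 104 12 11 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" - '
    'EXAMPLEHOSTID - ECDHE-RSA-AES128-GCM-SHA256 - '
    'ahss-mybucket.s3.ap-northeast-2.amazonaws.com TLSv1.2'
)


def parse_log_line(line):
    """Split one record on spaces, keeping double-quoted fields together."""
    fields = next(csv.reader([line], delimiter=" ", quotechar='"'))
    return dict(zip(COLUMNS, fields[:len(COLUMNS)]))


record = parse_log_line(SAMPLE_LINE)
# The bracketed timestamp contains a space, so it lands in two columns
print(record["Time"], record["Time_Offset"])
print(record["Operation"], record["Key"], record["HTTP_status"])
```

The quoted Request_URI and User_Agent fields stay intact, while the bracketed timestamp is split because brackets are not quote characters.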

Because I tested over SSH in a terminal rather than in Jupyter, the graphs could not be displayed directly, so they are saved to files and downloaded to the local PC for checking.

Check the most frequently accessed objects
❯ scp -i test.pem ec2-user@$EC2_Public_IP:/home/ec2-user/my_graph.png ./my_graph.png

View the bucket's response codes
❯ scp -i test.pem ec2-user@$EC2_Public_IP:/home/ec2-user/my_graph2.png ./my_graph2.png
Notably, my_graph2.png came out wrong, apparently because the existing plt state was not reset before the second chart was drawn; this will be improved later.

The following results can also be derived.
Keep in mind that only a limited set of public IPs was used for the test.

=== access ip list ===
Remote_IP
%client_public_ip% 200
Name: count, dtype: int64
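This lists the IPs that issued GET requests for objects. Note that the code filters on Operation == 'REST.GET.OBJECT' rather than on the HTTP status, so the list is not restricted to successful (200) responses.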

=== deny ip list ===
Remote_IP
%client_public_ip% 40
Name: count, dtype: int64
This lists the IPs whose requests were denied with http_status_code 403.

=== deny key list ===
Key
favicon.ico 38
random_1.html 2
Name: count, dtype: int64
This lists the keys (URIs) whose requests were denied with http_status_code 403.

4. Amazon S3 lifecycle management monitoring

I tried following the AWS Workshop, but testing requires objects that have already expired, so this part is still in preparation.
