The 4 pillars of AWS Observability: CloudWatch, CloudTrail, EventBridge, and Config

jinhyukko · January 24, 2026

0. Introduction

There are many ways to audit and monitor the infrastructure of an information-processing system in practice. In the old days, we just looked at logs after a crash. Now we need a "Self-Healing" cloud. Let's break down what AWS has to offer.


[!] IMPORTANT
The overlap among these three can be confusing, because their capabilities duplicate one another:

  • CloudWatch : Numbers, Logs
  • CloudTrail : Behavior
  • Config : State

Can all three stop a public S3 bucket? Yes.
Technically, you can achieve "Self-healing" using any of the three, but the approach and efficiency differ significantly:

  1. CloudWatch (The Log-based Approach)
    How: Collect S3 Access Logs → Use Metric Filters to find "Public" keywords → Trigger Alarm → Lambda.

The Verdict: Slowest & Most Complex. It relies on log ingestion delays. Not recommended for immediate security response.

  2. CloudTrail + EventBridge (The Event-driven Approach)
    How: Capture the PutBucketPolicy API call in real-time → EventBridge detects the specific "Behavior" → Trigger Lambda to revert settings.

The Verdict: Fastest (Near Real-time). Best for "Catching the intruder" the moment they hit the Enter key.

  3. AWS Config (The Compliance-based Approach)
    How: Enable s3-bucket-public-read-prohibited rule → Trigger Automatic Remediation (SSM or Lambda) when the "State" becomes Non-compliant.

The Verdict: Most Professional & Robust. This is the standard "Self-healing" architecture for governance. It ensures the infrastructure stays in its desired state.

0.1. The Observability Gap

  • CloudTrail is your Security Camera (History).
  • CloudWatch is your Dashboard (Current Health).
  • EventBridge is your Nervous System (The "If this happens, do that" logic).

1. CloudTrail

The Vibe: Accountability. No one touches the infrastructure without a record.

If a developer accidentally deletes a database at 3:00 AM, CloudTrail tells you exactly who did it and which IP address they used.
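Each CloudTrail record is a JSON document, and four of its fields answer "who, what, when, and from where": `userIdentity.arn`, `eventName`, `eventTime`, and `sourceIPAddress`. The sketch below assumes the records are already loaded as Python dicts (for example, parsed from a trail's log files in S3). The helper `who_did_it` and the sample records are hypothetical and trimmed down; real records carry many more fields.

```python
from typing import Any, Iterable


def who_did_it(records: Iterable[dict[str, Any]], event_name: str) -> list[dict[str, str]]:
    """Return who/when/where for every CloudTrail record whose eventName matches."""
    findings = []
    for record in records:
        if record.get("eventName") != event_name:
            continue
        identity = record.get("userIdentity", {})
        findings.append(
            {
                "who": identity.get("arn", "unknown"),
                "when": record.get("eventTime", "unknown"),
                "source_ip": record.get("sourceIPAddress", "unknown"),
                "service": record.get("eventSource", "unknown"),
            }
        )
    return findings


# Hypothetical, trimmed-down records for illustration only.
sample_records = [
    {
        "eventTime": "2026-01-24T03:00:12Z",
        "eventSource": "rds.amazonaws.com",
        "eventName": "DeleteDBInstance",
        "userIdentity": {"arn": "arn:aws:iam::123456789012:user/dev-alice"},
        "sourceIPAddress": "203.0.113.10",
    },
    {
        "eventTime": "2026-01-24T02:55:00Z",
        "eventSource": "s3.amazonaws.com",
        "eventName": "ListBuckets",
        "userIdentity": {"arn": "arn:aws:iam::123456789012:user/dev-bob"},
        "sourceIPAddress": "198.51.100.7",
    },
]

for finding in who_did_it(sample_records, "DeleteDBInstance"):
    print(finding)
```

For events from the last 90 days you can also ask the same question through the CloudTrail `LookupEvents` API (boto3 `lookup_events`) instead of reading log files yourself.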

1.1. Management Events vs Data Events

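Management Events : Control-plane operations on your resources, such as CreateBucket, PutBucketPolicy, RunInstances, or DescribeInstances. CloudTrail records them by default, and the last 90 days are searchable in Event history.
Data Events : Data-plane operations on the data inside a resource, such as S3 GetObject/PutObject or Lambda Invoke. They are high-volume, are not logged by default, and must be enabled on a trail at extra cost.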
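Because data events are off by default, you opt in per trail. A minimal sketch, assuming boto3 and an existing trail (the trail name and bucket ARN used below are hypothetical), builds CloudTrail advanced event selectors for S3 object-level events. `PutEventSelectors` replaces the trail's existing selectors, so the sketch keeps a separate selector for management events. The client is injectable so the logic can be exercised without AWS credentials.

```python
from typing import Any, Optional


def s3_data_event_selectors(bucket_arn: Optional[str] = None) -> list[dict[str, Any]]:
    """Advanced event selectors: keep management events, add S3 object-level data events."""
    management = {
        "Name": "Management events",
        "FieldSelectors": [{"Field": "eventCategory", "Equals": ["Management"]}],
    }
    data_fields = [
        {"Field": "eventCategory", "Equals": ["Data"]},
        {"Field": "resources.type", "Equals": ["AWS::S3::Object"]},
    ]
    if bucket_arn is not None:
        # Narrow logging to one bucket's objects to keep data-event costs down.
        data_fields.append({"Field": "resources.ARN", "StartsWith": [bucket_arn + "/"]})
    return [management, {"Name": "S3 object-level events", "FieldSelectors": data_fields}]


def enable_s3_data_events(trail_name: str, bucket_arn: Optional[str] = None, cloudtrail_client=None):
    """Apply the selectors to a trail (replaces whatever selectors it had)."""
    if cloudtrail_client is None:
        import boto3  # only needed when talking to real AWS
        cloudtrail_client = boto3.client("cloudtrail")
    return cloudtrail_client.put_event_selectors(
        TrailName=trail_name,
        AdvancedEventSelectors=s3_data_event_selectors(bucket_arn),
    )
```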

1.2. Self-Healing Process

CloudTrail -> EventBridge -> Lambda or SNS
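The flow above can be sketched as an EventBridge event pattern plus a Lambda handler. This is a minimal sketch, assuming the trail records management events (the default), the rule targets this function, and the function's role is allowed `s3:PutBucketPublicAccessBlock`. The names `EVENT_PATTERN`, `BLOCK_ALL_PUBLIC_ACCESS`, and `handler` are hypothetical; the S3 client is injectable so the logic can be tested without AWS.

```python
import json

# EventBridge pattern for S3 policy/ACL changes recorded by CloudTrail.
# json.dumps(EVENT_PATTERN) is the string you would give the rule.
EVENT_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["PutBucketPolicy", "PutBucketAcl"],
    },
}

BLOCK_ALL_PUBLIC_ACCESS = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def handler(event, context, s3_client=None):
    """Lambda entry point: re-enable Block Public Access on the bucket named in the event."""
    if s3_client is None:
        import boto3  # bundled with the Lambda Python runtime
        s3_client = boto3.client("s3")

    detail = event["detail"]
    bucket = detail["requestParameters"]["bucketName"]
    actor = detail.get("userIdentity", {}).get("arn", "unknown")

    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=BLOCK_ALL_PUBLIC_ACCESS,
    )
    # Structured log line so the remediation itself is auditable in CloudWatch Logs.
    print(json.dumps({"remediated_bucket": bucket, "changed_by": actor, "api": detail["eventName"]}))
    return {"bucket": bucket, "action": "PutPublicAccessBlock"}
```

Turning on Block Public Access does not delete the policy that was just applied; `RestrictPublicBuckets` neutralizes its public grants, and the policy can then be reviewed by a human.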

2. CloudWatch

The Vibe: Visibility and Performance.
Metrics are just numbers (like your heart rate). If your CPU hits 90%, CloudWatch sounds an alarm.
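To make "CPU hits 90%" concrete: an alarm looks at the most recent datapoints of a metric and compares them with a threshold. The sketch below is a simplified model, not CloudWatch's exact algorithm (it ignores missing-data handling); its parameter names mirror `EvaluationPeriods` and `DatapointsToAlarm` from the `PutMetricAlarm` API, and the function name `alarm_state` is hypothetical.

```python
from typing import Sequence


def alarm_state(
    datapoints: Sequence[float],
    threshold: float = 90.0,
    evaluation_periods: int = 3,
    datapoints_to_alarm: int = 2,
) -> str:
    """Simplified "M out of N" alarm with a GreaterThanThreshold comparison.

    Looks at the last `evaluation_periods` datapoints and returns "ALARM" if at
    least `datapoints_to_alarm` of them exceed the threshold. Treating too few
    datapoints as INSUFFICIENT_DATA is a simplification of the real behavior.
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


print(alarm_state([40.0, 95.5, 97.2]))  # ALARM: two of the last three periods above 90%
print(alarm_state([40.0, 95.5, 60.0]))  # OK: only one breach
```

Requiring 2 of 3 breaching datapoints keeps a single CPU spike from paging anyone.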

  • Logs Insights is the killer feature here: you can search through millions of log lines in seconds with a simple query language to find that one "Error 500."
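A Logs Insights search can also be run from code. A minimal sketch, assuming boto3 and a log group you choose (the name in the test is hypothetical): `start_query` takes epoch seconds and returns a `queryId`, and because queries run asynchronously, `get_query_results` is polled until a terminal status. The query string uses standard Logs Insights syntax; the helper `find_errors` is hypothetical.

```python
import time

ERROR_500_QUERY = """
fields @timestamp, @message
| filter @message like /Error 500/
| sort @timestamp desc
| limit 20
"""


def find_errors(log_group: str, minutes: int = 60, logs_client=None, poll_seconds: float = 1.0):
    """Run a Logs Insights query over the last `minutes` and return rows as dicts."""
    if logs_client is None:
        import boto3
        logs_client = boto3.client("logs")

    end = int(time.time())
    start = end - minutes * 60
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start,
        endTime=end,
        queryString=ERROR_500_QUERY,
    )["queryId"]

    # Poll until the query reaches a terminal status.
    while True:
        response = logs_client.get_query_results(queryId=query_id)
        if response["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
            break
        time.sleep(poll_seconds)
    if response["status"] != "Complete":
        raise RuntimeError(f"Query ended with status {response['status']}")
    # Each result row is a list of {"field": ..., "value": ...} cells.
    return [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]
```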

CloudWatch Logs ingestion costs $0.50/GB (Standard log class in us-east-1; prices vary by region). Don't log every single "Success 200" message if you don't need it.

  • Pro-tip: Store long-term logs in S3 (cheap) and only keep active "fire-fighting" logs in CloudWatch (expensive but fast).
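A quick back-of-the-envelope check shows why filtering matters. This sketch uses the $0.50/GB ingestion figure quoted above and a hypothetical workload (10 GB/day, 80% of it "Success 200" lines); storage, query, and S3 costs are left out.

```python
INGESTION_PRICE_PER_GB = 0.50  # figure quoted above; check your region's pricing


def monthly_ingestion_cost(gb_per_day: float, kept_fraction: float = 1.0, days: int = 30) -> float:
    """Estimate CloudWatch Logs ingestion cost for the fraction of logs you actually ship."""
    return round(gb_per_day * days * kept_fraction * INGESTION_PRICE_PER_GB, 2)


# Hypothetical app: 10 GB/day of logs, 80% of which are "Success 200" lines.
everything = monthly_ingestion_cost(10)
errors_only = monthly_ingestion_cost(10, kept_fraction=0.2)
print(everything, errors_only)  # 150.0 30.0
```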

2.1. Self-Healing Process

CloudWatch Metrics -> CloudWatch Alarm -> (EventBridge) -> Lambda/SNS -> Action
CloudWatch Logs -> Metric Filter -> CloudWatch Alarm -> Lambda or SNS -> Action (or Subscription Filter -> Lambda -> Action)

The EventBridge step in parentheses is optional: an alarm can notify an SNS topic directly through its alarm actions.
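Here is the metric path without EventBridge, as a minimal sketch assuming boto3 and an existing SNS topic (the instance ID and topic ARN in the test are hypothetical). The alarm watches EC2 `CPUUtilization` and publishes to SNS through `AlarmActions`. Alarm state changes are also emitted to EventBridge as "CloudWatch Alarm State Change" events, so the EventBridge route stays available if you need richer routing.

```python
def create_cpu_alarm(instance_id: str, sns_topic_arn: str, cloudwatch_client=None):
    """Alarm when average EC2 CPU stays above 90% and notify an SNS topic directly."""
    if cloudwatch_client is None:
        import boto3
        cloudwatch_client = boto3.client("cloudwatch")
    return cloudwatch_client.put_metric_alarm(
        AlarmName=f"high-cpu-{instance_id}",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        Statistic="Average",
        Period=300,              # 5-minute datapoints
        EvaluationPeriods=3,     # look at the last 3 datapoints...
        DatapointsToAlarm=2,     # ...and alarm if 2 of them breach
        Threshold=90.0,
        ComparisonOperator="GreaterThanThreshold",
        TreatMissingData="missing",
        AlarmActions=[sns_topic_arn],  # direct SNS notification, no EventBridge hop
    )
```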

3. EventBridge

This is the "Router." It listens for "Events."

  • It routes events. When it sees an event from CloudTrail or CloudWatch, it says, "Hey Lambda, go fix this!" or "Hey SNS, text the Boss!"
  • The "Audit-to-Alert" Flow: CloudTrail detects a login without MFA → EventBridge catches that specific event → EventBridge triggers an SNS email to you.
  • The "Performance-to-Scale" Flow: CloudWatch sees high traffic → EventBridge triggers an Auto-Scaling group to add more servers.
  • The "Security-to-Fix" Flow: Someone makes an S3 bucket public → EventBridge triggers a Lambda function that immediately changes it back to private. (This is called Auto-Remediation).
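The "Audit-to-Alert" flow hinges on an event pattern. To show how a pattern selects events, the sketch below re-implements a simplified subset of EventBridge matching locally: a list means "equals one of these", and a nested dict means "descend into the event". It is a hypothetical helper (`matches`) for illustration only; EventBridge does the real matching and also supports content filters (prefix, numeric, exists, and more) that this sketch ignores. The pattern targets console sign-ins recorded by CloudTrail where `additionalEventData.MFAUsed` is `"No"`.

```python
from typing import Any


def matches(pattern: dict[str, Any], event: dict[str, Any]) -> bool:
    """Simplified EventBridge-style matching: every pattern key must match the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: the event must have a matching nested object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif isinstance(actual, list):
            # An array value in the event matches if any element is allowed.
            if not any(item in expected for item in actual):
                return False
        elif actual not in expected:
            return False
    return True


# Console sign-ins without MFA, as recorded by CloudTrail.
NO_MFA_LOGIN = {
    "source": ["aws.signin"],
    "detail-type": ["AWS Console Sign In via CloudTrail"],
    "detail": {"eventName": ["ConsoleLogin"], "additionalEventData": {"MFAUsed": ["No"]}},
}

sample_event = {
    "source": "aws.signin",
    "detail-type": "AWS Console Sign In via CloudTrail",
    "detail": {"eventName": "ConsoleLogin", "additionalEventData": {"MFAUsed": "No"}},
}

print(matches(NO_MFA_LOGIN, sample_event))  # True -> route to SNS
```

Before deploying a rule, the real matcher can be checked against a sample event with the EventBridge `TestEventPattern` API (boto3 `test_event_pattern`).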

4. Config

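The Vibe: Compliance. Config continuously records the configuration ("State") of your resources and checks it against rules.

How: Enable the s3-bucket-public-read-prohibited rule → trigger Automatic Remediation (SSM or Lambda) when a bucket's "State" becomes Non-compliant.

This is the most professional and robust option, and the standard "Self-healing" architecture for governance: instead of reacting to a single API call, it keeps the infrastructure in its desired state.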
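Enabling the managed rule is a single API call. A minimal sketch, assuming boto3 and that the AWS Config configuration recorder is already recording S3 buckets in the account; the helper name `enable_public_read_rule` is hypothetical. `Owner: "AWS"` means a managed rule, so there is no evaluation Lambda of your own to maintain.

```python
def enable_public_read_rule(config_client=None):
    """Turn on the AWS managed rule that flags publicly readable S3 buckets."""
    if config_client is None:
        import boto3
        config_client = boto3.client("config")
    return config_client.put_config_rule(
        ConfigRule={
            "ConfigRuleName": "s3-bucket-public-read-prohibited",
            "Source": {
                "Owner": "AWS",  # managed rule provided by AWS
                "SourceIdentifier": "S3_BUCKET_PUBLIC_READ_PROHIBITED",
            },
            # Limit evaluations to S3 bucket resources.
            "Scope": {"ComplianceResourceTypes": ["AWS::S3::Bucket"]},
        }
    )
```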

4.1. Self-Healing Process

AWS Config -> (EventBridge) -> Lambda/SNS -> Action
AWS Config -> Automatic Remediation (SSM Automation document) -> Action
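The direct "Config -> Action" path is a remediation configuration attached to the rule. A minimal sketch, assuming boto3, the rule s3-bucket-public-read-prohibited already exists, and an IAM role (its ARN in the test is hypothetical) that SSM Automation can assume. It uses the AWS-owned Automation document `AWS-DisableS3BucketPublicReadWrite`; verify that document's parameter names in your Systems Manager console before relying on them. `RESOURCE_ID` tells Config to pass in the ID (bucket name) of the non-compliant resource.

```python
def enable_auto_remediation(automation_role_arn: str, config_client=None):
    """Attach automatic SSM remediation to the s3-bucket-public-read-prohibited rule."""
    if config_client is None:
        import boto3
        config_client = boto3.client("config")
    return config_client.put_remediation_configurations(
        RemediationConfigurations=[
            {
                "ConfigRuleName": "s3-bucket-public-read-prohibited",
                "TargetType": "SSM_DOCUMENT",
                "TargetId": "AWS-DisableS3BucketPublicReadWrite",
                "Parameters": {
                    # Role that SSM Automation assumes to change the bucket.
                    "AutomationAssumeRole": {"StaticValue": {"Values": [automation_role_arn]}},
                    # Config substitutes the non-compliant bucket's name here.
                    "S3BucketName": {"ResourceValue": {"Value": "RESOURCE_ID"}},
                },
                "Automatic": True,
                "MaximumAutomaticAttempts": 3,
                "RetryAttemptSeconds": 60,
            }
        ]
    )
```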

5. Conclusion: Building a Reactive Cloud

You don't have to check the logs every day, because the system tells you when something is wrong.

