OLAP in AWS

jinhyukko · January 19, 2026

OLAP vs OLTP

| Aspect | OLTP | OLAP |
|---|---|---|
| Primary goal | Execute transactions | Analyze data |
| Query type | Short, simple (INSERT, UPDATE, SELECT) | Long, complex (JOIN, GROUP BY, aggregations) |
| Data volume | Small per transaction | Very large datasets |
| Read/write pattern | Frequent reads and writes | Mostly read-only |
| Latency | Milliseconds | Seconds to minutes |
| Concurrency | Very high | Low to medium |
| Transactions | Full ACID support | Usually limited or none |
| Schema design | Highly normalized (3NF) | Denormalized (star/snowflake) |
| Storage layout | Row-oriented | Column-oriented |
| Index usage | Many indexes | Minimal indexes |
| Typical workload | User actions, payments | Reports, dashboards, analytics |
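The "Storage layout" and "Query type" rows are easier to see with a small example. Below is a minimal, pure-Python sketch with no database involved. The `Order` records and the function names are hypothetical. The OLTP-style lookup returns one whole record. The OLAP-style GROUP BY reads only the `region` and `amount` columns, which is why column-oriented stores such as Redshift scan much less data for aggregations.

```python
from dataclasses import dataclass


# Hypothetical order records, used only for illustration.
@dataclass
class Order:
    order_id: int
    customer: str
    region: str
    amount: float


# Row-oriented layout: each record is stored together (OLTP style).
ROWS = [
    Order(1, "alice", "us-east", 120.0),
    Order(2, "bob", "eu-west", 80.0),
    Order(3, "carol", "us-east", 45.5),
    Order(4, "dave", "ap-south", 210.0),
]


def to_columns(rows):
    """Pivot row-oriented records into a column-oriented dict of lists."""
    return {
        "order_id": [r.order_id for r in rows],
        "customer": [r.customer for r in rows],
        "region": [r.region for r in rows],
        "amount": [r.amount for r in rows],
    }


def oltp_lookup(rows, order_id):
    """OLTP-style point read: fetch one complete record by key."""
    for r in rows:
        if r.order_id == order_id:
            return r
    return None


def olap_revenue_by_region(columns):
    """OLAP-style aggregation: SELECT region, SUM(amount) ... GROUP BY region.

    Only the two columns the query needs are touched.
    """
    totals = {}
    for region, amount in zip(columns["region"], columns["amount"]):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


COLUMNS = to_columns(ROWS)

if __name__ == "__main__":
    print(oltp_lookup(ROWS, 2))
    print(olap_revenue_by_region(COLUMNS))
```

A real columnar engine also stores and compresses each column separately on disk. The Python lists here only model which fields each kind of query has to read.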

Data Storage

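Amazon S3 sits in the data lake layer. It stores raw and curated data as files (for example CSV, JSON, or Parquet), and engines such as Athena, EMR, and Glue query those files in place.
Amazon Redshift sits in the data warehouse layer. Data is loaded into managed, columnar tables that are optimized for SQL analytics.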
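Lake data in S3 is commonly laid out under Hive-style `key=value` prefixes. Athena and the Glue Data Catalog can treat those prefixes as partitions, so a query skips objects that it does not need. The sketch below models that partition pruning locally in Python. The `sales/` object keys and the helper names are hypothetical, and no AWS API is called.

```python
import re

# Hypothetical object keys in a data-lake bucket, laid out with
# Hive-style "key=value" partition prefixes.
KEYS = [
    "sales/year=2025/month=12/part-0000.parquet",
    "sales/year=2026/month=01/part-0000.parquet",
    "sales/year=2026/month=01/part-0001.parquet",
    "sales/year=2026/month=02/part-0000.parquet",
]

# Matches one "name=value" path segment (neither side may contain "/").
PARTITION_RE = re.compile(r"([^/=]+)=([^/]+)")


def partition_values(key):
    """Extract e.g. {'year': '2026', 'month': '01'} from a Hive-style key."""
    return dict(PARTITION_RE.findall(key))


def prune(keys, **predicates):
    """Keep only objects whose partition values match every predicate.

    This mimics how a query engine avoids reading partitions that a
    WHERE clause on partition columns rules out.
    """
    selected = []
    for key in keys:
        parts = partition_values(key)
        if all(parts.get(name) == value for name, value in predicates.items()):
            selected.append(key)
    return selected


if __name__ == "__main__":
    # Roughly: ... WHERE year = '2026' AND month = '01'
    print(prune(KEYS, year="2026", month="01"))
```

A query that filters on `year` and `month` therefore reads two of the four objects. In a warehouse like Redshift, the data would first be loaded into tables, and the engine would manage the storage layout itself.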

Hadoop Ecosystem vs AWS

| Hadoop component | Role | AWS equivalent | Notes |
|---|---|---|---|
| HDFS | Distributed file system | Amazon S3 | De facto replacement for HDFS |
| YARN | Resource and job scheduling | EMR / EKS / ECS | Compute plane; EMR on EC2 still runs YARN internally |
| MapReduce | Batch processing engine | EMR (legacy) | Mostly replaced by Spark |
| Hive | SQL on Hadoop | Amazon Athena | Presto/Trino-based SQL engine |
| Hive Metastore | Metadata catalog | AWS Glue Data Catalog | Shared metastore |
| Spark | Distributed compute engine | EMR / Glue / EKS | Glue = serverless Spark |
| HBase | Wide-column NoSQL | DynamoDB | Conceptual equivalent |
| Kafka | Streaming platform | MSK / Kinesis | |
| Flume | Log ingestion | Kinesis Firehose | |
| Sqoop | RDBMS ↔ HDFS transfer | AWS DMS | |
| Oozie | Workflow scheduler | Step Functions | |
| ZooKeeper | Distributed coordination | Self-managed | No direct AWS replacement |
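The table marks MapReduce as legacy. MapReduce splits a job into three phases: a map step, a shuffle that groups intermediate pairs by key, and a reduce step. The single-process Python sketch below walks through those phases for a word count. The function names and sample log lines are hypothetical. A real cluster runs many mappers and reducers in parallel over data in HDFS or S3.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all intermediate values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: combine the grouped values (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run map -> shuffle -> reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical log lines used as input.
LOGS = [
    "ERROR disk full",
    "INFO job started",
    "ERROR network timeout",
]

if __name__ == "__main__":
    print(word_count(LOGS))
```

Spark, which replaced MapReduce on EMR and Glue, expresses the same flow with higher-level operations such as `flatMap` and `reduceByKey`. It also keeps intermediate data in memory where possible, instead of writing it to disk between phases.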