Data | Data Lake (S3) | Data Transformation | DW |
---|---|---|---|
Data Sources | Data in Raw Format | Spark, Athena | |
User access logs, raw_data | Storage | Clean the data, then load it into the DW | |

Example data sources: click stream, call data, ads performance data, transactions, sensor data, metadata, production databases, log files, APIs, stream data (Kafka topics).
SELECT
c.courseid
, COUNT(DISTINCT cr.studentid) AS "수강생수" -- number of students
, COUNT(DISTINCT cr.reviewid) AS "리뷰수" -- number of reviews
, AVG(cr.rating) AS "평점" -- average rating
FROM course c
LEFT JOIN course_review cr ON c.courseid = cr.courseid
GROUP BY 1;
Join course with course_review and precompute these aggregates in the DW.
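
As a rough illustration of precomputing in the DW, the sketch below stores the result of the join as a summary table so that later reads do not repeat the aggregation. It uses an in-memory SQLite database in place of a real warehouse. The sample rows, the table name `course_summary`, and the English column aliases (`student_count`, `review_count`, `avg_rating`) are assumptions made for this example, not part of the original notes.

```python
import sqlite3

# In-memory SQLite stands in for the DW (assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (courseid INTEGER PRIMARY KEY);
CREATE TABLE course_review (
    reviewid  INTEGER PRIMARY KEY,
    courseid  INTEGER,
    studentid INTEGER,
    rating    REAL
);

-- Hypothetical sample data: course 3 has no reviews.
INSERT INTO course VALUES (1), (2), (3);
INSERT INTO course_review VALUES
    (10, 1, 100, 5.0),
    (11, 1, 101, 4.0),
    (12, 2, 100, 3.0);

-- Precompute the per-course aggregates once and keep them as a summary table.
CREATE TABLE course_summary AS
SELECT
    c.courseid,
    COUNT(DISTINCT cr.studentid) AS student_count,
    COUNT(DISTINCT cr.reviewid)  AS review_count,
    AVG(cr.rating)               AS avg_rating
FROM course c
LEFT JOIN course_review cr ON c.courseid = cr.courseid
GROUP BY c.courseid;
""")

# Downstream queries read the small summary table instead of re-joining.
summary = conn.execute(
    "SELECT courseid, student_count, review_count, avg_rating "
    "FROM course_summary ORDER BY courseid"
).fetchall()
for row in summary:
    print(row)
```

Because the query uses a LEFT JOIN, a course without any reviews still appears in the summary, with both counts at 0 and a NULL average rating. In a real warehouse the summary table would need to be refreshed on a schedule as new reviews arrive.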