๐Ÿ“’ AWS(5) - RedShift(2)

Kimdongkiยท2024๋…„ 5์›” 8์ผ

DB

๋ชฉ๋ก ๋ณด๊ธฐ
16/33

๐Ÿ“Œ ์‚ฌ์šฉ์ž ๊ทธ๋ฃน ๊ถŒํ•œ ์„ค์ •

analytics_authorsanalayics_userspii_usersadmin
raw_data TablesReadReadXRead, Write
analytics TablesRead, WriteReadXRead, Write
adhoc TablesRead, WriteRead, WriteXRead, Write
pii TablesXXReadRead, Write

์œ„ ํ‘œ ์ฒ˜๋Ÿผ ๊ถŒํ•œ์„ ์„ค์ •ํ•ด ๋ณด์ž.

  • analytics_authors
GRANT ALL ON SCHEMA analytics TO GROUP analytics_authors;
GRANT ALL ON ALL TABLES IN SCHEMA analytics TO GROUP analytics_authors;

GRANT ALL ON SCHEMA adhoc to GROUP analytics_authors;
GRANT ALL ON ALL TABLES IN SCHEMA adhoc TO GROUP analytics_authors;

GRANT USAGE ON SCHEMA raw_data TO GROUP analytics_authors;
GRANT SELECT ON ALL TABLES IN SCHEMA raw_data TO GROUP analytics_authors;
  • analytics_users
GRANT USAGE ON SCHEMA analytics TO GROUP analytics_users;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO GROUP analytics_users;

GRANT ALL ON SCHEMA adhoc to GROUP analytics_users;
GRANT ALL ON ALL TABLES IN SCHEMA adhoc TO GROUP analytics_users;

GRANT USAGE ON SCHEMA raw_data TO GROUP analytics_users;
GRANT SELECT ON ALL TABLES IN SCHEMA raw_data TO GROUP analytics_users;
  • pii_users
GRANT USAGE ON SCHEMA pii TO GROUP pii_users;
GRANT SELECT ON ALL TABLES IN SCHEMA pii TO GROUP pii_users;

๐Ÿ“Œ Redshift Spectrum - S3 ์™ธ๋ถ€ ํ…Œ์ด๋ธ” ์กฐ์ž‘

  • Fact Table : ๋ถ„์„์˜ ์ดˆ์ ์ด ๋˜๋Š” ์–‘์  ์ •๋ณด๋ฅผ ํฌํ•จํ•˜๋Š” ์ค‘์•™ ํ…Œ์ด๋ธ”
    • ์ผ๋ฐ˜์ ์œผ๋กœ ๋งค์ถœ ์ˆ˜์ต, ํŒ๋งค๋Ÿ‰ ๋˜๋Š” ์ด์ต๊ณผ ๊ฐ™์€ ์‚ฌ์‹ค ๋˜๋Š” ์ธก์ • ํ•ญ๋ชฉ์„ ํฌํ•จํ•˜๋ฉฐ ๋น„์ฆˆ๋‹ˆ์Šค ๊ฒฐ์ •์— ์‚ฌ์šฉํ•œ๋‹ค.
    • Fact Table์€ ์ผ๋ฐ˜์ ์œผ๋กœ ์™ธ๋ž˜ ํ‚ค๋ฅผ ํ†ตํ•˜์—ฌ ์—ฌ๋Ÿฌ Dimension Table๊ณผ ์—ฐ๊ฒฐ๋œ๋‹ค.
    • ๋ณดํ†ต Fact Table์˜ ํฌ๊ธฐ๊ฐ€ ํ›จ์”ฌ ํฌ๋‹ค.
  • Dimension Table : Fact Table์— ๋Œ€ํ•œ ์ƒ์„ธ ์ •๋ณด๋ฅผ ์ œ๊ณตํ•˜๋Š” ํ…Œ์ด๋ธ”
    • ๊ณ ๊ฐ, ์ œํ’ˆ๊ณผ ๊ฐ™์€ ํ…Œ์ด๋ธ”๋กœ Fact Table์— ๋Œ€ํ•œ ์ƒ์„ธ ์ •๋ณด ์ œ๊ณต
    • Fact Table์˜ ๋ฐ์ดํ„ฐ์— ๋งฅ๋ฝ์„ ์ œ๊ณตํ•˜์—ฌ ์‚ฌ์šฉ์ž๊ฐ€ ๋‹ค์–‘ํ•œ ๋ฐฉ์‹์œผ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ์กฐ๊ฐ๋‚ด๊ณ  ๋ถ„์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ด์ค€๋‹ค.
    • Dimension Table์€ ์ผ๋ฐ˜์ ์œผ๋กœ Primary Key๋ฅผ ๊ฐ€์ง€๋ฉฐ, Fact Table์˜ Foreign Key์—์„œ ์ฐธ์กฐํ•œ๋‹ค.
    • ๋ณดํ†ต Dimension Table์˜ ํฌ๊ธฐ๋Š” ํ›จ์”ฌ ์ž‘๋‹ค.
  • ์™ธ๋ถ€ ํ…Œ์ด๋ธ”(External Table)
    • DB ์—”์ง„์ด ์™ธ๋ถ€์— ์ €์žฅ๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๋งˆ์น˜ ๋‚ด๋ถ€ ํ…Œ์ด๋ธ”์ฒ˜๋Ÿผ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•
      -> ์™ธ๋ถ€ ํ…Œ์ด๋ธ”์€ ์™ธ๋ถ€์— ์ €์žฅ๋œ ๋Œ€๋Ÿ‰์˜ ๋ฐ์ดํ„ฐ๋ฅผ DB๋‚ด๋ถ€๋กœ ๋ณต์‚ฌํ•˜๊ณ  ์“ฐ๋Š” ๊ฒƒ์ด ์•„๋‹Œ ์ž„์‹œ ๋ชฉ์ ์œผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ์‹์ด๋‹ค.
    • SQL ๋ช…๋ น์–ด๋กœ DB์— ์™ธ๋ถ€ ํ…Œ์ด๋ธ” ์ƒ์„ฑ์ด ๊ฐ€๋Šฅํ•˜๋‹ค.
      -> ๋ฐ์ดํ„ฐ๋ฅผ ์ƒˆ๋กœ ๋งŒ๋“ค๊ฑฐ๋‚˜ ํ•˜๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ ์ฐธ์กฐ๋งŒ ํ•˜๊ฒŒ ๋œ๋‹ค.
      -> ์™ธ๋ถ€ ํ…Œ์ด๋ธ”์€ CSV, JSON, XML๊ณผ ๊ฐ™์€ ํŒŒ์ผ ํ˜•์‹ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ODBC or JDBC ๋“œ๋ผ์ด๋ฒ„๋ฅผ ํ†ตํ•ด ์•ก์„ธ์Šคํ•˜๋Š” ์›๊ฒฉ DB์™€ ๊ฐ™์€ ๋‹ค์–‘ํ•œ ๋ฐ์ดํ„ฐ ์†Œ์Šค์— ๋Œ€ํ•ด ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•˜๋‹ค.
    • ์™ธ๋ถ€ ํ…Œ์ด๋ธ”์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ ํ›„ ๊ฒฐ๊ณผ๋ฅผ DB์— ์ ์žฌํ•˜๋Š”๋ฐ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•˜๋‹ค.
      -> ์˜ˆ๋ฅผ ๋“ค์–ด, ์™ธ๋ถ€ ํ…Œ์ด๋ธ”์„ ์‚ฌ์šฉํ•˜์—ฌ ๋กœ๊ทธ ํŒŒ์ผ์„ ์ฝ๊ณ  ์ •์ œ๋œ ๋‚ด์šฉ์„ DB Table์— ์ ์žฌ ๊ฐ€๋Šฅํ•˜๋‹ค.
    • ์™ธ๋ถ€ ํ…Œ์ด๋ธ”์€ ๋ณด์•ˆ ๋ฐ ์„ฑ๋Šฅ ๋ฌธ์ œ์— ๋Œ€ํ•ด ์‹ ์ค‘ํ•œ ๊ณ ๋ ค๊ฐ€ ํ•„์š”ํ•˜๋‹ค.
    • HIVE๋“ฑ์—์„œ ์ฒ˜์Œ ์‹œ์ž‘ํ•œ ๊ฐœ๋…์œผ๋กœ ์ด์ œ๋Š” ๋Œ€๋ถ€๋ถ„์˜ ๋น… ๋ฐ์ดํ„ฐ ์‹œ์Šคํ…œ์—์„œ ์‚ฌ์šฉ๋œ๋‹ค.
  • Redshift Spectrum ์‚ฌ์šฉ๋ฐฉ์‹
    • S3์— ์žˆ๋Š” ํŒŒ์ผ๋“ค์„ ๋งˆ์น˜ ํ…Œ์ด๋ธ”์ฒ˜๋Ÿผ SQL๋กœ ์ฒ˜๋ฆฌ ๊ฐ€๋Šฅ
      • S3ํŒŒ์ผ๋“ค์„ ์™ธ๋ถ€ ํ…Œ์ด๋ธ”๋“ค๋กœ ์ฒ˜๋ฆฌํ•˜๋ฉด์„œ Redshift ํ…Œ์ด๋ธ”๊ณผ ์กฐ์ธ ๊ฐ€๋Šฅ
      • S3 ์™ธ๋ถ€ ํ…Œ์ด๋ธ”๋“ค์€ ๋ณดํ†ต Fact Table์ด ๋˜๊ณ  Redshift Table์€ Dimension Table์ด ๋œ๋‹ค.
    • Redshift Cluster๊ฐ€ ํ•„์š”ํ•˜๋‹ค.
      • S3์™€ Redshift Cluster๋Š” ๊ฐ™์€ Region์— ์žˆ์–ด์•ผํ•œ๋‹ค.
    • S3 Fact ๋ฐ์ดํ„ฐ๋ฅผ ์™ธ๋ถ€ ํ…Œ์ด๋ธ”๋กœ ์ •์˜ํ•ด์•ผ ํ•œ๋‹ค.
  • IAM ์ •์ฑ… ์ถ”๊ฐ€

๐Ÿ“Œ Redshift Spectrum - S3 ์™ธ๋ถ€ ํ…Œ์ด๋ธ” ์กฐ์ž‘ ์‹ค์Šต

  • ์™ธ๋ถ€ ํ…Œ์ด๋ธ”์šฉ ์Šคํ‚ค๋งˆ ์„ค์ •
    • ๋จผ์ € ์•ž์„œ ๋งŒ๋“  redshift.read.s3 ROLE์— AWSGlueConsoleFullAccess ๊ถŒํ•œ ์ง€์ •์ด ํ•„์š”ํ•˜๋‹ค.
    • ๋‹ค์Œ์œผ๋กœ ์•„๋ž˜ SQL์„ ์‹คํ–‰ํ•˜์—ฌ ์™ธ๋ถ€ ํ…Œ์ด๋ธ”์šฉ ์Šคํ‚ค๋งˆ ์ƒ์„ฑ
CREATE EXTERNAL SCHEMA external_schema
FROM data catalog
database 'myspectrum_db'
iam_role 'arn~~~'
CREATE external database if not exists;
  • ์™ธ๋ถ€ Fact Table ์ •์˜
    • S3์— usc๋ผ๋Š” ํด๋”๋ฅผ ๊ฐ์ž S3 ๋ฒ„ํ‚ท ํ•˜์œ„์— ๋งŒ๋“ ๋‹ค.
    • usc ํด๋”์— user_session_channel.csv ํŒŒ์ผ ๋ณต์‚ฌ
    • ์•„๋ž˜ SQL ์‹คํ–‰
CREATE EXTERNAL TABLE external_schema.user_session_channel(
	userid integer,
    sessionid varchar(32),
    channel varchar(32)
)
row format delimited
fields terminated by','
stored as textfile
location 's3://~~~';
  • ๋‚ด๋ถ€ Dimension Table ์ •์˜
CREATE TABLE raw_data.user_property AS
SELECT 
	userid,
    CASE WHEN cast(random()*2 as int) = 0 THEN 'male' ELSE 'female' END gender,
    (CAST(random()*50 as int)+18) age
FROM(
	SELECT DISTINCT userid
    FROM raw_data.user_session_channel
);
  • Fact + Dimension Table JOIN
SELECT gender, COUNT(1)
FROM external_schema.user_session_channel usc
JOIN raw_data.user_property up ON usc.userid = up.userid
GROUP BY 1;

0๊ฐœ์˜ ๋Œ“๊ธ€