AWS - 7. Redshift

Tony Lee·2023년 5월 24일
0

knowledge

목록 보기
14/16
post-thumbnail

Redshift

AZ scoped Data warehouse that can go up to 2PB

  • It is OLAP which has high latency and not applicable for production database.
  • Column based storage which enables compression for each column and manipulating columns is fairly quick
  • Supports bulk update
  • Does NOT guarantee primary key uniqueness

Infrastructure

Use RA3 node type

If ANY of the nodes fail, the entire AZ scoped Redshift shuts down.

Scalability

If additional resource is required, user has to manually add a new node to scale.

Scale Out
Adding additional nodes

Scale Up
Upgrading the current node to a better spec

Optimizing

If there are mulitple nodes in Redshift, optimizing can be an cumbersome task to handle.

Distribution Style (all, even, key)
Is HOW the data will be separated.

Distritubtion Key
Indicates as the reference of the data being separated to individual nodes when using Key dstribution

Redshift Serverless in Colab

  1. Create Redshift Serverless as default
  2. Allow public access
  3. Go to VPC security Group and open up port 5439 in Inbound Rules
  4. In Google Colab
    %sql postgresql://YOUR_ID:YOUR_PW@YOUR_ENDPOINT:5439/dev
  5. Create Schema
%%sql

CREATE SCHEMA raw_data;
CREATE SCHEMA analytics;
CREATE SCHEMA adhoc;
CREATE SCHEMA pii;
  1. Create User
%%sql

CREATE USER tony PASSWORD '*****'; 
  1. Set Group
%%sql

CREATE GROUP analytics_users;
CREATE GROUP pii_users;
CREATE GROUP analytics_authors;

ALTER GROUP analytics_authors ADD USER tony;
ALTER GROUP analytics_users ADD USER tony;
ALTER GROUP pii_users ADD USER tony;
  1. Set Roles
%%sql

CREATE ROLE staff;
CREATE ROLE manager;
CREATE ROLE external;

GRANT ROLE staff TO tony;
GRANT ROLE staff TO ROLE manager;
  1. Create Table
%%sql

CREATE TABLE raw_data.user_session_channel (
   userid integer ,
   sessionid varchar(32) primary key,
   channel varchar(32)
);
  1. Upload desired csv file to S3 bucket
  2. Allow Redshift to access S3 bucket.
  3. In IAM console, create user Role
  4. Select AWS service -> Redshift
  5. Allow AmazonS3FullAccess
  6. Associate this role to the Redshift Cluster
  7. Read from S3
%%sql

-- 아래 쿼리에서 aws_iam_role의 값은 각자 Redshift cluster에 지정한 S3 읽기 권한 ROLE의 ARN을 지정
COPY raw_data.user_session_channel
FROM 'YOUR_S3_FILE_DIR'
credentials 'aws_iam_role=YOUR_IAM_ARN_from_#15'
delimiter ',' dateformat 'auto' timeformat 'auto' IGNOREHEADER 1 removequotes;

Permission and Policy

Role vs Group

It is common to see RBAC (Role Based Access Control) since roles are easier to manage than groups.
Roles are able to inherit from other roles.

Updating Permissions in SQL

GRANT ALL ON SCHEMA analytics TO GROUP analytics_authors;
GRANT ALL ON ALL TABLES IN SCHEMA analytics TO GROUP analytics_authors;

GRANT ALL ON SCHEMA adhoc to GROUP analytics_authors;
GRANT ALL ON ALL TABLES IN SCHEMA adhoc TO GROUP analytics_authors;

GRANT USAGE ON SCHEMA raw_data TO GROUP analytics_authors;
GRANT SELECT ON ALL TABLES IN SCHEMA raw_data TO GROUP analytics_authors;

Console vs SQL

Using SQL commands for granting permissions in Redshift offers flexibility, automation, version control, and consistency advantages. However, the console should still be used for visibility and management purposes, especially when viewing or modifying permissions in the future.

Backups

Redshift

Redshift offers snapshots which reverts back to the previous state.
It only stores changes from the previous backup to optimze storage.

Redshift Serverless

Serverless only offers 24 hours of Recovery Point.

Redshift Spectrum

Use S3 file as a table in Redshift

  • S3 and Redshift cluster has to be in the same region.
  • Beneficial when the raw data in S3 is too large and want to process before uploading to Redshift.

Use case

Fact Table

Large dataset that contains ALL data for sales, revenue, etc.

Dimension Table

Provide specific details that are linked to the Fact Table, such as, product details, customer info, etc.

Demension tables are usually uploaded to Redshift

Disclaimer

This summary is made possible by Oreilly's AWS, 3rd Edition - Chad Smith.
If the above post violates any copyright permissions, please let me know!

profile
Striving to have a positive impact on the community

0개의 댓글