AZ scoped Data warehouse that can go up to 2PB
Use RA3 node type
If ANY of the nodes fail, the entire AZ scoped Redshift shuts down.
If additional resource is required, user has to manually add a new node to scale.
Scale Out
Adding additional nodes
Scale Up
Upgrading the current node to a better spec
If there are mulitple nodes in Redshift, optimizing can be an cumbersome task to handle.
Distribution Style (all, even, key)
Is HOW the data will be separated.
Distritubtion Key
Indicates as the reference of the data being separated to individual nodes when using Key dstribution
%sql postgresql://YOUR_ID:YOUR_PW@YOUR_ENDPOINT:5439/dev
%%sql
CREATE SCHEMA raw_data;
CREATE SCHEMA analytics;
CREATE SCHEMA adhoc;
CREATE SCHEMA pii;
%%sql
CREATE USER tony PASSWORD '*****';
%%sql
CREATE GROUP analytics_users;
CREATE GROUP pii_users;
CREATE GROUP analytics_authors;
ALTER GROUP analytics_authors ADD USER tony;
ALTER GROUP analytics_users ADD USER tony;
ALTER GROUP pii_users ADD USER tony;
%%sql
CREATE ROLE staff;
CREATE ROLE manager;
CREATE ROLE external;
GRANT ROLE staff TO tony;
GRANT ROLE staff TO ROLE manager;
%%sql
CREATE TABLE raw_data.user_session_channel (
userid integer ,
sessionid varchar(32) primary key,
channel varchar(32)
);
AmazonS3FullAccess
%%sql
-- 아래 쿼리에서 aws_iam_role의 값은 각자 Redshift cluster에 지정한 S3 읽기 권한 ROLE의 ARN을 지정
COPY raw_data.user_session_channel
FROM 'YOUR_S3_FILE_DIR'
credentials 'aws_iam_role=YOUR_IAM_ARN_from_#15'
delimiter ',' dateformat 'auto' timeformat 'auto' IGNOREHEADER 1 removequotes;
It is common to see RBAC (Role Based Access Control) since roles are easier to manage than groups.
Roles are able to inherit from other roles.
GRANT ALL ON SCHEMA analytics TO GROUP analytics_authors;
GRANT ALL ON ALL TABLES IN SCHEMA analytics TO GROUP analytics_authors;
GRANT ALL ON SCHEMA adhoc to GROUP analytics_authors;
GRANT ALL ON ALL TABLES IN SCHEMA adhoc TO GROUP analytics_authors;
GRANT USAGE ON SCHEMA raw_data TO GROUP analytics_authors;
GRANT SELECT ON ALL TABLES IN SCHEMA raw_data TO GROUP analytics_authors;
Using SQL commands for granting permissions in Redshift offers flexibility, automation, version control, and consistency advantages. However, the console should still be used for visibility and management purposes, especially when viewing or modifying permissions in the future.
Redshift offers snapshots which reverts back to the previous state.
It only stores changes from the previous backup to optimze storage.
Serverless only offers 24 hours of Recovery Point.
Use S3 file as a table in Redshift
Large dataset that contains ALL data for sales, revenue, etc.
Provide specific details that are linked to the Fact Table, such as, product details, customer info, etc.
Demension tables are usually uploaded to Redshift
This summary is made possible by Oreilly's AWS, 3rd Edition - Chad Smith.
If the above post violates any copyright permissions, please let me know!