[AWS] Glue - Databases+table

sm_cloud_life · April 4, 2023

GLUE aws

2023 Regional Skills Competition, Task 2


Glue concepts reference - https://velog.io/@ginee_park/AWS-Glue란

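1. Creating a Glue database

Glue Databases - Glue uses a database that acts as a central metadata repository.

In other words, it brings the metadata for all your data together in one place so that ETL jobs can be run against it.

You can generate scripts to transform the data (the Transform step of ETL).

You define crawlers to populate the AWS Glue Data Catalog with metadata table definitions.

[Path] AWS Glue → Databases → Add database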

Name - wsi-glue-db
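The same database can also be created through the API instead of the console. Below is a minimal sketch using boto3, assuming AWS credentials are already configured. The helper names `database_input` and `create_database` are made up for this example, and the Seoul region in the usage comment is an assumption carried over from the table step.

```python
def database_input(name, description=""):
    """Build the DatabaseInput structure that Glue's CreateDatabase API expects."""
    payload = {"Name": name}
    if description:
        payload["Description"] = description
    return payload


def create_database(glue_client, name, description=""):
    """Create the database using an existing boto3 Glue client."""
    return glue_client.create_database(
        DatabaseInput=database_input(name, description)
    )


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   glue = boto3.client("glue", region_name="ap-northeast-2")
#   create_database(glue, "wsi-glue-db")
```

Passing the client in as an argument keeps the helper easy to try out with a stub before touching a real account.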

2. Creating a Glue table

Glue tables - metadata definitions that represent the data in a data store.

[Path] AWS Glue → Tables → Add table

Name - wsi-table

Database - wsi-glue-db

Select the type of source - Kinesis

Region - Asia Pacific (Seoul) ap-northeast-2

Kinesis stream name - test-kinesis

Classification - JSON, then click Next

Use Add to register the following columns:

Key	Type
app_id	string
event_name	string
event_time	timestamp
idfa	string
advertising_id	string
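For reference, the console settings above could be expressed as a `TableInput` for the Glue `create_table` API. This is only a sketch: the helper name `table_input` is hypothetical, and the storage parameters for a Kinesis source (`typeOfData`, `streamName`, `endpointUrl`) are my assumption of what the console stores, so compare them with a console-created table (for example via `get_table`) before relying on them.

```python
# Columns exactly as entered in the console table above.
COLUMNS = [
    ("app_id", "string"),
    ("event_name", "string"),
    ("event_time", "timestamp"),
    ("idfa", "string"),
    ("advertising_id", "string"),
]


def table_input(name, stream_name, region, columns):
    """Build a TableInput dict for a JSON-classified table on a Kinesis stream."""
    return {
        "Name": name,
        "Parameters": {"classification": "json"},
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
            "Location": stream_name,
            # Assumed keys for a Kinesis source; verify against a real table.
            "Parameters": {
                "typeOfData": "kinesis",
                "streamName": stream_name,
                "endpointUrl": f"https://kinesis.{region}.amazonaws.com",
            },
        },
    }


wsi_table = table_input("wsi-table", "test-kinesis", "ap-northeast-2", COLUMNS)

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   glue = boto3.client("glue", region_name="ap-northeast-2")
#   glue.create_table(DatabaseName="wsi-glue-db", TableInput=wsi_table)
```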

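Next we edit the schema, and it may not be obvious how it should be defined. If the task sheet kindly provides a data set like the one below, simply create the table based on that data set.

If it does not, dig into the KDS (Kinesis Data Streams) stream or the deployment files, look at the values that are actually sent in requests, and decide the schema from those.

Reference - https://docs.aws.amazon.com/ko_kr/databrew/latest/dg/datatypes.html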

{
    "app_id": "com.sokoloff06.sdktest",
    "event_name": "af_cross_promotion",
    "event_time": "2020-05-10 00:57:26.038",
    "idfa": null,
    "advertising_id": "cb654aa6-8026-4633-bfd1-de619896fd6a"
}

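For example, if data like the above is sent in a request, analyze it field by field.

Because it is simple, flat JSON, use each key as the column name in the schema and decide the type by looking at the value.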
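The key-to-name, value-to-type rule can be sketched in a few lines of Python. The mapping below is my own illustrative choice, not something Glue prescribes: a string matching the `YYYY-MM-DD HH:MM:SS.fff` pattern becomes `timestamp`, integers become `bigint`, and `null` falls back to `string` because a missing value carries no type information. Nested objects and arrays are not handled.

```python
import json
from datetime import datetime

SAMPLE = """{
    "app_id": "com.sokoloff06.sdktest",
    "event_name": "af_cross_promotion",
    "event_time": "2020-05-10 00:57:26.038",
    "idfa": null,
    "advertising_id": "cb654aa6-8026-4633-bfd1-de619896fd6a"
}"""


def glue_type(value):
    """Guess a column type from a single JSON value (flat records only)."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")
            return "timestamp"
        except ValueError:
            return "string"
    # null and anything unrecognized: fall back to string.
    return "string"


def infer_columns(record):
    """Return (column name, type) pairs in the order the keys appear."""
    return [(key, glue_type(value)) for key, value in record.items()]


columns = infer_columns(json.loads(SAMPLE))
for name, col_type in columns:
    print(f"{name}\t{col_type}")
```

For the sample record this yields the same five columns that were entered by hand, with `event_time` as `timestamp` and everything else as `string`.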

Then edit the schema accordingly.

Click Next to create the table.

curl \
    -X POST \
    -H "Content-Type: application/json" \
    -d '{
        "app_id": "com.sokoloff06.sdktest",
        "event_name": "af_cross_promotion",
        "event_time": "2020-05-10 00:57:26.038",
        "idfa": null,
        "advertising_id": "cb654aa6-8026-4633-bfd1-de619896fd6a"
    }' \
    "https://lx19nagsj3.execute-api.ap-northeast-2.amazonaws.com/deploy?stream=test-kinesis"
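The curl command above POSTs the sample record to the API Gateway endpoint, which presumably forwards it to the `test-kinesis` stream (the post does not show that integration, so treat it as an assumption). The same request could be built with Python's standard library roughly like this; `build_request` is a name made up for this sketch, and the request is only constructed here, not sent.

```python
import json
import urllib.request

URL = "https://lx19nagsj3.execute-api.ap-northeast-2.amazonaws.com/deploy?stream=test-kinesis"

RECORD = {
    "app_id": "com.sokoloff06.sdktest",
    "event_name": "af_cross_promotion",
    "event_time": "2020-05-10 00:57:26.038",
    "idfa": None,  # serialized as JSON null
    "advertising_id": "cb654aa6-8026-4633-bfd1-de619896fd6a",
}


def build_request(url, record):
    """Build a JSON POST request equivalent to the curl command."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(URL, RECORD)

# To actually send it:
#   with urllib.request.urlopen(request) as response:
#       print(response.status, response.read().decode("utf-8"))
```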
