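Today's tasks

Automatically deploying a Lambda container image with the AWS Code services

Automating it with CloudFormation

Creating the ECR repository

Attaching permission to create ECR repositories

Attach a new policy so that the MLops account can create an ECR private repository.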
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:CreateRepository",
                "ecr:ReplicateImage"
            ],
            "Resource": "*"
        }
    ]
}
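
As a rough sketch of the same step in code, the policy above could also be attached with boto3. This assumes MLops is an IAM user and uses a hypothetical policy name, `ecr-create-repository`; neither detail is stated in this post. The boto3 import sits inside the function so the policy document can be built and inspected without AWS access.

```python
import json


def build_ecr_create_policy():
    """Return the IAM policy document shown above as a Python dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ecr:CreateRepository", "ecr:ReplicateImage"],
                "Resource": "*",
            }
        ],
    }


def attach_policy_to_user(user_name, policy_name="ecr-create-repository"):
    """Attach the policy inline to an IAM user (assumed setup, needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works offline

    iam = boto3.client("iam")
    iam.put_user_policy(
        UserName=user_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(build_ecr_create_policy()),
    )


if __name__ == "__main__":
    print(json.dumps(build_ecr_create_policy(), indent=4))
```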
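Creating the ECR repository as private

Uploading an image to ECR

I pushed an image to ECR, but it does not show up.

It looks like a permissions problem.

Grant the AmazonEC2ContainerRegistryPowerUser policy to the MLops account.

Now the image is visible!

Creating the Lambda function

At this point, MLops cannot access the ECR image when creating a Lambda function.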

https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-images.html
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ecr:SetRepositoryPolicy",
                "ecr:GetRepositoryPolicy"
            ],
            "Resource": "arn:aws:ecr:<region>:<account>:repository/<repo name>/"
        }
    ]
}
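According to the documentation, the permissions above are required.

The current permissions include only GetRepositoryPolicy.

Add SetRepositoryPolicy.

The example above restricts Resource to a single repository, but with that restriction the permission was still reported as missing when I created the Lambda function later.
So Resource has to be changed to *.

I added the statement to the existing role.

With that, I was able to create the Lambda function.

For now it prints 5:35PM from yesterday's exercise. I will change this later when testing CodeBuild.

Lambda configuration

The timeout has to be raised to at least 1 minute.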
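
For reference, creating the function from a container image and setting the timeout can be sketched with boto3 as below. This is only an illustration: the function name, image URI and role ARN are placeholders to fill in, and the post itself did these steps in the console.

```python
def build_create_function_args(function_name, image_uri, role_arn, timeout_seconds=60):
    """Build keyword arguments for the Lambda create_function call."""
    return {
        "FunctionName": function_name,
        "PackageType": "Image",            # deploy from a container image, not a zip
        "Code": {"ImageUri": image_uri},   # image stored in the ECR private repository
        "Role": role_arn,                  # execution role of the function
        "Timeout": timeout_seconds,        # Lambda's default is 3 seconds
    }


def create_function(function_name, image_uri, role_arn):
    """Create the function (needs AWS credentials and the permissions described above)."""
    import boto3  # imported lazily so the argument builder works offline

    args = build_create_function_args(function_name, image_uri, role_arn)
    return boto3.client("lambda").create_function(**args)
```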

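Creating the CodeCommit repository

Upload everything, including the buildspec.yaml file written yesterday.

CodeBuild

Creating the project

Error

The error says that the role for CodeBuild trusts too many services.

The role must trust exactly one service.

Besides the default trust relationship with CodeBuild, I had earlier added a trust relationship with Lambda in order to create the Lambda function.

Current trust relationship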
{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Principal": {
				"Service": "codebuild.amazonaws.com"
			},
			"Action": "sts:AssumeRole"
		},
		{
			"Effect": "Allow",
			"Principal": {
				"Service": "lambda.amazonaws.com"
			},
			"Action": "sts:AssumeRole"
		}
	]
}
Remove the Lambda trust relationship from this policy.
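
The same edit can be sketched in Python: filter the trust policy so that only the CodeBuild statement remains, then write it back. The helper names here are hypothetical, and the post made this change by hand in the IAM console.

```python
import copy
import json


def keep_only_service(trust_policy, service):
    """Return a copy of the trust policy that keeps only statements trusting `service`."""
    def trusts_only(statement):
        principals = statement.get("Principal", {}).get("Service", [])
        if isinstance(principals, str):
            principals = [principals]
        return principals == [service]

    cleaned = copy.deepcopy(trust_policy)
    cleaned["Statement"] = [s for s in cleaned["Statement"] if trusts_only(s)]
    return cleaned


def update_role_trust(role_name, trust_policy):
    """Write the trust policy back to the role (needs AWS credentials)."""
    import boto3  # imported lazily so the filter above works offline

    boto3.client("iam").update_assume_role_policy(
        RoleName=role_name, PolicyDocument=json.dumps(trust_policy)
    )
```

Calling `keep_only_service(policy, "codebuild.amazonaws.com")` on the policy above leaves a single statement, which is what the CodeBuild project creation expects.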

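The project was created.

Starting the build

An error occurred. I had not updated the Lambda function name in buildspec.yaml to the newly created function, so the build could not find the Lambda function to deploy the image to.

I edited buildspec.yaml and a few other details, then ran the build again.

Creating ImageProcessing

The steps above are not fully automated yet, but they take a change from CodeCommit through to deploying the Lambda container image.

Next, create an S3 bucket, deploy a Lambda function containing the ImageProcessing code, and check the logs in the CloudWatch log group.

Do the same for TextProcessing.

Creating the S3 bucket

S3 and RDS will be built as part of the website infrastructure.

To act as if they were already in place, I created them in advance with the root account.

Upload the Credential.json file needed to use the GCP Vision API, and create an Images folder to hold the uploaded images.

Connecting Lambda and S3

The function is invoked for every event type.

This time I also set the prefix to Images/ so that only events occurring in the Images folder are delivered.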
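
A minimal sketch of the equivalent trigger configuration in boto3 follows. It assumes the trigger only needs object-creation events (`s3:ObjectCreated:*`), which is narrower than the "all events" setting described above; the bucket name and function ARN are placeholders.

```python
def build_notification_config(function_arn, prefix="Images/"):
    """Build an S3 notification configuration that invokes a Lambda function."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:*"],  # assumption: creation events only
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
                },
            }
        ]
    }


def apply_notification(bucket, function_arn):
    """Apply the configuration to the bucket (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works offline

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_notification_config(function_arn),
    )
```

When the trigger is added in the Lambda console, the permission that lets S3 invoke the function is added for you; with the API call above it would have to be granted separately.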

Deploying the ImageProcessing container image

app.py

import os
import urllib.parse
import boto3

print('Loading function')

s3 = boto3.client('s3')

# Get a paragraph's text
def Get_Description(words):
  text = ''
  for word in words:
    for symbol in word.symbols:
      text += symbol.text
    text += ' '
  return text

# get paragraph's box
def Get_Box(vertices):
  x_list = []
  y_list = []
  for ver in vertices:
    x_list.append(ver.x)
    y_list.append(ver.y)
  return [min(x_list), min(y_list), max(x_list), max(y_list)]

# Call GCP Vision API
def detect_document(path):
    """Detects document features in an image."""
    from google.cloud import vision
    import io
    client = vision.ImageAnnotatorClient()

    # [START vision_python_migration_document_text_detection]
    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision.Image(content=content)

    response = client.document_text_detection(image=image)
    if response.error.message:
        raise Exception(
            '{}\nFor more info on error messages, check: '
            'https://cloud.google.com/apis/design/errors'.format(
                response.error.message))
    # [END vision_python_migration_document_text_detection]
    return response.full_text_annotation.pages[0].blocks
      
     
def handler(event, context):
    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    image_name = key.split('/')[-1]
    image_path = '/tmp/'+image_name

    #get credential_key and set path to the '$GOOGLE_APPLICATION_CREDENTIALS'
    credential_key = '<GCP Vision Credential>'
    credential_name = credential_key
    credential_path = '/tmp/'+credential_name

    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credential_path

    try:
        S3_response = s3.get_object(Bucket=bucket, Key=key)
        print("CONTENT TYPE: " + S3_response['ContentType'])

        image_path = '/tmp/'+image_name
        s3.download_file(bucket, key, image_path)
        s3.download_file(bucket, credential_key, credential_path)

        response = detect_document(image_path)
        print('Detect_document is done')

    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e

    # body = The list of Json
    body = []
    for res in response:
        for parag in res.paragraphs:
          body.append({
              'description' : Get_Description(parag.words),
              'box' : Get_Box(parag.bounding_box.vertices)
          })
  
    return{
        'body' : body,
        'bucket':bucket,
        'key': key
    }
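
The event parsing and the bounding-box helper in app.py can be checked locally without AWS or the Vision API. The sketch below copies that logic into standalone functions; the sample event and the vertex values are made up for illustration, and `SimpleNamespace` stands in for the Vision API vertex objects.

```python
import urllib.parse
from types import SimpleNamespace


def parse_s3_event(event):
    """Extract bucket, decoded key and file name the same way handler() does."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"], encoding="utf-8")
    return bucket, key, key.split("/")[-1]


def get_box(vertices):
    """Same logic as Get_Box: [min x, min y, max x, max y]."""
    xs = [v.x for v in vertices]
    ys = [v.y for v in vertices]
    return [min(xs), min(ys), max(xs), max(ys)]


# Hypothetical event: S3 URL-encodes object keys, so a space arrives as '+'.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "Images/my+photo.JPG"}}}
    ]
}
bucket, key, image_name = parse_s3_event(sample_event)
print(bucket, key, image_name)

# Four slightly skewed corners of one paragraph, as the API might return them.
corners = [SimpleNamespace(x=48, y=94), SimpleNamespace(x=654, y=96),
           SimpleNamespace(x=652, y=262), SimpleNamespace(x=50, y=260)]
print(get_box(corners))
```

This is why the handler calls `unquote_plus` before using the key: without it, a file name containing spaces would not match the object in the bucket.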

Dockerfile

FROM public.ecr.aws/lambda/python:3.7

# Copy function code
COPY app.py ${LAMBDA_TASK_ROOT}
#COPY rds_config.py ${LAMBDA_TASK_ROOT}
#COPY rds_connect.py ${LAMBDA_TASK_ROOT}
#COPY TextProcessing.py ${LAMBDA_TASK_ROOT}
# Install the function's dependencies using file requirements.txt
# from your project folder.

COPY requirements.txt  .
RUN  pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"
# Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
CMD [ "app.handler" ]

requirements.txt

# 20220825 ver1 requirements
#pymysql
#pandas
google
google-cloud
google-cloud-vision
google-api-python-client 
wget
#pillow

CodeCommit push

git add .

git commit -m "Image Process code test1"

git push

The files were uploaded to CodeCommit successfully.

CodeBuild

The build succeeded.

Test

I uploaded an image.
Let's look at the CloudWatch log group.
... Time to go add a permission.
Add the permission to MLops.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:CreateRepository",
                "ecr:ReplicateImage",
                "ecr:SetRepositoryPolicy"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:DescribeLogStreams"
            ],
            "Resource": "*"
        }
    ]
}
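
Once the log streams can be listed, the latest messages can also be read from code. The sketch below assumes the caller additionally has `logs:GetLogEvents`, which is not part of the policy above, and takes the CloudWatch Logs client as an argument so it can be tried with a stub.

```python
def latest_log_messages(logs_client, function_name, limit=20):
    """Return messages from the most recently active log stream of a Lambda function."""
    group = f"/aws/lambda/{function_name}"  # default log group name for Lambda
    streams = logs_client.describe_log_streams(
        logGroupName=group, orderBy="LastEventTime", descending=True, limit=1
    )["logStreams"]
    if not streams:
        return []
    events = logs_client.get_log_events(
        logGroupName=group,
        logStreamName=streams[0]["logStreamName"],
        limit=limit,
    )["events"]
    return [event["message"] for event in events]


# Usage with real credentials (function name is a placeholder):
#   import boto3
#   print(latest_log_messages(boto3.client("logs"), "<Lambda Function name>"))
```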

After uploading an image, I could see the Detect_document is done message with no errors.

TextProcessing

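Creating the RDS instance

Creating the ECR repository

Build the TextProcess code and push it to the newly created ECR repository.

Creating the Lambda function

Create a Lambda function from the TextProcess container image that was just pushed.

Under General configuration, change the timeout to at least 1 minute.

Attach the function to the VPC.

Because the RDS instance was created in ap-northeast-2a, use the public and private subnets that belong to 2a.

The private subnet alone would be enough, but the console asks for two or more subnets for availability.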
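
The timeout and VPC settings described above map onto a single configuration update. This is a sketch only: the subnet and security group IDs are placeholders, and the post applied these settings in the console.

```python
def build_vpc_update_args(function_name, subnet_ids, security_group_ids, timeout_seconds=60):
    """Build keyword arguments for the Lambda update_function_configuration call."""
    if not subnet_ids:
        raise ValueError("at least one subnet is required to attach a VPC")
    return {
        "FunctionName": function_name,
        "Timeout": timeout_seconds,
        "VpcConfig": {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
    }


def attach_vpc(function_name, subnet_ids, security_group_ids):
    """Apply the settings (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works offline

    args = build_vpc_update_args(function_name, subnet_ids, security_group_ids)
    return boto3.client("lambda").update_function_configuration(**args)
```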

Creating the CodeCommit repository

app.py

import json
import pandas as pd
import rds_connect

class NUTIRITION:
  def __init__(self):
    # Allergen keywords searched for in the OCR text
    self.allergys_keyword = ['계란', '우유', '땅콩', '견과류', '밀', '갑각류', '대두', '메밀', '육류', '생선', '과일']
    self.allergy_dict =  {i:'0' for i in self.allergys_keyword}
    self.ingredient_list = []
    # Headings that mark the start of the nutrition facts
    self.nutrition_keyword = set(['영양정보', '영양 정보'])
    self.nutrition_list = []

  def Nutrition_Processing(self, word_list):
    # Skip the heading itself, then split entries such as "10% 5%" into separate items
    for word in word_list[1:]:
      if '%' in word:
        item_list = word.split('%')
        for item in item_list:
          item = item.strip()
          if item:
            self.nutrition_list.append(item+'%')
      else:
        self.nutrition_list.append(word)

  def Check_Word(self, labels):
    for idx, label in enumerate(labels):
      label = label.strip()
      label = label.replace(',', '|')

      # Flag every allergen keyword that appears in this label
      for key in self.allergys_keyword:
        if key in label:
          print(key, label)
          self.allergy_dict[key] = '1'

      # Labels before the nutrition heading are ingredients; the rest is nutrition data
      if label in self.nutrition_keyword:
        print('in')
        self.Nutrition_Processing(labels[idx:-2])
        break
      else:
        self.ingredient_list.append(label)


def handler(event, context):
    # The event is the ImageProcess result delivered through a Lambda destination
    datas = event['responsePayload']['body']
    key = event['responsePayload']['key']
    bucket = event['responsePayload']['bucket']
    print(key, bucket)

    df = pd.DataFrame(columns=['description', 'box', 'height'])
    for data in datas:
        description = data['description']
        box = data['box']
        # Sort key: mostly the top edge, with the left edge as a small tie-breaker
        height = box[0]/20 + box[1]
        df.loc[len(df.index)]=[description, box, height]

    sorted_df = df.sort_values(by='height')
    show_labels = sorted_df.description.tolist()
    process = NUTIRITION()
    process.Check_Word(show_labels)

    try:
        rds_connect.Insert_RDS(process, key, bucket)
        return {
            'statusCode': 200,
            'body': json.dumps('Hello from Lambda!')
        }
    except (Exception, SystemExit) as e:
        # rds_connect calls sys.exit() on failure, so SystemExit is caught as well
        print(e)
        return{
            'statusCode': 400,
            'body': json.dumps('ERROR!')
        }
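
The sort key used in the handler, `box[0]/20 + box[1]`, can be tried out without pandas. The sketch below applies the same key with plain Python; the paragraph boxes are invented to show that the top edge dominates the ordering while the left edge only separates paragraphs at a similar height.

```python
def reading_order(paragraphs):
    """Sort OCR paragraphs with the handler's key: left / 20 + top."""
    ordered = sorted(paragraphs, key=lambda p: p["box"][0] / 20 + p["box"][1])
    return [p["description"] for p in ordered]


# Hypothetical boxes in [min x, min y, max x, max y] form.
sample = [
    {"description": "B", "box": [300, 100, 500, 140]},  # key = 300/20 + 100 = 115
    {"description": "C", "box": [40, 200, 260, 240]},   # key = 40/20 + 200 = 202
    {"description": "A", "box": [40, 100, 260, 140]},   # key = 40/20 + 100 = 102
]
print(reading_order(sample))
```

Here A and B share the same top edge, so the left edge decides between them, and C comes last because it sits lower on the page.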

rds_connect.py

def dict_to_query(dic):
    # Build the column list and the quoted value list, skipping empty values
    keys, vals = [], []

    for key, val in dic.items():
        if(val):
            keys.append(key)
            vals.append('\"'+str(val)+('\"'))

    return ((', ').join(keys), (', ').join(vals))


def Insert_RDS(text_data, ITEM_KEY, bucket):
    import sys
    import logging
    import rds_config
    import pymysql

    # RDS settings
    rds_endpoint  = rds_config.rds_endpoint
    name = rds_config.db_username
    password = rds_config.db_password
    db_name = rds_config.db_name
    table_name = rds_config.table_name

    # dict_keys(['Item_id', 'Item_URL', 'Item_key', '계란', '우유', '땅콩', '견과류', '밀', '갑각류', '대두', '메밀', '육류', '생선', '과일', 'Nutrition', 'Ingredient'])
    item_dict = {k:v[0] for k, v in rds_config.rds_keys.items()}

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    # Connect to RDS
    try:
        conn = pymysql.connect(host=rds_endpoint, user=name, passwd=password, db=db_name, connect_timeout=5)
        print('connected')
    except pymysql.MySQLError as e:
        logger.error("ERROR: Unexpected error: Could not connect to MySQL instance.")
        logger.error(e)
        sys.exit(1)

    logger.info("SUCCESS: Connection to RDS MySQL instance succeeded")

    # Init table
    try:
        # CREATE TABLE from the column definitions in rds_config
        sql_create_option = []
        for k, v in rds_config.rds_keys.items():
            sql_create_option.append(k+' '+v[1])
        sql_create_option = (', ').join(sql_create_option)
        print('init table :',sql_create_option )
        with conn.cursor() as cur:
            cur.execute("create table if not exists "+table_name+" ( "+sql_create_option+" )")
            conn.commit()
        print('init table succeeded')
    except pymysql.MySQLError as e:
        logger.error("ERROR: Init Table Error")
        logger.error(e)
        sys.exit(2)

    # Set Item_id from the file name of the S3 key
    item_dict['Item_id'] = ITEM_KEY.split('/')[-1]
    print(item_dict['Item_id'])

    '''
    #Init ROW (disabled)
    data_keys, data_vals = dict_to_query(item_dict)
    try:
        with conn.cursor() as cur:
            cur.execute('insert into '+table_name+' ('+data_keys+') values('+data_vals+') WHERE NOT EXIST (SELECT Item_id FROM '+table_name+' WHERE Item_id = '+item_dict['Item_id']+';')
            conn.commit()
        print("Added items from RDS MySQL table")

    except:
        logger.error("ERROR: Init Row Error")
        sys.exit(3)
    '''

    # Item INSERT
    ## Fit text_data to item_dict
    for k, v in text_data.allergy_dict.items():
        item_dict[k] = v

    item_dict['Nutrition'] = ('|').join(text_data.nutrition_list)
    item_dict['Ingredient'] = ('|').join(text_data.ingredient_list)
    print(item_dict)

    data_keys, data_vals = dict_to_query(item_dict)
    print('insert into '+table_name+' ('+data_keys+') values('+data_vals+')')

    try:
        with conn.cursor() as cur:
            cur.execute('insert into '+table_name+' ('+data_keys+') values('+data_vals+')')
            conn.commit()
        print("Added item to RDS MySQL table")

    except Exception as e:
        logger.error("ERROR: Insert Fail.")
        logger.error(e)
        sys.exit(2)
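
The insert above builds the SQL string by concatenating quoted values. One possible alternative, shown here as a sketch rather than what this post deployed, is to let the driver fill in the values through `%s` placeholders, so that OCR text containing quotes cannot break the statement. `build_insert` is a hypothetical helper; table and column names are still interpolated because they come from rds_config, not from user input.

```python
def build_insert(table_name, item):
    """Return (sql, params) for a parameterized INSERT, skipping empty values."""
    columns = [k for k, v in item.items() if v]
    placeholders = ", ".join(["%s"] * len(columns))
    column_sql = ", ".join(f"`{c}`" for c in columns)
    sql = f"INSERT INTO `{table_name}` ({column_sql}) VALUES ({placeholders})"
    return sql, [str(item[c]) for c in columns]


# Usage inside Insert_RDS would look roughly like this:
#   sql, params = build_insert(table_name, item_dict)
#   with conn.cursor() as cur:
#       cur.execute(sql, params)
#       conn.commit()
```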

rds_config.py

#config file containing credentials for RDS MySQL instance
# for test
import os
rds_endpoint = os.environ['RDS_ENDPOINT']
db_username = os.environ['USERNAME']
db_password = os.environ['PASSWORD']
db_name = os.environ['DB_NAME']
table_name = os.environ['TABLE_NAME']

rds_keys={
    'Item_id': ['', 'varchar(200)'],
    'Item_URL': ['', 'varchar(200)'],
    'Item_key': ['',  'varchar(200)'],
    '계란': ['0', 'boolean default 0'],
    '우유': ['0', 'boolean default 0'],
    '땅콩': ['0', 'boolean default 0'],
    '견과류': ['0', 'boolean default 0'],
    '밀': ['0', 'boolean default 0'],
    '갑각류': ['0', 'boolean default 0'],
    '대두': ['0', 'boolean default 0'],
    '메밀': ['0', 'boolean default 0'],
    '육류': ['0', 'boolean default 0'],
    '생선': ['0', 'boolean default 0'],
    '과일': ['0', 'boolean default 0'],
    'Nutrition': ['', 'varchar(1000)'],
    'Ingredient' : ['', 'varchar(2000)']
}

Dockerfile

FROM public.ecr.aws/lambda/python:3.7

# Copy function code
COPY app.py ${LAMBDA_TASK_ROOT}
COPY rds_config.py ${LAMBDA_TASK_ROOT}
COPY rds_connect.py ${LAMBDA_TASK_ROOT}

# Set environment values for RDS
# (the password is shown as a placeholder; avoid committing real credentials to the image)
ENV RDS_ENDPOINT rds-cocudeny.cidsblwezpmk.ap-northeast-2.rds.amazonaws.com
ENV USERNAME cocudeny
ENV PASSWORD <DB password>
ENV DB_NAME item_ingredients
ENV TABLE_NAME Item

# Install the function's dependencies using file requirements.txt
# from your project folder.
COPY requirements.txt  .
RUN  pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"
# Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
CMD [ "app.handler" ]

requirements.txt

pymysql
pandas
pillow

buildspec.yaml

version: 0.2

phases:
  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - ls
      - aws ecr get-login-password --region ap-northeast-2 | docker login --username AWS --password-stdin <account>.dkr.ecr.<region>.amazonaws.com
  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker image...
      - cat app.py
      - docker build -t <ECR Container Image name> .
      - docker tag <ECR Container Image name>:latest <account>.dkr.ecr.<region>.amazonaws.com/<ECR Container Image name>:latest
  post_build:
    commands:
      - echo Build completed on `date`
      - echo Pushing the Docker image...
      - docker push <account>.dkr.ecr.<region>.amazonaws.com/<ECR Container Image name>:latest
      - echo Deploy New Image to Lambda function...
      - aws lambda update-function-code --region <region> --function-name <Lambda Function name> --image-uri <account>.dkr.ecr.<region>.amazonaws.com/<ECR Container Image name>:latest
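
The last buildspec command is what actually rolls the new image out. For comparison, the same deploy step can be sketched in Python with boto3; the helper names are hypothetical and the account, region and repository values are placeholders just as in the buildspec.

```python
def ecr_image_uri(account, region, repository, tag="latest"):
    """Build the image URI that both `docker push` and the Lambda update use."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def deploy_image(function_name, image_uri, region):
    """Point the Lambda function at the pushed image (needs AWS credentials)."""
    import boto3  # imported lazily so the URI helper works offline

    client = boto3.client("lambda", region_name=region)
    return client.update_function_code(FunctionName=function_name, ImageUri=image_uri)
```

The function name passed here has to match an existing function, which is exactly the mismatch that broke the first build earlier in this post.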

Connecting ImageProcess to the TextProcess function as a destination
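
A Lambda destination forwards the result of an asynchronous invocation wrapped in an envelope, which is why the TextProcess handler reads `event['responsePayload']`. The sketch below shows how the destination could be configured with boto3 and how the envelope is unwrapped; the function name and ARN are placeholders, and the post set this up in the console.

```python
def build_destination_args(source_function, target_function_arn):
    """Build keyword arguments for the Lambda put_function_event_invoke_config call."""
    return {
        "FunctionName": source_function,
        "DestinationConfig": {"OnSuccess": {"Destination": target_function_arn}},
    }


def configure_destination(source_function, target_function_arn):
    """Send successful ImageProcess results to TextProcess (needs AWS credentials)."""
    import boto3  # imported lazily so the helpers here work offline

    args = build_destination_args(source_function, target_function_arn)
    return boto3.client("lambda").put_function_event_invoke_config(**args)


def unwrap_destination_event(event):
    """Pull the ImageProcess return value out of the destination envelope."""
    payload = event["responsePayload"]
    return payload["body"], payload["bucket"], payload["key"]
```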

Creating the CodeBuild project

Create a new role.

The project was created.

Adding policies to the TextProcess-CodeBuild role

CodeBuildBasePolicy-TextProcess_CICD-ap-northeas is the policy that is generated automatically when a new role is created.
In addition, attach AmazonEC2ContainerRegistryPowerUser and AWSLambdaBasicExecutionRole.

S3-to-RDS-CICD-role

This policy gathers the rules that were added for ImageProcess into one place.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ecr:SetRepositoryPolicy",
                "ecr:GetRepositoryPolicy"
            ],
            "Resource": "arn:aws:ecr:<region>:<Account>:repository/<ECR Repo name>/"
        },
        {
            "Sid": "ECRPullPolicy",
            "Effect": "Allow",
            "Action": [
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage"
            ],
            "Resource": [
                "arn:aws:ecr:<region>:<Account>:repository/<ECR Repo name>/"
            ]
        },
        {
            "Sid": "ECRAuthPolicy",
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "lambda:UpdateFunctionCode"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

Starting the CodeBuild build

The build succeeded.

Testing the TextProcess Lambda function

{
  "responsePayload": {
    "body": [
      {
        "description": "영양 정보 ",
        "box": [48, 94, 654, 262]
      }
    ],
    "bucket": "s3.cocudeny",
    "key" : "Images/capture.JPG"
  }
}

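Run the test with the test event above.
There were many errors along the way, but after fixing the syntax errors the test finally succeeded.

To do

Connect the pipeline

Create the infrastructure stack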

HyeonKi Jo
