220910

HyeonKi Jo·2022년 9월 10일
0

오늘 할일

  • Text분석 코드 수정하기
  • 이미지에서 Item ID 얻는코드 다시하기
  • Image_Url컬럼을 삭제하던지 키도 마찬가지
  • Item 테이블을 새로 만들어야함
  • 상품명을 추가해야한다면 크롤링에 추가해서 다시 크롤링하기
  • 새로 만들고 이미지 데이터베이스에 넣기

AI_Text_Process 코드 수정

  • 이미지 이름으로 Item_ID를 얻는 코드를 다시 넣어야한다.
  • Image_Url컬럼을 삭제하고, 이미지이름을 Item_ID, 이미지 번호를 Item_key로 넣는다.
  • Item 따라서 Item테이블을 DROP하고 다시 생성한다.

git clone

  • git clone <CodeCommit HTTPS 주소>

rds_connect.py

def dict_to_query(dic):
    keys, vals = [], []
>
    for key, val in dic.items():
        if(val):
            keys.append(key)
            vals.append('\"'+str(val)+('\"')) 
>
    return ((', ').join(keys), (', ').join(vals))
>
>
def Insert_RDS(text_data, ITEM_KEY, bucket):
    import json
    import sys
    import logging
    import rds_config
    import pymysql
    import os
    #rds settings
    rds_endpoint  = rds_config.rds_endpoint
    name = rds_config.db_username
    password = rds_config.db_password
    db_name = rds_config.db_name
    table_name = rds_config.table_name
>
    # dict_keys(['Item_id', 'Item_URL', 'Item_key', '계란', '우유', '땅콩', '견과류', '밀', '갑각류', '대두', '메밀', '육류', '생선', '과일', 'Nutirition', 'Ingredient']
    item_dict = {k:v[0] for k, v in rds_config.rds_keys.items()}
>
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
>
    # Connect to RDS
    try:
        conn = pymysql.connect(host=rds_endpoint, user=name, passwd=password, db=db_name, connect_timeout=5)
        print('connected')
    except pymysql.MySQLError as e:
        logger.error("ERROR: Unexpected error: Could not connect to MySQL instance.")
        logger.error(e)
        sys.exit(1)
>
    logger.info("SUCCESS: Connection to RDS MySQL instance succeeded")
>
>
    #Init Table
    try:
        ### CREATE TABLE
        sql_create_option = []
        for k, v in rds_config.rds_keys.items():
            sql_create_option.append(k+' '+v[1])
        sql_create_option = (', ').join(sql_create_option)
        print('init table :',sql_create_option )
        with conn.cursor() as cur:
            cur.execute("create table if not exists "+table_name+" ( "+sql_create_option+" )")
            conn.commit()
        print('init table successed')
    except pymysql.MySQLError as e:
        logger.error("ERROR: Init Table Error")
        logger.error(e)
        sys.exit(2)
>
>
    #Set Item_ID (코드 추가)
    item_dict['Item_id'] = ITEM_KEY.split('/')[-1]
    item_dict['Item_key'] = item_dict['Item_id'].split('.')[0]
>
    # Item INSERT
    ## fit text_data to item_dict
    for k, v in text_data.allergy_dict.items():
        item_dict[k] = v
    >
    item_dict['Nutrition'] = ('|').join(text_data.nutrition_list)
    item_dict['Ingredient'] = ('|').join(text_data.ingredient_list)
    print(item_dict)
    >
    data_keys, data_vals = dict_to_query(item_dict)
    print('insert into '+table_name+' ('+data_keys+') values('+data_vals+')')
>
    try:
        with conn.cursor() as cur:
            cur.execute('insert into '+table_name+' ('+data_keys+') values('+data_vals+')')
            conn.commit()
        print("Added items from RDS MySQL table")
>
    except pymysql.err.InternalError as e:
    	code, msg = e.args
    	logger.error("ERROR: Insert Fail. Code:",code," message:", msg)
    	sys.exit(2)
  • Item_ID, Item_key 부분을 추가했다.
  • git add .
  • git commit -m "<MESSAGE>"
  • git push
  • 잘 push 되었다.
  • 이어서 CodePipeline으로 AWS Lambda에 자동으로 Deploy까지 되었다.

어제 크롤링한 이미지 URL을 통해 S3에 업로드 하기

Image_Upload.py

import boto3
import pandas as pd
import urllib.request
import time

# S3 Client
s3 = boto3.client('s3')
bucket_name = 'ai-item-image'
Ori_folder_name = 'Ori_Images'
Det_folder_name = 'Images'

# read cvs
PATH = '/home/ec2-user/environment/image_save/'
csv_path = PATH+'marketkurly_Crawling3.csv'
img_df = pd.read_csv(csv_path)
img_df.columns = ['Image_ID', 'Item_Name', 'Ori_img_URL', 'Detail_img_URL']

# for each row
for i in range(len(img_df)):
    # get Image
    ori_url = img_df.iloc[i]['Ori_img_URL']
    det_url = img_df.iloc[i]['Detail_img_URL']

    ori_save_path = PATH + Ori_folder_name + '/' + str(i) + ".jpg"
    det_save_path = PATH + Det_folder_name +'/' + str(i) + ".jpg"

    urllib.request.urlretrieve(ori_url, ori_save_path)
    urllib.request.urlretrieve(det_url, det_save_path)


    # Upload Image to S3
    s3.upload_file(ori_save_path, bucket_name, Ori_folder_name + '/' + str(i) + ".jpg")
    s3.upload_file(det_save_path, bucket_name, Det_folder_name +'/' + str(i) + ".jpg")

    # maybe sleep 1 sec
    time.sleep(1)
  • 상품 이미지(Ori_img)와, 상품 상세 이미지(Det_img)를 urllib.request.urlretrieve모듈로 다운로드 받고 그를 각각 S3폴더에 저장하는 코드이다.
  • Ori_imgS3Ori_Images 폴더로 들어가고, Det_imgS3Det_img로 들어가게 된다.
  • 이미지들이 잘 업로드 되었다.

CloudWatch 확인

  • 이미지 96장을 한번에 업로드하니 수많은 CloudWatch 로그그룹이 생성되었다.
  • 그래도 오류는 나오지 않은 것 같다.
  • 참고로 39번 이미지는 위와같은 알러지 정보를 가지고 있다.

DB 확인

  • SELECT * FROM <Table이름>
  • SELECT Item_ID FROM <Table이름> ORDER BY Item_ID

  • 총 96개의 이미지중 84개만 DB에 삽입되었다.
  • 이유를 찾아야 할 것 같다.

고객 주문 리스트 DB에 올리기

rds_config.py

#config file containing credentials for RDS MySQL instance
# for test
import os
rds_endpoint = '<AWS RDS Endpoint>'
db_username = '<Username>'
db_password = '<PassWord>'
db_name = '<Database name>'
table_name = '<Table Name>'

rds_keys={
    'Items': ['', 'varchar(1000) not null'],
    'Custommer_ID': [0, 'int not null'],
    'Custommer_Name': ['', 'varchar(50) not null'],
    'Allergy_List': ['', 'varchar(1000)']
}

order_list.py

import pandas as pd
import pymysql
# read cvs
PATH = 'CSV파일 경로'
csv_path = PATH+'total_Custommer.csv'
img_df = pd.read_csv(csv_path, index_col='Unnamed: 0')
img_df.columns = ['Items', 'Custommer_ID', 'Custommer_Name', 'Allergy_List']

def str_list_to_list(x):
    return (', ').join([item[1:-1] for item in x[1:-1].split(', ')])

img_df['Items'] = img_df['Items'].apply(str_list_to_list)
img_df['Allergy_List'] = img_df['Allergy_List'].apply(str_list_to_list)


################################################################################
import rds_config
rds_endpoint  = rds_config.rds_endpoint
name = rds_config.db_username
password = rds_config.db_password
db_name = rds_config.db_name
table_name = rds_config.table_name


# Connect to RDS
try:
    conn = pymysql.connect(host=rds_endpoint, user=name, passwd=password, db=db_name, connect_timeout=5)
    print('connected')
except pymysql.MySQLError as e:
    print(e)

print("SUCCESS: Connection to RDS MySQL instance succeeded")


#Init Table

### CREATE TABLE
sql_create_option = []
for k, v in rds_config.rds_keys.items():
    sql_create_option.append(k+' '+v[1])
sql_create_option = (', ').join(sql_create_option)
print('init table :',sql_create_option )
with conn.cursor() as cur:
    cur.execute("create table if not exists "+table_name+" ("+sql_create_option+")")
    conn.commit()
print('init table successed')


# Item INSERT

data_keys = (', ').join(img_df.columns.tolist())

error_list = []
for i in range(len(img_df)):
    A, B, C, D = tuple(img_df.iloc[i][keyword] for keyword in img_df.columns.tolist())
    data_vals = '\'{}\', {}, \'{}\', \'{}\''.format(A, B, C, D)
    print('insert into '+table_name+' ('+data_keys+') values('+data_vals+')')

    try:
        with conn.cursor() as cur:
            cur.execute('insert into '+table_name+' ('+data_keys+') values('+data_vals+')')
            conn.commit()
        print("Added items from RDS MySQL table")

    except pymysql.err.InternalError as e:
    	print(e)
    	error_list.append(i)

print(error_list)

주문 리스트에서 고객 명단 리스트 만들어서 RDS 업로드

custommer_list.py

import pandas as pd
import pymysql
# read cvs
PATH = '<CSV파일 경로>'
csv_path = PATH+'total_Custommer.csv'
img_df = pd.read_csv(csv_path, index_col='Unnamed: 0')
img_df.columns = ['Items', 'Custommer_ID', 'Custommer_Name', 'Allergy_List']
>
def str_list_to_list(x):
    return (', ').join([item[1:-1] for item in x[1:-1].split(', ')])
    >
cus_list = pd.DataFrame(columns=['Custommer_ID', 'Custommer_Name'])
cus_list['Custommer_ID'] = img_df['Custommer_ID']
cus_list['Custommer_Name'] = img_df['Custommer_Name']
cus_list = cus_list.sort_values(['Custommer_ID'])
cus_list = cus_list.drop_duplicates()
>
>
################################################################################
import rds_config
rds_endpoint  = rds_config.rds_endpoint
name = rds_config.db_username
password = rds_config.db_password
db_name = rds_config.db_name
table_name = 'Custommer_list'
>
>
# Connect to RDS
try:
    conn = pymysql.connect(host=rds_endpoint, user=name, passwd=password, db=db_name, connect_timeout=5)
    print('connected')
except pymysql.MySQLError as e:
    print(e)
>
print("SUCCESS: Connection to RDS MySQL instance succeeded")
>
>
#Init Table
>
### CREATE TABLE
>
sql_create_option = 'Custommer_ID int not null PRIMARY KEY, Custommer_Name varchar(50) not null'
print('init table :',sql_create_option )
with conn.cursor() as cur:
    cur.execute("create table if not exists "+table_name+" ("+sql_create_option+")")
    conn.commit()
print('init table successed')
>
    >
# Item INSERT
>
data_keys = (', ').join(cus_list.columns.tolist())
>
error_list = []
print(len(cus_list))
>
for i in range(len(cus_list)):
    A, B = tuple(cus_list.iloc[i][keyword] for keyword in cus_list.columns.tolist())
    data_vals = '{}, \'{}\''.format(A, B)
    print('insert into '+table_name+' ('+data_keys+') values('+data_vals+')')
>
    try:
        with conn.cursor() as cur:
            cur.execute('insert into '+table_name+' ('+data_keys+') values('+data_vals+')')
            conn.commit()
        print("Added items from RDS MySQL table")
    
    except pymysql.err.InternalError as e:
    	print(e)
    	error_list.append(i)
>
print(error_list)

결과 확인

  • 데이터베이스에 크롤링으로 생성한 데이터가 모두 올라갔다.
profile
Talking Potato

0개의 댓글