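Data sources
Target Data (CSV): a CSV file containing each cosmetic's type, brand, product name, ingredient names, and so on
Source: Kaggle
Ingredient dictionary: a PDF file containing the Korean name, standard English name, former name, and former English name of every cosmetic ingredient
Source: Korea Cosmetic Association
Download: 별첨1. 표준화명칭목록_220530.pdf
Procedure
Read the PDF file with Python, then save the result in the pickle format.
Module required to read a PDF file with Python:
pip install tabula-py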
import pandas as pd
from grading import *
df_target = pd.read_csv('../data/cosmetics.csv')
df_target = df_target.iloc[:5].copy()  # use only the first 5 rows; copy so later column assignments do not write to a slice
df_target

import tabula
import pickle
# Read the PDF with Tabula -> list of DataFrames
ingredients_list = tabula.read_pdf('../data/별첨1. 표준화명칭목록_220530.pdf', encoding='cp949', pages='all', lattice=True)
# Save the list of DataFrames as a pickle
with open('../data/ingredients_list.pkl', 'wb') as f:
    pickle.dump(ingredients_list, f)
1. Building the ingredient dictionary DataFrame
ingredients_list

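Load the list back from the pickle file saved above as ingredients_list_pkl, then combine it into a single DataFrame
import pickle
import pandas as pd
with open('../data/ingredients_list.pkl', 'rb') as f:
    ingredients_list_pkl = pickle.load(f)
ingredients_df = pd.concat(ingredients_list_pkl, ignore_index=False)
ingredients_df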
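
As a small illustration of the concatenation step, the sketch below joins two toy DataFrames that stand in for per-page tables (the column names and values are made up for this example, not taken from the PDF). It suggests what ignore_index changes: with ignore_index=False each page keeps its own row labels, so labels repeat across pages, while ignore_index=True renumbers the combined rows.

```python
import pandas as pd

# Two toy tables standing in for pages extracted from the PDF (hypothetical data)
page_1 = pd.DataFrame({"code": [1, 2], "name": ["Water", "Glycerin"]})
page_2 = pd.DataFrame({"code": [3, 4], "name": ["Niacinamide", "Sea Salt"]})
pages = [page_1, page_2]

# ignore_index=False keeps each page's own row labels, so 0 and 1 appear twice
kept = pd.concat(pages, ignore_index=False)

# ignore_index=True renumbers the combined rows 0..n-1
renumbered = pd.concat(pages, ignore_index=True)

print(kept.index.tolist())        # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Repeated labels are harmless for the column-wise cleanup that follows, but label-based lookups such as .loc[0] would return one row per page.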


Check how many rows contain \r
rows_with_cr = ingredients_df[ingredients_df.apply(lambda row: row.astype(str).str.contains('\r').any(), axis=1)]
rows_with_cr

ingredients_df['표준 성분명'] = ingredients_df['표준 성분명'].str.replace('\r', '')
ingredients_df['표준 영문명'] = ingredients_df['표준 영문명'].str.replace('\r', '')
ingredients_df['구명칭'] = ingredients_df['구명칭'].str.replace('\r', '')
ingredients_df['구영문명'] = ingredients_df['구영문명'].str.replace('\r', '')
ingredients_df
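
The detection and removal of carriage returns can also be sketched on a toy table. The column names and values below are placeholders chosen for the example, and the loop over all columns is just one possible way to avoid repeating the same line per column.

```python
import pandas as pd

# Toy dictionary rows; column names and values are placeholders, not the real ones
df = pd.DataFrame({
    "name_ko": ["정제수", "글리\r세린"],
    "name_en": ["Water", "Glyce\rrin"],
})

# Flag rows where any cell contains a carriage return
has_cr = df.apply(
    lambda row: row.astype(str).str.contains("\r", regex=False).any(), axis=1
)
n_rows_with_cr = int(has_cr.sum())

# Remove the carriage returns from every text column in one loop
for col in df.columns:
    df[col] = df[col].str.replace("\r", "", regex=False)

print(n_rows_with_cr)         # 1
print(df["name_en"].tolist())  # ['Water', 'Glycerin']
```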

3. Correcting data in the ingredient dictionary DataFrame
replace_dict = {
'Acetobacter/Lycium Chinense Fruit/Rehmannia Glutinosa Root/Cuscuta Chinensis Fruit/Cistanche Deserticola/Zanthoxylum Piperitumit/Chrysanthemum Morifolium Fruit/Poria Cocos/ Cinnamomum Cassia': 'Acetobacter/Lycium Chinense Fruit/Rehmannia Glutinosa Root/Cuscuta Chinensis Fruit/Cistanche Deserticola/Zanthoxylum Piperitumit/Chrysanthemum Morifolium Fruit/Poria Cocos/ Cinnamomum Cassia Ferment',
'Saccharomyces/Licorice Root/Rehmannia Glutinosa Root/Angelica Gigas Root/Ophiopogon Japonicus Root/Atractylodes Macrocephala Root/Paeonia Lactiflora Root/Anemarrhena Asphodeloides Root/Fraxinus Excelsior Bark/Asparagus Cochinchinensis/Phellodendron Amurense': 'Saccharomyces/Licorice Root/Rehmannia Glutinosa Root/Angelica Gigas Root/Ophiopogon Japonicus Root/Atractylodes Macrocephala Root/Paeonia Lactiflora Root/Anemarrhena Asphodeloides Root/Fraxinus Excelsior Bark/Asparagus Cochinchinensis/Phellodendron Amurense Bark Ferment Extract',
'Bacillus/Lycium Chinense Fruit/Rehmannia Glutinosa Root/Cuscuta Chinensis Fruit/Cistanche Deserticola/Zanthoxylum Piperitum Fruit/Chrysanthemum Morifolium Fruit/Poria Cocos/ Cinnamomum Cassia': 'Bacillus/Lycium Chinense Fruit/Rehmannia Glutinosa Root/Cuscuta Chinensis Fruit/Cistanche Deserticola/Zanthoxylum Piperitum Fruit/Chrysanthemum Morifolium Fruit/Poria Cocos/ Cinnamomum Cassia Ferment',
'Bifida/Angelica Gigas/Angelica Tenuissima Root/Antler Velvet/Rehmannia Glutinosa Root/Atractylodes Japonica Rhizome/Cnidium Officinale Root/Cordyceps Sinensis/Ledebouriella Seseloides Root/Licorice Root/Paeonia Lactiflora Root/Panax Ginseng': 'Bifida/Angelica Gigas/Angelica Tenuissima Root/Antler Velvet/Rehmannia Glutinosa Root/Atractylodes Japonica Rhizome/Cnidium Officinale Root/Cordyceps Sinensis/Ledebouriella Seseloides Root/Licorice Root/Paeonia Lactiflora Root/Panax Ginseng Root/Phellinus Linteus/Scutellaria Baicalensis Root Ferment',
'Leuconostoc/Lycium Chinense Fruit/Rehmannia Glutinosa Root/Cuscuta Chinensis Fruit/Cistanche Deserticola/Zanthoxylum Piperitum Fruit/Chrysanthemum Morifolium Fruit/Poria Cocos/ Cinnamomum Cassia': 'Leuconostoc/Lycium Chinense Fruit/Rehmannia Glutinosa Root/Cuscuta Chinensis Fruit/Cistanche Deserticola/Zanthoxylum Piperitum Fruit/Chrysanthemum Morifolium Fruit/Poria Cocos/ Cinnamomum Cassia Ferment',
'Saccharomyces/Anemarrhena Asphodeloides Root/Angelica Gigas Root/Asparagus Cochinchinensis/Atractylodes Macrocephala Root/Fraxinus Excelsior Bark/Licorice Root/Ophiopogon Japonicus Root/Paeonia Lactiflora Root/Phellodendron Amurense': 'Saccharomyces/Anemarrhena Asphodeloides Root/Angelica Gigas Root/Asparagus Cochinchinensis/Atractylodes Macrocephala Root/Fraxinus Excelsior Bark/Licorice Root/Ophiopogon Japonicus Root/Paeonia Lactiflora Root/Phellodendron Amurense Bark Ferment Extract',
'Saccharomyces/Camellia Japonica Flower/Castanea Crenata Shell/Diospyros Kaki Leaf/Paeonia Suffruticosa Root/Rhus Javanica/Sanguisorba Officinalis Root Extract': 'Saccharomyces/Camellia Japonica Flower/Castanea Crenata Shell/Diospyros Kaki Leaf/Paeonia Suffruticosa Root/Rhus Javanica/Sanguisorba Officinalis Root Extract Ferment Filtrate',
'Lactobacillus/Honeysuckle Flower/Licorice Root/Morus Alba Root/Pueraria Lobata Root/Schisandra Chinensis Fruit/Scutellaria Baicalensis Root/Sophora Japonica Flower': 'Lactobacillus/Honeysuckle Flower/Licorice Root/Morus Alba Root/Pueraria Lobata Root/Schizandra Chinensis Fruit/Scutellaria Baicalensis Root/Sophora Japonica Flower Extract Ferment Filtrate',
'Lactobacillus/Lycium Chinense Fruit/Rehmannia Glutinosa Root/Cuscuta Chinensis Fruit/Cistanche Deserticola/Zanthoxylum Piperitum Fruit/Chrysanthemum Morifolium Fruit/Poria Cocos/Cinnamomum Cassia': 'Lactobacillus/Lycium Chinense Fruit/Rehmannia Glutinosa Root/Cuscuta Chinensis Fruit/Cistanche Deserticola/Zanthoxylum Piperitum Fruit/Chrysanthemum Morifolium Fruit/Poria Cocos/Cinnamomum Cassia Ramulus Bark Ferment Filtrate',
'Saccharomyces/Lycium Chinense Fruit/Rehmannia Glutinosa Root/Cuscuta Chinensis Fruit/Cistanche Deserticola/Zanthoxylum Piperitum Fruit/Chrysanthemum Morifolium Fruit/Poria Cocos/ Cinnamomum Cassia': 'Saccharomyces/Lycium Chinense Fruit/Rehmannia Glutinosa Root/Cuscuta Chinensis Fruit/Cistanche Deserticola/Zanthoxylum Piperitum Fruit/Chrysanthemum Morifolium Fruit/Poria Cocos/ Cinnamomum Cassia Ferment',
}
Enter your code
ingredients_df['표준 영문명'] = ingredients_df['표준 영문명'].replace(replace_dict)
ingredients_df
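
One detail worth keeping in mind here: Series.replace with a dict swaps a cell only when its entire value equals a key, which is why the keys in replace_dict are the full truncated names. The sketch below contrasts this with substring replacement via Series.str.replace, using invented ingredient names.

```python
import pandas as pd

# Invented names for illustration only
names = pd.Series(["Rosa Canina", "Rosa Canina Fruit Oil"])
fix = {"Rosa Canina": "Rosa Canina Fruit Extract"}

# Series.replace with a dict: only cells whose whole value equals a key change
whole_cell = names.replace(fix)

# Series.str.replace: the substring is substituted wherever it occurs
substring = names.str.replace("Rosa Canina", "Rosa Canina Fruit Extract", regex=False)

print(whole_cell.tolist())
# ['Rosa Canina Fruit Extract', 'Rosa Canina Fruit Oil']
print(substring.tolist())
# ['Rosa Canina Fruit Extract', 'Rosa Canina Fruit Extract Fruit Oil']
```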
Condition 1: If the value ends with a period ('.'), remove only that final period.
ex) 'Algae (Seaweed) Extract. Sea Salt.' -> 'Algae (Seaweed) Extract. Sea Salt'
Condition 2: If the value contains '. May Contain', remove '. May Contain' and everything after it.
ex) 'Algae (Seaweed) Extract. May Contain: Sea Salt, Fragrance' -> 'Algae (Seaweed) Extract'
Condition 3: The replace_str_dict below consists of current value (key): new value (value) pairs. Use replace_str_dict to update the data.
Note: Only the entries in replace_str_dict need to be applied; you do not need to look for and fix other omissions.
replace_str_dict = {
'Algae (Seaweed) Extract': 'Algae Extract',
'Citrus Aurantifolia (Lime) Extract': 'Citrus Aurantifolia (Lime) Fruit Extract',
'Eucalyptus Globulus (Eucalyptus) Leaf Oil': 'Eucalyptus Globulus Leaf Oil',
'Galactomyces Ferment Filtrate (Pitera)': 'Galactomyces Ferment Filtrate',
'Bacillus/Soybean/ Folic Acid Ferment Extract': 'Bacillus/Folic Acid/Soybean Ferment Extract',
'Butyrospermum Parkii (Shea Butter)': 'Butyrospermum Parkii (Shea) Butter',
'Sea Salt/Maris Sal/Sel Marin': 'Sea Salt',
'Parfum/Fragrance': 'Fragrance|Perfume|Parfum',
', Fragrance': ', Fragrance|Perfume|Parfum',
}
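
If these entries are applied one after another with str.replace (one straightforward way to satisfy Condition 3), an earlier replacement can produce text that a later key matches again. The sketch below shows this on a made-up input string and compares it with a single-pass alternative based on re.sub. The helper names replace_sequential and replace_single_pass are hypothetical, and whether the cascaded result matters depends on the actual data.

```python
import re

# The two fragrance-related entries from replace_str_dict
mapping = {
    "Parfum/Fragrance": "Fragrance|Perfume|Parfum",
    ", Fragrance": ", Fragrance|Perfume|Parfum",
}

def replace_sequential(text, mapping):
    # Apply each key -> value pair in dict order; later keys see earlier output
    for old, new in mapping.items():
        text = text.replace(old, new)
    return text

def replace_single_pass(text, mapping):
    # Match all keys in one scan (longest first) so replaced text is not rescanned
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)

sample = "Water, Parfum/Fragrance"  # hypothetical input
sequential = replace_sequential(sample, mapping)
single_pass = replace_single_pass(sample, mapping)

print(sequential)   # Water, Fragrance|Perfume|Parfum|Perfume|Parfum
print(single_pass)  # Water, Fragrance|Perfume|Parfum
```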
Enter your code
# Condition 1: drop a single trailing period
df_target['Ingredients'] = df_target['Ingredients'].apply(lambda x: x[:-1] if x.endswith('.') else x)
# Condition 2: keep only the text before '. May Contain'
df_target['Ingredients'] = df_target['Ingredients'].apply(lambda x: x.split('. May Contain')[0].strip() if '. May Contain' in x else x)
# Condition 3: apply every key -> value substitution in replace_str_dict, in order
def replacing(val):
    for key, replac in replace_str_dict.items():
        val = val.replace(key, replac)
    return val
df_target['Ingredients'] = df_target['Ingredients'].apply(replacing)
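
To check the three conditions against the examples given in the task, the same logic can be run on plain strings. This is a minimal sketch: clean_ingredients is a hypothetical helper name, and the dict is only a two-entry subset of replace_str_dict.

```python
# Subset of replace_str_dict, enough for the two examples from the task
replace_str_dict = {
    "Algae (Seaweed) Extract": "Algae Extract",
    "Sea Salt/Maris Sal/Sel Marin": "Sea Salt",
}

def clean_ingredients(text):
    # Condition 1: drop a single trailing period
    if text.endswith("."):
        text = text[:-1]
    # Condition 2: keep only the text before '. May Contain'
    if ". May Contain" in text:
        text = text.split(". May Contain")[0].strip()
    # Condition 3: apply the substring substitutions in dict order
    for old, new in replace_str_dict.items():
        text = text.replace(old, new)
    return text

first = clean_ingredients("Algae (Seaweed) Extract. Sea Salt.")
second = clean_ingredients("Algae (Seaweed) Extract. May Contain: Sea Salt, Fragrance")

print(first)   # Algae Extract. Sea Salt
print(second)  # Algae Extract
```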

Condition 1: Split each value in the 'Ingredients' column on ', ' (comma + space) and convert it to a list.
ex) ' Algae (Seaweed) Extract, Sea Salt ' -> [' Algae (Seaweed) Extract', ' Sea Salt ']
Condition 2: If any element of the list from Condition 1 has leading or trailing whitespace, remove it.
ex) [' Algae (Seaweed) Extract', ' Sea Salt '] -> ['Algae (Seaweed) Extract', 'Sea Salt']
Condition 3: Create a new 'Ingredients List' column and store the list from Conditions 1 and 2 in the corresponding row.
set_list = df_target['Ingredients'].str.split(", ")
# Strip leading/trailing whitespace from every element of each row's list
for i in range(len(set_list)):
    for j in range(len(set_list[i])):
        set_list[i][j] = set_list[i][j].strip()
df_target['Ingredients List'] = set_list
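
The split-and-strip step can also be written without index-based loops. The sketch below is one possible variant on a toy DataFrame (the second row is invented); it uses a list comprehension per row and does not rely on the row labels being 0..n-1.

```python
import pandas as pd

# First row is the example from the task; second row is made up
df = pd.DataFrame(
    {"Ingredients": [" Algae (Seaweed) Extract, Sea Salt ", "Water, Glycerin"]}
)

# Split on ', ' and strip each piece, one row at a time
df["Ingredients List"] = df["Ingredients"].apply(
    lambda text: [part.strip() for part in text.split(", ")]
)

print(df["Ingredients List"].tolist())
# [['Algae (Seaweed) Extract', 'Sea Salt'], ['Water', 'Glycerin']]
```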

def map_ingredients_to_codes(ingredients_list):
    code_list = []
    for ingredient in ingredients_list:
        # Find the ingredient code whose '표준 영문명' (standard English name) or '구영문명' (former English name) matches
        # Here | is the element-wise logical OR of two boolean Series (between two sets, | returns their union)
        # On | as the set-union operator, see: https://f7project.tistory.com/390
        code = ingredients_df[(ingredients_df['표준 영문명'].str.lower() == ingredient.lower()) | (ingredients_df['구영문명'].str.lower() == ingredient.lower())]['성분코드'].values
        # If a matching code exists, append the first one to code_list
        if len(code) > 0:
            code_list.append(code[0])
        else:
            code_list.append(None)  # append None when the ingredient is not found in the dictionary
    return code_list
df_target['Code List'] = df_target['Ingredients List'].apply(lambda x: map_ingredients_to_codes([ingredient.strip() for ingredient in x]))
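
The function above filters the whole dictionary once per ingredient. An alternative sketch, under the assumption that exact case-insensitive matching is all that is needed, builds a name-to-code lookup once and reuses it. The toy codes and names are invented, map_to_codes is a hypothetical helper, and when a name occurs more than once this version keeps the standard-name entry, which may differ from the first-matching-row rule above.

```python
import pandas as pd

# Toy ingredient dictionary; the codes and names are invented for illustration
ingredients_df = pd.DataFrame({
    "성분코드": [101, 102],
    "표준 영문명": ["Water", "Butylene Glycol"],
    "구영문명": ["Aqua", None],
})

# Build a lowercase name -> code lookup once; standard names are written last
lookup = {}
for name_col in ["구영문명", "표준 영문명"]:
    valid = ingredients_df.dropna(subset=[name_col])
    for name, code in zip(valid[name_col].str.lower(), valid["성분코드"]):
        lookup[name] = int(code)

def map_to_codes(ingredients):
    # Names missing from the dictionary map to None
    return [lookup.get(item.strip().lower()) for item in ingredients]

codes = map_to_codes(["water", "AQUA", "Unknown Extract"])
print(codes)  # [101, 101, None]
```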
