Feature Engineering

제이브로 · November 30, 2021

Tags: AI Bootcamp, Data Preprocess, EDA, Feature Engineering, apply function, string replace, Codestates

Feature Engineering

Data Preprocess & EDA

1. Feature Engineering

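Q . What is Feature Engineering?
A . Creating new features by recombining the features that already exist in a dataset, drawing on domain knowledge and creativity.

What is domain knowledge? Background knowledge about the field (e.g. for game data, background knowledge about games).

DataFrame : data in table form
Dataset : a collection of data is commonly called a dataset.
- In pandas, a column of strings typically shows up with the dtype 'object'.

Q . What is the difference between NA, Null, NaN, 0, and Undefined?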
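
The question above is left open in these notes. As a partial illustration limited to plain Python (which has no built-in NA or Undefined), the following sketch shows how None, NaN, and 0 behave differently. The variable names are made up for illustration.

```python
import math

nothing = None                # Python's null: no value at all
not_a_number = float("nan")   # a float that marks an undefined numeric result
zero = 0                      # an ordinary number that happens to be zero

# None is checked by identity.
print(nothing is None)                 # True

# NaN is not equal to anything, including itself, so use math.isnan.
print(not_a_number == not_a_number)    # False
print(math.isnan(not_a_number))        # True

# 0 is a real value: it takes part in arithmetic like any other number.
print(zero + 5)                        # 5

# NaN propagates through arithmetic instead of raising an error.
print(math.isnan(not_a_number + 5))    # True
```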

2. String

string = a sequence of characters

2.1 string replace

string.replace(oldvalue, newvalue, count)

oldvalue : the substring you want to replace
newvalue : the substring to put in its place
count : (optional) how many occurrences to replace / by default, all occurrences are replaced
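
A minimal sketch of how the optional count argument changes the result. The sample string and variable names are made up for illustration.

```python
# Hypothetical sample text, chosen only for illustration.
text = "one, two, one, two, one"

# Without count, every occurrence of the old value is replaced.
replaced_all = text.replace("one", "1")

# With count=2, only the first two occurrences (from the left) are replaced.
replaced_two = text.replace("one", "1", 2)

# Strings are immutable: replace returns a new string and leaves text unchanged.
print(replaced_all)  # 1, two, 1, two, 1
print(replaced_two)  # 1, two, 1, two, one
print(text)          # one, two, one, two, one
```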

Q . 🔥 Python Strings

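''' word ''' : a string literal written with triple single quotes

""" word """ : a string literal written with triple double quotes

for x in "banana":
    print(x)
# prints b, a, n, a, n, a, one character per line

txt = "free time!"
print("free" in txt)
# True (the in operator returns a boolean: True or False)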

2.2 Type casting

Q . What is type casting?
A . Explicitly converting a value from one type to another.
e.g. int(123.4) / float(123.4) / str(123.4)
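
As a rough illustration of the three casts listed above, the sketch below prints each result with its type, then shows a cast that fails. The variable names are illustrative.

```python
value = 123.4

as_int = int(value)      # truncates toward zero; it does not round
as_float = float(value)  # already a float, so the value is unchanged
as_str = str(value)      # the text "123.4"

print(as_int, type(as_int))      # 123 <class 'int'>
print(as_float, type(as_float))  # 123.4 <class 'float'>
print(as_str, type(as_str))      # 123.4 <class 'str'>

# Casting can fail: int() cannot parse a string that contains a comma.
try:
    int("25,970")
except ValueError as error:
    print("ValueError:", error)
```

The failing case is why a string such as '25,970' needs its comma removed before it can be cast to an integer.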

2.3 as Function

Converting a string to an int

def toInt(string):
    return int(string.replace(',',''))
    
toInt('25,970')
# 25970
# datatype : integer

Q . as Function

When the number of arguments is not known in advance, prefix the parameter name with *.

def my_function(*kids):
    print("The youngest child is " + kids[2])

my_function("Emil", "Tobias", "Linus")

# The youngest child is Linus

*args : short for "arbitrary arguments".
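
A small sketch of what a starred parameter receives. The function names describe_args and total are hypothetical and used only for illustration.

```python
def describe_args(*names):
    # Inside the function, names is a tuple holding every positional argument.
    return type(names).__name__, len(names)

def total(*numbers):
    # Works for any number of arguments, including none.
    return sum(numbers)

print(describe_args("Emil", "Tobias", "Linus"))  # ('tuple', 3)
print(total(1, 2, 3))                            # 6
print(total())                                   # 0
```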

A parameter can be given a default value, which is used when the caller omits the argument.

def my_function(country = "Korea"):
  print("I am from " + country)

my_function("Sweden")
my_function()

# I am from Sweden
# I am from Korea

3. Apply

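3.1 How to use apply

Apply a function one column at a time.
1. Define the function that will be passed to apply.
2. Call apply on the column.

# Convert the string values of a column to int ('자산' means "assets")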
df['자산2'] = df['자산'].apply(toInt)

print(df)
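
The snippet above assumes a DataFrame named df already exists. As a self-contained sketch, the version below builds a small DataFrame first; the sample values and the column names "assets" and "assets_int" are made up for illustration, and toInt repeats the definition from section 2.3.

```python
import pandas as pd

def toInt(string):
    # Remove thousands separators, then cast to int.
    return int(string.replace(',', ''))

# Hypothetical sample data; the column name "assets" is illustrative.
df = pd.DataFrame({"assets": ["25,970", "1,200", "300"]})

# apply calls toInt once per element and returns a new Series,
# which is stored here as a second column.
df["assets_int"] = df["assets"].apply(toInt)

print(df["assets_int"].tolist())  # [25970, 1200, 300]
```

Writing the result to a new column keeps the original strings available for comparison.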

👉 The process at a glance
