AICE: Basic Data 2

이강민 · August 1, 2023

Modifying the data you need

Adding data

import pandas as pd

flight = pd.read_csv('/Clean_Dataset.csv', encoding='cp949')
# Create a new price2 column
flight['price2'] = flight['price']*2
# Build the price3 column from price and price2
flight['price3'] = flight['price'] + flight['price2']
# Show the first 5 rows of flight
flight.head()
	Unnamed: 0	airline	flight	source_city	departure_time	stops	arrival_time	destination_city	class	duration	days_left	price	price2	price3
0	0	SpiceJet	SG-8709	Delhi	Evening	zero	Night	Mumbai	Economy	2.17	1	5953	11906	17859
1	1	SpiceJet	SG-8157	Delhi	Early_Morning	zero	Morning	Mumbai	Economy	2.33	1	5953	11906	17859
2	2	AirAsia	I5-764	Delhi	Early_Morning	zero	Early_Morning	Mumbai	Economy	2.17	1	5956	11912	17868
3	3	Vistara	UK-995	Delhi	Morning	zero	Afternoon	Mumbai	Economy	2.25	1	5955	11910	17865
4	4	Vistara	UK-963	Delhi	Morning	zero	Morning	Mumbai	Economy	2.33	1	5955	11910	17865
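
The same column arithmetic can be tried without the CSV file. The following is a minimal sketch on a small in-memory frame: the three price values are copied from the output above, while the name `sample` is an assumption introduced only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: three price values taken from the output above.
sample = pd.DataFrame({'price': [5953, 5953, 5956]})

# Element-wise arithmetic on a column produces a new Series of the same length,
# which is then stored under the new column name.
sample['price2'] = sample['price'] * 2
sample['price3'] = sample['price'] + sample['price2']

print(sample)
```

Assigning to a column name that does not exist yet appends the column at the right edge of the frame.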

Adding data at a specific position

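df.insert(loc, column, value, allow_duplicates=False)
- loc: zero-based position at which the column is inserted
- column: name of the inserted column
- value: values of the inserted column
- allow_duplicates: if True, a column whose name already exists may be inserted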
import pandas as pd

flight = pd.read_csv('/Clean_Dataset.csv', encoding='cp949')
# Insert a duration2 column, holding twice the duration values, at position 10 (counting from 0).
flight.insert(10, 'duration2', flight['duration']*2)
flight
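
To see where `loc` places the column and what `allow_duplicates=False` guards against, here is a rough sketch on a made-up three-column frame. The frame `sample` and its values are assumptions for illustration, not the real dataset. Note that `insert` changes the frame in place and returns None.

```python
import pandas as pd

# Hypothetical stand-in for the flight data.
sample = pd.DataFrame({'airline': ['SpiceJet', 'AirAsia'],
                       'duration': [2.17, 2.33],
                       'price': [5953, 5956]})

# loc=1 places the new column second, right after 'airline'.
sample.insert(1, 'duration2', sample['duration'] * 2)
columns_after_insert = list(sample.columns)

# Inserting a name that already exists raises ValueError unless allow_duplicates=True.
try:
    sample.insert(0, 'price', sample['price'])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(columns_after_insert, duplicate_rejected)
```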

Deleting data

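axis = 1 : delete by column (the label is looked up among the column names)
axis = 0 : delete by row (the label is looked up in the index)
inplace : if True, the data is removed from the original DataFrame and the call returns None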
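
The two `axis` values can be compared side by side. This is a small sketch on an invented frame (`sample` and its contents are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the flight data.
sample = pd.DataFrame({'airline': ['SpiceJet', 'AirAsia', 'Vistara'],
                       'price': [5953, 5956, 5955]})

without_price = sample.drop('price', axis=1)   # drops the column labelled 'price'
without_row_0 = sample.drop(0, axis=0)         # drops the row whose index label is 0

# Neither call used inplace=True, so sample itself still has every row and column.
print(without_price.columns.tolist(), without_row_0.index.tolist(), sample.shape)
```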

Dropping a column

import pandas as pd

flight = pd.read_csv('/Clean_Dataset.csv', encoding='cp949')
flight.drop('price', axis=1).head()

Unnamed: 0	airline	flight	source_city	departure_time	stops	arrival_time	destination_city	class	duration	days_left
0	0	SpiceJet	SG-8709	Delhi	Evening	zero	Night	Mumbai	Economy	2.17	1
1	1	SpiceJet	SG-8157	Delhi	Early_Morning	zero	Morning	Mumbai	Economy	2.33	1
2	2	AirAsia	I5-764	Delhi	Early_Morning	zero	Early_Morning	Mumbai	Economy	2.17	1
3	3	Vistara	UK-995	Delhi	Morning	zero	Afternoon	Mumbai	Economy	2.25	1
4	4	Vistara	UK-963	Delhi	Morning	zero	Morning	Mumbai	Economy	2.33	1

Dropping a row

import pandas as pd

flight = pd.read_csv('/Clean_Dataset.csv', encoding='cp949')
# With axis=0 the label must be an index label; 'price' is a column name and would raise KeyError.
flight.drop(0, axis=0).head()

Unnamed: 0	airline	flight	source_city	departure_time	stops	arrival_time	destination_city	class	duration	days_left	price
1	1	SpiceJet	SG-8157	Delhi	Early_Morning	zero	Morning	Mumbai	Economy	2.33	1	5953
2	2	AirAsia	I5-764	Delhi	Early_Morning	zero	Early_Morning	Mumbai	Economy	2.17	1	5956
3	3	Vistara	UK-995	Delhi	Morning	zero	Afternoon	Mumbai	Economy	2.25	1	5955
4	4	Vistara	UK-963	Delhi	Morning	zero	Morning	Mumbai	Economy	2.33	1	5955
5	5	Vistara	UK-945	Delhi	Morning	zero	Afternoon	Mumbai	Economy	2.33	1	5955
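
Row deletion works on index labels rather than positions, which is why the output above starts at label 1. As a hedged sketch on a made-up frame (`sample` is an assumption for illustration), the following drops several rows at once and then renumbers the index with `reset_index`:

```python
import pandas as pd

# Hypothetical stand-in for the flight data.
sample = pd.DataFrame({'airline': ['SpiceJet', 'SpiceJet', 'AirAsia', 'Vistara'],
                       'price': [5953, 5953, 5956, 5955]})

# A list of index labels removes several rows in one call.
trimmed = sample.drop([0, 2], axis=0)

# drop matches index labels, not positions, so the surviving labels are not renumbered.
remaining_labels = trimmed.index.tolist()

# reset_index(drop=True) renumbers from 0 and discards the old labels.
renumbered = trimmed.reset_index(drop=True)

print(remaining_labels, renumbered.index.tolist())
```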

Deleting from the original data

Reassigning the result to the DataFrame

import pandas as pd

flight = pd.read_csv('/Clean_Dataset.csv', encoding='cp949')
flight = flight.drop(index=0, axis=0)
flight.head()

Using the inplace option

import pandas as pd

flight = pd.read_csv('/Clean_Dataset.csv', encoding='cp949')
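# drop(..., inplace=True) modifies flight and returns None, so head() is called on a separate line.
flight.drop(index=0, axis=0, inplace=True)
flight.head()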
	Unnamed: 0	airline	flight	source_city	departure_time	stops	arrival_time	destination_city	class	duration	days_left	price
1	1	SpiceJet	SG-8157	Delhi	Early_Morning	zero	Morning	Mumbai	Economy	2.33	1	5953
2	2	AirAsia	I5-764	Delhi	Early_Morning	zero	Early_Morning	Mumbai	Economy	2.17	1	5956
3	3	Vistara	UK-995	Delhi	Morning	zero	Afternoon	Mumbai	Economy	2.25	1	5955
4	4	Vistara	UK-963	Delhi	Morning	zero	Morning	Mumbai	Economy	2.33	1	5955
5	5	Vistara	UK-945	Delhi	Morning	zero	Afternoon	Mumbai	Economy	2.33	1	5955
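
A minimal sketch of why a call with `inplace=True` is written on its own line: the method changes the frame and hands back None, so nothing can be chained onto it. The frame `sample` is an assumption for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the flight data.
sample = pd.DataFrame({'airline': ['SpiceJet', 'AirAsia', 'Vistara'],
                       'price': [5953, 5956, 5955]})

# With inplace=True the frame is modified and the call itself returns None.
result = sample.drop(index=0, inplace=True)

print(result)                  # None
print(sample.index.tolist())   # [1, 2]
```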

Renaming columns

df.rename(columns={'old_name': 'new_name', ...})
import pandas as pd

flight = pd.read_csv('/Clean_Dataset.csv', encoding='cp949')
flight = flight.rename(columns = {'airline' : 'airline_name'})
flight.head()

Unnamed: 0	airline_name	flight	source_city	departure_time	stops	arrival_time	destination_city	class	duration	days_left	price
0	0	SpiceJet	SG-8709	Delhi	Evening	zero	Night	Mumbai	Economy	2.17	1	5953
1	1	SpiceJet	SG-8157	Delhi	Early_Morning	zero	Morning	Mumbai	Economy	2.33	1	5953
2	2	AirAsia	I5-764	Delhi	Early_Morning	zero	Early_Morning	Mumbai	Economy	2.17	1	5956
3	3	Vistara	UK-995	Delhi	Morning	zero	Afternoon	Mumbai	Economy	2.25	1	5955
4	4	Vistara	UK-963	Delhi	Morning	zero	Morning	Mumbai	Economy	2.33	1	5955
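
Because `columns` takes a dictionary, several columns can be renamed in a single call. The sketch below assumes a one-row frame `sample`, and the new name `seat_class` is hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the flight data.
sample = pd.DataFrame({'airline': ['SpiceJet'], 'class': ['Economy'], 'price': [5953]})

# Several columns can be renamed in one call; unlisted columns keep their names.
renamed = sample.rename(columns={'airline': 'airline_name', 'class': 'seat_class'})

print(renamed.columns.tolist())
print(sample.columns.tolist())   # the original frame is unchanged
```

Like `drop`, `rename` returns a new frame by default, so the result has to be assigned back to keep it.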

Sorting a DataFrame

df.sort_values(by='column_name', ascending=True | False)
import pandas as pd

flight = pd.read_csv('/Clean_Dataset.csv', encoding='cp949')
# Rename the column, then sort by price in ascending order.
flight = flight.rename(columns = {'airline' : 'airline_name'}).sort_values(by='price',ascending=True)
flight.head()
	Unnamed: 0	airline_name	flight	source_city	departure_time	stops	arrival_time	destination_city	class	duration	days_left	price
205012	205012	Indigo	6E-605	Chennai	Afternoon	one	Evening	Hyderabad	Economy	4.75	31	1105
205754	205754	Indigo	6E-605	Chennai	Afternoon	one	Night	Hyderabad	Economy	10.08	39	1105
205024	205024	Indigo	6E-6137	Chennai	Morning	one	Evening	Hyderabad	Economy	8.83	31	1105
204736	204736	AirAsia	I5-517	Chennai	Morning	zero	Morning	Hyderabad	Economy	1.17	28	1105
205023	205023	Indigo	6E-6113	Chennai	Afternoon	one	Night	Hyderabad	Economy	8.67	31	1105
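
For comparison, here is a rough sketch of a descending sort and of a sort on two columns. The four-row frame `sample` is invented for illustration and does not come from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the flight data.
sample = pd.DataFrame({'airline': ['Vistara', 'AirAsia', 'SpiceJet', 'Indigo'],
                       'days_left': [1, 1, 2, 2],
                       'price': [5955, 5956, 5953, 1105]})

# ascending=False puts the most expensive fare first.
by_price_desc = sample.sort_values(by='price', ascending=False)

# Lists for by and ascending sort by days_left first (ascending),
# then by price (descending) among rows with the same days_left.
by_days_then_price = sample.sort_values(by=['days_left', 'price'], ascending=[True, False])

print(by_price_desc['price'].tolist())
print(by_days_then_price['airline'].tolist())
```

As in the output above, the original index labels travel with their rows, so the index is no longer in order after sorting.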