Data Type

김윤하 · October 4, 2023

Data Engineer

data type?

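Data types are the distinguishable forms that data can take. The set of types is broadly similar everywhere, but depending on the file format or database you use, a value may be stored as a different type, or a given type may not be available at all.
For people who work with data, understanding these differences pays off in areas such as memory management and type conversion.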
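As a rough illustration of how type conversion ties into memory management, the sketch below builds a small pandas DataFrame and compares its size before and after converting columns to narrower types. The column names and values are hypothetical, and the actual savings depend on the data and the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: small integers and a low-cardinality text column.
df = pd.DataFrame({
    "age": np.arange(1000, dtype="int64") % 100,
    "city": ["Seoul", "Busan", "Daegu", "Incheon"] * 250,
})

before = df.memory_usage(deep=True).sum()

# Values 0..99 fit in int8; repeated labels are stored compactly as a categorical.
converted = df.astype({"age": "int8", "city": "category"})

after = converted.memory_usage(deep=True).sum()

print(f"before: {before} bytes, after: {after} bytes")
```

Downcasting is only safe when every value fits in the target type, so it is worth checking the value range before calling `astype`.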

(1) dataframe data type

data type

Kind of data	Data type	String aliases
tz-aware datetime	DatetimeTZDtype	'datetime64[ns, <tz>]'
Categorical	CategoricalDtype	'category'
period (time spans)	PeriodDtype	'period[<freq>]'
sparse (sparse matrix)	SparseDtype	'Sparse', 'Sparse[int]', 'Sparse[float]'
intervals	IntervalDtype	'interval', 'Interval', 'Interval[<numpy_dtype>]', 'Interval[datetime64[ns, <tz>]]', 'Interval[timedelta64[<freq>]]'
nullable integer	Int64Dtype, ...	'Int8', 'Int16', 'Int32', 'Int64', 'UInt8', 'UInt16', 'UInt32', 'UInt64'
nullable float	Float64Dtype, ...	'Float32', 'Float64'
Strings	StringDtype	'string'
Boolean (with NA)	BooleanDtype	'boolean'
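The string aliases in the table can be passed as `dtype` when building a Series. The following is a minimal sketch with made-up values; the variable names are illustrative only.

```python
import pandas as pd

# Nullable integer: the missing value stays pd.NA instead of forcing float64.
ints = pd.Series([1, None, 3], dtype="Int64")

# Nullable boolean and the dedicated string dtype.
flags = pd.Series([True, None, False], dtype="boolean")
names = pd.Series(["a", None, "c"], dtype="string")

# Categorical and tz-aware datetime.
grades = pd.Series(["low", "high", "low"], dtype="category")
stamps = pd.Series(pd.to_datetime(["2023-10-04"])).dt.tz_localize("UTC")

for s in (ints, flags, names, grades, stamps):
    print(s.dtype)
```

Each resulting `dtype` is an instance of the class named in the middle column, for example `pd.Int64Dtype` for `'Int64'`.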

check dataframe memory use

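import sys
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})  # hypothetical sample frame

sys.getsizeof(df)                             # size in bytes as reported by the interpreter
df.info()                                     # column dtypes plus a memory usage summary
df.memory_usage()                             # bytes per column (and index), shallow
df.memory_usage(index=True, deep=True).sum()  # total bytes, including object contents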
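The difference between the shallow and deep measurements mostly shows up on object columns. The sketch below uses a hypothetical frame whose text column is forced to `object` dtype so the effect is visible regardless of the pandas default string type.

```python
import pandas as pd

# Hypothetical frame with one numeric column and one object (text) column.
df = pd.DataFrame({
    "n": range(1000),
    "text": pd.Series(["some fairly long string value"] * 1000, dtype="object"),
})

shallow = df.memory_usage(index=True, deep=False)
deep = df.memory_usage(index=True, deep=True)

# Numeric columns report the same size either way.
# Object columns report more with deep=True, which also counts the string payloads.
print(pd.DataFrame({"shallow": shallow, "deep": deep}))
```

The shallow figure for an object column only counts the pointers, so `deep=True` is usually the more useful number when text columns dominate.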

(2) parquet file data type

Parquet type	Transformation type	Range and description
Binary	Binary	1 to 104,857,600 bytes
Binary (UTF8)	String	1 to 104,857,600 characters
Boolean	Integer	-2,147,483,648 to 2,147,483,647, Precision of 10, scale of 0
Date	Date/Time	January 1, 0001 to December 31, 9999
Decimal	Decimal	Decimal value with declared precision and scale. Scale must be less than or equal to precision
Double	Double	Precision of 15 digits
Float	Double	Precision of 15 digits
Int32	Integer	-2,147,483,648 to 2,147,483,647, Precision of 10
Int64	Bigint	-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807, Precision of 19
Map	Map	Unlimited number of characters
Struct	Struct	Unlimited number of characters
Time	Date/Time	Time of the day. Precision to microsecond.
Timestamp	Date/Time	January 1, 0001 00:00:00 to December 31, 9999 23:59:59.997
group (LIST)	Array	Unlimited number of characters

https://pandas.pydata.org/docs/user_guide/basics.html#dtypes
https://parquet.apache.org/docs/file-format/types/
