SQL 데이터 분석 : 2.5.3 빅쿼리에서 날짜 다루기

yeppi1802·2024년 7월 18일

📖 빅쿼리에서 날짜 다루기

🔆 날짜 데이터 유형

날짜 데이터 유형예시
DATE연도-월-일자2017-11-25
DATETIME연도-월-일자 시간2017-11-25T11:10:33
TIMESTAMP연도-월-일자 시간(타임존 정보)2017-11-25 11:10:33 UTC
TIME시간11:10:33
  • UTC : 타임존 (한국 : UTC + 9)

🔆 DATE() / DATETIME() / TIMESTAMP()

SELECT
	DATE(2016, 12, 25) AS date_ymd,
	DATE(DATETIME '2016-12-25 23:59:59') AS date_dt,
	DATE(TIMESTAMP '2016-12-25 05:30:00+07', 'America/Los_Angeles') AS date_tstz;
																					 -- 시간대 설정 (한국 시간대 : 'Asia/Seoul')

SELECT
	DATETIME(2008, 12, 25, 05, 30, 00) AS datetime)ymdhms,
	DATETIME(TIMESTAMP '2008-12-25 05:30:00+00', 'America/Los_Angeles') AS datetime_tstz;

☁️ DATE() / DATETIME() / TIMESTAMP() 실습

SELECT
	order_purchase_timestamp,

	DATE(order_purchase_timestamp) AS ord_date,

	DATETIME(order_purchase_timestamp) AS ord_dt,

	TIMESTAMP(order_purchase_timestamp) AS ord_ts,
	TIMESTAMP(DATE(order_purchase_timestamp)) AS ord_date_ts, 
	          -- DATE로 먼저 감싸 시간 없음

	TIME(order_purchase_timestamp) AS ord_time
FROM `olist.olist_orders`

🔆 날짜에서 일부분만 추출하고 싶다면?

EXTRACT(‘추출할 부분’ FROM ‘날짜 컬럼’)
추출할 부분의미결과(정수)
YEAR연도2017
MONTH11
DAY25
DAYOFWEEK요일(0:일요일 ~ 6: 토요일)1
WEEK주(기본 일요일 시작)13
WEEK(MONDAY)주(월요일 시작)12
QUARTER분기2
HOUR1
MINUTE30
SECOND55

☁️ EXTRACT 실습

SELECT
  order_purchase_timestamp,
  EXTRACT(YEAR FROM order_purchase_timestamp) AS YEAR,
  EXTRACT(QUARTER FROM order_purchase_timestamp) AS QUARTER,
  EXTRACT(MONTH FROM order_purchase_timestamp) AS MONTH,
  EXTRACT(DAY FROM order_purchase_timestamp) AS DAY,
  EXTRACT(DAYOFWEEK FROM order_purchase_timestamp) AS DAYOFWEEK,
  EXTRACT(WEEK FROM order_purchase_timestamp) AS WEEK,
  EXTRACT(WEEK(MONDAY) FROM order_purchase_timestamp) AS WEEK_MONDAY,
  EXTRACT(HOUR FROM order_purchase_timestamp) AS HOUR,
  EXTRACT(MINUTE FROM order_purchase_timestamp) AS MINUTE,
  EXTRACT(SECOND FROM order_purchase_timestamp) AS SECOND
FROM `olist.olist_orders`
WHERE
  DATE(order_purchase_timestamp) = '2017-11-26' -- 일요일

🔆 날짜에서 원하는 정보까지만 남기고 싶다면?

DATE_TRUNC(‘날짜’, ‘남기는 부분’)
DATETIME_TRUNC(‘날짜’, ‘남기는 부분’)
TIMESTAMP_TRUNC(‘날짜’, ‘남기는 부분’)

추출할 부분의미결과(날짜)예시(2017-11-26 16:13:13 UTC)
YEAR연도해당 연도의 첫 날2017-01-01 00:00:00 UTC
MONTH해당 연도, 월의 첫 날2017-11-01 00:00:00 UTC
DAY해당 날짜의 0시 0분2017-11-26 00:00:00 UTC
WEEK주(기본 일요일 시작)해당 주차의 첫 날(일)2017-11-26 00:00:00 UTC
WEEK(MONDAY)주(월요일 시작)해당 주차의 첫 날(월)2017-11-20 00:00:00 UTC
QUARTER분기해당 분기의 첫 날2017-10-01 00:00:00 UTC
HOUR해당 날짜의 시2017-11-26 16:00:00 UTC
MINUTE해당 날짜의 시, 분2017-11-26 16:13:00 UTC
SECOND해당 날짜의 시, 분, 초2017-11-26 16:13:13 UTC

☁️ TIMESTAMP_TRUNC 실습

SELECT
  order_purchase_timestamp,
  TIMESTAMP_TRUNC(order_purchase_timestamp, YEAR) AS YEAR,
  TIMESTAMP_TRUNC(order_purchase_timestamp,QUARTER) AS QUARTER,
  TIMESTAMP_TRUNC(order_purchase_timestamp,MONTH) AS MONTH,
  TIMESTAMP_TRUNC(order_purchase_timestamp,DAY) AS DAY,
  TIMESTAMP_TRUNC(order_purchase_timestamp,WEEK) AS WEEK,
  TIMESTAMP_TRUNC(order_purchase_timestamp,WEEK(MONDAY)) AS WEEK_MONDAY,
  TIMESTAMP_TRUNC(order_purchase_timestamp,HOUR) AS HOUR,
  TIMESTAMP_TRUNC(order_purchase_timestamp,MINUTE) AS MINUTE,
  TIMESTAMP_TRUNC(order_purchase_timestamp,SECOND) AS SECOND
FROM `olist.olist_orders`
WHERE
  DATE(order_purchase_timestamp) = '2017-11-26' -- 일요일

☁️ EXTRACT와 TIMESTAMP_TRUNC 비교

SELECT
  order_purchase_timestamp,
  EXTRACT(YEAR FROM order_purchase_timestamp) AS YEAR1,
  TIMESTAMP_TRUNC(order_purchase_timestamp, YEAR) AS YEAR2,
  EXTRACT(MONTH FROM order_purchase_timestamp) AS MONTH1,
  TIMESTAMP_TRUNC(order_purchase_timestamp, MONTH) AS MONTH2
FROM `olist.olist_orders`
WHERE
  DATE(order_purchase_timestamp) = '2017-11-26' -- 일요일

🔆 한번에 주석 처리하는 방법

  • 원하는 블록 구간 drag → ctrl + /

☁️ 종합 쿼리 2번째에 날짜 정보 붙이기

  /* 두번째 */
  SELECT
    -- 날짜 유형 
    ord.order_approved_at,
    DATE(ord.order_approved_at) AS ord_date,
    DATETIME(ord.order_approved_at) AS ord_dt,
    TIMESTAMP(ord.order_approved_at) AS ord_ts,

    ord.order_id, 
    ord.order_status,
    ord.customer_id,
    cust.customer_unique_id, 
    tb.ord_amt, 
    tb.prd_cnt
  FROM `olist.olist_orders` AS ord
  LEFT JOIN `olist.olist_customers` AS cust -- 고객 unique ID
    ON ord.customer_id = cust.customer_id
  INNER JOIN tb                             -- 1번 결과 
    ON ord.order_id = tb.order_id

  /* 두번째 */
  SELECT
    -- 날짜 유형
    ord.order_approved_at,
    DATE(ord.order_approved_at) AS ord_date,
    DATETIME(ord.order_approved_at) AS ord_dt,
    TIMESTAMP(ord.order_approved_at) AS ord_ts,

    -- EXTRACT / TRUNC
    EXTRACT(YEAR FROM ord.order_approved_at) AS ord_year,
    EXTRACT(MONTH FROM ord.order_approved_at) AS ord_month,
    TIMESTAMP_TRUNC(ord.order_approved_at, YEAR) AS ord_year_t,
    TIMESTAMP_TRUNC(ord.order_approved_at, MONTH) AS ord_year_m,

    ord.order_id, 
    ord.order_status,
    ord.customer_id,
    cust.customer_unique_id, 
    tb.ord_amt, 
    tb.prd_cnt
  FROM `olist.olist_orders` AS ord
  LEFT JOIN `olist.olist_customers` AS cust -- 고객 unique ID
    ON ord.customer_id = cust.customer_id
  INNER JOIN tb                             -- 1번 결과 
    ON ord.order_id = tb.order_id

☁️ 종합 쿼리 3번째에 날짜 정보 붙이기

/* 세번째 */
SELECT
  ord_year, -- 필요한 날짜를 기준으로 보고싶을때 다양하게 변경하여 분석 가능 
  SUM(ord_amt) AS `총 매출`,
	COUNT(DISTINCT order_id) AS `총 주문수`,
	COUNT(prd_cnt) AS `총 판매상품수`,

	SUM(ord_amt) / COUNT(DISTINCT order_id) AS `주문당 평균 가격`,
	COUNT(prd_cnt) / COUNT(DISTINCT order_id) AS `평균판매상품수`,
	SUM(ord_amt) / COUNT(prd_cnt) AS `제품개당평균가격`,

  COUNT(DISTINCT customer_unique_id) AS `주문고객수`,
  COUNT(DISTINCT order_id) / COUNT(DISTINCT customer_unique_id) AS `주문빈도`
FROM base
GROUP BY ord_year
ORDER BY ord_year
  • Null 값이 있는 이유 : olist_orders 원본에서 일부 행의 날짜가 Null값이기 때문


📖 Appendix, 빅쿼리의 유용한 함수

🔆 column이 중요한 빅쿼리

  • 열기반으로 저장하는 빅쿼리

🔆 EXCEPT : 필요없는 컬럼은 빼고 불러오자

SELECT * EXCEPT('제외할 컬럼')

🔆 REPLACE : 불러올 때 간단하게 바꿔주자

SELECT * REPLACE('대체할 값' AS '컬럼명')

☁️ 실습

  • 원본
SELECT *
FROM `olist.olist_order_items`
LIMIT 100;

  • EXCEPT
SELECT * EXCEPT(shipping_limit_date, freight_value)
FROM `olist.olist_order_items`;

  • REPLACE
SELECT * REPLACE(price * 10000 AS price)
FROM `olist.olist_order_items`;

🔆 데이터 타입을 바꾸는 CAST 그리고 SAFE_CAST

CAST('값' AS '데이터타입')
SAFE_CAST('값' AS '데이터타입')
CAST('123' AS INT64)  -- INT64
SAFE_CAST('123' AS INT64) -- INT64

CAST('abc' AS INT64)  -- error!
SAFE_CAST('abc' AS INT64) -- null

🔆 오류를 발생시키지 않는 SAFE 연산

  • 에러 있을 경우 에러를 발생 시키지 않고 NULL로 반환함
SAFE_ADD(X, Y) = X + Y
SAFE_SUBTRACT(X, Y) = X - Y
SAFE_MULTIPLY(X, Y) = X * Y
SAFE_DIVIDE(X, Y) = X / Y -- 중요! 
  • 특히 잘 쓰이는 SAFE_DIVIDE!
    • IFNULL과 조합하면 0으로도 반환 가능

☁️ CAST & SAFE_CAST 실습

SELECT
  CAST('123' AS INT64) AS cast_result,
  SAFE_CAST('123' AS INT64) AS safe_result;

SELECT
  CAST('abc' AS INT64) AS cast_result;

SELECT
  SAFE_CAST('abc' AS INT64) AS safe_result;

☁️ SAFE_DIVIDE

SELECT
  SAFE_DIVIDE(5, 0)

SELECT
  SAFE_DIVIDE(5, 0),
  IFNULL(SAFE_DIVIDE(5, 0), 0)

0개의 댓글