Trino-Python Connector를 사용하여 Python으로 테이블 조회하기

NewNewDaddy·2023년 12월 19일
0

PYTHON

목록 보기
3/7
post-thumbnail

0. INTRO

1. 본문

  1. trino 라이브러리 설치

    > pip install trino
    
    --------------------------------------------
        Collecting trino
          Downloading trino-0.327.0-py3-none-any.whl (49 kB)
             |████████████████████████████████| 49 kB 1.3 MB/s eta 0:00:011
        Collecting requests>=2.31.0
          Downloading requests-2.31.0-py3-none-any.whl (62 kB)
             |████████████████████████████████| 62 kB 1.1 MB/s eta 0:00:01
        Requirement already satisfied: python-dateutil in /usr/local/lib/python3.8/dist-packages (from trino) (2.8.2)
        Collecting backports.zoneinfo; python_version < "3.9"
          Downloading backports.zoneinfo-0.2.1-cp38-cp38-manylinux1_x86_64.whl (74 kB)
             |████████████████████████████████| 74 kB 1.6 MB/s eta 0:00:011
        Collecting pytz
          Downloading pytz-2023.3.post1-py2.py3-none-any.whl (502 kB)
             |████████████████████████████████| 502 kB 8.0 MB/s eta 0:00:01
        Collecting tzlocal
          Downloading tzlocal-5.2-py3-none-any.whl (17 kB)
  2. Python Connection 코드 작성

    
    from trino.dbapi import connect
    import pandas as pd
    
    conn = connect(
        host = "TRINO_HOST_ADDRESS",
        port = PORT,
        user = "TRINO_USER",
        catalog = "TRINO_CATALOG",
        schema = "TRINO_SCHEMA",
    )
    
    cur = conn.cursor()
  3. 쿼리 실행

  • Show Running Nodes

    
    cur = conn.cursor()
    cur.execute("SELECT * FROM system.runtime.nodes")
    rows = cur.fetchall()
    
    pd.DataFrame(rows, columns=['node', 'ip', 'version', 'coordinator', 'active'])
    
    --------------------------------------------
        node	ip	version	coordinator	active
        0	trino-coordinator-77999554d7-wh7v8	http://198.18.0.131:8080	432	True	active
        1	trino-worker-5c86775b75-pfsjw	http://198.18.0.19:8080	432	False	active
  • Create Table Schema

    
    create_table_query = """
    CREATE TABLE lakehouse.default.allstarfull(
            playerID VARCHAR,
            yearID VARCHAR,
            gameNum INTEGER,
            gameID VARCHAR,
            teamID VARCHAR,
            lgID VARCHAR,
            GP VARCHAR,
            startingPos VARCHAR
            )
              WITH (
              external_location = 's3a://NCP_OS_BUCKET/path/of/data',
              format = 'PARQUET'
            );
    """
            
    cur = conn.cursor()
    cur.execute(create_table_query)
  • Select Data

    
    cur.execute("select * from lakehouse.default.allstarfull")
    
    rows = cur.fetchall()
    
    for row in rows:
        print(row)

2. 참고자료

profile
데이터 엔지니어의 작업공간 / #PYTHON #SPARK #AWS #NCLOUD

0개의 댓글