Provisioning, HMS / Docker

Jeonghak Cho · June 3, 2025


Provisioning 📗 HMS, Docker Image Creation

๐Ÿณ๏ธโ€๐ŸŒˆ [๊ถ๊ธˆํ•œ์ ]

  • How to connect HMS to MinIO
  • How to have HMS store its metadata in PostgreSQL and the table data in MinIO
  • How to package HMS as a Docker image

Prerequisites

  • MinIO must be running. In this post its endpoint is http://172.31.144.1:9000.
  • PostgreSQL must be running. Version 14.17 is used here.
  • Trino must be running so that the HMS can be tested once it is up.

🔗[Contents]

1️⃣ Preparation

Folder structure for building the HMS image

hive/
├── Dockerfile
├── docker-compose.yml
├── hms/
│   ├── hive-site.xml
│   └── core-site.xml
└── jars/
    ├── postgresql-42.2.5.jar
    ├── aws-java-sdk-bundle-1.11.1026.jar
    └── hadoop-aws-3.3.4.jar

๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ๋‹ค์šด๋กœ๋“œ

jars

Download the PostgreSQL JDBC driver for the metastore database and the libraries needed to connect to MinIO.

wget https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.5/postgresql-42.2.5.jar
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.1026/aws-java-sdk-bundle-1.11.1026.jar
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.4/hadoop-aws-3.3.4.jar
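
The three URLs above follow Maven Central's standard repository layout (group ID with dots turned into slashes, then artifact ID, version, and `<artifactId>-<version>.jar`). As a small sketch assuming that layout, the hypothetical helper `maven_jar_url` below rebuilds the same URLs from their coordinates, which makes it easier to change a version in one place. `wget -P jars/` simply saves each file into the `jars/` directory.

```python
# Sketch: derive Maven Central download URLs from Maven coordinates.
# Assumes the standard Maven 2 repository layout:
#   <repo>/<groupId with '.' -> '/'>/<artifactId>/<version>/<artifactId>-<version>.jar
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"

# (groupId, artifactId, version) of the three JARs used in this post
JARS = [
    ("org.postgresql", "postgresql", "42.2.5"),
    ("com.amazonaws", "aws-java-sdk-bundle", "1.11.1026"),
    ("org.apache.hadoop", "hadoop-aws", "3.3.4"),
]


def maven_jar_url(group_id, artifact_id, version, repo=MAVEN_CENTRAL):
    """Return the URL of the main JAR for the given Maven coordinates."""
    group_path = group_id.replace(".", "/")
    return f"{repo}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


if __name__ == "__main__":
    # Print one wget command per JAR, saving into the jars/ directory
    for coords in JARS:
        print(f"wget -P jars/ {maven_jar_url(*coords)}")
```

When changing versions, keep in mind that hadoop-aws is generally expected to match the Hadoop version bundled in the base image, and its POM lists the AWS SDK version it was built against.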


2️⃣ Configuration

hive-site.xml

hive-site.xml์€ Metastore, JDBC ์—ฐ๊ฒฐ, Warehouse ๋””๋ ‰ํ„ฐ๋ฆฌ ์„ค์ • ๋“ฑ Hive ํŠนํ™” ์„ค์ •์„ ํฌํ•จํ•œ๋‹ค. ํ™˜๊ฒฝ์— ๋งž์ถ”์–ด S3 ์—”๋“œ ํฌ์ธํŠธ ๋“ฑ ์กฐ์ •ํ•ด์•ผํ•  ์†์„ฑ๋“ค์ด ์žˆ๋‹ค.

hive-site.xml properties


์†์„ฑ ์ด๋ฆ„ (name)๊ฐ’ (value)์„ค๋ช…
hive.server2.enable.doAsfalseHiveServer2๊ฐ€ ์š”์ฒญ์„ ์‹ค์ œ ์‚ฌ์šฉ์ž ๊ถŒํ•œ์œผ๋กœ ์‹คํ–‰ํ• ์ง€ ์—ฌ๋ถ€. false์ด๋ฉด ์‹คํ–‰ ์‚ฌ์šฉ์ž ๊ถŒํ•œ ๊ทธ๋Œ€๋กœ ์ˆ˜ํ–‰.
hive.tez.exec.inplace.progressfalseTez ์‹คํ–‰ ์ค‘ ์ง„ํ–‰ ์ƒํ™ฉ ํ‘œ์‹œ ์„ค์ •.
hive.exec.scratchdir/opt/hive/scratch_dirHive ์ž‘์—… ์ค‘๊ฐ„ ๊ฒฐ๊ณผ ์ €์žฅ์šฉ ์ž„์‹œ ๋””๋ ‰ํ„ฐ๋ฆฌ.
hive.user.install.directory/opt/hive/install_dir์‚ฌ์šฉ์ž ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ, ํŒŒ์ผ ๋“ฑ์„ ์ €์žฅํ•˜๋Š” ๊ธฐ๋ณธ ๋””๋ ‰ํ„ฐ๋ฆฌ.
tez.runtime.optimize.local.fetchtrueํ…Œ์ด๋ธ” ์กฐ์ธ ๋“ฑ์—์„œ ๋กœ์ปฌ ํŽ˜์น˜ ์ตœ์ ํ™” ํ™œ์„ฑํ™”.
hive.exec.submit.local.task.via.childfalse๋กœ์ปฌ ์ž‘์—…์„ ๋ถ€๋ชจ JVM์—์„œ ์‹คํ–‰ํ• ์ง€ ์—ฌ๋ถ€.
mapreduce.framework.namelocal์‹คํ–‰ ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ๋กœ์ปฌ ๋ชจ๋“œ๋กœ ์„ค์ • (MapReduce๋‚˜ YARN์ด ์•„๋‹Œ ๋‹จ์ผ JVM์—์„œ ์‹คํ–‰).
tez.local.modetrueTez๋ฅผ ๋กœ์ปฌ ๋ชจ๋“œ์—์„œ ์‹คํ–‰ํ•˜๋„๋ก ์„ค์ •.
hive.execution.enginetezHive ์‹คํ–‰ ์—”์ง„์œผ๋กœ tez ์‚ฌ์šฉ (๊ธฐ๋ณธ์€ MapReduce).
hive.metastore.warehouse.dirs3a://mybucket/warehouseHive ํ…Œ์ด๋ธ”์˜ ์‹ค์ œ ๋ฐ์ดํ„ฐ ์ €์žฅ ์œ„์น˜ (S3A๋ฅผ ํ†ตํ•ด MinIO ์—ฐ๋™).
metastore.metastore.event.db.notification.api.authfalseHMS ์ด๋ฒคํŠธ ์•Œ๋ฆผ API ์ธ์ฆ ํ•„์š” ์—ฌ๋ถ€.
hive.metastore.uristhrift://hms:9083Hive Metastore์˜ Thrift ์ ‘๊ทผ ์ฃผ์†Œ.
javax.jdo.option.ConnectionURLjdbc:postgresql://psql:5432/hive_metastoreHive Metastore๊ฐ€ ์‚ฌ์šฉํ•  PostgreSQL ์—ฐ๊ฒฐ URL.
javax.jdo.option.ConnectionDriverNameorg.postgresql.DriverPostgreSQL JDBC ๋“œ๋ผ์ด๋ฒ„ ํด๋ž˜์Šค๋ช….
javax.jdo.option.ConnectionUserNamehiveMetastore DB ์ ‘์† ์‹œ ์‚ฌ์šฉํ•  ์‚ฌ์šฉ์ž ์ด๋ฆ„.
javax.jdo.option.ConnectionPasswordhiveMetastore DB ์ ‘์† ์‹œ ์‚ฌ์šฉํ•  ๋น„๋ฐ€๋ฒˆํ˜ธ.
fs.s3.endpointhttp://172.31.144.1:9000S3 ํ˜ธํ™˜ ์Šคํ† ๋ฆฌ์ง€(Minio)์˜ ์—”๋“œํฌ์ธํŠธ ์ฃผ์†Œ.
fs.s3.access.keyminioadminS3 ์ ‘์†์šฉ Access Key.
fs.s3.secret.keyminioadminS3 ์ ‘์†์šฉ Secret Key.
fs.s3.path.style.accesstrue๋ฒ„ํ‚ท ๊ฒฝ๋กœ ์Šคํƒ€์ผ์„ path-style(๋ฒ„ํ‚ท์ด URL ๊ฒฝ๋กœ์— ํฌํ•จ)๋กœ ์„ค์ •. MinIO์—์„œ ํ•„์ˆ˜.
fs.s3a.implorg.apache.hadoop.fs.s3a.S3AFileSystemHadoop์—์„œ S3A๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ํŒŒ์ผ์‹œ์Šคํ…œ ๊ตฌํ˜„ ํด๋ž˜์Šค.
fs.s3a.aws.credentials.providerorg.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider์ •์  Access Key/Secret Key๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ์ธ์ฆ ์ œ๊ณต์ž.
fs.s3a.endpoint.regionmy_regionS3A์—์„œ ์‚ฌ์šฉํ•  ๋ฆฌ์ „ ์ด๋ฆ„ (MinIO์—์„  ์ž„์˜๊ฐ’ ๊ฐ€๋Šฅ).
<configuration>
    <property>
        <name>hive.server2.enable.doAs</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.tez.exec.inplace.progress</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.exec.scratchdir</name>
        <value>/opt/hive/scratch_dir</value>
    </property>
    <property>
        <name>hive.user.install.directory</name>
        <value>/opt/hive/install_dir</value>
    </property>
    <property>
        <name>tez.runtime.optimize.local.fetch</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.exec.submit.local.task.via.child</name>
        <value>false</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>local</value>
    </property>
    <property>
        <name>tez.local.mode</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.execution.engine</name>
        <value>tez</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>s3a://mybucket/warehouse</value>
    </property>
    <property>
        <name>metastore.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://hms:9083</value>
      <description>URI for the Hive Metastore server</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:postgresql://psql:5432/hive_metastore</value>
      <description>JDBC connection URL for the Hive Metastore</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>org.postgresql.Driver</value>
      <description>JDBC Driver for PostgreSQL</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hive</value>
      <description>JDBC username for PostgreSQL</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>hive</value>
      <description>JDBC password for PostgreSQL</description>
    </property>
    <property>
      <name>fs.s3.endpoint</name>
      <value>http://172.31.144.1:9000</value>
    </property>

    <property>
      <name>fs.s3.access.key</name>
      <value>minioadmin</value>
    </property>

    <property>
      <name>fs.s3.secret.key</name>
      <value>minioadmin</value>
    </property>

    <property>
      <name>fs.s3.path.style.access</name>
      <value>true</value>
    </property>
    <property>
      <name>fs.s3a.impl</name>
      <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    </property>

    <property>
      <name>fs.s3a.aws.credentials.provider</name>
      <value>org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</value>
    </property>

    <property>
      <name>fs.s3a.endpoint.region</name>
      <value>my_region</value>
    </property>
</configuration>
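
Because a typo in a property name is silently ignored, it can help to check the file before building the image. The sketch below parses a Hadoop-style `<configuration>` file with Python's standard `xml.etree.ElementTree` and reports missing metastore keys. `REQUIRED_KEYS`, `load_hadoop_conf`, and `missing_keys` are hypothetical names, and the key list is an assumption based on the properties described above, not something Hive defines.

```python
import xml.etree.ElementTree as ET

# Assumed list of keys this post's metastore setup depends on
REQUIRED_KEYS = (
    "hive.metastore.uris",
    "hive.metastore.warehouse.dir",
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionUserName",
    "javax.jdo.option.ConnectionPassword",
)


def load_hadoop_conf(xml_data):
    """Parse <configuration><property><name/><value/></property>... into a dict.

    xml_data may be str or bytes. If a name appears twice, this sketch keeps
    the last value.
    """
    root = ET.fromstring(xml_data)
    conf = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            conf[name] = (prop.findtext("value") or "").strip()
    return conf


def missing_keys(conf, required=REQUIRED_KEYS):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not conf.get(key)]


if __name__ == "__main__":
    # Run from the hive/ directory; read as bytes so an XML encoding
    # declaration is handled by the parser
    with open("hms/hive-site.xml", "rb") as f:
        conf = load_hadoop_conf(f.read())
    print("missing:", missing_keys(conf) or "none")
    print("warehouse:", conf.get("hive.metastore.warehouse.dir"))
```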

core-site.xml

core-site.xml์€ Hadoop ๊ณตํ†ต ์„ค์ • ํŒŒ์ผ์ด๋‹ค. ํŠนํžˆ S3A(Minio ํฌํ•จ) ๋“ฑ ์™ธ๋ถ€ ์Šคํ† ๋ฆฌ์ง€ ์—ฐ๋™์— ํ•„์š”ํ•œ ์„ค์ •๋“ค์„ ๋‹ด๋Š”๋‹ค. MinIO ์—”๋“œ ํฌ์ธํŠธ์™€ ์ ‘์† ๊ณ„์ • ์ •๋ณด๋Š” ํ™˜๊ฒฝ์— ๋งž๊ฒŒ ์กฐ์ •ํ•œ๋‹ค.

<configuration>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>http://172.31.144.1:9000</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>minioadmin</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>minioadmin</value>
  </property>
  <property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
</configuration>
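
`fs.s3a.path.style.access` decides how the bucket name appears in each request URL. The following is a simplified illustration, not the S3A connector's code, of the two addressing styles; `object_url` and the object key are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def object_url(endpoint, bucket, key, path_style):
    """Build the URL an S3 client would address for bucket/key (illustration only)."""
    parts = urlsplit(endpoint)
    key = key.lstrip("/")
    if path_style:
        # Path style (fs.s3a.path.style.access=true): bucket goes into the URL path
        return urlunsplit((parts.scheme, parts.netloc, f"/{bucket}/{key}", "", ""))
    # Virtual-hosted style: bucket becomes part of the host name
    return urlunsplit((parts.scheme, f"{bucket}.{parts.netloc}", f"/{key}", "", ""))


if __name__ == "__main__":
    endpoint = "http://172.31.144.1:9000"
    for style in (True, False):
        print(object_url(endpoint, "mybucket", "warehouse/customers/data.parquet", style))
```

With an IP-address endpoint such as `172.31.144.1`, the virtual-hosted form produces a host name like `mybucket.172.31.144.1`, which does not resolve, so path-style access is the practical choice for this MinIO setup.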

3️⃣ Installation

Building the HMS image: defining the Dockerfile

# hive/Dockerfile
FROM apache/hive:4.0.1

ENV SERVICE_NAME=metastore \
    DB_DRIVER=postgres

# Copy the configuration files
COPY hms/hive-site.xml /opt/hive/conf/hive-site.xml
COPY hms/core-site.xml /opt/hadoop/etc/hadoop/core-site.xml

# Copy the required JARs
COPY jars/postgresql-42.2.5.jar /opt/hive/lib/
COPY jars/aws-java-sdk-bundle-1.11.1026.jar /opt/hive/lib/
COPY jars/hadoop-aws-3.3.4.jar /opt/hive/lib/
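
If any file named in a `COPY` line is missing from the build context, `docker build` fails partway through. A small pre-flight check can catch this earlier. The sketch below assumes one instruction per line, no JSON-array syntax, no wildcards, and no line continuations inside `COPY` (all true for the Dockerfile above); `copy_sources` and `missing_sources` are hypothetical helpers.

```python
import shlex
from pathlib import Path


def copy_sources(dockerfile_text):
    """Return the source paths of every COPY instruction, in order."""
    sources = []
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line.upper().startswith("COPY "):
            continue  # skips FROM, ENV, comments and other instructions
        # Drop flags such as --chown=...; the last argument is the destination
        args = [a for a in shlex.split(line)[1:] if not a.startswith("--")]
        sources.extend(args[:-1])
    return sources


def missing_sources(context_dir=".", dockerfile="Dockerfile"):
    """Return COPY sources that do not exist under context_dir."""
    ctx = Path(context_dir)
    text = (ctx / dockerfile).read_text(encoding="utf-8")
    return [src for src in copy_sources(text) if not (ctx / src).exists()]


if __name__ == "__main__":
    # Assumes this is run from the directory that contains hive/
    print("missing:", missing_sources("hive") or "none")
```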

Building the custom HMS image

  • Build the image
    You can build the image directly with the command below, or use the docker-compose setup that follows to build and run it in one step.
docker build -t hms:1 .
  • Check the image
docker images
REPOSITORY                        TAG                            IMAGE ID       CREATED         SIZE
hms                               1                              c9ef4120789b   9 seconds ago   1.83GB

Running HMS

Run docker-compose

docker-compose up -d --build

Defining docker-compose.yml

# hive/docker-compose.yml
version: '3'
services:
  hms:
    build: .
    container_name: hms
    ports:
      - "9083:9083"
    environment:
      SERVICE_NAME: metastore
      DB_DRIVER: postgres
    networks:
      - mynetwork
networks:
  mynetwork:
    external: true

Checking the running HMS container

docker ps
CONTAINER ID   IMAGE                                 COMMAND                  CREATED             STATUS                       PORTS                                                             NAMES
8449b61fa160   hive_hms                              "sh -c /entrypoint.sh"   4 minutes ago       Up 4 minutes                 10000/tcp, 0.0.0.0:9083->9083/tcp, :::9083->9083/tcp, 10002/tcp   hms
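
A container showing `Up` does not mean the metastore is already accepting connections, since it may still be initializing its schema in PostgreSQL. As a minimal sketch (the `wait_for_port` helper is hypothetical), you can poll the published Thrift port before pointing Trino at it. An open port only shows that something is listening on 9083, not that the schema setup succeeded, so the container logs remain the authoritative check.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    ok = wait_for_port("localhost", 9083)
    print("HMS port 9083 is", "open" if ok else "not reachable")
```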

4️⃣ Verification

Connecting to Trino

trino --server http://localhost:8080

Checking the Trino catalogs

trino> SHOW CATALOGS;
  Catalog
------------
 iceberg
 mariadb
 postgresql
 system
(4 rows)

Query 20250601_104633_00000_pe3h3, FINISHED, 1 node
Splits: 11 total, 11 done (100.00%)
0.94 [0 rows, 0B] [0 rows/s, 0B/s]
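
The same check can be scripted without the CLI by using Trino's client REST protocol: POST the SQL text to `/v1/statement` with an `X-Trino-User` header, then keep following `nextUri` until the response no longer has one, collecting each page's `data` rows. The sketch below uses only the standard library and assumes Trino at `http://localhost:8080` without TLS or authentication; the user name `admin` and the helper names are hypothetical.

```python
import json
import urllib.request


def fetch_page(url, user, sql=None):
    """POST the SQL text (first call) or GET a nextUri; return the JSON body."""
    data = sql.encode("utf-8") if sql is not None else None
    req = urllib.request.Request(url, data=data, headers={"X-Trino-User": user})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def collect_rows(pages):
    """Concatenate the 'data' arrays of all pages, raising on a query error."""
    rows = []
    for page in pages:
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        rows.extend(page.get("data", []))
    return rows


def run_query(server, sql, user="admin"):
    """Submit sql and follow nextUri links until the server stops sending them."""
    page = fetch_page(f"{server}/v1/statement", user, sql)
    pages = [page]
    while "nextUri" in page:
        page = fetch_page(page["nextUri"], user)
        pages.append(page)
    return collect_rows(pages)


if __name__ == "__main__":
    for (catalog,) in run_query("http://localhost:8080", "SHOW CATALOGS"):
        print(catalog)
```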

Creating and checking a table with Trino

trino> CREATE TABLE iceberg.default.customers (
    ->     id INTEGER,
    ->     name VARCHAR,
    ->     email VARCHAR,
    ->     created_at TIMESTAMP
    -> )
    -> WITH (
    ->     format = 'parquet',
    ->     partitioning = ARRAY['created_at']
    -> );
CREATE TABLE

trino> SHOW TABLES FROM iceberg.default;
   Table
-----------
 customers
(1 row)

Query 20250603_025310_00019_5r6vr, FINISHED, 1 node
Splits: 11 total, 11 done (100.00%)
0.12 [1 rows, 26B] [8 rows/s, 224B/s]
trino> DESCRIBE iceberg.default.customers;
   Column   |     Type     | Extra | Comment
------------+--------------+-------+---------
 id         | integer      |       |
 name       | varchar      |       |
 email      | varchar      |       |
 created_at | timestamp(6) |       |
(4 rows)
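
The `customers` table uses `partitioning = ARRAY['created_at']`, which applies Iceberg's identity transform to a `timestamp(6)` column, so every distinct timestamp value becomes its own partition. Trino's Iceberg connector also accepts transforms such as `day(created_at)`, which may be closer to what is wanted if daily partitions are the goal. The sketch below is a simplified model of the two transforms, not Iceberg's implementation (the function names and sample rows are hypothetical), and counts how many partitions a few rows would produce.

```python
from datetime import datetime

# Simplified model of two Iceberg partition transforms
EPOCH = datetime(1970, 1, 1)


def identity_partition(ts):
    """Identity transform: the partition value is the timestamp itself."""
    return ts


def day_partition(ts):
    """Day transform: whole days since 1970-01-01 (floored)."""
    return (ts - EPOCH).days


if __name__ == "__main__":
    created = [
        datetime(2025, 6, 3, 9, 0, 0),
        datetime(2025, 6, 3, 9, 0, 1),
        datetime(2025, 6, 3, 17, 30, 0),
        datetime(2025, 6, 4, 8, 0, 0),
    ]
    print("identity partitions:", len({identity_partition(t) for t in created}))
    print("day partitions:", len({day_partition(t) for t in created}))
```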

Checking the MinIO bucket
