
Provisioning HMS (Hive Metastore): Building a Docker Image
[Questions]
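Prerequisites
- MinIO must be running. In this setup its endpoint is http://172.31.144.1:9000, and the bucket used for the warehouse (mybucket) should already exist.
- PostgreSQL must be running. Version 14.17 is used here, with a hive_metastore database that the hive user can access.
- Trino must be running so the HMS can be tested once it starts.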
[Project layout]
hive/
├── Dockerfile
├── docker-compose.yml
├── hms/
│   ├── hive-site.xml
│   └── core-site.xml
└── jars/
    ├── postgresql-42.2.5.jar
    ├── aws-java-sdk-bundle-1.11.1026.jar
    └── hadoop-aws-3.3.4.jar
jars
Download the PostgreSQL JDBC driver for the metastore database and the libraries needed to connect to MinIO. Run the commands inside the jars/ directory.
wget https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.5/postgresql-42.2.5.jar
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.1026/aws-java-sdk-bundle-1.11.1026.jar
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.4/hadoop-aws-3.3.4.jar
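All three URLs follow Maven Central's standard layout: the group ID with dots turned into slashes, then the artifact ID, the version, and `<artifactId>-<version>.jar`. The Python sketch below rebuilds them from Maven coordinates, which makes swapping versions less error-prone. `maven_jar_url` is a hypothetical helper, not part of Maven or wget.

```python
# Sketch: rebuild the Maven Central URLs used by the wget commands above.
# Assumes the standard Maven repository layout:
#   <repo>/<groupId, dots -> slashes>/<artifactId>/<version>/<artifactId>-<version>.jar
# maven_jar_url is a hypothetical helper, not part of Maven or wget.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"

# (groupId, artifactId, version) for the three JARs placed in jars/
JARS = [
    ("org.postgresql", "postgresql", "42.2.5"),
    ("com.amazonaws", "aws-java-sdk-bundle", "1.11.1026"),
    ("org.apache.hadoop", "hadoop-aws", "3.3.4"),
]


def maven_jar_url(group_id: str, artifact_id: str, version: str,
                  repo: str = MAVEN_CENTRAL) -> str:
    """Return the download URL of the main JAR for the given coordinates."""
    group_path = group_id.replace(".", "/")
    return f"{repo}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


if __name__ == "__main__":
    # Print one URL per line, ready to pass to wget
    for coords in JARS:
        print(maven_jar_url(*coords))
```

When changing versions, it is generally safest to keep hadoop-aws at the Hadoop version bundled in the base image and to pair it with the aws-java-sdk-bundle version that hadoop-aws release was built against.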
hive-site.xml holds Hive-specific settings such as the metastore, the JDBC connection, and the warehouse directory. Some properties, such as the S3 endpoint, must be adjusted to your environment.
hive-site.xml properties
Property | Value | Description
--- | --- | ---
hive.server2.enable.doAs | false | Whether HiveServer2 executes requests with the privileges of the calling user. If false, requests run with the privileges of the user running the service.
hive.tez.exec.inplace.progress | false | Whether to show Tez job progress in place (as an updating console display) during execution.
hive.exec.scratchdir | /opt/hive/scratch_dir | Scratch directory for intermediate results of Hive jobs.
hive.user.install.directory | /opt/hive/install_dir | Default directory for user libraries and files.
tez.runtime.optimize.local.fetch | true | Enables the local fetch optimization (e.g. for table joins).
hive.exec.submit.local.task.via.child | false | Whether local tasks are launched in a separate child JVM; false runs them in the parent JVM.
mapreduce.framework.name | local | Runs MapReduce in local mode (a single JVM instead of YARN).
tez.local.mode | true | Runs Tez in local mode.
hive.execution.engine | tez | Uses Tez as the Hive execution engine instead of MapReduce.
hive.metastore.warehouse.dir | s3a://mybucket/warehouse | Where Hive table data is actually stored (MinIO, accessed through S3A).
metastore.metastore.event.db.notification.api.auth | false | Whether the HMS event notification API requires authorization.
hive.metastore.uris | thrift://hms:9083 | Thrift address of the Hive Metastore.
javax.jdo.option.ConnectionURL | jdbc:postgresql://psql:5432/hive_metastore | PostgreSQL JDBC URL used by the metastore.
javax.jdo.option.ConnectionDriverName | org.postgresql.Driver | PostgreSQL JDBC driver class.
javax.jdo.option.ConnectionUserName | hive | User name for the metastore database.
javax.jdo.option.ConnectionPassword | hive | Password for the metastore database.
fs.s3.endpoint | http://172.31.144.1:9000 | Endpoint of the S3-compatible storage (MinIO).
fs.s3.access.key | minioadmin | S3 access key.
fs.s3.secret.key | minioadmin | S3 secret key.
fs.s3.path.style.access | true | Use path-style addressing (bucket name in the URL path), needed for MinIO in this setup.
fs.s3a.impl | org.apache.hadoop.fs.s3a.S3AFileSystem | Hadoop FileSystem implementation class for the s3a:// scheme.
fs.s3a.aws.credentials.provider | org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider | Credential provider that uses a static access key and secret key.
fs.s3a.endpoint.region | my_region | Region name used by S3A (MinIO accepts an arbitrary value).
Note that the S3A connector (s3a:// URIs) reads the fs.s3a.* keys, so the fs.s3.* entries above are most likely ignored; the MinIO endpoint, credentials, and path-style setting that S3A actually uses are the fs.s3a.* keys in core-site.xml.
<configuration>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
<property>
<name>hive.tez.exec.inplace.progress</name>
<value>false</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/opt/hive/scratch_dir</value>
</property>
<property>
<name>hive.user.install.directory</name>
<value>/opt/hive/install_dir</value>
</property>
<property>
<name>tez.runtime.optimize.local.fetch</name>
<value>true</value>
</property>
<property>
<name>hive.exec.submit.local.task.via.child</name>
<value>false</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>local</value>
</property>
<property>
<name>tez.local.mode</name>
<value>true</value>
</property>
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>s3a://mybucket/warehouse</value>
</property>
<property>
<name>metastore.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hms:9083</value>
<description>URI for the Hive Metastore server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://psql:5432/hive_metastore</value>
<description>JDBC connection URL for the Hive Metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.postgresql.Driver</value>
<description>JDBC Driver for PostgreSQL</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>JDBC username for PostgreSQL</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>JDBC password for PostgreSQL</description>
</property>
<property>
<name>fs.s3.endpoint</name>
<value>http://172.31.144.1:9000</value>
</property>
<property>
<name>fs.s3.access.key</name>
<value>minioadmin</value>
</property>
<property>
<name>fs.s3.secret.key</name>
<value>minioadmin</value>
</property>
<property>
<name>fs.s3.path.style.access</name>
<value>true</value>
</property>
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
<name>fs.s3a.aws.credentials.provider</name>
<value>org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</value>
</property>
<property>
<name>fs.s3a.endpoint.region</name>
<value>my_region</value>
</property>
</configuration>
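Since the JDBC host, the warehouse path, and the Thrift URI all live in this file, it can help to read it back programmatically before building the image. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the usual Hadoop `<configuration><property><name/><value/></property></configuration>` layout; `load_hadoop_conf` is a hypothetical helper name.

```python
# Sketch: load a Hadoop-style XML config (hive-site.xml, core-site.xml)
# into a {name: value} dict. load_hadoop_conf is a hypothetical helper.
from __future__ import annotations

import xml.etree.ElementTree as ET


def load_hadoop_conf(xml_text: str) -> dict[str, str]:
    """Parse <configuration><property><name/><value/></property>... into a dict."""
    root = ET.fromstring(xml_text)
    props: dict[str, str] = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            # If a name appears twice, the later entry wins
            props[name] = value
    return props
```

For example, `load_hadoop_conf(open("hms/hive-site.xml", encoding="utf-8").read())["hive.metastore.uris"]` should return `thrift://hms:9083` for the file above; the same function works for core-site.xml.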
core-site.xml is Hadoop's common configuration file. Here it holds the settings needed to reach external storage through S3A (including MinIO). Adjust the MinIO endpoint and the access credentials to your environment.
<configuration>
<property>
<name>fs.s3a.endpoint</name>
<value>http://172.31.144.1:9000</value>
</property>
<property>
<name>fs.s3a.access.key</name>
<value>minioadmin</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>minioadmin</value>
</property>
<property>
<name>fs.s3a.path.style.access</name>
<value>true</value>
</property>
</configuration>
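fs.s3a.path.style.access decides where the bucket name goes in the request URL. The sketch below only illustrates the two addressing styles; it is not S3A's code, and `s3_object_url` is a hypothetical name. With an IP endpoint such as http://172.31.144.1:9000, the virtual-hosted form produces a host name like mybucket.172.31.144.1, which does not resolve, so path-style access is enabled here.

```python
# Sketch: where the bucket name goes under the two S3 addressing styles.
# Illustrates the idea behind fs.s3a.path.style.access; not S3A's actual code.
# s3_object_url is a hypothetical helper.
from urllib.parse import urlsplit, urlunsplit


def s3_object_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    parts = urlsplit(endpoint)
    key = key.lstrip("/")
    if path_style:
        # Path-style: http://host:port/<bucket>/<key>
        return urlunsplit((parts.scheme, parts.netloc, f"/{bucket}/{key}", "", ""))
    # Virtual-hosted style: http://<bucket>.host:port/<key>
    return urlunsplit((parts.scheme, f"{bucket}.{parts.netloc}", f"/{key}", "", ""))
```

For the warehouse prefix used in this post, `s3_object_url("http://172.31.144.1:9000", "mybucket", "warehouse", True)` returns `http://172.31.144.1:9000/mybucket/warehouse`.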
# hive/Dockerfile
FROM apache/hive:4.0.1
ENV SERVICE_NAME=metastore \
DB_DRIVER=postgres
# Copy configuration files
COPY hms/hive-site.xml /opt/hive/conf/hive-site.xml
COPY hms/core-site.xml /opt/hadoop/etc/hadoop/core-site.xml
# Copy the required JARs
COPY jars/postgresql-42.2.5.jar /opt/hive/lib/
COPY jars/aws-java-sdk-bundle-1.11.1026.jar /opt/hive/lib/
COPY jars/hadoop-aws-3.3.4.jar /opt/hive/lib/
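A missing COPY source only shows up as a failure during `docker build`. As a quick pre-check, the sketch below reads the Dockerfile, collects the source paths of each `COPY` instruction, and reports any that are missing from the build context. It is a deliberately simplified parser with hypothetical helper names: one instruction per line, shell-form `COPY` only, no line continuations, JSON-array syntax, or globs.

```python
# Sketch: check that every COPY source in a Dockerfile exists in the build context.
# Simplified parser: one instruction per line, shell-form COPY only
# (no "\" continuations, JSON-array form, or globs). Helper names are hypothetical.
from __future__ import annotations

from pathlib import Path


def copy_sources(dockerfile_text: str) -> list[str]:
    """Return the source paths of all COPY instructions, in order."""
    sources: list[str] = []
    for line in dockerfile_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0].upper() == "COPY":
            # Drop flags such as --chown=...; the last argument is the destination
            args = [t for t in tokens[1:] if not t.startswith("--")]
            sources.extend(args[:-1])
    return sources


def missing_build_inputs(context_dir: str, dockerfile: str = "Dockerfile") -> list[str]:
    """Return COPY sources that are missing from context_dir."""
    base = Path(context_dir)
    text = (base / dockerfile).read_text(encoding="utf-8")
    return [src for src in copy_sources(text) if not (base / src).exists()]
```

Running `missing_build_inputs(".")` from inside hive/ should return an empty list once the configuration files and the three JARs are in place.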
docker build -t hms:1 .
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hms 1 c9ef4120789b 9 seconds ago 1.83GB
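# hive/docker-compose.yml
version: '3'
services:
  hms:
    build: .
    container_name: hms
    ports:
      - "9083:9083"
    environment:
      SERVICE_NAME: metastore
      DB_DRIVER: postgres
    networks:
      - mynetwork
networks:
  mynetwork:
    external: true
Because mynetwork is declared external, Compose does not create it: it must already exist (for example, created with docker network create mynetwork), and the PostgreSQL container that hive-site.xml reaches as psql must be attached to it.
docker-compose up -d --build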
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8449b61fa160 hive_hms "sh -c /entrypoint.sh" 4 minutes ago Up 4 minutes 10000/tcp, 0.0.0.0:9083->9083/tcp, :::9083->9083/tcp, 10002/tcp hms
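`docker ps` shows that the container is up, but the metastore service may still be starting (for example, while its schema is being initialized in PostgreSQL). A simple way to confirm that the Thrift listener is accepting connections is a TCP connect to port 9083, sketched below with Python's `socket` module. This only proves the port is open; it does not make a Thrift call. `port_is_open` is a hypothetical helper.

```python
# Sketch: check that the metastore's Thrift port accepts TCP connections.
# This only proves the port is open; it does not make a Thrift call.
# port_is_open is a hypothetical helper.
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    try:
        # create_connection raises OSError (refused, timeout, DNS failure, ...)
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

From the host, `port_is_open("localhost", 9083)` goes through the published port; from another container on mynetwork, use the host name `hms` instead.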
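Connect with the Trino CLI to test the metastore. The iceberg catalog below is assumed to be configured in Trino with iceberg.catalog.type=hive_metastore and hive.metastore.uri=thrift://hms:9083.
trino --server http://localhost:8080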
trino> SHOW CATALOGS;
Catalog
------------
iceberg
mariadb
postgresql
system
(4 rows)
Query 20250601_104633_00000_pe3h3, FINISHED, 1 node
Splits: 11 total, 11 done (100.00%)
0.94 [0 rows, 0B] [0 rows/s, 0B/s]
trino> CREATE TABLE iceberg.default.customers (
-> id INTEGER,
-> name VARCHAR,
-> email VARCHAR,
-> created_at TIMESTAMP
-> )
-> WITH (
-> format = 'parquet',
-> partitioning = ARRAY['created_at']
-> );
CREATE TABLE
trino> SHOW TABLES FROM iceberg.default;
Table
-----------
customers
(1 row)
Query 20250603_025310_00019_5r6vr, FINISHED, 1 node
Splits: 11 total, 11 done (100.00%)
0.12 [1 rows, 26B] [8 rows/s, 224B/s]
trino> DESCRIBE iceberg.default.customers;
Column | Type | Extra | Comment
------------+--------------+-------+---------
id | integer | |
name | varchar | |
email | varchar | |
created_at | timestamp(6) | |
(4 rows)
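`partitioning = ARRAY['created_at']` is identity partitioning on a `timestamp(6)` column, so every distinct timestamp value becomes its own partition. Iceberg also supports partition transforms, and Trino accepts them in the same table property, e.g. `partitioning = ARRAY['day(created_at)']`, which groups rows by calendar day. The sketch below shows what the `day` transform computes for timestamps without time zone (whole days since 1970-01-01); `iceberg_day` is a hypothetical name, not an Iceberg or Trino API.

```python
# Sketch of what Iceberg's `day` partition transform computes for a
# timestamp without time zone: whole days since 1970-01-01.
# iceberg_day is a hypothetical helper, not an Iceberg or Trino API.
from datetime import datetime

EPOCH = datetime(1970, 1, 1)


def iceberg_day(ts: datetime) -> int:
    """Days from 1970-01-01 to ts (timedelta.days rounds toward negative infinity)."""
    return (ts - EPOCH).days
```

Two rows such as `2025-06-03 02:53` and `2025-06-03 18:00` map to the same `iceberg_day` value and therefore share one day partition, whereas identity partitioning on `created_at` would put them in two separate partitions.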
