[Data Platform Operations / Development] - Applying HBase and Phoenix

Chan hae OH · January 30, 2024

HBase phoenix setting data-platform-operations

1. Introduction

Hello.

I am writing an ongoing series about the things I learn and the questions that come up while working in data engineering and operations, so that new knowledge, and the things I had misunderstood, stick in my memory.

The HBase posts are based on the official documentation and on what I found by searching online.

If you notice anything I have gotten wrong while reading, please let me know.

It is a great help to my learning. :)

2. What Is Apache Phoenix

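Apache Phoenix can be seen as an SQL layer on top of HBase (a NoSQL store). It lets people who are used to SQL work with HBase through standard SQL and JDBC, and it adds ACID transaction support (transactional tables additionally require a transaction manager such as Apache Omid).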

"""
Apache Phoenix enables OLTP and operational analytics in Hadoop for low latency applications by combining the best of both worlds:

the power of standard SQL and JDBC APIs with full ACID transaction capabilities and
the flexibility of late-bound, schema-on-read capabilities from the NoSQL world by leveraging HBase as its backing store
Apache Phoenix is fully integrated with other Hadoop products such as Spark, Hive, Pig, Flume, and Map Reduce.
"""

Reference: https://phoenix.apache.org/

In particular, Trino does not provide an HBase connector; it reaches HBase through its Phoenix connector. Phoenix is therefore essential if you want to operate HBase together with Trino.
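For reference, a Trino catalog that reads HBase through Phoenix is just a small properties file. The sketch below renders one in Python. connector.name=phoenix5, phoenix.connection-url and phoenix.config.resources are Trino Phoenix connector properties; the ZooKeeper hosts and the hbase-site.xml path are hypothetical placeholders, and it is worth checking the connector documentation for your Trino version before relying on it.

```python
def phoenix_catalog_properties(zk_quorum: str, hbase_site: str, znode: str = "/hbase") -> str:
    """Render a Trino catalog file (etc/catalog/<name>.properties) for Phoenix."""
    lines = [
        "connector.name=phoenix5",
        # Phoenix JDBC URL: jdbc:phoenix:<zk hosts>:<port>:<HBase root znode>
        f"phoenix.connection-url=jdbc:phoenix:{zk_quorum}:{znode}",
        # Reuse the cluster's hbase-site.xml so client-side settings
        # (e.g. namespace mapping) match the servers.
        f"phoenix.config.resources={hbase_site}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical hosts and path, for illustration only.
print(phoenix_catalog_properties("zk1,zk2,zk3:2181", "/opt/hbase/conf/hbase-site.xml"), end="")
```

Saved as etc/catalog/phoenix.properties on the Trino coordinator and workers, the Phoenix tables should show up under the phoenix catalog after Trino is restarted.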

3. Installing Phoenix

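3.1 Download Phoenix

Download the binary release built for your cluster's HBase version. The "hbase-2.3" in the file name below means this build targets HBase 2.3.

https://dlcdn.apache.org/phoenix/phoenix-5.1.3/phoenix-hbase-2.3-5.1.3-bin.tar.gz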
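A minimal Python sketch of the download step, assuming Python 3.9+ and a writable target directory. The URL pattern is inferred from the link above. dlcdn.apache.org only serves current releases, so for an older version you may need to pass mirror="https://archive.apache.org/dist" instead. Moving or symlinking the extracted directory to /opt/phoenix, which the following steps assume, is left out.

```python
import tarfile
import urllib.request
from pathlib import Path


def tarball_url(phoenix_version: str, hbase_profile: str,
                mirror: str = "https://dlcdn.apache.org") -> str:
    """Build the binary tarball URL, e.g. for Phoenix 5.1.3 built against HBase 2.3."""
    name = f"phoenix-hbase-{hbase_profile}-{phoenix_version}-bin.tar.gz"
    return f"{mirror}/phoenix/phoenix-{phoenix_version}/{name}"


def download_and_extract(url: str, dest_dir: str) -> Path:
    """Download the tarball into dest_dir and unpack it there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter when this Python provides it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)
    # e.g. <dest_dir>/phoenix-hbase-2.3-5.1.3-bin
    return dest / archive.name.removesuffix(".tar.gz")
```

For example, download_and_extract(tarball_url("5.1.3", "2.3"), "/opt") would fetch the same file as the link above.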

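3.2 Deploy the jar to the HBase lib directory on the HMaster and RegionServer nodes

Copy the Phoenix server jar into the HBase lib directory on every HMaster and RegionServer node:

cp /opt/phoenix/phoenix-server-hbase-2.3-5.1.3.jar /opt/hbase/lib/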
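Because the jar has to reach every HMaster and RegionServer, not only the node you are logged in to, copying it by hand gets tedious. Below is a small Python sketch that builds one scp command per node. It assumes passwordless SSH and the same /opt layout on every host; the host names are hypothetical, and the commands are only printed unless dry_run=False.

```python
import shlex
import subprocess

# Assumed paths, matching the cp command above.
SERVER_JAR = "/opt/phoenix/phoenix-server-hbase-2.3-5.1.3.jar"
HBASE_LIB = "/opt/hbase/lib/"


def copy_commands(nodes, jar=SERVER_JAR, lib=HBASE_LIB):
    """Build one scp command per HMaster/RegionServer host."""
    return [["scp", jar, f"{node}:{lib}"] for node in nodes]


def deploy(nodes, dry_run=True):
    """Print the commands, or run them (assumes passwordless SSH) when dry_run=False."""
    for cmd in copy_commands(nodes):
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Hypothetical host names; replace with your HMaster and RegionServer nodes.
deploy(["hmaster1", "rs1", "rs2", "rs3"])
```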

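3.3 Configure hbase-site.xml

To use Phoenix schemas, which are then mapped to HBase namespaces, add the following property. According to the Phoenix documentation, it must be set in hbase-site.xml on both the server side and the client side, and once enabled it should not be rolled back.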

<property>
   <name>phoenix.schema.isNamespaceMappingEnabled</name>
   <value>true</value>
</property>

Reference: https://phoenix.apache.org/namspace_mapping.html
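Since the same property has to be present on several servers and clients, an idempotent update can help. This is a sketch using Python's standard xml.etree.ElementTree; upsert_property and update_site_xml are hypothetical helper names. Note that rewriting the file this way drops XML comments and the xml-stylesheet processing instruction that many Hadoop config files carry, so keeping a backup is a good idea.

```python
import xml.etree.ElementTree as ET


def upsert_property(root: ET.Element, name: str, value: str) -> None:
    """Set <property><name>name</name><value>value</value></property>, adding it if absent."""
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


def update_site_xml(path: str, properties: dict) -> None:
    """Apply properties to a Hadoop-style *-site.xml file in place."""
    tree = ET.parse(path)  # comments are not preserved when the file is rewritten
    for name, value in properties.items():
        upsert_property(tree.getroot(), name, value)
    tree.write(path, encoding="UTF-8", xml_declaration=True)
```

For example, update_site_xml("/opt/hbase/conf/hbase-site.xml", {"phoenix.schema.isNamespaceMappingEnabled": "true"}) could be run on each server node and on the client host (the path is an assumption based on the /opt/hbase layout above), followed by an HBase restart.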

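3.4 Restart HBase

Restart the HMaster and all RegionServers so that they load the Phoenix server jar and the new configuration.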

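3.5 Run sqlline.py from /opt/phoenix/bin

python3 sqlline.py

Without arguments, sqlline.py connects to ZooKeeper on localhost; on a multi-node cluster, pass the ZooKeeper quorum as the first argument.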
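Once sqlline connects, a quick way to confirm the setup is to run a short SQL script. The sketch below writes one to a temporary file and passes it to sqlline.py, which accepts the ZooKeeper quorum and an optional SQL file as arguments. The schema and table names (DEMO, SMOKE_TEST), the ZooKeeper hosts, and the assumption that sqlline.py picks up hbase-site.xml through HBASE_CONF_DIR are all illustrative. If namespace mapping is not enabled on the client side, CREATE SCHEMA is expected to fail; if it succeeds, the table should appear in HBase as DEMO:SMOKE_TEST.

```python
import os
import subprocess
import tempfile

# Hypothetical values; adjust to your environment.
PHOENIX_BIN = "/opt/phoenix/bin"
ZK_QUORUM = "zk1,zk2,zk3:2181"

SMOKE_TEST_SQL = """\
CREATE SCHEMA IF NOT EXISTS DEMO;
CREATE TABLE IF NOT EXISTS DEMO.SMOKE_TEST (ID BIGINT NOT NULL PRIMARY KEY, NAME VARCHAR);
SELECT COUNT(*) FROM DEMO.SMOKE_TEST;
"""


def sqlline_command(sql_file, zk=ZK_QUORUM, bin_dir=PHOENIX_BIN):
    """sqlline.py takes the ZooKeeper quorum and an optional SQL file to run."""
    return ["python3", os.path.join(bin_dir, "sqlline.py"), zk, sql_file]


def run_smoke_test(hbase_conf_dir="/opt/hbase/conf"):
    with tempfile.NamedTemporaryFile("w", suffix=".sql", delete=False) as f:
        f.write(SMOKE_TEST_SQL)
        sql_file = f.name
    # Assumption: sqlline.py reads hbase-site.xml from HBASE_CONF_DIR, so the
    # client sees the same namespace-mapping setting as the servers.
    env = dict(os.environ, HBASE_CONF_DIR=hbase_conf_dir)
    try:
        subprocess.run(sqlline_command(sql_file), env=env, check=True)
    finally:
        os.remove(sql_file)
```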

Chan hae OH

Data Engineer
