This example walks through running a Python notebook in Docker and connecting it to Hive, up to the point where query results come back.
Pull the anaconda3 image from Docker Hub.
docker pull continuumio/anaconda3
First, start it to confirm that it runs correctly.
docker run -i -t -p 8888:8888 continuumio/anaconda3 /bin/bash -c "\
    conda install jupyter -y --quiet && \
    mkdir -p /opt/notebooks && \
    jupyter notebook \
    --notebook-dir=/opt/notebooks --ip='*' --port=8888 \
    --no-browser --allow-root"
Once you connect, the notebook asks for a token, which is quite a nuisance (when the token expires you have to authenticate again).
So I want to switch to password authentication.
Reference: https://financedata.github.io/posts/jupyter-notebook-authentication.html
jupyter notebook --generate-config
# Input
ipython
# Input
from IPython.lib import passwd
# Input
passwd()
# Output
Out[2]: 'sha1:f0bf7a023f60:25920410f68d70c03175e3fec4619c497b84193f'
Add the following to /root/.jupyter/jupyter_notebook_config.py. The image does not include vi, so append the lines with echo instead.
echo "c = get_config()" >> /root/.jupyter/jupyter_notebook_config.py
echo "c.NotebookApp.ip = '0.0.0.0'" >> /root/.jupyter/jupyter_notebook_config.py
echo "c.NotebookApp.open_browser = False" >> /root/.jupyter/jupyter_notebook_config.py
echo "c.NotebookApp.port = 8888" >> /root/.jupyter/jupyter_notebook_config.py
echo "c.NotebookApp.password = 'sha1:06234b148e8d:f698b724e1cbfdd2713f00c9e84ccfaffb1cㅁㅁㅁㅁ'" >> /root/.jupyter/jupyter_notebook_config.py
After this, running just jupyter notebook is enough (the settings are in the config file).
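The hash and echo steps above can also be done in one go from Python. The following is a minimal sketch, not the notebook's own implementation: it assumes the classic `sha1:salt:digest` string is the SHA-1 hex digest of the passphrase followed by the salt, and `make_hash` and `write_config` are hypothetical helper names. In practice, prefer the `passwd()` function that ships with your Jupyter installation.

```python
import hashlib
import secrets
from pathlib import Path


def make_hash(passphrase, salt=None):
    """Build a 'sha1:salt:digest' string (assumed format, see the note above)."""
    if salt is None:
        salt = secrets.token_hex(6)  # 12 hex characters, like the salts shown above
    digest = hashlib.sha1((passphrase + salt).encode("utf-8")).hexdigest()
    return f"sha1:{salt}:{digest}"


def write_config(path, hashed):
    """Append the same settings as the echo commands to the config file."""
    lines = [
        "c = get_config()",
        "c.NotebookApp.ip = '0.0.0.0'",
        "c.NotebookApp.open_browser = False",
        "c.NotebookApp.port = 8888",
        f"c.NotebookApp.password = '{hashed}'",
    ]
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

Inside the container this would be called as `write_config("/root/.jupyter/jupyter_notebook_config.py", make_hash("my-password"))`, where the password is a placeholder.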
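The reason for this test: on the JupyterHub Docker image provided by EMR, the Hive-related packages could not be installed because of a kernel version problem (along with several other issues), so I test in this image first.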
conda install pyhive
# Output
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
  environment location: /opt/conda
  added / updated specs:
    - pyhive
The following packages will be downloaded:
    package                    |            build
    ---------------------------|-----------------
    cyrus-sasl-2.1.27          |       h758a394_8         275 KB
    libdb-6.2.32               |       hf484d3e_0        18.5 MB
    pyhive-0.6.1               |   py39h06a4308_0         368 KB
    sasl-0.2.1                 |   py39h48830cd_1          58 KB
    thrift-0.13.0              |   py39h2531618_0         119 KB
    thrift_sasl-0.4.2          |   py39h06a4308_1          11 KB
    ------------------------------------------------------------
                                           Total:        19.3 MB
The following NEW packages will be INSTALLED:
  cyrus-sasl         pkgs/main/linux-64::cyrus-sasl-2.1.27-h758a394_8
  libdb              pkgs/main/linux-64::libdb-6.2.32-hf484d3e_0
  pyhive             pkgs/main/linux-64::pyhive-0.6.1-py39h06a4308_0
  sasl               pkgs/main/linux-64::sasl-0.2.1-py39h48830cd_1
  thrift             pkgs/main/linux-64::thrift-0.13.0-py39h2531618_0
  thrift_sasl        pkgs/main/linux-64::thrift_sasl-0.4.2-py39h06a4308_1
Proceed ([y]/n)? y
Downloading and Extracting Packages
thrift-0.13.0        | 119 KB    | ################################################################################################################################################################################################################################################################################### | 100%
pyhive-0.6.1         | 368 KB    | ################################################################################################################################################################################################################################################################################### | 100%
sasl-0.2.1           | 58 KB     | ################################################################################################################################################################################################################################################################################### | 100%
libdb-6.2.32         | 18.5 MB   | ################################################################################################################################################################################################################################################################################### | 100%
cyrus-sasl-2.1.27    | 275 KB    | ################################################################################################################################################################################################################################################################################### | 100%
thrift_sasl-0.4.2    | 11 KB     | ################################################################################################################################################################################################################################################################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
pip install installed only the package I specified, but with Anaconda the packages it depends on were installed as well.
Conclusion -> use Anaconda.
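To confirm that the dependencies really landed in the environment, a quick check like the one below can help. It is only a sketch: `check_importable` is a hypothetical helper, and the import names are assumed to match the conda package names in the install log.

```python
from importlib.util import find_spec


def check_importable(names):
    """Return {module name: True/False} without actually importing anything."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Import names assumed to match the conda package names listed above.
    wanted = ["pyhive", "sasl", "thrift", "thrift_sasl"]
    for name, ok in check_importable(wanted).items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```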
from pyhive import hive
conn = hive.Connection(host='[IP]', port=10000, database='partner')
cursor = conn.cursor()
cursor.execute('select * from partner.db_business')
for row in cursor.fetchall():
    print(row)
# Output
(10001, '', 2, '2021-05-26 13:32:20.784929', '2021-05-26 13:32:20.784929', False, 'null')
(10004, '', 2, '2021-05-26 18:18:11.791122', '2021-05-26 18:18:11.791122', False, 'null')
(10007, '', 3, '2021-05-27 14:46:11.417005', '2021-05-27 14:46:11.417005', False, 'null')
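The rows above come back as bare tuples, so it is hard to tell which value belongs to which column. PyHive cursors follow the Python DB-API, so `cursor.description` should carry the column names. The sketch below pairs names with values. It uses sqlite3 as a stand-in because it needs no server, the helper `rows_as_dicts` is hypothetical, and the two-column table is made up for illustration.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Pair each fetched row with the column names from cursor.description."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def demo():
    # sqlite3 stands in for Hive here; the table and columns are made up.
    conn = sqlite3.connect(":memory:")
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE db_business (id INTEGER, level INTEGER)")
    cursor.executemany(
        "INSERT INTO db_business VALUES (?, ?)", [(10001, 2), (10004, 2)]
    )
    cursor.execute("SELECT * FROM db_business ORDER BY id")
    result = rows_as_dicts(cursor)
    conn.close()
    return result
```

With the Hive connection from the example, the same helper would be called right after `cursor.execute(...)` in place of the `for` loop.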