This example walks through running a Python notebook (Jupyter) in Docker and connecting it to Hive, up to the point where query results come back.
Pull the anaconda3 image from Docker Hub.
docker pull continuumio/anaconda3
First, start a container to check that the notebook runs correctly.
docker run -i -t -p 8888:8888 continuumio/anaconda3 /bin/bash -c "\
conda install jupyter -y --quiet && \
mkdir -p /opt/notebooks && \
jupyter notebook \
--notebook-dir=/opt/notebooks --ip='*' --port=8888 \
--no-browser --allow-root"
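To confirm the container is actually listening before opening a browser, a small port check can help. This is only a sketch: the helper name wait_for_port is made up for illustration, and it assumes the notebook is published on localhost:8888 as in the docker run command above.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; it is not part of Docker or Jupyter.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False


# Example call (assumes the container above is running):
# wait_for_port("localhost", 8888)
```

A successful TCP connect only shows that the port is open. It does not prove that Jupyter finished starting, so the browser check is still the real test.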
Once you connect, Jupyter asks for a token, which is quite a hassle (when the token expires you have to authenticate again).
So I switch to password authentication instead.
Reference: https://financedata.github.io/posts/jupyter-notebook-authentication.html
jupyter notebook --generate-config
# Input
ipython
# Input
from IPython.lib import passwd
# Input
passwd()
# Output
Out[2]: 'sha1:f0bf7a023f60:25920410f68d70c03175e3fec4619c497b84193f'
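The output above has the form algorithm:salt:digest. As far as I understand the legacy scheme, the digest is the hash of the passphrase followed by the salt. The sketch below reproduces that format with the standard library only, so it also works where IPython.lib.passwd is unavailable (on classic Notebook versions, notebook.auth.passwd is the usual place to find it). The function names make_password_hash and check_password are my own, not Jupyter's.

```python
import hashlib
import secrets


def make_password_hash(passphrase, algorithm="sha1", salt_len=12):
    """Build an 'algorithm:salt:digest' string in the style shown above.

    Assumption: digest = hash(passphrase bytes + salt bytes), which is my
    understanding of the legacy Jupyter/IPython scheme.
    """
    salt = secrets.token_hex(salt_len // 2)  # salt_len hex characters
    data = passphrase.encode("utf-8") + salt.encode("ascii")
    digest = hashlib.new(algorithm, data).hexdigest()
    return f"{algorithm}:{salt}:{digest}"


def check_password(hashed, passphrase):
    """Recompute the digest from the stored salt and compare."""
    try:
        algorithm, salt, digest = hashed.split(":", 2)
    except ValueError:
        return False  # not in algorithm:salt:digest form
    data = passphrase.encode("utf-8") + salt.encode("ascii")
    expected = hashlib.new(algorithm, data).hexdigest()
    return secrets.compare_digest(expected, digest)
```

The salt is random, so hashing the same password twice gives different strings. Only check_password can tell whether a hash matches.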
Add the following settings to /root/.jupyter/jupyter_notebook_config.py. The image does not include vi, so append them with echo.
echo "c = get_config()" >> /root/.jupyter/jupyter_notebook_config.py
echo "c.NotebookApp.ip = '0.0.0.0'" >> /root/.jupyter/jupyter_notebook_config.py
echo "c.NotebookApp.open_browser = False" >> /root/.jupyter/jupyter_notebook_config.py
echo "c.NotebookApp.port = 8888" >> /root/.jupyter/jupyter_notebook_config.py
echo "c.NotebookApp.password = 'sha1:06234b148e8d:f698b724e1cbfdd2713f00c9e84ccfaffb1cㅁㅁㅁㅁ'" >> /root/.jupyter/jupyter_notebook_config.py
Because the settings are now in the config file, running just the following is enough.
jupyter notebook
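Since the image has Python but no editor, the same lines could also be appended from Python. The following is a minimal sketch of that idea: append_settings is a hypothetical helper, and the password value in the example call is a placeholder to replace with your own hash.

```python
from pathlib import Path


def append_settings(config_path, settings):
    """Append c.NotebookApp.<key> = <value> lines to a Jupyter config file.

    Hypothetical helper; it writes the same kind of lines as the echo commands.
    """
    path = Path(config_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # repr() quotes strings and leaves booleans and integers bare.
    lines = [f"c.NotebookApp.{key} = {value!r}" for key, value in settings.items()]
    with path.open("a", encoding="utf-8") as f:
        f.write("c = get_config()\n")
        f.write("\n".join(lines) + "\n")
    return lines


# Example call (the password value is a placeholder, not a real hash):
# append_settings("/root/.jupyter/jupyter_notebook_config.py", {
#     "ip": "0.0.0.0",
#     "open_browser": False,
#     "port": 8888,
#     "password": "sha1:<salt>:<digest>",
# })
```

Like the echo approach, this appends, so running it twice leaves duplicate lines in the file. The last assignment wins when Jupyter loads the config.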
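The reason for this test: on the JupyterHub Docker image that EMR provides, the Hive-related packages could not be installed because of a kernel version problem (and several other issues came up as well). So I test with this image first.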
conda install pyhive
# Output
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /opt/conda
added / updated specs:
- pyhive
The following packages will be downloaded:
package | build
---------------------------|-----------------
cyrus-sasl-2.1.27 | h758a394_8 275 KB
libdb-6.2.32 | hf484d3e_0 18.5 MB
pyhive-0.6.1 | py39h06a4308_0 368 KB
sasl-0.2.1 | py39h48830cd_1 58 KB
thrift-0.13.0 | py39h2531618_0 119 KB
thrift_sasl-0.4.2 | py39h06a4308_1 11 KB
------------------------------------------------------------
Total: 19.3 MB
The following NEW packages will be INSTALLED:
cyrus-sasl pkgs/main/linux-64::cyrus-sasl-2.1.27-h758a394_8
libdb pkgs/main/linux-64::libdb-6.2.32-hf484d3e_0
pyhive pkgs/main/linux-64::pyhive-0.6.1-py39h06a4308_0
sasl pkgs/main/linux-64::sasl-0.2.1-py39h48830cd_1
thrift pkgs/main/linux-64::thrift-0.13.0-py39h2531618_0
thrift_sasl pkgs/main/linux-64::thrift_sasl-0.4.2-py39h06a4308_1
Proceed ([y]/n)? y
Downloading and Extracting Packages
thrift-0.13.0 | 119 KB | ########## | 100%
pyhive-0.6.1 | 368 KB | ########## | 100%
sasl-0.2.1 | 58 KB | ########## | 100%
libdb-6.2.32 | 18.5 MB | ########## | 100%
cyrus-sasl-2.1.27 | 275 KB | ########## | 100%
thrift_sasl-0.4.2 | 11 KB | ########## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
With pip install, only the package I named was installed, whereas conda also installed the packages it depends on.
Conclusion: use Anaconda.
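To see whether the dependencies really ended up in the environment, a quick import check is enough. This is a small sketch: check_modules is a name I made up, and the module list is taken from the conda package plan above.

```python
from importlib.util import find_spec


def check_modules(names):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: find_spec(name) is not None for name in names}


# Modules that the conda package plan above is expected to provide.
status = check_modules(["pyhive", "sasl", "thrift", "thrift_sasl"])
for name, found in status.items():
    print(f"{name}: {'ok' if found else 'missing'}")
```

find_spec only locates a module without importing it, so a package that is present but broken (for example, a missing shared library) would still show as ok here.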
from pyhive import hive
conn = hive.Connection(host='[IP]', port=10000, database='partner')
cursor = conn.cursor()
cursor.execute('select * from partner.db_business')
for row in cursor.fetchall():
    print(row)
# Output
(10001, '', 2, '2021-05-26 13:32:20.784929', '2021-05-26 13:32:20.784929', False, 'null')
(10004, '', 2, '2021-05-26 18:18:11.791122', '2021-05-26 18:18:11.791122', False, 'null')
(10007, '', 3, '2021-05-27 14:46:11.417005', '2021-05-27 14:46:11.417005', False, 'null')
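The rows come back as plain tuples, so it is hard to tell which value belongs to which column. PyHive cursors follow the Python DB-API, so cursor.description should carry the column names. The sketch below pairs them with each row. rows_as_dicts is a hypothetical helper, and sqlite3 with a made-up table stands in for Hive so the snippet runs without a cluster.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Turn the remaining rows of a DB-API cursor into dicts keyed by column name."""
    # description is a sequence of 7-item tuples; item 0 is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in demo with sqlite3; the table and columns here are invented.
demo = sqlite3.connect(":memory:")
cur = demo.cursor()
cur.execute("create table business (id integer, grade integer)")
cur.execute("insert into business values (10001, 2)")
cur.execute("select * from business")
for record in rows_as_dicts(cur):
    print(record)
demo.close()
```

With the Hive connection from the example, the call would be rows_as_dicts(cursor) right after cursor.execute(...), in place of the fetchall loop.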