Set up a Spark cluster that runs on top of a Hadoop cluster, using five machines.
Spark version: 3.0.1
This guide assumes the Hadoop cluster is already in place.
See: [hadoop] Cluster installation
| server01 | server02 | server03 | server04 | server05 |
| --- | --- | --- | --- | --- |
| NameNode | SecondaryNameNode | | | |
| NodeManager | DataNode | DataNode | DataNode | DataNode |
| ResourceManager | | | | |
| JobHistoryServer | | | | |
| Master | Worker | Worker | Worker | Worker |
| HistoryServer | | | | |
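All five hostnames are assumed to resolve on every machine (typically via `/etc/hosts`, as set up in the Hadoop post). A quick sanity check from server01:

```sh
# Each hostname should print an address; a missing line means name resolution is broken
getent hosts server01 server02 server03 server04 server05
```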
If `SPARK_HOME` is not set and a program such as `spark-shell` is launched from a path other than the Spark installation directory, no eventLog is written: Spark resolves its `conf/` directory (and therefore the eventLog settings configured below) through `SPARK_HOME`.
`~/.bashrc` file:

    export SPARK_HOME=/usr/local/spark
    export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
> source ~/.bashrc
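A quick check that the variable is now set in the current shell:

```sh
echo $SPARK_HOME   # should print /usr/local/spark
```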
> wget https://downloads.apache.org/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz
> sudo mkdir -p $SPARK_HOME && sudo tar -zvxf spark-3.0.1-bin-hadoop3.2.tgz -C $SPARK_HOME --strip-components 1
> sudo chown -R $USER:$USER $SPARK_HOME
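Before touching any configuration, the unpacked installation can be verified with the bundled launcher (on the PATH thanks to the `~/.bashrc` change above):

```sh
spark-submit --version   # should report version 3.0.1
```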
`$SPARK_HOME/conf/slaves` file:

    server02
    server03
    server04
    server05
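`start-all.sh` launches a Worker over SSH on every host listed in `conf/slaves`, so passwordless SSH from server01 to each worker must already work (it does if the Hadoop post was followed). A quick check that loops over the file itself:

```sh
# Each worker should print its own hostname without asking for a password
for h in $(cat $SPARK_HOME/conf/slaves); do
  ssh "$h" hostname
done
```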
`$SPARK_HOME/conf/spark-env.sh` file:

    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    export SPARK_MASTER_HOST=server01
    export HADOOP_HOME=/usr/local/hadoop
    export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
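The paths above follow this guide's layout; it is worth confirming they exist before starting anything (adjust `JAVA_HOME` if your JDK lives elsewhere):

```sh
ls -d /usr/lib/jvm/java-8-openjdk-amd64
ls /usr/local/hadoop/etc/hadoop/yarn-site.xml
```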
`$SPARK_HOME/conf/spark-defaults.conf` file:

    spark.master                    yarn
    spark.eventLog.enabled          true
    spark.eventLog.dir              file:///usr/local/spark/eventLog
    spark.history.fs.logDirectory   file:///usr/local/spark/eventLog
> mkdir -p $SPARK_HOME/eventLog
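The standalone scripts expect Spark at the same path on every worker, so the configured installation has to be copied out from server01. A minimal sketch, assuming passwordless SSH and that `/usr/local/spark` exists and is writable by the same user on each worker (repeat the mkdir/chown step there first if not):

```sh
# Sync the installation, including conf/, to every worker
for h in server02 server03 server04 server05; do
  rsync -a $SPARK_HOME/ "$h":$SPARK_HOME/
done
```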
Format the NameNode (first run only; this erases any existing HDFS metadata), then start the Hadoop daemons:
> $HADOOP_HOME/bin/hdfs namenode -format -force
> $HADOOP_HOME/sbin/start-dfs.sh
> $HADOOP_HOME/sbin/start-yarn.sh
> $HADOOP_HOME/bin/mapred --daemon start historyserver
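Once HDFS and YARN are up, both should report the DataNodes and NodeManagers from the table above:

```sh
$HADOOP_HOME/bin/hdfs dfsadmin -report   # live DataNodes: server02-server05
$HADOOP_HOME/bin/yarn node -list         # running NodeManagers
```

Then start the Spark standalone daemons and the history server: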
> $SPARK_HOME/sbin/start-all.sh
> $SPARK_HOME/sbin/start-history-server.sh
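With everything running, a short job exercises the whole stack and writes an eventLog that the history server (port 18080 by default) can display. The example jar ships with the distribution; the glob avoids hard-coding the Scala version in the filename:

```sh
# Run SparkPi on YARN in client mode; look for "Pi is roughly 3.14..." in the output
spark-submit --master yarn --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 100
```

To shut everything down, reverse the order: stop Spark first, then Hadoop: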
> $SPARK_HOME/sbin/stop-all.sh
> $SPARK_HOME/sbin/stop-history-server.sh
> rm -rf $SPARK_HOME/eventLog/*
> $HADOOP_HOME/sbin/stop-dfs.sh
> $HADOOP_HOME/sbin/stop-yarn.sh
> $HADOOP_HOME/bin/mapred --daemon stop historyserver
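Before wiping any data directories, it is worth confirming that no daemons are still running; a quick sweep, assuming `jps` is available over non-interactive SSH on each host:

```sh
# Only the Jps process itself should remain on each host
for h in server01 server02 server03 server04 server05; do
  echo "== $h =="; ssh "$h" jps
done
```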
To reset the cluster to a completely clean state, also wipe the NameNode metadata on server01 and the block storage on every DataNode (all HDFS data is lost):
> rm -rf $HADOOP_HOME/data/namenode/*

Run `rm -rf $HADOOP_HOME/data/datanode/*` on every DataNode, for example as in the sketch below.
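Rather than logging in to each DataNode, the cleanup can be scripted from server01. The path is hard-coded because `HADOOP_HOME` may not be set in a non-interactive SSH session (it follows the Hadoop post's layout):

```sh
# Wipe the DataNode block storage on server02-server05
for h in server02 server03 server04 server05; do
  ssh "$h" 'rm -rf /usr/local/hadoop/data/datanode/*'
done
```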