spark-defaults.conf : empty (nothing set)
spark-env.sh : only SPARK_DIST_CLASSPATH, pointing at /etc/hadoop/conf
Install Java
yum install -y wget
yum install -y java-1.8.0-openjdk-devel.x86_64
java -version
Install a Spark build that matches the Hadoop version
wget https://dlcdn.apache.org/spark/spark-3.2.2/spark-3.2.2-bin-hadoop3.2.tgz --no-check-certificate
tar -xvzf spark-3.2.2-bin-hadoop3.2.tgz
Set Spark environment variables
vi ~/.bashrc
export SPARK_HOME=/home/spark/spark3.2
export PATH=$PATH:$SPARK_HOME/bin
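A quick sanity check for the two exports above (the path /home/spark/spark3.2 is the one used in these notes; adjust it to wherever the tarball was extracted):

```shell
# Same two lines as in ~/.bashrc: note the $ expansion and that the old PATH is preserved
export SPARK_HOME=/home/spark/spark3.2
export PATH=$PATH:$SPARK_HOME/bin

# Verify the variables landed where expected
echo "$SPARK_HOME"
case ":$PATH:" in
  *":$SPARK_HOME/bin:"*) echo "PATH ok" ;;
  *) echo "PATH missing spark bin" ;;
esac
```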
Open UI port 4040
firewall-cmd --permanent --zone=public --add-port=4040/tcp
firewall-cmd --reload
Copy /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/lib/hive/conf/hive-site.xml
from the worker3t server to /home/spark/spark3.2/conf on the EdgeNode.
spark-shell
(HiveContext was removed in Spark 3.0; the spark-shell's built-in `spark` session picks up hive-site.xml on its own.)
var rows = spark.sql("show databases")
rows.show()
YARN configuration
mkdir /etc/hadoop/conf on the EdgeNode, then cp the contents of worker3t's /etc/hadoop/conf into it.
Add to /etc/hosts:
192.168.0.131 master1t.woorifg.com master1t
192.168.0.132 master2t.woorifg.com master2t
192.168.0.133 worker1t.woorifg.com worker1t
192.168.0.134 worker2t.woorifg.com worker2t
192.168.0.135 worker3t.woorifg.com worker3t
192.168.0.200 hiproj
pyspark
yum install -y epel-release
yum install -y python-pip
pip install pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sample").config("hive.metastore.uris", "thrift://ip:port").enableHiveSupport().getOrCreate()
spark.sql("use db_name")
df = spark.sql("SELECT * FROM table")
df.show()
Option reference
conf/spark-defaults.conf : default option values read by spark-submit (flags passed on the command line override them)
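For reference, a minimal spark-defaults.conf sketch; the keys are standard Spark properties, but every value below is an illustrative assumption, not something taken from this cluster:

```properties
# Illustrative defaults only -- tune per cluster
spark.master                   yarn
spark.eventLog.enabled         true
spark.executor.memory          2g
spark.serializer               org.apache.spark.serializer.KryoSerializer
```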