Spark Installation

오민석 · November 4, 2022

spark-defaults.conf: empty (nothing set)
spark-env.sh: only SPARK_DIST_CLASSPATH is set, pointing to /etc/hadoop/conf
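
Spelled out, that spark-env.sh setting looks roughly like this (a sketch based on the description above):

    # spark-env.sh — put the Hadoop client configuration on Spark's classpath
    export SPARK_DIST_CLASSPATH=/etc/hadoop/conf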

  1. Install Java
    yum install -y wget
    yum install java-1.8.0-openjdk-devel.x86_64
    java -version
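
Optionally, if other tools need JAVA_HOME, one common way to derive it (an assumption, not part of the original steps; verify the resolved path on your machine):

    # resolve JAVA_HOME from the javac found on PATH
    export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which javac))))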

  2. Install the Spark build that matches the Hadoop version
    wget https://dlcdn.apache.org/spark/spark-3.2.2/spark-3.2.2-bin-hadoop3.2.tgz --no-check-certificate
    tar -xvzf spark-3.2.2-bin-hadoop3.2.tgz
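
The tarball extracts to spark-3.2.2-bin-hadoop3.2, while SPARK_HOME in the next step points to /home/spark/spark3.2, so presumably the directory is moved (or symlinked) to match:

    # assumption: align the extracted directory with the SPARK_HOME used below
    mv spark-3.2.2-bin-hadoop3.2 /home/spark/spark3.2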

  3. Set Spark environment variables
    vi ~/.bashrc
    export SPARK_HOME=/home/spark/spark3.2
    export PATH=$PATH:$SPARK_HOME/bin
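
Then apply the changes and confirm Spark is on the PATH:

    source ~/.bashrc
    spark-submit --version    # prints the Spark and Scala build info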

  4. Open UI port 4040
    firewall-cmd --permanent --zone=public --add-port=4040/tcp
    firewall-cmd --reload
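
A quick check that the rule took effect:

    firewall-cmd --zone=public --list-ports    # should now include 4040/tcp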

  5. Copy hive-site.xml from the worker3t server
    (/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/lib/hive/conf/hive-site.xml)
    into /home/spark/spark3.2/conf on the EdgeNode, e.g. with scp as shown below.
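
One way to do that copy (a hypothetical scp invocation; adjust user and host as needed):

    # run on the EdgeNode; pulls the Hive client config from worker3t
    scp worker3t:/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/lib/hive/conf/hive-site.xml /home/spark/spark3.2/conf/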

  6. Verify Hive access in spark-shell
    // HiveContext was removed in Spark 3.0, so the Spark 1.x-style
    // `new HiveContext(sc)` no longer compiles; in spark-shell the built-in
    // `spark` session (with conf/hive-site.xml in place) queries the metastore directly:
    val rows = spark.sql("show databases")
    rows.show()

  1. Configure YARN
    On the EdgeNode, mkdir /etc/hadoop/conf and cp the /etc/hadoop/conf files over from the worker3t server (sketched below).
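
A hypothetical command sequence matching that description:

    # on the EdgeNode
    mkdir -p /etc/hadoop/conf
    scp -r worker3t:/etc/hadoop/conf/* /etc/hadoop/conf/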

  2. Add /etc/hosts entries
    192.168.0.131 master1t.woorifg.com master1t
    192.168.0.132 master2t.woorifg.com master2t
    192.168.0.133 worker1t.woorifg.com worker1t
    192.168.0.134 worker2t.woorifg.com worker2t
    192.168.0.135 worker3t.woorifg.com worker3t
    192.168.0.200 hiproj

  3. Install pyspark
    yum install epel-release
    yum install python-pip
    pip install pyspark
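
Since the cluster runs Spark 3.2.2, pinning the matching client version is safer (an assumption; an unpinned install pulls the latest pyspark, which may not match the cluster):

    pip install pyspark==3.2.2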

  1. Run commands in pyspark
    from pyspark.sql import SparkSession

    # "ip:port" is the Hive metastore thrift endpoint (placeholder kept from the original)
    spark = (SparkSession.builder
             .appName("sample")
             .config("hive.metastore.uris", "thrift://ip:port")
             .enableHiveSupport()
             .getOrCreate())
    spark.sql("use db_name")
    df = spark.sql("SELECT * FROM table")
    df.show()
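
The same logic can also be submitted to YARN as a script rather than run interactively (sample.py is a hypothetical file holding the code above):

    spark-submit --master yarn --deploy-mode client sample.py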


Notes on option values

conf/spark-defaults.conf: default option values read by spark-submit; flags passed on the command line and values set in code on SparkConf take precedence over this file.
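
A few illustrative entries (the property names are standard Spark settings; the values are examples, not taken from this cluster):

    spark.master           yarn
    spark.driver.memory    2g
    spark.executor.memory  4g
    spark.executor.cores   2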
