Check the host system information with msinfo32:
OS Name: Microsoft Windows 10
Processor: AMD Ryzen 9 3900X 12-Core Processor, 3793 MHz, 12 Core(s), 24 Logical Processor(s)
Installed Physical Memory (RAM): 32.0 GB
Guest VM specification:
OS: CentOS 8
Memory: 8 GB (8192 MB)
CPU: 3 CPUs
$wget https://downloads.apache.org/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
$tar -xvzf spark-3.1.2-bin-hadoop3.2.tgz
$cp -r spark-3.1.2-bin-hadoop3.2 /home/hadoop/spark-3.1.2
# Count the logical CPUs (one "processor" line per CPU)
$grep -c ^processor /proc/cpuinfo
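The grep above counts the lines whose key is "processor", and /proc/cpuinfo has one such line per logical CPU. Below is a minimal Python sketch of the same count. It assumes /proc/cpuinfo-style "key : value" lines. The helper name count_logical_cpus is hypothetical, and the sample text is a hypothetical, shortened excerpt (a real file has many more fields per CPU).

```python
def count_logical_cpus(cpuinfo_text: str) -> int:
    """Count the 'processor' entries (one per logical CPU) in /proc/cpuinfo-style text."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key = line.split(":", 1)[0].strip()  # text before the first ':' is the key
        if key == "processor":
            count += 1
    return count


# Hypothetical, shortened excerpt for a 3-CPU VM
sample = (
    "processor\t: 0\n"
    "model name\t: AMD Ryzen 9 3900X 12-Core Processor\n"
    "\n"
    "processor\t: 1\n"
    "model name\t: AMD Ryzen 9 3900X 12-Core Processor\n"
    "\n"
    "processor\t: 2\n"
    "model name\t: AMD Ryzen 9 3900X 12-Core Processor\n"
)
print(count_logical_cpus(sample))  # 3
```

On the VM itself, count_logical_cpus(open("/proc/cpuinfo").read()) should normally agree with the grep count and with os.cpu_count().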
$vi .bashrc
# Add the following lines to .bashrc
export SPARK_HOME=/home/hadoop/spark-3.1.2
export PATH=$PATH:$HADOOP_HOME/sbin:~~~~:$FLUME_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
$source .bashrc
$echo $SPARK_HOME
$spark-submit --version
$spark-submit --help
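After `source .bashrc`, PATH holds the expanded directories. A quick check is therefore whether $SPARK_HOME/bin and $SPARK_HOME/sbin both appear in it. Below is a minimal sketch using a hypothetical helper, missing_spark_dirs, which takes an environment mapping. The example PATH is a shortened stand-in, not the real one from this VM.

```python
def missing_spark_dirs(env: dict) -> list:
    """Return the $SPARK_HOME/bin and $SPARK_HOME/sbin directories missing from PATH."""
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        raise KeyError("SPARK_HOME is not set")
    base = spark_home.rstrip("/")
    # PATH is a ':'-separated list on Linux; ignore trailing slashes when comparing
    path_entries = [p.rstrip("/") for p in env.get("PATH", "").split(":")]
    wanted = [base + "/bin", base + "/sbin"]
    return [d for d in wanted if d not in path_entries]


# Shortened example environment matching the .bashrc settings above
env = {
    "SPARK_HOME": "/home/hadoop/spark-3.1.2",
    "PATH": "/usr/local/bin:/usr/bin:/home/hadoop/spark-3.1.2/bin:/home/hadoop/spark-3.1.2/sbin",
}
print(missing_spark_dirs(env))  # []
```

On the VM, missing_spark_dirs(dict(os.environ)) should return an empty list once the updated .bashrc has been sourced.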
Installation complete.
Check that the pyspark shell starts:
$pyspark
Spark examples: https://spark.apache.org/examples.html
Python examples on GitHub: https://github.com/apache/spark/tree/master/examples/src/main/python
# Format the NameNode (first setup only: this wipes existing HDFS metadata)
$hdfs namenode -format
$start-dfs.sh
$start-yarn.sh
$jps
10529 NodeManager
9747 NameNode
10087 SecondaryNameNode
10312 ResourceManager
9897 DataNode
10781 Jps
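The five daemons listed by jps (plus Jps itself) show that HDFS and YARN are up. Below is a minimal sketch that parses jps output of the form "<pid> <name>" and reports any expected Hadoop daemon that is missing. The names parse_jps and HADOOP_DAEMONS are hypothetical, and the sample is the output above.

```python
def parse_jps(output: str) -> dict:
    """Map each process name in `jps` output to the list of its PIDs."""
    procs = {}
    for line in output.strip().splitlines():
        parts = line.split()
        if len(parts) == 2:  # expected form: "<pid> <name>"
            pid, name = parts
            procs.setdefault(name, []).append(int(pid))
    return procs


HADOOP_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode", "ResourceManager", "NodeManager"}

jps_output = """10529 NodeManager
9747 NameNode
10087 SecondaryNameNode
10312 ResourceManager
9897 DataNode
10781 Jps"""

missing = HADOOP_DAEMONS - set(parse_jps(jps_output))
print(sorted(missing))  # []
```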
$cd /home/hadoop/spark-3.1.2/bin
$ls
$ls ../sbin
$cd /home/hadoop/spark-3.1.2/conf
$cp spark-env.sh.template spark-env.sh
$vi spark-env.sh
# Add this line to spark-env.sh to run two Worker instances on this host
export SPARK_WORKER_INSTANCES=2
# Start the Master
$sh start-master.sh
# Confirm that the Master is running
$jps
4178 ResourceManager
3604 NameNode
5447 Jps
4488 NodeManager
3755 DataNode
3948 SecondaryNameNode
5390 Master
# Check available memory (in GB)
$free -g
$cd /home/hadoop/spark-3.1.2/sbin
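# Start the Workers: with SPARK_WORKER_INSTANCES=2 this launches two, each with 2 GB memory (-m 2g) and 1 core (-c 1)
# (Spark 3.1 also provides this script under the name start-worker.sh)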
$sh start-slave.sh spark://hadoop00:7077 -m 2g -c 1
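With SPARK_WORKER_INSTANCES=2, this command starts two Workers. Each offers 2 GB (-m 2g) and 1 core (-c 1), so together they claim 4 GB and 2 CPUs out of the VM's 8 GB and 3 CPUs. That leaves headroom for the Hadoop daemons and the Master.

The small sketch below works through that arithmetic. parse_mem_gb and worker_totals are hypothetical helpers, and parse_mem_gb only understands the "g" and "m" suffixes.

```python
def parse_mem_gb(size: str) -> float:
    """Convert a memory string such as '2g' or '512m' to gigabytes (only g/m handled)."""
    units = {"g": 1.0, "m": 1.0 / 1024}
    return float(size[:-1]) * units[size[-1].lower()]


def worker_totals(instances: int, memory: str, cores: int) -> tuple:
    """Total memory (GB) and cores offered by `instances` Workers on one host."""
    return instances * parse_mem_gb(memory), instances * cores


VM_MEM_GB, VM_CPUS = 8, 3  # from the VM specification above
mem_gb, cores = worker_totals(instances=2, memory="2g", cores=1)
print(mem_gb, cores)  # 4.0 2
print(mem_gb <= VM_MEM_GB and cores <= VM_CPUS)  # True
```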
# Confirm that two Worker processes are running
$jps
4178 ResourceManager
5491 Worker
3604 NameNode
5604 Jps
4488 NodeManager
3755 DataNode
3948 SecondaryNameNode
5549 Worker
5390 Master
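One Master plus two Worker entries confirm that SPARK_WORKER_INSTANCES=2 took effect. Below is a minimal sketch of that check using collections.Counter over the jps output above. The constant SPARK_WORKER_INSTANCES here only mirrors the spark-env.sh setting.

```python
from collections import Counter

jps_output = """4178 ResourceManager
5491 Worker
3604 NameNode
5604 Jps
4488 NodeManager
3755 DataNode
3948 SecondaryNameNode
5549 Worker
5390 Master"""

# Count how many JVMs run under each name ("<pid> <name>" per line)
counts = Counter(
    parts[1] for parts in (line.split() for line in jps_output.splitlines()) if len(parts) == 2
)

SPARK_WORKER_INSTANCES = 2  # mirrors the value set in spark-env.sh
print(counts["Master"], counts["Worker"])  # 1 2
print(counts["Worker"] == SPARK_WORKER_INSTANCES)  # True
```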
$cd /home/hadoop/spark-3.1.2/bin
$sh spark-shell --master spark://hadoop00:7077
# Exit the shell with :quit
scala> val lines = sc.textFile("README.md")
lines: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:24
scala> lines.count()
res1: Long = 108
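sc.textFile() yields one record per line, so count() returns the number of lines in the file. Below is a plain-Python sketch of roughly the same count on a local file, without Spark. The file name and its three-line contents are hypothetical, so the sketch does not reproduce the 108 above.

```python
import tempfile
from pathlib import Path


def count_lines(path: str) -> int:
    """Count newline-separated lines, roughly what sc.textFile(path).count() reports."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.md"  # hypothetical stand-in for README.md
    sample.write_text("# Apache Spark\n\nSpark is a unified analytics engine.\n", encoding="utf-8")
    n = count_lines(str(sample))

print(n)  # 3
```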
CentOS 8 reached end of support (EOS) on December 31, 2021, so yum now fails with:
Error: Failed to download metadata for repo 'appstream': Cannot prepare internal mirrorlist: No URLs in mirrorlist
Pointing the repositories at vault.centos.org makes yum usable again:
$sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-Linux-*
$sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-Linux-*
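The two sed commands do two things. They comment out every mirrorlist= line, and they turn the commented-out #baseurl= line into an active baseurl pointing at vault.centos.org.

The sketch below applies the same substitutions in Python to a hypothetical, shortened repo file (real CentOS-Linux-*.repo files contain more keys). patch_centos_repo is a hypothetical name.

```python
def patch_centos_repo(text: str) -> str:
    """Apply the two sed substitutions above to the text of a .repo file."""
    # s/mirrorlist/#mirrorlist/g : replace every occurrence, like sed's g flag
    text = text.replace("mirrorlist", "#mirrorlist")
    # s|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g
    # (sed treats '.' as any character; a literal match behaves the same on these lines)
    return text.replace("#baseurl=http://mirror.centos.org", "baseurl=http://vault.centos.org")


repo = """[appstream]
name=CentOS Linux $releasever - AppStream
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=AppStream
#baseurl=http://mirror.centos.org/$contentdir/$releasever/AppStream/$basearch/os/
gpgcheck=1
"""

patched = patch_centos_repo(repo)
print(patched)
```

The g flag also rewrites the hostname inside the URL, so the line becomes #mirrorlist=http://#mirrorlist.centos.org/... This is harmless because the whole line is commented out.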
$yum repolist
repo id          repo name
appstream        CentOS Linux 8 - AppStream
baseos           CentOS Linux 8 - BaseOS
epel             Extra Packages for Enterprise Linux 8 - x86_64
epel-modular     Extra Packages for Enterprise Linux Modular 8 - x86_64
extras           CentOS Linux 8 - Extras
$yum install python3
$pip3 -V
pip 9.0.3 from /usr/lib/python3.6/site-packages (python 3.6)
$python3 -V
Python 3.6.8
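PySpark in Spark 3.1 is documented as needing Python 3.6 or later, so the 3.6.8 installed here is just enough. Below is a minimal sketch that parses the python3 -V banner and compares it with that minimum. The (3, 6) threshold is an assumption taken from the Spark 3.1 documentation and should be rechecked for other Spark versions. parse_python_version is a hypothetical helper.

```python
import re

MIN_PYTHON_FOR_SPARK_31 = (3, 6)  # assumed minimum for PySpark 3.1; recheck for other versions


def parse_python_version(banner: str) -> tuple:
    """Turn a 'python3 -V' banner such as 'Python 3.6.8' into (3, 6, 8)."""
    match = re.match(r"Python (\d+)\.(\d+)\.(\d+)", banner.strip())
    if match is None:
        raise ValueError(f"unrecognized version banner: {banner!r}")
    return tuple(int(part) for part in match.groups())


version = parse_python_version("Python 3.6.8")
print(version, version[:2] >= MIN_PYTHON_FOR_SPARK_31)  # (3, 6, 8) True
```

From inside the interpreter itself, sys.version_info[:2] >= (3, 6) gives the same answer.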