This post summarizes how I set up a PySpark environment on an M1 MacBook Pro. (Spark is not usually installed locally, but this is here for the cases where it is needed.)
- MacBook Pro
- 14-inch, 2021 model
- Apple M1 Pro chip
- 32 GB memory
# https://docs.conda.io/en/latest/miniconda.html
$ wget https://repo.anaconda.com/miniconda/Miniconda3-py38_4.12.0-MacOSX-arm64.sh
$ bash Miniconda3-py38_4.12.0-MacOSX-arm64.sh
$ brew tap adoptopenjdk/openjdk
$ brew install --cask adoptopenjdk11
Check that Java was installed correctly with the command below:
$ java --version
openjdk 11.0.11 2021-04-20
OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed mode)
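The version check above can also be scripted. The following is a minimal sketch, not part of the original setup: `parse_java_major` and `installed_java_major` are hypothetical helper names, and the parsing assumes the first line of the output looks like the sample shown above.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Return the major Java version from `java --version` style output."""
    lines = version_output.strip().splitlines()
    if not lines:
        raise ValueError("empty version output")
    # The first number on the first line is the version, e.g. "11.0.11".
    match = re.search(r"(\d+)(?:\.(\d+))?", lines[0])
    if match is None:
        raise ValueError(f"no version number found in: {lines[0]!r}")
    major = int(match.group(1))
    # Legacy scheme: Java 8 reports itself as "1.8.0_xxx".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major() -> int:
    """Run `java --version` and parse the result (assumes java is on PATH)."""
    result = subprocess.run(
        ["java", "--version"], capture_output=True, text=True, check=True
    )
    return parse_java_major(result.stdout or result.stderr)
```

With the AdoptOpenJDK 11 install from this post, `installed_java_major()` would be expected to return 11.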
$ wget https://dlcdn.apache.org/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz
$ tar xvf spark-3.3.1-bin-hadoop3.tgz
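# Add the following to ~/.zshrc (replace $CURRENT_DIR with the directory where the archive was extracted)
export SPARK_HOME=$CURRENT_DIR/spark-3.3.1-bin-hadoop3
export HADOOP_HOME=$CURRENT_DIR/spark-3.3.1-bin-hadoop3
export PATH=$CURRENT_DIR/spark-3.3.1-bin-hadoop3/bin:$PATH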
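A small mistake in these lines, such as a missing `$` in front of a variable, is easy to overlook. The sketch below is one way to sanity-check the variables from a new shell; `check_spark_env` is a hypothetical helper, and it only assumes that `spark-submit` lives in `$SPARK_HOME/bin`, as it does in the archive downloaded above.

```python
import os
from pathlib import Path


def check_spark_env(env) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
        return problems
    bin_dir = Path(spark_home) / "bin"
    # The launcher script should exist inside the extracted archive.
    if not (bin_dir / "spark-submit").is_file():
        problems.append(f"spark-submit not found under {bin_dir}")
    # The bin directory must appear literally as one of the PATH entries.
    path_entries = env.get("PATH", "").split(os.pathsep)
    if str(bin_dir) not in path_entries:
        problems.append(f"{bin_dir} is not on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_spark_env(os.environ):
        print("WARNING:", problem)
```

Run it after opening a new terminal (or after `source ~/.zshrc`); no output means both checks passed.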
If the output looks like the following, the setup is complete:
$ spark-submit --version
22/11/27 16:44:25 WARN Utils: Your hostname, hyunhoui-MacBookPro.local resolves to a loopback address: 127.0.0.1; using 172.20.10.6 instead (on interface en0)
22/11/27 16:44:25 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.1
      /_/
Using Scala version 2.12.15, OpenJDK 64-Bit Server VM, 11.0.11
Branch HEAD
Compiled by user yumwang on 2022-10-15T09:47:01Z
Revision fbbcf9434ac070dd4ced4fb9efe32899c6db12a9
Url https://github.com/apache/spark
Type --help for more information.
$ conda create -n pyspark python=3.8 -y
$ conda activate pyspark
$ pip install pyspark
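Note that `pip install pyspark` without a version installs whichever release is newest at install time, which may differ from the 3.3.1 distribution that `SPARK_HOME` points to. Assuming the two are meant to match, the sketch below compares them; `spark_version_from_home` and `versions_match` are hypothetical helpers, and the version is read from the directory name, which is an assumption about how the archive was extracted.

```python
import re
from typing import Optional


def spark_version_from_home(spark_home: str) -> Optional[str]:
    """Extract 'X.Y.Z' from a path like .../spark-3.3.1-bin-hadoop3."""
    match = re.search(r"spark-(\d+\.\d+\.\d+)-bin", spark_home)
    return match.group(1) if match else None


def versions_match(pyspark_version: str, spark_home: str) -> bool:
    """True when the pip package version equals the SPARK_HOME version."""
    return spark_version_from_home(spark_home) == pyspark_version
```

Inside the `pyspark` conda environment, the installed package version is available as `importlib.metadata.version("pyspark")`. If the two differ, pinning the package with `pip install pyspark==3.3.1` is one option.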
Check that a Spark session can be created:
$ python
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.appName('test-spark').master('local').getOrCreate()
>>> data = spark.read.csv(CSV_PATH, header=True)
>>> data.show(1, False, vertical=True)
Check the result:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('test-spark').master('local').getOrCreate()
data = spark.read.csv("s3a://test-hyunho/data/OnlineRetail.csv", header=True)
data.show(1, False, vertical=True)
-RECORD 0-----------------------------------------
 InvoiceNo   | 536365
 StockCode   | 85123A
 Description | WHITE HANGING HEART T-LIGHT HOLDER
 Quantity    | 6
 InvoiceDate | 12/1/2010 8:26
 UnitPrice   | 2.55
 CustomerID  | 17850
 Country     | United Kingdom
only showing top 1 row
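The run above reads from an S3 bucket, which generally also needs the hadoop-aws connector and AWS credentials, neither of which this post covers. For a check that depends only on the local install, the sketch below writes the row shown above to a temporary CSV and reads it back. The file name `OnlineRetail_sample.csv` and the helper names are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path

# The single record shown in the output above.
SAMPLE_ROW = {
    "InvoiceNo": "536365",
    "StockCode": "85123A",
    "Description": "WHITE HANGING HEART T-LIGHT HOLDER",
    "Quantity": "6",
    "InvoiceDate": "12/1/2010 8:26",
    "UnitPrice": "2.55",
    "CustomerID": "17850",
    "Country": "United Kingdom",
}


def write_sample_csv(directory) -> Path:
    """Write a one-row CSV with a header and return its path."""
    path = Path(directory) / "OnlineRetail_sample.csv"  # hypothetical name
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(SAMPLE_ROW))
        writer.writeheader()
        writer.writerow(SAMPLE_ROW)
    return path


def show_first_row(csv_path) -> None:
    """Read the CSV with a local Spark session and print the first row."""
    # Imported lazily so the CSV helper works without pyspark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("test-spark").master("local").getOrCreate()
    )
    try:
        data = spark.read.csv(str(csv_path), header=True)
        data.show(1, False, vertical=True)
    finally:
        spark.stop()


def run_smoke_test() -> None:
    """Create the sample file in a temporary directory and display it."""
    with tempfile.TemporaryDirectory() as tmp:
        show_first_row(write_sample_csv(tmp))
```

Calling `run_smoke_test()` from the `pyspark` conda environment should print a single vertical record with the same eight columns as above.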