Hadoop Installation

Han Hanju · June 22, 2021

Creating the root Account

  • Log in with a regular account
  • Set a password for the root account
sudo passwd root
  • Edit /etc/ssh/sshd_config
PermitRootLogin yes
  • Apply the change
systemctl restart sshd
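To check that the change actually took effect, sshd's effective configuration can be queried (a quick sanity check, not part of the original steps):

```shell
# Print the effective sshd setting; should output "permitrootlogin yes"
sshd -T | grep -i permitrootlogin
```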

Changing the Java Version

  • Check the installed Java versions
(base) root@aidw-010:/usr/lib/jvm# ll
total 24
drwxr-xr-x   4 root root 4096 May 28 18:03 ./
drwxr-xr-x 140 root root 4096 May 26 17:42 ../
-rw-r--r--   1 root root 2047 Apr 21 18:15 .java-1.11.0-openjdk-amd64.jinfo
-rw-r--r--   1 root root 2764 Apr 21 20:46 .java-1.8.0-openjdk-amd64.jinfo
lrwxrwxrwx   1 root root   25 Jul 17  2019 default-java -> java-1.11.0-openjdk-amd64/
lrwxrwxrwx   1 root root   21 Apr 21 18:15 java-1.11.0-openjdk-amd64 -> java-11-openjdk-amd64/
lrwxrwxrwx   1 root root   20 Apr 21 20:46 java-1.8.0-openjdk-amd64 -> java-8-openjdk-amd64/
drwxr-xr-x   7 root root 4096 Apr 29 14:19 java-11-openjdk-amd64/
drwxr-xr-x   7 root root 4096 May 28 18:03 java-8-openjdk-amd64/
(base) root@aidw-010:/usr/lib/jvm#
  • Install OpenJDK 8
sudo apt update
sudo apt install openjdk-8-jdk
  • Change the default Java

(base) root@aidw-010:/usr/lib/jvm# update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                                             Priority   Status
------------------------------------------------------------
* 0            /usr/lib/jvm/java-11-openjdk-amd64/bin/java      1111      auto mode
  1            /usr/lib/jvm/java-11-openjdk-amd64/bin/java      1111      manual mode
  2            /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java   1081      manual mode

Press <enter> to keep the current choice[*], or type selection number:
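Instead of answering the prompt, the same switch can be made non-interactively; the path below is selection 2 from the menu above:

```shell
# Select OpenJDK 8 as the default java without the interactive menu
update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
# Confirm the active version
java -version
```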
  • Update the environment variable
(base) root@aidw-010:/usr/lib/jvm# vi /etc/profile

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
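The new variable only reaches the current shell after the profile is re-read; a quick check:

```shell
# Reload the profile and verify JAVA_HOME points at OpenJDK 8
source /etc/profile
echo "$JAVA_HOME"
"$JAVA_HOME/bin/java" -version
```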

SSH Configuration

  • Generate an SSH key pair on each node where a NameNode will be installed.
(base) root@aidw-001:~/.ssh# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa
Your public key has been saved in /root/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:y1tbQQ19ZPlfiTjy5f4qJTSYaHDjHU4OlaoeFXa5fco root@aidw-004
The key's randomart image is:
+---[RSA 3072]----+
|        ..o .. .+|
|     . * *   o.o.|
|      = % * o o.o|
|       * O B + .o|
|      + S = O   o|
|     o . . E +  .|
|    . . o . =    |
|     .   o + .   |
|        . . ..o. |
+----[SHA256]-----+

(base) root@aidw-001:~/.ssh# ll
total 28K
drwx------  3 root root 4.0K Jun 22 14:18 ./
drwx------ 44 root root 4.0K Jun 22 14:09 ../
drwxr-xr-x  2 root root 4.0K Jun 22 14:18 back/
-rw-------  1 root root 2.6K Jun 22 14:18 id_rsa
-rw-r--r--  1 root root  567 Jun 22 14:18 id_rsa.pub
-rw-r--r--  1 root root 4.4K May  7 15:57 known_hosts
  • Copy the public key to the server you will connect to over SSH.
    • The ssh-copy-id command appends the public key to the authorized_keys file in the target server's .ssh directory, and is run in the following form.
root@aidw-001:~/.ssh# ssh-copy-id -i ~/.ssh/id_rsa.pub root@aidw-004
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@aidw-004's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'root@aidw-004'"
and check to make sure that only the key(s) you wanted were added.

root@aidw-001:~/.ssh# ssh aidw-004
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-74-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

94 updates can be applied immediately.
12 of these updates are standard security updates.
To see these additional updates run: apt list --upgradable

Last login: Tue Jun 22 14:09:22 2021 from 1.209.179.131
(base) root@aidw-004:~#
  • You can confirm that authorized_keys on aidw-004 now contains the public key.
drwx------  3 root root 4.0K Jun 22 14:23 ./
drwx------ 44 root root 4.0K Jun 22 14:09 ../
-rw-------  1 root root  567 Jun 22 14:23 authorized_keys
drwxr-xr-x  2 root root 4.0K Jun 22 14:18 back/
-rw-------  1 root root 2.6K Jun 22 14:18 id_rsa
-rw-r--r--  1 root root  567 Jun 22 14:18 id_rsa.pub
-rw-r--r--  1 root root 4.4K May  7 15:57 known_hosts
(base) root@aidw-004:~/.ssh#
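With more than one target node, the ssh-copy-id step can be wrapped in a loop; the host list below is an assumption based on the node names used in this post:

```shell
# Distribute the master's public key to every other node
for host in aidw-002 aidw-003 aidw-004 aidw-005; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub "root@${host}"
done
```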

Installing Hadoop 3.3.0

  • Download
    • Download server: master
    • Location: /usr/local
cd /usr/local; \
sudo wget http://apache.mirror.cdnetworks.com/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz; \
sudo tar xzvf hadoop-3.3.0.tar.gz; \
sudo rm -rf hadoop-3.3.0.tar.gz; \
sudo mv hadoop-3.3.0 hadoop
  • Set the environment variables (on all servers)
  [root@hadoop-master ~]# vim /etc/profile

  export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
  export HADOOP_HOME="/usr/local/hadoop"
  export PATH="$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
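After reloading the profile, the installation can be sanity-checked:

```shell
# Confirm the hadoop binaries are on the PATH
source /etc/profile
hadoop version   # should report Hadoop 3.3.0
```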

Hadoop Configuration

  • Node: master
  • Location: /usr/local/hadoop/etc/hadoop
  • Configuration files
    • core-site.xml : HDFS and MapReduce settings
    • hdfs-site.xml : HDFS settings
    • yarn-site.xml : YARN settings
    • mapred-site.xml : MapReduce settings
    • hadoop-env.sh : shell environment variables needed when running Hadoop

  • core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/dw/hadoop/tmp</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>aidw-001:2181,aidw-004:2181,aidw-003:2181</value>
    </property>
</configuration>
  • hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/dw/hadoop/namenode</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/dw/hadoop/datanode</value>
    </property>
    <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/dw/hadoop/journalnode</value>
    </property>

    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>
    
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>aidw-001:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>aidw-004:8020</value>
    </property>

    <property>
      <name>dfs.namenode.http-address.mycluster.nn1</name>
      <value>aidw-001:9870</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.mycluster.nn2</name>
      <value>aidw-004:9870</value>
    </property>

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://aidw-001:8485;aidw-004:8485;aidw-003:8485/mycluster</value>
    </property>
    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>shell(/bin/true)</value>
    </property>

    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
</configuration>
  • yarn-env.sh
    • Add the line below
JAVA_HEAP_MAX=-Xmx1000m
  • yarn-site.xml
<?xml version="1.0"?>

<configuration>

	<!-- Site specific YARN configuration properties -->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>
	<property>
		<name>yarn.nodemanager.local-dirs</name>
		<value>/home/hadoop/hadoopdata/yarn/nm-local-dir</value>
	</property>
	<property>
		<name>yarn.resourcemanager.fs.state-store.uri</name>
		<value>/home/hadoop/hadoopdata/yarn/system/rmstore</value>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>aidw-001</value>
	</property>
 	<property>
		<name>yarn.web-proxy.address</name>
		<value>0.0.0.0:8089</value>
	</property>
  <property>
    <name>yarn.application.classpath</name>
    <value>
      /usr/local/hadoop/share/hadoop/mapreduce/*,
      /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
      /usr/local/hadoop/share/hadoop/common/*,
      /usr/local/hadoop/share/hadoop/common/lib/*,
      /usr/local/hadoop/share/hadoop/hdfs/*,
      /usr/local/hadoop/share/hadoop/hdfs/lib/*,
      /usr/local/hadoop/share/hadoop/yarn/*,
      /usr/local/hadoop/share/hadoop/yarn/lib/*
    </value>
  </property>

	<!-- for Resource Manager HA configuration -->
	<property>
		<name>yarn.resourcemanager.ha.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>yarn.resourcemanager.cluster-id</name>
		<value>cluster1</value>
	</property>
	<property>
		<name>yarn.resourcemanager.ha.rm-ids</name>
		<value>rm1,rm2</value>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname.rm1</name>
		<value>aidw-001</value>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname.rm2</name>
		<value>aidw-004</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address.rm1</name>
		<value>aidw-001:8088</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address.rm2</name>
		<value>aidw-004:8088</value>
	</property>
	<property>
		<name>hadoop.zk.address</name>
		<value>aidw-001:2181,aidw-004:2181,aidw-003:2181</value>
	</property>

</configuration>
  • mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
  </property>
</configuration>
  • hadoop-env.sh
    • JAVA_HOME is required; the rest are optional.
  export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
  export HADOOP_HOME="/usr/local/hadoop"
  export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
  export HADOOP_LOG_DIR="$HADOOP_HOME/logs"
  export HADOOP_PID_DIR="$HADOOP_HOME/pids"

Creating the Hadoop Folders

root@aidw-001:/dw/hadoop# ll
total 16
drwxr-xr-x 4 root root 4096 Jun 22 11:28 ./
drwxr-xr-x 7 aidw aidw 4096 Jun  9 17:53 ../
drwx------ 3 root root 4096 Jun 22 15:04 datanode/
drwxr-xr-x 3 root root 4096 Jun 22 15:04 namenode/
root@aidw-001:/dw/hadoop#
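The directories shown above, plus the journalnode and tmp paths referenced in core-site.xml and hdfs-site.xml, can be created on each node with:

```shell
# Create the HDFS directories configured earlier
mkdir -p /dw/hadoop/namenode \
         /dw/hadoop/datanode \
         /dw/hadoop/journalnode \
         /dw/hadoop/tmp
```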

Deploying Hadoop

  • Send the configured Hadoop files to the worker nodes.
root@aidw-001:/dw/hadoop# scp -r /usr/local/hadoop/ root@aidw-005:/usr/local
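To push the same tree to every worker in one go (the host list is an assumption based on this post's naming):

```shell
# Copy the configured Hadoop installation to each worker node
for host in aidw-002 aidw-003 aidw-004 aidw-005; do
    scp -r /usr/local/hadoop/ "root@${host}:/usr/local"
done
```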

Worker Configuration

  • Node: master
  • 위치: /usr/local/hadoop/etc/hadoop/workers
aidw-001 ## master
aidw-002
aidw-003

Master Configuration

  • Node: master
  • 위치: /usr/local/hadoop/etc/hadoop/masters
aidw-001

Starting Hadoop

  • Node: master
  • Format the filesystem
  [root@aidw-001 hadoop]# hadoop namenode -format
  WARNING: Use of this script to execute namenode is deprecated.
  WARNING: Attempting to execute replacement "hdfs namenode" instead.

  WARNING: /usr/local/hadoop/pids does not exist. Creating.
  WARNING: /usr/local/hadoop/logs does not exist. Creating.
  2021-03-09 04:37:15,754 INFO namenode.NameNode: STARTUP_MSG:
  /************************************************************
  STARTUP_MSG: Starting NameNode
  STARTUP_MSG:   host = hadoop-master/10.0.0.5
  STARTUP_MSG:   args = [-format]
  STARTUP_MSG:   version = 3.3.0
  ...
  • Start Hadoop
  [hadoop@hadoop-master logs]$ start-all.sh
  WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
  WARNING: This is not a recommended production deployment configuration.
  WARNING: Use CTRL-C to abort.
  Starting namenodes on [hadoop-master]
  Starting datanodes
  Starting secondary namenodes [hadoop-master]
  Starting resourcemanager
  Starting nodemanagers
  • Verify normal operation at MasterIP:9870 and MasterIP:8088.
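A quick way to confirm which daemons came up on a node, and to check the HA state configured earlier:

```shell
# List running Hadoop JVMs on this node (NameNode, DataNode, ResourceManager, ...)
jps
# With automatic failover enabled, check which NameNode is active
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# Same for the ResourceManager HA pair
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```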

Reference

https://excelsior-cjh.tistory.com/73 (applying ZooKeeper)

https://nasa1515.github.io/data/2021/03/08/hadoop.html

https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html (HA example)

https://m.blog.naver.com/PostView.naver?isHttpsRedirect=true&blogId=juner84&logNo=220489980731 (HA example)

https://oboki.net/workspace/data-engineering/hadoop/%EA%B0%80%EC%9A%A9%EC%84%B1%EC%9D%84-%EA%B3%A0%EB%A0%A4%ED%95%9C-hadoop-2-x-cluster-%EC%84%A4%EC%B9%98/ (HA example)

https://tdoodle.tistory.com/entry/Hadoop-Resource-Manager-HA-%EA%B5%AC%EC%84%B1%ED%95%98%EA%B8%B0 (HA example)
