Install a Hadoop cluster across five machines.
Hadoop version: 3.3.0
| server01 | server02 | server03 | server04 | server05 |
|---|---|---|---|---|
| NameNode | SecondaryNameNode | | | |
| NodeManager | DataNode | DataNode | DataNode | DataNode |
| ResourceManager | | | | |
| JobHistoryServer | | | | |
> sudo apt-get install openjdk-8-jdk -y
/etc/hosts file
xxx.xxx.xxx.xxx server01
xxx.xxx.xxx.xxx server02
xxx.xxx.xxx.xxx server03
xxx.xxx.xxx.xxx server04
xxx.xxx.xxx.xxx server05
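The same five entries must be present in /etc/hosts on every server. One way to append them (run on each machine, replacing the placeholder addresses with the real ones):
> cat <<'EOF' | sudo tee -a /etc/hosts
xxx.xxx.xxx.xxx server01
xxx.xxx.xxx.xxx server02
xxx.xxx.xxx.xxx server03
xxx.xxx.xxx.xxx server04
xxx.xxx.xxx.xxx server05
EOF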
> sudo apt-get install openssh-server -y
> sudo service ssh start
> ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
> cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
> chmod 0600 ~/.ssh/authorized_keys
Merge the contents of the ~/.ssh/id_rsa.pub files from all servers and append the combined result to the ~/.ssh/authorized_keys file on each server.
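A minimal sketch of that aggregation, run from server01 after every server has generated its key pair; it assumes the same account name on all hosts, and each ssh/scp call will still prompt for a password at this stage:
> for host in server02 server03 server04 server05; do ssh $host 'cat ~/.ssh/id_rsa.pub' >> ~/.ssh/authorized_keys; done
> for host in server02 server03 server04 server05; do scp ~/.ssh/authorized_keys $host:~/.ssh/authorized_keys; done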
> sudo curl -o /usr/local/hadoop-3.3.0.tar.gz https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
> sudo mkdir -p /usr/local/hadoop && sudo tar -xvzf /usr/local/hadoop-3.3.0.tar.gz -C /usr/local/hadoop --strip-components 1
> sudo rm -rf /usr/local/hadoop-3.3.0.tar.gz
> sudo chown -R $USER:$USER /usr/local/hadoop
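The download, extraction, and ownership change above must be repeated on every server. A rough loop sketch, assuming passwordless SSH from the previous step and that sudo can prompt for a password through ssh -t (otherwise just run the same commands on each machine by hand):
> for host in server02 server03 server04 server05; do ssh -t $host 'sudo curl -o /usr/local/hadoop-3.3.0.tar.gz https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz && sudo mkdir -p /usr/local/hadoop && sudo tar -xzf /usr/local/hadoop-3.3.0.tar.gz -C /usr/local/hadoop --strip-components 1 && sudo rm -rf /usr/local/hadoop-3.3.0.tar.gz && sudo chown -R $USER:$USER /usr/local/hadoop'; done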
~/.bashrc file
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
HADOOP_HOME=/usr/local/hadoop
YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export JAVA_HOME HADOOP_HOME YARN_CONF_DIR HADOOP_CONF_DIR
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
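Reload the shell configuration and check that Java and the Hadoop binaries are found:
> source ~/.bashrc
> java -version
> hadoop version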
$HADOOP_HOME/etc/hadoop/core-site.xml file
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://server01:9000</value>
</property>
</configuration>
$HADOOP_HOME/etc/hadoop/hdfs-site.xml file
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/data/datanode</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>server02:50090</value>
</property>
</configuration>
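If you prefer to create the data directories up front rather than letting the daemons create them (they live under /usr/local/hadoop, which was already chown'ed to your user):
> mkdir -p /usr/local/hadoop/data/namenode
> mkdir -p /usr/local/hadoop/data/datanode
The namenode directory only matters on server01 and the datanode directory on server02–server05, but creating both everywhere is harmless.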
$HADOOP_HOME/etc/hadoop/yarn-site.xml file
Resources that can be allocated to a single container:
yarn.scheduler.maximum-allocation-vcores
yarn.scheduler.minimum-allocation-vcores
yarn.scheduler.maximum-allocation-mb
yarn.scheduler.minimum-allocation-mb
Total vcores and memory that each node in the cluster can offer for running containers (when setting the memory maximum, subtract the capacity needed to run the node's OS, roughly 4 GB):
yarn.nodemanager.resource.cpu-vcores
yarn.nodemanager.resource.memory-mb
An illustrative snippet with example values for these properties follows the base configuration below.
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>server01</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
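An illustrative set of values for the resource properties listed above, assuming worker nodes with 8 vcores and 16 GB of RAM; these go inside the <configuration> block, and the numbers are examples only, so adjust them to your hardware:
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>8</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>12288</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>12288</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>8</value>
</property>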
$HADOOP_HOME/etc/hadoop/mapred-site.xml file
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
</configuration>
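Optionally, since the JobHistoryServer will be started on server01 later in this guide, its addresses can be pinned there as well (added inside the <configuration> block; if omitted, Hadoop defaults to 0.0.0.0 with the same ports):
<property>
<name>mapreduce.jobhistory.address</name>
<value>server01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>server01:19888</value>
</property>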
$HADOOP_HOME/etc/hadoop/hadoop-env.sh file
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
$HADOOP_HOME/etc/hadoop/workers file
server02
server03
server04
server05
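Every server needs the same configuration. A minimal sketch for pushing the edited files from server01 to the other nodes (paths are identical on all machines); remember that the ~/.bashrc additions are needed on every server as well:
> for host in server02 server03 server04 server05; do scp $HADOOP_HOME/etc/hadoop/{core-site.xml,hdfs-site.xml,yarn-site.xml,mapred-site.xml,hadoop-env.sh,workers} $host:$HADOOP_HOME/etc/hadoop/; done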
> $HADOOP_HOME/bin/hdfs namenode -format -force
> $HADOOP_HOME/sbin/start-dfs.sh
> $HADOOP_HOME/sbin/start-yarn.sh
> $HADOOP_HOME/bin/mapred --daemon start historyserver
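To confirm that everything came up, check the Java processes on each node and ask HDFS and YARN for their reports (the Hadoop 3.x web UIs are http://server01:9870 for the NameNode, http://server01:8088 for the ResourceManager, and http://server01:19888 for the JobHistoryServer):
> jps
> hdfs dfsadmin -report
> yarn node -list
To stop the cluster: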
> $HADOOP_HOME/sbin/stop-dfs.sh
> $HADOOP_HOME/sbin/stop-yarn.sh
> $HADOOP_HOME/bin/mapred --daemon stop historyserver
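To wipe HDFS and start over (for example before re-formatting the NameNode), clear the data directories on every node and then re-run the format command: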
> rm -rf $HADOOP_HOME/data/namenode/*
> rm -rf $HADOOP_HOME/data/datanode/*