Hadoop Multi-Node Installation on CentOS 6.x (Non-Secure Mode)
A JDK must already be installed.
The installation itself is the same as in the single-node guide; only the configuration files differ.
We have four CentOS virtual machines.
Required packages, to be installed on every machine in the cluster:
# yum install -y \
openssh-clients
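A quick, optional sanity check that the package actually landed on each box:
# rpm -q openssh-clients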
The network config is different on every host, but looks roughly like this (example for hadoopmaster1):
# vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoopmaster1.localdomain
# vi /etc/hosts
192.168.1.10 hadoopmaster1.localdomain hadoopmaster1
192.168.1.11 hadoopslave1.localdomain hadoopslave1
192.168.1.12 hadoopslave2.localdomain hadoopslave2
192.168.1.13 hadoopslave3.localdomain hadoopslave3
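Before going further, it is worth checking that every node resolves all four names; a quick check you can run on each host:
$ for h in hadoopmaster1 hadoopslave1 hadoopslave2 hadoopslave3; do getent hosts $h; done
Each iteration should print the address from /etc/hosts above; a missing line means that host's entry was mistyped.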
Enable passwordless SSH to localhost (on hadoopmaster1, hadoopslave1, hadoopslave2, hadoopslave3):
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 0700 ~/.ssh
$ chmod 0600 ~/.ssh/authorized_keys
$ ssh localhost
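As an alternative to appending the key by hand, openssh-clients ships ssh-copy-id, which performs the same append over SSH:
$ ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@localhost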
Enable passwordless SSH between hadoopmaster1 and each of hadoopslave1, hadoopslave2, hadoopslave3. The three exchanges below are identical in shape; a scripted version is sketched after them.
hadoopslave1
$ scp ~/.ssh/id_dsa.pub hadoop@hadoopmaster1:/tmp/id_dsa_slave1.pub
hadoopmaster1
$ cat /tmp/id_dsa_slave1.pub >> ~/.ssh/authorized_keys
$ scp ~/.ssh/id_dsa.pub hadoop@hadoopslave1:/tmp/id_dsa_master1.pub
hadoopslave1
$ cat /tmp/id_dsa_master1.pub >> ~/.ssh/authorized_keys
$ ssh hadoopmaster1
$ ssh hadoopslave1
=====
hadoopslave2
$ scp ~/.ssh/id_dsa.pub hadoop@hadoopmaster1:/tmp/id_dsa_slave2.pub
hadoopmaster1
$ cat /tmp/id_dsa_slave2.pub >> ~/.ssh/authorized_keys
$ scp ~/.ssh/id_dsa.pub hadoop@hadoopslave2:/tmp/id_dsa_master1.pub
hadoopslave2
$ cat /tmp/id_dsa_master1.pub >> ~/.ssh/authorized_keys
$ ssh hadoopmaster1
$ ssh hadoopslave2
=====
hadoopslave3
$ scp ~/.ssh/id_dsa.pub hadoop@hadoopmaster1:/tmp/id_dsa_slave3.pub
hadoopmaster1
$ cat /tmp/id_dsa_slave3.pub >> ~/.ssh/authorized_keys
$ scp ~/.ssh/id_dsa.pub hadoop@hadoopslave3:/tmp/id_dsa_master1.pub
hadoopslave3
$ cat /tmp/id_dsa_master1.pub >> ~/.ssh/authorized_keys
$ ssh hadoopmaster1
$ ssh hadoopslave3
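The three exchanges above can also be scripted from hadoopmaster1; a sketch assuming the hadoop user and the hostnames from /etc/hosts (each ssh-copy-id prompts for the remote password once):
$ for h in hadoopslave1 hadoopslave2 hadoopslave3; do
>   ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@$h
>   ssh hadoop@$h "ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@hadoopmaster1"
> done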
hadoopmaster1
$ mkdir -p ~/hadoop_data/hdfs/namenode
$ vi /opt/hadoop/2.7.1/etc/hadoop/hdfs-site.xml
***
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hadoop_data/hdfs/namenode</value>
</property>
</configuration>
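A malformed XML file is a common reason for the NameNode refusing to start; if xmllint (from the libxml2 package) is available, you can lint the file after editing:
$ xmllint --noout /opt/hadoop/2.7.1/etc/hadoop/hdfs-site.xml
No output means the file is well-formed.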
hadoopslave1, hadoopslave2, hadoopslave3
$ mkdir -p ~/hadoop_data/hdfs/datanode
$ vi /opt/hadoop/2.7.1/etc/hadoop/hdfs-site.xml
***
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hadoop_data/hdfs/datanode</value>
</property>
</configuration>
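With passwordless SSH already in place, the datanode directories can instead be created in one pass from hadoopmaster1; a sketch using the hostnames above:
$ for h in hadoopslave1 hadoopslave2 hadoopslave3; do ssh hadoop@$h 'mkdir -p ~/hadoop_data/hdfs/datanode'; done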
On all machines in the cluster (hadoopmaster1, hadoopslave1, hadoopslave2, hadoopslave3):
$ vi /opt/hadoop/2.7.1/etc/hadoop/core-site.xml
***
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoopmaster1:9000</value>
</property>
</configuration>
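You can confirm which filesystem URI Hadoop actually picked up with hdfs getconf:
$ hdfs getconf -confKey fs.defaultFS
hdfs://hadoopmaster1:9000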
$ vi /opt/hadoop/2.7.1/etc/hadoop/yarn-site.xml
***
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoopmaster1:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoopmaster1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoopmaster1:8050</value>
</property>
</configuration>
$ cp /opt/hadoop/2.7.1/etc/hadoop/mapred-site.xml.template /opt/hadoop/2.7.1/etc/hadoop/mapred-site.xml
$ vi /opt/hadoop/2.7.1/etc/hadoop/mapred-site.xml
***
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
$ vi /opt/hadoop/2.7.1/etc/hadoop/masters
hadoopmaster1
$ vi /opt/hadoop/2.7.1/etc/hadoop/slaves
hadoopslave1
hadoopslave2
hadoopslave3
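Since core-site.xml, yarn-site.xml and mapred-site.xml must be identical on every node, it is convenient to edit them once on hadoopmaster1 and push them out; a sketch with scp (hdfs-site.xml is deliberately left out, since it differs between master and slaves):
$ for h in hadoopslave1 hadoopslave2 hadoopslave3; do
>   scp /opt/hadoop/2.7.1/etc/hadoop/{core-site.xml,yarn-site.xml,mapred-site.xml} hadoop@$h:/opt/hadoop/2.7.1/etc/hadoop/
> done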
Startup on hadoopmaster1
$ hdfs namenode -format
$ start-dfs.sh
$ start-yarn.sh
$ jps
3744 SecondaryNameNode
4005 Jps
2169 NameNode
1599 ResourceManager
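To check that all three DataNodes registered with the NameNode, query the cluster report on hadoopmaster1; you should see a line like the one below:
$ hdfs dfsadmin -report | grep 'Live datanodes'
Live datanodes (3):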
hadoopslave1, hadoopslave2, hadoopslave3
$ jps
1552 NodeManager
1265 DataNode
1739 Jps
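Similarly, the ResourceManager should now report all three NodeManagers (Total Nodes:3 in the output):
$ yarn node -list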
You can now connect with a browser and check how many nodes are in the cluster.
NameNode web UI (Summary)
http://192.168.1.10:50070/
SecondaryNameNode web UI
http://192.168.1.10:50090/
ResourceManager web UI (All Applications)
http://192.168.1.10:8088/
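As a final smoke test, you can run the bundled pi example; the jar path below assumes the 2.7.1 tarball layout used throughout this guide:
$ hadoop jar /opt/hadoop/2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 2 10
If the job finishes and prints an estimate of Pi, HDFS and YARN are both working end to end.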
Links:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html
http://www.youtube.com/watch?v=DteSiloXesw