Cluster Planning and Environment
Install VirtualBox on Windows, install a CentOS 7 64-bit virtual machine from the CentOS image and clone it, for a total of four VMs. The CentOS account is root with password 123456. The IP addresses, roles, and hostnames are:
IP | Role | Hostname |
---|---|---|
192.168.11.10 | namenode | master.hadoop |
192.168.11.11 | datanode | slaver1.hadoop |
192.168.11.12 | datanode | slaver2.hadoop |
192.168.11.13 | datanode | slaver3.hadoop |
The namenode does not store HDFS data; data blocks live only on the datanode nodes.
Download the required software: jdk-8u45-linux-x64.tar.gz and hadoop-2.2.0.tar.gz (download: http://apache.fayea.com/apache-mirror/hadoop/common/stable/hadoop-2.2.0.tar.gz).
In the BIOS, set Intel Virtualization Technology to Enabled.
Configuring CentOS 7
- Configure a static IP for CentOS 7
In the CentOS virtual machine console, edit the /etc/sysconfig/network-scripts/ifcfg-ens33 configuration file.
$ vi /etc/sysconfig/network-scripts/ifcfg-ens33
Set the following values.
TYPE=Ethernet
BOOTPROTO=static
NAME=ens33
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.11.10
GATEWAY=192.168.11.2
NETMASK=255.255.255.0
DNS1=192.168.11.2
Edit the /etc/resolv.conf file,
$ vi /etc/resolv.conf
and add the following line.
nameserver 192.168.11.2
After editing the configuration files, restart the network:
$ service network restart
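After the restart, the address and the gateway can be checked with the following commands (a quick verification, not part of the original steps):
$ ip addr show ens33
$ ping -c 3 192.168.11.2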
- Disable the firewall and SELinux
1) Disable the firewall:
$ systemctl stop firewalld.service #stop firewalld
$ systemctl disable firewalld.service #keep firewalld from starting at boot
2) Disable SELinux by editing its configuration file:
$ vi /etc/sysconfig/selinux
Set SELINUX=disabled, then run the following command to turn SELinux off for the current session.
$ setenforce 0
After changing the SELinux configuration, it is best to reboot the machine so the change takes full effect.
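To confirm both changes after the reboot (a quick check, not in the original steps):
$ systemctl status firewalld.service   # should report inactive (dead) and disabled
$ getenforce                           # should print Disabled (or Permissive before the reboot)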
Hosts File Configuration
Configure the /etc/hosts file.
[root@localhost ~]# vi /etc/hosts
Add the following lines:
192.168.11.10 master.hadoop
192.168.11.11 slaver1.hadoop
192.168.11.12 slaver2.hadoop
192.168.11.13 slaver3.hadoop
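A quick way to confirm that the new entries resolve (an optional check, not in the original steps):
[root@localhost ~]# ping -c 1 master.hadoop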
Permanently Changing the Hostname
Permanently change the hostname on CentOS.
[root@localhost ~]# vi /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=localhost.localdomain
GATEWAY=192.168.0.1
Change the HOSTNAME entry in the network file. The part before the first dot is the host name and the part after it is the domain name; if there is no dot, the whole value is the host name. After the change the file looks like this:
[root@localhost ~]# cat /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=master.hadoop
GATEWAY=192.168.0.1
Alternatively, configure each type of node directly. On the namenode:
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=master.hadoop
GATEWAY=192.168.11.2
On the datanode nodes (slaver1 shown as an example):
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=slaver1.hadoop
GATEWAY=192.168.11.2
Reboot the machine for the new hostname to take effect.
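Note that on CentOS 7 the hostname is managed by systemd and stored in /etc/hostname, so the HOSTNAME entry in /etc/sysconfig/network is not applied by the operating system on its own. The command below sets it directly and takes effect without a reboot (shown for master; use the matching name on each slaver):
[root@localhost ~]# hostnamectl set-hostname master.hadoop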
Disabling IPv6
1. Edit /etc/sysconfig/network:
[root@master ~]# vi /etc/sysconfig/network
and append:
NETWORKING_IPV6=no
2. Edit /etc/hosts:
[root@master ~]# vi /etc/hosts
and comment out the IPv6 loopback entry:
#::1 localhost localhost6 localhost6.localdomain6
3. Keep the system from loading the IPv6 kernel module by editing the modprobe configuration and adding the following line:
[root@master ~]# vi /etc/modprobe.d/anaconda.conf
install ipv6 /bin/true
4. Reboot the system, then verify:
[root@master ~]# lsmod | grep -i ipv6
[root@master ~]# ifconfig | grep -i inet6
If neither command produces any output, IPv6 has been completely disabled.
Creating the hduser User and hadoop Group
1. Create the group and the user
[root@master ~]# groupadd hadoop
[root@master ~]# useradd -g hadoop hduser
2. Set the password
[root@master ~]# passwd hduser
Here the password is set to 123456.
3. Give hduser sudo privileges. If running a sudo command produces a message such as:
hduser is not in the sudoers file. This incident will be reported.
fix it as follows:
1) Switch to the superuser: run "su -" and enter the root password.
2) Make the sudoers file writable: run "chmod u+w /etc/sudoers".
3) Edit /etc/sudoers: run "vi /etc/sudoers", press "i" to enter insert mode, find the line "root ALL=(ALL) ALL" and add a matching line for your own user below it, then press Esc and type ":wq" to save and exit. For this setup the added line is:
hduser ALL=(ALL) ALL
4) Remove the write permission again: run "chmod u-w /etc/sudoers".
4. Verify the elevated privileges
[root@master ~]# su hduser
[hduser@master root]$ cd ~
[hduser@master ~]$ sudo -i
[sudo] password for hduser:
[root@master ~]#
Installing the SSH Client
1. Install the SSH client (openssh-clients) in the virtual machine.
[root@master ~]# yum install -y openssh-clients
Test that master.hadoop can be reached:
[root@master ~]# ssh root@master.hadoop
2. On the local (Windows) host, install SSHSecureShellClient-3.2.9.zip and verify that it can connect to 192.168.11.10.
Passwordless SSH Login
1. Generate a key pair on every machine by running the key-generation command (see the sketch below).
This creates a hidden .ssh directory under the current user's home directory.
Inside .ssh you will find id_rsa and id_rsa.pub. Run the command on every Hadoop node.
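The original text does not show the key-generation command itself; a minimal sketch, assuming an RSA key with an empty passphrase generated as hduser, is:
[hduser@master ~]$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa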
2. Append the contents of id_rsa.pub from every node to a single authorized_keys file.
Do this for node master, slaver1, slaver2 and slaver3.
Edit the authorized_keys file:
[root@master~]# vi authorized_keys
Log in as hduser and copy .ssh/authorized_keys into the .ssh directory on slaver1, slaver2 and slaver3 (one way to do this from master is sketched after step 3).
3. Give the file owner-only permissions (hduser):
chmod 600 authorized_keys
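One way to carry out steps 2 and 3 from the master node (a sketch only, assuming hduser, the hostnames above, and that every node already has its own key pair):
[hduser@master ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hduser@master ~]$ ssh hduser@slaver1.hadoop cat .ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# repeat the previous line for slaver2.hadoop and slaver3.hadoop, then push the merged file to every slaver
[hduser@master ~]$ scp ~/.ssh/authorized_keys hduser@slaver1.hadoop:~/.ssh/
# finally run the chmod 600 from step 3 on every node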
Installing Java
Log in with the hduser account. Download: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Installation reference: http://docs.oracle.com/javase/8/docs/technotes/guides/install/linux_jdk.html#BJFGGEFG
Remove the JDK that ships with CentOS 7 before installing JDK 1.8 (reference: https://blog.csdn.net/hui_2016/article/details/69941850). First list the bundled Java packages:
rpm -qa | grep java
Then delete each bundled JDK package with rpm -e --nodeps followed by the package name.
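A hedged example of the removal; package names vary between CentOS images, so verify the rpm -qa output before deleting anything:
# run as root (or via sudo): removes every OpenJDK package found above (example only)
for pkg in $(rpm -qa | grep -E 'java-1\.[78]\.0-openjdk'); do rpm -e --nodeps "$pkg"; done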
1. Unpack the JDK archive
[hduser@master ~]$ tar zxvf jdk-8u45-linux-x64.tar.gz
Run the following command only when the /usr/local/java directory does not exist; if it does, delete it first.
[hduser@master ~]$ sudo mv jdk1.8.0_45 /usr/local/java
Set the ownership of the java directory:
[hduser@master ~]$ sudo chown -R hduser:hadoop /usr/local/java
2. Set the environment variables
[hduser@master ~]$ sudo vi /etc/profile
Append the following variables:
export JAVA_HOME=/usr/local/java/
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Apply them immediately:
[hduser@master ~]$ source /etc/profile
3. Verify the Java installation; output like the following indicates success.
[hduser@master ~]$ java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) Client VM (build 25.45-b02, mixed mode)
Creating the Required Directories
[hduser@master ~]$ sudo mkdir -p /app/hadoop
[hduser@master ~]$ sudo chown -R hduser:hadoop /app/hadoop
[hduser@master ~]$ mkdir -p /app/hadoop/dfs/name
# ...and if you want to tighten up security, chmod from 755 to 750...
[hduser@master ~]$ chmod 750 /app/hadoop/dfs/name
[hduser@master ~]$ mkdir -p /app/hadoop/dfs/data
# ...and if you want to tighten up security, chmod from 755 to 750...
[hduser@master ~]$ chmod 750 /app/hadoop/dfs/data
[hduser@master ~]$ mkdir -p /app/hadoop/tmp
# ...and if you want to tighten up security, chmod from 755 to 750...
[hduser@master ~]$ chmod 750 /app/hadoop/tmp
[hduser@master ~]$ mkdir -p /app/hadoop/tmp/node
# ...and if you want to tighten up security, chmod from 755 to 750...
[hduser@master ~]$ chmod 750 /app/hadoop/tmp/node
[hduser@master ~]$ mkdir -p /app/hadoop/tmp/app-logs
# ...and if you want to tighten up security, chmod from 755 to 750...
[hduser@master ~]$ chmod 750 /app/hadoop/tmp/app-logs
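Note that the datanodes need the same directory layout (HDFS blocks go under /app/hadoop/dfs/data and hadoop.tmp.dir points at /app/hadoop/tmp, as configured later in this guide), so repeat the setup on every slaver. A minimal sketch for one slaver, assuming the same hduser account and hadoop group exist there:
[hduser@slaver1 ~]$ sudo mkdir -p /app/hadoop/dfs/data /app/hadoop/tmp
[hduser@slaver1 ~]$ sudo chown -R hduser:hadoop /app/hadoop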
Installing Hadoop
Download: http://apache.claz.org/hadoop/common/hadoop-2.2.0/
1. Unpack the Hadoop archive
[hduser@master ~]$ tar zxvf hadoop-2.2.0.tar.gz
Run the following commands only when the /usr/local/hadoop directory does not exist; if it does, delete it first.
[hduser@master ~]$ sudo mv hadoop-2.2.0 /usr/local/hadoop
[hduser@master ~]$ sudo chown -R hduser:hadoop /usr/local/hadoop
[hduser@master ~]$ vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Modify it as follows:
# The java implementation to use. Required.
export JAVA_HOME=/usr/local/java
[hduser@master ~]$ vi $HADOOP_HOME/etc/hadoop/yarn-env.sh
Modify it as follows:
# some Java parameters
export JAVA_HOME=/usr/local/java
[hduser@master ~]$ vi $HADOOP_HOME/etc/hadoop/mapred-env.sh
Modify it as follows:
# some Java parameters
export JAVA_HOME=/usr/local/java
Edit $HADOOP_HOME/etc/hadoop/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master.hadoop:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
<description></description>
</property>
<property>
<name>hadoop.proxyuser.hduser.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hduser.groups</name>
<value>*</value>
</property>
</configuration>
The property names hadoop.proxyuser.hduser.hosts and hadoop.proxyuser.hduser.groups depend on the user who will run Hadoop commands. The configuration above is for hduser; for root they would be hadoop.proxyuser.root.hosts and hadoop.proxyuser.root.groups.
Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master.hadoop:50060</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/app/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/app/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.http.address</name>
<value>master.hadoop:50070</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>1073741824</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml
[hduser@master ~]$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
[hduser@master ~]$ vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
Add the following content:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master.hadoop:9020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master.hadoop:9888</value>
</property>
</configuration>
Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>master.hadoop:9001</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master.hadoop:9030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master.hadoop:9088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master.hadoop:9025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master.hadoop:9040</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Edit $HADOOP_HOME/etc/hadoop/slaves
[hduser@master ~]$ vi $HADOOP_HOME/etc/hadoop/slaves
Delete localhost and add the following three lines:
slaver1.hadoop
slaver2.hadoop
slaver3.hadoop
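Because the VMs were cloned at the very beginning, make sure the JDK, the unpacked Hadoop directory, and all of the configuration files above also exist on every slaver node. If /usr/local/hadoop is already in place on the slavers, one way to push just the configuration files from master is the following (repeat for slaver2.hadoop and slaver3.hadoop; assumes hduser owns /usr/local/hadoop on every node):
[hduser@master ~]$ scp /usr/local/hadoop/etc/hadoop/* hduser@slaver1.hadoop:/usr/local/hadoop/etc/hadoop/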
Starting and Stopping Hadoop
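Before starting HDFS for the very first time, the namenode must be formatted; this is a standard Hadoop step that the original text does not show. Run it exactly once, on master, as hduser:
[hduser@master ~]$ hdfs namenode -format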
1. Start HDFS
[hduser@master ~]$ start-dfs.sh
Use the jps command to check the HDFS processes. On the master node, jps should show the following:
[hduser@master ~]$ jps
21808 NameNode
22099 Jps
21999 SecondaryNameNode
On the slaver nodes, jps should show:
[hduser@slaver1 ~]$ jps
1558 Jps
1469 DataNode
2. Start YARN
[hduser@master ~]$ start-yarn.sh
Use the jps command to check the YARN processes. On the master node, jps should show:
[hduser@master ~]$ jps
22229 Jps
21808 NameNode
22156 ResourceManager
21999 SecondaryNameNode
On the slaver nodes, jps should show:
[hduser@slaver1 ~]$ jps
1734 Jps
1621 NodeManager
1469 DataNode
3. Stop YARN
[hduser@master ~]$ stop-yarn.sh
4. Stop HDFS
[hduser@master ~]$ stop-dfs.sh
5. Start all of Hadoop
[hduser@master ~]$ start-all.sh
The start-all.sh script wraps start-dfs.sh and start-yarn.sh, so the whole cluster can be started with this single script. Check the processes with jps; on the master node you should see:
[hduser@master ~]$ jps
3319 NameNode
3704 Jps
3494 SecondaryNameNode
3632 ResourceManager
On the slaver nodes, jps should show:
[hduser@slaver1 ~]$ jps
2217 NodeManager
2324 Jps
2115 DataNode
6. Stop all of Hadoop
[hduser@master ~]$ stop-all.sh
The start and stop commands are run only on master.hadoop, never on the slaver nodes, and they must be run as hduser; running them as root leads to permission errors.
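To confirm that all three datanodes have registered with the namenode, the cluster report can also be checked:
[hduser@master ~]$ hdfs dfsadmin -report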
Starting and Stopping WebHDFS
Start WebHDFS.
[hduser@master ~]$ httpfs.sh start
Setting HTTPFS_HOME: /usr/local/hadoop
Setting HTTPFS_CONFIG: /usr/local/hadoop/etc/hadoop
Sourcing: /usr/local/hadoop/etc/hadoop/httpfs-env.sh
Setting HTTPFS_LOG: /usr/local/hadoop/logs
Setting HTTPFS_TEMP: /usr/local/hadoop/temp
Setting HTTPFS_HTTP_PORT: 14000
Setting HTTPFS_ADMIN_PORT: 14001
Setting HTTPFS_HTTP_HOSTNAME: master.hadoop
Setting CATALINA_BASE: /usr/local/hadoop/share/hadoop/httpfs/tomcat
Setting HTTPFS_CATALINA_HOME: /usr/local/hadoop/share/hadoop/httpfs/tomcat
Setting CATALINA_OUT: /usr/local/hadoop/logs/httpfs-catalina.out
Setting CATALINA_PID: /tmp/httpfs.pid
Using CATALINA_OPTS:
Adding to CATALINA_OPTS: -Dhttpfs.home.dir=/usr/local/hadoop -Dhttpfs.config.dir=/usr/local/hadoop/etc/hadoop -Dhttpfs.log.dir=/usr/local/hadoop/logs -Dhttpfs.temp.dir=/usr/local/hadoop/temp -Dhttpfs.admin.port=14001 -Dhttpfs.http.port=14000 -Dhttpfs.http.hostname=master.hadoop
Using CATALINA_BASE: /usr/local/hadoop/share/hadoop/httpfs/tomcat
Using CATALINA_HOME: /usr/local/hadoop/share/hadoop/httpfs/tomcat
Using CATALINA_TMPDIR: /usr/local/hadoop/share/hadoop/httpfs/tomcat/temp
Using JRE_HOME: /usr/local/java
Using CLASSPATH: /usr/local/hadoop/share/hadoop/httpfs/tomcat/bin/bootstrap.jar
Using CATALINA_PID: /tmp/httpfs.pid
Open the following URL in a browser.
http://192.168.11.10:14000/
If the following text is shown, the service started successfully.
HttpFs service, service base URL at /webhdfs/v1.
Check the status of a directory.
http://192.168.11.10:14000/webhdfs/v1/user?user.name=hduser&op=GETFILESTATUS
Query the directory with curl.
[hduser@master ~]$ curl -i -X GET "http://192.168.11.10:14000/webhdfs/v1/user?user.name=hduser&op=GETFILESTATUS"
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth="u=hduser&p=hduser&t=simple&e=1468111841566&s=WFsFggdWW8fURsl4OXhMjacLAKk="; Version=1; Path=/
Content-Type: application/json
Transfer-Encoding: chunked
Date: Sat, 09 Jul 2016 14:50:41 GMT
{"FileStatus":{"pathSuffix":"","type":"DIRECTORY","length":0,"owner":"hduser","group":"supergroup","permission":"755","accessTime":0,"modificationTime":1462320410882,"blockSize":0,"replication":0}}
Stop WebHDFS.
[hduser@master ~]$ httpfs.sh stop
Setting HTTPFS_HOME: /usr/local/hadoop
Setting HTTPFS_CONFIG: /usr/local/hadoop/etc/hadoop
Sourcing: /usr/local/hadoop/etc/hadoop/httpfs-env.sh
Setting HTTPFS_LOG: /usr/local/hadoop/logs
Setting HTTPFS_TEMP: /usr/local/hadoop/temp
Setting HTTPFS_HTTP_PORT: 14000
Setting HTTPFS_ADMIN_PORT: 14001
Setting HTTPFS_HTTP_HOSTNAME: master.hadoop
Setting CATALINA_BASE: /usr/local/hadoop/share/hadoop/httpfs/tomcat
Setting HTTPFS_CATALINA_HOME: /usr/local/hadoop/share/hadoop/httpfs/tomcat
Setting CATALINA_OUT: /usr/local/hadoop/logs/httpfs-catalina.out
Setting CATALINA_PID: /tmp/httpfs.pid
Using CATALINA_OPTS:
Adding to CATALINA_OPTS: -Dhttpfs.home.dir=/usr/local/hadoop -Dhttpfs.config.dir=/usr/local/hadoop/etc/hadoop -Dhttpfs.log.dir=/usr/local/hadoop/logs -Dhttpfs.temp.dir=/usr/local/hadoop/temp -Dhttpfs.admin.port=14001 -Dhttpfs.http.port=14000 -Dhttpfs.http.hostname=master.hadoop
Using CATALINA_BASE: /usr/local/hadoop/share/hadoop/httpfs/tomcat
Using CATALINA_HOME: /usr/local/hadoop/share/hadoop/httpfs/tomcat
Using CATALINA_TMPDIR: /usr/local/hadoop/share/hadoop/httpfs/tomcat/temp
Using JRE_HOME: /usr/local/java
Using CLASSPATH: /usr/local/hadoop/share/hadoop/httpfs/tomcat/bin/bootstrap.jar
Using CATALINA_PID: /tmp/httpfs.pid
Access Addresses
On the Windows host, edit the C:\Windows\System32\drivers\etc\hosts file and add the following lines:
192.168.11.10 master.hadoop
192.168.11.11 slaver1.hadoop
192.168.11.12 slaver2.hadoop
192.168.11.13 slaver3.hadoop
View the Hadoop web interfaces:
http://master.hadoop:50070/ (HDFS NameNode)
http://master.hadoop:50060/ (SecondaryNameNode, as configured in hdfs-site.xml)
http://master.hadoop:9088/cluster (YARN ResourceManager, as configured in yarn-site.xml)
Testing HDFS
1. Prepare two local text files
[hduser@master ~]$ vi dfstest1.txt
[hduser@master ~]$ vi dfstest2.txt
Write a few words into each, separated by spaces.
2. Create a directory in HDFS
[hduser@master ~]$ hdfs dfs -mkdir /myfile
3. Verify that the directory was created
[hduser@master ~]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x - hduser supergroup 0 2014-11-19 08:53 /myfile
4. Upload the local files to the HDFS directory and list it again:
[hduser@master ~]$ hdfs dfs -put /home/hduser/dfstest*.txt /myfile
[hduser@master ~]$ hdfs dfs -ls /myfile
Found 2 items
-rw-r--r-- 3 hduser supergroup 135 2014-11-19 08:56 /myfile/dfstest1.txt
-rw-r--r-- 3 hduser supergroup 280 2014-11-19 08:56 /myfile/dfstest2.txt
Testing the Classic wordcount Example
1. Run wordcount (the /myfile directory already exists from the HDFS test above, so the mkdir below can be skipped)
[hduser@master ~]$ hdfs dfs -mkdir /myfile
[hduser@master ~]$ hdfs dfs -put /home/hduser/word.txt /myfile/word.txt
[hduser@master ~]$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /myfile /myfileout
2. Check the results
[hduser@master ~]$ hdfs dfs -ls /myfileout/
Found 2 items
-rw-r--r-- 3 hduser supergroup 0 2014-11-19 09:01 /myfileout/_SUCCESS
-rw-r--r-- 3 hduser supergroup 399 2014-11-19 09:01 /myfileout/part-r-00000
View the MapReduce output:
[hduser@master ~]$ hdfs dfs -cat /myfileout/part-r-00000
Hello 1
How 1
This 1
a 1
are 2
is 1
ok 1
test 1
word, 1
you 2
This completes the deployment of the Hadoop 2.2.0 distributed cluster.
Viewing the Hadoop Logs
The logs are written to /usr/local/hadoop/logs.
About the Author
Wang Shuo: ten years of software development experience, part-time product manager, proficient in Java/Python/Go and other languages, and enjoys digging into technology. Author of 《PyQt快速开发与实战》 and 《Python 3.* 全栈开发》, with several hobby open-source projects hosted on GitHub. Feedback is welcome on Weibo: