Reposted from the web, but extremely detailed: http://blog.csdn.net/xiaojin21cen/article/details/42611073
I mainly care about the third question: the two NameNodes are in an active/standby relationship, but the DataNodes have to be shared by both, right?
Questions this post addresses:
1. What configuration gives Hadoop HA its automatic failover?
2. In the configuration, what is the difference between the mapred and mapreduce property prefixes?
3. What is the relationship between the two NameNodes in Hadoop HA?
-- Hadoop version: 2.4.0
-- Package name:
hadoop-2.4.0.tar.gz, or the source release hadoop-2.4.0-src.tar.gz (I built Hadoop, HBase and Hive from source)
-- Installation references:
http://www.netfoucs.com/article/book_mmicky/79985.html
http://www.byywee.com/page/M0/S934/934356.html
http://www.itpub.net/thread-1631536-1-1.html
http://demo.netfoucs.com/u014393917/article/details/25913363
http://www.aboutyun.com/thread-8294-1-1.html
-- "Unable to load native-hadoop library" problem
See: http://www.ercoppa.org/Linux-Com ... -hadoop-library.htm
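To confirm the native libraries are actually being picked up after applying the fix, Hadoop's built-in probe is enough (a standard command, not specific to the linked post):
# Run as the hadoop user once $HADOOP_HOME is set
$HADOOP_HOME/bin/hadoop checknative -a
# A healthy install reports "hadoop: true" along with the bundled zlib/snappy/lz4/bzip2 checks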
-- LZO support
See: http://blog.csdn.net/zhangzhaokun/article/details/17595325
http://slaytanic.blog.51cto.com/2057708/1162287/
http://hi.baidu.com/qingchunranzhi/item/3662ed5ed29d37a1adc85709
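For reference, the LZO setup in those posts boils down to two builds. The version, download URL and install prefix below are my assumptions (the post only shows the resulting files), so adjust them to your environment:
# 1. Build the lzo library itself into the prefix later referenced by LD_LIBRARY_PATH
wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.06.tar.gz
tar -zxf lzo-2.06.tar.gz && cd lzo-2.06
./configure --prefix=/usr/local/hadoop/lzo --enable-shared
make && make install
# 2. Build hadoop-lzo (yields hadoop-lzo-*.jar plus the libgplcompression.* natives)
git clone https://github.com/twitter/hadoop-lzo.git && cd hadoop-lzo
C_INCLUDE_PATH=/usr/local/hadoop/lzo/include LIBRARY_PATH=/usr/local/hadoop/lzo/lib mvn clean package -Dmaven.test.skip=true
# 3. Copy the jar and the native files produced under target/ into $HADOOP_HOME/lib/native on every node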
-- Install the following RPM packages:
yum -y install openssh*
yum -y install man*
yum -y install compat-libstdc++-33*
yum -y install libaio-0.*
yum -y install libaio-devel*
yum -y install sysstat-9.*
yum -y install glibc-2.*
yum -y install glibc-devel-2.* glibc-headers-2.*
yum -y install ksh-2*
yum -y install libgcc-4.*
yum -y install libstdc++-4.*
yum -y install libstdc++-4.*.i686*
yum -y install libstdc++-devel-4.*
yum -y install gcc-4.*x86_64*
yum -y install gcc-c++-4.*x86_64*
yum -y install elfutils-libelf-0*x86_64* elfutils-libelf-devel-0*x86_64*
yum -y install elfutils-libelf-0*i686* elfutils-libelf-devel-0*i686*
yum -y install libtool-ltdl*i686*
yum -y install ncurses*i686*
yum -y install ncurses*
yum -y install readline*
yum -y install unixODBC*
yum -y install zlib
yum -y install zlib*
yum -y install openssl*
yum -y install patch
yum -y install git
yum -y install lzo-devel zlib-devel gcc autoconf automake libtool
yum -y install lzop
yum -y install lrzsz
yum -y install nc
yum -y install glibc
yum -y install java-1.7.0-openjdk
yum -y install gzip
yum -y install zlib
yum -y install gcc
yum -y install gcc-c++
yum -y install make
yum -y install protobuf
yum -y install protoc
yum -y install cmake
yum -y install openssl-devel
yum -y install ncurses-devel
yum -y install unzip
yum -y install telnet
yum -y install telnet-server
yum -y install wget
yum -y install svn
yum -y install ntpdate
-- Hive installation, see: http://kicklinux.com/hive-deploy/
Layout of the 5 servers:

IP address       Hostname            NameNode  JournalNode  DataNode  Zookeeper  HBase         Hive
192.168.117.194  funshion-hadoop194  Yes       Yes          No        Yes        Yes           No
192.168.117.195  funshion-hadoop195  Yes       Yes          No        Yes        Yes           No
192.168.117.196  funshion-hadoop196  No        Yes          Yes       Yes        Yes (Master)  Yes (MySQL)
192.168.117.197  funshion-hadoop197  No        Yes          Yes       Yes        Yes           No
192.168.117.198  funshion-hadoop198  No        Yes          Yes       Yes        Yes           No

-- Configure Linux and install the JDK
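All five machines need to resolve each other by hostname; a minimal /etc/hosts fragment built straight from the table above (skip this if you already manage the names in DNS):
192.168.117.194 funshion-hadoop194
192.168.117.195 funshion-hadoop195
192.168.117.196 funshion-hadoop196
192.168.117.197 funshion-hadoop197
192.168.117.198 funshion-hadoop198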
-- See: "linux (ubuntu) Java JDK installation, environment variable setup and a small test program"
-- Step 1. Set up passwordless SSH login for the hadoop user
-- See:
"linux (ubuntu) passwordless mutual SSH login, a highly reliable walkthrough"
"CentOS 6.4: illustrated two-way passwordless SSH login configuration"
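The gist of those two references, as a minimal sketch (run as the hadoop user; every node pushes its key to every other node, itself included):
# Generate a key pair with an empty passphrase
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# Distribute the public key to all five hosts
for h in funshion-hadoop194 funshion-hadoop195 funshion-hadoop196 funshion-hadoop197 funshion-hadoop198; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@$h
done
# Verify: "ssh funshion-hadoop195" should log in without prompting for a password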
-- Step 2. ZooKeeper configuration (use an odd number of ZK nodes; I used 5)
-- See: "Zookeeper cluster installation walkthrough"
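For orientation, a zoo.cfg for this 5-node ensemble would look roughly like the following; the dataDir is my assumption (the post does not show it) and the ports are ZooKeeper defaults:
tickTime=2000
initLimit=10
syncLimit=5
# assumed data directory -- use whatever path you prefer on your nodes
dataDir=/home/hadoop/mydata/zookeeper
clientPort=2181
server.1=funshion-hadoop194:2888:3888
server.2=funshion-hadoop195:2888:3888
server.3=funshion-hadoop196:2888:3888
server.4=funshion-hadoop197:2888:3888
server.5=funshion-hadoop198:2888:3888
# each node also needs a myid file under dataDir containing its own number (1-5)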
-- Step 3. Hadoop cluster configuration:
-- Step 3.1 vi $HADOOP_HOME/etc/hadoop/slaves
funshion-hadoop196
funshion-hadoop197
funshion-hadoop198
-- Step 3.2 vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh (add JAVA_HOME and the native library path)
export JAVA_HOME=/usr/java/latest
export LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib/native"
-- Note: ${HADOOP_PREFIX}/lib/native contains the following:
[hadoop@funshion-hadoop194 native]$ pwd
/usr/local/hadoop/lib/native
[hadoop@funshion-hadoop194 native]$ ls -l
total 8640
-rw-r--r--. 1 hadoop hadoop 2850660 Jun 9 14:58 hadoop-common-2.4.0.jar
-rw-r--r--. 1 hadoop hadoop 1509888 Jun 9 14:58 hadoop-common-2.4.0-tests.jar
-rw-r--r--. 1 hadoop hadoop 178637 Jun 9 14:58 hadoop-lzo-0.4.20-SNAPSHOT.jar
-rw-r--r--. 1 hadoop hadoop 145385 Jun 9 14:58 hadoop-nfs-2.4.0.jar
-rw-r--r--. 1 hadoop hadoop 983042 Jun 6 19:36 libhadoop.a
-rw-r--r--. 1 hadoop hadoop 1487284 Jun 6 19:36 libhadooppipes.a
lrwxrwxrwx. 1 hadoop hadoop 18 Jun 6 19:42 libhadoop.so -> libhadoop.so.1.0.0
-rwxr-xr-x. 1 hadoop hadoop 586664 Jun 6 19:36 libhadoop.so.1.0.0
-rw-r--r--. 1 hadoop hadoop 582040 Jun 6 19:36 libhadooputils.a
-rw-r--r--. 1 hadoop hadoop 298178 Jun 6 19:36 libhdfs.a
lrwxrwxrwx. 1 hadoop hadoop 16 Jun 6 19:42 libhdfs.so -> libhdfs.so.0.0.0
-rwxr-xr-x. 1 hadoop hadoop 200026 Jun 6 19:36 libhdfs.so.0.0.0
drwxrwxr-x. 2 hadoop hadoop 4096 Jun 6 20:37 Linux-amd64-64
-- Step 3.3 vi $HADOOP_HOME/etc/hadoop/core-site.xml
-- (Note: the fs.defaultFS parameter is the same on both NameNodes; in fact core-site.xml is identical on all 5 machines)
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>
  <property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
  <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/home/hadoop/.ssh/id_rsa_nn2</value></property>
  <property><name>ha.zookeeper.quorum</name><value>funshion-hadoop194:2181,funshion-hadoop195:2181,funshion-hadoop196:2181,funshion-hadoop197:2181,funshion-hadoop198:2181</value></property>
  <property><name>io.compression.codecs</name><value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value></property>
  <property><name>io.compression.codec.lzo.class</name><value>com.hadoop.compression.lzo.LzoCodec</value></property>
  <property><name>io.file.buffer.size</name><value>131072</value></property>
  <property><name>hadoop.tmp.dir</name><value>/home/hadoop/tmp</value><description>Abase for other temporary directories.</description></property>
  <property><name>hadoop.proxyuser.hadoop.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.hadoop.groups</name><value>*</value></property>
  <property><name>hadoop.native.lib</name><value>true</value></property>
  <property><name>ha.zookeeper.session-timeout.ms</name><value>60000</value><description>ms</description></property>
  <property><name>ha.failover-controller.cli-check.rpc-timeout.ms</name><value>60000</value></property>
  <property><name>ipc.client.connect.timeout</name><value>20000</value></property>
</configuration>
-- Note: the value of dfs.ha.fencing.ssh.private-key-files, /home/hadoop/.ssh/id_rsa_nn2, is the SSH private key used for fencing (a copy of /home/hadoop/.ssh/id_rsa with permissions 600).
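Creating that fencing key is just a copy plus a permission change, which is what the note above implies:
# On each NameNode host, as the hadoop user
cp /home/hadoop/.ssh/id_rsa /home/hadoop/.ssh/id_rsa_nn2
chmod 600 /home/hadoop/.ssh/id_rsa_nn2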
-- Step 3.4 vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>funshion-hadoop194:8020</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>funshion-hadoop195:8020</value></property>
  <property><name>dfs.namenode.servicerpc-address.mycluster.nn1</name><value>funshion-hadoop194:53310</value></property>
  <property><name>dfs.namenode.servicerpc-address.mycluster.nn2</name><value>funshion-hadoop195:53310</value></property>
  <property><name>dfs.namenode.http-address.mycluster.nn1</name><value>funshion-hadoop194:50070</value></property>
  <property><name>dfs.namenode.http-address.mycluster.nn2</name><value>funshion-hadoop195:50070</value></property>
  <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://funshion-hadoop194:8485;funshion-hadoop195:8485;funshion-hadoop196:8485;funshion-hadoop197:8485;funshion-hadoop198:8485/mycluster</value></property>
  <property><name>dfs.journalnode.edits.dir</name><value>/home/hadoop/mydata/journal</value></property>
  <property><name>dfs.client.failover.proxy.provider.mycluster</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
  <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
  <property><name>dfs.namenode.name.dir</name><value>file:///home/hadoop/mydata/name</value></property>
  <property><name>dfs.datanode.data.dir</name><value>file:///home/hadoop/mydata/data</value></property>
  <property><name>dfs.replication</name><value>2</value></property>
  <property><name>dfs.image.transfer.bandwidthPerSec</name><value>1048576</value></property>
</configuration>
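The post does not show the first-time start-up, but with the configuration above a QJM-based HA cluster is typically brought up in this order (standard Hadoop 2.x commands; treat this as a sketch rather than the author's exact procedure):
# 1. Start a JournalNode on each of the five journal hosts
$HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode
# 2. On nn1 (funshion-hadoop194): format HDFS and the failover znode, then start the NameNode
$HADOOP_HOME/bin/hdfs namenode -format
$HADOOP_HOME/bin/hdfs zkfc -formatZK
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
# 3. On nn2 (funshion-hadoop195): copy the metadata over from nn1, then start its NameNode
$HADOOP_HOME/bin/hdfs namenode -bootstrapStandby
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
# 4. Start the DataNodes and ZKFC daemons (on later restarts start-dfs.sh does all of this)
$HADOOP_HOME/sbin/start-dfs.sh
# 5. Check the nn1/nn2 relationship -- one is active, the other standby, and they share all DataNodes
$HADOOP_HOME/bin/hdfs haadmin -getServiceState nn1
$HADOOP_HOME/bin/hdfs haadmin -getServiceState nn2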
-- Step 3.5 vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
  <property><name>mapreduce.jobhistory.address</name><value>funshion-hadoop194:10020</value></property>
  <property><name>mapreduce.jobhistory.webapp.address</name><value>funshion-hadoop194:19888</value></property>
  <property><name>mapreduce.map.output.compress</name><value>true</value></property>
  <property><name>mapreduce.map.output.compress.codec</name><value>com.hadoop.compression.lzo.LzoCodec</value></property>
  <property><name>mapred.child.env</name><value>LD_LIBRARY_PATH=/usr/local/hadoop/lib/native</value></property>
  <property><name>mapred.child.java.opts</name><value>-Xmx2048m</value></property>
  <property><name>mapred.reduce.child.java.opts</name><value>-Xmx2048m</value></property>
  <property><name>mapred.map.child.java.opts</name><value>-Xmx2048m</value></property>
  <property><name>mapred.remote.os</name><value>Linux</value><description>Remote MapReduce framework's OS, can be either Linux or Windows</description></property>
</configuration>
-- Note: 1. Property names starting with mapred. are the deprecated form; prefer the mapreduce. form.
For example, mapred.compress.map.output should now be written as mapreduce.map.output.compress.
See http://hadoop.apache.org/docs/r2 ... /mapred-default.xml for the full list.
That said, a few property names apparently were never renamed, e.g. mapred.child.java.opts and mapred.child.env.
-- Note: /usr/local/hadoop/lib/native contains the following:
[hadoop@funshion-hadoop194 sbin]$ ls -l /usr/local/hadoop/lib/native
total 12732
-rw-r--r-- 1 hadoop hadoop 2850900 Jun 20 19:22 hadoop-common-2.4.0.jar
-rw-r--r-- 1 hadoop hadoop 1509411 Jun 20 19:22 hadoop-common-2.4.0-tests.jar
-rw-r--r-- 1 hadoop hadoop 178559 Jun 20 18:38 hadoop-lzo-0.4.20-SNAPSHOT.jar
-rw-r--r-- 1 hadoop hadoop 1407039 Jun 20 19:25 hadoop-yarn-common-2.4.0.jar
-rw-r--r-- 1 hadoop hadoop 106198 Jun 20 18:37 libgplcompression.a
-rw-r--r-- 1 hadoop hadoop 1124 Jun 20 18:37 libgplcompression.la
-rwxr-xr-x 1 hadoop hadoop 69347 Jun 20 18:37 libgplcompression.so
-rwxr-xr-x 1 hadoop hadoop 69347 Jun 20 18:37 libgplcompression.so.0
-rwxr-xr-x 1 hadoop hadoop 69347 Jun 20 18:37 libgplcompression.so.0.0.0
-rw-r--r-- 1 hadoop hadoop 983042 Jun 20 18:10 libhadoop.a
-rw-r--r-- 1 hadoop hadoop 1487284 Jun 20 18:10 libhadooppipes.a
lrwxrwxrwx 1 hadoop hadoop 18 Jun 20 18:27 libhadoop.so -> libhadoop.so.1.0.0
-rwxr-xr-x 1 hadoop hadoop 586664 Jun 20 18:10 libhadoop.so.1.0.0
-rw-r--r-- 1 hadoop hadoop 582040 Jun 20 18:10 libhadooputils.a
-rw-r--r-- 1 hadoop hadoop 298178 Jun 20 18:10 libhdfs.a
lrwxrwxrwx 1 hadoop hadoop 16 Jun 20 18:27 libhdfs.so -> libhdfs.so.0.0.0
-rwxr-xr-x 1 hadoop hadoop 200026 Jun 20 18:10 libhdfs.so.0.0.0
-rw-r--r-- 1 hadoop hadoop 906318 Jun 20 19:17 liblzo2.a
-rwxr-xr-x 1 hadoop hadoop 929 Jun 20 19:17 liblzo2.la
-rwxr-xr-x 1 hadoop hadoop 562376 Jun 20 19:17 liblzo2.so
-rwxr-xr-x 1 hadoop hadoop 562376 Jun 20 19:17 liblzo2.so.2
-rwxr-xr-x 1 hadoop hadoop 562376 Jun 20 19:17 liblzo2.so.2.0.0
-- Step 3.6 vi $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
  <property><name>yarn.resourcemanager.connect.retry-interval.ms</name><value>60000</value></property>
  <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.cluster-id</name><value>rm-cluster</value></property>
  <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
  <property><name>yarn.resourcemanager.ha.id</name><value>rm1</value></property>
  <property><name>yarn.resourcemanager.hostname.rm1</name><value>funshion-hadoop194</value></property>
  <property><name>yarn.resourcemanager.hostname.rm2</name><value>funshion-hadoop195</value></property>
  <property><name>yarn.resourcemanager.recovery.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.store.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value></property>
  <property><name>yarn.resourcemanager.zk-address</name><value>funshion-hadoop194:2181,funshion-hadoop195:2181,funshion-hadoop196:2181,funshion-hadoop197:2181,funshion-hadoop198:2181</value></property>
  <property><name>yarn.resourcemanager.address.rm1</name><value>${yarn.resourcemanager.hostname.rm1}:23140</value></property>
  <property><name>yarn.resourcemanager.scheduler.address.rm1</name><value>${yarn.resourcemanager.hostname.rm1}:23130</value></property>
  <property><name>yarn.resourcemanager.webapp.https.address.rm1</name><value>${yarn.resourcemanager.hostname.rm1}:23189</value></property>
  <property><name>yarn.resourcemanager.webapp.address.rm1</name><value>${yarn.resourcemanager.hostname.rm1}:23188</value></property>
  <property><name>yarn.resourcemanager.resource-tracker.address.rm1</name><value>${yarn.resourcemanager.hostname.rm1}:23125</value></property>
  <property><name>yarn.resourcemanager.admin.address.rm1</name><value>${yarn.resourcemanager.hostname.rm1}:23141</value></property>
  <property><name>yarn.resourcemanager.address.rm2</name><value>${yarn.resourcemanager.hostname.rm2}:23140</value></property>
  <property><name>yarn.resourcemanager.scheduler.address.rm2</name><value>${yarn.resourcemanager.hostname.rm2}:23130</value></property>
  <property><name>yarn.resourcemanager.webapp.https.address.rm2</name><value>${yarn.resourcemanager.hostname.rm2}:23189</value></property>
  <property><name>yarn.resourcemanager.webapp.address.rm2</name><value>${yarn.resourcemanager.hostname.rm2}:23188</value></property>
  <property><name>yarn.resourcemanager.resource-tracker.address.rm2</name><value>${yarn.resourcemanager.hostname.rm2}:23125</value></property>
  <property><name>yarn.resourcemanager.admin.address.rm2</name><value>${yarn.resourcemanager.hostname.rm2}:23141</value></property>
  <property><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value></property>
  <property><name>yarn.scheduler.fair.allocation.file</name><value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value></property>
  <property><name>yarn.nodemanager.local-dirs</name><value>/home/hadoop/logs/yarn_local</value></property>
  <property><name>yarn.nodemanager.log-dirs</name><value>/home/hadoop/logs/yarn_log</value></property>
  <property><name>yarn.nodemanager.remote-app-log-dir</name><value>/home/hadoop/logs/yarn_remotelog</value></property>
  <property><name>yarn.app.mapreduce.am.staging-dir</name><value>/home/hadoop/logs/yarn_userstag</value></property>
  <property><name>mapreduce.jobhistory.intermediate-done-dir</name><value>/home/hadoop/logs/yarn_intermediatedone</value></property>
  <property><name>mapreduce.jobhistory.done-dir</name><value>/var/lib/hadoop/dfs/yarn_done</value></property>
  <property><name>yarn.log-aggregation-enable</name><value>true</value></property>
  <property><name>yarn.nodemanager.resource.memory-mb</name><value>2048</value></property>
  <property><name>yarn.nodemanager.vmem-pmem-ratio</name><value>4.2</value></property>
  <property><name>yarn.nodemanager.resource.cpu-vcores</name><value>2</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>
      $HADOOP_HOME/etc/hadoop,
      $HADOOP_HOME/share/hadoop/common/*,
      $HADOOP_HOME/share/hadoop/common/lib/*,
      $HADOOP_HOME/share/hadoop/hdfs/*,
      $HADOOP_HOME/share/hadoop/hdfs/lib/*,
      $HADOOP_HOME/share/hadoop/mapreduce/*,
      $HADOOP_HOME/share/hadoop/mapreduce/lib/*,
      $HADOOP_HOME/share/hadoop/yarn/*,
      $HADOOP_HOME/share/hadoop/yarn/lib/*
    </value>
  </property>
</configuration>
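yarn.scheduler.fair.allocation.file above points at a fairscheduler.xml that the post never shows. As a placeholder, a minimal allocation file with a single default queue could look like this (the queue name and resource figures are my assumptions; tune them to your cluster):
<?xml version="1.0"?>
<allocations>
  <queue name="default">
    <minResources>1024 mb, 1 vcores</minResources>
    <maxResources>2048 mb, 2 vcores</maxResources>
    <weight>1.0</weight>
  </queue>
</allocations>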
-- Note: of the two NameNode hosts (which also run the ResourceManagers), funshion-hadoop194 uses the configuration above as-is;
-- on funshion-hadoop195 only one change is needed: set yarn.resourcemanager.ha.id to rm2.
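Once both ResourceManagers are running, the rm1/rm2 roles can be checked the same way as the NameNodes (standard YARN command, not from the original post):
$HADOOP_HOME/bin/yarn rmadmin -getServiceState rm1
$HADOOP_HOME/bin/yarn rmadmin -getServiceState rm2
# one should report "active", the other "standby"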