Posted by Matthewl on 2018-10-31 08:54:20

Hadoop 2.3 configuration files

Hadoop cluster configuration (a comprehensive summary): http://blog.csdn.net/hguisu/article/details/7237395
CDH downloads of Hadoop: http://archive-primary.cloudera.com/cdh5/cdh/5/
Detailed configuration walkthrough
  The Apache Hadoop 2.3.0 release downloaded from the official site is compiled on a 32-bit operating system and cannot be used with a 64-bit JDK. The Hadoop code deployed below was recompiled on my own 64-bit machine: the servers are all 64-bit, so this setup mimics a production environment as closely as possible. It is perfectly fine to practice on a 32-bit operating system, though.
  We use four machines for this walkthrough; each machine's responsibilities are listed in the table below.
  Role           hadoop1         hadoop2         hadoop3         hadoop4
  NameNode?      yes (cluster1)  yes (cluster1)  yes (cluster2)  yes (cluster2)
  DataNode?      yes             yes             yes             yes
  JournalNode?   yes             yes             yes             no
  ZooKeeper?     yes             yes             yes             no
  ZKFC?          yes             yes             yes             yes

  There are six configuration files in total: hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and slaves. Apart from hdfs-site.xml, which is configured differently in the two clusters, the files are identical on all four nodes and can simply be copied.
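  Of the six files, slaves is the simplest: since the table above makes all four machines DataNodes, the slaves file would presumably just list the four hosts, one per line:

  hadoop1
  hadoop2
  hadoop3
  hadoop4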


File hadoop-env.sh
  Only one line needs to be modified; after the change it reads:
  export JAVA_HOME=/usr/java/jdk1.7.0_45
  [The value of JAVA_HOME is the JDK installation path. If yours is different, change it to your own path.]
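  Because the recompiled build expects a 64-bit environment (see above), it is worth confirming that both the OS and the JDK pointed to by JAVA_HOME really are 64-bit. A minimal shell check, assuming the standard file utility is available:

  uname -m                                     # expect x86_64 on a 64-bit OS
  $JAVA_HOME/bin/java -version                 # confirm the JDK runs at all
  file $(readlink -f $JAVA_HOME/bin/java)      # should report a 64-bit executable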
File core-site.xml

  <configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://cluster1</value>
    </property>
    [This value is the default HDFS path. When several HDFS clusters are working at the same time and the user does not name one, which is used by default? It is specified here. The value comes from the configuration in hdfs-site.xml.]
    <property>
      <name>ha.zookeeper.quorum</name>
      <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
    </property>
    [The addresses and ports of the ZooKeeper ensemble. Note that the number of nodes must be odd and no fewer than three.]
  </configuration>
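  Automatic failover depends on this ZooKeeper ensemble, so a quick liveness check before starting HDFS can save debugging later. A minimal sketch, assuming netcat (nc) is installed on the machine it is run from:

  for host in hadoop1 hadoop2 hadoop3; do
    echo ruok | nc $host 2181     # a healthy ZooKeeper answers "imok"
  done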
File hdfs-site.xml
  This version of the file is configured only on hadoop1 and hadoop2.

  <configuration>
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    [The number of replicas of each block the DataNodes store. The default is 3; we have 4 DataNodes, so any value no greater than 4 is fine.]
    <property>
      <name>dfs.nameservices</name>
      <value>cluster1,cluster2</value>
    </property>
    [With federation we run 2 HDFS clusters. The two NameServices declared here are really just aliases for those 2 clusters. The names are arbitrary as long as they do not collide.]
    <property>
      <name>dfs.ha.namenodes.cluster1</name>
      <value>hadoop1,hadoop2</value>
    </property>
    [The NameNodes belonging to NameService cluster1. These values are also logical names; anything non-colliding works.]
    <property>
      <name>dfs.namenode.rpc-address.cluster1.hadoop1</name>
      <value>hadoop1:9000</value>
    </property>
    [hadoop1's RPC address]
    <property>
      <name>dfs.namenode.http-address.cluster1.hadoop1</name>
      <value>hadoop1:50070</value>
    </property>
    [hadoop1's HTTP address]
    <property>
      <name>dfs.namenode.rpc-address.cluster1.hadoop2</name>
      <value>hadoop2:9000</value>
    </property>
    [hadoop2's RPC address]
    <property>
      <name>dfs.namenode.http-address.cluster1.hadoop2</name>
      <value>hadoop2:50070</value>
    </property>
    [hadoop2's HTTP address]
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/cluster1</value>
    </property>
    [The JournalNode quorum through which cluster1's two NameNodes share their edits directory]
    <property>
      <name>dfs.ha.automatic-failover.enabled.cluster1</name>
      <value>true</value>
    </property>
    [Whether cluster1 enables automatic failover, i.e. whether, when a NameNode fails, the system automatically switches to the other NameNode]
    <property>
      <name>dfs.client.failover.proxy.provider.cluster1</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    [The implementation class responsible for performing the failover when cluster1 fails]
    <property>
      <name>dfs.ha.namenodes.cluster2</name>
      <value>hadoop3,hadoop4</value>
    </property>
    [The two NameNodes of NameService cluster2; again logical, non-colliding names. The remaining settings mirror cluster1's almost exactly, so no further comments are added.]
    <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/usr/local/hadoop/tmp/journal</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///home/grid/hadoop/hdfs/name</value>
    </property>
    [automatic failover]
    <property>
      <name>dfs.ha.automatic-failover.enabled.cluster2</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.client.failover.proxy.provider.cluster2</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    [The fencing method used during failover: the previous active NameNode is fenced off over SSH]
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/grid/.ssh/id_rsa</value>
    </property>
    [The private key used for that SSH login]
  </configuration>
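  With HA and automatic failover configured, the daemons have to be bootstrapped in a specific order before the first start. A sketch of the usual Hadoop 2.x sequence for cluster1 (this ordering is standard HA practice and is an assumption, not spelled out above; run each command on the indicated host from the Hadoop installation directory):

  # on hadoop1, hadoop2 and hadoop3: start the JournalNodes first
  sbin/hadoop-daemon.sh start journalnode

  # on hadoop1: format and start the first NameNode
  bin/hdfs namenode -format
  sbin/hadoop-daemon.sh start namenode

  # on hadoop2: copy the formatted metadata over and start the standby
  bin/hdfs namenode -bootstrapStandby
  sbin/hadoop-daemon.sh start namenode

  # on hadoop1: initialize the HA state in ZooKeeper, then start a ZKFC on both NameNodes
  bin/hdfs zkfc -formatZK
  sbin/hadoop-daemon.sh start zkfc

  For the federated cluster2 the same steps are repeated on hadoop3 and hadoop4 (formatting with -clusterId so both nameservices join the same federation).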

File yarn-site.xml

  <configuration>
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>hadoop1</value>
    </property>
    <property>
      <description>The address of the applications manager interface in the RM.</description>
      <name>yarn.resourcemanager.address</name>
      <value>${yarn.resourcemanager.hostname}:8032</value>
    </property>
    <property>
      <description>The address of the scheduler interface.</description>
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>${yarn.resourcemanager.hostname}:8030</value>
    </property>
    <property>
      <description>The http address of the RM web application.</description>
      <name>yarn.resourcemanager.webapp.address</name>
      <value>${yarn.resourcemanager.hostname}:8088</value>
    </property>
    <property>
      <description>The https address of the RM web application.</description>
      <name>yarn.resourcemanager.webapp.https.address</name>
      <value>${yarn.resourcemanager.hostname}:8090</value>
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>
      <value>${yarn.resourcemanager.hostname}:8031</value>
    </property>
    <property>
      <description>The address of the RM admin interface.</description>
      <name>yarn.resourcemanager.admin.address</name>
      <value>${yarn.resourcemanager.hostname}:8033</value>
    </property>
    <property>
      <description>The class to use as the resource scheduler.</description>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
    <property>
      <description>fair-scheduler conf location</description>
      <name>yarn.scheduler.fair.allocation.file</name>
      <value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value>
    </property>
    <property>
      <description>List of directories to store localized files in. An application's
      localized file directory will be found in:
      ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
      Individual containers' work directories, called container_${contid}, will
      be subdirectories of this.
      </description>
      <name>yarn.nodemanager.local-dirs</name>
      <value>/home/dongxicheng/hadoop/yarn/local</value>
    </property>
    <property>
      <description>Whether to enable log aggregation</description>
      <name>yarn.log-aggregation-enable</name>
      <value>true</value>
    </property>
    <property>
      <description>Where to aggregate logs to.</description>
      <name>yarn.nodemanager.remote-app-log-dir</name>
      <value>/tmp/logs</value>
    </property>
    <property>
      <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>30720</value>
    </property>
    <property>
      <description>Number of CPU cores that can be allocated for containers.</description>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>12</value>
    </property>
    <property>
      <description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
  </configuration>
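  The two yarn.nodemanager.resource.* values above are what each NodeManager offers to the scheduler: with 30720 MB and 12 vcores per node, and containers at YARN's default minimum allocation of 1024 MB / 1 vcore, at most 30 containers fit by memory and 12 by vcores on each node. A quick way to check what each NodeManager actually registered, using the standard YARN CLI:

  yarn node -list                 # list NodeManagers and their state
  yarn node -status <node-id>     # capacity and usage of one node (node-id taken from the list output)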
File fairscheduler.xml

  <?xml version="1.0"?>
  <allocations>
    <queue name="infrastructure">
      <minResources>102400 mb, 50 vcores</minResources>
      <maxResources>153600 mb, 100 vcores</maxResources>
      <maxRunningApps>200</maxRunningApps>
      <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
      <weight>1.0</weight>
      <aclSubmitApps>root,yarn,search,hdfs</aclSubmitApps>
    </queue>
    <queue name="tool">
      <minResources>102400 mb, 30 vcores</minResources>
      <maxResources>153600 mb, 50 vcores</maxResources>
    </queue>
    <queue name="sentiment">
      <minResources>102400 mb, 30 vcores</minResources>
      <maxResources>153600 mb, 50 vcores</maxResources>
    </queue>
  </allocations>
  [The queue names here are only examples; rename them to suit your own cluster. 200 is the queue's maxRunningApps and 300 its minSharePreemptionTimeout in seconds.]
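  With the FairScheduler active, a job is steered into one of these queues at submission time via the standard mapreduce.job.queuename property. A usage sketch (the example jar path matches the stock Hadoop 2.3.0 tarball layout; queue tool is one of the example names assumed above):

  hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar pi \
    -Dmapreduce.job.queuename=tool 10 100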

File hdfs-site.xml (complete listing)

  <configuration>
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    <property>
      <name>dfs.nameservices</name>
      <value>cluster1,cluster2</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.cluster1</name>
      <value>hadoop1,hadoop2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.cluster1.hadoop1</name>
      <value>hadoop1:9000</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.cluster1.hadoop1</name>
      <value>hadoop1:50070</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.cluster1.hadoop2</name>
      <value>hadoop2:9000</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.cluster1.hadoop2</name>
      <value>hadoop2:50070</value>
    </property>
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/cluster1</value>
    </property>
    <property>
      <name>dfs.ha.automatic-failover.enabled.cluster1</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.client.failover.proxy.provider.cluster1</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.cluster2</name>
      <value>hadoop3,hadoop4</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.cluster2.hadoop3</name>
      <value>hadoop3:9000</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.cluster2.hadoop3</name>
      <value>hadoop3:50070</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.cluster2.hadoop4</name>
      <value>hadoop4:9000</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.cluster2.hadoop4</name>
      <value>hadoop4:50070</value>
    </property>
    <property>
      <name>dfs.ha.automatic-failover.enabled.cluster2</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.client.failover.proxy.provider.cluster2</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/usr/local/hadoop/tmp/journal</value>
    </property>
    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/grid/.ssh/id_rsa</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///home/grid/hadoop/hdfs/name</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:///home/grid/hadoop/hdfs/data</value>
    </property>
  </configuration>
File core-site.xml (complete listing)

  <configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://cluster1</value>
    </property>
    <property>
      <name>ha.zookeeper.quorum</name>
      <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
    </property>
  </configuration>
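  Once the daemons are running, the HA state of every NameNode can be verified from the command line with the standard haadmin tool (the -ns flag selects a nameservice in a federated setup such as this one):

  bin/hdfs haadmin -ns cluster1 -getServiceState hadoop1   # prints "active" or "standby"
  bin/hdfs haadmin -ns cluster1 -getServiceState hadoop2
  bin/hdfs haadmin -ns cluster2 -getServiceState hadoop3
  bin/hdfs haadmin -ns cluster2 -getServiceState hadoop4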

