Install Hadoop 2.5 without HDFS HA/Federation

cencenhai · posted on 2016-12-11 10:20:06

　　I. Installation modes
　　As with Hadoop 1.x, there are several modes in which Hadoop can be installed:
　　1. Standalone
　　Everything runs on one machine, including MapReduce jobs.
　　2. Pseudo-distributed
　　Hadoop is set up with HDFS. This case comes in two variants:
　　a. Run HDFS only
　　In this case MapReduce jobs still run in local mode; you can tell because the job name looks like job_localxxxxxx.
　　b. Run HDFS with YARN
　　This is the same as distributed mode.
　　3. Distributed (cluster) mode
　　Compared with item 2, this mode only needs some additional configuration and more than one node.
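The modes above differ mainly in two settings, `fs.defaultFS` and `mapreduce.framework.name`, both described in section II. As a rough sketch of that mapping (the function name `describe_mode` and the returned labels are hypothetical, made up for illustration; the defaults are the ones listed in the table):

```python
from urllib.parse import urlparse


def describe_mode(fs_default_fs="file:///", framework_name="local"):
    """Guess the installation mode from two Hadoop settings.

    The default arguments mirror the default values of fs.defaultFS
    and mapreduce.framework.name.
    """
    scheme = urlparse(fs_default_fs).scheme
    if scheme == "file" and framework_name == "local":
        return "standalone"
    if scheme == "hdfs" and framework_name == "local":
        return "HDFS only (MapReduce runs in local mode)"
    if scheme == "hdfs" and framework_name == "yarn":
        return "HDFS with YARN (pseudo-distributed or cluster)"
    return "other combination"


print(describe_mode())
print(describe_mode("hdfs://host1:9000"))
print(describe_mode("hdfs://host1:9000", "yarn"))
```

These two settings alone cannot distinguish pseudo-distributed from cluster mode; that difference is only the number of nodes.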
　　II. Configuration for cluster mode

file	property	value	default value	summary
core-site.xml	hadoop.tmp.dir	/usr/local/hadoop/data-2.5.1/tmp	/tmp/hadoop-${user.name}	Path to a temporary directory; its subdirectories include filecache, usercache and nmPrivate, so it should not be set to a 'tmp' directory in a production environment.
	fs.defaultFS	hdfs://host1:9000	file:///	The name of the default file system; this setting determines the installation mode. The corresponding deprecated property is fs.default.name. A URI whose scheme and authority determine the FileSystem implementation. The URI's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The URI's authority is used to determine the host, port, etc. for a filesystem.
hdfs-site.xml	dfs.nameservices	hadoop-cluster1		Comma-separated list of nameservices. Only a single NameNode is used here, without HA.
	dfs.namenode.secondary.http-address	host1:50090	0.0.0.0:50090	The secondary namenode http server address and port.
	dfs.namenode.name.dir	file:///usr/local/hadoop/data-2.5.1/dfs/name	file://${hadoop.tmp.dir}/dfs/name	Determines where on the local filesystem the DFS name node should store the name table(fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
	dfs.datanode.data.dir	file:///usr/local/hadoop/data-2.5.1/dfs/data	file://${hadoop.tmp.dir}/dfs/data	Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
	dfs.replication	1	3	The replication factor assigned to data blocks.
	dfs.webhdfs.enabled	true	true	Enable WebHDFS (REST API) in Namenodes and Datanodes.
yarn-site.xml	yarn.nodemanager.aux-services	mapreduce_shuffle		The auxiliary service name. A valid service name may contain only a-zA-Z0-9_ and cannot start with a number.
	yarn.resourcemanager.address	host1:8032	${yarn.resourcemanager.hostname}:8032	The address of the applications manager interface in the RM
	yarn.resourcemanager.scheduler.address	host1:8030	${yarn.resourcemanager.hostname}:8030	The scheduler address of the RM
	yarn.resourcemanager.resource-tracker.address	host1:8031	${yarn.resourcemanager.hostname}:8031
	yarn.resourcemanager.admin.address	host1:8033	${yarn.resourcemanager.hostname}:8033	The admin address of the RM
	yarn.resourcemanager.webapp.address	host1:50030	${yarn.resourcemanager.hostname}:8088	The web UI address of the RM; here it is set to port 50030, the JobTracker web UI port in Hadoop 1.x
mapred-site.xml	mapreduce.framework.name	yarn	local	The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.
	mapreduce.jobhistory.address	host1:10020	0.0.0.0:10020	MapReduce JobHistory Server IPC host:port
	mapreduce.jobhistory.webapp.address	host1:19888	0.0.0.0:19888	MapReduce JobHistory Server Web UI host:port
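
Each row of the table ends up as a `<property>` element inside the `<configuration>` root of the named file. A minimal sketch of rendering two of the files from the values above; the helper `render_site_xml` and the dictionary names are hypothetical, and the output would still need to be saved into the Hadoop configuration directory by hand:

```python
from xml.sax.saxutils import escape


def render_site_xml(properties):
    """Render a dict of property name -> value as a Hadoop *-site.xml document."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


# Values copied from the table above.
CORE_SITE = {
    "hadoop.tmp.dir": "/usr/local/hadoop/data-2.5.1/tmp",
    "fs.defaultFS": "hdfs://host1:9000",
}
MAPRED_SITE = {
    "mapreduce.framework.name": "yarn",
    "mapreduce.jobhistory.address": "host1:10020",
    "mapreduce.jobhistory.webapp.address": "host1:19888",
}

print(render_site_xml(CORE_SITE))
print(render_site_xml(MAPRED_SITE))
```

The same function could be fed the hdfs-site.xml and yarn-site.xml rows; only the dictionaries change.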

　　III. Results of running MapReduce on YARN
　　Below is the log of a MapReduce job run in pseudo-distributed mode:
　　hadoop@ubuntu:/usr/local/hadoop/hadoop-2.5.1$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar wordcount wc wc-out
　　14/11/05 18:19:23 INFO client.RMProxy: Connecting to ResourceManager at namenode/192.168.1.25:8032
　　14/11/05 18:19:24 INFO input.FileInputFormat: Total input paths to process : 22
　　14/11/05 18:19:24 INFO mapreduce.JobSubmitter: number of splits:22
　　14/11/05 18:19:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1415182439385_0001
　　14/11/05 18:19:25 INFO impl.YarnClientImpl: Submitted application application_1415182439385_0001
　　14/11/05 18:19:25 INFO mapreduce.Job: The url to track the job: http://namenode:50030/proxy/application_1415182439385_0001/
　　14/11/05 18:19:25 INFO mapreduce.Job: Running job: job_1415182439385_0001
　　14/11/05 18:19:32 INFO mapreduce.Job: Job job_1415182439385_0001 running in uber mode : false
　　14/11/05 18:19:32 INFO mapreduce.Job: map 0% reduce 0%
　　14/11/05 18:19:44 INFO mapreduce.Job: map 9% reduce 0%
　　14/11/05 18:19:45 INFO mapreduce.Job: map 27% reduce 0%
　　14/11/05 18:19:54 INFO mapreduce.Job: map 32% reduce 0%
　　14/11/05 18:19:55 INFO mapreduce.Job: map 45% reduce 0%
　　14/11/05 18:19:56 INFO mapreduce.Job: map 50% reduce 0%
　　14/11/05 18:20:02 INFO mapreduce.Job: map 55% reduce 17%
　　14/11/05 18:20:03 INFO mapreduce.Job: map 59% reduce 17%
　　14/11/05 18:20:05 INFO mapreduce.Job: map 68% reduce 20%
　　14/11/05 18:20:06 INFO mapreduce.Job: map 73% reduce 20%
　　14/11/05 18:20:08 INFO mapreduce.Job: map 73% reduce 24%
　　14/11/05 18:20:11 INFO mapreduce.Job: map 77% reduce 24%
　　14/11/05 18:20:12 INFO mapreduce.Job: map 82% reduce 24%
　　14/11/05 18:20:13 INFO mapreduce.Job: map 91% reduce 24%
　　14/11/05 18:20:14 INFO mapreduce.Job: map 95% reduce 30%
　　14/11/05 18:20:16 INFO mapreduce.Job: map 100% reduce 30%
　　14/11/05 18:20:17 INFO mapreduce.Job: map 100% reduce 100%
　　14/11/05 18:20:18 INFO mapreduce.Job: Job job_1415182439385_0001 completed successfully
　　14/11/05 18:20:18 INFO mapreduce.Job: Counters: 49
　　File System Counters
　　FILE: Number of bytes read=54637
　　FILE: Number of bytes written=2338563
　　FILE: Number of read operations=0
　　FILE: Number of large read operations=0
　　FILE: Number of write operations=0
　　HDFS: Number of bytes read=59677
　　HDFS: Number of bytes written=28233
　　HDFS: Number of read operations=69
　　HDFS: Number of large read operations=0
　　HDFS: Number of write operations=2
　　Job Counters
　　Launched map tasks=22
　　Launched reduce tasks=1
　　Data-local map tasks=22
　　Total time spent by all maps in occupied slots (ms)=185554
　　Total time spent by all reduces in occupied slots (ms)=30206
　　Total time spent by all map tasks (ms)=185554
　　Total time spent by all reduce tasks (ms)=30206
　　Total vcore-seconds taken by all map tasks=185554
　　Total vcore-seconds taken by all reduce tasks=30206
　　Total megabyte-seconds taken by all map tasks=190007296
　　Total megabyte-seconds taken by all reduce tasks=30930944
　　Map-Reduce Framework
　　Map input records=1504
　　Map output records=5727
　　Map output bytes=77326
　　Map output materialized bytes=54763
　　Input split bytes=2498
　　Combine input records=5727
　　Combine output records=2838
　　Reduce input groups=1224
　　Reduce shuffle bytes=54763
　　Reduce input records=2838
　　Reduce output records=1224
　　Spilled Records=5676
　　Shuffled Maps =22
　　Failed Shuffles=0
　　Merged Map outputs=22
　　GC time elapsed (ms)=1707
　　CPU time spent (ms)=14500
　　Physical memory (bytes) snapshot=5178937344
　　Virtual memory (bytes) snapshot=22517506048
　　Total committed heap usage (bytes)=3882549248
　　Shuffle Errors
　　BAD_ID=0
　　CONNECTION=0
　　IO_ERROR=0
　　WRONG_LENGTH=0
　　WRONG_MAP=0
　　WRONG_REDUCE=0
　　File Input Format Counters
　　Bytes Read=57179
　　File Output Format Counters
　　Bytes Written=28233
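
Some of the counters above can be cross-checked against each other. The sketch below copies four values from the log and divides the megabyte counter by the matching task time. It rests on an assumption that is not stated in the post: that the megabyte counter equals task time multiplied by the memory (in MB) of each container. Variable names are illustrative only.

```python
# Values copied from the job counters in the log above.
map_time = 185554          # Total time spent by all map tasks (ms)
reduce_time = 30206        # Total time spent by all reduce tasks (ms)
map_mb_time = 190007296    # Total megabyte-seconds taken by all map tasks
reduce_mb_time = 30930944  # Total megabyte-seconds taken by all reduce tasks

# Assumption: megabyte counter = task time x container memory in MB,
# so the quotient is the memory per container.
map_container_mb = map_mb_time / map_time
reduce_container_mb = reduce_mb_time / reduce_time

print(map_container_mb, reduce_container_mb)  # 1024.0 1024.0
```

Both quotients come out to exactly 1024, which under that assumption would correspond to 1024 MB per map and reduce container.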
　　FAQs
　　1. 2014-01-22 09:38:20,733 INFO [AsyncDispatcher event handler] rmapp.RMAppImpl (RMAppImpl.java:transition(788)) - Application application_1390354688375_0001 failed 2 times due to AM Container for appattempt_1390354688375_0001_000002 exited with exitCode: 127 due to: Exception from container-launch:
　　This may occur if JAVA_HOME is not set in yarn-env.sh and hadoop-env.sh (exit code 127 is the shell's "command not found" status, which is consistent with the java binary not being located). After setting it, remember to restart YARN.
　　2. Running the 'grep' example produces two jobs
　　This is normal. At first I thought something was wrong, but when I ran wordcount again the result showed only one job, so I believe this is simply how the grep example works.
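
For FAQ 1, one quick check is to look for an uncommented `JAVA_HOME` assignment in the two env files. A minimal sketch, assuming the files live under `etc/hadoop/` of the installation directory that appears in the shell prompt of the log; the helper name `has_java_home` is hypothetical:

```python
import re
from pathlib import Path

# Matches "JAVA_HOME=..." or "export JAVA_HOME=..." at the start of a line;
# commented-out lines start with "#" and therefore do not match.
ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?JAVA_HOME=(\S+)")


def has_java_home(env_file):
    """Return True if env_file sets JAVA_HOME to a concrete value."""
    path = Path(env_file)
    if not path.is_file():
        return False
    for line in path.read_text().splitlines():
        match = ASSIGNMENT.match(line)
        # A self-reference such as ${JAVA_HOME} only forwards whatever the
        # environment already has, so it does not count as a concrete value.
        if match and "JAVA_HOME" not in match.group(1):
            return True
    return False


if __name__ == "__main__":
    conf_dir = Path("/usr/local/hadoop/hadoop-2.5.1/etc/hadoop")  # assumed location
    for name in ("hadoop-env.sh", "yarn-env.sh"):
        print(name, "sets JAVA_HOME:", has_java_home(conf_dir / name))
```

If either file reports False, adding an explicit `export JAVA_HOME=...` line there and restarting YARN is the fix described in FAQ 1.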
　　ref:
　　apache install hadoop 2
