dfs.namenode.name.dir
Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.
If this is a comma-delimited list of directories, then the name table is replicated in all of the directories for redundancy.
dfs.namenode.hosts / dfs.namenode.hosts.exclude
List of permitted/excluded DataNodes.
If necessary, use these files to control the list of allowable DataNodes.
dfs.blocksize
268435456
HDFS block size of 256 MB for large file systems.
dfs.namenode.handler.count
100
More NameNode server threads to handle RPCs from a large number of DataNodes.
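
The dfs.blocksize value above is given in bytes: 268435456 is 256 × 1024 × 1024. As a rough illustration of what that means for storage layout, the sketch below counts how many blocks a file of a given size would be split into. The function name `blocks_for_file` is made up for this example and is not part of any Hadoop API.

```python
# dfs.blocksize from the table above, expressed in bytes.
BLOCK_SIZE = 256 * 1024 * 1024


def blocks_for_file(file_size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Return how many blocks a file of this size is split into.

    Hypothetical helper for illustration only. Uses integer ceiling
    division, so an empty file maps to zero blocks.
    """
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    return (file_size_bytes + block_size - 1) // block_size


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(BLOCK_SIZE)                    # the configured value in bytes
    print(blocks_for_file(one_gib))      # blocks needed for a 1 GiB file
    print(blocks_for_file(one_gib + 1))  # one extra byte needs one more block
```

The last block of a file holds only the remaining bytes, so a larger block size mainly reduces the number of block entries the NameNode has to track.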
DataNode configuration:

| Parameter | Value | Notes |
|---|---|---|
| dfs.datanode.data.dir | Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks. | If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. |

conf/yarn-site.xml
Configuration for the ResourceManager and NodeManager:

| Parameter | Value | Notes |
|---|---|---|
| yarn.acl.enable | true / false | Enable ACLs? Defaults to false. |
| yarn.admin.acl | Admin ACL | ACL to set admins on the cluster. ACLs take the form comma-separated-users, a space, then comma-separated-groups. Defaults to the special value *, which means anyone. The special value of just a space means no one has access. |
| yarn.log-aggregation-enable | false | Configuration to enable or disable log aggregation. |

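
The yarn.admin.acl format (users, a space, then groups) is easier to see in code. The following is a minimal sketch of how such a string could be interpreted. It is not Hadoop's own ACL implementation, and the names `parse_acl` and `is_admin` are hypothetical.

```python
def parse_acl(acl: str):
    """Split an ACL string into (users, groups, allow_all).

    Illustrative only: '*' means anyone, a single space means no one,
    otherwise the text before the first space lists users and the text
    after it lists groups, both comma-separated.
    """
    if acl.strip() == "*":
        return set(), set(), True
    users_part, _, groups_part = acl.partition(" ")
    users = {u for u in users_part.split(",") if u}
    groups = {g for g in groups_part.strip().split(",") if g}
    return users, groups, False


def is_admin(acl: str, user: str, user_groups) -> bool:
    """Check whether a user (with the given groups) matches the ACL."""
    users, groups, allow_all = parse_acl(acl)
    return allow_all or user in users or bool(groups & set(user_groups))


if __name__ == "__main__":
    print(is_admin("*", "carol", []))                         # anyone
    print(is_admin(" ", "carol", ["admins"]))                 # no one
    print(is_admin("alice,bob admins", "carol", ["admins"]))  # via group
```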
ResourceManager configuration:

| Parameter | Value | Notes |
|---|---|---|
| yarn.resourcemanager.address | ResourceManager host:port for clients to submit jobs. | host:port |
| yarn.resourcemanager.scheduler.address | ResourceManager host:port for ApplicationMasters to talk to the Scheduler to obtain resources. | host:port |
| yarn.resourcemanager.resource-tracker.address | ResourceManager host:port for NodeManagers. | host:port |
| yarn.resourcemanager.admin.address | ResourceManager host:port for administrative commands. | host:port |
| yarn.resourcemanager.webapp.address | ResourceManager web UI host:port. | host:port |
| yarn.resourcemanager.scheduler.class | ResourceManager Scheduler class. | CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler |
| yarn.scheduler.minimum-allocation-mb | Minimum limit of memory to allocate to each container request at the ResourceManager. | In MB |
| yarn.scheduler.maximum-allocation-mb | Maximum limit of memory to allocate to each container request at the ResourceManager. | In MB |
| | List of permitted/excluded NodeManagers. | If necessary, use these files to control the list of allowable NodeManagers. |

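
To make the permitted/excluded lists more concrete, here is a small sketch of how an include file and an exclude file could be combined into a single decision. It assumes one host name per line, `#` comments, and that an empty include list permits every host. Those are assumptions of this example, and the helper names are hypothetical rather than Hadoop APIs.

```python
def parse_hosts(text: str) -> set:
    """Read host names from file content, one per line.

    Assumption for this sketch: blank lines and text after '#' are ignored.
    """
    hosts = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            hosts.add(entry)
    return hosts


def is_node_allowed(host: str, included: set, excluded: set) -> bool:
    """Exclusion wins; an empty include list is treated as 'allow all'."""
    if host in excluded:
        return False
    return not included or host in included


if __name__ == "__main__":
    include = parse_hosts("nm1.example.com\nnm2.example.com\n")
    exclude = parse_hosts("# being decommissioned\nnm2.example.com\n")
    for node in ("nm1.example.com", "nm2.example.com", "nm3.example.com"):
        print(node, is_node_allowed(node, include, exclude))
```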
NodeManager configuration:

| Parameter | Value | Notes |
|---|---|---|
| yarn.nodemanager.resource.memory-mb | Resource, i.e. available physical memory, in MB, for a given NodeManager. | Defines the total resources on the NodeManager to be made available to running containers. |
| yarn.nodemanager.vmem-pmem-ratio | Maximum ratio by which virtual memory usage of tasks may exceed physical memory. | The virtual memory usage of each task may exceed its physical memory limit by this ratio. The total amount of virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio. |
| yarn.nodemanager.local-dirs | Comma-separated list of paths on the local filesystem where intermediate data is written. | Multiple paths help spread disk I/O. |
| yarn.nodemanager.log-dirs | Comma-separated list of paths on the local filesystem where logs are written. | Multiple paths help spread disk I/O. |
| yarn.nodemanager.log.retain-seconds | 10800 | Default time (in seconds) to retain log files on the NodeManager. Only applicable if log aggregation is disabled. |
| yarn.nodemanager.remote-app-log-dir | /logs | HDFS directory where the application logs are moved on application completion. Appropriate permissions need to be set. Only applicable if log aggregation is enabled. |
| yarn.nodemanager.remote-app-log-dir-suffix | logs | Suffix appended to the remote log dir. Logs will be aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log aggregation is enabled. |
| yarn.nodemanager.aux-services | mapreduce_shuffle | Shuffle service that needs to be set for MapReduce applications. |

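
Two of the NodeManager settings above are really small formulas: the virtual memory ceiling derived from yarn.nodemanager.vmem-pmem-ratio, and the aggregated log path built from the remote log dir, the user, and the suffix. The sketch below spells both out. The ratio of 2.0 and the user name are placeholders chosen for this example, and the function names are hypothetical.

```python
def vmem_limit_mb(container_pmem_mb: float, vmem_pmem_ratio: float) -> float:
    """Virtual memory ceiling for a container: physical limit times the ratio."""
    return container_pmem_mb * vmem_pmem_ratio


def aggregated_log_dir(remote_app_log_dir: str, user: str, suffix: str) -> str:
    """Expand ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}."""
    return f"{remote_app_log_dir.rstrip('/')}/{user}/{suffix}"


if __name__ == "__main__":
    # 1536 MB physical limit with an assumed ratio of 2.0 (placeholder value).
    print(vmem_limit_mb(1536, 2.0))
    # Values from the table: /logs as the remote dir and logs as the suffix.
    print(aggregated_log_dir("/logs", "alice", "logs"))
```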
History Server configuration:

| Parameter | Value | Notes |
|---|---|---|
| yarn.log-aggregation.retain-seconds | -1 | How long to keep aggregated logs before deleting them. -1 disables deletion. Be careful: if this is set too small, you will spam the NameNode. |
| yarn.log-aggregation.retain-check-interval-seconds | -1 | Time between checks for aggregated log retention. If set to 0 or a negative value, the value is computed as one-tenth of the aggregated log retention time. Be careful: if this is set too small, you will spam the NameNode. |

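
The interaction between the two retention settings can be sketched as follows. This is only an illustration of the rule stated in the notes (a non-positive check interval falls back to one-tenth of the retention time). The function name is hypothetical, and treating any negative retention as "disabled" is an assumption of this sketch.

```python
from typing import Optional


def effective_check_interval(retain_seconds: int,
                             check_interval_seconds: int) -> Optional[int]:
    """Return the interval between retention checks, in seconds.

    Returns None when retention is disabled (negative retain_seconds),
    since nothing would be deleted in that case.
    """
    if retain_seconds < 0:
        return None
    if check_interval_seconds <= 0:
        # Fall back to one-tenth of the retention time.
        return retain_seconds // 10
    return check_interval_seconds


if __name__ == "__main__":
    print(effective_check_interval(-1, -1))        # both defaults: disabled
    print(effective_check_interval(604800, -1))    # 7 days, derived interval
    print(effective_check_interval(604800, 3600))  # explicit hourly check
```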
conf/mapred-site.xml
Configuration for MapReduce applications:

| Parameter | Value | Notes |
|---|---|---|
| mapreduce.framework.name | yarn | Execution framework set to Hadoop YARN. |
| mapreduce.map.memory.mb | 1536 | Larger resource limit for maps. |
| mapreduce.map.java.opts | -Xmx1024M | Larger heap size for child JVMs of maps. |
| mapreduce.reduce.memory.mb | 3072 | Larger resource limit for reduces. |
| mapreduce.reduce.java.opts | -Xmx2560M | Larger heap size for child JVMs of reduces. |
| mapreduce.task.io.sort.mb | 512 | Higher memory limit while sorting data, for efficiency. |
| mapreduce.task.io.sort.factor | 100 | More streams merged at once while sorting files. |
| mapreduce.reduce.shuffle.parallelcopies | 50 | Higher number of parallel copies run by reduces to fetch outputs from a very large number of maps. |

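
In the values above, each JVM heap (-Xmx) is smaller than the matching container limit (memory.mb), which leaves room for non-heap memory. A quick way to sanity-check such pairs is sketched below. The helper names are hypothetical, and the parser only understands the plain -Xmx<number><unit> form.

```python
import re
from typing import Optional

# Multipliers that convert an -Xmx value to megabytes; no suffix means bytes.
_UNIT_TO_MB = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}


def parse_xmx_mb(java_opts: str) -> Optional[float]:
    """Extract the -Xmx heap size from a JVM options string, in MB."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)(?=\s|$)", java_opts)
    if match is None:
        return None
    value, unit = match.groups()
    return int(value) * _UNIT_TO_MB[unit.lower()]


def heap_headroom_mb(container_mb: int, java_opts: str) -> Optional[float]:
    """Container limit minus heap size; negative means the heap cannot fit."""
    heap = parse_xmx_mb(java_opts)
    return None if heap is None else container_mb - heap


if __name__ == "__main__":
    # Pairs taken from the table above.
    print(heap_headroom_mb(1536, "-Xmx1024M"))  # map tasks
    print(heap_headroom_mb(3072, "-Xmx2560M"))  # reduce tasks
```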
Configuration for the MapReduce JobHistory Server:

| Parameter | Value | Notes |
|---|---|---|
| mapreduce.jobhistory.address | MapReduce JobHistory Server host:port | Default port is 10020. |
| mapreduce.jobhistory.webapp.address | MapReduce JobHistory Server Web UI host:port | Default port is 19888. |
| mapreduce.jobhistory.intermediate-done-dir | /mr-history/tmp | Directory where history files are written by MapReduce jobs. |
| mapreduce.jobhistory.done-dir | /mr-history/done | Directory where history files are managed by the MR JobHistory Server. |

Hadoop Rack Awareness
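The HDFS and YARN services are rack-aware.
The NameNode and the ResourceManager obtain the rack information of each slave node in the cluster by invoking an API.
The API resolves a DNS name (or IP address) to a rack id.
The module that performs this resolution is configurable through topology.node.switch.mapping.impl. The default implementation runs a script or command configured with topology.script.file.name. If topology.script.file.name is not set, the rack id /default-rack is returned for any IP address passed in.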
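
A script referenced by topology.script.file.name receives one or more host names or IP addresses as arguments and prints one rack id per argument. The following is a minimal sketch of such a script. The subnet-to-rack table is entirely made up, and falling back to /default-rack for unknown hosts is a choice made for this example.

```python
#!/usr/bin/env python3
import sys

# Hypothetical mapping from IP prefix to rack id; replace with site data.
RACK_BY_PREFIX = {
    "10.1.1.": "/rack1",
    "10.1.2.": "/rack2",
}
DEFAULT_RACK = "/default-rack"


def resolve_rack(host: str) -> str:
    """Return the rack id for a host, or the default rack if unknown."""
    for prefix, rack in RACK_BY_PREFIX.items():
        if host.startswith(prefix):
            return rack
    return DEFAULT_RACK


def main(hosts) -> None:
    # Print one rack id per input argument, in the same order.
    print(" ".join(resolve_rack(host) for host in hosts))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved as an executable file, it could be tried by hand with a couple of addresses as arguments before pointing topology.script.file.name at it.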