yarn.nodemanager.health-checker.script.path
Node health script
Script to check for node's health status.
yarn.nodemanager.health-checker.script.opts
Node health script options
Options for script to check for node's health status.
yarn.nodemanager.health-checker.script.interval-ms
Node health script interval
Time interval for running health script.
yarn.nodemanager.health-checker.script.timeout-ms
Node health script timeout interval
Timeout for health script execution.
The health monitor does not report an error when only some physical disks develop bad sectors. The NodeManager can periodically check the health of the physical disks (specifically nodemanager-local-dirs and nodemanager-log-dirs); once the number of bad directories reaches the threshold configured via yarn.nodemanager.disk-health-checker.min-healthy-disks, the entire node is marked unhealthy, and this information is also reported to the resource manager. The health script also checks the boot disk.
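The four health-checker properties above are set in conf/yarn-site.xml. A minimal sketch, where the script path, options, and intervals are illustrative placeholders rather than required values:

```xml
<!-- conf/yarn-site.xml: node health script (illustrative values) -->
<property>
  <name>yarn.nodemanager.health-checker.script.path</name>
  <value>/etc/hadoop/health_check.sh</value>  <!-- hypothetical script path -->
</property>
<property>
  <name>yarn.nodemanager.health-checker.script.opts</name>
  <value>-v</value>  <!-- hypothetical script options -->
</property>
<property>
  <name>yarn.nodemanager.health-checker.script.interval-ms</name>
  <value>600000</value>  <!-- run every 10 minutes -->
</property>
<property>
  <name>yarn.nodemanager.health-checker.script.timeout-ms</name>
  <value>60000</value>  <!-- kill the script after 1 minute -->
</property>
```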
dfs.namenode.secondary.keytab.file
/etc/security/keytab/sn.service.keytab
Kerberos keytab file for the Secondary NameNode.
dfs.namenode.secondary.kerberos.principal
sn/_HOST@REALM.TLD
Kerberos principal name for the Secondary NameNode.
dfs.namenode.secondary.kerberos.https.principal
host/_HOST@REALM.TLD
HTTPS Kerberos principal name for the Secondary NameNode.
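Put together, the Secondary NameNode entries above form the following conf/hdfs-site.xml fragment (keytab path and realm are the example values from the table, not mandatory ones):

```xml
<property>
  <name>dfs.namenode.secondary.keytab.file</name>
  <value>/etc/security/keytab/sn.service.keytab</value>
</property>
<property>
  <name>dfs.namenode.secondary.kerberos.principal</name>
  <value>sn/_HOST@REALM.TLD</value>
</property>
<property>
  <name>dfs.namenode.secondary.kerberos.https.principal</name>
  <value>host/_HOST@REALM.TLD</value>
</property>
```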
DataNode configuration:
Parameter
Value
Notes
dfs.datanode.data.dir.perm
700
dfs.datanode.address
0.0.0.0:2003
dfs.datanode.https.address
0.0.0.0:2005
dfs.datanode.keytab.file
/etc/security/keytab/dn.service.keytab
Kerberos keytab file for the DataNode.
dfs.datanode.kerberos.principal
dn/_HOST@REALM.TLD
Kerberos principal name for the DataNode.
dfs.datanode.kerberos.https.principal
host/_HOST@REALM.TLD
HTTPS Kerberos principal name for the DataNode.
conf/yarn-site.xml:
WebAppProxy:
The WebAppProxy provides a web proxy between applications and end users. In secure mode, users are warned before accessing a potentially unsafe web application, and access through the proxy is handled like any ordinary web application.
Parameter
Value
Notes
yarn.web-proxy.address
WebAppProxy host:port for proxy to AM web apps.
host:port. If this is the same as yarn.resourcemanager.webapp.address, or it is not defined, then the ResourceManager will run the proxy; otherwise a standalone proxy server will need to be launched.
yarn.web-proxy.keytab
/etc/security/keytab/web-app.service.keytab
Kerberos keytab file for the WebAppProxy.
yarn.web-proxy.principal
wap/_HOST@REALM.TLD
Kerberos principal name for the WebAppProxy.
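The WebAppProxy entries above can be sketched as a conf/yarn-site.xml fragment; the proxy host:port is a hypothetical placeholder, the keytab and principal are the example values from the table:

```xml
<property>
  <name>yarn.web-proxy.address</name>
  <value>proxyhost:9099</value>  <!-- hypothetical host:port -->
</property>
<property>
  <name>yarn.web-proxy.keytab</name>
  <value>/etc/security/keytab/web-app.service.keytab</value>
</property>
<property>
  <name>yarn.web-proxy.principal</name>
  <value>wap/_HOST@REALM.TLD</value>
</property>
```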
LinuxContainerExecutor:
The ContainerExecutor used by the YARN framework defines how containers are launched and controlled.
The following container executors are available in Hadoop YARN:
ContainerExecutor
Description
DefaultContainerExecutor
The default executor which YARN uses to manage container execution. The container process has the same Unix user as the NodeManager.
LinuxContainerExecutor
Supported only on GNU/Linux, this executor runs the containers as the user who submitted the application. It requires all user accounts to be created on the cluster nodes where the containers are launched. It uses a setuid executable that is included in the Hadoop distribution. The NodeManager uses this executable to launch and kill containers. The setuid executable switches to the user who has submitted the application and launches or kills the containers. For maximum security, this executor sets up restricted permissions and user/group ownership of local files and directories used by the containers such as the shared objects, jars, intermediate files, log files etc. Particularly note that, because of this, except the application owner and NodeManager, no other user can access any of the local files/directories including those localized as part of the distributed cache.
To build the LinuxContainerExecutor executable, run the following command:
$ mvn package -Dcontainer-executor.conf.dir=/etc/hadoop/
The path passed via -Dcontainer-executor.conf.dir must exist as a local path on every cluster node, and the executable must be present in $HADOOP_YARN_HOME/bin. The executable must have 6050 or --Sr-s--- permissions, and the NodeManager's Unix user must belong to its group. That group must be a special group: if any other application user belongs to it, the setup is not secure. The group name is configured via the yarn.nodemanager.linux-container-executor.group property, which appears in both conf/yarn-site.xml and conf/container-executor.cfg.
For example: suppose the NodeManager runs as user yarn, which belongs to the hadoop group, and the users group contains the two users yarn and alice (an application submitter), with alice not in the hadoop group. Per the above, the setuid/setgid executable should be set to 6050 or --Sr-s--- permissions, user-owned by yarn and group-owned by hadoop (so that alice cannot execute it).
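Applying the ownership and mode from the example above can be sketched with standard commands; the install path is hypothetical, and the chown step requires root:

```shell
# Hypothetical install path; adjust to your layout.
CE="$HADOOP_YARN_HOME/bin/container-executor"

# User-owner yarn, group-owner hadoop, as in the example above (needs root).
chown yarn:hadoop "$CE"

# 6050 = setuid + setgid with no access for "other", i.e. --Sr-s---.
chmod 6050 "$CE"

# Inspect the result.
ls -l "$CE"
```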
The directories specified in yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs, which the LinuxTaskController requires, must have their permissions set to 755.
conf/container-executor.cfg:
The executable requires a configuration file named container-executor.cfg to be present at the path passed to mvn above. This file must be owned by the user running the NodeManager (e.g. the yarn user above), may belong to any group, and must have permissions 0400 or r--------.
The executable requires the following parameters to be set in conf/container-executor.cfg, as key-value pairs, one per line:
Parameter
Value
Notes
yarn.nodemanager.linux-container-executor.group
hadoop
Unix group of the NodeManager. The group owner of the container-executor binary should be this group. Should be same as the value with which the NodeManager is configured. This configuration is required for validating the secure access of the container-executor binary.
banned.users
hdfs,yarn,mapred,bin
Banned users.
allowed.system.users
foo,bar
Allowed system users.
min.user.id
1000
Prevent other super-users.
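Combining the table above, a minimal conf/container-executor.cfg would look like the following (the values are the examples from the table, not requirements):

```
yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=hdfs,yarn,mapred,bin
allowed.system.users=foo,bar
min.user.id=1000
```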
To recap, the LinuxContainerExecutor requires the following permissions on the local filesystem:
Filesystem   Path                          User:Group    Permissions
local        container-executor            root:hadoop   --Sr-s---
local        conf/container-executor.cfg   root:hadoop   r--------
local        yarn.nodemanager.local-dirs   yarn:hadoop   drwxr-xr-x
local        yarn.nodemanager.log-dirs     yarn:hadoop   drwxr-xr-x
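The directory rows above translate into commands like the following sketch; the paths are hypothetical stand-ins for whatever yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs point to, and chown requires root:

```shell
# Hypothetical directories; substitute your configured
# yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs values.
for d in /data/yarn/local /data/yarn/logs; do
  mkdir -p "$d"
  chown yarn:hadoop "$d"   # requires root
  chmod 755 "$d"           # drwxr-xr-x
done
```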
ResourceManager configuration:
Parameter
Value
Notes
yarn.resourcemanager.keytab
/etc/security/keytab/rm.service.keytab
Kerberos keytab file for the ResourceManager.
yarn.resourcemanager.principal
rm/_HOST@REALM.TLD
Kerberos principal name for the ResourceManager.
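As a conf/yarn-site.xml fragment, the ResourceManager entries above look like this (keytab path and realm are the example values from the table):

```xml
<property>
  <name>yarn.resourcemanager.keytab</name>
  <value>/etc/security/keytab/rm.service.keytab</value>
</property>
<property>
  <name>yarn.resourcemanager.principal</name>
  <value>rm/_HOST@REALM.TLD</value>
</property>
```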
NodeManager configuration:
Parameter
Value
Notes
yarn.nodemanager.keytab
/etc/security/keytab/nm.service.keytab
Kerberos keytab file for the NodeManager.
yarn.nodemanager.principal
nm/_HOST@REALM.TLD
Kerberos principal name for the NodeManager.
yarn.nodemanager.container-executor.class
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
Use LinuxContainerExecutor.
yarn.nodemanager.linux-container-executor.group
hadoop
Unix group of the NodeManager.
conf/mapred-site.xml:
MapReduce JobHistory Server configuration:
Parameter
Value
Notes
mapreduce.jobhistory.address
MapReduce JobHistory Server host:port
Default port is 10020.
mapreduce.jobhistory.keytab
/etc/security/keytab/jhs.service.keytab
Kerberos keytab file for the MapReduce JobHistory Server.
mapreduce.jobhistory.principal
jhs/_HOST@REALM.TLD
Kerberos principal name for the MapReduce JobHistory Server.
Operating the Hadoop Cluster
Once the configuration is complete, copy all the files under HADOOP_CONF_DIR to the other nodes.
This section describes how to start the different Hadoop services as different Unix users, using the Unix users and groups described above.
Hadoop Startup
To start a Hadoop cluster, you must start both the HDFS and YARN clusters.
As the hdfs user, format the Hadoop filesystem by running the following command: