An introduction to the hadoop and hdfs commands in Hadoop

blueice · posted 2017-12-16 20:32:22

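　　Some Hive installation guides use hdfs dfs -mkdir, so the hdfs command works for this as well. In 2.8.0 both hdfs dfs and hadoop fs are supported; it is the older hadoop dfs form that is deprecated and kept only for backward compatibility.
　　This article gives a brief introduction to the related commands, to provide a rough impression of what the Hadoop commands do and to show where to look for help when you need them.
　　Overview
　　The hadoop and hdfs scripts can be found under $HADOOP_HOME/bin.
　　Much of what hdfs does can (currently) also be done with hadoop, but hdfs has some functions of its own, while hadoop covers a broader and more general range of functionality.
　　This article covers the hadoop, hdfs and yarn commands; the goal is to leave myself a rough overview!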
　　Part 1: hadoop commands
　　See http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/CommandsManual.html
　　Usage: hadoop [--config confdir] [--loglevel loglevel] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]
　　GENERIC_OPTIONS:
　　-archives <comma separated list of archives>: Specify comma separated archives to be unarchived on the compute machines. Applies only to job.
　　-conf <configuration file>: Specify an application configuration file.
　　-D <property>=<value>: Use value for given property.
　　-files <comma separated list of files>: Specify comma separated files to be copied to the map reduce cluster. Applies only to job.
　　-fs <file:///> or <hdfs://namenode:port>: Specify default filesystem URL to use. Overrides 'fs.defaultFS' property from configurations.
　　-jt <local> or <resourcemanager:port>: Specify a ResourceManager. Applies only to job.
　　-libjars <comma separated list of jars>: Specify comma separated jar files to include in the …
　　In most cases you won't need any of the generic options above.
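　　When a generic option is needed, it goes after the subcommand and before that subcommand's own options. The sketch below is a minimal illustration, assuming a local file data.txt and an HDFS directory /tmp/input (both placeholders): -D overrides dfs.replication for a single upload, and -fs points the file system shell at the local file system instead of fs.defaultFS. The function names and the HADOOP variable are conveniences of this sketch only; setting HADOOP=echo turns each call into a dry run that prints the arguments.

```shell
#!/bin/sh
# Command to run; set HADOOP=echo to print the arguments instead (dry run).
HADOOP="${HADOOP:-hadoop}"

# Upload <src> to <dst> with the given replication factor (default 2),
# overriding dfs.replication from hdfs-site.xml for this one command only.
put_with_replication() {
  "$HADOOP" fs -D dfs.replication="${3:-2}" -put "$1" "$2"
}

# List a directory on the local file system through the Hadoop shell.
ls_local() {
  "$HADOOP" fs -fs file:/// -ls "$1"
}

# Example (on a real cluster):
#   put_with_replication data.txt /tmp/input 2
#   ls_local /tmp
```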
　　The commands (COMMAND) are introduced below:
　　archive
　　checknative
　　classpath
　　credential
　　distcp
　　fs
　　jar
　　key
　　trace
　　version
　　CLASSNAME
　　1.1 archive
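　　Creates a Hadoop archive. For details see http://hadoop.apache.org/docs/r2.8.0/hadoop-archives/HadoopArchives.html
　　A Hadoop archive is not an ordinary compressed file but a format of its own (it cannot be unpacked with rar, zip, tar and the like). Its suffix is .har, and an archive directory contains metadata and data files.
　　The contents are not compressed; the main purpose of an archive is to pack many small files into a few large ones, which reduces the number of files the NameNode has to track.
　　Note: the official documentation doesn't explain much more. Does that mean har files exist only to serve MapReduce, and that we can ignore them if we don't use MapReduce? Creating an archive does run a MapReduce job, but as the examples below show, the result can be read with ordinary file system commands through har:// URIs.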
　　Creating an archive
　　hadoop archive -archiveName name -p <parent> [-r <replication factor>] <src>* <dest>
　　For example:
　　Archive the contents of the directory /foo/bar as zoo.har and store it under /outputdir:
　　hadoop archive -archiveName zoo.har -p /foo/bar -r 3 /outputdir
　　Archive the files under /user/hadoop/dir1 and /user/hadoop/dir2 as foo.har and store it in /user/zoo:
　　hadoop archive -archiveName foo.har -p /user/ hadoop/dir1 hadoop/dir2 /user/zoo
　　Unarchiving
　　Copy the dir1 directory in foo.har to /user/zoo/newdir:
　　hdfs dfs -cp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir
　　Unarchive in parallel (with distcp):
　　hadoop distcp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir
　　List the files inside the archive:
　　hdfs dfs -ls -R har:///user/zoo/foo.har/
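　　The create, list and extract steps above can be wrapped into small functions. This is a minimal sketch, not an official tool: the function names are made up for illustration, the paths are the ones from the examples, and the HADOOP/HDFS variables exist only so that setting them to echo gives a dry run.

```shell
#!/bin/sh
HADOOP="${HADOOP:-hadoop}"
HDFS="${HDFS:-hdfs}"

# make_har <name.har> <parent> <dest> [-r <replication>] <src>...
# Extra arguments (an optional -r <n>, then sources relative to <parent>)
# are passed through in order. Creating the archive runs a MapReduce job.
make_har() {
  local name="$1" parent="$2" dest="$3"
  shift 3
  "$HADOOP" archive -archiveName "$name" -p "$parent" "$@" "$dest"
}

# Recursively list the contents of an archive through the har:// scheme.
list_har() {
  "$HDFS" dfs -ls -R "har://$1"
}

# Copy one directory out of an archive: sequentially with -cp (default)
# or in parallel with distcp when the third argument is "distcp".
extract_from_har() {
  if [ "${3:-cp}" = "distcp" ]; then
    "$HADOOP" distcp "har://$1" "$2"
  else
    "$HDFS" dfs -cp "har://$1" "$2"
  fi
}

# Example:
#   make_har foo.har /user/ /user/zoo hadoop/dir1 hadoop/dir2
#   list_har /user/zoo/foo.har/
#   extract_from_har /user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir distcp
```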
　　1.2 checknative
　　hadoop checknative [-a] [-h]
　　-a check all libraries
　　-h print help
　　Checks whether Hadoop's native code libraries are available; most users won't need it. For details see http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/NativeLibraries.html
　　1.3 classpath
　　hadoop classpath
　　Prints the class path needed to get the Hadoop jar and the required libraries.
　　1.4 credential
　　hadoop credential <subcommand> [options]
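　　Manages credentials, passwords and secrets within credential providers.
　　List the credentials:
　　hadoop credential list
　　Note: I haven't looked into this yet; it is presumably related to security and authentication.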
　　1.5 distcp
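　　Purpose: copy files or directories
　　Details: http://hadoop.apache.org/docs/r2.8.0/hadoop-distcp/DistCp.html
　　distcp is short for distributed copy (as the name suggests). It is mainly used to copy files within a cluster or between clusters, and it runs as a MapReduce job.
　　The official documentation spends a lot of space on it, which suggests it is quite useful; moving huge numbers of files is complicated enough to deserve a dedicated tool.
　　Simple copy, example 1: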

[hadoop@bigdata ~]$ hadoop distcp /tmp/input/hadoop /tmp/input/haoop1

　　17/06/07 15:57:53 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
　　17/06/07 15:57:53 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
　　17/06/07 15:57:54 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[/tmp/input/hadoop], targetPath=/tmp/input/haoop1, targetPathExists=false, filtersFile='null'}
　　17/06/07 15:57:54 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
　　17/06/07 15:57:56 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 31; dirCnt = 1
　　17/06/07 15:57:56 INFO tools.SimpleCopyListing: Build file listing completed.
　　17/06/07 15:57:56 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
　　17/06/07 15:57:56 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
　　17/06/07 15:57:57 INFO tools.DistCp: Number of paths in the copy list: 31
　　17/06/07 15:57:57 INFO tools.DistCp: Number of paths in the copy list: 31
　　17/06/07 15:57:58 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
　　17/06/07 15:57:59 INFO mapreduce.JobSubmitter: number of splits:20
　　17/06/07 15:58:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1496800112089_0001
　　17/06/07 15:58:01 INFO impl.YarnClientImpl: Submitted application application_1496800112089_0001
　　17/06/07 15:58:01 INFO mapreduce.Job: The url to track the job: http://bigdata.lzf:8099/proxy/application_1496800112089_0001/
　　17/06/07 15:58:01 INFO tools.DistCp: DistCp job-id: job_1496800112089_0001
　　17/06/07 15:58:01 INFO mapreduce.Job: Running job: job_1496800112089_0001
　　17/06/07 15:58:24 INFO mapreduce.Job: Job job_1496800112089_0001 running in uber mode : false
　　-- Note: the rest of the output is long and has been omitted.
　　Check the result (partial listing):

[hadoop@bigdata ~]$ hadoop fs -ls /tmp/input/haoop1

　　17/06/07 16:05:58 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
　　17/06/07 16:05:58 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
　　Found 30 items
　　-rw-r--r-- 1 hadoop supergroup    4942 2017-06-07 15:59 /tmp/input/haoop1/capacity-scheduler.xml
　　-rw-r--r-- 1 hadoop supergroup    1335 2017-06-07 15:58 /tmp/input/haoop1/configuration.xsl
　　-rw-r--r-- 1 hadoop supergroup       318 2017-06-07 15:59 /tmp/input/haoop1/container-executor.cfg
　　-rw-r--r-- 1 hadoop supergroup    1443 2017-06-07 15:59 /tmp/input/haoop1/core-site.xml
　　-rw-r--r-- 1 hadoop supergroup    3804 2017-06-07 16:00 /tmp/input/haoop1/hadoop-env.cmd
　　-rw-r--r-- 1 hadoop supergroup    4755 2017-06-07 16:00 /tmp/input/haoop1/hadoop-env.sh
　　-rw-r--r-- 1 hadoop supergroup    2490 2017-06-07 15:58 /tmp/input/haoop1/hadoop-metrics.properties
　　-rw-r--r-- 1 hadoop supergroup    2598 2017-06-07 15:59 /tmp/input/haoop1/hadoop-metrics2.properties
　　-rw-r--r-- 1 hadoop supergroup    9683 2017-06-07 16:00 /tmp/input/haoop1/hadoop-policy.xml
　　-rw-r--r-- 1 hadoop supergroup    1527 2017-06-07 15:58 /tmp/input/haoop1/hdfs-site.xml
　　-rw-r--r-- 1 hadoop supergroup    1449 2017-06-07 15:59 /tmp/input/haoop1/httpfs-env.sh
　　-rw-r--r-- 1 hadoop supergroup    1657 2017-06-07 15:59 /tmp/input/haoop1/httpfs-log4j.properties
　　........
　　Paths can be given as URIs, for example:  hadoop distcp hdfs://bigdata.lzf:9001/tmp/input/hadoop  hdfs://bigdata.lzf:9001/tmp/input/hadoop1
　　There can be several sources, for example:  hadoop distcp hdfs://bigdata.lzf:9001/tmp/input/hadoop hdfs://bigdata.lzf:9001/tmp/input/test hdfs://bigdata.lzf:9001/tmp/input/hadoop1
　　Note: copying always prints these messages:
　　17/06/07 16:09:52 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
　　17/06/07 16:09:52 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
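　　They can be ignored: the job configuration, viewable through the web UI on port 8099, shows that mapreduce.task.io.sort.mb and mapreduce.task.io.sort.factor are the settings actually in use.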
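　　When such a copy is scripted, the sources and the target can be passed as a list. A minimal sketch; the function name is hypothetical, and the optional -update flag (copy only files that are missing at the target or differ from it) is a standard DistCp option that the examples above do not use. Note that with -update, DistCp copies the contents of each source directory into the target rather than the directory itself.

```shell
#!/bin/sh
HADOOP="${HADOOP:-hadoop}"

# distcp_many <target> <src>...
# Copies all sources into <target> with one DistCp (MapReduce) job.
# Set DISTCP_UPDATE=1 to add -update.
distcp_many() {
  local target="$1"
  shift
  if [ "${DISTCP_UPDATE:-0}" = "1" ]; then
    "$HADOOP" distcp -update "$@" "$target"
  else
    "$HADOOP" distcp "$@" "$target"
  fi
}

# Example:
#   distcp_many hdfs://bigdata.lzf:9001/tmp/input/hadoop1 \
#     hdfs://bigdata.lzf:9001/tmp/input/hadoop hdfs://bigdata.lzf:9001/tmp/input/test
```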
　　1.6 fs
　　This is one of the most frequently used commands. It is largely equivalent to hdfs dfs, though there are some differences.
　　http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/FileSystemShell.html
　　The documentation is detailed and the commands are simple to use.
　　·appendToFile
　　·cat
　　·checksum
　　·chgrp
　　·chmod
　　·chown
　　·copyFromLocal
　　·copyToLocal
　　·count
　　·cp
　　·createSnapshot
　　·deleteSnapshot
　　·df
　　·du
　　·dus
　　·expunge
　　·find
　　·get
　　·getfacl
　　·getfattr
　　·getmerge
　　·help
　　·ls
　　·lsr
　　·mkdir
　　·moveFromLocal
　　·moveToLocal
　　·mv
　　·put
　　·renameSnapshot
　　·rm
　　·rmdir
　　·rmr
　　·setfacl
　　·setfattr
　　·setrep
　　·stat
　　·tail
　　·test
　　·text
　　·touchz
　　·truncate
　　·usage
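　　These subcommands are easy to understand and largely match the usual Linux file system commands.
　　Here are a few interesting and frequently used ones.
　　Copying data from the local file system to a Hadoop URI: copyFromLocal
　　hadoop fs -copyFromLocal <localsrc> URI
　　This is usually equivalent to put, except that the source must be on the local file system.
　　For example: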

[hadoop@bigdata ~]$ hadoop fs -copyFromLocal start-hadoop.sh /log

　　17/06/07 17:10:21 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
　　17/06/07 17:10:21 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
　　-- Using a full URI and forcing an overwrite with -f:

[hadoop@bigdata ~]$ hadoop fs -copyFromLocal -f start-hadoop.sh hdfs://bigdata.lzf:9001/log

　　17/06/07 17:12:03 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
　　17/06/07 17:12:03 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
　　Copying a file from a URI to the local file system: copyToLocal
　　hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst>
　　Equivalent to get, except that the destination must be local.
　　For example:
　　hadoop fs -copyToLocal -f hdfs://bigdata.lzf:9001/log/start-hadoop.sh /home/hadoop/testdir
　　Counting: count
　　hadoop fs -count [-q] [-h] [-v] [-x] [-t [<storage type>]] [-u] <paths>
　　This command is quite useful.
　　Count the number of directories, files and bytes under the paths that match the specified file pattern. Get the quota and the usage. The output columns with -count are: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME
　　In other words, it counts the directories and files under a path and their total size in bytes.
　　For example:

[hadoop@bigdata ~]$ hadoop fs -count /tmp/input/hadoop

　　17/06/07 17:41:04 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
　　17/06/07 17:41:04 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
　　1 30 83564 /tmp/input/hadoop
　　This command gives a quick picture of what is stored under a path.
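　　Because the output has fixed columns (DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME), it is easy to post-process. A minimal sketch that labels the columns; the function name is hypothetical, it assumes the plain output (no -q or -h, which add or reformat columns), and it skips any line whose first three fields are not plain numbers, such as stray log lines.

```shell
#!/bin/sh
# Read `hadoop fs -count` output on stdin and print labelled fields.
# Everything after the third column is treated as the path.
format_count() {
  awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
    d = $1; f = $2; b = $3
    sub(/^[ \t]*[0-9]+[ \t]+[0-9]+[ \t]+[0-9]+[ \t]+/, "")
    printf "path=%s dirs=%s files=%s bytes=%s\n", $0, d, f, b
  }'
}

# Example:
#   hadoop fs -count /tmp/input/hadoop | format_count
```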
　　Copying: cp
　　Usage: hadoop fs -cp [-f] [-p | -p[topax]] URI [URI ...] <dest>
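　　Multiple sources are allowed, in which case the destination must be a directory.
　　It is somewhat similar to distcp, but -cp copies the data through a single client process, whereas distcp runs a MapReduce job to copy in parallel. Because -cp also accepts full URIs, it is not strictly limited to a single cluster.
　　For example: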
　　hadoop fs -cp /tmp/input/hadoop1/hadoop/*.* /tmp/input/hadoop
　　Creating a snapshot: createSnapshot
　　http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html
　　Its main use is backup.
　　Deleting a snapshot: deleteSnapshot
　　Omitted here.
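　　A snapshot can only be taken of a directory that an administrator has first made snapshottable. A minimal sketch of the enable, create, list and delete cycle, following the snapshot documentation linked above; the function names are hypothetical. If no snapshot name is given, HDFS generates a timestamp-based name.

```shell
#!/bin/sh
HADOOP="${HADOOP:-hadoop}"
HDFS="${HDFS:-hdfs}"

# Make a directory snapshottable (requires administrator rights).
enable_snapshots() { "$HDFS" dfsadmin -allowSnapshot "$1"; }

# Create a snapshot of <dir>, optionally named; it appears as <dir>/.snapshot/<name>.
take_snapshot() { "$HADOOP" fs -createSnapshot "$1" ${2:+"$2"}; }

# List the existing snapshots of <dir>.
list_snapshots() { "$HADOOP" fs -ls "$1/.snapshot"; }

# Delete snapshot <name> of <dir>.
drop_snapshot() { "$HADOOP" fs -deleteSnapshot "$1" "$2"; }

# Example:
#   enable_snapshots /data && take_snapshot /data s1 && list_snapshots /data
```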
　　Showing free space: df
　　hadoop fs -df [-h] URI [URI ...]

[hadoop@bigdata ~]$ hadoop fs -df -h /

　　17/06/07 17:51:31 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
　　17/06/07 17:51:31 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library

　　Filesystem                  Size   Used  Available  Use%
　　hdfs://bigdata.lzf:9001  46.5 G  1.5 M     35.8 G    0%
　　Computing the size of directories in bytes: du
　　hadoop fs -du [-s] [-h] [-x] URI [URI ...]
　　Some of this can also be done with count.

[hadoop@bigdata ~]$ hadoop fs -du -h /

　　17/06/07 17:52:56 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
　　17/06/07 17:52:56 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
　　0    /input
　　63    /log
　　0    /test
　　1.0 M  /tmp
　　9    /warehouse

[hadoop@bigdata ~]$ hadoop fs -du -h -s /

　　17/06/07 17:53:19 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
　　17/06/07 17:53:19 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
　　1.0 M /

[hadoop@bigdata ~]$

　　The options are basically the same as in Linux.
　　Emptying the trash: expunge
　　hadoop fs -expunge
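　　Permanently deletes expired files from the trash and creates a new checkpoint. Data in checkpoints older than fs.trash.interval is removed the next time this command runs.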
　　Finding files: find
　　hadoop fs -find <path> ... <expression> ...
　　find matches on file names, not on file contents.

[hadoop@bigdata ~]$ hadoop fs -find / -name hadoop -print

　　17/06/08 11:59:04 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
　　17/06/08 11:59:04 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
　　/tmp/hadoop-yarn/staging/hadoop
　　/tmp/hadoop-yarn/staging/history/done_intermediate/hadoop
　　/tmp/hive/hadoop
　　/tmp/input/hadoop
　　/tmp/input/hadoop1/hadoop
　　Or use -iname (case-insensitive):
　　hadoop fs -find / -iname hadoop -print

[hadoop@bigdata ~]$ hadoop fs -find / -name hadooP -print

　　17/06/08 12:00:59 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
　　17/06/08 12:00:59 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library

[hadoop@bigdata ~]$ hadoop fs -find / -iname hadooP -print

　　17/06/08 12:01:06 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
　　17/06/08 12:01:06 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
　　/tmp/hadoop-yarn/staging/hadoop
　　/tmp/hadoop-yarn/staging/history/done_intermediate/hadoop
　　/tmp/hive/hadoop
　　/tmp/input/hadoop
　　/tmp/input/hadoop1/hadoop
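　　Downloading files to the local file system: get
　　Similar to copyToLocal. Files that fail the CRC check can be copied with -ignorecrc, and -crc copies the CRC files as well.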
　　hadoop fs -get [-ignorecrc] [-crc] [-p] [-f] <src> <localdst>
　　For example:
　　hadoop fs -get /tmp/input/hadoop/*.xml /home/hadoop/testdir/
　　Viewing the extended attributes of a file or directory: getfattr
　　hadoop fs -getfattr [-R] -n name | -d [-e en] <path>
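　　-n name and -d are mutually exclusive: -n dumps the named attribute, -d dumps all attributes. -R lists recursively. -e en encodes the values after retrieving them; valid values for en are "text", "hex", and "base64".
　　For example: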
　　hadoop fs -getfattr -d /file
　　hadoop fs -getfattr -R -n user.myAttr /dir
　　From these examples I can't yet see what it is especially useful for.
　　Merging files: getmerge
　　hadoop fs -getmerge -nl  /src  /opt/output.txt
　　hadoop fs -getmerge -nl  /src/file1.txt /src/file2.txt  /output.txt
　　For example:
　　hadoop fs -getmerge -nl /tmp/input/hadoop/hadoop-env.sh /tmp/input/hadoop/slaves /home/hadoop/testdir/merget-test.txt
　　Note: the destination is a local file, not a URI.
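　　To make the effect of -nl concrete, the sketch below mimics on ordinary local files what getmerge -nl writes: the inputs concatenated in order, with a newline appended after each one. It is only an illustration of the output layout with a hypothetical function name, not a replacement for getmerge, which reads its sources from HDFS.

```shell
#!/bin/sh
# local_merge_nl <src>... <dst>
# Concatenate local files into <dst>, adding "\n" after each input,
# the way `hadoop fs -getmerge -nl` lays out its output.
local_merge_nl() {
  local dst src n i
  for dst; do :; done          # the last argument is the destination
  n=$(( $# - 1 ))
  i=0
  : > "$dst"                   # start from an empty destination file
  for src; do
    i=$(( i + 1 ))
    if [ "$i" -gt "$n" ]; then
      break                    # stop before the destination itself
    fi
    cat -- "$src" >> "$dst"
    printf '\n' >> "$dst"
  done
}
```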
　　Listing files: ls
　　hadoop fs -ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] <args>
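　　mkdir  hadoop fs -mkdir [-p] <paths>  -- create directories
　　moveFromLocal  hadoop fs -moveFromLocal <localsrc> <dst>  -- upload from the local file system; like put, except that the local source is deleted after it is copied
　　Moving files within the cluster: mv
　　hadoop fs -mv URI [URI ...] <dest>
　　Multiple sources are allowed, in which case the destination must be a directory.
　　For example:  hadoop fs -mv /tmp/input/hadoop1/hadoop/slaves /tmp/input/hadoop1/
　　Uploading files: put
　　hadoop fs -put [-f] [-p] [-l] [-d] [ - | <localsrc1> .. ] <dst>
　　Similar to copyFromLocal.
　　Deleting files: rm    hadoop fs -rm [-f] [-r |-R] [-skipTrash] [-safely] URI [URI ...]
　　Deleting empty directories: rmdir  hadoop fs -rmdir [--ignore-fail-on-non-empty] URI [URI ...]
　　Showing the last kilobyte of a file: tail  hadoop fs -tail [-f] URI
　　The rest are omitted.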
　　1.7 jar
　　Runs a jar through hadoop
　　hadoop jar <jar> [mainClass] args...
　　However, Hadoop recommends using yarn jar instead of hadoop jar to launch YARN applications.
　　For the yarn jar command see http://hadoop.apache.org/docs/r2.8.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#jar
　　1.8 key
　　Manages keys via key providers.
　　Details omitted.
　　1.9 trace
　　Views and modifies tracing settings; see  http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/Tracing.html
　　1.10 version
　　Prints version information.
　　hadoop version
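　　1.11 CLASSNAME
　　Runs a class through hadoop
　　Syntax: hadoop CLASSNAME
　　The following is taken from http://www.thebigdata.cn/Hadoop/1924.html
　　To use hadoop CLASSNAME, first put your classes on HADOOP_CLASSPATH:
　　export HADOOP_CLASSPATH="/home/hadoop/jardir/*:/home/hadoop/workspace/hdfstest/bin/"
　　Here /home/hadoop/jardir/ contains all of my Hadoop jars (the Java class path wildcard is dir/*; dir/*.jar is not expanded).
　　/home/hadoop/workspace/hdfstest/bin/ is the directory containing the classes I am developing.
　　I write Java in Eclipse, which compiles automatically, so once the code is written I can run it straight from the command line with hadoop CLASSNAME.
　　You could also package the project as a runnable jar (with all the jars bundled) and run hadoop jar <jar> <class name> <arg1>, but building a jar every time is very inconvenient for testing.
　　So this feature mainly exists to make testing easier for developers.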
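　　A minimal sketch of that workflow, assuming the two directories above; com.example.HdfsTest and run_dev_class are hypothetical names. HADOOP_CLASSPATH is set only for the one command, and pointing HADOOP at a stub lets you check what would be run without a cluster.

```shell
#!/bin/sh
HADOOP="${HADOOP:-hadoop}"

# Third-party jars and Eclipse's output directory, as in the example above.
JAR_DIR="/home/hadoop/jardir"
CLASS_DIR="/home/hadoop/workspace/hdfstest/bin"

# run_dev_class <fully.qualified.Class> [args...]
# Runs a compiled class through the hadoop launcher without packaging a jar.
# dir/* is the Java class path wildcard for "every jar in dir".
run_dev_class() {
  local cls="$1"
  shift
  HADOOP_CLASSPATH="$JAR_DIR/*:$CLASS_DIR/" "$HADOOP" "$cls" "$@"
}

# Example:
#   run_dev_class com.example.HdfsTest /tmp/input
```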
　　Part 2: hdfs commands
　　Running hdfs with no arguments at the command line prints the official help:
　　dfs                   run a filesystem command on the file systems supported in Hadoop.
　　namenode -format    format the DFS filesystem
　　secondarynamenode run the DFS secondary namenode
　　namenode          run the DFS namenode
　　journalnode       run the DFS journalnode
　　zkfc                   run the ZK Failover Controller daemon
　　datanode          run a DFS datanode
　　debug                run a Debug Admin to execute HDFS debug commands
　　dfsadmin          run a DFS admin client
　　haadmin             run a DFS HA admin client
　　fsck                   run a DFS filesystem checking utility
　　balancer          run a cluster balancing utility
　　jmxget             get JMX exported values from NameNode or DataNode.
　　mover             run a utility to move block replicas across storage types
　　oiv                   apply the offline fsimage viewer to an fsimage
　　oiv_legacy          apply the offline fsimage viewer to an legacy fsimage
　　oev                apply the offline edits viewer to an edits file
　　fetchdt             fetch a delegation token from the NameNode
　　getconf             get config values from configuration
　　groups             get the groups which users belong to
　　snapshotDiff       diff two snapshots of a directory or diff the current directory contents with a snapshot
　　lsSnapshottableDir list all snapshottable dirs owned by the current user
　　Use -help to see options
　　portmap             run a portmap service
　　nfs3                run an NFS version 3 gateway
　　cacheadmin          configure the HDFS cache
　　crypto             configure HDFS encryption zones
　　storagepolicies    list/get/set block storage policies
　　version             print the version
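　　Alternatively, see the official help at http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html
　　There is a lot to read, so here is a list that briefly describes what each command does, with a few of them covered in more detail.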
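　　Command / syntax / summary and notes
　　·classpath
　　hdfs classpath …
　　·fsck
　　hdfs fsck <path>
　　[-list-corruptfileblocks |
　　[-move | -delete | -openforwrite]
　　[-files [-blocks [-locations | -racks | -replicaDetails]]]
　　[-includeSnapshots]
　　[-storagepolicies] [-blockId <blk_Id>]
　　Runs the HDFS file system checking utility.
　　Administrators should run this command regularly.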
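　　For routine checks it can be handy to look only at the overall status and, if needed, the corrupt blocks. A minimal sketch with hypothetical function names; it relies on fsck ending a healthy report with "The filesystem under path '<path>' is HEALTHY", which is what HDFS prints, but it is worth confirming the wording on your own version.

```shell
#!/bin/sh
HDFS="${HDFS:-hdfs}"

# Succeed only if fsck reports the path (default /) as HEALTHY.
fsck_is_healthy() {
  "$HDFS" fsck "${1:-/}" 2>/dev/null | grep -q 'is HEALTHY'
}

# Print the list of corrupt file blocks under a path (default /).
list_corrupt_blocks() {
  "$HDFS" fsck "${1:-/}" -list-corruptfileblocks
}

# Example:
#   fsck_is_healthy / || list_corrupt_blocks /
```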
　　·getconf
　　hdfs getconf -namenodes
　　hdfs getconf -secondaryNameNodes
　　hdfs getconf -backupNodes
　　hdfs getconf -includeFile
　　hdfs getconf -excludeFile
　　hdfs getconf -nnRpcAddresses
　　hdfs getconf -confKey [key]
　　Gets configuration information.
　　·groups
　　hdfs groups [username ...]
　　Gets the group information for users.
　　·lsSnapshottableDir
　　hdfs lsSnapshottableDir [-help]
　　Lists the snapshottable directories.
　　·jmxget
　　hdfs jmxget [-localVM ConnectorURL | -port port | -server mbeanserver | -service service]
　　Gets JMX information from a specific service (the official documentation says "dump").
　　·oev
　　hdfs oev [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE
　　See http://lxw1234.com/archives/2015/08/442.htm
　　Offline edits viewer.
　　·oiv
　　hdfs oiv [OPTIONS] -i INPUT_FILE
　　See http://lxw1234.com/archives/2015/08/440.htm
　　Offline image viewer.
　　·snapshotDiff
　　hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>
　　For details see http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html#Get_Snapshots_Difference_Report
　　Compares two snapshots and reports the differences.
　　·version
　　hdfs version
　　Prints version information.
　　·balancer
　　hdfs balancer
　　[-threshold <threshold>]
　　[-policy <policy>]
　　[-exclude [-f <hosts-file> | <comma-separated list of hosts>]]
　　[-include [-f <hosts-file> | <comma-separated list of hosts>]]
　　[-source [-f <hosts-file> | <comma-separated list of hosts>]]
　　[-blockpools <comma-separated list of blockpool>]
　　[-idleiterations <idleiterations>]
　　For details see
　　http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer
　　Runs the cluster balancing utility.
　　A very important command.
　　Data often needs to be rebalanced across DataNodes for various reasons, for example after new nodes are added.
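　　A typical run after adding DataNodes might look like the sketch below (the function name is hypothetical). The threshold is the allowed deviation, in percent, of each DataNode's disk usage from the cluster average (10 by default); the bandwidth, set beforehand with dfsadmin -setBalancerBandwidth in bytes per second and not persisted across DataNode restarts, is an arbitrary example value.

```shell
#!/bin/sh
HDFS="${HDFS:-hdfs}"

# rebalance [threshold_percent] [bandwidth_bytes_per_sec]
# Raise the per-DataNode balancing bandwidth, then run the balancer until
# every DataNode is within <threshold>% of the cluster-wide usage.
rebalance() {
  local threshold="${1:-10}" bandwidth="${2:-104857600}"   # 100 MiB/s, example only
  "$HDFS" dfsadmin -setBalancerBandwidth "$bandwidth" &&
    "$HDFS" balancer -threshold "$threshold"
}

# Example:
#   rebalance 5
```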
　　·cacheadmin
　　hdfs cacheadmin -addDirective -path <path> -pool <pool-name> [-force] [-replication <replication>] [-ttl <time-to-live>]
　　For details see
　　http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html#cacheadmin_command-line_interface
　　Cache management.
　　A very important command.
　　The official documentation devotes a whole long article to it.
　　·datanode
　　hdfs datanode [-regular | -rollback | -rollingupgrade rollback]
　　DataNode management: starts a DataNode, and rolls back during a rolling upgrade.
　　·dfsadmin
　　hdfs dfsadmin [GENERIC_OPTIONS]
　　[-report [-live] [-dead] [-decommissioning]]
　　[-safemode enter | leave | get | wait | forceExit]
　　[-saveNamespace]
　　[-rollEdits]
　　[-restoreFailedStorage true |false |check]
　　[-refreshNodes]
　　[-setQuota <quota> <dirname>...<dirname>]
　　[-clrQuota <dirname>...<dirname>]
　　[-setSpaceQuota <quota> [-storageType <storagetype>] <dirname>...<dirname>]
　　[-clrSpaceQuota [-storageType <storagetype>] <dirname>...<dirname>]
　　[-finalizeUpgrade]
　　[-rollingUpgrade [<query> |<prepare> |<finalize>]]
　　[-metasave filename]
　　[-refreshServiceAcl]
　　[-refreshUserToGroupsMappings]
　　[-refreshSuperUserGroupsConfiguration]
　　[-refreshCallQueue]
　　[-refresh <host:ipc_port> <key> [arg1..argn]]
　　[-reconfig <datanode |...> <host:ipc_port> <start |status>]
　　[-printTopology]
　　[-refreshNamenodes datanodehost:port]
　　[-deleteBlockPool datanode-host:port blockpoolId [force]]
　　[-setBalancerBandwidth <bandwidth in bytes per second>]
　　[-getBalancerBandwidth <datanode_host:ipc_port>]
　　[-allowSnapshot <snapshotDir>]
　　[-disallowSnapshot <snapshotDir>]
　　[-fetchImage <local directory>]
　　[-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
　　[-getDatanodeInfo <datanode_host:ipc_port>]
　　[-evictWriters <datanode_host:ipc_port>]
　　[-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
　　[-help [cmd]]
　　File system administration.
　　A core command, absolutely essential.
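　　Start-up scripts often need to wait until the NameNode has left safe mode before writing. A minimal sketch combining -safemode wait, -safemode get and -report -live; the function names are hypothetical, and the grep relies on the "Safe mode is OFF" message that -safemode get prints.

```shell
#!/bin/sh
HDFS="${HDFS:-hdfs}"

# Block until the NameNode leaves safe mode, then report the live DataNodes.
wait_and_report() {
  "$HDFS" dfsadmin -safemode wait && "$HDFS" dfsadmin -report -live
}

# Succeed if safe mode is currently OFF.
safemode_off() {
  "$HDFS" dfsadmin -safemode get 2>/dev/null | grep -q 'Safe mode is OFF'
}

# Example:
#   safemode_off || wait_and_report
```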
　　·haadmin
　　hdfs haadmin -checkHealth <serviceId>
　　hdfs haadmin -failover [--forcefence] [--forceactive] <serviceId> <serviceId>
　　hdfs haadmin -getServiceState <serviceId>
　　hdfs haadmin -help <command>
　　hdfs haadmin -transitionToActive <serviceId> [--forceactive]
　　hdfs haadmin -transitionToStandby <serviceId>
　　High availability administration.
　　A core command, absolutely essential.
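　　-getServiceState prints active or standby for a single NameNode. A minimal sketch that finds the active one among several service IDs; find_active_nn is a hypothetical name, and nn1 and nn2 in the example are placeholders that must match dfs.ha.namenodes.<nameservice> in your configuration.

```shell
#!/bin/sh
HDFS="${HDFS:-hdfs}"

# Print the first service ID whose state is "active"; return 1 if none is.
find_active_nn() {
  local id state
  for id in "$@"; do
    state="$("$HDFS" haadmin -getServiceState "$id" 2>/dev/null)"
    if [ "$state" = "active" ]; then
      echo "$id"
      return 0
    fi
  done
  return 1
}

# Example:
#   active="$(find_active_nn nn1 nn2)" || echo "no active NameNode" >&2
```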
　　·journalnode
　　hdfs journalnode
　　See
　　http://blog.csdn.net/kiwi_kid/article/details/53514314
　　http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Administrative_commands
　　http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html
　　http://blog.csdn.net/dr_guo/article/details/50975851 -- reference for setting up an HA cluster
　　Runs a JournalNode, the service that keeps edits in sync between NameNodes (for HDFS HA with QJM).
　　·mover
　　hdfs mover [-p <files/dirs> | -f <local file name>]
　　See http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html#Mover_-_A_New_Data_Migration_Tool
　　Runs the data migration utility, which moves block replicas between storage types (archival storage) so that they satisfy each file's storage policy. Like the balancer, it periodically scans and moves data.
　　·namenode
　　hdfs namenode [-backup] |
　　[-checkpoint] |
　　[-format [-clusterid cid ] [-force] [-nonInteractive] ] |
　　[-upgrade [-clusterid cid] [-renameReserved<k-v pairs>] ] |
　　[-upgradeOnly [-clusterid cid] [-renameReserved<k-v pairs>] ] |
　　[-rollback] |
　　[-rollingUpgrade <rollback |started> ] |
　　[-finalize] |
　　[-importCheckpoint] |
　　[-initializeSharedEdits] |
　　[-bootstrapStandby [-force] [-nonInteractive] [-skipSharedEditsCheck] ] |
　　[-recover [-force] ] |
　　[-metadataVersion ]
　　NameNode management (a core command, absolutely essential).
　　Performs critical operations such as backup, formatting, upgrade, rollback and recovery.
　　·nfs3
　　hdfs nfs3
　　See http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html#Start_and_stop_NFS_gateway_service
　　Starts an NFSv3 gateway, so that HDFS can be browsed much like a local file system.
　　Sometimes this makes HDFS easier to work with.
　　·portmap
　　hdfs portmap
　　See http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html#Start_and_stop_NFS_gateway_service
　　Used together with the NFS gateway.
　　·secondarynamenode
　　hdfs secondarynamenode [-checkpoint [force]] | [-format] | [-geteditsize]
　　See http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Secondary_NameNode
　　Secondary NameNode operations.
　　·storagepolicies
　　hdfs storagepolicies
　　See http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
　　Manages the storage policies used for archival storage.
　　Useful in some environments. Perhaps one day there will be no separate SSD question, only a choice between memory and disk.
　　·zkfc
　　hdfs zkfc [-formatZK [-force] [-nonInteractive]]
　　See  http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Administrative_commands
　　Runs the ZooKeeper Failover Controller (ZKFC).
　　Works alongside the JournalNodes; an important part of high availability.
　　·debug
　　hdfs debug verifyMeta -meta <metadata-file> [-block <block-file>]
　　Verifies HDFS metadata and block files.
　　hdfs debug computeMeta -block <block-file> -out <output-metadata-file>
　　Computes metadata from a block file. Use with care; the official documentation warns:
　　Use at your own risk! If the block file is corrupt and you overwrite it's meta file, it will show up as 'good' in HDFS,
　　but you can't read the data. Only use as a last measure, and when you are 100% certain the block file is good.
　　hdfs debug recoverLease -path <path> [-retries <num-retries>]
　　Recovers the lease on the given path.
　　Part 3: yarn commands
　　For details see http://hadoop.apache.org/docs/r2.8.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html
　　The table below gives an overview of the commands.
　　yarn command overview (command / syntax / summary and notes)
　　·application
　　yarn application [options]
　　Prints application reports or kills applications.
　　·applicationattempt
　　yarn applicationattempt [options]
　　Prints application attempt reports.
　　·classpath
　　yarn classpath …
　　·daemonlog
　　yarn daemonlog -getlevel <host:httpport> <classname>
　　yarn daemonlog -setlevel <host:httpport> <classname> <level>
　　For example:
　　bin/yarn daemonlog -setlevel 127.0.0.1:8088 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl DEBUG
　　Gets or sets the log level of a class.
　　·nodemanager
　　yarn nodemanager
　　Starts the YARN NodeManager.
　　·proxyserver
　　yarn proxyserver
　　Starts the web proxy server.
　　·resourcemanager
　　yarn resourcemanager [-format-state-store]
　　Starts the YARN ResourceManager.
　　·rmadmin
　　yarn rmadmin
　　-refreshQueues
　　-refreshNodes [-g [timeout in seconds]]
　　-refreshNodesResources
　　-refreshSuperUserGroupsConfiguration
　　-refreshUserToGroupsMappings
　　-refreshAdminAcls
　　-refreshServiceAcl
　　-getGroups [username]
　　-addToClusterNodeLabels <"label1(exclusive=true),label2(exclusive=false),label3">
　　-removeFromClusterNodeLabels <label1,label2,label3> (label splitted by ",")
　　-replaceLabelsOnNode <"node1[:port]=label1,label2 node2[:port]=label1,label2"> [-failOnUnknownNodes]
　　-directlyAccessNodeLabelStore
　　-refreshClusterMaxPriority
　　-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout])
　　-transitionToActive [--forceactive] <serviceId>
　　-transitionToStandby <serviceId>
　　-failover [--forcefence] [--forceactive] <serviceId> <serviceId>
　　-getServiceState <serviceId>
　　-checkHealth <serviceId>
　　-help [cmd]
　　Administers the ResourceManager.
　　·scmadmin
　　yarn scmadmin [options]
　　yarn scmadmin -runCleanerTask
　　Runs the Shared Cache Manager admin client.
　　·sharedcachemanager
　　yarn sharedcachemanager
　　Starts the Shared Cache Manager.
　　·timelineserver
　　yarn timelineserver
　　Starts the timeline server.
　　Part 4: Summary
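　　1. There are many important commands.
　　2. Getting to know all of them takes a lot of time and has to be done in a complete, working environment!
　　3. Don't put too many tables into a blog post, or you'll regret it.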
