The classes packaged in a jar can be invoked with the following command:
$ hadoop jar $HADOOP_HOME/xxx.jar
(1). Hadoop Test
Running hadoop-test-0.20.2-cdh3u3.jar without arguments lists all of the test programs:
$ hadoop jar $HADOOP_HOME/hadoop-test-0.20.2-cdh3u3.jar
An example program must be given as the first argument.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
TestDFSIO: Distributed i/o benchmark.
dfsthroughput: measure hdfs throughput
filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
loadgen: Generic map/reduce load generator
mapredtest: A map/reduce test check.
minicluster: Single process HDFS and MR cluster.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode.
testarrayfile: A test for flat files of binary key/value pairs.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testipc: A test for ipc.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testrpc: A test for rpc.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testsetfile: A test for flat files of binary key/value pairs.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
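For a sense of how these programs are driven, here is a sketch using TestDFSIO from the list above; the flag names follow TestDFSIO's own usage, while the file count and file size (in MB) are arbitrary illustrative values:
# Write phase: create 10 files of 1000 MB each and report write throughput
$ hadoop jar $HADOOP_HOME/hadoop-test-0.20.2-cdh3u3.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
# Read phase: read the same files back and report read throughput
$ hadoop jar $HADOOP_HOME/hadoop-test-0.20.2-cdh3u3.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
# Remove the benchmark's files from HDFS once done
$ hadoop jar $HADOOP_HOME/hadoop-test-0.20.2-cdh3u3.jar TestDFSIO -clean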
The usage printed by nnbench includes, among other options, the following note and parameters:
* NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.
-maps <number of maps. default is 1. This is not mandatory>
-reduces <number of reduces. default is 1. This is not mandatory>
-startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time>. default is launch time + 2 mins. This is not mandatory
These results show an average job completion time of 14 seconds.
(2). Hadoop Examples
Besides the tests mentioned above, Hadoop also ships with a set of examples, such as WordCount and TeraSort, packaged in hadoop-examples-0.20.2-cdh3u3.jar. Running the following command lists all of the example programs:
$ hadoop jar $HADOOP_HOME/hadoop-examples-0.20.2-cdh3u3.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
dbcount: An example job that count the pageview counts from a database.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using monte-carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sleep: A job that sleeps at each map and reduce task.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
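As a quick smoke test of the examples jar, the pi estimator from the list above can be run directly; its two arguments are the number of map tasks and the number of samples per map, and the values here are arbitrary small ones:
# Estimate Pi with 10 maps and 1000 samples per map
$ hadoop jar $HADOOP_HOME/hadoop-examples-0.20.2-cdh3u3.jar pi 10 1000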
WordCount is already covered in Running Hadoop On CentOS (Single-Node Cluster), so it is not repeated here.
TeraSort
A complete TeraSort test consists of the following three steps:
Generate random input data with TeraGen
Run TeraSort on that input data
Verify the sorted output with TeraValidate
The input data does not have to be regenerated for every test; once it has been generated, later runs can skip the first step.
TeraGen is invoked as follows:
$ hadoop jar hadoop-*examples*.jar teragen <number of rows> <output dir>
The following command runs TeraGen to generate 1 GB of input data (10,000,000 rows of 100 bytes each) and writes it to the directory /examples/terasort-input:
$ hadoop jar $HADOOP_HOME/hadoop-examples-0.20.2-cdh3u3.jar teragen \
10000000 /examples/terasort-input
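TeraGen accepts Hadoop's generic options in front of its own arguments, so the number of map tasks (and hence output files) can be raised to generate the data faster. A sketch, to be run instead of the command above; the value 12 is an arbitrary choice, and mapred.map.tasks is the property this Hadoop version reads for the map count:
# Generate the same 10,000,000 rows with 12 map tasks
$ hadoop jar $HADOOP_HOME/hadoop-examples-0.20.2-cdh3u3.jar teragen \
-Dmapred.map.tasks=12 10000000 /examples/terasort-input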
Each row generated by TeraGen has the following format:
<key><rowid><filler>\r\n
where:
key is 10 random characters, each with an ASCII code in the range [32, 126]
rowid is the row id as a right-justified integer
filler consists of 7 runs of 10 characters (the last run has 8 characters), cycling through the characters 'A' to 'Z'
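To spot-check the generated rows, the input directory can be read back with the HDFS shell; the part-00000 file name assumes the default output naming of this Hadoop version:
# Print the first few 100-byte rows written by TeraGen
$ hadoop fs -cat /examples/terasort-input/part-00000 | head -5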
The following command runs TeraSort to sort the data and writes the result to the directory /examples/terasort-output:
$ hadoop jar $HADOOP_HOME/hadoop-examples-0.20.2-cdh3u3.jar terasort \
/examples/terasort-input /examples/terasort-output
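TeraSort's run time depends heavily on the number of reduce tasks, which can be overridden per run through a generic option rather than relying on the cluster default. A sketch, to be run instead of the command above; the value 8 is arbitrary:
# Sort with 8 reduce tasks instead of the configured default
$ hadoop jar $HADOOP_HOME/hadoop-examples-0.20.2-cdh3u3.jar terasort \
-Dmapred.reduce.tasks=8 /examples/terasort-input /examples/terasort-output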
The following command runs TeraValidate to check that the TeraSort output is sorted; if any problems are detected, the out-of-order keys are written to the directory /examples/terasort-validate:
$ hadoop jar $HADOOP_HOME/hadoop-examples-0.20.2-cdh3u3.jar teravalidate \
/examples/terasort-output /examples/terasort-validate
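The validation result can then be read back from HDFS. Based on how this version of TeraValidate behaves, empty part files indicate the sort order checked out, while misordered keys show up as output records; treat this reading as a sketch:
# List and print the validation output; no records means no ordering problems were found
$ hadoop fs -ls /examples/terasort-validate
$ hadoop fs -cat /examples/terasort-validate/part-*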
(3). Hadoop Gridmix2
Gridmix is a benchmark suite bundled with Hadoop. It is a further wrapper around several of the other benchmarks, with modules for generating data, submitting jobs, and collecting completion-time statistics. Gridmix ships with several types of jobs: streamSort, javaSort, combiner, monsterQuery, webdataScan, and webdataSort.
Compile
$ cd $HADOOP_HOME/src/benchmarks/gridmix2
$ ant
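Once ant has built the Gridmix2 jar, the benchmark is driven by the scripts that ship in the same directory; the flow below follows the bundled README, with the script names as found in this Hadoop version, so treat it as a sketch:
# Edit cluster-specific settings (Hadoop paths, data sizes) for the local installation
$ vi gridmix-env-2
# Generate the input data sets used by the Gridmix jobs
$ ./generateGridmix2data.sh
# Submit the mix of jobs and collect completion times
$ ./rungridmix_2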
Hive Query Benchmarks (hivebench): this workload is based on the SIGMOD '09 paper "A Comparison of Approaches to Large-Scale Data Analysis" and on HIVE-396. It consists of Hive queries that perform typical OLAP operations (aggregation and join) over automatically generated web data whose hyperlinks follow a Zipfian distribution.
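To give a flavor of the hivebench workload, its aggregation query has roughly the shape below; the table and column names (uservisits, sourceIP, adRevenue) come from the schema used in the paper, and invoking it through hive -e is only an illustration of the query type, not how hivebench itself launches jobs:
# Typical OLAP aggregation over the generated web data
$ hive -e "SELECT sourceIP, SUM(adRevenue) FROM uservisits GROUP BY sourceIP"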