Running Hadoop Benchmark Tests
Reposted from: http://blog.iyunv.com/azhao_dn/article/details/6930909. Since we needed to purchase new servers for a Hadoop cluster and had to evaluate their performance under Hadoop, I put together the benchmark test cases that ship with the Hadoop distribution:
bin/hadoop jar hadoop-*test*.jar

Running the command above lists the test programs bundled in hadoop-*test*.jar:
An example program must be given as the first argument.
Valid program names are:
  DFSCIOTest: Distributed i/o benchmark of libhdfs.
  DistributedFSCheck: Distributed checkup of the file system consistency.
  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
  TestDFSIO: Distributed i/o benchmark.
  dfsthroughput: measure hdfs throughput
  filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
  loadgen: Generic map/reduce load generator
  mapredtest: A map/reduce test check.
  mrbench: A map/reduce benchmark that can create many small jobs
  nnbench: A benchmark that stresses the namenode.
  testarrayfile: A test for flat files of binary key/value pairs.
  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
  testfilesystem: A test for FileSystem read/write.
  testipc: A test for ipc.
  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
  testrpc: A test for rpc.
  testsequencefile: A test for flat files of binary key value pairs.
  testsequencefileinputformat: A test for sequence file input format.
  testsetfile: A test for flat files of binary key/value pairs.
  testtextinputformat: A test for text input format.
  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
The most commonly used of these is TestDFSIO, whose command-line options are as follows:
$ bin/hadoop jar hadoop-*test*.jar TestDFSIO
TestDFSIO.0.0.4
Usage: TestDFSIO -read | -write | -clean [-nrFiles N] [-fileSize MB] [-resFile resultFileName] [-bufferSize Bytes]
For example, to write 10 files of 1000 MB each, read them back, and then clean up the benchmark data:

hadoop jar hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
hadoop jar hadoop-*test*.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
hadoop jar hadoop-*test*.jar TestDFSIO -clean
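The write and read runs append their throughput and average I/O rate figures to a local result file. A minimal sketch that collects both runs in one file via -resFile (the file name is an illustrative assumption, not from the original post):

hadoop jar hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000 -resFile dfsio_results.log
hadoop jar hadoop-*test*.jar TestDFSIO -read -nrFiles 10 -fileSize 1000 -resFile dfsio_results.log
cat dfsio_results.log    # throughput and average I/O rate for both runs
hadoop jar hadoop-*test*.jar TestDFSIO -clean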
bin/hadoop jar hadoop-*examples*.jar
Running the command above lists the example programs bundled in hadoop-*examples*.jar:
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  dbcount: An example job that count the pageview counts from a database.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using monte-carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sleep: A job that sleeps at each map and reduce task.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
The most commonly used of these are teragen/terasort/teravalidate. A complete TeraSort test consists of three steps: 1) teragen generates the data; 2) terasort sorts it; 3) teravalidate verifies the sorted output. The commands are invoked as follows:
hadoop jar hadoop-*examples*.jar teragen <number of 100-byte rows> <output dir>
hadoop jar hadoop-*examples*.jar terasort <input dir> <output dir>
hadoop jar hadoop-*examples*.jar teravalidate <terasort output dir (= input data)> <teravalidate output dir>
When it validates, teravalidate reports any keys that are out of order; if its output is empty, the sort is correct.
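A minimal end-to-end sketch with concrete values (the row count and HDFS paths are illustrative assumptions; 10,000,000 rows of 100 bytes is roughly 1 GB of input):

# generate ~1 GB of input data
hadoop jar hadoop-*examples*.jar teragen 10000000 /benchmarks/terasort-input
# sort it
hadoop jar hadoop-*examples*.jar terasort /benchmarks/terasort-input /benchmarks/terasort-output
# validate the sorted output; an empty report means the sort is correct
hadoop jar hadoop-*examples*.jar teravalidate /benchmarks/terasort-output /benchmarks/terasort-validate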
NameNode benchmark: nnbench
$ bin/hadoop jar hadoop-*test*.jar nnbench
NameNode Benchmark 0.4
Usage: nnbench <options>
Options:
  -operation <Available operations are create_write open_read rename delete. This option is mandatory>
    * NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.
  -maps <number of maps. default is 1. This is not mandatory>
  -reduces <number of reduces. default is 1. This is not mandatory>
  -startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time>. default is launch time + 2 mins. This is not mandatory
  -blockSize <Block size in bytes. default is 1. This is not mandatory>
  -bytesToWrite <Bytes to write. default is 0. This is not mandatory>
  -bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>
  -numberOfFiles <number of files to create. default is 1. This is not mandatory>
  -replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>
  -baseDir <base DFS path. default is /becnhmarks/NNBench. This is not mandatory>
  -readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>
  -help: Display the help statement
Example run:

$ hadoop jar hadoop-*test*.jar nnbench -operation create_write \
    -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 \
    -replicationFactorPerFile 3 -readFileAfterOpen true \
    -baseDir /benchmarks/NNBench-`hostname -s`
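Since open_read, rename and delete expect the files created by create_write to exist already, the follow-up operations can be pointed at the same base directory; a sketch (the flag values simply mirror the run above):

# open and read the files created above, reporting the average read time
hadoop jar hadoop-*test*.jar nnbench -operation open_read \
    -maps 12 -reduces 6 -numberOfFiles 1000 -readFileAfterOpen true \
    -baseDir /benchmarks/NNBench-`hostname -s`
# remove the test files again
hadoop jar hadoop-*test*.jar nnbench -operation delete \
    -maps 12 -reduces 6 -numberOfFiles 1000 \
    -baseDir /benchmarks/NNBench-`hostname -s`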
MapReduce benchmark: mrbench
mrbench runs a small MapReduce job a number of times in a row, checking whether small, short-lived jobs complete responsively on the cluster. Its options can be listed with:
bin/hadoop jar hadoop-*test*.jar mrbench --help
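A minimal sketch of a run (the run count is an arbitrary choice, not from the original post); mrbench reports the average job completion time over the repeated runs:

hadoop jar hadoop-*test*.jar mrbench -numRuns 50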
Gridmix benchmark: Gridmix packages the benchmarks that ship with Hadoop so that all of them can be run in a single pass.
1) Build it:

cd src/benchmarks/gridmix2
ant

2) Edit the configuration file: vi gridmix-env-2
export HADOOP_INSTALL_HOME=/home/test/hadoop
export HADOOP_VERSION=hadoop-0.20.203.0
export HADOOP_HOME=${HADOOP_INSTALL_HOME}/${HADOOP_VERSION}
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export USE_REAL_DATASET=

export APP_JAR=${HADOOP_HOME}/hadoop-core-0.20.203.0.jar
export EXAMPLE_JAR=${HADOOP_HOME}/hadoop-examples-0.20.203.0.jar
export STREAMING_JAR=${HADOOP_HOME}/contrib/streaming/hadoop-streaming-0.20.203.0.jar
3) Generate the test data: sh generateGridmix2data.sh

4) Run the tests:

$ chmod +x rungridmix_2
$ ./rungridmix_2
References:
1. Benchmarking and Stress Testing an Hadoop Cluster with TeraSort, TestDFSIO & Co.: http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
2. Hadoop的Gridmix2基准测试点 (notes on Hadoop's Gridmix2 benchmark): http://adaishu.blog.163.com/blog/static/17583128620114218589154/
3. Hadoop Gridmix基准测试 (Hadoop Gridmix benchmark): http://dongxicheng.org/mapreduce/hadoop-gridmix-benchmark/
4. Hadoop 集群的基准测试 (benchmarking a Hadoop cluster): http://blog.iyunv.com/dahaifeiyu/article/details/6220174