Running Hadoop Benchmark Tests

Reposted from: http://blog.iyunv.com/azhao_dn/article/details/6930909





Since we needed to procure new servers for our Hadoop cluster and evaluate how they perform in a Hadoop environment, I have compiled the benchmark test cases that ship with a Hadoop cluster:




  • bin/hadoop jar hadoop-*test*.jar
    Running the command above with no arguments lists the test programs bundled in hadoop-*test*.jar:


    An example program must be given as the first argument.
    Valid program names are:
      DFSCIOTest: Distributed i/o benchmark of libhdfs.
      DistributedFSCheck: Distributed checkup of the file system consistency.
      MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
      TestDFSIO: Distributed i/o benchmark.
      dfsthroughput: measure hdfs throughput
      filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
      loadgen: Generic map/reduce load generator
      mapredtest: A map/reduce test check.
      mrbench: A map/reduce benchmark that can create many small jobs
      nnbench: A benchmark that stresses the namenode.
      testarrayfile: A test for flat files of binary key/value pairs.
      testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
      testfilesystem: A test for FileSystem read/write.
      testipc: A test for ipc.
      testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
      testrpc: A test for rpc.
      testsequencefile: A test for flat files of binary key value pairs.
      testsequencefileinputformat: A test for sequence file input format.
      testsetfile: A test for flat files of binary key/value pairs.
      testtextinputformat: A test for text input format.
      threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill

    The most commonly used of these is TestDFSIO; its usage is as follows:



    $ bin/hadoop jar hadoop-*test*.jar TestDFSIO
    TestDFSIO.0.0.4
    Usage: TestDFSIO -read | -write | -clean [-nrFiles N] [-fileSize MB] [-resFile resultFileName] [-bufferSize Bytes]

    hadoop jar hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
    hadoop jar hadoop-*test*.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
    hadoop jar hadoop-*test*.jar TestDFSIO -clean
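    After each -write or -read run, TestDFSIO appends a summary (throughput, average I/O rate, standard deviation) to a local result file. A minimal sketch of a complete cycle is below; the default result file name TestDFSIO_results.log is my assumption based on the -resFile default in 0.20-era Hadoop, so pass -resFile explicitly if your version differs:

    #!/bin/bash
    # Sketch of a full TestDFSIO cycle: write, read, inspect the summary, clean up.
    set -e
    hadoop jar hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
    hadoop jar hadoop-*test*.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
    tail -n 20 TestDFSIO_results.log                 # summaries of the two runs above
    hadoop jar hadoop-*test*.jar TestDFSIO -clean    # remove the test files from HDFS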



  • bin/hadoop jar hadoop-*examples*.jar
    Running the command above with no arguments lists the example programs bundled in hadoop-*examples*.jar:



    An example program must be given as the first argument.
    Valid program names are:
      aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
      aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
      dbcount: An example job that count the pageview counts from a database.
      grep: A map/reduce program that counts the matches of a regex in the input.
      join: A job that effects a join over sorted, equally partitioned datasets
      multifilewc: A job that counts words from several files.
      pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
      pi: A map/reduce program that estimates Pi using monte-carlo method.
      randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
      randomwriter: A map/reduce program that writes 10GB of random data per node.
      secondarysort: An example defining a secondary sort to the reduce.
      sleep: A job that sleeps at each map and reduce task.
      sort: A map/reduce program that sorts the data written by the random writer.
      sudoku: A sudoku solver.
      teragen: Generate data for the terasort
      terasort: Run the terasort
      teravalidate: Checking results of terasort
      wordcount: A map/reduce program that counts the words in the input files.

    The most commonly used of these are teragen/terasort/teravalidate. A complete terasort benchmark consists of three steps: 1) teragen generates the data; 2) terasort performs the sort; 3) teravalidate verifies the sorted output. The commands are as follows (a worked example follows this list):

    hadoop jar hadoop-*examples*.jar teragen <number of 100-byte rows> <output dir>
    hadoop jar hadoop-*examples*.jar terasort <input dir> <output dir>
    hadoop jar hadoop-*examples*.jar teravalidate <terasort output dir (= input data)> <teravalidate output dir>

    When teravalidate finds keys that are out of order, it writes them to its output; if the output is empty, the sort was correct.
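    As a concrete illustration (the row count and HDFS paths are example values of mine, not from the source): each teragen row is 100 bytes, so 100,000,000 rows give roughly 10 GB of input.

    # Generate ~10 GB, sort it, then validate; paths are illustrative.
    hadoop jar hadoop-*examples*.jar teragen 100000000 /benchmarks/tera-in
    hadoop jar hadoop-*examples*.jar terasort /benchmarks/tera-in /benchmarks/tera-out
    hadoop jar hadoop-*examples*.jar teravalidate /benchmarks/tera-out /benchmarks/tera-val
    # An empty validation output (part file name may vary by version) means the sort was correct:
    hadoop fs -cat /benchmarks/tera-val/part-00000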






  • NameNode benchmark: nnbench
    nnbench stresses the NameNode with large numbers of HDFS metadata operations. Its options:


    $ bin/hadoop jar hadoop-*test*.jar nnbench
    NameNode Benchmark 0.4
    Usage: nnbench <options>
    Options:
            -operation <Available operations are create_write open_read rename delete. This option is mandatory>
             * NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.
            -maps <number of maps. default is 1. This is not mandatory>
            -reduces <number of reduces. default is 1. This is not mandatory>
            -startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time>. default is launch time + 2 mins. This is not mandatory
            -blockSize <Block size in bytes. default is 1. This is not mandatory>
            -bytesToWrite <Bytes to write. default is 0. This is not mandatory>
            -bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>
            -numberOfFiles <number of files to create. default is 1. This is not mandatory>
            -replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>
            -baseDir <base DFS path. default is /becnhmarks/NNBench. This is not mandatory>
            -readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>
            -help: Display the help statement

    Example run (creates 1000 files using 12 map tasks and 6 reducers):

    $ hadoop jar hadoop-*test*.jar nnbench -operation create_write \
        -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 \
        -replicationFactorPerFile 3 -readFileAfterOpen true \
        -baseDir /benchmarks/NNBench-`hostname -s`
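    The `hostname -s` suffix on -baseDir keeps runs launched from different client machines in separate HDFS directories, so several hosts can stress the NameNode in parallel without colliding. A small cleanup sketch for afterwards (hadoop fs -rmr is the recursive delete in 0.20-era Hadoop; newer releases spell it hadoop fs -rm -r):

    # Remove this host's nnbench data once the results have been collected.
    hadoop fs -rmr /benchmarks/NNBench-`hostname -s`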



  • MapReduce benchmark: mrbench
    Where nnbench targets the NameNode, mrbench exercises the MapReduce framework itself: it runs a small job many times over, which makes it useful for checking whether small jobs complete responsively on the cluster. Its options can be listed with:

    bin/hadoop jar hadoop-*test*.jar mrbench --help
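    A typical invocation, adapted from reference 1 at the end of this post (the exact option set may vary by Hadoop version), loops the small job 50 times and reports the average job runtime in milliseconds:

    # Run the built-in small job 50 times; the final summary line has the form
    # "DataLines Maps Reduces AvgTime (milliseconds)".
    hadoop jar hadoop-*test*.jar mrbench -numRuns 50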




  • Gridmix: Gridmix packages the built-in Hadoop benchmarks into a single suite, so that one run exercises all of them. The steps are as follows (a combined sketch appears after step 4):

    1) Build it:

    cd src/benchmarks/gridmix2
    ant

    2) Edit the configuration file: vi gridmix-env-2



    export HADOOP_INSTALL_HOME=/home/test/hadoop
    export HADOOP_VERSION=hadoop-0.20.203.0
    export HADOOP_HOME=${HADOOP_INSTALL_HOME}/${HADOOP_VERSION}
    export HADOOP_CONF_DIR=$HADOOP_HOME/conf
    export USE_REAL_DATASET=

    export APP_JAR=${HADOOP_HOME}/hadoop-core-0.20.203.0.jar
    export EXAMPLE_JAR=${HADOOP_HOME}/hadoop-examples-0.20.203.0.jar
    export STREAMING_JAR=${HADOOP_HOME}/contrib/streaming/hadoop-streaming-0.20.203.0.jar

    3) Generate the test data: sh generateGridmix2data.sh

    4) Run the tests:

    $ chmod +x rungridmix_2
    $ ./rungridmix_2
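    Putting the four steps together (the absolute path below follows the HADOOP_INSTALL_HOME layout configured above and is only an example; whether an empty USE_REAL_DATASET selects the small sample dataset is an assumption worth checking in your Gridmix2 version):

    #!/bin/bash
    # End-to-end Gridmix2 run under the layout configured in gridmix-env-2.
    set -e
    cd /home/test/hadoop/hadoop-0.20.203.0/src/benchmarks/gridmix2
    ant                           # 1) build the gridmix job jars
    # 2) gridmix-env-2 must already point at this cluster (see exports above)
    sh generateGridmix2data.sh    # 3) generate and load the input data into HDFS
    chmod +x rungridmix_2
    ./rungridmix_2                # 4) run the full benchmark mix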












  • References:

    1. Benchmarking and Stress Testing an Hadoop Cluster with TeraSort, TestDFSIO & Co. (http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/)
    2. Hadoop Gridmix2 benchmark notes (http://adaishu.blog.163.com/blog/static/17583128620114218589154/)
    3. Hadoop Gridmix benchmark (http://dongxicheng.org/mapreduce/hadoop-gridmix-benchmark/)
    4. Benchmarking a Hadoop cluster (http://blog.iyunv.com/dahaifeiyu/article/details/6220174)


