Apache Hadoop Wins Terabyte Sort Benchmark

lig · posted on 2016-12-9 07:08:14

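　　1 terabyte of data was sorted in 209 seconds, breaking the previous record of 297 seconds.
　　The input was 10 billion 100-byte records.
　　Yahoo runs Hadoop clusters totaling more than 13,000 nodes.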
　　One of Yahoo's Hadoop clusters sorted 1 terabyte of data in 209 seconds, which beat the previous record of 297 seconds in the annual general-purpose (Daytona) terabyte sort benchmark. The sort benchmark, which was created in 1998 by Jim Gray, specifies the input data (10 billion 100-byte records), which must be completely sorted and written to disk. This is the first time that either a Java or an open source program has won. Yahoo is both the largest user of Hadoop, with 13,000+ nodes running hundreds of thousands of jobs a month, and the largest contributor, although non-Yahoo usage and contributions are increasing rapidly.
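　　To put the numbers above in perspective, here is a small back-of-the-envelope sketch. It uses only the figures quoted in the post (10 billion records, 100 bytes each, 209 s versus 297 s); the function and constant names are made up for illustration and are not part of Hadoop. Rates are in decimal gigabytes (10**9 bytes).

```python
# Figures quoted in the post; names below are illustrative, not from Hadoop.
RECORDS = 10_000_000_000   # 10 billion input records
RECORD_BYTES = 100         # bytes per record
NEW_SECONDS = 209          # the Hadoop run
OLD_SECONDS = 297          # the previous record


def total_bytes(records: int = RECORDS, record_bytes: int = RECORD_BYTES) -> int:
    """Size of the benchmark input in bytes."""
    return records * record_bytes


def throughput_gb_per_s(seconds: float) -> float:
    """Aggregate sort rate in decimal gigabytes (10**9 bytes) per second."""
    return total_bytes() / seconds / 1e9


def speedup(old_seconds: float, new_seconds: float) -> float:
    """How many times faster the new run is than the old one."""
    return old_seconds / new_seconds


if __name__ == "__main__":
    print(f"input size: {total_bytes():,} bytes")
    print(f"rate at {NEW_SECONDS} s: {throughput_gb_per_s(NEW_SECONDS):.2f} GB/s")
    print(f"rate at {OLD_SECONDS} s: {throughput_gb_per_s(OLD_SECONDS):.2f} GB/s")
    print(f"speedup: {speedup(OLD_SECONDS, NEW_SECONDS):.2f}x")
```

　　The input comes to exactly 10**12 bytes, so the 209-second run sorted roughly 4.8 GB per second across the whole cluster, about 1.4 times the rate implied by the 297-second record.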
　　The cluster statistics were:

910 nodes
2 quad-core Xeons @ 2.0 GHz per node
4 SATA disks per node
8 GB RAM per node
1 gigabit ethernet on each node
40 nodes per rack
8 gigabit ethernet uplinks from each rack to the core
Red Hat Enterprise Linux Server Release 5.1 (kernel 2.6.18)
Sun Java JDK 1.6.0_05-b13
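
　　The per-node figures in the list above can be added up into cluster-wide totals. This is a rough sketch under two assumptions that the post does not state: racks are filled to 40 nodes before a new one is started, and "8 gigabit ethernet uplinks" means eight 1 Gbit/s links per rack. All names are illustrative.

```python
import math

# Per-node and per-rack figures from the list above.
NODES = 910
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 4       # "quad core"
DISKS_PER_NODE = 4
RAM_GB_PER_NODE = 8
NODES_PER_RACK = 40
NODE_LINK_GBIT = 1
# Assumption: eight 1 Gbit/s uplinks per rack, i.e. 8 Gbit/s in total.
RACK_UPLINK_GBIT = 8


def cluster_totals() -> dict:
    """Aggregate hardware counts for the whole cluster."""
    return {
        "cores": NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET,
        "disks": NODES * DISKS_PER_NODE,
        "ram_gb": NODES * RAM_GB_PER_NODE,
        # Assumption: racks are filled to capacity, so round up.
        "racks": math.ceil(NODES / NODES_PER_RACK),
    }


def rack_oversubscription() -> float:
    """Node bandwidth inside a full rack divided by its uplink bandwidth."""
    return NODES_PER_RACK * NODE_LINK_GBIT / RACK_UPLINK_GBIT


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
    print(f"rack oversubscription: {rack_oversubscription():.1f}:1")
```

　　Under those assumptions the cluster has 7,280 cores, 3,640 disks and 7,280 GB of RAM in 23 racks, and a full rack can generate five times more traffic than its uplinks can carry to the core.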

　　The benchmark was run with Hadoop trunk (pre-0.18) with a couple of optimization patches to remove intermediate writes to disk. The sort used 1800 maps and 1800 reduces and allocated enough memory to buffers to hold the intermediate data in memory. All of the code for the benchmark has been checked in as a Hadoop example.
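
　　A quick estimate shows why holding the intermediate data in memory is plausible on this hardware. The sketch below assumes that data is spread evenly over tasks and nodes, that the intermediate data is about the same size as the input (usual for a sort, but not stated in the post), and that "8G" means 8 * 10**9 bytes. The names are illustrative only.

```python
INPUT_BYTES = 10_000_000_000 * 100   # 10 billion 100-byte records
MAPS = 1800
REDUCES = 1800
NODES = 910
RAM_BYTES_PER_NODE = 8 * 10**9       # assumption: "8G" read as 8 * 10**9 bytes


def bytes_per_task(tasks: int) -> float:
    """Even share of the data handled by one map or one reduce."""
    return INPUT_BYTES / tasks


def tasks_per_node(tasks: int) -> float:
    """Average number of tasks of one kind that land on each node."""
    return tasks / NODES


def node_share_of_ram() -> float:
    """Fraction of one node's RAM needed to hold its even share of the data."""
    return (INPUT_BYTES / NODES) / RAM_BYTES_PER_NODE


if __name__ == "__main__":
    print(f"per map:    {bytes_per_task(MAPS) / 1e6:.0f} MB")
    print(f"per reduce: {bytes_per_task(REDUCES) / 1e6:.0f} MB")
    print(f"maps per node: {tasks_per_node(MAPS):.2f}")
    print(f"share of node RAM: {node_share_of_ram():.1%}")
```

　　With an even split each map and each reduce handles about 556 MB, each node runs roughly two maps and two reduces, and a node's share of the data is about 1.1 GB, around 14% of its 8 GB of RAM.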
