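Sorting 1 terabyte of data finished in 209 seconds, breaking the previous record of 297 seconds.
The input is 10 billion 100-byte records.
Yahoo runs Hadoop clusters with more than 13,000 nodes.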
One of Yahoo's Hadoop clusters sorted 1 terabyte of data in 209 seconds, which beat the previous record of 297 seconds in the annual general purpose (daytona) terabyte sort benchmark. The sort benchmark, which was created in 1998 by Jim Gray, specifies the input data (10 billion 100-byte records), which must be completely sorted and written to disk. This is the first time that either a Java or an open source program has won. Yahoo is both the largest user of Hadoop, with 13,000+ nodes running hundreds of thousands of jobs a month, and the largest contributor, although non-Yahoo usage and contributions are increasing rapidly.
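To put the two times in perspective, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation from the numbers quoted above; the function names are made up for illustration, and decimal units (1 GB = 10^9 bytes) are an assumption.

```python
# Figures taken from the text above.
RECORDS = 10_000_000_000   # 10 billion records
RECORD_BYTES = 100         # bytes per record
NEW_SECONDS = 209          # the Hadoop run
OLD_SECONDS = 297          # the previous record


def total_bytes(records: int = RECORDS, record_bytes: int = RECORD_BYTES) -> int:
    """Size of the benchmark input in bytes."""
    return records * record_bytes


def throughput_gb_per_s(seconds: float) -> float:
    """Average end-to-end sort rate in decimal gigabytes per second."""
    return total_bytes() / seconds / 1e9


if __name__ == "__main__":
    print(f"input size: {total_bytes():,} bytes")
    print(f"209 s run:  {throughput_gb_per_s(NEW_SECONDS):.2f} GB/s")
    print(f"297 s run:  {throughput_gb_per_s(OLD_SECONDS):.2f} GB/s")
    print(f"speedup:    {OLD_SECONDS / NEW_SECONDS:.2f}x")
```

The input works out to exactly 10^12 bytes, and the rates are averages over the whole run, not peak disk or network figures.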
The cluster statistics were:
910 nodes
2 quad core Xeons @ 2.0 GHz per node
4 SATA disks per node
8 GB RAM per node
1 gigabit ethernet on each node
40 nodes per rack
8 gigabit ethernet uplinks from each rack to the core
Red Hat Enterprise Linux Server Release 5.1 (kernel 2.6.18)
Sun Java JDK 1.6.0_05-b13
The benchmark was run with Hadoop trunk (pre-0.18) with a couple of optimization patches to remove intermediate writes to disk. The sort used 1800 maps and 1800 reduces and allocated enough memory to buffers to hold the intermediate data in memory. All of the code for the benchmark has been checked in as a Hadoop example.
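A rough sizing check shows why keeping the intermediate data in memory is plausible on this cluster. The sketch below assumes the data is spread evenly across tasks and nodes and uses decimal units; neither assumption comes from the post, and the actual buffer settings used in the run are not stated there. The helper names are hypothetical.

```python
# Figures taken from the cluster statistics and the paragraph above.
NODES = 910
RAM_PER_NODE_GB = 8
MAPS = 1800
REDUCES = 1800
TOTAL_BYTES = 10_000_000_000 * 100  # 10 billion 100-byte records


def per_task_mb(tasks: int) -> float:
    """Data handled by one task in MB, assuming an even split (an assumption)."""
    return TOTAL_BYTES / tasks / 1e6


def per_node_gb() -> float:
    """Data held by one node in GB, assuming an even split (an assumption)."""
    return TOTAL_BYTES / NODES / 1e9


def ram_fraction() -> float:
    """Share of a node's RAM needed to hold that node's slice of the data."""
    return per_node_gb() / RAM_PER_NODE_GB


if __name__ == "__main__":
    print(f"per map:    {per_task_mb(MAPS):.1f} MB")
    print(f"per reduce: {per_task_mb(REDUCES):.1f} MB")
    print(f"per node:   {per_node_gb():.2f} GB")
    print(f"RAM share:  {ram_fraction():.1%}")
```

Under these assumptions each node holds a little over 1 GB of the data, a modest share of its 8 GB of RAM, which is consistent with the choice to buffer intermediate data in memory instead of writing it to disk.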