Hadoop Balancer

hongblue · 发表于 2018-10-31 13:28:34

　　Whenever the nodes are added to the cluster or lots of data are delete, we need to run Hadoop balancer to balance the data in the datenodes. Or else, the over utilized data nodes will become the bottleneck of the cluster in terms of performance.
　　hadoop balancer -threshold

The optional –threshold parameter specifies the percentage of disk capacity leeway to average cluster datanodes utilization. An under-utilized data node is a node whose utilization is less than average utilization – threshold. An over-utilized data node is a node whose utilization is greater than average utilization + threshold. Smaller threshold values will achieve more evenly balanced nodes, but would take more time for the re-balancing. Default threshold value is 10 percent.
Re-balancing can be stopped by executing the bin/stop-balancer.sh command.
A summary of the re-balancing will be available at the $HADOOP_HOME/logs/hadoop-*-balancer*.out file.

　　During the balancing, Hadoop will move the data blocks from high utilized nodes to low utilized ones in a recursive way. In each iterations of recursion, the time is less than 20 min. Then, the system will see the status of data utilization overall to decide whether to jump out of balancing.
　　dfs.balance.bandwidthPerSec can define the bandwidth used by the balancing, default is 1M/second. If your cluster is busy, you should decrease this number. If you cluster is not busy, you can increase this number to speed up balancing. This setting is effective after HDFS restart

账号		自动登录	找回密码
密码			立即注册

Centos6.5×64安装配置openmeetings3.0.3详

大疆运维招人啦，

C++ :try 语句块和异常处理

C++的多态

Red Hat RHCE 8 (EX294) Cert Guide

Java/C++ 区别：看完这一篇，就够用！

别再用过时库了！这 13 个顶级 C++ 库才是

[经验分享] Hadoop Balancer

浏览过的版块

扫码加入运维网微信交流群