Published: April 13, 2009. Level: Intermediate.
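Installing Ganglia
There are many articles and resources on the Internet that describe how to install Ganglia. We will revisit one that I wrote on the xCAT wiki. For the purposes of this article, I assume the operating system is Red Hat 5 Update 2 (the steps should not differ much on other enterprise Linux distributions).
Prerequisites
Provided that you have a yum repository set up, installing the prerequisites should be easy for the most part. Something like this: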
cd /tmp/
wget http://oss.oetiker.ch/rrdtool/pub/rrdtool.tar.gz
tar zxvf rrdtool*
cd rrdtool-*
./configure --prefix=/usr
make -j8
make install
which rrdtool
ldconfig # make sure you have the new rrdtool libraries linked.
cd /tmp/
tar zxvf ganglia*gz
cd ganglia-3.1.1/
./configure --with-gmetad
make -j8
make install
The build should finish without any errors. If you do see errors, check which libraries are missing.
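As a quick sanity check after the build, the sketch below looks for the installed binaries on PATH. It is only an illustration: the list of names is an assumption based on the tools this article uses, and where each binary lands depends on the --prefix passed to ./configure, so a "missing" result can also mean the install directory is not on PATH.

```python
import shutil

# Binaries the build steps above are expected to install (assumed list).
EXPECTED_TOOLS = ("rrdtool", "gmond", "gmetad", "gmetric")


def find_missing(tools=EXPECTED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All expected binaries are on PATH.")
```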
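Configuring Ganglia
Now that the basic installation is done, several configuration items need to be set before Ganglia will run. Do the following steps:
Command-line file manipulations.
Modify /etc/ganglia/gmond.conf.
Take care of multi-homed machines.
Start it up on the management server.
Step 1: Command-line file manipulations
As shown in the following: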
cd /tmp/ganglia-3.1.1/ # you should already be in this directory
mkdir -p /var/www/html/ganglia/ # make sure you have apache installed
cp -a web/* /var/www/html/ganglia/ # this is the web interface
cp gmetad/gmetad.init /etc/rc.d/init.d/gmetad # startup script
cp gmond/gmond.init /etc/rc.d/init.d/gmond
mkdir /etc/ganglia # where config files go
gmond -t | tee /etc/ganglia/gmond.conf # generate initial gmond config
cp gmetad/gmetad.conf /etc/ganglia/ # initial gmetad configuration
mkdir -p /var/lib/ganglia/rrds # place where RRDTool graphs will be stored
chown nobody:nobody /var/lib/ganglia/rrds # make sure RRDTool can write here.
chkconfig --add gmetad # make sure gmetad starts up at boot time
chkconfig --add gmond # make sure gmond starts up at boot time
Step 2: Modify /etc/ganglia/gmond.conf
Now you can modify /etc/ganglia/gmond.conf to name your cluster. Suppose the cluster name is "matlock"; then change name = "unspecified" to name = "matlock".
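If you would rather script this change than edit the file by hand, a minimal sketch follows. It assumes the file still has the layout that gmond -t generates, with a cluster { ... } block containing a name = "..." entry; the function name set_cluster_name is hypothetical and not part of Ganglia.

```python
import re


def set_cluster_name(conf_text, cluster_name):
    """Replace the name value inside the first `cluster { ... }` block."""
    # Match lazily from "cluster {" up to the first name = "..." entry.
    pattern = re.compile(r'(cluster\s*\{[^}]*?\bname\s*=\s*)"[^"]*"')
    new_text, count = pattern.subn(
        lambda m: m.group(1) + '"%s"' % cluster_name, conf_text, count=1
    )
    if count == 0:
        raise ValueError("no cluster { name = ... } entry found")
    return new_text


if __name__ == "__main__":
    sample = 'cluster {\n  name = "unspecified"\n  owner = "unspecified"\n}\n'
    print(set_cluster_name(sample, "matlock"))
```

To apply it for real, read /etc/ganglia/gmond.conf, pass the text through the function, and write the result back (keeping a backup copy first).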
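Step 3: Take care of multi-homed machines
In my cluster, eth0 carries the public IP address of my system. However, the monitoring server talks to the nodes on the private cluster network through eth1. I need to make sure that the multicast traffic Ganglia uses is bound to eth1. This can be done by creating the file /etc/sysconfig/network-scripts/route-eth1 with the contents 239.2.11.71 dev eth1.
You can then restart the network with service network restart and make sure the routing table shows this IP going through eth1. Note: use 239.2.11.71 because it is Ganglia's default multicast channel. Change it if you use a different channel or add more channels.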
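The route check can also be scripted. The sketch below builds the one-line contents of route-eth1 and scans routing-table text for the interface that carries the multicast address. The helper names are hypothetical, and the parser assumes output shaped like that of ip route show, where each line starts with the destination and contains a "dev <interface>" pair.

```python
import subprocess

MCAST_ADDR = "239.2.11.71"  # Ganglia's default multicast channel


def route_file_line(addr=MCAST_ADDR, dev="eth1"):
    """Contents for /etc/sysconfig/network-scripts/route-<dev>."""
    return "%s dev %s\n" % (addr, dev)


def device_for(route_output, addr=MCAST_ADDR):
    """Return the interface that routes `addr`, or None if no route matches."""
    for line in route_output.splitlines():
        fields = line.split()
        if fields and fields[0] == addr and "dev" in fields:
            i = fields.index("dev")
            if i + 1 < len(fields):
                return fields[i + 1]
    return None


def current_device(addr=MCAST_ADDR):
    """Query the live routing table (assumes the `ip` command is installed)."""
    output = subprocess.check_output(["ip", "route", "show"]).decode()
    return device_for(output, addr)
```

After restarting the network, current_device() should return "eth1" on a machine configured as described above.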
Step 4: Start it up on the management server
Now you can start it all up on the monitoring server:
service gmond start
service gmetad start
service httpd restart
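Open a web browser and point it at the management server at http://localhost/ganglia. You should see that the management server is now being monitored, along with several metrics that are being collected and graphed. One of the most useful graphs shows the load on the machine. Below is the load graph for my machine:
[Figure: load graph for the management server]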
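As a cross-check that does not depend on the web interface, you can read the XML that gmond sends to any client that connects to its TCP port. The sketch below assumes the default port 8649 from the generated gmond.conf; the function names are hypothetical. It returns the metrics reported for each host.

```python
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the full XML dump that gmond sends when a client connects."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def metrics_by_host(xml_bytes):
    """Map each host name to a dict of {metric name: value string}."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")
        }
    return result
```

Calling metrics_by_host(read_gmond_xml()) on the management server should list that server with entries such as load_one once gmond is running.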
# Ganglia gmond Python metric module: reports the ambient temperature read through IPMI.
import os

def temp_handler(name):
    # our commands we're going to execute
    sdrfile = "/tmp/sdr.dump"
    ipmitool = "/usr/bin/ipmitool"
    # Before you run this, load the IPMI drivers:
    #   modprobe ipmi_msghandler
    #   modprobe ipmi_si
    #   modprobe ipmi_devintf
    # You'll also need to change permissions of /dev/ipmi0 for nobody:
    #   chown nobody:nobody /dev/ipmi0
    # Put the above in /etc/rc.d/rc.local.
    # Create the SDR cache on first use so that later queries run faster.
    if not os.path.exists(sdrfile):
        os.system(ipmitool + ' sdr dump ' + sdrfile)
    if os.path.exists(sdrfile):
        ipmicmd = ipmitool + " -S " + sdrfile + " -c sdr"
    else:
        print("file does not exist... oops!")
        ipmicmd = ipmitool + " -c sdr"
    # Keep only the second CSV field (the reading) of the Ambient sensor.
    cmd = ipmicmd + " type temperature | sed 's/ /_/g' "
    cmd = cmd + " | awk -F, '/Ambient/ {print $2}' "
    #print(cmd)
    entries = os.popen(cmd)
    value = 0  # reported when ipmitool returns no Ambient reading
    for l in entries:
        line = l.split()
        if line:
            value = int(line[0])
    entries.close()
    return value

def metric_init(params):
    global descriptors
    temp = {'name': 'Ambient Temp',
            'call_back': temp_handler,
            'time_max': 90,
            'value_type': 'uint',
            'units': 'C',
            'slope': 'both',
            'format': '%u',
            'description': 'Ambient Temperature of host through IPMI',
            'groups': 'IPMI In Band'}
    descriptors = [temp]
    return descriptors

def metric_cleanup():
    '''Clean up the metric module.'''
    pass

# This code is for debugging and unit testing
if __name__ == '__main__':
    metric_init(None)
    for d in descriptors:
        v = d['call_back'](d['name'])
        print('value for %s is %u' % (d['name'], v))
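The sed and awk pipeline in the module above can also be replaced by parsing the comma-separated ipmitool output directly in Python. The sketch below is one possible way to do that, not the module's actual method. Two things are assumptions: the sample rows are illustrative, and the match is made on the sensor name field only, whereas the awk pattern matches anywhere on the line. As in the awk command, the reading is taken from the second field.

```python
import csv


def ambient_temp(csv_lines, label="Ambient"):
    """Return the integer reading of the first sensor whose name contains
    `label`, or None when no usable reading is present."""
    for row in csv.reader(csv_lines):
        if len(row) >= 2 and label in row[0]:
            try:
                return int(float(row[1]))
            except ValueError:
                continue  # the sensor exists but has no numeric reading
    return None


if __name__ == "__main__":
    # Illustrative rows only; real output depends on the hardware.
    sample = ["Ambient Temp,25,degrees C,ok", "CPU 1 Temp,40,degrees C,ok"]
    print(ambient_temp(sample))
```

Returning None instead of a number lets the caller decide what to report when the sensor has no reading.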
#!/usr/bin/perl
# vallard@us.ibm.com
use strict;   # to keep things clean... er cleaner
use Socket;   # to resolve host names into IP addresses

# nodeFile: is just a plain text file with a list of nodes:
# e.g:
# node01
# node02
# ...
# nodexx
my $nodeFile = "/usr/local/bin/nodes";
# gmetric binary
my $gmetric = "/usr/bin/gmetric";
# ipmitool binary
my $ipmi = "/usr/bin/ipmitool";
# userid for BMCs
my $u = "xcat";
# password for BMCs
my $p = "f00bar";

# open the nodes file and iterate through each node
open(my $fh, '<', $nodeFile) or die "can't open $nodeFile: $!";
while (my $node = <$fh>) {
    # fork so each remote data call is done in parallel
    my $pid = fork();
    die "can't fork: $!" unless defined $pid;
    if ($pid) {
        # parent process
        next;
    }
    # child process begins here
    chomp($node);   # get rid of new line
    # resolve node's IP address for spoofing
    my $ip;
    my $pip = gethostbyname($node);
    if (defined $pip) {
        $ip = inet_ntoa($pip);
    } else {
        print "Can't get IP for $node!\n";
        exit 1;
    }
    # check if the SDR cache file exists.
    my $ipmiCmd;
    unless (-f "/tmp/$node.sdr") {
        # no SDR cache, so try to create it...
        $ipmiCmd = "$ipmi -I lan -H $node-bmc -U $u -P $p sdr dump /tmp/$node.sdr";
        `$ipmiCmd`;
    }
    if (-f "/tmp/$node.sdr") {
        # run the command against the cache so that it's faster
        $ipmiCmd = "$ipmi -I lan -H $node-bmc -U $u -P $p -S /tmp/$node.sdr sdr type Temperature";
        # put all the output into the @out array
        my @out = `$ipmiCmd`;
        # iterate through each @out entry.
        foreach (@out) {
            # each output line looks like this:
            # Ambient Temp | 32h | ok | 12.1 | 25 degrees C
            # so we parse it out
            chomp();   # get rid of the new line
            # grab the first and fifth fields (description and temperature)
            my ($descr, undef, undef, undef, $temp) = split(/\|/);
            next unless defined $temp;   # skip lines without five fields
            # get rid of white space in description
            $descr =~ s/ //g;
            # grab just the temperature (we assume Celsius anyway)
            $temp = (split(' ', $temp))[0];
            # make sure that temperature is a number:
            if (defined $temp && $temp =~ /^\d+/) {
                #print "$node: $descr $temp\n";
                my $gcmd = "$gmetric -n '$descr' -v $temp -t int16 -u Celsius -S $ip:$node";
                `$gcmd`;
            }
        }
    }
    # child process is done and exits.
    exit;
}
close($fh);
# wait for all forks to end (blocking wait, so the parent does not spin)
1 while waitpid(-1, 0) > 0;
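The parsing step in the Perl script can be tested on its own without a BMC. The sketch below mirrors it in Python under stated assumptions: the input line has the five pipe-separated fields shown in the script's comment, and the function names are hypothetical. It also builds the gmetric arguments as a list, using the same flags as the script, so they could be passed to subprocess.run without shell quoting.

```python
def parse_sdr_line(line):
    """Parse 'Ambient Temp | 32h | ok | 12.1 | 25 degrees C' into
    (name_without_spaces, temperature), or None if it has no integer reading."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 5:
        return None
    name = fields[0].replace(" ", "")
    reading = fields[4].split()
    if not reading or not reading[0].isdigit():
        return None
    return name, int(reading[0])


def gmetric_argv(name, temp, ip, node, gmetric="/usr/bin/gmetric"):
    """Build the gmetric command as an argument list (spoofing ip:node)."""
    return [gmetric, "-n", name, "-v", str(temp), "-t", "int16",
            "-u", "Celsius", "-S", "%s:%s" % (ip, node)]


if __name__ == "__main__":
    parsed = parse_sdr_line("Ambient Temp     | 32h | ok  | 12.1 | 25 degrees C")
    if parsed:
        # 10.0.0.1 and node01 are placeholder values for illustration.
        print(gmetric_argv(parsed[0], parsed[1], "10.0.0.1", "node01"))
```

Unlike the Perl regular expression, which accepts any value that starts with a digit, this version accepts only whole-number readings.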