自己编写接口用于获取Hadoop Job conf 信息

FXMAR · 发表于 2016-12-11 07:18:42

　　Hadoop Job完成后可以设置回调接口，一个自定义的URL，比如我的：
　　

http://x.x.x.x/log/notify/stat_job/{jobId}/{jobStatus}
　　之后我在Servlet中可以拿到jobId，通过jobId，就可以拿到Job对象（RunningJob），代码如下：
　　

public static RunningJob getJobById(String jobId) {
Configuration conf = new Configuration();
conf.set("mapred.job.tracker", Constants.MAP_REDUCE_URL);
JobClient client;
try {
client = new JobClient(new JobConf(conf));
return client.getJob(jobId);
} catch (IOException e) {
throw new RuntimeException(e);
}
}
　　关键是这个RunningJob对象只可以获取jobname信息，但是无法获取我们设置的conf信息。为了解决这个问题，我写了一个jsp，放到namenode上，用于读取本地log文件，并将结果反馈给调用者。代码如下，希望对大家有帮助：
　　

<%@ page
contentType="text/xml; charset=UTF-8"
import="javax.servlet.*"
import="javax.servlet.http.*"
import="java.io.*"
import="java.net.URL"
import="org.apache.hadoop.util.*"
import="javax.servlet.jsp.JspWriter"
%><%!
private static final long serialVersionUID = 1L;
public File getHistoryFile(final String jobId) {
File dir = new File("/opt/hadoop/logs/history/done/");
File[] files = dir.listFiles(new FilenameFilter() {
public boolean accept(File dir, String name) {
if (name.indexOf(jobId) >= 0 && name.endsWith(".xml")) {
return true;
}
return false;
}
});
if (files != null && files.length > 0) {
return files[0];
}
return null;
}
public void printXML(JspWriter out, String jobId) throws IOException {
FileInputStream jobFile = null;
String[] outputKeys = { "db_id", "channel_id", "channel_name", "user_id", "user_name", "job_day", "pub_format_type", "mapred.output.dir" };
String line = "";
try {
jobFile = new FileInputStream(getLogFilePath(jobId));
BufferedReader reader = new BufferedReader(new InputStreamReader(jobFile));
while ((line = reader.readLine()) != null) {
for (String key : outputKeys) {
if (!line.startsWith("<property>") || line.indexOf("<name>" + key + "</name>") >= 0) {
out.println(line);
break;
}
}
}
} catch (Exception e) {
out.println("Failed to retreive job configuration for job '" + jobId + "!");
out.println(e);
} finally {
if (jobFile != null) {
try {
jobFile.close();
} catch (IOException e) {
}
}
}
}
private File getLogFilePath(String jobId) {
String logDir = System.getProperty("hadoop.log.dir");
if (logDir == null || logDir.length() == 0) {
logDir = "/opt/hadoop/logs/";
}
File logFile = new File(logDir + File.separator + jobId + "_conf.xml");
return logFile.exists() ? logFile : getHistoryFile(jobId);
}
%><%
response.setContentType("text/xml");
final String jobId = request.getParameter("jobid");
if (jobId == null) {
out.println("<h2>Missing 'jobid' for fetching job configuration!</h2>");
return;
}
printXML(out, jobId);
%>

　　这里有个要点，运行中和刚完成的job xml文件放到了"/opt/hadoop/logs"下，归档的job xml放到了“/opt/hadoop/logs/history/done/”，所以要判断第一个地方找不到文件，去第二个地方找。
　　功能很简单，但是很有用。
　　--heipark

账号		自动登录	找回密码
密码			立即注册

大疆运维招人啦，

C++ :try 语句块和异常处理

C++的多态

Red Hat RHCE 8 (EX294) Cert Guide

Java/C++ 区别：看完这一篇，就够用！

别再用过时库了！这 13 个顶级 C++ 库才是

c++ size_t 和 int 的区别

[经验分享] 自己编写接口用于获取Hadoop Job conf 信息

扫码加入运维网微信交流群