Hadoop之TaskTraker分析

榕叶 · 发表于 2015-11-11 12:28:16

　　TaskTracker的工作职责之前已经和大家提过，主要负责维护，申请和监控Task，通过heartbeat和JobTracker进行通信。
　　TaskTracker的init过程：
　　1.读取配置文件，解析参数
　　2.将TaskTraker上原有的用户local files删除并新建新的dir和file
　　3. Map<TaskAttemptID, TaskInProgress> tasks = new HashMap<TaskAttemptID, TaskInProgress>(); 清除map
　　4. this.runningTasks = new LinkedHashMap<TaskAttemptID, TaskInProgress>();记录task的链表

         this.runningJobs = new TreeMap<JobID, RunningJob>();记录job的id信息
　　5.初始化JVMManager：  mapJvmManager = new JvmManagerForType(tracker.getMaxCurrentMapTasks(),
true, tracker);
reduceJvmManager = new JvmManagerForType(tracker.getMaxCurrentReduceTasks(),
false, tracker);    6.初始化RPC，获取JobTracker client用于heartbeat通信；
　　7.new一个后台线程用于监听map完成的事件


this.mapEventsFetcher = new MapEventsFetcherThread();
mapEventsFetcher.setDaemon(true);
mapEventsFetcher.setName(
"Map-events fetcher for all reduce tasks " + "on " +
taskTrackerName);
mapEventsFetcher.start();
后台线程的run方法如下：
　　 while (running) {
try {
List <FetchStatus> fList = null;
synchronized (runningJobs) {
while (((fList = reducesInShuffle()).size()) == 0) {
try {
runningJobs.wait();
} catch (InterruptedException e) {
LOG.info("Shutting down: " + this.getName());
return;
}
}
}
// now fetch all the map task events for all the reduce tasks
// possibly belonging to different jobs
boolean fetchAgain = false; //flag signifying whether we want to fetch
//immediately again.
for (FetchStatus f : fList) {
long currentTime = System.currentTimeMillis();
try {
//the method below will return true when we have not
//fetched all available events yet
if (f.fetchMapCompletionEvents(currentTime)) {
fetchAgain = true;
}
} catch (Exception e) {
LOG.warn(
"Ignoring exception that fetch for map completion" +
" events threw for " + f.jobId + " threw: " +
StringUtils.stringifyException(e));
}
if (!running) {
break;
}
}
synchronized (waitingOn) {
try {
if (!fetchAgain) {
waitingOn.wait(heartbeatInterval);
}
} catch (InterruptedException ie) {
LOG.info("Shutting down: " + this.getName());
return;
}
}
} catch (Exception e) {
LOG.info("Ignoring exception "  + e.getMessage());
}
}
} 8.initializeMemoryManagement，初始化每个TrackTask的内存设置
　　9.new一个Map和Reducer的Launcher后台线程
　　 mapLauncher = new TaskLauncher(TaskType.MAP, maxMapSlots);
reduceLauncher = new TaskLauncher(TaskType.REDUCE, maxReduceSlots);
mapLauncher.start();
reduceLauncher.start();用于后面创建子JVM来执行map、reduce task
　　看一下TaskLauncher的run方法：
//before preparing the job localize
   //all the archives
   TaskAttemptID taskid = t.getTaskID();
   final LocalDirAllocator lDirAlloc = new LocalDirAllocator("mapred.local.dir");
   //simply get the location of the workDir and pass it to the child. The
   //child will do the actual dir creation
   final File workDir =
   new File(new Path(localdirs[rand.nextInt(localdirs.length)],
      TaskTracker.getTaskWorkDir(t.getUser(), taskid.getJobID().toString(),
      taskid.toString(),
      t.isTaskCleanupTask())).toString());
   String user = tip.getUGI().getUserName();
   // Set up the child task's configuration. After this call, no localization
   // of files should happen in the TaskTracker's process space. Any changes to
   // the conf object after this will NOT be reflected to the child.
   // setupChildTaskConfiguration(lDirAlloc);
   if (!prepare()) {
      return;
   }
   // Accumulates class paths for child.
   List<String> classPaths = getClassPaths(conf, workDir,
                                          taskDistributedCacheManager);
   long logSize = TaskLog.getTaskLogLength(conf);
   //  Build exec child JVM args.
   Vector<String> vargs = getVMArgs(taskid, workDir, classPaths, logSize);
   tracker.addToMemoryManager(t.getTaskID(), t.isMapTask(), conf);
   // set memory limit using ulimit if feasible and necessary ...
   String setup = getVMSetupCmd();
   // Set up the redirection of the task's stdout and stderr streams
   File[] logFiles = prepareLogFiles(taskid, t.isTaskCleanupTask());
   File stdout = logFiles[0];
   File stderr = logFiles[1];
   tracker.getTaskTrackerInstrumentation().reportTaskLaunch(taskid, stdout,
               stderr);
   Map<String, String> env = new HashMap<String, String>();
   errorInfo = getVMEnvironment(errorInfo, user, workDir, conf, env, taskid,
                                 logSize);
   // flatten the env as a set of export commands
   List <String> setupCmds = new ArrayList<String>();
   for(Entry<String, String> entry : env.entrySet()) {
      StringBuffer sb = new StringBuffer();
      sb.append("export ");
      sb.append(entry.getKey());
      sb.append("=\"");
      sb.append(entry.getValue());
      sb.append("\"");
      setupCmds.add(sb.toString());
   }
   setupCmds.add(setup);
   launchJvmAndWait(setupCmds, vargs, stdout, stderr, logSize, workDir);
   tracker.getTaskTrackerInstrumentation().reportTaskEnd(t.getTaskID());
   if (exitCodeSet) {
      if (!killed && exitCode != 0) {
      if (exitCode == 65) {
         tracker.getTaskTrackerInstrumentation().taskFailedPing(t.getTaskID());
      }
      throw new IOException("Task process exit with nonzero status of " +
            exitCode + ".");
      }
   }
}
run方法为当前task new一个child JVM，为其设置文件路径，上下文环境，JVM启动参数和启动命令等信息，然后调用TaskControll方法启动新的JVM执行对应的Task工作。
　　各个类关系图如下所示：

　　最后以TaskController的launchTask截至

　　10.然后开始 startHealthMonitor(this.fConf);

　　

　　

　　再来看看TaskLauncher的run方法，就是不停的循环去获取TaskTracker中新的task，然后调用startNewTask方法
　　 if (this.taskStatus.getRunState() == TaskStatus.State.UNASSIGNED ||
this.taskStatus.getRunState() == TaskStatus.State.FAILED_UNCLEAN ||
this.taskStatus.getRunState() == TaskStatus.State.KILLED_UNCLEAN) {
localizeTask(task);
if (this.taskStatus.getRunState() == TaskStatus.State.UNASSIGNED) {
this.taskStatus.setRunState(TaskStatus.State.RUNNING);
}
setTaskRunner(task.createRunner(TaskTracker.this, this, rjob));
this.runner.start();
long now = System.currentTimeMillis();
this.taskStatus.setStartTime(now);
this.lastProgressReport = now;TaskTracker的run方法：通过维护心跳和JobTracker通信，以获取、杀掉新的Task，重点看一下heartBeat通信过程：
　　 synchronized (this) {
askForNewTask =
((status.countOccupiedMapSlots() < maxMapSlots ||
status.countOccupiedReduceSlots() < maxReduceSlots) &&
acceptNewTasks);
localMinSpaceStart = minSpaceStart;
}
if (askForNewTask) {
askForNewTask = enoughFreeSpace(localMinSpaceStart);
long freeDiskSpace = getFreeSpace();
long totVmem = getTotalVirtualMemoryOnTT();
long totPmem = getTotalPhysicalMemoryOnTT();
long availableVmem = getAvailableVirtualMemoryOnTT();
long availablePmem = getAvailablePhysicalMemoryOnTT();
long cumuCpuTime = getCumulativeCpuTimeOnTT();
long cpuFreq = getCpuFrequencyOnTT();
int numCpu = getNumProcessorsOnTT();
float cpuUsage = getCpuUsageOnTT();
status.getResourceStatus().setAvailableSpace(freeDiskSpace);
status.getResourceStatus().setTotalVirtualMemory(totVmem);
status.getResourceStatus().setTotalPhysicalMemory(totPmem);
status.getResourceStatus().setMapSlotMemorySizeOnTT(
mapSlotMemorySizeOnTT);
status.getResourceStatus().setReduceSlotMemorySizeOnTT(
reduceSlotSizeMemoryOnTT);
status.getResourceStatus().setAvailableVirtualMemory(availableVmem);
status.getResourceStatus().setAvailablePhysicalMemory(availablePmem);
status.getResourceStatus().setCumulativeCpuTime(cumuCpuTime);
status.getResourceStatus().setCpuFrequency(cpuFreq);
status.getResourceStatus().setNumProcessors(numCpu);
status.getResourceStatus().setCpuUsage(cpuUsage);
}
//add node health information
TaskTrackerHealthStatus healthStatus = status.getHealthStatus();
synchronized (this) {
if (healthChecker != null) {
healthChecker.setHealthStatus(healthStatus);
} else {
healthStatus.setNodeHealthy(true);
healthStatus.setLastReported(0L);
healthStatus.setHealthReport("");
}
}
//
// Xmit the heartbeat
//
HeartbeatResponse heartbeatResponse = jobClient.heartbeat(status,
justStarted,
justInited,
askForNewTask,
heartbeatResponseId);

该方法主要将TaskTracker上的各种性能参数信息反馈给JobTraker，调用其heartbeat方法然后解析返回的结果，下篇详细分析heartBeat机制

版权声明：本文为博主原创文章，未经博主允许不得转载。

账号		自动登录	找回密码
密码			立即注册

wirelessnetview好用的无线分析工具

Red Hat RHCE 8 (EX294) Cert Guide

Shell从入门到精通（阿良）

亿图图示专家(EDraw Max) V7.9 中文破解版

zabbix3.4.1安装部署+微信推送信息+大屏显

Red Hat OpenShift I: Containers & Kubern

2025 年，C++ 还能“硬核”多久？

Hadoop之TaskTraker分析

浏览过的版块

扫码加入运维网微信交流群