Available Jobs |
Job tracker Host Name | Job tracker Start time | Job Id | Name | User |
master | Mon Nov 07 20:50:54 CST 2011 | job_201111072050_0001 | inject crawl-url | hadoop |
master | Mon Nov 07 20:50:54 CST 2011 | job_201111072050_0002 | crawldb crawl/dist/crawldb | hadoop |
master | Mon Nov 07 20:50:54 CST 2011 | job_201111072050_0003 | generate: select from crawl/dist/crawldb | hadoop |
master | Mon Nov 07 20:50:54 CST 2011 | job_201111072050_0004 | generate: partition crawl/dist/segments/2011110720 | hadoop |
master | Mon Nov 07 20:50:54 CST 2011 | job_201111072050_0005 | fetch crawl/dist/segments/20111107205746 | hadoop |
master | Mon Nov 07 20:50:54 CST 2011 | job_201111072050_0006 | crawldb crawl/dist/crawldb (update db actually) | hadoop |
master | Mon Nov 07 20:50:54 CST 2011 | job_201111072050_0007 | linkdb crawl/dist/linkdb | hadoop |
master | Mon Nov 07 20:50:54 CST 2011 | job_201111072050_0008 | index-lucene crawl/dist/indexes | hadoop |
master | Mon Nov 07 20:50:54 CST 2011 | job_201111072050_0009 | dedup 1: urls by time | hadoop |
master | Mon Nov 07 20:50:54 CST 2011 | job_201111072050_0010 | dedup 2: content by hash | hadoop |
master | Mon Nov 07 20:50:54 CST 2011 | job_201111072050_0011 | dedup 3: delete from index(es) | hadoop |
* Jobs above that share the same color belong to ONE step of the crawl command.
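
Since the color coding does not survive in plain text, the grouping can be approximated from the job names themselves. The following is only a sketch: the step names (inject, generate, fetch, updatedb, invertlinks, index, dedup) and the prefix-to-step mapping are assumptions based on how Nutch 1.x usually names its jobs, not something read from the original colors. The helpers `crawl_step` and `group_jobs` are hypothetical names made up for this illustration.

```python
# Job names copied from the table above, in job-id order (0001 .. 0011).
JOB_NAMES = [
    "inject crawl-url",
    "crawldb crawl/dist/crawldb",
    "generate: select from crawl/dist/crawldb",
    "generate: partition crawl/dist/segments/2011110720",
    "fetch crawl/dist/segments/20111107205746",
    "crawldb crawl/dist/crawldb",
    "linkdb crawl/dist/linkdb",
    "index-lucene crawl/dist/indexes",
    "dedup 1: urls by time",
    "dedup 2: content by hash",
    "dedup 3: delete from index(es)",
]

# ASSUMPTION: mapping from the first word of a job name to a crawl step.
STEP_BY_PREFIX = {
    "inject": "inject",
    "generate": "generate",
    "fetch": "fetch",
    "linkdb": "invertlinks",
    "index-lucene": "index",
    "dedup": "dedup",
}


def crawl_step(name, previous_step):
    """Guess which crawl step a job belongs to (hypothetical helper)."""
    prefix = name.split()[0].rstrip(":")
    if prefix == "crawldb":
        # The same job name appears twice in the table: right after the
        # inject job it is taken to be part of inject, otherwise updatedb.
        return "inject" if previous_step == "inject" else "updatedb"
    return STEP_BY_PREFIX[prefix]


def group_jobs(names):
    """Group consecutive jobs of the same step as (step, [job numbers])."""
    groups = []
    previous = None
    for number, name in enumerate(names, start=1):
        step = crawl_step(name, previous)
        if groups and groups[-1][0] == step:
            groups[-1][1].append(number)
        else:
            groups.append((step, [number]))
        previous = step
    return groups


for step, numbers in group_jobs(JOB_NAMES):
    print(step, numbers)
```

The "crawldb" job name is ambiguous on its own, which is why the sketch looks at the preceding step instead of the name alone; this matches the "(update db actually)" remark on job 0006.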