
[Experience sharing] A Hadoop grep problem


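Posted on 2018-11-1 09:16:31
  Today, at the request of the business team, I needed to count the records for a given URL in the raw logs on HDFS. For convenience, I simply used the grep job from the hadoop-examples-*.jar package.
  
    Submitting the job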
  


  • [root@localhost yinjie]>hadoop jar $HADOOP_HOME/hadoop-examples-*.jar grep -Dmapred.job.queue.name=cp_normal_job_queue /group/*****/2011-08-12/00 /group/*****/grep/2011-08-12/00 'www.****.cn'
  • 11/08/31 17:12:39 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 140330 for yinjie
  • 11/08/31 17:12:39 INFO security.TokenCache: Got dt for hdfs://*****.com/home/hdfs/cluster-data/tmp/mapred/staging/yinjie/.staging/job_201108241351_24681;uri=****:8020;t.service=****:8020
  • 11/08/31 17:12:39 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
  • 11/08/31 17:12:39 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 2ad6654f3e9cad97d13f716e51a0509253c0aabb]
  • 11/08/31 17:12:39 INFO mapred.FileInputFormat: Total input paths to process : 22
  • 11/08/31 17:12:40 INFO mapred.JobClient: Running job: job_201108241351_24681
  • 11/08/31 17:12:41 INFO mapred.JobClient:  map 0% reduce 0%
  • 11/08/31 17:12:50 INFO mapred.JobClient:  map 4% reduce 0%
  • 11/08/31 17:12:51 INFO mapred.JobClient:  map 52% reduce 0%
  • 11/08/31 17:12:52 INFO mapred.JobClient:  map 60% reduce 0%
  • 11/08/31 17:12:53 INFO mapred.JobClient:  map 69% reduce 0%
  • 11/08/31 17:12:54 INFO mapred.JobClient:  map 79% reduce 0%
  • 11/08/31 17:12:55 INFO mapred.JobClient:  map 84% reduce 0%
  • 11/08/31 17:12:56 INFO mapred.JobClient:  map 90% reduce 0%
  • 11/08/31 17:12:57 INFO mapred.JobClient:  map 93% reduce 0%
  • 11/08/31 17:12:58 INFO mapred.JobClient:  map 95% reduce 27%
  • 11/08/31 17:12:59 INFO mapred.JobClient:  map 97% reduce 27%
  • 11/08/31 17:13:01 INFO mapred.JobClient:  map 98% reduce 27%
  • 11/08/31 17:13:05 INFO mapred.JobClient:  map 99% reduce 27%
  • 11/08/31 17:13:07 INFO mapred.JobClient:  map 99% reduce 32%
  • 11/08/31 17:13:09 INFO mapred.JobClient:  map 100% reduce 32%
  • 11/08/31 17:13:14 INFO mapred.JobClient:  map 100% reduce 100%
  • 11/08/31 17:13:15 INFO mapred.JobClient: Job complete: job_201108241351_24681
  • 11/08/31 17:13:15 INFO mapred.JobClient: Counters: 24
  • 11/08/31 17:13:15 INFO mapred.JobClient:   Job Counters
  • 11/08/31 17:13:15 INFO mapred.JobClient:     Launched reduce tasks=1
  • 11/08/31 17:13:15 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=1542961
  • 11/08/31 17:13:15 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
  • 11/08/31 17:13:15 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
  • 11/08/31 17:13:15 INFO mapred.JobClient:     Rack-local map tasks=44
  • 11/08/31 17:13:15 INFO mapred.JobClient:     Launched map tasks=242
  • 11/08/31 17:13:15 INFO mapred.JobClient:     Data-local map tasks=198
  • 11/08/31 17:13:15 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=23291
  • 11/08/31 17:13:15 INFO mapred.JobClient:   FileSystemCounters
  • 11/08/31 17:13:15 INFO mapred.JobClient:     FILE_BYTES_READ=3724
  • 11/08/31 17:13:15 INFO mapred.JobClient:     HDFS_BYTES_READ=32281139322
  • 11/08/31 17:13:15 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=14502646
  • 11/08/31 17:13:15 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=118
  • 11/08/31 17:13:15 INFO mapred.JobClient:   Map-Reduce Framework
  • 11/08/31 17:13:15 INFO mapred.JobClient:     Reduce input groups=1
  • 11/08/31 17:13:15 INFO mapred.JobClient:     Combine output records=143
  • 11/08/31 17:13:15 INFO mapred.JobClient:     Map input records=37526374
  • 11/08/31 17:13:15 INFO mapred.JobClient:     Reduce shuffle bytes=5164
  • 11/08/31 17:13:15 INFO mapred.JobClient:     Reduce output records=1
  • 11/08/31 17:13:15 INFO mapred.JobClient:     Spilled Records=286
  • 11/08/31 17:13:15 INFO mapred.JobClient:     Map output bytes=786984
  • 11/08/31 17:13:15 INFO mapred.JobClient:     Map input bytes=32280203347
  • 11/08/31 17:13:15 INFO mapred.JobClient:     Combine input records=32791
  • 11/08/31 17:13:15 INFO mapred.JobClient:     Map output records=32791
  • 11/08/31 17:13:15 INFO mapred.JobClient:     SPLIT_RAW_BYTES=38731
  • 11/08/31 17:13:15 INFO mapred.JobClient:     Reduce input records=143
  • 11/08/31 17:13:15 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
  • 11/08/31 17:13:15 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 140331 for yinjie
  • 11/08/31 17:13:15 INFO security.TokenCache: Got dt for hdfs://****.com/home/hdfs/cluster-data/tmp/mapred/staging/yinjie/.staging/job_201108241351_24682;uri=****:8020;t.service=****:8020
  • 11/08/31 17:13:15 INFO mapred.FileInputFormat: Total input paths to process : 1
  • 11/08/31 17:13:15 INFO mapred.JobClient: Cleaning up the staging area hdfs://****.com/home/hdfs/cluster-data/tmp/mapred/staging/yinjie/.staging/job_201108241351_24682
  • org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
  •         at org.apache.hadoop.mapred.QueueManager.getQueueACL(QueueManager.java:382)
  •         at org.apache.hadoop.mapred.JobTracker.getQueueAdmins(JobTracker.java:4422)
  •         at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
  •         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  •         at java.lang.reflect.Method.invoke(Method.java:597)
  •         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
  •         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
  •         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
  •         at java.security.AccessController.doPrivileged(Native Method)
  •         at javax.security.auth.Subject.doAs(Subject.java:396)
  •         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
  •         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)

  •         at org.apache.hadoop.ipc.Client.call(Client.java:1107)
  •         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
  •         at org.apache.hadoop.mapred.$Proxy6.getQueueAdmins(Unknown Source)
  •         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:886)
  •         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
  •         at java.security.AccessController.doPrivileged(Native Method)
  •         at javax.security.auth.Subject.doAs(Subject.java:396)
  •         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
  •         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
  •         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
  •         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1242)
  •         at org.apache.hadoop.examples.Grep.run(Grep.java:84)
  •         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
  •         at org.apache.hadoop.examples.Grep.main(Grep.java:93)
  •         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  •         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  •         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  •         at java.lang.reflect.Method.invoke(Method.java:597)
  •         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
  •         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
  •         at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
  •         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  •         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  •         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  •         at java.lang.reflect.Method.invoke(Method.java:597)
  •         at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
  • [root@localhost yinjie]>
  

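  There was an error, and an odd one: the first job ran successfully, but the second one failed. (The grep example runs two MapReduce jobs in sequence: one that searches for and counts matches, then one that sorts the counts.) Judging from the exception, a NullPointerException in QueueManager.getQueueACL, this looked like a queue access-control problem. The submitted command specified the -Dmapred.job.queue.name=cp_normal_job_queue parameter, so I suspected that the first job ran with this parameter while the second job did not pick it up, which caused the failure.
  
So I first checked the configuration under $HADOOP_HOME/conf: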
  


  • [root@localhost yinjie]>cat $HADOOP_HOME/conf/mapred-site.xml




  •   <property>
  •     <name>mapred.job.queue.name</name>
  •     <value>cp_admin_job_queue</value>
  •     <description>Queue to which a job is submitted. This must match one of the
  •     queues defined in mapred.queue.names for the system. Also, the ACL setup
  •     for the queue must allow the current user to submit a job to the queue.
  •     Before specifying a queue, ensure that the system is configured with
  •     the queue, and access is allowed for submitting jobs to the queue.
  •     </description>
  •   </property>

  • ....
  • ....
  • ....

  

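  The configured value of mapred.job.queue.name turned out to be cp_admin_job_queue, not the cp_normal_job_queue specified when submitting the job. Could the second job have used cp_admin_job_queue and failed because of that?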
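
This comparison can also be scripted. The following is a minimal sketch, not part of the original post: `get_hadoop_prop` is a hypothetical helper name, and it assumes each `<name>` and `<value>` element sits on a single line, as in a typical hand-edited mapred-site.xml. The sample file is created only so that the snippet is self-contained.

```shell
# Hypothetical helper (not a Hadoop command): print the <value> of one
# property from a Hadoop *-site.xml file.
# Assumes each <name>...</name> and <value>...</value> fits on one line.
get_hadoop_prop() {
  # $1 = path to the config file, $2 = property name
  awk -v name="$2" '
    /<name>/ {
      line = $0
      sub(/.*<name>[ \t]*/, "", line)      # drop everything before the name text
      sub(/[ \t]*<\/name>.*/, "", line)    # drop everything after it
      found = (line == name)
    }
    found && /<value>/ {
      line = $0
      sub(/.*<value>[ \t]*/, "", line)
      sub(/[ \t]*<\/value>.*/, "", line)
      print line
      exit
    }
  ' "$1"
}

# Stand-in config file so the sketch runs anywhere.
sample_conf=$(mktemp)
cat > "$sample_conf" <<'EOF'
<configuration>
  <property>
    <name>mapred.job.queue.name</name>
    <value>cp_admin_job_queue</value>
  </property>
</configuration>
EOF

requested_queue="cp_normal_job_queue"
configured_queue=$(get_hadoop_prop "$sample_conf" mapred.job.queue.name)

# Flag a mismatch between the config-file default and the -D value.
if [ "$configured_queue" != "$requested_queue" ]; then
  echo "config default is $configured_queue, but the job asked for $requested_queue"
fi
rm -f "$sample_conf"
```

Against the real cluster config, the call would be `get_hadoop_prop "$HADOOP_HOME/conf/mapred-site.xml" mapred.job.queue.name`.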
  
Just to try it out, I copied the $HADOOP_HOME/conf configuration directory into the current user's directory:
  


  • [root@localhost yinjie]>cp -rf $HADOOP_HOME/conf ./
  • ....
  • [root@localhost yinjie/conf]>ls
  • allslaves               configuration.xsl  fair-scheduler.xml  hadoop-metrics.properties  hdfs-site.xml     mapred-queue-acls.xml  masters  ssl-client.xml.example
  • capacity-scheduler.xml  core-site.xml      hadoop-env.sh       hadoop-policy.xml          log4j.properties  mapred-site.xml        slaves   ssl-server.xml.example
  • [root@localhost yinjie/conf]>
  • [root@localhost yinjie/conf]>vi mapred-site.xml
  

  Edit mapred-site.xml, change mapred.job.queue.name to cp_normal_job_queue, and save.
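
As a repeatable alternative to editing the copy by hand in vi, the same change could be scripted. This is a sketch under assumptions, not what the original post did: `set_hadoop_prop` is a hypothetical helper, it expects `<name>` and `<value>` on single lines with `<name>` first, and the new value must not contain `&` or `\`, which awk's `sub` treats specially. The sample file stands in for the copied mapred-site.xml.

```shell
# Hypothetical helper (not a Hadoop command): rewrite the <value> of one
# property in a Hadoop *-site.xml file, in place.
set_hadoop_prop() {
  # $1 = path to the config file, $2 = property name, $3 = new value
  tmp=$(mktemp) || return 1
  awk -v name="$2" -v value="$3" '
    /<name>/ {
      line = $0
      sub(/.*<name>[ \t]*/, "", line)
      sub(/[ \t]*<\/name>.*/, "", line)
      found = (line == name)
    }
    found && /<value>/ {
      # Replace only the value that follows the matching name.
      sub(/<value>[^<]*<\/value>/, "<value>" value "</value>")
      found = 0
    }
    { print }
  ' "$1" > "$tmp" && cat "$tmp" > "$1"   # cat keeps the target file permissions
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Stand-in for the copied conf/mapred-site.xml.
conf_copy=$(mktemp)
cat > "$conf_copy" <<'EOF'
<configuration>
  <property>
    <name>mapred.job.queue.name</name>
    <value>cp_admin_job_queue</value>
  </property>
</configuration>
EOF

set_hadoop_prop "$conf_copy" mapred.job.queue.name cp_normal_job_queue

# Count the new and old queue names to confirm the rewrite.
new_count=$(grep -c '<value>cp_normal_job_queue</value>' "$conf_copy" || true)
old_count=$(grep -c 'cp_admin_job_queue' "$conf_copy" || true)
rm -f "$conf_copy"
```

With the real copy, the call would be `set_hadoop_prop ./conf/mapred-site.xml mapred.job.queue.name cp_normal_job_queue`.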
  
Submit the job again, using the --config option to point to the modified configuration directory:
  


  • [root@localhost yinjie]>hadoop --config /home/yinjie/conf jar $HADOOP_HOME/hadoop-examples-*.jar grep -Dmapred.job.queue.name=cp_normal_job_queue /group/*****/2011-08-12/01 /group/*****/grep/2011-08-12/01 'www.****.cn'
  • 11/08/31 17:25:19 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 140356 for yinjie
  • 11/08/31 17:25:19 INFO security.TokenCache: Got dt for hdfs://****.com/home/hdfs/cluster-data/tmp/mapred/staging/yinjie/.staging/job_201108241351_24719;uri=****:8020;t.service=****:8020
  • 11/08/31 17:25:19 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
  • 11/08/31 17:25:19 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 2ad6654f3e9cad97d13f716e51a0509253c0aabb]
  • 11/08/31 17:25:19 INFO mapred.FileInputFormat: Total input paths to process : 22
  • 11/08/31 17:25:19 INFO mapred.JobClient: Running job: job_201108241351_24719
  • 11/08/31 17:25:20 INFO mapred.JobClient:  map 0% reduce 0%
  • 11/08/31 17:25:30 INFO mapred.JobClient:  map 4% reduce 0%
  • 11/08/31 17:25:31 INFO mapred.JobClient:  map 14% reduce 0%
  • 11/08/31 17:25:32 INFO mapred.JobClient:  map 51% reduce 0%
  • 11/08/31 17:25:33 INFO mapred.JobClient:  map 63% reduce 0%
  • 11/08/31 17:25:34 INFO mapred.JobClient:  map 68% reduce 0%
  • 11/08/31 17:25:35 INFO mapred.JobClient:  map 77% reduce 0%
  • 11/08/31 17:25:36 INFO mapred.JobClient:  map 87% reduce 0%
  • 11/08/31 17:25:37 INFO mapred.JobClient:  map 93% reduce 0%
  • 11/08/31 17:25:38 INFO mapred.JobClient:  map 96% reduce 0%
  • 11/08/31 17:25:39 INFO mapred.JobClient:  map 97% reduce 0%
  • 11/08/31 17:25:40 INFO mapred.JobClient:  map 98% reduce 0%
  • 11/08/31 17:25:42 INFO mapred.JobClient:  map 99% reduce 31%
  • 11/08/31 17:25:48 INFO mapred.JobClient:  map 100% reduce 31%
  • 11/08/31 17:25:51 INFO mapred.JobClient:  map 100% reduce 33%
  • 11/08/31 17:25:53 INFO mapred.JobClient:  map 100% reduce 100%
  • 11/08/31 17:25:53 INFO mapred.JobClient: Job complete: job_201108241351_24719
  • 11/08/31 17:25:53 INFO mapred.JobClient: Counters: 24
  • 11/08/31 17:25:53 INFO mapred.JobClient:   Job Counters
  • 11/08/31 17:25:53 INFO mapred.JobClient:     Launched reduce tasks=1
  • 11/08/31 17:25:53 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=1025313
  • 11/08/31 17:25:53 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
  • 11/08/31 17:25:53 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
  • 11/08/31 17:25:53 INFO mapred.JobClient:     Rack-local map tasks=26
  • 11/08/31 17:25:53 INFO mapred.JobClient:     Launched map tasks=176
  • 11/08/31 17:25:53 INFO mapred.JobClient:     Data-local map tasks=150
  • 11/08/31 17:25:53 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=18297
  • 11/08/31 17:25:53 INFO mapred.JobClient:   FileSystemCounters
  • 11/08/31 17:25:53 INFO mapred.JobClient:     FILE_BYTES_READ=2580
  • 11/08/31 17:25:53 INFO mapred.JobClient:     HDFS_BYTES_READ=22352133231
  • 11/08/31 17:25:53 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=10563326
  • 11/08/31 17:25:53 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=118
  • 11/08/31 17:25:53 INFO mapred.JobClient:   Map-Reduce Framework
  • 11/08/31 17:25:53 INFO mapred.JobClient:     Reduce input groups=1
  • 11/08/31 17:25:53 INFO mapred.JobClient:     Combine output records=99
  • 11/08/31 17:25:53 INFO mapred.JobClient:     Map input records=26525927
  • 11/08/31 17:25:53 INFO mapred.JobClient:     Reduce shuffle bytes=3624
  • 11/08/31 17:25:53 INFO mapred.JobClient:     Reduce output records=1
  • 11/08/31 17:25:53 INFO mapred.JobClient:     Spilled Records=198
  • 11/08/31 17:25:53 INFO mapred.JobClient:     Map output bytes=515064
  • 11/08/31 17:25:53 INFO mapred.JobClient:     Map input bytes=22351478236
  • 11/08/31 17:25:53 INFO mapred.JobClient:     Combine input records=21461
  • 11/08/31 17:25:53 INFO mapred.JobClient:     Map output records=21461
  • 11/08/31 17:25:53 INFO mapred.JobClient:     SPLIT_RAW_BYTES=28153
  • 11/08/31 17:25:53 INFO mapred.JobClient:     Reduce input records=99
  • 11/08/31 17:25:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
  • 11/08/31 17:25:53 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 140359 for yinjie
  • 11/08/31 17:25:53 INFO security.TokenCache: Got dt for hdfs://****.com/home/hdfs/cluster-data/tmp/mapred/staging/yinjie/.staging/job_201108241351_24723;uri=****:8020;t.service=****:8020
  • 11/08/31 17:25:53 INFO mapred.FileInputFormat: Total input paths to process : 1
  • 11/08/31 17:25:53 INFO mapred.JobClient: Running job: job_201108241351_24723
  • 11/08/31 17:25:54 INFO mapred.JobClient:  map 0% reduce 0%
  • 11/08/31 17:26:01 INFO mapred.JobClient:  map 100% reduce 0%
  • 11/08/31 17:26:13 INFO mapred.JobClient:  map 100% reduce 100%
  • 11/08/31 17:26:13 INFO mapred.JobClient: Job complete: job_201108241351_24723
  • 11/08/31 17:26:13 INFO mapred.JobClient: Counters: 23
  • 11/08/31 17:26:13 INFO mapred.JobClient:   Job Counters
  • 11/08/31 17:26:13 INFO mapred.JobClient:     Launched reduce tasks=1
  • 11/08/31 17:26:13 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=3225
  • 11/08/31 17:26:13 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
  • 11/08/31 17:26:13 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
  • 11/08/31 17:26:13 INFO mapred.JobClient:     Launched map tasks=1
  • 11/08/31 17:26:13 INFO mapred.JobClient:     Data-local map tasks=1
  • 11/08/31 17:26:13 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8191
  • 11/08/31 17:26:13 INFO mapred.JobClient:   FileSystemCounters
  • 11/08/31 17:26:13 INFO mapred.JobClient:     FILE_BYTES_READ=32
  • 11/08/31 17:26:13 INFO mapred.JobClient:     HDFS_BYTES_READ=248
  • 11/08/31 17:26:13 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=117216
  • 11/08/31 17:26:13 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=22
  • 11/08/31 17:26:13 INFO mapred.JobClient:   Map-Reduce Framework
  • 11/08/31 17:26:13 INFO mapred.JobClient:     Reduce input groups=1
  • 11/08/31 17:26:13 INFO mapred.JobClient:     Combine output records=0
  • 11/08/31 17:26:13 INFO mapred.JobClient:     Map input records=1
  • 11/08/31 17:26:13 INFO mapred.JobClient:     Reduce shuffle bytes=0
  • 11/08/31 17:26:13 INFO mapred.JobClient:     Reduce output records=1
  • 11/08/31 17:26:13 INFO mapred.JobClient:     Spilled Records=2
  • 11/08/31 17:26:13 INFO mapred.JobClient:     Map output bytes=24
  • 11/08/31 17:26:13 INFO mapred.JobClient:     Map input bytes=32
  • 11/08/31 17:26:13 INFO mapred.JobClient:     Combine input records=0
  • 11/08/31 17:26:13 INFO mapred.JobClient:     Map output records=1
  • 11/08/31 17:26:13 INFO mapred.JobClient:     SPLIT_RAW_BYTES=130
  • 11/08/31 17:26:13 INFO mapred.JobClient:     Reduce input records=1
  

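  OK, the job succeeded! Both jobs completed this time, which is consistent with the second job taking its queue name from the configuration file rather than from the -D option.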




Original thread: https://www.yunweiku.com/thread-629200-1-1.html