At this point, the Hadoop 2.7.2 runtime environment is set up and complete.
3. Creating a MapReduce project in Eclipse and developing in Hadoop local mode against the local file system
For development in Eclipse I use the local file system rather than HDFS; HDFS is covered at length in discussions of fully distributed mode, so I won't cover it here. Also, despite what many articles suggest, configuring DFS Locations is not required for development (it does not affect development at all); it is only used to browse the HDFS file system of a cluster (at least, that is my current understanding). In any case, I was never able to use it to connect to the Hadoop instance running in local mode on my Windows 8.1 machine; it kept failing with the following error:
java.lang.NoClassDefFoundError: org/apache/htrace/SamplerBuilder
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:635)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
at org.apache.hadoop.eclipse.server.HadoopServer.getDFS(HadoopServer.java:478)
at org.apache.hadoop.eclipse.dfs.DFSPath.getDFS(DFSPath.java:146)
at org.apache.hadoop.eclipse.dfs.DFSFolder.loadDFSFolderChildren(DFSFolder.java:61)
at org.apache.hadoop.eclipse.dfs.DFSFolder$1.run(DFSFolder.java:178)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
This problem has since been resolved: it was caused by missing jar dependencies. The following three jars need to be placed in the $eclipse_home\plugins\ directory.
The part circled in red in the figure above is the key point: it configures WordCount's input and output paths. Because local mode uses the local file system rather than HDFS, the paths here use the file:/// scheme instead of hdfs:// (pay special attention to this).
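The scheme matters because Hadoop selects the FileSystem implementation from the scheme of the path's URI. A minimal plain-JDK sketch of how the scheme is read from such paths (both paths below are made-up examples, not the ones from my run configuration):

```java
import java.net.URI;

public class SchemeDemo {
    public static void main(String[] args) {
        // Hypothetical example paths: a local-mode input path and an HDFS one.
        URI localInput = URI.create("file:///e:/mr/input");
        URI hdfsInput  = URI.create("hdfs://localhost:9000/mr/input");

        // Hadoop maps the "file" scheme to the local file system and the
        // "hdfs" scheme to DistributedFileSystem when resolving these paths.
        System.out.println(localInput.getScheme()); // file
        System.out.println(hdfsInput.getScheme());  // hdfs
    }
}
```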
Then click the Run button to launch the job.
When output like the following appears, the run succeeded:
16/09/15 22:18:37 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/09/15 22:18:39 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
16/09/15 22:18:39 INFO input.FileInputFormat: Total input paths to process : 2
16/09/15 22:18:40 INFO mapreduce.JobSubmitter: number of splits:2
16/09/15 22:18:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1473949101198_0001
16/09/15 22:18:41 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
16/09/15 22:18:41 INFO impl.YarnClientImpl: Submitted application application_1473949101198_0001
16/09/15 22:18:41 INFO mapreduce.Job: The url to track the job: http://Lenovo-PC:8088/proxy/application_1473949101198_0001/
16/09/15 22:18:41 INFO mapreduce.Job: Running job: job_1473949101198_0001
16/09/15 22:18:53 INFO mapreduce.Job: Job job_1473949101198_0001 running in uber mode : false
16/09/15 22:18:53 INFO mapreduce.Job: map 0% reduce 0%
16/09/15 22:19:03 INFO mapreduce.Job: map 100% reduce 0%
16/09/15 22:19:10 INFO mapreduce.Job: map 100% reduce 100%
16/09/15 22:19:11 INFO mapreduce.Job: Job job_1473949101198_0001 completed successfully
16/09/15 22:19:12 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=119
FILE: Number of bytes written=359444
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=194
HDFS: Number of bytes written=0
HDFS: Number of read operations=2
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Killed map tasks=1
Launched map tasks=2
Launched reduce tasks=1
Rack-local map tasks=2
Total time spent by all maps in occupied slots (ms)=12156
Total time spent by all reduces in occupied slots (ms)=4734
Total time spent by all map tasks (ms)=12156
Total time spent by all reduce tasks (ms)=4734
Total vcore-milliseconds taken by all map tasks=12156
Total vcore-milliseconds taken by all reduce tasks=4734
Total megabyte-milliseconds taken by all map tasks=12447744
Total megabyte-milliseconds taken by all reduce tasks=4847616
Map-Reduce Framework
Map input records=2
Map output records=8
Map output bytes=78
Map output materialized bytes=81
Input split bytes=194
Combine input records=8
Combine output records=6
Reduce input groups=4
Reduce shuffle bytes=81
Reduce input records=6
Reduce output records=4
Spilled Records=12
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=187
CPU time spent (ms)=1733
Physical memory (bytes) snapshot=630702080
Virtual memory (bytes) snapshot=834060288
Total committed heap usage (bytes)=484966400
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=44
File Output Format Counters
Bytes Written=43
Then check the results in the output path (the one configured in the run configuration):
The following problems may occur while running:
1) Problem 1:
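For reference, each line in the part-r-00000 file that WordCount writes to the output path is a word and its count separated by a tab (the default key/value separator of TextOutputFormat). A small sketch of splitting such a line (the sample line is made up):

```java
public class WordCountLine {
    // Split one "word<TAB>count" output line into its two fields.
    public static String[] parse(String line) {
        int tab = line.indexOf('\t');
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    public static void main(String[] args) {
        String[] kv = parse("hadoop\t2"); // hypothetical sample line
        System.out.println(kv[0] + " -> " + kv[1]);
    }
}
```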
16/09/15 22:12:08 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Exception in thread "main" java.net.ConnectException: Call From Lenovo-PC/192.168.1.105 to 0.0.0.0:9000 failed on connection exception: java.net.ConnectException: Connection refused: no further information; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:144)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
Caused by: java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 27 more
The problem above occurs because the port in the project's core-site.xml differs from the port in the core-site.xml of the locally installed Hadoop; make the two consistent.
2) Problem 2:
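For example, if the installed Hadoop's core-site.xml defines fs.defaultFS as below, the project's copy of core-site.xml must use the same host and port (port 9000 here is only an illustration matching the error message above; use whatever your installation actually sets):

```xml
<configuration>
  <!-- Must match the value in the installed Hadoop's core-site.xml -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```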
16/09/15 22:14:45 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/09/15 22:14:48 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
16/09/15 22:14:50 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
16/09/15 22:14:52 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
16/09/15 22:14:54 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
If the messages above appear, YARN is not running; start it (on Windows, via %HADOOP_HOME%\sbin\start-yarn.cmd).
3) Problem 3:
16/09/15 22:16:00 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/09/15 22:16:02 WARN mapreduce.JobResourceUploader: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
16/09/15 22:16:02 INFO input.FileInputFormat: Total input paths to process : 2
16/09/15 22:16:03 INFO mapreduce.JobSubmitter: number of splits:2
16/09/15 22:16:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1473948945298_0001
16/09/15 22:16:04 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
16/09/15 22:16:04 INFO impl.YarnClientImpl: Submitted application application_1473948945298_0001
16/09/15 22:16:04 INFO mapreduce.Job: The url to track the job: http://Lenovo-PC:8088/proxy/application_1473948945298_0001/
16/09/15 22:16:04 INFO mapreduce.Job: Running job: job_1473948945298_0001
16/09/15 22:16:08 INFO mapreduce.Job: Job job_1473948945298_0001 running in uber mode : false
16/09/15 22:16:08 INFO mapreduce.Job: map 0% reduce 0%
16/09/15 22:16:08 INFO mapreduce.Job: Job job_1473948945298_0001 failed with state FAILED due to: Application application_1473948945298_0001 failed 2 times due to AM Container for appattempt_1473948945298_0001_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://Lenovo-PC:8088/cluster/app/application_1473948945298_0001Then, click on links to logs of each attempt.
Diagnostics: Could not find any valid local directory for nmPrivate/container_1473948945298_0001_02_000001.tokens
Failing this attempt. Failing the application.
16/09/15 22:16:08 INFO mapreduce.Job: Counters: 0
If the problem above occurs, Hadoop was not started with administrator privileges; restart it as an administrator.