  Operating Spark Remotely from IntelliJ IDEA on Windows 7
  

package main.scala

import org.apache.spark.SparkContext._
import org.apache.spark.{SparkConf, SparkContext}

object SogouResult {
  def main(args: Array[String]) {
    if (args.length == 0) {
      System.err.println("Usage: SogouResult <hdfs-input> <hdfs-output>")
      System.exit(1)
    }
    // NOTE: the literal values below ("\t", 6, column 1, "local") follow the standard
    // Sogou query-log example; the original post's literals were lost in formatting.
    val conf = new SparkConf().setAppName("SogouResult").setMaster("local")
    val sc = new SparkContext(conf)
    // Split each log line on tab, keep only complete records, count records per user ID,
    // then sort by count in descending order.
    val rdd1 = sc.textFile(args(0)).map(_.split("\t")).filter(_.length == 6)
    val rdd2 = rdd1.map(x => (x(1), 1)).reduceByKey(_ + _).map(x => (x._2, x._1)).sortByKey(false).map(x => (x._2, x._1))
    rdd2.saveAsTextFile(args(1))
    sc.stop()
  }
}
fs.defaultFS on the cluster is hdfs://192.168.0.3:9000, and the project's pom.xml (HdfsTest, version 1.0-SNAPSHOT) declares:

  <modelVersion>4.0.0</modelVersion>
  <groupId>HdfsTest</groupId>
  <artifactId>HdfsTest</artifactId>
  <version>1.0-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.6.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
      <version>2.6.4</version>
    </dependency>
    <dependency>
      <groupId>commons-cli</groupId>
      <artifactId>commons-cli</artifactId>
      <version>1.2</version>
    </dependency>
  </dependencies>
  <build>
    <finalName>${project.artifactId}</finalName>
  </build>

  The run arguments are as follows:
  hdfs://192.168.0.3:9000/input/SogouQ1 hdfs://192.168.0.3:9000/output/sogou1
  

  The first run failed with the following error:

  "C:\Program Files\Java\jdk1.7.0_79\bin\java" -Didea.launcher.port=7535 -Didea.launcher.bin.path=D:\Java\IntelliJ\bin -Dfile.encoding=UTF-8 -classpath "C:\Program Files\Java\jdk1.7.0_79\jre\lib\charsets.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\deploy.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\ext\access-bridge-64.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\ext\dnsns.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\ext\jaccess.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\ext\localedata.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\ext\sunec.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\ext\sunjce_provider.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\ext\sunmscapi.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\ext\zipfs.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\javaws.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\jce.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\jfr.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\jfxrt.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\jsse.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\management-agent.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\plugin.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\resources.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\rt.jar;D:\scalasrc\HdfsTest\target\classes;D:\scalasrc\lib\datanucleus-core-3.2.10.jar;D:\scalasrc\lib\datanucleus-rdbms-3.2.9.jar;D:\scalasrc\lib\spark-1.5.0-yarn-shuffle.jar;D:\scalasrc\lib\datanucleus-api-jdo-3.2.6.jar;D:\scalasrc\lib\spark-assembly-1.5.0-hadoop2.6.0.jar;D:\scalasrc\lib\spark-examples-1.5.0-hadoop2.6.0.jar;D:\Java\scala210\lib\scala-actors-migration.jar;D:\Java\scala210\lib\scala-actors.jar;D:\Java\scala210\lib\scala-library.jar;D:\Java\scala210\lib\scala-reflect.jar;D:\Java\scala210\lib\scala-swing.jar;D:\Java\IntelliJ\lib\idea_rt.jar" com.intellij.rt.execution.application.AppMain main.scala.SogouResult hdfs://192.168.0.3:9000/input/SogouQ1 hdfs://192.168.0.3:9000/output/sogou1
  Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
  SLF4J: Class path contains multiple SLF4J bindings.
  SLF4J: Found binding in [jar:file:/D:/scalasrc/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in [jar:file:/D:/scalasrc/lib/spark-examples-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
  16/09/16 12:00:43 INFO SparkContext: Running Spark version 1.5.0
  16/09/16 12:00:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  16/09/16 12:00:44 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
  java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
  at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
  at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
  at org.apache.hadoop.util.Shell.(Shell.java:363)
  at org.apache.hadoop.util.StringUtils.(StringUtils.java:79)
  at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
  at org.apache.hadoop.security.Groups.(Groups.java:86)
  at org.apache.hadoop.security.Groups.(Groups.java:66)
  at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
  at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
  at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248)
  at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763)
  at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:748)
  at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:621)
  at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2084)
  at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2084)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2084)
  at org.apache.spark.SparkContext.(SparkContext.scala:310)
  at main.scala.SogouResult$.main(SogouResult.scala:16)
  at main.scala.SogouResult.main(SogouResult.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
  16/09/16 12:00:44 INFO SecurityManager: Changing view acls to: danger
  16/09/16 12:00:44 INFO SecurityManager: Changing modify acls to: danger
  16/09/16 12:00:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(danger); users with modify permissions: Set(danger)
  16/09/16 12:00:45 INFO Slf4jLogger: Slf4jLogger started
  16/09/16 12:00:45 INFO Remoting: Starting remoting
  16/09/16 12:00:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.0.2:55944]
  16/09/16 12:00:45 INFO Utils: Successfully started service 'sparkDriver' on port 55944.
  16/09/16 12:00:45 INFO SparkEnv: Registering MapOutputTracker
  16/09/16 12:00:45 INFO SparkEnv: Registering BlockManagerMaster
  16/09/16 12:00:45 INFO DiskBlockManager: Created local directory at C:\Users\danger\AppData\Local\Temp\blockmgr-281e23a9-a059-4670-a1b0-0511e63c55a3
  16/09/16 12:00:45 INFO MemoryStore: MemoryStore started with capacity 481.1 MB
  16/09/16 12:00:45 INFO HttpFileServer: HTTP File server directory is C:\Users\danger\AppData\Local\Temp\spark-84f74e01-9ea2-437c-b532-a5cfec898bc8\httpd-876c9027-ebb3-44c6-8256-bd4a555eaeaf
  16/09/16 12:00:45 INFO HttpServer: Starting HTTP Server
  16/09/16 12:00:46 INFO Utils: Successfully started service 'HTTP file server' on port 55945.
  16/09/16 12:00:46 INFO SparkEnv: Registering OutputCommitCoordinator
  16/09/16 12:00:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
  16/09/16 12:00:46 INFO SparkUI: Started SparkUI at http://192.168.0.2:4040
  16/09/16 12:00:46 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
  16/09/16 12:00:46 INFO Executor: Starting executor ID driver on host localhost
  16/09/16 12:00:46 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 55964.
  16/09/16 12:00:46 INFO NettyBlockTransferService: Server created on 55964
  16/09/16 12:00:46 INFO BlockManagerMaster: Trying to register BlockManager
  16/09/16 12:00:46 INFO BlockManagerMasterEndpoint: Registering block manager localhost:55964 with 481.1 MB RAM, BlockManagerId(driver, localhost, 55964)
  16/09/16 12:00:46 INFO BlockManagerMaster: Registered BlockManager
  16/09/16 12:00:47 INFO MemoryStore: ensureFreeSpace(157320) called with curMem=0, maxMem=504511856
  16/09/16 12:00:47 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 153.6 KB, free 481.0 MB)
  16/09/16 12:00:47 INFO MemoryStore: ensureFreeSpace(14301) called with curMem=157320, maxMem=504511856
  16/09/16 12:00:47 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 14.0 KB, free 481.0 MB)
  16/09/16 12:00:47 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:55964 (size: 14.0 KB, free: 481.1 MB)
  16/09/16 12:00:47 INFO SparkContext: Created broadcast 0 from textFile at SogouResult.scala:18
  16/09/16 12:00:48 WARN : Your hostname, danger-PC resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:ac1b:2301%24, but we couldn't find any external IP address!
  Exception in thread "main" java.net.ConnectException: Call From danger-PC/192.168.0.2 to 192.168.0.3:9000 failed on connection exception: java.net.ConnectException: Connection refused: no further information; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
  at org.apache.hadoop.ipc.Client.call(Client.java:1472)
  at org.apache.hadoop.ipc.Client.call(Client.java:1399)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
  at com.sun.proxy.$Proxy19.getFileInfo(Unknown Source)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
  at com.sun.proxy.$Proxy20.getFileInfo(Unknown Source)
  at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1988)
  at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
  at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
  at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
  at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
  at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1644)
  at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:257)
  at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
  at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:65)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:290)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:290)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
  at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:289)
  at main.scala.SogouResult$.main(SogouResult.scala:19)
  at main.scala.SogouResult.main(SogouResult.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
  Caused by: java.net.ConnectException: Connection refused: no further information
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
  at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
  at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
  at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
  at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
  at org.apache.hadoop.ipc.Client.call(Client.java:1438)
  ... 61 more
  16/09/16 12:00:50 INFO SparkContext: Invoking stop() from shutdown hook
  16/09/16 12:00:50 INFO SparkUI: Stopped Spark web UI at http://192.168.0.2:4040
  16/09/16 12:00:50 INFO DAGScheduler: Stopping DAGScheduler
  16/09/16 12:00:51 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
  16/09/16 12:00:51 INFO MemoryStore: MemoryStore cleared
  16/09/16 12:00:51 INFO BlockManager: BlockManager stopped
  16/09/16 12:00:51 INFO BlockManagerMaster: BlockManagerMaster stopped
  16/09/16 12:00:51 INFO SparkContext: Successfully stopped SparkContext
  16/09/16 12:00:51 INFO ShutdownHookManager: Shutdown hook called
  16/09/16 12:00:51 INFO ShutdownHookManager: Deleting directory C:\Users\danger\AppData\Local\Temp\spark-84f74e01-9ea2-437c-b532-a5cfec898bc8
  

  Process finished with exit code 1
  

  

  After fiddling with this for half a day with no luck, it looked like winutils.exe was simply missing.
  So I grabbed one from the web and dropped it into hadoop/bin.
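  On Windows, Hadoop's Shell class resolves winutils.exe from %HADOOP_HOME%\bin, or from the hadoop.home.dir system property if HADOOP_HOME is not set. A minimal sketch of pointing the driver at it from inside the program, assuming the binaries sit under D:\hadoop (a placeholder path), added at the top of SogouResult.main before the SparkContext is created:

    // Placeholder path: winutils.exe is expected at D:\hadoop\bin\winutils.exe.
    // Must be set before the first Hadoop/Spark call touches the filesystem.
    System.setProperty("hadoop.home.dir", "D:\\hadoop")
    val conf = new SparkConf().setAppName("SogouResult").setMaster("local")
    val sc = new SparkContext(conf)

  Setting the HADOOP_HOME environment variable to the same directory works as well and avoids touching the code.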
  

  Ran it again: the winutils error was gone, but the job still could not connect.
  

  telnet 192.168.0.3 9000 also failed to connect.
  

  So that is where the problem had to be.
  

  In the Hadoop configuration files, the addresses are given as hostnames.
  

  I changed all of the hostnames to IP addresses and ran again.
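  An alternative to rewriting the cluster configuration is to make the hostnames resolvable from the Windows client by adding them to C:\Windows\System32\drivers\etc\hosts; the hostname below is a placeholder for whatever the cluster config actually uses:

    192.168.0.3    hadoop-master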
  "C:\Program Files\Java\jdk1.7.0_79\bin\java" -Didea.launcher.port=7536 -Didea.launcher.bin.path=D:\Java\IntelliJ\bin -Dfile.encoding=UTF-8 -classpath "C:\Program Files\Java\jdk1.7.0_79\jre\lib\charsets.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\deploy.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\ext\access-bridge-64.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\ext\dnsns.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\ext\jaccess.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\ext\localedata.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\ext\sunec.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\ext\sunjce_provider.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\ext\sunmscapi.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\ext\zipfs.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\javaws.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\jce.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\jfr.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\jfxrt.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\jsse.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\management-agent.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\plugin.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\resources.jar;C:\Program Files\Java\jdk1.7.0_79\jre\lib\rt.jar;D:\scalasrc\HdfsTest\target\classes;D:\scalasrc\lib\datanucleus-core-3.2.10.jar;D:\scalasrc\lib\datanucleus-rdbms-3.2.9.jar;D:\scalasrc\lib\spark-1.5.0-yarn-shuffle.jar;D:\scalasrc\lib\datanucleus-api-jdo-3.2.6.jar;D:\scalasrc\lib\spark-assembly-1.5.0-hadoop2.6.0.jar;D:\scalasrc\lib\spark-examples-1.5.0-hadoop2.6.0.jar;D:\Java\scala210\lib\scala-actors-migration.jar;D:\Java\scala210\lib\scala-actors.jar;D:\Java\scala210\lib\scala-library.jar;D:\Java\scala210\lib\scala-reflect.jar;D:\Java\scala210\lib\scala-swing.jar;D:\Java\IntelliJ\lib\idea_rt.jar" com.intellij.rt.execution.application.AppMain main.scala.SogouResult hdfs://192.168.0.3:9000/input/SogouQ1 hdfs://192.168.0.3:9000/output/sogou1
  Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
  SLF4J: Class path contains multiple SLF4J bindings.
  SLF4J: Found binding in [jar:file:/D:/scalasrc/lib/spark-assembly-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: Found binding in [jar:file:/D:/scalasrc/lib/spark-examples-1.5.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
  16/09/16 14:04:45 INFO SparkContext: Running Spark version 1.5.0
  16/09/16 14:04:46 INFO SecurityManager: Changing view acls to: danger
  16/09/16 14:04:46 INFO SecurityManager: Changing modify acls to: danger
  16/09/16 14:04:46 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(danger); users with modify permissions: Set(danger)
  16/09/16 14:04:47 INFO Slf4jLogger: Slf4jLogger started
  16/09/16 14:04:47 INFO Remoting: Starting remoting
  16/09/16 14:04:47 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.0.2:51172]
  16/09/16 14:04:47 INFO Utils: Successfully started service 'sparkDriver' on port 51172.
  16/09/16 14:04:47 INFO SparkEnv: Registering MapOutputTracker
  16/09/16 14:04:47 INFO SparkEnv: Registering BlockManagerMaster
  16/09/16 14:04:47 INFO DiskBlockManager: Created local directory at C:\Users\danger\AppData\Local\Temp\blockmgr-087e9166-2258-4f45-b449-d184c92702a3
  16/09/16 14:04:47 INFO MemoryStore: MemoryStore started with capacity 481.1 MB
  16/09/16 14:04:47 INFO HttpFileServer: HTTP File server directory is C:\Users\danger\AppData\Local\Temp\spark-0d6662f5-0bfa-4e6f-a256-c97bc6ce5f47\httpd-a2355600-9a68-417d-bd52-2ccdcac7bb13
  16/09/16 14:04:47 INFO HttpServer: Starting HTTP Server
  16/09/16 14:04:48 INFO Utils: Successfully started service 'HTTP file server' on port 51173.
  16/09/16 14:04:48 INFO SparkEnv: Registering OutputCommitCoordinator
  16/09/16 14:04:48 INFO Utils: Successfully started service 'SparkUI' on port 4040.
  16/09/16 14:04:48 INFO SparkUI: Started SparkUI at http://192.168.0.2:4040
  16/09/16 14:04:48 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
  16/09/16 14:04:48 INFO Executor: Starting executor ID driver on host localhost
  16/09/16 14:04:48 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51192.
  16/09/16 14:04:48 INFO NettyBlockTransferService: Server created on 51192
  16/09/16 14:04:48 INFO BlockManagerMaster: Trying to register BlockManager
  16/09/16 14:04:48 INFO BlockManagerMasterEndpoint: Registering block manager localhost:51192 with 481.1 MB RAM, BlockManagerId(driver, localhost, 51192)
  16/09/16 14:04:48 INFO BlockManagerMaster: Registered BlockManager
  16/09/16 14:04:49 INFO MemoryStore: ensureFreeSpace(157320) called with curMem=0, maxMem=504511856
  16/09/16 14:04:49 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 153.6 KB, free 481.0 MB)
  16/09/16 14:04:49 INFO MemoryStore: ensureFreeSpace(14301) called with curMem=157320, maxMem=504511856
  16/09/16 14:04:49 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 14.0 KB, free 481.0 MB)
  16/09/16 14:04:49 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:51192 (size: 14.0 KB, free: 481.1 MB)
  16/09/16 14:04:49 INFO SparkContext: Created broadcast 0 from textFile at SogouResult.scala:18
  16/09/16 14:04:50 WARN : Your hostname, danger-PC resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:ac1b:2301%24, but we couldn't find any external IP address!
  16/09/16 14:04:52 INFO FileInputFormat: Total input paths to process : 1
  16/09/16 14:04:52 INFO SparkContext: Starting job: sortByKey at SogouResult.scala:19
  16/09/16 14:04:52 INFO DAGScheduler: Registering RDD 4 (map at SogouResult.scala:19)
  16/09/16 14:04:52 INFO DAGScheduler: Got job 0 (sortByKey at SogouResult.scala:19) with 2 output partitions
  16/09/16 14:04:52 INFO DAGScheduler: Final stage: ResultStage 1(sortByKey at SogouResult.scala:19)
  16/09/16 14:04:52 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
  16/09/16 14:04:52 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
  16/09/16 14:04:52 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[4] at map at SogouResult.scala:19), which has no missing parents
  16/09/16 14:04:52 INFO MemoryStore: ensureFreeSpace(4208) called with curMem=171621, maxMem=504511856
  16/09/16 14:04:52 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.1 KB, free 481.0 MB)
  16/09/16 14:04:52 INFO MemoryStore: ensureFreeSpace(2347) called with curMem=175829, maxMem=504511856
  16/09/16 14:04:52 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB, free 481.0 MB)
  16/09/16 14:04:52 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:51192 (size: 2.3 KB, free: 481.1 MB)
  16/09/16 14:04:52 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:861
  16/09/16 14:04:52 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[4] at map at SogouResult.scala:19)
  16/09/16 14:04:52 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
  16/09/16 14:04:52 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, ANY, 2135 bytes)
  16/09/16 14:04:53 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
  16/09/16 14:04:53 INFO HadoopRDD: Input split: hdfs://192.168.0.3:9000/input/SogouQ1:0+134217728
  16/09/16 14:04:53 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
  16/09/16 14:04:53 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
  16/09/16 14:04:53 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
  16/09/16 14:04:53 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
  16/09/16 14:04:53 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
  16/09/16 14:04:54 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
  java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(IILjava/nio/ByteBuffer;ILjava/nio/ByteBuffer;IILjava/lang/String;JZ)V
  at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(Native Method)
  at org.apache.hadoop.util.NativeCrc32.verifyChunkedSums(NativeCrc32.java:59)
  at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:301)
  at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:216)
  at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:146)
  at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:693)
  at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:749)
  at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:806)
  at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
  at java.io.DataInputStream.read(DataInputStream.java:100)
  at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
  at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
  at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
  at org.apache.hadoop.mapred.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:206)
  at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:244)
  at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
  at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:248)
  at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:216)
  at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
  at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
  at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
  at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:203)
  at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
  at org.apache.spark.scheduler.Task.run(Task.scala:88)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
  16/09/16 14:04:54 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
  java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(IILjava/nio/ByteBuffer;ILjava/nio/ByteBuffer;IILjava/lang/String;JZ)V
  at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(Native Method)
  at org.apache.hadoop.util.NativeCrc32.verifyChunkedSums(NativeCrc32.java:59)
  at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:301)
  at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:216)
  at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:146)
  at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:693)
  at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:749)
  at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:806)
  at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
  at java.io.DataInputStream.read(DataInputStream.java:100)
  at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
  at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
  at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
  at org.apache.hadoop.mapred.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:206)
  at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:244)
  at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
  at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:248)
  at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:216)
  at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
  at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
  at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
  at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:203)
  at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
  at org.apache.spark.scheduler.Task.run(Task.scala:88)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
  16/09/16 14:04:54 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, ANY, 2135 bytes)
  16/09/16 14:04:54 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
  16/09/16 14:04:54 INFO SparkContext: Invoking stop() from shutdown hook
  16/09/16 14:04:54 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(IILjava/nio/ByteBuffer;ILjava/nio/ByteBuffer;IILjava/lang/String;JZ)V
  at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(Native Method)
  at org.apache.hadoop.util.NativeCrc32.verifyChunkedSums(NativeCrc32.java:59)
  at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:301)
  at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:216)
  at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:146)
  at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:693)
  at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:749)
  at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:806)
  at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
  at java.io.DataInputStream.read(DataInputStream.java:100)
  at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
  at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
  at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
  at org.apache.hadoop.mapred.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:206)
  at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:244)
  at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
  at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:248)
  at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:216)
  at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
  at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
  at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
  at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:203)
  at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
  at org.apache.spark.scheduler.Task.run(Task.scala:88)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
  

  16/09/16 14:04:54 INFO HadoopRDD: Input split: hdfs://192.168.0.3:9000/input/SogouQ1:134217728+17788332
  16/09/16 14:04:54 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
  16/09/16 14:04:54 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
  java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(IILjava/nio/ByteBuffer;ILjava/nio/ByteBuffer;IILjava/lang/String;JZ)V
  at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(Native Method)
  at org.apache.hadoop.util.NativeCrc32.verifyChunkedSums(NativeCrc32.java:59)
  at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:301)
  at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:216)
  at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:146)
  at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:693)
  at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:749)
  at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:806)
  at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
  at java.io.DataInputStream.read(DataInputStream.java:100)
  at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
  at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
  at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
  at org.apache.hadoop.mapred.LineRecordReader.(LineRecordReader.java:134)
  at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
  at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:239)
  at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
  at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
  at org.apache.spark.scheduler.Task.run(Task.scala:88)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
  16/09/16 14:04:54 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-1,5,main]
  java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(IILjava/nio/ByteBuffer;ILjava/nio/ByteBuffer;IILjava/lang/String;JZ)V
  at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(Native Method)
  at org.apache.hadoop.util.NativeCrc32.verifyChunkedSums(NativeCrc32.java:59)
  at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:301)
  at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:216)
  at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:146)
  at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:693)
  at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:749)
  at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:806)
  at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
  at java.io.DataInputStream.read(DataInputStream.java:100)
  at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
  at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
  at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
  at org.apache.hadoop.mapred.LineRecordReader.(LineRecordReader.java:134)
  at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
  at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:239)
  at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
  at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
  at org.apache.spark.scheduler.Task.run(Task.scala:88)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
  16/09/16 14:04:54 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on executor localhost: java.lang.UnsatisfiedLinkError (org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(IILjava/nio/ByteBuffer;ILjava/nio/ByteBuffer;IILjava/lang/String;JZ)V) [duplicate 1]
  16/09/16 14:04:54 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
  16/09/16 14:04:54 INFO TaskSchedulerImpl: Cancelling stage 0
  16/09/16 14:04:54 INFO SparkUI: Stopped Spark web UI at http://192.168.0.2:4040
  16/09/16 14:04:54 INFO DAGScheduler: ShuffleMapStage 0 (map at SogouResult.scala:19) failed in 1.350 s
  16/09/16 14:04:54 INFO DAGScheduler: Job 0 failed: sortByKey at SogouResult.scala:19, took 1.693803 s
  Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(IILjava/nio/ByteBuffer;ILjava/nio/ByteBuffer;IILjava/lang/String;JZ)V
  at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(Native Method)
  at org.apache.hadoop.util.NativeCrc32.verifyChunkedSums(NativeCrc32.java:59)
  at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:301)
  at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:216)
  at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:146)
  at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:693)
  at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:749)
  at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:806)
  at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
  at java.io.DataInputStream.read(DataInputStream.java:100)
  at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
  at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
  at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
  at org.apache.hadoop.mapred.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:206)
  at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:244)
  at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
  at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:248)
  at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:216)
  at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
  at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
  at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
  at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:203)
  at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
  at org.apache.spark.scheduler.Task.run(Task.scala:88)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
  

  Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1280)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1268)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1267)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
  at scala.Option.foreach(Option.scala:236)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1493)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1455)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1444)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1813)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1826)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1839)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1910)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:905)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
  at org.apache.spark.rdd.RDD.collect(RDD.scala:904)
  at org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:264)
  at org.apache.spark.RangePartitioner.(Partitioner.scala:126)
  at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:62)
  at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:61)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
  at org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:61)
  at main.scala.SogouResult$.main(SogouResult.scala:19)
  at main.scala.SogouResult.main(SogouResult.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
  Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(IILjava/nio/ByteBuffer;ILjava/nio/ByteBuffer;IILjava/lang/String;JZ)V
  at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(Native Method)
  at org.apache.hadoop.util.NativeCrc32.verifyChunkedSums(NativeCrc32.java:59)
  at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:301)
  at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:216)
  at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:146)
  at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:693)
  at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:749)
  at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:806)
  at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
  at java.io.DataInputStream.read(DataInputStream.java:100)
  at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
  at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
  at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
  at org.apache.hadoop.mapred.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:206)
  at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:244)
  at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
  at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:248)
  at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:216)
  at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
  at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
  at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
  at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:203)
  at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
  at org.apache.spark.scheduler.Task.run(Task.scala:88)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
  16/09/16 14:04:54 INFO DAGScheduler: Stopping DAGScheduler
  16/09/16 14:04:54 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
  16/09/16 14:04:54 INFO MemoryStore: MemoryStore cleared
  16/09/16 14:04:54 INFO BlockManager: BlockManager stopped
  16/09/16 14:04:54 INFO BlockManagerMaster: BlockManagerMaster stopped
  16/09/16 14:04:54 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
  16/09/16 14:04:54 INFO SparkContext: Successfully stopped SparkContext
  16/09/16 14:04:54 INFO ShutdownHookManager: Shutdown hook called
  16/09/16 14:04:54 INFO ShutdownHookManager: Deleting directory C:\Users\danger\AppData\Local\Temp\spark-0d6662f5-0bfa-4e6f-a256-c97bc6ce5f47
  16/09/16 14:04:54 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
  16/09/16 14:04:54 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
  

  Process finished with exit code 50
  

  

  Well, at least this is progress:
  16/09/16 14:04:54 INFO HadoopRDD: Input split: hdfs://192.168.0.3:9000/input/SogouQ1:134217728+17788332
  16/09/16 14:04:54 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
  

  Tracing the error further, the root cause is:
  Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSums(IILjava/nio/ByteBuffer;ILjava/nio/ByteBuffer;IILjava/lang/String;JZ)V
  

  Searching for this problem turned up:
  http://blog.csdn.net/glad_xiao/article/details/48825391
  

  which points to the official issue HADOOP-11064.

From the description it is clear the error comes from a mismatch between the Spark build and the Hadoop version; it usually bites people who downloaded a pre-built Spark binary from the Spark website.
There are therefore two ways to fix it:
1. Download a Spark binary pre-built against the Hadoop version you actually run
2. Build Spark from source against your installed Hadoop version (Spark is, after all, far easier to configure than Hadoop); see the build sketch below
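  For option 2, Spark's "Building Spark" documentation describes a Maven build against a chosen Hadoop version; a sketch for the setup in this post (Spark 1.5 sources, Hadoop 2.6.4), to be double-checked against the docs for your exact Spark release:

    mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.4 -DskipTests clean package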

  Another article:
  http://www.cnblogs.com/marost/p/4372778.html
  mentions that the Hadoop 2.6.4+ Windows binaries are not compatible with the earlier ones, so I went to CSDN and downloaded:
  http://download.csdn.net/detail/ylhlly/9485201
the Hadoop 2.6 native package for 64-bit Windows (hadoop.dll, winutils.exe).
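  Both files belong under %HADOOP_HOME%\bin on the Windows side. To see whether the JVM actually picked up a native Hadoop library, and from where, a small diagnostic sketch using Hadoop's NativeCodeLoader:

    import org.apache.hadoop.util.NativeCodeLoader

    object NativeCheck {
      def main(args: Array[String]): Unit = {
        // True if some hadoop native library (hadoop.dll on Windows) was found and loaded.
        // An old hadoop.dll can still load and then fail later with UnsatisfiedLinkError
        // on newer symbols such as NativeCrc32.nativeComputeChunkedSums.
        println("native hadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded)
        // Directories the JVM searches for native libraries.
        println("java.library.path = " + System.getProperty("java.library.path"))
      }
    }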
  I replaced the old files and ran again.
  Another error:
  

  Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=danger, access=WRITE, inode="/output":dyq:supergroup:drwxr-xr-x
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6545)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6527)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6479)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4290)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4260)
  

  Kept searching the problem and tried opening up the directory permissions:
  

  dyq@ubuntu:/opt/hadoop-2.6.4$ hadoop fs -chmod 777 /input
  dyq@ubuntu:/opt/hadoop-2.6.4$ hadoop fs -chmod 777 /output
  

  That didn't work; kept going.
  

  http://www.cnblogs.com/fang-s/p/3777784.html
  

That article describes the same permission problem, hit when writing to HDFS from a local IDE (Eclipse in its case).
Fix: add the following to hdfs-site.xml:
  

  <property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>
      If "true", enable permission checking in HDFS.
      If "false", permission checking is turned off,
      but all other behavior is unchanged.
      Switching from one parameter value to the other does not change the mode,
      owner or group of files or directories.
    </description>
  </property>
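  Disabling dfs.permissions switches off permission checking for the whole cluster. A narrower workaround, not the route taken here but worth noting as a sketch, is to make the Windows client identify itself as the HDFS directory's owner (dyq in the error above) through the HADOOP_USER_NAME mechanism that non-secure Hadoop 2.x honours:

    // Before the SparkContext is created; the HDFS client will then act as user "dyq".
    // Equivalent to setting the HADOOP_USER_NAME environment variable for the run configuration.
    System.setProperty("HADOOP_USER_NAME", "dyq")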
  

  After changing hdfs-site.xml and restarting HDFS,
  I finally saw:
  Process finished with exit code 0
  http://192.168.0.3:50070/explorer.html#/output
  Browsing the Hadoop file system there, the sogou2 output had duly appeared.
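  To spot-check the result from the cluster side (paths as passed in the run arguments; the part-file names are whatever saveAsTextFile produced):

    hadoop fs -ls /output/sogou1
    hadoop fs -cat /output/sogou1/part-00000 | head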
  

  Ha, a whole day to get this working.
  

Big data clearly does not come cheap.



