Edit the masters file:
echo spark1 > masters
Edit the slaves file:
spark1
spark2
spark3
Once Hadoop is installed on spark1, use rsync to copy the relevant directories and /etc/profile to the other nodes, as sketched below.
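A minimal sketch of that sync step, assuming Hadoop was unpacked to /usr/local/hadoop and root SSH access to spark2 and spark3 is already set up (adjust paths to your own layout):

# copy the Hadoop installation and the profile to each worker node
for node in spark2 spark3; do
    rsync -av /usr/local/hadoop/ root@${node}:/usr/local/hadoop/
    rsync -av /etc/profile root@${node}:/etc/profile
done

Log in again on each node (or source /etc/profile) so the new variables take effect.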
Format the HDFS file system (first run only):
hadoop namenode -format
Start HDFS:
./sbin/start-dfs.sh
Start YARN:
./sbin/start-yarn.sh
Check the processes on spark1:
root@spark1:/usr/local/spark/conf# jps
1699 NameNode
8856 Jps
2023 SecondaryNameNode
2344 NodeManager
1828 DataNode
2212 ResourceManager
spark2 and spark3 should show similar processes:
root@spark2:/tmp# jps
3238 Jps
1507 DataNode
1645 NodeManager
You can open the web UI to check:
http://192.168.100.25:50070
Test Hadoop:
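The test assumes a plain-text file ~/str.txt already exists on spark1. If it does not, any sizeable text file will do; for example (purely illustrative, the source files are arbitrary):

cat /usr/local/hadoop/README.txt /etc/profile > ~/str.txt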
hadoop fs -mkdir /testin
hadoop fs -put ~/str.txt /testin
cd /usr/local/hadoop
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /testin/str.txt testout
The output is as follows:
17/02/24 11:20:59 INFO client.RMProxy: Connecting to ResourceManager at spark1/192.168.100.25:8032
17/02/24 11:21:01 INFO input.FileInputFormat: Total input paths to process : 1
17/02/24 11:21:01 INFO mapreduce.JobSubmitter: number of splits:1
17/02/24 11:21:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1487839487040_0002
17/02/24 11:21:06 INFO impl.YarnClientImpl: Submitted application application_1487839487040_0002
17/02/24 11:21:06 INFO mapreduce.Job: The url to track the job: http://spark1:8088/proxy/application_1487839487040_0002/
17/02/24 11:21:06 INFO mapreduce.Job: Running job: job_1487839487040_0002
17/02/24 11:21:28 INFO mapreduce.Job: Job job_1487839487040_0002 running in uber mode : false
17/02/24 11:21:28 INFO mapreduce.Job: map 0% reduce 0%
17/02/24 11:22:00 INFO mapreduce.Job: map 100% reduce 0%
17/02/24 11:22:15 INFO mapreduce.Job: map 100% reduce 100%
17/02/24 11:22:17 INFO mapreduce.Job: Job job_1487839487040_0002 completed successfully
17/02/24 11:22:17 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=212115
        FILE: Number of bytes written=661449
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=377966
        HDFS: Number of bytes written=154893
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=23275
        Total time spent by all reduces in occupied slots (ms)=11670
        Total time spent by all map tasks (ms)=23275
        Total time spent by all reduce tasks (ms)=11670
        Total vcore-milliseconds taken by all map tasks=23275
        Total vcore-milliseconds taken by all reduce tasks=11670
        Total megabyte-milliseconds taken by all map tasks=23833600
        Total megabyte-milliseconds taken by all reduce tasks=11950080
    Map-Reduce Framework
        Map input records=1635
        Map output records=63958
        Map output bytes=633105
        Map output materialized bytes=212115
        Input split bytes=98
        Combine input records=63958
        Combine output records=14478
        Reduce input groups=14478
        Reduce shuffle bytes=212115
        Reduce input records=14478
        Reduce output records=14478
        Spilled Records=28956
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=429
        CPU time spent (ms)=10770
        Physical memory (bytes) snapshot=455565312
        Virtual memory (bytes) snapshot=1391718400
        Total committed heap usage (bytes)=277348352
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=377868
    File Output Format Counters
        Bytes Written=154893
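To look at the word-count result itself, list and print the output directory. Note that testout was given as a relative path, so it lands under the running user's HDFS home directory (/user/root here); the reducer output file is normally named part-r-00000:

hadoop fs -ls testout
hadoop fs -cat testout/part-r-00000 | head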
4: Install Spark
tar xvf spark-2.1.0-bin-hadoop2.7.tgz
mv spark-2.1.0-bin-hadoop2.7 /usr/local/spark
Add the environment variables:
cat >> /etc/profile <<'EOF'   # quote EOF so the variables are written literally and expand when the profile is sourced
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
EOF
# Without the LD_LIBRARY_PATH line above, running spark-shell produces the following warning:
NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Edit spark-env.sh:
SPARK_MASTER_HOST=spark1
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
Edit slaves:
spark1
spark2
spark3
Start Spark:
./sbin/start-all.sh
Running jps on spark1 should now show the following, with Master and Worker added:
root@spark1:/usr/local/spark/conf# jps
1699 NameNode
8856 Jps
7774 Master
2023 SecondaryNameNode
7871 Worker
2344 NodeManager
1828 DataNode
2212 ResourceManager
spark2 and spark3 each gain a Worker:
root@spark2:/tmp# jps
3238 Jps
1507 DataNode
1645 NodeManager
3123 Worker
You can open the web UI to check:
http://192.168.100.25:8080/
Run spark-shell:
root@spark1:/usr/local/spark/conf# spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/02/24 11:55:46 WARN SparkContext: Support for Java 7 is deprecated as of Spark 2.0.0
17/02/24 11:56:17 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.100.25:4040
Spark context available as 'sc' (master = local[*], app id = local-1487908553475).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.
scala> :help
You can now open the Spark web UI to check:
http://192.168.100.25:4040/environment/
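Note that the transcript above reports master = local[*], i.e. this spark-shell ran in local mode and did not use the standalone cluster that was just started. To attach the shell to the cluster, pass the master URL explicitly (assuming the master listens on the default port 7077):

spark-shell --master spark://spark1:7077

The shell should then appear under Running Applications on http://192.168.100.25:8080/.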
Test Spark:
run-example org.apache.spark.examples.SparkPi
17/02/28 11:17:20 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 3.491241 s
Pi is roughly 3.1373756868784346
This completes the setup.
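As an optional extra check, the same example can be submitted to the standalone cluster rather than run locally. A sketch, assuming the default master port 7077 and the examples jar shipped with spark-2.1.0-bin-hadoop2.7 (verify the exact jar name under /usr/local/spark/examples/jars/):

spark-submit --class org.apache.spark.examples.SparkPi \
    --master spark://spark1:7077 \
    /usr/local/spark/examples/jars/spark-examples_2.11-2.1.0.jar 100

The finished run then shows up under Completed Applications on the master web UI at http://192.168.100.25:8080/.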