4. Tachyon test:
4.1 First, format the Tachyon cache layer, then start all nodes and mount the ramdisk:
cd /mnt/tachyon-0.4.1
./bin/tachyon format
./bin/tachyon-stop.sh
./bin/tachyon-start.sh all Mount
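To confirm that the ramdisk was actually mounted on each node, a quick check with standard commands may help (the /mnt/ramdisk path matches TACHYON_RAM_FOLDER in section 8.1):
mount | grep ramdisk
df -h /mnt/ramdisk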
4.2 Wait for the Tachyon master and slaves to become ready; you can also visit http://192.168.1.1:19999 to confirm that all nodes are up.
while ! netstat -ntlp | grep -q 19998
do
    sleep 1
done
jps -l | sort -k 2
4.3 Load the under file system into Tachyon so that Tachyon knows about the directories and files that already exist in Hadoop. If this command is skipped, you may hit a java.lang.IllegalArgumentException: Unknown under file system scheme error.
./bin/tachyon loadufs tachyon://192.168.1.1:19998 hdfs://192.168.1.1:9000 /
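As a sanity check, the Tachyon file system shell can list what loadufs picked up; the exact listing depends on what already exists in HDFS:
./bin/tachyon tfs ls /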
4.4 Test Tachyon:
Basic test:
./bin/tachyon runTest Basic CACHE_THROUGH
Full test suite:
./bin/tachyon runTests
4.5 Test wordcount with the Tachyon layer on top of Hadoop. In the Hadoop install directory, run:
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar \
wordcount -libjars /root/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar \
tachyon://192.168.1.1:19998/in/file /out/file
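The job assumes /in/file already exists in HDFS (and is therefore visible through Tachyon after the loadufs step in 4.3). A sketch of preparing the input and inspecting the output with plain HDFS commands, using a hypothetical local file /root/words.txt:
./bin/hadoop fs -mkdir -p /in
./bin/hadoop fs -put /root/words.txt /in/file
# after the job completes:
./bin/hadoop fs -cat /out/file/part-r-00000 | head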
4.6 Stop Tachyon
./bin/tachyon-stop.sh
5. Spark test:
5.1 Start the Spark cluster
cd /mnt/spark-0.9.1-bin-hadoop2
SPARK_MASTER_IP=192.168.1.1 ./sbin/start-all.sh
5.2 Check the startup status with the following commands, or view the web page at http://192.168.1.1:8080:
jps -l | sort -k 2
echo "please wait..."
while ! netstat -ntlp | grep -q 7077
do
    sleep 1
done
netstat -ntlp | grep 7077
5.3 Start the Python Spark shell (pyspark):
cd /mnt/spark-0.9.1-bin-hadoop2
MASTER=spark://192.168.1.1:7077 ./bin/pyspark
5.4 Pi test script (1000 is the number of samples):
from random import random
def sample(p):
    x, y = random(), random()
    return 1 if x*x + y*y < 1 else 0
count = sc.parallelize(xrange(0, 1000)).map(sample) \
.reduce(lambda a, b: a + b)
print "Pi is roughly %f" % (4.0 * count / 1000) 5.5 wordcount的hadoop版:
7. Hadoop configuration
cd /mnt/hadoop-2.4.0
7.1 Configure the Hadoop runtime environment:
vi etc/hadoop/hadoop-env.sh
export JAVA_HOME=/mnt/jdk1.7.0_55
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/mnt/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar
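To confirm the Tachyon client jar actually ends up on Hadoop's classpath after this change, a quick check (standard hadoop classpath command, nothing Tachyon-specific):
./bin/hadoop classpath | tr ':' '\n' | grep tachyon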
7.2 Configure yarn-site
vi etc/hadoop/yarn-site.xml
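The yarn-site.xml properties are not reproduced in these notes. As an assumption only, a minimal configuration that is commonly sufficient for a small Hadoop 2.4 cluster points the ResourceManager at the master from section 7.6 and enables the shuffle service:
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.1.1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>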
7.3 Configure core-site
First create the /mnt/hadoop/tmp directory, then:
vi etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.1.1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.tachyon.impl</name>
    <value>tachyon.hadoop.TFS</value>
  </property>
</configuration>
7.4 Configure hdfs-site
vi etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address</name>
    <value>192.168.1.1:9000</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/mnt/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/mnt/namenode</value>
  </property>
</configuration>
7.5 Configure mapred-site
vi etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
7.6 Set the master to 192.168.1.1 and list the IP addresses of the 5 nodes in the slaves file.
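For reference, etc/hadoop/slaves simply lists one address per line; the exact five addresses below are an assumption based on sections 7.6 and 9.3 and should be adjusted to the real cluster:
192.168.1.1
192.168.1.2
192.168.1.3
192.168.1.4
192.168.1.5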
8. Tachyon configuration
cd /mnt/tachyon-0.4.1
8.1 Configure the Tachyon environment:
vi conf/tachyon-env.sh
if [[ `uname -a` == Darwin* ]]; then
  # Assuming Mac OS X
  export JAVA_HOME=$(/usr/libexec/java_home)
  export TACHYON_RAM_FOLDER=/Volumes/ramdisk
  export TACHYON_JAVA_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
else
  # Assuming Linux
  if [ -z "$JAVA_HOME" ]; then
    export JAVA_HOME=/mnt/jdk1.7.0_55
  fi
  export TACHYON_RAM_FOLDER=/mnt/ramdisk
fi
export JAVA="$JAVA_HOME/bin/java"
export TACHYON_MASTER_ADDRESS=192.168.1.1
#export TACHYON_UNDERFS_ADDRESS=/mnt/underfs
export TACHYON_UNDERFS_ADDRESS=hdfs://192.168.1.1:9000
export TACHYON_WORKER_MEMORY_SIZE=1GB
export TACHYON_UNDERFS_HDFS_IMPL=org.apache.hadoop.hdfs.DistributedFileSystem
CONF_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
9. Spark configuration
cd /mnt/spark-0.9.1-bin-hadoop2/
9.1 Configure core-site
vi conf/core-site.xml
<configuration>
  <property>
    <name>fs.tachyon.impl</name>
    <value>tachyon.hadoop.TFS</value>
  </property>
</configuration>
9.2 Configure spark-env
vi conf/spark-env.sh
JAVA_HOME=/mnt/jdk1.7.0_55
SPARK_MASTER_IP=192.168.1.1
SPARK_CLASSPATH=/mnt/tachyon-0.4.1/target/tachyon-0.4.1-jar-with-dependencies.jar:$SPARK_CLASSPATH
export SPARK_CLASSPATH
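With fs.tachyon.impl set and the Tachyon jar on SPARK_CLASSPATH, tachyon:// paths should be readable from the Spark shells like any other Hadoop path. A small pyspark check, reusing the /in/file path from section 4.5 (assumes that file exists):
rdd = sc.textFile("tachyon://192.168.1.1:19998/in/file")
print rdd.count()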
9.3 Configure the slaves file with 192.168.1.2 through 192.168.1.5
10. Configure ZooKeeper:
cd /mnt/zookeeper-3.3.6
vi conf/zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/mnt/zookeeper
# the port at which the clients will connect
clientPort=2181
#server.1=192.168.1.1:2888:3888
#server.2=192.168.1.2:2888:3888
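If the server.N lines above are enabled for a replicated ensemble, each node additionally needs a myid file under dataDir whose number matches its server.N entry; in either case the server is started with the bundled script (standard ZooKeeper commands):
echo 1 > /mnt/zookeeper/myid   # on 192.168.1.1; use 2 on 192.168.1.2, and so on
./bin/zkServer.sh start
./bin/zkServer.sh status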
12. References:
1. Digilent ZYBO reference design:
http://www.digilentinc.com/Products/Detail.cfm?NavPath=2,400,1198&Prod=ZYBO
2. Oracle JDK 7 for ARM:
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-arm-downloads-2187468.html
3. What is Hadoop:
http://hadoop.apache.org/
4. What is Spark:
http://spark.apache.org/
5. Spark example code:
http://spark.apache.org/examples.html
6. What is HDFS:
http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
7. What is Tachyon:
http://tachyon-project.org/
8. Tachyon GitHub releases:
https://github.com/amplab/tachyon/releases
9. What is ZooKeeper:
http://zookeeper.apache.org/