* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
* A JDK is required: install and configure the JDK and Maven 3.0, and add both to the PATH variable
* Install Protocol Buffers (the protoc compiler)
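Before starting the build, it can help to confirm that the prerequisites above are actually reachable from the shell. The following is a minimal sketch, assuming the tools are invoked as java, mvn and protoc; the helper name check_tool is made up for this example.

```shell
#!/bin/sh
# check_tool is a hypothetical helper: it reports whether a command
# can be found on PATH and returns 0 (found) or 1 (missing).
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
        return 0
    else
        echo "missing: $1"
        return 1
    fi
}

# Count how many of the required build tools are missing.
missing=0
for tool in java mvn protoc; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing required tool(s) missing"
```

If any tool is reported as missing, fix the PATH variable before running the Maven commands.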
Build with the following commands:
mvn clean install -DskipTests
cd hadoop-mapreduce-project
mvn clean install assembly:assembly -Pnative
----------------------------
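[Alternatively, download a prebuilt Hadoop release, skip the steps above, and start directly from the environment variable configuration.]
Download URL: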
http://mirror.bjtu.edu.cn/apache/hadoop/common/hadoop-0.23.0/hadoop-0.23.0.tar.gz
After downloading, extract the archive: tar -zxvf hadoop-0.23.0.tar.gz
----------------------------
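Set the following environment variables (using export):
$HADOOP_COMMON_HOME (points to the common directory)
$HADOOP_MAPRED_HOME (points to the MapReduce directory)
$YARN_HOME (same as $HADOOP_MAPRED_HOME)
$HADOOP_HDFS_HOME (points to the HDFS directory)
$JAVA_HOME
$HADOOP_CONF_DIR (points to the conf directory)
$YARN_CONF_DIR (same as $HADOOP_CONF_DIR)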
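As an illustration, the variables could be exported as in the sketch below. It assumes the prebuilt tarball was extracted to a single directory that holds all components and that the configuration files sit in a conf subdirectory; HADOOP_DIR is a hypothetical helper variable, and every path must be adapted to your own layout.

```shell
#!/bin/sh
# HADOOP_DIR is a hypothetical helper variable: point it at the
# directory where hadoop-0.23.0.tar.gz was extracted.
HADOOP_DIR="${HADOOP_DIR:-${HOME:-/opt}/hadoop-0.23.0}"

# Assumption: with the prebuilt tarball, all components share one tree.
export HADOOP_COMMON_HOME="$HADOOP_DIR"
export HADOOP_HDFS_HOME="$HADOOP_DIR"
export HADOOP_MAPRED_HOME="$HADOOP_DIR"
export YARN_HOME="$HADOOP_MAPRED_HOME"    # same as HADOOP_MAPRED_HOME

# Assumption: configuration files live in a conf/ subdirectory.
export HADOOP_CONF_DIR="$HADOOP_DIR/conf"
export YARN_CONF_DIR="$HADOOP_CONF_DIR"   # same as HADOOP_CONF_DIR

# JAVA_HOME is not guessed here; only warn if it has not been set.
[ -n "${JAVA_HOME:-}" ] || echo "JAVA_HOME is not set; point it at your JDK" >&2
```

Putting these lines in a shell profile keeps the settings across sessions. When building from source instead, the component variables would point at the separate build output directories.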
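Configure/write mapred-site.xml
<property>
  <name>mapreduce.cluster.temp.dir</name>
  <value></value>
  <description>No description</description>
  <final>true</final>
</property>
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value></value>
  <description>No description</description>
  <final>true</final>
</property>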
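Configure/write yarn-site.xml
[Replace host with the output of the hostname command on your machine. port is a port number of your own choosing; the same port must not be used twice.]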
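<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>host:port</value>
  <description>host is the hostname of the ResourceManager and port is the port
  on which the NodeManagers contact the ResourceManager.</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>host:port</value>
  <description>host is the hostname of the ResourceManager and port is the port
  on which the applications in the cluster talk to the ResourceManager.</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  <description>Set this if you do not want to use the default scheduler.</description>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>host:port</value>
  <description>host is the hostname of the ResourceManager and port is the port
  on which clients can talk to the ResourceManager.</description>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value></value>
  <description>The local directories used by the NodeManager.</description>
</property>
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:port</value>
  <description>The NodeManagers bind to this port.</description>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>10240</value>
  <description>The amount of memory available on the NodeManager, in MB.</description>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/app-logs</value>
  <description>Directory on HDFS to which the application logs are moved.</description>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value></value>
  <description>The directories used by the NodeManagers as log directories.</description>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
  <description>Shuffle service that needs to be set for MapReduce to run.</description>
</property>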
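To fill in the host:port placeholders, the hostname can be read from the machine and combined with the chosen ports. The sketch below only prints the resulting values; the port numbers are arbitrary examples chosen for illustration and can be replaced by any free, distinct ports.

```shell
#!/bin/sh
# Use this machine's hostname, falling back to localhost if unavailable.
RM_HOST="$(hostname 2>/dev/null || echo localhost)"
[ -n "$RM_HOST" ] || RM_HOST=localhost

# Hypothetical port choices: any free ports work as long as they differ.
TRACKER_PORT=8025
SCHEDULER_PORT=8030
RM_PORT=8040
NM_PORT=8041

# Detect ports that were accidentally assigned twice.
dupes="$(printf '%s\n' "$TRACKER_PORT" "$SCHEDULER_PORT" "$RM_PORT" "$NM_PORT" | sort | uniq -d)"
if [ -n "$dupes" ]; then
    echo "duplicate port(s): $dupes" >&2
fi

# Print the values to paste into yarn-site.xml.
echo "yarn.resourcemanager.resource-tracker.address = $RM_HOST:$TRACKER_PORT"
echo "yarn.resourcemanager.scheduler.address = $RM_HOST:$SCHEDULER_PORT"
echo "yarn.resourcemanager.address = $RM_HOST:$RM_PORT"
echo "yarn.nodemanager.address = 0.0.0.0:$NM_PORT"
```

The printed values go into the matching value fields of yarn-site.xml.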