cat - >> ~/.zshrc
export HADOOP_HOME_WARN_SUPPRESS=1
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
export HADOOP_HOME="/Users/gsun/dev/hadoop-1.1.1"
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Then press Ctrl-D to end the input.
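To confirm the variables took effect (in a new terminal, or after `source ~/.zshrc`), a quick check helps; the two export lines are repeated here only so the snippet stands alone, and the install path is the one used above:

```shell
# Re-state the variables so this check is self-contained; in practice
# they come from ~/.zshrc as set up above.
export HADOOP_HOME="$HOME/dev/hadoop-1.1.1"
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"

# Verify that the Hadoop bin directory is really on PATH.
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "PATH ok" ;;
  *)                      echo "PATH missing HADOOP_HOME/bin" ;;
esac
```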
Alternatively, you can install Hadoop through Homebrew. Instructions for installing Homebrew (it's really awesome) are here: https://github.com/mxcl/homebrew/wiki/installation
First, view detailed information about the Hadoop formula in the Homebrew repository:
# gsun at MacBookPro in ~/prog/hadoop/hadoop-guide on git:master o [18:26:25]
$ brew info hadoop
hadoop: stable 1.1.2
http://hadoop.apache.org/
Not installed
From: https://github.com/mxcl/homebrew/commits/master/Library/Formula/hadoop.rb
==> Caveats
In Hadoop's config file:
/usr/local/Cellar/hadoop/1.1.2/libexec/conf/hadoop-env.sh
$JAVA_HOME has been set to be the output of:
/usr/libexec/java_home
Then install:
$ brew install hadoop
Hadoop env setup:
As you may already know, we can configure and use Hadoop in three modes. These modes are:
1. Standalone mode
This is the default mode you get when you download and extract Hadoop for the first time. In this mode, Hadoop doesn't use HDFS to store input and output files; it just uses the local filesystem. This mode is very useful for debugging your MapReduce code before you deploy it on a large cluster and handle huge amounts of data. In this mode, Hadoop's configuration file triplet (mapred-site.xml, core-site.xml, hdfs-site.xml) stays free of custom configuration.
2. Pseudo distributed mode (or single node cluster)
In this mode, we edit the configuration triplet so that Hadoop runs as a single-node cluster. The HDFS replication factor is one, because we only use one node as Master Node, Data Node, Job Tracker, and Task Tracker. We can use this mode to test our code against real HDFS without the complexity of a fully distributed cluster. I've already covered the configuration process in my previous post.
3. Fully distributed mode (or multiple node cluster)
In this mode, we use Hadoop at its full scale, on a cluster that can consist of thousands of nodes working together. This is the production phase, where your code and data are distributed across many nodes. You use this mode once your code is ready and works properly in the previous modes.
So how do we switch among the three modes? Here's a trick: keep a separate Hadoop configuration directory (conf/) for each mode. Let's assume you've just extracted your Hadoop distribution and haven't made any changes to the configuration triplet. In the terminal, run these commands:
# gsun at MacBookPro in ~ [18:38:57]
$ cd $HADOOP_HOME
# gsun at MacBookPro in ~/dev/hadoop-1.1.1 [18:39:05]
$ cp -R conf conf.standalone
# gsun at MacBookPro in ~/dev/hadoop-1.1.1 [18:39:32]
$ cp -R conf conf.pseudo
# gsun at MacBookPro in ~/dev/hadoop-1.1.1 [18:40:00]
$ cp -R conf conf.distributed
# gsun at MacBookPro in ~/dev/hadoop-1.1.1 [18:40:22]
$ rm -R conf
Now if you want to switch to pseudo mode, do this:
# gsun at MacBookPro in ~/dev/hadoop-1.1.1 [18:40:53]
$ ln -s conf.pseudo conf
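The remove-and-relink dance can be wrapped in a small helper so switching modes is one command. This is just a sketch: `hadoop_mode` is my own name, not a Hadoop command, and it assumes the conf.standalone / conf.pseudo / conf.distributed directories created above. The demo runs in a scratch directory so nothing real is touched:

```shell
# Switch the active Hadoop config by relinking conf -> conf.<mode>.
hadoop_mode() {
  mode="$1"
  [ -d "conf.$mode" ] || { echo "no such mode: $mode"; return 1; }
  rm -f conf                  # removes the old symlink only, never a real dir
  ln -s "conf.$mode" conf
  echo "now using conf.$mode"
}

# Demo in a throwaway directory with empty stand-in config dirs:
cd "$(mktemp -d)"
mkdir conf.standalone conf.pseudo conf.distributed
hadoop_mode pseudo
```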
To configure Hadoop for pseudo-distributed mode, you have to edit Hadoop's configuration file triplet: mapred-site.xml, core-site.xml, and hdfs-site.xml.
1. mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
<property>
<name>mapred.child.env</name>
<value>JAVA_LIBRARY_PATH=/Users/gsun/dev/hadoop-1.1.1/lib/native</value>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
</configuration>
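All three files share the same `<configuration>`/`<property>`/`<name>`/`<value>` shape, and a missing `<value>` is an easy typo to make when editing by hand. A rough sanity check can catch that before startup; this is only a grep-based sketch, not a real XML parse, and the demo writes a throwaway copy of the file above so it is self-contained:

```shell
# Count <name>/<value> pairs in a Hadoop site file; they should match.
check_pairs() {
  n=$(grep -c '<name>' "$1"); v=$(grep -c '<value>' "$1")
  if [ "$n" -eq "$v" ]; then echo "ok ($n properties)"
  else echo "name/value count mismatch ($n vs $v)"; fi
}

# Demo against a scratch copy of a minimal mapred-site.xml:
f="$(mktemp)"
cat > "$f" <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
EOF
check_pairs "$f"
```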
2. core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/Volumes/MacintoshHD/Users/puff/prog/hadoop/hadoop-data</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
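Two details worth knowing about core-site.xml: with no explicit port, `hdfs://localhost` in fs.default.name falls back to HDFS's default port 8020, and hadoop.tmp.dir is the base directory under which HDFS keeps its block data and the namenode image, so it must exist and be writable before the first start. A quick check (TMP_DIR here is a scratch stand-in; substitute your real hadoop.tmp.dir value):

```shell
# Stand-in for the hadoop.tmp.dir value from core-site.xml.
TMP_DIR="$(mktemp -d)"
if [ -d "$TMP_DIR" ] && [ -w "$TMP_DIR" ]; then
  echo "hadoop.tmp.dir ok"
else
  echo "hadoop.tmp.dir missing or not writable"
fi
```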
3. hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
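One step is still missing before the daemons can be started for the first time: formatting the namenode with `hadoop namenode -format`. Re-running it against an existing filesystem wipes the HDFS metadata, so a guard like the sketch below is a useful habit. Here `dfs/name` is HDFS's default image location under hadoop.tmp.dir; the demo uses a scratch directory and pre-creates the image dir, so only the "already formatted" branch runs:

```shell
# Format HDFS only if the namenode image doesn't exist yet.
TMP_DIR="$(mktemp -d)"        # stand-in for your hadoop.tmp.dir
NAME_DIR="$TMP_DIR/dfs/name"  # HDFS default: ${hadoop.tmp.dir}/dfs/name
mkdir -p "$NAME_DIR"          # demo only: pretend a previous format happened

if [ -d "$NAME_DIR" ]; then
  echo "namenode already formatted - skipping"
else
  hadoop namenode -format     # first run only: initializes the filesystem
fi
```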
Now try hadoop in your terminal:
# gsun at MacBookPro in ~/dev/hadoop-1.1.1 [18:50:24]
$ hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
namenode -format format the DFS filesystem
secondarynamenode run the DFS secondary namenode
namenode run the DFS namenode
datanode run a DFS datanode
dfsadmin run a DFS admin client
mradmin run a Map-Reduce admin client
fsck run a DFS filesystem checking utility
fs run a generic filesystem user client
balancer run a cluster balancing utility
fetchdt fetch a delegation token from the NameNode
jobtracker run the MapReduce job Tracker node
pipes run a Pipes job
tasktracker run a MapReduce task Tracker node
historyserver run job history servers as a standalone daemon
job manipulate MapReduce jobs
queue get information regarding JobQueues
version print the version
jar <jar> run a jar file
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
Hadoop jar and the required libraries
daemonlog get/set the log level for each daemon
or
CLASSNAME run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
# gsun at MacBookPro in ~/dev/hadoop-1.1.1 [18:50:26]
$ hadoop -version
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
Note that this output is Java's, not Hadoop's: the hadoop script treats an unknown first argument as a class name and hands -version straight to the JVM. Use hadoop version (no dash) to print Hadoop's own release.
Start all of the Hadoop components:
# gsun at MacBookPro in ~/dev/hadoop-1.1.1 [18:51:59]
$ start-all.sh
starting namenode, logging to /Users/gsun/dev/hadoop-1.1.1/libexec/../logs/hadoop-gsun-namenode-MacBookPro.local.out
localhost: starting datanode, logging to /Users/gsun/dev/hadoop-1.1.1/libexec/../logs/hadoop-gsun-datanode-MacBookPro.local.out
localhost: 2013-07-02 18:52:05.346 java[2265:1b03] Unable to load realm info from SCDynamicStore
localhost: starting secondarynamenode, logging to /Users/gsun/dev/hadoop-1.1.1/libexec/../logs/hadoop-gsun-secondarynamenode-MacBookPro.local.out
starting jobtracker, logging to /Users/gsun/dev/hadoop-1.1.1/libexec/../logs/hadoop-gsun-jobtracker-MacBookPro.local.out
localhost: starting tasktracker, logging to /Users/gsun/dev/hadoop-1.1.1/libexec/../logs/hadoop-gsun-tasktracker-MacBookPro.local.out
(The "Unable to load realm info from SCDynamicStore" line above is a harmless OS X Kerberos quirk; the java.security.krb5.realm/kdc settings in HADOOP_OPTS at the top of this post are the usual workaround for it.)
Screenshot of Hadoop JobTracker activity (the JobTracker web UI is served at http://localhost:50030):