[Experience Sharing] Install and Configure Hadoop 1.1.1 on OS X

  As homework for the Hadoop workshop, I'm keeping this here as a note.
  Hadoop install steps:

$ sudo cp ~/Downloads/hadoop-1.1.1.tar.gz ~/dev/hadoop-1.1.1.tar.gz
$ cd ~/dev    # extract where the archive was copied
$ sudo tar -xvzf hadoop-1.1.1.tar.gz
  Env variable setup:

cat - >> ~/.zshrc
export HADOOP_HOME_WARN_SUPPRESS=1
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
export HADOOP_HOME="/Users/gsun/dev/hadoop-1.1.1"
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
  Press Ctrl-D to exit.
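  The new variables only take effect in new shells. To apply them to the current session and sanity-check the setup (a quick verification, not part of the original steps):

$ source ~/.zshrc
$ echo $HADOOP_HOME    # should print /Users/gsun/dev/hadoop-1.1.1
$ hadoop version       # should report Hadoop 1.1.1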
  Alternatively, you can install Hadoop through Homebrew. Instructions for installing Homebrew (it's really awesome) are here: https://github.com/mxcl/homebrew/wiki/installation
  First, view the details of the Hadoop version in the Homebrew repository:

# gsun at MacBookPro in ~/prog/hadoop/hadoop-guide on git:master o [18:26:25]
$ brew info hadoop
hadoop: stable 1.1.2
http://hadoop.apache.org/
Not installed
From: https://github.com/mxcl/homebrew/commits/master/Library/Formula/hadoop.rb
==> Caveats
In Hadoop's config file:
/usr/local/Cellar/hadoop/1.1.2/libexec/conf/hadoop-env.sh
$JAVA_HOME has been set to be the output of:
/usr/libexec/java_home
  Then install:

brew install hadoop
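  Note that Homebrew puts the distribution under the Cellar (per the brew info output above), so if you go this route, point HADOOP_HOME there instead. This is an adjustment to the earlier ~/.zshrc snippet, not something brew does for you:

export HADOOP_HOME="/usr/local/Cellar/hadoop/1.1.2/libexec"
export PATH=$PATH:$HADOOP_HOME/bin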
  Hadoop env setup:
  As you may already know, we can configure and use Hadoop in three modes. These modes are:
  1. Standalone mode
  This is the default mode you get when you download and extract Hadoop for the first time. In this mode, Hadoop doesn't use HDFS to store input and output files; it just uses the local filesystem. This mode is very useful for debugging your MapReduce code before you deploy it on a large cluster to handle huge amounts of data. In this mode, Hadoop's configuration file triplet (mapred-site.xml, core-site.xml, hdfs-site.xml) remains free of custom configuration. (See the example run after this list.)
  2. Pseudo distributed mode (or single node cluster)
  In this mode, we configure the triplet to run on a single node. The HDFS replication factor is one, because a single node acts as NameNode, DataNode, JobTracker, and TaskTracker. We can use this mode to test our code against a real HDFS without the complexity of a fully distributed cluster. I've already covered the configuration process in a previous post.
  3. Fully distributed mode (or multiple node cluster)
  In this mode, we use Hadoop at full scale: a cluster may consist of thousands of nodes working together. This is the production phase, where your code and data are used and distributed across many nodes. You use this mode when your code is ready and works properly in the previous modes.
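  As a quick taste of standalone mode, you can run one of the bundled examples directly against the local filesystem. This sketch assumes the examples jar that ships at the root of the 1.1.1 tarball (hadoop-examples-1.1.1.jar) and uses the stock grep job from the Hadoop quickstart:

$ cd $HADOOP_HOME
$ mkdir input
$ cp conf/*.xml input
$ hadoop jar hadoop-examples-1.1.1.jar grep input output 'dfs[a-z.]+'
$ cat output/*    # matches are written to the local output/ directory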
  So how do we switch among the three modes? Here's the trick: we keep a separate Hadoop configuration directory (conf/) for each mode. Let's assume you just extracted your Hadoop distribution and haven't made any changes to the configuration triplet. In the terminal, run these commands:

# gsun at MacBookPro in ~ [18:38:57]
$ cd $HADOOP_HOME
# gsun at MacBookPro in ~/dev/hadoop-1.1.1 [18:39:05]
$ cp -R conf conf.standalone
# gsun at MacBookPro in ~/dev/hadoop-1.1.1 [18:39:32]
$ cp -R conf conf.pseudo
# gsun at MacBookPro in ~/dev/hadoop-1.1.1 [18:40:00]
$ cp -R conf conf.distributed
# gsun at MacBookPro in ~/dev/hadoop-1.1.1 [18:40:22]
$ rm -R conf

  Now if you want to switch to pseudo mode, do this:

# gsun at MacBookPro in ~/dev/hadoop-1.1.1 [18:40:53]
$ ln -s conf.pseudo conf
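  Switching back to standalone mode later is just a matter of repointing the symlink, following the same pattern:

$ rm conf
$ ln -s conf.standalone conf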
  To configure Hadoop in pseudo-distributed mode, you have to edit Hadoop's configuration file triplet: mapred-site.xml, core-site.xml, and hdfs-site.xml.
  1. mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
  <property>
    <name>mapred.child.env</name>
    <value>JAVA_LIBRARY_PATH=/Users/gsun/dev/hadoop-1.1.1/lib/native</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>
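  One caveat: com.hadoop.compression.lzo.LzoCodec comes from the separate hadoop-lzo library and is not bundled with Hadoop itself, so unless you have LZO set up, it is probably safer to leave the mapred.map.output.compression.codec property out; otherwise map tasks can fail with a ClassNotFoundException for that codec.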
  2. core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Volumes/MacintoshHD/Users/puff/prog/hadoop/hadoop-data</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
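  Whatever path you point hadoop.tmp.dir at needs to be writable by the user running the daemons; it doesn't hurt to create it up front (adjust the path to your own setup):

$ mkdir -p /Volumes/MacintoshHD/Users/puff/prog/hadoop/hadoop-data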

  3. hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
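  One more prerequisite worth flagging before starting anything: start-all.sh launches the daemons over ssh, even on a single machine, so pseudo-distributed mode needs passwordless SSH to localhost. On OS X that means enabling Remote Login (System Preferences > Sharing) and, if you don't already have a key, the usual setup:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost    # should log in without prompting for a password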

  Now try hadoop in your terminal:

# gsun at MacBookPro in ~/dev/hadoop-1.1.1 [18:50:24]
$ hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
namenode -format     format the DFS filesystem
secondarynamenode    run the DFS secondary namenode
namenode             run the DFS namenode
datanode             run a DFS datanode
dfsadmin             run a DFS admin client
mradmin              run a Map-Reduce admin client
fsck                 run a DFS filesystem checking utility
fs                   run a generic filesystem user client
balancer             run a cluster balancing utility
fetchdt              fetch a delegation token from the NameNode
jobtracker           run the MapReduce job Tracker node
pipes                run a Pipes job
tasktracker          run a MapReduce task Tracker node
historyserver        run job history servers as a standalone daemon
job                  manipulate MapReduce jobs
queue                get information regarding JobQueues
version              print the version
jar <jar>            run a jar file
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath            prints the class path needed to get the
Hadoop jar and the required libraries
daemonlog            get/set the log level for each daemon
or
CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
# gsun at MacBookPro in ~/dev/hadoop-1.1.1 [18:50:26]
$ hadoop -version
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)
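  (Note that hadoop -version above falls through to the JVM, which is why it prints the Java version; the Hadoop release itself is reported by hadoop version, without the dash.)
  Before starting the daemons for the first time, format HDFS. This is a one-time step, using the namenode -format command from the usage listing above:

$ hadoop namenode -format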

  Start all of the Hadoop components:

# gsun at MacBookPro in ~/dev/hadoop-1.1.1 [18:51:59]
$ start-all.sh
starting namenode, logging to /Users/gsun/dev/hadoop-1.1.1/libexec/../logs/hadoop-gsun-namenode-MacBookPro.local.out
localhost: starting datanode, logging to /Users/gsun/dev/hadoop-1.1.1/libexec/../logs/hadoop-gsun-datanode-MacBookPro.local.out
localhost: 2013-07-02 18:52:05.346 java[2265:1b03] Unable to load realm info from SCDynamicStore
localhost: starting secondarynamenode, logging to /Users/gsun/dev/hadoop-1.1.1/libexec/../logs/hadoop-gsun-secondarynamenode-MacBookPro.local.out
starting jobtracker, logging to /Users/gsun/dev/hadoop-1.1.1/libexec/../logs/hadoop-gsun-jobtracker-MacBookPro.local.out
localhost: starting tasktracker, logging to /Users/gsun/dev/hadoop-1.1.1/libexec/../logs/hadoop-gsun-tasktracker-MacBookPro.local.out
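  To verify that everything came up, jps from the JDK should list all five daemons started above:

$ jps    # expect NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker (plus Jps itself)

  The web UIs are another quick check: in Hadoop 1.x the NameNode UI listens on http://localhost:50070 and the JobTracker UI on http://localhost:50030.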
  Screenshot of Hadoop JobTracker activity: