
[Experience Sharing] How To Set up Hadoop on OS X Lion 10.7 (repost)

  If you are a software engineer just starting out, chances are good that knowing MapReduce inside and out is as important now as knowing how to configure a LAMP stack was in the last decade. Most developers will therefore want a local instance to learn and experiment on, without having to go down the route of virtualization.
  Although there are a lot of competing MapReduce implementations out there, Apache Hadoop is the leader, with most PaaS vendors such as Amazon and Microsoft supporting it.
  Setting up Apache Hadoop on Mac OS X follows a similar pattern to the official Apache single-node documentation, but there are some bugs and OS X Lion-specific configuration that could trip you up. Here is a quick tutorial (with the gotcha configuration changes you need to make until the bugs are fixed by Apache) to get you started. If you have any updates or suggestions, please drop me a line and I'll update.

Getting Java
  Mac OS X no longer provides Java out of the box, but installing it is fairly easy.

Option 1: From UNIX Command Line
  Just check your Java version on a command line, which will prompt OS X to ask if you’d like to install Java.

$ java -version
  

Option 2: Get it from Apple website
  You can also download it directly from Apple by visiting here: http://support.apple.com/kb/dl1421

Getting Hadoop

Setting up your environment
  Some people like putting Hadoop under ~/Library/Hadoop. That's fine, but I am used to the /usr/local/ convention of the *nix world, so I'll use that for $HADOOP_HOME. Adjust as appropriate.
  Edit your .bash_profile and insert the following:

export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=$(/usr/libexec/java_home)
export PATH=$PATH:$HADOOP_HOME/bin
  Note that I have specified JAVA_HOME to point to a command that dynamically finds the correct Java in your OS X environment. This can be done both in your .bash_profile and in hadoop-env.sh in your configuration. I recommend it so that any changes Apple (or perhaps Oracle, once Apple gets out of the business of providing Java altogether) makes in future updates do not break your Java configuration.
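  To sanity-check the environment (a small sketch; the JDK path that gets printed will vary with whatever Java you have installed), reload your profile and confirm the variables resolve:

$ source ~/.bash_profile
$ /usr/libexec/java_home     # prints the path of the JDK it resolved
$ echo $JAVA_HOME            # should print the same path
$ echo $HADOOP_HOME          # should print /usr/local/hadoop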

Download Hadoop from command line

$ cd /usr/local/
$ mkdir hadoop
$ wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u1.tar.gz
$ tar xzvf hadoop-0.20.2-cdh3u1.tar.gz
$ mv hadoop-0.20.2-cdh3u1 ./hadoop
Configuring Hadoop for OS X (and fixing some bugs)
  Once installed, there are three configuration files you'll want to edit. Learning what these files do in general is left up to the reader, but this will get you up to speed quickly.
  We will set up the following single node configuration:


  • sets the default file system to an HDFS instance
  • sets the path on the local filesystem that the Hadoop daemons will use for persistence to something accessible by you
  • sets HDFS configuration so that HDFS will only try to store one copy of each file
  • sets MapReduce properties to define the number of map and reduce slots that will be available on your box (you can tune these depending on your system resources)

Configuring: hadoop-env.sh
  In your command window, open the environment configuration file. You won't want to change much here, but a few settings will help ensure Hadoop runs right the first time and every time. I recommend making these changes.

vi /usr/local/hadoop/conf/hadoop-env.sh
  Uncomment the JAVA_HOME line and point it at the command that dynamically resolves your Java location, as discussed above:

# The java implementation to use. Required.
export JAVA_HOME=$(/usr/libexec/java_home)
  Next, uncomment HADOOP_HEAPSIZE and set it to 2000. This is optional but recommended:

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=2000
IMPORTANT: Fix Configuration Files To Get Around Lion-Specific Problems
  OS X Lion introduced a bug that many people experience when first initializing their name node storage. It typically appears as this error:
  “Unable to load realm info from SCDynamicStore”
  This error is tracked as Apache bug HADOOP-7489. Readers may want to check whether it has been fixed before applying the workaround below.
  To fix this issue, simply add the following to your hadoop-env.sh file:

export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
  To sum up, your hadoop-env.sh should have the following defined:

export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
export JAVA_HOME=$(/usr/libexec/java_home)
export HADOOP_HEAPSIZE=2000
  With that file ready to go, let’s move on to configuring your hdfs and map reduce XML files.

Configuring: core-site.xml
  A change from previous versions of Apache Hadoop is that instead of putting all the configuration for your Hadoop instance into one XML file (hadoop-site.xml), you now have three configuration files to edit. This separation of concerns is a good decision, but it means some extra work for us. First up is the core-site.xml file.
  As stated, you need to pick a good place for your single-node HDFS storage and set the location of the master HDFS instance. I also chose to dynamically inject the username into the temp directory in order to keep track of which account is writing to the HDFS store. This is good practice if you plan on running a local service account (or a few) to test different scenarios, though it's not necessary. Keep in mind that whichever tmp directory you point to, the service account you are using (or your own account) will need write access to it (see the sketch right after the file below).
  Your file should look something like this:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/tmp/hadoop/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
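  Since that temp path lives under /usr/local, it usually has to exist before the format step and be writable by the account running Hadoop. A minimal sketch, assuming you run Hadoop under your own login rather than a dedicated service account:

$ sudo mkdir -p /usr/local/tmp/hadoop
$ sudo chown -R $(whoami) /usr/local/tmp/hadoop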
Configuring: hdfs-site.xml
  Next, the hdfs-site.xml file configures HDFS itself. Since we are running a single-node cluster on our Mac, we want HDFS to store only one copy of each file:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Configuring: mapred-site.xml
  Next we configure the MapReduce engine itself. We specify the job tracker location (usually just your HDFS port + 1, but any open port will do) and set the maximum number of map and reduce tasks that can be spawned. Tune these to the size and speed of your system; I specified 2 here.

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
Set Up HDFS For The First Time
  We are almost done here, but one final step is to format the HDFS instance we’ve specified. Since we’ve already squashed the nasty SCDynamicStore bug in your hadoop-env.sh file, this should work without issue. This is also a great way to test if the account you are running hadoop as actually has access to all the required directories.

$ $HADOOP_HOME/bin/hadoop namenode -format
  You should see output like the following:

Brandons-MacBook-Air:local bbjwerner$ hadoop namenode -format
Warning: $HADOOP_HOME is deprecated.
11/10/23 00:30:26 INFO namenode.NameNode: STARTUP_MSG:
Re-format filesystem in /usr/local/tmp/hadoop/hadoop-bbjwerner/dfs/name ? (Y or N) Y   <-- NOTE: You have to use a capital "Y" here. Dumb script.
11/10/23 00:30:28 INFO util.GSet: VM type       = 64-bit
11/10/23 00:30:28 INFO util.GSet: 2% max memory = 39.83375 MB
11/10/23 00:30:28 INFO util.GSet: capacity      = 2^22 = 4194304 entries
11/10/23 00:30:28 INFO util.GSet: recommended=4194304, actual=4194304
11/10/23 00:30:28 INFO namenode.FSNamesystem: fsOwner=bbjwerner
11/10/23 00:30:28 INFO namenode.FSNamesystem: supergroup=supergroup
11/10/23 00:30:28 INFO namenode.FSNamesystem: isPermissionEnabled=true
11/10/23 00:30:28 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
11/10/23 00:30:28 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
11/10/23 00:30:28 INFO namenode.NameNode: Caching file names occuring more than 10 times
11/10/23 00:30:29 INFO common.Storage: Image file of size 115 saved in 0 seconds.
11/10/23 00:30:29 INFO common.Storage: Storage directory /usr/local/tmp/hadoop/hadoop-bbjwerner/dfs/name has been successfully formatted.
11/10/23 00:30:29 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Brandons-MacBook-Air.local/10.0.1.31
  With this complete, your setup of Hadoop is ready! Now all we have to do is run a simple test to make sure it all works!

Start Up Hadoop With The Included Scripts
  You used to have to start each part of Hadoop individually (datanode, namenode, jobtracker, tasktracker), but the distribution now includes a script that starts all the services at once.

$ $HADOOP_HOME/bin/start-all.sh
  You will see each service start up. If there are no errors, you are ready to move on to testing out your Hadoop instance!
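  One quick way to confirm everything actually came up (a minimal sketch; jps ships with the JDK, and the exact list can vary) is to list the running Java daemons and ask HDFS for a report:

$ jps                        # should show NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker
$ hadoop dfsadmin -report    # should show one live datanode for this single-node setup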

Run Hadoop with the example JAR files included in the distribution
  To exercise your single node, run a quick job from the command line. To see the example programs that ship with Hadoop, run the following command:

$ hadoop jar $HADOOP_HOME/hadoop-examples-*.jar
  You will see a list of different example programs. The easiest is pi, which you can run like this:

$ hadoop jar $HADOOP_HOME/hadoop-examples-*.jar pi 10 100
  You should see output like the following:

Number of Maps  = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
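  Beyond pi, a natural next test is wordcount against some real files. A minimal sketch, assuming the daemons are running and using input/output paths chosen only for illustration (the output directory must not already exist):

$ hadoop fs -mkdir input
$ hadoop fs -put $HADOOP_HOME/conf/*.xml input
$ hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount input output
$ hadoop fs -cat output/part-* | head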
Congratulations!
  You now have a single node Hadoop on OS X Lion. Happy Hacking!






hadoop, OSX, Apple






Who Am I?


  I am Brandon Werner. I love good friends, good coffee, and good ideas shared around a room. I work for Microsoft helping build the next identity platform in the cloud for Azure.





















Comments









  • Hiroshi@gmail.com

    22 Dec 2011 4:36 PM

      Thank you for the great instruction! A couple of comments for your attention:
      1) $ mv hadoop-0.20.2-cdh3u1 ./hadoop : three steps earlier in your instructions you have already created the hadoop directory, so this command moves hadoop-0.20.2-cdh3u1/ under ./hadoop.
      2) During formatting the namenode and running start-all.sh, I hit many permission errors like: localhost: mkdir: /usr/local/hadoop/bin/../logs: Permission denied
      How do you set up your account on Mac (Lion)? I used "sudo" with the root password during the installation. /usr/local is protected on my Mac.
      Thank you again for your help.
      -Hiroshi











  • Brandon Werner

    27 Dec 2011 7:14 PM

      It may be best just to sudo su and run the entire process as root to ensure anything spawned during the installation also has permission to the directory. Lion has done a lot of confusing things to permissions in the unix directories to "protect" users, so your best bet is to take ownership of the entire /usr/local/ directory recursively and then set 775 on it.
      There is no reason in my mind why /usr/local/ shouldn't be under ownership of the user in a single user machine.
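      A minimal sketch of that suggestion (youruser is a placeholder for your own login; adjust the group if yours differs):

      $ sudo chown -R youruser:staff /usr/local
      $ sudo chmod -R 775 /usr/local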









  • Will L

    30 Dec 2011 9:42 AM

      Hello, have you been able to get the Hadoop Eclipse Plugin to work on OS X Lion? For some reason, in Eclipse 3.7.1 with Hadoop 0.20.205.0 the Eclipse plugin cannot connect to the DFS and gives a "Failed to login" error. What I don't understand is why Hadoop is starting to deprecate the Hadoop Eclipse Plugin. I tried building Hadoop 1.0.0's Eclipse plugin from the source directory, but it doesn't seem to generate any jar files.
      Thank you again for your help!









  • Ryan

    9 Jan 2012 6:01 PM

      Hi Will, it looks like the plugin is missing some jar files and I think the manifest may be off too.

