Hadoop Pseudo-Distributed Installation (Detailed)
Installation environment: VMware 11
CentOS 6.5
Installation steps:
1. Install the JDK
Copy the downloaded .bin-format JDK into the Hadoop folder under the Linux home directory, and put the Hadoop installation package into the same folder.
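The .bin JDK is a self-extracting installer; a minimal sketch of unpacking it before creating the symlink below (the filename is illustrative, not from the original):
# chmod +x jdk-6u27-linux-i586.bin
# ./jdk-6u27-linux-i586.bin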
Go into the JDK installation directory and create a symbolic link:
# ln -s jdk1.6.0_27 java
# cd
Return to the home directory.
Edit .bashrc to set the environment variables.
Extract the Hadoop package to the /usr/ directory.
Make the environment variables take effect.
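A minimal sketch of the .bashrc additions and of applying them (the paths are assumptions based on the java and hadoop symlinks created in this guide; adjust them to your actual install locations):
export JAVA_HOME=/root/Hadoop/java
export HADOOP_HOME=/usr/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
# source ~/.bashrc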
2. Set up SSH equivalence
Configure passwordless SSH -- otherwise a password must be typed every time the cluster is started.
First check whether you can already ssh to localhost without entering a password:
# ssh localhost
If you cannot ssh to localhost without a password, run the following commands:
# ssh-keygen -t rsa
# cd .ssh/
# ls
id_rsa  id_rsa.pub  known_hosts
# cat id_rsa.pub > authorized_keys
# ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
#
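If ssh localhost still prompts for a password after this, the directory permissions are a common cause; a fix that is often needed (an extra step, not part of the original):
# chmod 700 ~/.ssh
# chmod 600 ~/.ssh/authorized_keys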
3. Install the Hadoop software
Extract the Hadoop package to the /usr/ directory.
Create a symbolic link.
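A sketch of those two steps, assuming the hadoop-1.2.1 tarball used later in this guide is in the current directory:
# tar -zxvf hadoop-1.2.1.tar.gz -C /usr/
# cd /usr/
# ln -s hadoop-1.2.1 hadoop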
Edit the Hadoop configuration files.
Go into the conf directory under the Hadoop installation directory.
Edit three files: core-site.xml, hdfs-site.xml and mapred-site.xml.
1) Edit core-site.xml and add the following between <configuration> and </configuration>:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop/hadoop-${user.name}</value>
</property>
2) Edit hdfs-site.xml and add the following between <configuration> and </configuration>:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
3) Edit mapred-site.xml and add the following between <configuration> and </configuration>:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
Go into the bin directory under the Hadoop installation directory.
Format the distributed file system.
Formatting the NameNode creates the directory structures that hold the HDFS metadata:
#./hadoop namenode -format
Start Hadoop:
# ./start-all.sh
Check whether the daemon processes started:
# jps
3884 NameNode
4180 JobTracker
4111 SecondaryNameNode
4441 Jps
#
An error occurred during startup: the DataNode did not start.
Check the logs.
The log shows a java.net.UnknownHostException:
java.net.UnknownHostException: CentOS-6.5: CentOS-6.5
at java.net.InetAddress.getLocalHost(InetAddress.java:1360)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.getHostname(MetricsSystemImpl.java:481)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configureSystem(MetricsSystemImpl.java:412)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configure(MetricsSystemImpl.java:408)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:152)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:133)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:40)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1650)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)
When Hadoop formats HDFS, the host name it obtains via the hostname command is CentOS-6.5,
but when it tries to resolve that name through the /etc/hosts file, no mapping is found.
Check the /etc/sysconfig/network file: it stores the hostname, i.e. the machine's host name.
See what the hostname is, then edit /etc/hosts and add a mapping for that hostname.
The modified hosts file:
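A minimal sketch of the entry to add to /etc/hosts (assuming, as in the log above, that the hostname command returns CentOS-6.5):
127.0.0.1   localhost   CentOS-6.5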
Restart the network:
# /etc/rc.d/init.d/network restart
Reformat HDFS.
Start the cluster.
This time the startup succeeds.
In the virtual machine, open a browser and go to
http://192.168.141.2:50070
You will see the HDFS node storage information.
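(In Hadoop 1.x the JobTracker also exposes a web UI on port 50030, e.g. http://192.168.141.2:50030.)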
This completes the installation.
A simple test:
# hadoop fs -ls
ls: Cannot access .: No such file or directory.
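(This error only means the HDFS home directory /user/root does not exist yet; it gets created when the input directory is made below.)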
# ls
anaconda-ks.cfg  hbase-0.94.16-security  install.log.syslog  模板  文档  桌面
Hadoop  installer  workspace  视频  下载
Hbase  install.log  公共的  图片  音乐
# mkdir input
# cd input/
# ls
# echo "hello world">test2.txt
# echo "hello hadoop">test1.txt
# ls
test1.txt  test2.txt
# hadoop fs -mkdir input
# hadoop fs -ls
Found 1 items
drwxr-xr-x - root supergroup 0 2015-04-19 17:15 /user/root/input
# hadoop fs -put test1.txt input
# hadoop fs -put test2.txt input
# hadoop fs -ls
Found 1 items
drwxr-xr-x - root supergroup 0 2015-04-19 17:16 /user/root/input
#
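As an extra check (not part of the original run), you can also list the HDFS input directory itself to confirm both files were uploaded:
# hadoop fs -ls input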
Test MapReduce:
# cd /usr/hadoop
# hadoop jar hadoop-examples-1.2.1.jar wordcount input output
15/04/19 17:22:19 INFO input.FileInputFormat: Total input paths to process : 2
15/04/19 17:22:19 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/04/19 17:22:19 WARN snappy.LoadSnappy: Snappy native library not loaded
15/04/19 17:22:20 INFO mapred.JobClient: Running job: job_201504191711_0001
15/04/19 17:22:21 INFO mapred.JobClient:  map 0% reduce 0%
15/04/19 17:22:57 INFO mapred.JobClient:  map 50% reduce 0%
15/04/19 17:22:58 INFO mapred.JobClient:  map 100% reduce 0%
15/04/19 17:23:09 INFO mapred.JobClient:  map 100% reduce 100%
15/04/19 17:23:10 INFO mapred.JobClient: Job complete: job_201504191711_0001
15/04/19 17:23:10 INFO mapred.JobClient: Counters: 29
15/04/19 17:23:10 INFO mapred.JobClient: Job Counters
15/04/19 17:23:10 INFO mapred.JobClient: Launched reduce tasks=1
15/04/19 17:23:10 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=63538
15/04/19 17:23:10 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
15/04/19 17:23:10 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
15/04/19 17:23:10 INFO mapred.JobClient: Launched map tasks=2
15/04/19 17:23:10 INFO mapred.JobClient: Data-local map tasks=2
15/04/19 17:23:10 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=11731
15/04/19 17:23:10 INFO mapred.JobClient: File Output Format Counters
15/04/19 17:23:10 INFO mapred.JobClient: Bytes Written=25
15/04/19 17:23:10 INFO mapred.JobClient: FileSystemCounters
15/04/19 17:23:10 INFO mapred.JobClient: FILE_BYTES_READ=55
15/04/19 17:23:10 INFO mapred.JobClient: HDFS_BYTES_READ=249
15/04/19 17:23:10 INFO mapred.JobClient: FILE_BYTES_WRITTEN=169962
15/04/19 17:23:10 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=25
15/04/19 17:23:10 INFO mapred.JobClient: File Input Format Counters
15/04/19 17:23:10 INFO mapred.JobClient: Bytes Read=25
15/04/19 17:23:10 INFO mapred.JobClient: Map-Reduce Framework
15/04/19 17:23:10 INFO mapred.JobClient: Map output materialized bytes=61
15/04/19 17:23:10 INFO mapred.JobClient: Map input records=2
15/04/19 17:23:10 INFO mapred.JobClient: Reduce shuffle bytes=61
15/04/19 17:23:10 INFO mapred.JobClient: Spilled Records=8
15/04/19 17:23:10 INFO mapred.JobClient: Map output bytes=41
15/04/19 17:23:10 INFO mapred.JobClient: CPU time spent (ms)=48340
15/04/19 17:23:10 INFO mapred.JobClient: Total committed heap usage (bytes)=292167680
15/04/19 17:23:10 INFO mapred.JobClient: Combine input records=4
15/04/19 17:23:10 INFO mapred.JobClient: SPLIT_RAW_BYTES=224
15/04/19 17:23:10 INFO mapred.JobClient: Reduce input records=4
15/04/19 17:23:10 INFO mapred.JobClient: Reduce input groups=3
15/04/19 17:23:10 INFO mapred.JobClient: Combine output records=4
15/04/19 17:23:10 INFO mapred.JobClient: Physical memory (bytes) snapshot=324190208
15/04/19 17:23:10 INFO mapred.JobClient: Reduce output records=3
15/04/19 17:23:10 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1133568000
15/04/19 17:23:10 INFO mapred.JobClient: Map output records=4
#
# hadoop fs -ls
Found 2 items
drwxr-xr-x - root supergroup 0 2015-04-19 17:16 /user/root/input
drwxr-xr-x - root supergroup 0 2015-04-19 17:23 /user/root/output
# hadoop fs -ls output
Found 3 items
-rw-r--r-- 1 root supergroup 0 2015-04-19 17:23 /user/root/output/_SUCCESS
drwxr-xr-x - root supergroup 0 2015-04-19 17:22 /user/root/output/_logs
-rw-r--r-- 1 root supergroup 25 2015-04-19 17:23 /user/root/output/part-r-00000
# hadoop fs -cat output/part-r-00000
hadoop  1
hello   2
world   1
#
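If you want the result on the local filesystem, -getmerge (listed in the fs shell usage further below) pulls the part files into a single local file; a sketch with an illustrative local filename:
# hadoop fs -getmerge output wordcount-result.txt
# cat wordcount-result.txt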
These are the results of the run.
The data itself is stored at the location specified by dfs.data.dir on the DataNodes (slave1 and slave2 in a multi-node setup):
$ vim hdfs-site.xml
<property>
  <name>dfs.data.dir</name>
  <value>/data/hadoop</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
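For example, on a DataNode you could look under that dfs.data.dir for the raw block files; in Hadoop 1.x they live in a current/ subdirectory (a sketch, using the path configured above):
# ls /data/hadoop/current/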
# hadoop fs
Usage: java FsShell
           [-ls <path>]
           [-lsr <path>]
           [-du <path>]
           [-dus <path>]
           [-count[-q] <path>]
           [-mv <src> <dst>]
           [-cp <src> <dst>]
           [-rm [-skipTrash] <path>]
           [-rmr [-skipTrash] <path>]
           [-expunge]
           [-put <localsrc> ... <dst>]
           [-copyFromLocal <localsrc> ... <dst>]
           [-moveFromLocal <localsrc> ... <dst>]
           [-get [-ignoreCrc] [-crc] <src> <localdst>]
           [-getmerge <src> <localdst> [addnl]]
           [-cat <src>]
           [-text <src>]
           [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
           [-moveToLocal [-crc] <src> <localdst>]
           [-mkdir <path>]
           [-setrep [-R] [-w] <rep> <path/file>]
           [-touchz <path>]
           [-test -[ezd] <path>]
           [-stat [format] <path>]
           [-tail [-f] <file>]
           [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
           [-chown [-R] [OWNER][:[GROUP]] PATH...]
           [-chgrp [-R] GROUP PATH...]
           [-help [cmd]]
Generic options supported are
-conf <configuration file>                    specify an application configuration file
-D <property=value>                           use value for given property
-fs <local|namenode:port>                     specify a namenode
-jt <local|jobtracker:port>                   specify a job tracker
-files <comma separated list of files>        specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>       specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>  specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
#
# hadoop dfsadmin
Usage: java DFSAdmin
           [-report]
           [-safemode enter | leave | get | wait]
           [-saveNamespace]
           [-refreshNodes]
           [-finalizeUpgrade]
           [-upgradeProgress status | details | force]
           [-metasave filename]
           [-refreshServiceAcl]
           [-refreshUserToGroupsMappings]
           [-refreshSuperUserGroupsConfiguration]
           [-setQuota <quota> <dirname>...<dirname>]
           [-clrQuota <dirname>...<dirname>]
           [-setSpaceQuota <quota> <dirname>...<dirname>]
           [-clrSpaceQuota <dirname>...<dirname>]
           [-setBalancerBandwidth <bandwidth in bytes per second>]
           [-help [cmd]]
Generic options supported are
-conf <configuration file>                    specify an application configuration file
-D <property=value>                           use value for given property
-fs <local|namenode:port>                     specify a namenode
-jt <local|jobtracker:port>                   specify a job tracker
-files <comma separated list of files>        specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>       specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>  specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
#
# hadoop dfsadmin -report
Configured Capacity: 18536591360 (17.26 GB)
Present Capacity: 11065761792 (10.31 GB)
DFS Remaining: 11065618432 (10.31 GB)
DFS Used: 143360 (140 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
Name: 127.0.0.1:50010
Decommission Status : Normal
Configured Capacity: 18536591360 (17.26 GB)
DFS Used: 143360 (140 KB)
Non DFS Used: 7470829568 (6.96 GB)
DFS Remaining: 11065618432(10.31 GB)
DFS Used%: 0%
DFS Remaining%: 59.7%
Last contact: Sun Apr 19 17:37:09 CST 2015
#
You can get help for an individual command, for example the refreshNodes command:
# hadoop dfsadmin -help refreshNodes
Archive command: merges many small files into one large archive file.
$ hadoop archive -archiveName files.har -p /user/hadoop/input /user/hadoop
The archive name must end with the .har suffix.
$ hadoop fs -cat /user/hadoop/files.har/part-0
hello hadoop
hello world
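To list the original files inside the archive rather than reading the raw part file, the har:// URI scheme can be used; a sketch:
$ hadoop fs -lsr har:///user/hadoop/files.har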
Balance the data load across the DataNodes:
# start-balancer.sh
starting balancer, logging to /usr/hadoop-1.2.1/libexec/../logs/hadoop-root-balancer-CentOS-6.5.out
#