1:The HBase API cannot do positioned reads of partial byte ranges of stored objects, while the HDFS API can.
2:There are two basic ways of serving image files: storing the image in HBase itself, or storing a path to the image. HBase has successfully been used by a large-scale commercial photo sharing site for storing and retrieving images -- although they have had to carefully tune and monitor their system (see the HBase mailing list for details).
If you store your images on HDFS and only keep a path in HBase, make sure you do not accumulate too many files: HDFS does not handle large numbers of small files well. The exact limit depends on the RAM allocated to your NameNode, but there is always an upper limit.
Unless you plan on storing metadata along with each image, you may be able to get away with a very simple schema for storing either the data or the path to the image. I am imagining something like a single column family with two column qualifiers: data and type. The data column could store either the path or the actual image bytes. The type column would store the image type (png, jpg, tiff, etc.). This would be useful for sending the correct MIME type over the wire when returning the image.
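The type qualifier described above maps directly to an HTTP Content-Type header when serving the image back. A minimal sketch of that mapping (the extension table and class name here are illustrative assumptions, not part of any HBase API):

```java
import java.util.Map;

public class MimeLookup {
    // Hypothetical mapping from the stored "type" qualifier to an HTTP Content-Type.
    private static final Map<String, String> MIME = Map.of(
            "png", "image/png",
            "jpg", "image/jpeg",
            "jpeg", "image/jpeg",
            "tiff", "image/tiff");

    // Returns the Content-Type to send, falling back to a generic binary type
    // when the stored type is unknown.
    public static String contentTypeFor(String type) {
        return MIME.getOrDefault(type.toLowerCase(), "application/octet-stream");
    }

    public static void main(String[] args) {
        System.out.println(contentTypeFor("png"));
    }
}
```

A server handler would read the type qualifier from the row, call this lookup, and set the header before streaming the bytes from the data qualifier.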
3:HDFS is a distributed file system that is well suited for the storage of large files. Its documentation states that it is not, however, a general-purpose file system, and does not provide fast individual record lookups in files. HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. This can sometimes be a point of conceptual confusion. HBase internally puts your data in indexed "StoreFiles" that exist on HDFS for high-speed lookups.
4:echo ruok | nc localhost 2181 — to check ZooKeeper (a healthy server replies imok).
5:At first I ran ZooKeeper separately, and then start-hbase complained that binding to the zkserver port 2181 had failed. So I shut down ZooKeeper (the process listening on 2181 was java, so killall java) and re-ran start-hbase; the bind error was gone. (This error only shows up in the logs; the command line prints nothing.)
Apache HBase by default manages a ZooKeeper "cluster" for you. It will start and stop the ZooKeeper ensemble as part of the HBase start/stop process. You can also manage the ZooKeeper ensemble independent of HBase and just point HBase at the cluster it should use. To toggle HBase management of ZooKeeper, use the HBASE_MANAGES_ZK variable in conf/hbase-env.sh. This variable, which defaults to true, tells HBase whether to start/stop the ZooKeeper ensemble servers as part of HBase start/stop.
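If you choose to run your own ensemble, the toggle described above is a one-line change (a config fragment for conf/hbase-env.sh; the quorum hostnames then come from hbase-site.xml):

```shell
# conf/hbase-env.sh
# Tell HBase not to start/stop ZooKeeper itself; point it at an external ensemble instead.
export HBASE_MANAGES_ZK=false
```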
6:Will not attempt to authenticate using SASL (unknown error)
/etc/hosts should look something like this:
127.0.0.1 localhost
127.0.0.1 ubuntu.ubuntu-domain ubuntu
I had actually seen this tip on the official site, but at the time I had a moment of madness and wanted to try without changing it. My localhost entry was already 127.0.0.1, but the PC name was mapped to 127.0.1.1. That led to the problem above and wasted lots of time. Everything worked once the PC name was mapped to the right IP.
7:Clocks out of sync between machines (HBase) (found on someone else's website; logging it here for future reference)
FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server suc-pc,60020,1363269953286 has been rejected; Reported time is too far out of sync with master. Time difference of 39375ms > max allowed of 30000ms
A minor problem; you can tell at a glance where the error is. HBase tolerates small clock skew, but the 39-second difference above is a bit much. If you are online, you can sync with ntpdate 219.158.14.130 (a China Netcom time server in Beijing); if that does not work, use a different NTP server.
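The rejection in the log above is just a threshold comparison. A minimal sketch of the same check (the 30000 ms default mirrors HBase's hbase.master.maxclockskew setting; the class and method names here are illustrative):

```java
public class ClockSkewCheck {
    // Default maximum allowed skew, mirroring hbase.master.maxclockskew (30 s).
    static final long MAX_SKEW_MS = 30000;

    // True if a region server reporting 'serverTime' would be rejected by a
    // master whose clock reads 'masterTime'.
    static boolean rejected(long masterTime, long serverTime) {
        return Math.abs(masterTime - serverTime) > MAX_SKEW_MS;
    }

    public static void main(String[] args) {
        // The 39375 ms difference from the log above exceeds the 30000 ms limit.
        System.out.println(rejected(0, 39375));
    }
}
```

Running NTP on every node (or raising the skew limit, at your own risk) is the standard fix.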
8:https://github.com/dhardy92/thumbor_hbase
https://github.com/globocom/thumbor/wiki
Thumbor is a smart imaging service. It enables on-demand crop, resizing and flipping of images.
HBase is a column oriented database from the Hadoop ecosystem.
This module provides support for Hadoop HBase as a large, auto-replicated key/value backend storage for images in Thumbor.
9:http://www.quora.com/Apache-Hadoop/Is-HBase-appropriate-for-indexed-blob-storage-in-HDFS
There is a good discussion here. "I am using HBase to store a few things: the meta information on the data that is stored (PDFs, images, movies, etc.) and also the binary location. I write the files directly to HDFS as they are uploaded, either as separate files or into one file if the user indicates so. I use an implicit batch number for the upload. A user can explicitly ask for a new batch ID, use that ID to upload many objects, and in the end call commit(batchId). In this mode I write the objects into one HDFS file."
10:http://apache-hbase.679495.n3.nabble.com/Storing-images-in-Hbase-td4036184.html
There is already a discussion here: Jack has been running HBase image storage in production for two years with almost no errors. Worth a careful read.
"We stored about 1 billion images into HBase with file sizes up to 10MB. It's been running for close to 2 years without issues and serves delivery of images for Yfrog and ImageShack. If you have any questions about the setup, I would be glad to answer them."
"I have a better idea for you: copy your image files into a single file on HDFS, and when a new image comes, append it to that existing file, keeping the metadata and the offset updated in HBase. Putting bigger images directly into HBase will lead to issues."
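The append-and-index idea above can be sketched with plain streams. In a real deployment the output would be an HDFS file and the (offset, length) pairs would go into an HBase row, enabling the positioned reads mentioned in item 1, but the bookkeeping is the same (all names here are illustrative; a byte buffer stands in for the HDFS stream):

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

public class ImageAppender {
    // In production this would be an HDFS output stream; a byte buffer stands in here.
    private final ByteArrayOutputStream blob = new ByteArrayOutputStream();
    // Per-image (offset, length) index; in production these pairs live in HBase.
    private final Map<String, long[]> index = new LinkedHashMap<>();

    // Appends one image to the blob and records where it landed.
    public void append(String imageId, byte[] data) {
        long offset = blob.size();
        blob.write(data, 0, data.length);
        index.put(imageId, new long[] { offset, data.length });
    }

    // Returns {offset, length} for an image, for a later positioned read.
    public long[] locate(String imageId) {
        return index.get(imageId);
    }
}
```

Serving an image then becomes one HBase lookup for the offset pair followed by one positioned HDFS read, which plays to the strength noted in item 1.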
HDFS reads are faster than HBase, but it would require first hitting the index in HBase which points to the file and then fetching the file.
It could be faster... we found storing binary data in a sequence file indexed through HBase to be faster than storing it in HBase itself; however, YMMV, and HBase has been improved since we did that project...
config.set("hbase.zookeeper.quorum", "hadoopinokpc");
config.set("hbase.zookeeper.property.clientPort", "2181");
13:java.net.ConnectException: Connection refused: no further information
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: localhost/127.0.0.1:60000
The master address on the server does not match localhost/127.0.0.1:60000. Check the actual master address at http://192.168.3.206:60010/master-status.
15:ERROR: org.apache.hadoop.hbase.exceptions.MasterNotRunningException: java.io.IOException: Can't get master address from ZooKeeper; znode data == null