1:The HBase API cannot do positioned reads of partial byte ranges of stored objects, while the HDFS API can.
2:There are two basic ways of serving image files: storing the image in HBase itself, or storing a path to the image. HBase has successfully been used by a large-scale commercial photo sharing site for storing and retrieving images -- although they have had to carefully tune and monitor their system (see the HBase mailing list for details).
If you store your images on HDFS and only keep a path in HBase, make sure you do not accumulate too many files: HDFS does not handle large numbers of small files well. The exact limit depends on the RAM allocated to your NameNode, but there is always an upper limit.
Unless you plan on storing metadata along with each image, you may be able to get away with a very simple schema for storing either the data or the path to the image. I am imagining something like a single column family with two column qualifiers: data and type. The data column could store either the path or the actual image bytes. The type column would store the image type (png, jpg, tiff, etc.). This would be useful for sending the correct MIME type over the wire when returning the image.
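The type qualifier described above maps directly to an HTTP Content-Type header when serving the image back. A minimal sketch of that mapping (the extension table and class name here are illustrative assumptions, not part of any HBase API):

```java
import java.util.Map;

public class MimeLookup {
    // Hypothetical mapping from the stored "type" qualifier to an HTTP Content-Type.
    private static final Map<String, String> MIME = Map.of(
            "png", "image/png",
            "jpg", "image/jpeg",
            "jpeg", "image/jpeg",
            "tiff", "image/tiff");

    // Returns the Content-Type to send, falling back to a generic binary type
    // when the stored type is unknown.
    public static String contentTypeFor(String type) {
        return MIME.getOrDefault(type.toLowerCase(), "application/octet-stream");
    }

    public static void main(String[] args) {
        System.out.println(contentTypeFor("png"));
    }
}
```

A server handler would read the type qualifier from the row, call this lookup, and set the header before streaming the bytes from the data qualifier.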
3:HDFS is a distributed file system that is well suited for the storage of large files. Its documentation states that it is not, however, a general-purpose file system, and does not provide fast individual record lookups in files. HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. This can sometimes be a point of conceptual confusion. HBase internally puts your data in indexed "StoreFiles" that exist on HDFS for high-speed lookups.
4:echo ruok | nc localhost 2181 — to check ZooKeeper (a healthy server replies imok).
5:At first I ran ZooKeeper separately, and then start-hbase complained that binding to the zkserver port 2181 had failed. So I shut down ZooKeeper (the process listening on 2181 was java, so killall java) and re-ran start-hbase; the bind error was gone. (This error only shows up in the logs; the command line prints nothing.)
Apache HBase by default manages a ZooKeeper "cluster" for you. It will start and stop the ZooKeeper ensemble as part of the HBase start/stop process. You can also manage the ZooKeeper ensemble independent of HBase and just point HBase at the cluster it should use. To toggle HBase management of ZooKeeper, use the HBASE_MANAGES_ZK variable in conf/hbase-env.sh. This variable, which defaults to true, tells HBase whether to start/stop the ZooKeeper ensemble servers as part of HBase start/stop.
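If you choose to run your own ensemble, the toggle described above is a one-line change (a config fragment for conf/hbase-env.sh; the quorum hostnames then come from hbase-site.xml):

```shell
# conf/hbase-env.sh
# Tell HBase not to start/stop ZooKeeper itself; point it at an external ensemble instead.
export HBASE_MANAGES_ZK=false
```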
6:Will not attempt to authenticate using SASL (unknown error)
/etc/hosts should look something like this:
127.0.0.1 localhost
127.0.0.1 ubuntu.ubuntu-domain ubuntu
I had actually seen this tip on the official site, but at the time I had a moment of madness and wanted to try without changing it. My localhost entry was already 127.0.0.1, but the PC name was mapped to 127.0.1.1. That led to the problem above and wasted lots of time. Everything worked once the PC name was mapped to the right IP.
7:Clocks out of sync between machines (HBase) (found on someone else's website; logging it here for future reference)
FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server suc-pc,60020,1363269953286 has been rejected; Reported time is too far out of sync with master. Time difference of 39375ms > max allowed of 30000ms
A minor problem; you can tell at a glance where the error is. HBase tolerates small clock skew, but the 39-second difference above is a bit much. If you are online, you can sync with ntpdate 219.158.14.130 (a China Netcom time server in Beijing); if that does not work, use a different NTP server.
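The rejection in the log above is just a threshold comparison. A minimal sketch of the same check (the 30000 ms default mirrors HBase's hbase.master.maxclockskew setting; the class and method names here are illustrative):

```java
public class ClockSkewCheck {
    // Default maximum allowed skew, mirroring hbase.master.maxclockskew (30 s).
    static final long MAX_SKEW_MS = 30000;

    // True if a region server reporting 'serverTime' would be rejected by a
    // master whose clock reads 'masterTime'.
    static boolean rejected(long masterTime, long serverTime) {
        return Math.abs(masterTime - serverTime) > MAX_SKEW_MS;
    }

    public static void main(String[] args) {
        // The 39375 ms difference from the log above exceeds the 30000 ms limit.
        System.out.println(rejected(0, 39375));
    }
}
```

Running NTP on every node (or raising the skew limit, at your own risk) is the standard fix.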
8:https://github.com/dhardy92/thumbor_hbase
https://github.com/globocom/thumbor/wiki
Thumbor is a smart imaging service. It enables on-demand crop, resizing and flipping of images.
HBase is a column oriented database from the Hadoop ecosystem.
This module provides support for Hadoop HBase as a large, auto-replicated key/value backend storage for images in Thumbor.
9:http://www.quora.com/Apache-Hadoop/Is-HBase-appropriate-for-indexed-blob-storage-in-HDFS
There is a good discussion here. "I am using HBase to store a few things: the meta information on the data that is stored (PDFs, images, movies, etc.) and also the binary location. I write the files directly to HDFS as they are uploaded, either as separate files or into one file if the user indicates so. I use an implicit batch number for the upload. A user can explicitly ask for a new batch ID, use that ID to upload many objects, and in the end call commit(batchId). In this mode I write the objects into one HDFS file."
10:http://apache-hbase.679495.n3.nabble.com/Storing-images-in-Hbase-td4036184.html
There is already a discussion here: Jack has been running HBase image storage in production for two years with almost no errors. Worth a careful read.
"We stored about 1 billion images into HBase with file sizes up to 10MB. It's been running for close to 2 years without issues and serves delivery of images for Yfrog and ImageShack. If you have any questions about the setup, I would be glad to answer them."
"I have a better idea for you: copy your image files into a single file on HDFS, and when a new image comes, append it to that existing file, keeping the metadata and the offset updated in HBase. Putting bigger images directly into HBase will lead to issues."
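The append-and-index idea above can be sketched with plain streams. In a real deployment the output would be an HDFS file and the (offset, length) pairs would go into an HBase row, enabling the positioned reads mentioned in item 1, but the bookkeeping is the same (all names here are illustrative; a byte buffer stands in for the HDFS stream):

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

public class ImageAppender {
    // In production this would be an HDFS output stream; a byte buffer stands in here.
    private final ByteArrayOutputStream blob = new ByteArrayOutputStream();
    // Per-image (offset, length) index; in production these pairs live in HBase.
    private final Map<String, long[]> index = new LinkedHashMap<>();

    // Appends one image to the blob and records where it landed.
    public void append(String imageId, byte[] data) {
        long offset = blob.size();
        blob.write(data, 0, data.length);
        index.put(imageId, new long[] { offset, data.length });
    }

    // Returns {offset, length} for an image, for a later positioned read.
    public long[] locate(String imageId) {
        return index.get(imageId);
    }
}
```

Serving an image then becomes one HBase lookup for the offset pair followed by one positioned HDFS read, which plays to the strength noted in item 1.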
HDFS reads are faster than HBase, but it would require first hitting the index in HBase which points to the file and then fetching the file.
It could be faster... we found storing binary data in a sequence file indexed through HBase to be faster than storing it in HBase itself; however, YMMV, and HBase has been improved since we did that project...
config.set("hbase.zookeeper.quorum", "hadoopinokpc");
config.set("hbase.zookeeper.property.clientPort", "2181");
13:java.net.ConnectException: Connection refused: no further information
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: localhost/127.0.0.1:60000
The master address on the server does not match localhost/127.0.0.1:60000. Check the actual master address at http://192.168.3.206:60010/master-status.
15:ERROR: org.apache.hadoop.hbase.exceptions.MasterNotRunningException: java.io.IOException: Can't get master address from ZooKeeper; znode data == null