
[Experience Sharing] Storing Images on HADOOP ------------ Preparation Work

  1: The HBase API cannot do positioned reads of partial byte ranges of stored objects, while the HDFS API can.
  2: There are two basic ways of serving image files: storing the image in HBase itself, or storing a path to the image. HBase has successfully been used by a large-scale commercial photo sharing site for storing and retrieving images, although they had to carefully tune and monitor their system (see the HBase mailing list for details).
  If you store your images on HDFS and only keep a path in HBase, you will have to ensure you do not end up with too many files, as HDFS does not deal well with large numbers of them (the ceiling depends on the RAM allocated to your namenode, but there is always an upper limit).
  Unless you plan on storing metadata along with each image, you may be able to get away with a very simple schema for storing either the data or the path to the image. I am imagining something like a single column family with two column qualifiers: data and type. The data column could store either the path or the actual image bytes. The type column would store the image type (png, jpg, tiff, etc.), which is useful for sending the correct mime type over the wire when returning the image.
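  A minimal sketch of that schema with the Java client of the time; the table name "images", family "f", row key, and file name are my own placeholders, not from the thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ImagePut {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Assumed table "images": one family "f", qualifiers "data" and "type".
        HTable table = new HTable(conf, "images");
        byte[] imageBytes = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("photo.png"));
        Put put = new Put(Bytes.toBytes("img-0001"));  // row key: image id
        put.add(Bytes.toBytes("f"), Bytes.toBytes("data"), imageBytes);  // or a path instead of bytes
        put.add(Bytes.toBytes("f"), Bytes.toBytes("type"), Bytes.toBytes("image/png"));  // mime type to return
        table.put(put);
        table.close();
    }
}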
  3: HDFS is a distributed file system that is well suited for the storage of large files. Its documentation states that it is not, however, a general purpose file system, and does not provide fast individual record lookups in files. HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. This can sometimes be a point of conceptual confusion. HBase internally puts your data in indexed "StoreFiles" that exist on HDFS for high-speed lookups.
  4: echo ruok | nc localhost 2181    (to check ZooKeeper; a healthy server replies imok)
  5: At first I ran ZooKeeper separately, and then start-hbase complained that binding zk server port 2181 failed. So I shut ZooKeeper down (the process holding 2181 was java, hence killall java) and reran start-hbase with no binding error. (This error only shows up in the logs; the command line prints nothing.)



Apache HBase by default manages a ZooKeeper "cluster" for you. It will start and stop the ZooKeeper ensemble as part of the HBase start/stop process. You can also manage the ZooKeeper ensemble independent of HBase and just point HBase at the cluster it should use. To toggle HBase management of ZooKeeper, use the HBASE_MANAGES_ZK variable in conf/hbase-env.sh. This variable, which defaults to true, tells HBase whether to start/stop the ZooKeeper ensemble servers as part of HBase start/stop.
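For reference, that toggle is a single line in conf/hbase-env.sh:

# conf/hbase-env.sh
export HBASE_MANAGES_ZK=false   # false = use an externally managed ZooKeeper ensemble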
  
  6: Will not attempt to authenticate using SASL (unknown error)



/etc/hosts should look something like this:
127.0.0.1 localhost
127.0.0.1 ubuntu.ubuntu-domain ubuntu
  I had actually seen this tip on the official site, but at the time I stubbornly wanted to try without making the change. My localhost entry was already 127.0.0.1, but the PC hostname was mapped to 127.0.1.1. I then ran into the problem above and wasted lots of time; everything was fine once the hostname's IP was corrected.
  7: Clocks out of sync between machines (HBase). (Taken from someone else's site; logged here for future reference.)



FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server suc-pc,60020,1363269953286 has been rejected; Reported time is too far out of sync with master.  Time difference of 39375ms > max allowed of 30000ms
  A small problem; one look at the log shows where the error is. HBase tolerates a little clock skew, but the 39-second difference above is too large. If the machine is online, you can sync with ntpdate 219.158.14.130 (a China Netcom time server in Beijing); if that one fails, use another NTP server.
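  The 30000 ms ceiling in that log line is HBase's default maximum clock skew. If the clocks cannot be synced, it can be raised on the master in hbase-site.xml (a workaround, not a fix; property name from memory, so verify against your version):

<property>
  <name>hbase.master.maxclockskew</name>
  <value>60000</value>
</property>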
  
  8: https://github.com/dhardy92/thumbor_hbase
  https://github.com/globocom/thumbor/wiki
  Thumbor is a smart imaging service. It enables on-demand crop, resizing and flipping of images.
  HBase is a column oriented database from the Hadoop ecosystem.
  This module provides support for Hadoop HBase as a large, auto-replicating key/value backend store for images in Thumbor.
  9: http://www.quora.com/Apache-Hadoop/Is-HBase-appropriate-for-indexed-blob-storage-in-HDFS
  There is a good discussion here. Quoting one answer: I am using HBase to store a few things: the meta information on the data that is stored (PDFs, images, movies, etc.) and also the binary's location. I write the files directly to HDFS as they are uploaded, either as separate files or into one file if the user so indicates. I use an implicit batch number for the upload; a user can also ask for a new one explicitly, then use that ID to upload many objects and at the end call commit(batchId). In this mode I write the objects into one HDFS file.
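  A rough sketch of that write path in the era's Java APIs; the batch file layout, the "meta" table, and the column names are my assumptions, not the poster's (and note FileSystem.append requires an HDFS build with append support enabled):

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Append each uploaded object to one HDFS file per batch and record its
// location in an HBase "meta" table keyed by object id (hypothetical schema).
void store(FileSystem fs, HTable meta, String batchId, String objId, byte[] bytes) throws java.io.IOException {
    Path batchFile = new Path("/uploads/" + batchId + ".dat");
    FSDataOutputStream out = fs.exists(batchFile) ? fs.append(batchFile) : fs.create(batchFile);
    long offset = out.getPos();   // where this object starts in the batch file
    out.write(bytes);
    out.close();
    Put put = new Put(Bytes.toBytes(objId));
    put.add(Bytes.toBytes("f"), Bytes.toBytes("file"), Bytes.toBytes(batchFile.toString()));
    put.add(Bytes.toBytes("f"), Bytes.toBytes("offset"), Bytes.toBytes(offset));
    put.add(Bytes.toBytes("f"), Bytes.toBytes("length"), Bytes.toBytes((long) bytes.length));
    meta.put(put);
}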
  10: http://apache-hbase.679495.n3.nabble.com/Storing-images-in-Hbase-td4036184.html
  There is an existing discussion here: Jack has been running HBase as image storage for two years with almost no failures. Worth a careful read.
  



We stored about 1 billion images in HBase with file sizes up to 10MB.
It's been running for close to 2 years without issues and serves
delivery of images for Yfrog and ImageShack. If you have any
questions about the setup, I would be glad to answer them.
  



I have a better idea for you: copy your image files into a single file on
HDFS, and when a new image arrives, append it to that existing file, keeping
the metadata and the offset updated in HBase, because putting large images
into HBase itself will lead to issues.
HDFS reads are faster than HBase, but this requires first hitting the index in HBase, which points to the file, and then fetching the file.
It could be faster... we found storing binary data in a sequence file indexed by HBase to be faster than storing it in HBase itself; however, YMMV, and HBase has been improved since we did that project...
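  This is also where point 1 above pays off: the HDFS API supports positioned reads, so serving one image is an HBase lookup followed by a seek, not a scan of the whole file. A minimal read-side sketch matching the hypothetical schema from the sketch above:

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Look up file/offset/length in HBase, then do a positioned read on HDFS.
byte[] fetch(FileSystem fs, HTable meta, String objId) throws java.io.IOException {
    Result r = meta.get(new Get(Bytes.toBytes(objId)));
    Path file = new Path(Bytes.toString(r.getValue(Bytes.toBytes("f"), Bytes.toBytes("file"))));
    long offset = Bytes.toLong(r.getValue(Bytes.toBytes("f"), Bytes.toBytes("offset")));
    int length = (int) Bytes.toLong(r.getValue(Bytes.toBytes("f"), Bytes.toBytes("length")));
    byte[] buf = new byte[length];
    FSDataInputStream in = fs.open(file);
    in.readFully(offset, buf);   // partial byte-range read, no full-file scan
    in.close();
    return buf;
}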
  
  

11: Developing Hadoop programs in Java felt cumbersome because you have to package a jar, so I used the Eclipse plugin tool FatJar.
  Running hadoop -jar test.jar hdfs://localhost/user/root/hello.txt failed: the program kept trying to connect to localhost/127.0.0.1:8020 ("Already tried 1 time(s)"), but my core-site.xml configures the fs port as 9000. Changing the command to hadoop -jar test.jar hdfs://localhost:9000/user/root/hello.txt made it run successfully.
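  The lesson: without an explicit port the HDFS client falls back to the default NameNode port 8020. A rough reconstruction of what such a test program does (my sketch, not the original code):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadHello {
    public static void main(String[] args) throws Exception {
        String uri = args[0];   // e.g. hdfs://localhost:9000/user/root/hello.txt
        // Without ":9000" in the URI, the client would try the default port 8020.
        FileSystem fs = FileSystem.get(URI.create(uri), new Configuration());
        FSDataInputStream in = fs.open(new Path(uri));
        IOUtils.copyBytes(in, System.out, 4096, true);   // copy to stdout, then close
    }
}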

12: How to develop Hadoop programs locally while Hadoop and HBase run on a server.
  To run a Hadoop program directly on the local machine and have it operate on the server's HDFS and HBase:
  1: Install Hadoop and HBase locally. "Install" really just means downloading the Hadoop and HBase distributions and unpacking them.
  When developing the Hadoop and HBase programs in Eclipse, import the jars from the Hadoop and HBase lib directories.
  2: Edit the configuration files in the local Hadoop and HBase to point at the server, for example:
  core-site.xml
  Here hadoopinokpc:9000 matches the server's own core-site.xml; I mapped hadoopinokpc to the server's IP in /etc/hosts.
  The unpacked Hadoop really just feels like a toolkit: when the local program runs, it reads this configuration file to connect to HDFS.





<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoopinokpc:9000</value>
</property>



  mapred-site.xml





<property>
  <name>mapred.job.tracker</name>
  <value>hadoopinokpc:9001</value>
</property>



  hdfs-site.xml





<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>



  hbase-site.xml
  

Alternatively, you can override the configuration directly in code, like below, and then it does not matter what the local config files contain.



        config.set("hbase.zookeeper.quorum", "hadoopinokpc");
config.set("hbase.zookeeper.property.clientPort","2181");
13: java.net.ConnectException: Connection refused: no further information
  org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: localhost/127.0.0.1:60000
The master address on the server does not match localhost/127.0.0.1:60000.
Check the actual master address at http://192.168.3.206:60010/master-status.

14: How to set the master and regionserver IP addresses.

The master and regionserver addresses must be resolvable via DNS; the relevant hbase-site.xml settings:



  
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://hadoopinokpc.inoknok.com:9000/hbase</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>hdfs://hadoopinokpc.inoknok.com:9000/zookeeper</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>192.168.0.29</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
<property>
  <name>hbase.master.port</name>
  <value>60000</value>
</property>
<property>
  <name>hbase.regionserver.port</name>
  <value>60020</value>
</property>
<property>
  <name>hbase.master.dns.interface</name>
  <value>eth0</value>
</property>
<property>
  <name>hbase.regionserver.dns.interface</name>
  <value>eth0</value>
</property>
<property>
  <name>hbase.master.dns.nameserver</name>
  <value>192.168.0.254</value>
</property>
<property>
  <name>hbase.regionserver.dns.nameserver</name>
  <value>192.168.0.254</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.config.read.zookeeper.config</name>
  <value>false</value>
</property>

  

15: ERROR: org.apache.hadoop.hbase.exceptions.MasterNotRunningException: java.io.IOException: Can't get master address from ZooKeeper; znode data == null


  
