
[Experience Sharing] Hadoop


Posted on 2016-12-9 07:28:12
Hadoop official website

 

Hadoop - Cloudera

 

Hadoop - Yahoo!

 

Hadoop - Wiki



Doug Cutting - Wiki



Doug Cutting - blog

 

Hadoop includes the following subprojects:


  • HDFS: A distributed file system that provides high-throughput access to application data. Its design is based on Google's paper The Google File System (GFS).
  • MapReduce: A software framework for distributed processing of large data sets on compute clusters. Its design is based on Google's paper MapReduce: Simplified Data Processing on Large Clusters.

 

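《Hadoop权威指南(中文版)》 (Hadoop: The Definitive Guide, Chinese edition)

      I bought this book and have read some of its chapters. The translation is noticeably awkward in places, but it is still very helpful for friends who are new to Hadoop. Judging from the content of the Chinese edition, the English original is of very high quality. I therefore suggest studying and practicing with it alongside the English edition (the e-book is enough; the download link is given below, and the file is also attached) and the official Hadoop documentation. This should be a reasonable compromise, since good Chinese-language books on Hadoop are very rare.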



《Hadoop: The Definitive Guide》

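    Judging from the description of the Chinese edition, this book covers the implementation details of Hadoop's HDFS and MapReduce in great depth. In my opinion it is on a par with 《Java 编程思想》 (Thinking in Java). Download of the English original: Oreilly.Hadoop.The.Definitive.Guide.Jun.2009.rar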



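《云计算的关键技术与应用实例》 (Key Technologies and Application Examples of Cloud Computing)

     I read selected chapters of this book and found that its explanation of cloud computing (both the concepts and the related technologies) has real depth. It is rare to see such difficult material explained in plain, accessible language, and it shows that the author understands cloud computing very well.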

 

  The Google File System
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
  Abstract
  We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.
  

  While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points.
  

  The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients.
  

  In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.
  

  Appeared in:
19th ACM Symposium on Operating Systems Principles,
Lake George, NY, October, 2003.
  

  Download: PDF Version
  

  MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat
  Abstract
  MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.
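
  To make the map/reduce contract in the abstract concrete, here is a minimal single-process sketch in Python. It is not Hadoop's or Google's API: the names map_fn, reduce_fn, and run_mapreduce are hypothetical, the sample documents are made up, and the "runtime" is just a sequential loop that groups intermediate values by key.

```python
from collections import defaultdict


def map_fn(doc_name, text):
    """Map: emit an intermediate (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_mapreduce(inputs, mapper, reducer):
    """Sequential stand-in for the runtime (an assumption of this sketch):
    run the mapper on every input pair, group values by intermediate key,
    then run the reducer once per key."""
    groups = defaultdict(list)
    for key, value in inputs:
        for inter_key, inter_value in mapper(key, value):
            groups[inter_key].append(inter_value)
    return dict(reducer(k, v) for k, v in sorted(groups.items()))


# Made-up input: (document name, document contents) pairs.
docs = [("a.txt", "the quick brown fox"), ("b.txt", "the lazy dog the end")]
word_counts = run_mapreduce(docs, map_fn, reduce_fn)
print(word_counts)
```

  The user only writes map_fn and reduce_fn; everything in run_mapreduce is what a real framework would take over.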
  

  Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
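
  The division of labor described above can be imitated on one machine. The sketch below is only an illustration under several assumptions: a thread pool stands in for the cluster, the input is split into fixed-size chunks, intermediate keys are assigned to reduce partitions with a CRC32 hash, and machine failures are not modeled at all. All names (run_job, map_task, reduce_task, NUM_REDUCERS) are hypothetical.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_REDUCERS = 3  # hypothetical number of reduce partitions


def partition(key):
    """Pick a reduce partition for a key using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % NUM_REDUCERS


def map_task(lines):
    """One map task: emit (word, 1) pairs for its split, bucketed by partition."""
    buckets = [defaultdict(list) for _ in range(NUM_REDUCERS)]
    for line in lines:
        for word in line.split():
            buckets[partition(word)][word].append(1)
    return buckets


def reduce_task(bucket_parts):
    """One reduce task: merge the values of every key in its partition."""
    merged = defaultdict(int)
    for part in bucket_parts:
        for word, ones in part.items():
            merged[word] += sum(ones)
    return dict(merged)


def run_job(lines, split_size=2, workers=4):
    # Partition the input into fixed-size splits, one per map task.
    splits = [lines[i:i + split_size] for i in range(0, len(lines), split_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Schedule the map tasks across the pool (the stand-in for machines).
        map_outputs = list(pool.map(map_task, splits))
        # "Shuffle": reducer r receives bucket r from every map task.
        shuffled = [[out[r] for out in map_outputs] for r in range(NUM_REDUCERS)]
        reduce_outputs = list(pool.map(reduce_task, shuffled))
    result = {}
    for out in reduce_outputs:
        result.update(out)
    return result


# Made-up input lines for illustration.
lines = ["a b a", "c b", "a c c", "d"]
totals = run_job(lines)
print(totals)
```

  Because every occurrence of a word hashes to the same partition, each reduce task sees all the values for its keys, which is why the per-partition results can simply be merged at the end.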
  

  Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
  

  Appeared in:
OSDI'04: Sixth Symposium on Operating System Design and Implementation,
San Francisco, CA, December, 2004.
  

  Download: PDF Version
  Slides: HTML Slides
 
Friends who want to learn about Google's technology may want to visit this page regularly: Google Research technical papers center


  
