Hue
Hue is an open-source Apache Hadoop UI that evolved from Cloudera Desktop; Cloudera later contributed it to the Hadoop community under the Apache foundation. It is built on Django, a Python web framework. With Hue, you can interact with a Hadoop cluster from a browser-based web console to analyze and process data. For its own data, including user authentication and authorization, Hue uses a SQLite database by default, but it can be reconfigured to use MySQL, PostgreSQL, or Oracle instead.
Contents:
[*]Features (live demo: http://gethue.com/)
[*]Installation and deployment
[*]Installing CDH on Azure
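The database swap mentioned in the introduction is done in the [[database]] section of hue.ini. A minimal sketch for MySQL follows; the host, credentials, and database name are placeholders, not values from this post, and Hue's database migrations still have to be run against the new database afterwards:

```ini
[desktop]
  [[database]]
    # Switch Hue's internal store from the default SQLite to MySQL.
    # All values below are illustrative placeholders.
    engine=mysql
    host=localhost
    port=3306
    user=hue
    password=secret
    name=hue
```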
Features
[*]HDFS access: browse HDFS data from the browser
[*]Hive editor: write and run HQL scripts, view results, and use related Hive features
[*]A Solr search application, with matching data visualization views and dashboards
[*]An Impala application for interactive data queries
[*]The latest releases integrate a Spark editor and dashboard
[*]A Pig editor that can run the scripts you write
[*]Oozie scheduler: submit and monitor Workflow, Coordinator, and Bundle jobs from a dashboard
[*]HBase support: query, modify, and visualize data
[*]Metastore browser: access Hive metadata and the corresponding HCatalog
[*]Support for Jobs, Sqoop, ZooKeeper, and more
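As a concrete view of the first item above, Hue's file browser talks to HDFS through the WebHDFS REST API. The helper below sketches the kind of URL such a call uses; the host, port, and user are assumptions for illustration, not values from this post:

```python
# Sketch of the REST layer behind a browser-based HDFS file view:
# WebHDFS exposes filesystem operations as HTTP calls like this one.
def webhdfs_url(host: str, port: int, path: str, op: str = "LISTSTATUS",
                user: str = "hue") -> str:
    """Build a WebHDFS v1 URL for the given operation on path."""
    return f"http://{host}:{port}/webhdfs/v1{path}?op={op}&user.name={user}"

# To actually list a directory you would GET this URL, e.g. with
# urllib.request.urlopen(...) and json.load(...) on the response.
print(webhdfs_url("namenode.example.com", 50070, "/user/hue"))
# → http://namenode.example.com:50070/webhdfs/v1/user/hue?op=LISTSTATUS&user.name=hue
```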
Installation and deployment
[*]For installation and configuration, see http://cloudera.github.io/hue/docs-3.6.0/manual.html
[*]Hue supports many components and therefore has many dependencies. If the system environment is missing any of them, installation gets awkward; for example, make builds its own virtual environment, which can diverge from the system defaults and cause problems during compilation and installation.
[*]The simplest installation is from CDH's RPM packages, but that pulls in the entire CDH cluster stack, which is hardly reasonable, and rarely feasible, when a cluster already exists.
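For reference, the source build described in the manual linked above boils down to a few commands. This is a sketch; the GitHub URL is an assumption (building from a release tarball works the same way):

```shell
# Building Hue from source. "make apps" creates Hue's own virtualenv under
# build/env -- which is exactly why missing system dev packages surface as
# compile errors at this step rather than earlier.
git clone https://github.com/cloudera/hue.git
cd hue
make apps                    # build the virtualenv and all Hue apps
build/env/bin/hue runserver  # development server, port 8000 by default
```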
Installing CDH on Azure
[*]Go to https://ms.portal.azure.com
[*]Click on resource groups on the left navigation bar
[*]Enter a name for your resource group, pick the subscription and availability region, and click on “Create”. This creates the resource group we will use in the cluster setup.
[*]Click on “New”, then on “Data + Analytics” and then on “Cloudera Enterprise Data Hub”
[*]In the blade that opens up, under “Select deployment model”, click on “Resource Manager”, then click “Create”
[*]In the blade that opens, click on “Basics, Configure basic settings”. Here, enter the following: user name (Linux user), password....
[*]Next, click on “Infrastructure information”. See the screenshot below for what you can customize and where to leave the defaults.
[*](screenshot: infrastructure settings)
[*]Next, click on “Cloudera setup information”. Here, enter the following: Cloudera Manager user name, password, cluster type (two options – POC and Production), and number of data nodes.
[*]Click on “User information” and enter some details about yourself.
[*]Click on “Buy” and then “Create”. This will provision the cluster.
[*]Step away for a long break; at the time this post was written, provisioning took more than an hour. You can monitor the progress from the portal.
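For scripted or repeatable setups, the resource-group step of the portal walkthrough above has an Azure CLI equivalent. A sketch, where the group name and region are placeholders:

```shell
# Create the resource group that the Cloudera cluster will be deployed into,
# then query its provisioning state (the same state the portal shows).
az group create --name cdh-demo-rg --location eastus
az group show --name cdh-demo-rg --query properties.provisioningState
```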
Nodes and Roles
[*]In the setup we entered 3 data nodes and selected Production. The following are the nodes and the roles running on them:
[*](screenshot: node and role assignments)
Connecting to the cluster