大数据架构-使用HBase和Solr将存储与索引放在不同的机器上

iszjw · 发表于 2015-7-18 13:59:43

　　

摘要：HBase可以通过协处理器Coprocessor的方式向Solr发出请求，Solr对于接收到的数据可以做相关的同步：增、删、改索引的操作，这样就可以同时使用HBase存储量大和Solr检索性能高的优点了，更何况HBase和Solr都可以集群。这对海量数据存储、检索提供了一种方式，将存储与索引放在不同的机器上，是大数据架构的必须品。
关键词：HBase, Solr, Coprocessor, 大数据, 架构
HBase和Solr可以通过协处理器Coprocessor的方式向Solr发出请求，Solr对于接收到的数据可以做相关的同步：增、删、改索引的操作。将存储与索引放在不同的机器上，这是大数据架构的必须品。下面讲述对HBase和Solr的性能时，使用HBase协处理器向HBase添加数据所编写的相关代码，及解释说明。

一、编写HBase协处理器Coprocessor

一旦有数据postPut，就立即对Solr里相应的Core更新。这里使用了ConcurrentUpdateSolrServer，它是Solr速率性能的保证，使用它不要忘记在Solr里面配置autoCommit哟。

/*
*版权：王安琪
*描述：监视HBase，一有数据postPut就向Solr发送，本类要作为触发器添加到HBase
*修改时间：2014-05-27
*修改内容：新增
*/
package solrHbase.test;

import java.io.UnsupportedEncodingException;

import ***;

public class SorlIndexCoprocessorObserver extends BaseRegionObserver {

private static final Logger LOG = LoggerFactory
         .getLogger(SorlIndexCoprocessorObserver.class);
private static final String solrUrl = "http://192.1.11.108:80/solr/core1";
private static final SolrServer solrServer = new ConcurrentUpdateSolrServer(
         solrUrl, 10000, 20);

/**
   * 建立solr索引
   *
   * @throws UnsupportedEncodingException
   */
@Override
public void postPut(final ObserverContext e,
         final Put put, final WALEdit edit, final boolean writeToWAL)
         throws UnsupportedEncodingException {
      inputSolr(put);
}

public void inputSolr(Put put) {
      try {
         solrServer.add(TestSolrMain.getInputDoc(put));
      } catch (Exception ex) {
         LOG.error(ex.getMessage());
      }
}
}

注意：getInputDoc是这个HBase协处理器Coprocessor的精髓所在，它可以把HBase内的Put里的内容转化成Solr需要的值。其中String fieldName = key.substring(key.indexOf(columnFamily) + 3, key.indexOf("我在这")).trim();这里有一个乱码字符，在这里看不到，请大家注意一下。

public static SolrInputDocument getInputDoc(Put put) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("test_ID", Bytes.toString(put.getRow()));
      for (KeyValue c : put.getFamilyMap().get(Bytes.toBytes(columnFamily))) {
         String key = Bytes.toString(c.getKey());
         String value = Bytes.toString(c.getValue());
         if (value.isEmpty()) {
            continue;
         }
         String fieldName = key.substring(key.indexOf(columnFamily) + 3,
                  key.indexOf("")).trim();
         doc.addField(fieldName, value);
      }
      return doc; }

二、编写测试程序入口代码main

这段代码向HBase请求建了一张表，并将模拟的数据，向HBase连续地提交数据内容，在HBase中不断地插入数据，同时记录时间，测试插入性能。

/*
*版权：王安琪
*描述：测试HBaseInsert，HBase插入性能
*修改时间：2014-05-27
*修改内容：新增
*/
package solrHbase.test;

import hbaseInput.HbaseInsert;

import ***;

public class TestHBaseMain {

private static Configuration config;
private static String tableName = "angelHbase";
private static HTable table = null;
private static final String columnFamily = "wanganqi";

/**
   * @param args
   */
public static void main(String[] args) {
      config = HBaseConfiguration.create();
      config.set("hbase.zookeeper.quorum", "192.103.101.104");
      HbaseInsert.createTable(config, tableName, columnFamily);
      try {
         table = new HTable(config, Bytes.toBytes(tableName));
         for (int k = 0; k < 1; k++) {
            Thread t = new Thread() {
                  public void run() {
                     for (int i = 0; i < 100000; i++) {
                        HbaseInsert.inputData(table,
                                 PutCreater.createPuts(1000, columnFamily));
                        Calendar c = Calendar.getInstance();
                        String dateTime = c.get(Calendar.YEAR) + "-"
                                 + c.get(Calendar.MONTH) + "-"
                                 + c.get(Calendar.DATE) + "T"
                                 + c.get(Calendar.HOUR) + ":"
                                 + c.get(Calendar.MINUTE) + ":"
                                 + c.get(Calendar.SECOND) + ":"
                                 + c.get(Calendar.MILLISECOND) + "Z 写入: "
                                 + i * 1000;
                        System.out.println(dateTime);
                     }
                  }
            };
            t.start();
         }
      } catch (IOException e1) {
         e1.printStackTrace();
      }
}

}

下面的是与HBase相关的操作，把它封装到一个类中，这里就只有建表与插入数据的相关代码。

/*
*版权：王安琪
*描述：与HBase相关操作，建表与插入数据
*修改时间：2014-05-27
*修改内容：新增
*/
package hbaseInput;
import ***;
import org.apache.hadoop.hbase.client.Put;

public class HbaseInsert {

public static void createTable(Configuration config, String tableName,
         String columnFamily) {
      HBaseAdmin hBaseAdmin;
      try {
         hBaseAdmin = new HBaseAdmin(config);
         if (hBaseAdmin.tableExists(tableName)) {
            return;
         }
         HTableDescriptor tableDescriptor = new HTableDescriptor(tableName);
         tableDescriptor.addFamily(new HColumnDescriptor(columnFamily));
         hBaseAdmin.createTable(tableDescriptor);
         hBaseAdmin.close();
      } catch (MasterNotRunningException e) {
         e.printStackTrace();
      } catch (ZooKeeperConnectionException e) {
         e.printStackTrace();
      } catch (IOException e) {
         e.printStackTrace();
      }
}

public static void inputData(HTable table, ArrayList puts) {
      try {
         table.put(puts);
         table.flushCommits();
         puts.clear();
      } catch (IOException e) {
         e.printStackTrace();
      }
}
}

三、编写模拟数据Put

向HBase中写入数据需要构造Put，下面是我构造模拟数据Put的方式，有字符串的生成，我是由mmseg提供的词典words.dic中随机读取一些词语连接起来，生成一句字符串的，下面的代码没有体现，不过很easy，你自己造你自己想要的数据就OK了。

public static Put createPut(String columnFamily) {
      String ss = getSentence();
      byte[] family = Bytes.toBytes(columnFamily);
      byte[] rowKey = Bytes.toBytes("" + Math.abs(r.nextLong()));
      Put put = new Put(rowKey);
      put.add(family, Bytes.toBytes("DeviceID"),
            Bytes.toBytes("" + Math.abs(r.nextInt())));
      ******
      put.add(family, Bytes.toBytes("Company_mmsegsm"), Bytes.toBytes("ss"));

      return put; }

当然在运行上面这个程序之前，需要先在Solr里面配置好你需要的列信息，HBase、Solr安装与配置，它们的基础使用方法将会在之后的文章中介绍

。在这里，Solr的列配置就跟你使用createPut生成的Put搞成一样的列名就行了，当然也可以使用动态列的形式。

四、直接对Solr性能测试

如果你不想对HBase与Solr的相结合进行测试，只想单独对Solr的性能进行测试，这就更简单了，完全可以利用上面的代码段来测试，稍微组装一下就可以了。

private static void sendConcurrentUpdateSolrServer(final String url,
         final int count) throws SolrServerException, IOException {
      SolrServer solrServer = new ConcurrentUpdateSolrServer(url, 10000, 20);
      for (int i = 0; i < count; i++) {
         solrServer.add(getInputDoc(PutCreater.createPut(columnFamily)));
      } }

账号		自动登录	找回密码
密码			立即注册

大疆运维招人啦，

Red Hat RHCE 8 (EX294) Cert Guide

c++ size_t 和 int 的区别

HERE 使用 AWS EF 和 JFrog Artifactory 打

C++ 指针大全：从基础到进阶，一篇快速上手

wirelessnetview好用的无线分析工具

亿图图示专家(EDraw Max) V7.9 中文破解版

[经验分享] 大数据架构-使用HBase和Solr将存储与索引放在不同的机器上

浏览过的版块

扫码加入运维网微信交流群