古城热线 posted on 2016-12-15 06:47:42

Solr DataImportHandler (DIH)

  http://wiki.apache.org/solr/DataImportHandler
  Goal
  Import data from a relational database.
  Environment
  Put apache-solr-dataimporthandler-3.4.0.jar, apache-solr-dataimporthandler-extras-3.4.0.jar, and the JDBC driver jar for your database in the $solr.home/lib directory.
  Configure solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/home/username/data-config.xml</str>
  </lst>
</requestHandler>
  Configure data-config.xml. This uses MySQL, with the same table structure as the example (the db core in example-DIH):

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://127.0.0.1:3306/testsolr?autoReconnect=true&amp;characterEncoding=utf8&amp;useUnicode=true" user="root" password="123456" />
  <!-- pk id must be lowercase; uppercase causes an error because Map.containsKey is case-sensitive -->
  <document>
    <entity name="item" pk="id"
            query="select * from item"
            deltaImportQuery="select * from item where ID ='${dataimporter.delta.id}'"
            deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
      <entity name="feature" pk="ITEM_ID"
              query="select DESCRIPTION as features from FEATURE where ITEM_ID='${item.ID}'"
              deltaQuery="select ITEM_ID from FEATURE where last_modified > '${dataimporter.last_index_time}'"
              parentDeltaQuery="select ID from item where ID=${feature.ITEM_ID}"/>
      <entity name="item_category" pk="ITEM_ID, CATEGORY_ID"
              query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'"
              deltaQuery="select ITEM_ID, CATEGORY_ID from item_category where last_modified > '${dataimporter.last_index_time}'"
              parentDeltaQuery="select ID from item where ID=${item_category.ITEM_ID}">
        <entity name="category" pk="ID"
                query="select DESCRIPTION as cat from category where ID = '${item_category.CATEGORY_ID}'"
                deltaQuery="select ID from category where last_modified > '${dataimporter.last_index_time}'"
                parentDeltaQuery="select ITEM_ID, CATEGORY_ID from item_category where CATEGORY_ID=${category.ID}"/>
      </entity>
    </entity>
  </document>
  <!-- Alternative: all change detection written together in one deltaQuery on the root entity
  <document name="products">
    <entity name="item" pk="id"
            query="select * from item"
            deltaImportQuery="select * from item where ID='${dataimporter.delta.id}'"
            deltaQuery_1="select id from item where last_modified > '${dataimporter.last_index_time}'"
            deltaQuery="select id from item where
              id in (select item_id as id from feature where last_modified > '${dataimporter.last_index_time}')
              or id in (select item_id as id from item_category where
                category_id in (select id from category where last_modified > '${dataimporter.last_index_time}')
                or last_modified > '${dataimporter.last_index_time}'
              )
              or last_modified > '${dataimporter.last_index_time}'" >

      <entity name="feature" pk="ITEM_ID"
              query="select description as features from feature where item_id='${item.ID}'">
      </entity>
      <entity name="item_category" pk="ITEM_ID, CATEGORY_ID"
              query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'">
        <entity name="category" pk="ID"
                query="select description as cat from category where id = '${item_category.CATEGORY_ID}'">
        </entity>
      </entity>
    </entity>
  </document>
  -->
</dataConfig>
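To check what the single combined deltaQuery (the commented-out alternative) returns, here is a minimal sketch that runs it against an in-memory SQLite database with a cut-down version of the example schema. Assumptions: SQLite stands in for MySQL, the tables carry only the columns the query touches, timestamps are stored as `yyyy-MM-dd HH:mm:ss` strings so string comparison orders them, and `${dataimporter.last_index_time}` is filled in by plain string replacement (DIH's own variable resolver does more). The item_category subquery matches `category_id` (not `item_id`) against the changed category ids, since that is the column that links to category. The helper names are hypothetical.

```python
import sqlite3

LIT = "${dataimporter.last_index_time}"

# Combined deltaQuery for the root "item" entity: an item counts as changed if it,
# one of its features, one of its item_category links, or a linked category changed.
COMBINED_DELTA_QUERY = f"""
select id from item where
  id in (select item_id as id from feature where last_modified > '{LIT}')
  or id in (select item_id as id from item_category where
    category_id in (select id from category where last_modified > '{LIT}')
    or last_modified > '{LIT}'
  )
  or last_modified > '{LIT}'
"""


def changed_item_ids(conn, last_index_time):
    """Substitute the timestamp (simplified) and return the sorted item ids."""
    sql = COMBINED_DELTA_QUERY.replace(LIT, last_index_time)
    return sorted(row[0] for row in conn.execute(sql))


def build_demo_db():
    """Hypothetical sample data: old rows predate the last import, new rows follow it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table item (id integer primary key, name text, last_modified text);
        create table feature (item_id integer, description text, last_modified text);
        create table category (id integer primary key, description text, last_modified text);
        create table item_category (item_id integer, category_id integer, last_modified text);
    """)
    old, new = "2016-12-01 00:00:00", "2016-12-16 00:00:00"
    conn.executemany("insert into item values (?, ?, ?)",
                     [(1, "a", new), (2, "b", old), (3, "c", old), (4, "d", old), (5, "e", old)])
    conn.executemany("insert into feature values (?, ?, ?)",
                     [(2, "f2", new), (4, "f4", old)])
    conn.executemany("insert into category values (?, ?, ?)",
                     [(10, "c10", new), (20, "c20", old)])
    conn.executemany("insert into item_category values (?, ?, ?)",
                     [(3, 10, old), (4, 20, old), (5, 20, new)])
    return conn


if __name__ == "__main__":
    # Item 1 changed itself, item 2 via a feature, item 3 via category 10,
    # item 5 via its item_category row; item 4 is untouched.
    print(changed_item_ids(build_demo_db(), "2016-12-15 00:00:00"))  # [1, 2, 3, 5]
```

The design trade-off: the single query keeps all change detection in one place, at the cost of a heavier SQL statement on the root entity, whereas the per-entity deltaQuery/parentDeltaQuery version splits it into several smaller queries.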

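  Watch the case of id in pk="id": it has to match the column label the deltaQuery returns exactly (select id ... returns id), because the lookup uses Map.containsKey, which is case-sensitive.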
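The case-sensitivity problem can be reproduced without Solr. A minimal sketch, assuming (as the comment in the config suggests) that each deltaQuery result row is held as a map keyed by the column label as the JDBC driver reports it, and that the pk attribute is looked up in that map. A Python dict stands in for java.util.Map; `delta_keys` is a hypothetical helper, not DIH code.

```python
def delta_keys(rows, pk):
    """Collect the pk value of each deltaQuery row.

    Each row is a dict keyed by column label, standing in for the java.util.Map
    built per row; `pk in row` is case-sensitive, like Map.containsKey.
    """
    keys = []
    for row in rows:
        if pk not in row:
            raise KeyError(f"pk {pk!r} not among column labels {sorted(row)}")
        keys.append(row[pk])
    return keys


if __name__ == "__main__":
    # "select id from item ..." typically yields the label "id", as written in the SQL.
    rows = [{"id": 1}, {"id": 5}]
    print(delta_keys(rows, "id"))  # [1, 5]
    try:
        delta_keys(rows, "ID")
    except KeyError as err:
        print('pk="ID" fails:', err)
```

Either spelling works as long as pk and the label in the deltaQuery agree, for example pk="ID" together with select ID from item.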
  For more dataSource parameters, see http://wiki.apache.org/solr/DataImportHandler
  entity attributes:


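[*]query: the SQL that fetches the data (used by full-import)
[*]deltaQuery: the SQL that returns the primary keys of rows added or changed since ${dataimporter.last_index_time}
[*]parentDeltaQuery: for rows found by a child entity's deltaQuery, the SQL that returns the primary keys of the affected parent rows
[*]deletedPkQuery: the SQL that returns the primary keys of rows whose documents delta-import should remove from the index
[*]deltaImportQuery: the SQL that fetches the full row for each key found by deltaQuery (the key is available as ${dataimporter.delta.<pk>}). If it is missing, DIH generates one from query, which may produce incorrect SQL, so it is better to write it yourself.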
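How deltaQuery, parentDeltaQuery and deltaImportQuery cooperate during delta-import can be sketched as follows, using the item and feature entities from the data-config.xml above. This is a simplified model under stated assumptions, not DIH's implementation: SQLite replaces MySQL, only two entities are modelled, `${...}` variables are filled in by plain string replacement, and deletedPkQuery is ignored. The function names are hypothetical.

```python
import sqlite3

# Attribute values copied from the item and feature entities in data-config.xml.
ITEM_DELTA_QUERY = "select id from item where last_modified > '${dataimporter.last_index_time}'"
ITEM_DELTA_IMPORT_QUERY = "select * from item where ID ='${dataimporter.delta.id}'"
FEATURE_DELTA_QUERY = "select ITEM_ID from FEATURE where last_modified > '${dataimporter.last_index_time}'"
FEATURE_PARENT_DELTA_QUERY = "select ID from item where ID=${feature.ITEM_ID}"


def resolve(sql, variables):
    """Simplified stand-in for DIH's ${...} substitution (plain string replace)."""
    for name, value in variables.items():
        sql = sql.replace("${" + name + "}", str(value))
    return sql


def delta_import(conn, last_index_time):
    """Return the item rows (as dicts) this simplified delta-import would re-index."""
    base = {"dataimporter.last_index_time": last_index_time}
    # 1. deltaQuery on the root entity: items that changed themselves.
    keys = {row[0] for row in conn.execute(resolve(ITEM_DELTA_QUERY, base))}
    # 2. deltaQuery on the child entity, then parentDeltaQuery per changed child row
    #    to map it back to the key of its parent item.
    for (item_id,) in conn.execute(resolve(FEATURE_DELTA_QUERY, base)).fetchall():
        parent_sql = resolve(FEATURE_PARENT_DELTA_QUERY, {"feature.ITEM_ID": item_id})
        keys.update(row[0] for row in conn.execute(parent_sql))
    # 3. deltaImportQuery once per key, with ${dataimporter.delta.id} filled in.
    docs = []
    for key in sorted(keys):
        cur = conn.execute(resolve(ITEM_DELTA_IMPORT_QUERY, {"dataimporter.delta.id": key}))
        columns = [d[0] for d in cur.description]
        docs.extend(dict(zip(columns, row)) for row in cur.fetchall())
    return docs


def build_demo_db():
    """Hypothetical sample data in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table item (id integer, name text, last_modified text);
        create table feature (item_id integer, description text, last_modified text);
    """)
    old, new = "2016-12-01 00:00:00", "2016-12-16 00:00:00"
    conn.executemany("insert into item values (?, ?, ?)",
                     [(1, "a", new), (2, "b", old), (3, "c", old)])
    conn.executemany("insert into feature values (?, ?, ?)",
                     [(2, "f2", new), (3, "f3", old)])
    return conn


if __name__ == "__main__":
    docs = delta_import(build_demo_db(), "2016-12-15 00:00:00")
    print([d["id"] for d in docs])  # [1, 2]: item 1 changed, item 2 via its feature
```

Item 2 is re-indexed even though its own row is unchanged: that is exactly what parentDeltaQuery is for, because the item document contains the feature text.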
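  Full import
  http://localhost:8983/solr/db/dataimport?command=full-import
  Delta import
  http://localhost:8983/solr/dataimport?command=delta-import
  (/db in the first URL is the core name from the multicore example-DIH setup; with a single core, leave it out, as in the other URLs here.)
  Other commands:
  View status and results: http://localhost:8983/solr/dataimport
  Reload the configuration after editing data-config.xml, without restarting the server: http://localhost:8983/solr/dataimport?command=reload-config
  Abort a running import: http://localhost:8983/solr/dataimport?command=abort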
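  After each command, check that the returned XML looks normal and check the server log for exceptions. After importing, query Solr to confirm the data matches the database.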
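Sending these commands and checking the reply can be scripted. A minimal sketch using only the Python standard library, assuming the single-core base URL from the examples above, Solr's default XML response format, and that the handler reports its state in a `<str name="status">` element (for example `idle` or `busy`). The helper names are hypothetical.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed handler URL; insert the core name (e.g. /solr/db/dataimport) if you use one.
BASE_URL = "http://localhost:8983/solr/dataimport"


def dih_url(command, base_url=BASE_URL, **params):
    """Build a DataImportHandler URL such as ...?command=delta-import."""
    return base_url + "?" + urllib.parse.urlencode({"command": command, **params})


def handler_status(xml_text):
    """Return the text of the first <str name="status"> element, or None.

    Only <str> elements are searched, so the <int name="status"> inside
    responseHeader (the request status code) is skipped.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter("str"):
        if elem.get("name") == "status":
            return elem.text
    return None


def run(command, base_url=BASE_URL, timeout=30, **params):
    """Send one command to the handler and return the state it reports."""
    with urllib.request.urlopen(dih_url(command, base_url, **params), timeout=timeout) as resp:
        return handler_status(resp.read())


def wait_until_idle(base_url=BASE_URL, poll_seconds=2.0, max_polls=300):
    """Poll command=status until the handler reports "idle"; return False on give-up."""
    for _ in range(max_polls):
        if run("status", base_url) == "idle":
            return True
        time.sleep(poll_seconds)
    return False
```

Against a running Solr, `run("delta-import")` followed by `wait_until_idle()` starts an import and blocks until it finishes; the exceptions and row counts still need to be checked in the log and the full response.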
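  conf/dataimport.properties stores last_index_time; Solr updates this timestamp after each import, and delta-import substitutes it for ${dataimporter.last_index_time}.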
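To see which timestamp the next delta-import will use, the file can be read directly. A minimal sketch, assuming the file is written by java.util.Properties (so it starts with a `#` date comment and `:` in values is escaped as `\:`) and that the timestamp uses the `yyyy-MM-dd HH:mm:ss` pattern. It only handles simple `key=value` lines, not the full Properties syntax; the function names are hypothetical.

```python
from datetime import datetime


def parse_properties(text):
    """Parse the simple key=value lines that java.util.Properties.store writes."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment (store() writes a date comment first)
        key, _, value = line.partition("=")
        # store() escapes ':' and '=' with a backslash; undo that for the value.
        props[key.strip()] = value.strip().replace("\\:", ":").replace("\\=", "=")
    return props


def read_last_index_time(text, key="last_index_time"):
    """Return the stored timestamp as a datetime, assuming yyyy-MM-dd HH:mm:ss."""
    return datetime.strptime(parse_properties(text)[key], "%Y-%m-%d %H:%M:%S")
```

Usage: `read_last_index_time(open("conf/dataimport.properties", encoding="iso-8859-1").read())`; Properties files are written in ISO-8859-1.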
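  What about rows deleted from the database? Their documents should presumably be removed from the Solr index too. One option is a delete flag on the row, which delta-import can pick up through deletedPkQuery (is that the best approach?).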
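With a soft-delete flag, the split between the queries could look like this: query and deltaQuery skip flagged rows, while deletedPkQuery returns flagged rows changed since the last import. The sketch only checks that SQL split on SQLite; the `deleted` column, its 'Y'/'N' values and the exact queries are hypothetical, and it assumes the application sets the flag and updates last_modified instead of issuing DELETE.

```python
import sqlite3

LIT = "${dataimporter.last_index_time}"

# Hypothetical attribute values for the item entity with a soft-delete flag.
QUERY = "select * from item where deleted = 'N'"
DELTA_QUERY = f"select id from item where deleted = 'N' and last_modified > '{LIT}'"
DELETED_PK_QUERY = f"select id from item where deleted = 'Y' and last_modified > '{LIT}'"


def run_keys(conn, sql, last_index_time):
    """Run a query with the timestamp substituted (simplified); return sorted ids."""
    rows = conn.execute(sql.replace(LIT, last_index_time)).fetchall()
    return sorted(row[0] for row in rows)


def build_demo_db():
    """Hypothetical rows: 1 live+changed, 2 flagged since the last import,
    3 live+unchanged, 4 flagged before the last import (already gone from the index)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table item (id integer primary key, deleted text, last_modified text)")
    old, new = "2016-12-01 00:00:00", "2016-12-16 00:00:00"
    conn.executemany("insert into item values (?, ?, ?)",
                     [(1, "N", new), (2, "Y", new), (3, "N", old), (4, "Y", old)])
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    since = "2016-12-15 00:00:00"
    print(run_keys(conn, QUERY, since))             # [1, 3]: full-import indexes live rows
    print(run_keys(conn, DELTA_QUERY, since))       # [1]: delta-import re-indexes
    print(run_keys(conn, DELETED_PK_QUERY, since))  # [2]: delta-import deletes
```

Keeping `deleted = 'N'` in deltaQuery makes sure a flagged row is never reported as both changed and deleted in the same run; the flagged rows can be purged from the database later, once the index has caught up.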
  MORE:
  multiple datasources
  DataImportHandlerDeltaQueryViaFullImport
  
  