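I. Configuring Solr's built-in import plugin, DataImportHandler
1. Add the following to solrconfig.xml to enable the DataImport feature and set the location of its configuration file.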
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">D:\dev\test\solr-tomcat\solr\db\conf\db-data-config.xml</str>
</lst>
</requestHandler>
2. Copy the JDBC driver jar and the DataImportHandler jars from the Solr distribution into the solr/WEB-INF/lib directory of the webapp.
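One way to sanity-check step 1 is to parse solrconfig.xml and confirm that a DataImportHandler is registered and points at the expected configuration file. The following is a minimal sketch using only the Python standard library; the `SAMPLE_SOLRCONFIG` string and the function name `find_dih_configs` are made up for this illustration, and a real solrconfig.xml contains much more.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down solrconfig.xml used only for this illustration.
SAMPLE_SOLRCONFIG = """
<config>
  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
  </requestHandler>
</config>
"""

DIH_CLASS = "org.apache.solr.handler.dataimport.DataImportHandler"


def find_dih_configs(solrconfig_xml):
    """Return {handler name: config file} for every DataImportHandler entry."""
    root = ET.fromstring(solrconfig_xml)
    found = {}
    for handler in root.iter("requestHandler"):
        if handler.get("class") != DIH_CLASS:
            continue
        # The config path lives in <lst name="defaults"><str name="config">.
        config = handler.find("./lst[@name='defaults']/str[@name='config']")
        has_text = config is not None and config.text
        found[handler.get("name")] = config.text.strip() if has_text else None
    return found


if __name__ == "__main__":
    print(find_dih_configs(SAMPLE_SOLRCONFIG))
```

A handler that maps to `None` here has no `config` default, which is usually the first thing to check when the import page reports that no configuration was found.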
II. Importing data
1. Importing from a database (SQL Server)
<dataConfig>
<!-- <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/langsin1" user="root" password="root"/> -->
<dataSource type="JdbcDataSource" driver="net.sourceforge.jtds.jdbc.Driver" url="jdbc:jtds:sqlserver://localhost:1433;SelectMethod=Cursor;DatabaseName=dbname" user="sa" password="123"/>
<document name="userss">
<entity name="users" pk="id" query="select * from news_friend_links">
<field column="id" name="id" />
<field column="name" name="name" />
</entity>
</document>
</dataConfig>
For detailed configuration, see: http://wiki.apache.org/solr/DataImportHandler#Extending_the_tool_with_APIs
Handling CLOB and BLOB columns
<dataSource name="ora" driver="oracle.jdbc.OracleDriver" url="...." />
<dataSource name="ds-BlobField" type="FieldStreamDataSource" />
<entity dataSource="ora" name="meta" query="select id, filename, content, bytes from documents" transformer="ClobTransformer">
<field column="ID" name="id" />
<field column="FILENAME" name="filename" />
<field column="CONTENT" name="CONTENT" clob="true" />
<!-- dataField takes the form entityName.COLUMN, here the BYTES column of the "meta" entity -->
<entity dataSource="ds-BlobField" processor="TikaEntityProcessor" url="bytes" dataField="meta.BYTES">
<field column="text" name="FJ_FILE_CONTENT" /><!-- field used for global search -->
<field column="Author" name="FJ_FILE_AUTHOR" meta="true" />
</entity>
</entity>
2. Importing data over HTTP or from XML files
a. HTTP
<dataConfig>
<dataSource type="HttpDataSource" />
<document>
<entity name="slashdot"
pk="link"
url="http://rss.slashdot.org/Slashdot/slashdot"
processor="XPathEntityProcessor"
forEach="/RDF/channel | /RDF/item"
transformer="DateFormatTransformer">
<field column="source" xpath="/RDF/channel/title" commonField="true" />
<field column="title" xpath="/RDF/item/title" />
<field column="link" xpath="/RDF/item/link" />
<field column="date" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
</entity>
</document>
</dataConfig>
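The interplay of `forEach` and `commonField="true"` in the configuration above can be hard to picture. The sketch below imitates it in plain Python: each node matched by `forEach` starts a new record, and a common field, once seen, is copied into the records that follow. This is an illustration of the idea only, not DataImportHandler's own code; `SAMPLE_FEED` is a made-up, namespace-free feed, and `extract_records` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free feed shaped like the paths in the config above.
SAMPLE_FEED = """
<RDF>
  <channel><title>Slashdot</title></channel>
  <item><title>First story</title><link>http://example.com/1</link></item>
  <item><title>Second story</title><link>http://example.com/2</link></item>
</RDF>
"""

# Mirrors forEach="/RDF/channel | /RDF/item".
RECORD_TAGS = ("channel", "item")

# column -> (record tag, child tag, commonField flag)
FIELDS = {
    "source": ("channel", "title", True),
    "title": ("item", "title", False),
    "link": ("item", "link", False),
}


def extract_records(feed_xml):
    """Emit one record per forEach match, carrying common fields forward."""
    root = ET.fromstring(feed_xml)
    common = {}
    records = []
    for node in root:
        if node.tag not in RECORD_TAGS:
            continue
        record = dict(common)  # start from the common fields seen so far
        for column, (tag, child, is_common) in FIELDS.items():
            if node.tag != tag:
                continue
            value = node.findtext(child)
            if value is None:
                continue
            record[column] = value
            if is_common:
                common[column] = value
        records.append(record)
    return records


if __name__ == "__main__":
    for record in extract_records(SAMPLE_FEED):
        print(record)
```

In this sketch the channel node yields a record holding only `source`, and every item record that follows inherits that value alongside its own `title` and `link`.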
b. XML files
Importing a single file
<dataConfig>
<dataSource type="FileDataSource" encoding="UTF-8" />
<document>
<entity name="page"
processor="XPathEntityProcessor"
stream="true"
forEach="/mediawiki/page/"
url="/data/enwiki-20080724-pages-articles.xml"
transformer="RegexTransformer,DateFormatTransformer"
>
<field column="id" xpath="/mediawiki/page/id" />
<field column="title" xpath="/mediawiki/page/title" />
<field column="text" xpath="/mediawiki/page/revision/text" />
<field column="timestamp" xpath="/mediawiki/page/revision/timestamp" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss'Z'" />
<field column="$skipDoc" regex="^#REDIRECT .*" replaceWith="true" sourceColName="text"/>
</entity>
</document>
</dataConfig>
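The `$skipDoc` field in the configuration above relies on RegexTransformer: the `text` column is run through the regex, matches are replaced with `true`, and a document whose `$skipDoc` value ends up as `true` is dropped. The sketch below reproduces that logic in Python to show which pages would be skipped. It rests on the assumption that the replacement behaves like a plain "replace all matches" and that the result is compared case-insensitively with `true`; the function names are hypothetical.

```python
import re

# Same pattern as regex="^#REDIRECT .*" in the config above.
REDIRECT = re.compile(r"^#REDIRECT .*")


def skip_doc_value(text):
    """Mimic regex + replaceWith: every match in the source column is replaced."""
    return REDIRECT.sub("true", text)


def is_skipped(text):
    # Assumption: the document is dropped only when $skipDoc reads "true".
    return skip_doc_value(text).lower() == "true"


if __name__ == "__main__":
    samples = [
        "#REDIRECT [[Main Page]]",
        "An ordinary article that mentions #REDIRECT later on",
        "#REDIRECT [[A]]\ntrailing text on a second line",
    ]
    for sample in samples:
        print(repr(sample), "->", is_skipped(sample))
```

Under these assumptions only a page whose entire text is a single redirect line is skipped; text that continues onto further lines leaves a value other than `true` and the page is indexed.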
Importing multiple files
<dataConfig>
<dataSource type="FileDataSource" encoding="UTF-8" />
<document>
<entity name="jc" rootEntity="false" dataSource="null"
processor="FileListEntityProcessor"
fileName=".xml$" recursive="false"
baseDir="D:/"
>
<entity name="page"
processor="XPathEntityProcessor"
stream="true"
forEach="/node/list/"
url="${jc.fileAbsolutePath}"
transformer="RegexTransformer,DateFormatTransformer"
>
<field column="id" xpath="/node/list/id" />
<field column="name" xpath="/node/list/name" />
</entity>
</entity>
</document>
</dataConfig>
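To preview which files the outer FileListEntityProcessor entity would hand to the inner entity, the selection can be approximated in Python. This sketch assumes that `fileName` is a regular expression searched anywhere in the file name (not matched against the whole name), and that `recursive="false"` restricts the listing to `baseDir` itself. The helper `list_matching_files` is hypothetical, and results are sorted here only to make the output predictable.

```python
import os
import re
import tempfile


def list_matching_files(base_dir, file_name_regex, recursive=False):
    """Return absolute paths of files whose name contains a regex match."""
    pattern = re.compile(file_name_regex)
    matches = []
    for current_dir, sub_dirs, files in os.walk(base_dir):
        for name in sorted(files):
            if pattern.search(name):
                matches.append(os.path.abspath(os.path.join(current_dir, name)))
        if not recursive:
            sub_dirs.clear()  # stop os.walk from descending into subfolders
    return matches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        for name in ("a.xml", "notes.txt", "dataxml"):
            open(os.path.join(base, name), "w").close()
        os.mkdir(os.path.join(base, "sub"))
        open(os.path.join(base, "sub", "b.xml"), "w").close()
        for path in list_matching_files(base, r".xml$"):
            print(path)
```

Because the dot in `.xml$` matches any character, a file named `dataxml` is also selected in this sketch; writing the pattern as `\.xml$` limits the match to a real `.xml` extension.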
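III. Running the import
Visit: http://localhost:8081/solr/dataimport?command=full-import
The command parameter accepts, among others:
full-import: imports everything; by default the existing data is cleared first
delta-import: incremental import
reload-config: reloads the configuration file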
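The commands above can also be issued from a script instead of a browser. The following sketch builds the request URLs and, optionally, starts a full import and polls the handler's `status` command until it stops reporting `busy`. The base URL is the one from the example above; the `wt=json` response format, the polling interval, and the helper names are assumptions for illustration and may need adjusting for a given Solr version.

```python
import json
import time
import urllib.parse
import urllib.request

# Base URL taken from the example above; adjust host, port and path as needed.
BASE_URL = "http://localhost:8081/solr/dataimport"


def dataimport_url(command, base_url=BASE_URL, **params):
    """Build a DataImportHandler URL such as ...?command=full-import."""
    query = {"command": command}
    query.update(params)
    return base_url + "?" + urllib.parse.urlencode(query)


def run_command(command, **params):
    """Send one command and return the parsed JSON response."""
    params.setdefault("wt", "json")  # assumption: JSON output is available
    with urllib.request.urlopen(dataimport_url(command, **params)) as response:
        return json.load(response)


def full_import_and_wait(poll_seconds=2.0):
    """Start a full import, then poll 'status' until the handler is not busy."""
    run_command("full-import")
    while True:
        status = run_command("status")
        if status.get("status") != "busy":
            return status
        time.sleep(poll_seconds)


if __name__ == "__main__":
    # Only prints URLs; call full_import_and_wait() against a running Solr.
    print(dataimport_url("full-import"))
    print(dataimport_url("delta-import", clean="false"))
```

Passing `clean="false"` keeps the existing index contents instead of clearing them before the import runs.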
Copyright notice: this is an original article by the blog author and may not be reproduced without the author's permission.