
[Experience sharing] Creating a Solr index: XML format

Posted on 2016-12-15 08:24:24
  Solr receives commands and possibly document data through HTTP POST. One way to send an HTTP POST is through the Unix command line program curl (also available on Windows through Cygwin: http://www.cygwin.com), and that's what we'll use here in the examples. An alternative cross-platform option that comes with Solr is post.jar, located in Solr's example/exampledocs directory. To get some basic help on how to use it, run the following command:

>> java -jar example/exampledocs/post.jar -help
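  You'll see in a bit that you can post name-value pair options as HTML form data. However, post.jar doesn't support that, so you'll be forced to specify the URL and put the options in the query string. (Poster's note: opening post.jar shows that it contains a single class, SimplePostTool, which forwards the index-creation requests. It declares the Solr server URL as public static final String DEFAULT_POST_URL = "http://localhost:8983/solr/update". As the name suggests, this is only the default: for a Solr service deployed at another address, the URL has to be given explicitly, which as far as I know is done with the -Durl system property.)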
  There are several ways to tell Solr to index data, and all of them are through HTTP POST:
  ·     Send the data as the entire POST payload. curl does this with --data-binary (or some similar options) and an appropriate content-type header for whatever the format is.

  ·     Send some name-value pairs akin to an HTML form submission. With curl, such pairs are preceded by -F. If you're giving Solr the data to index, as opposed to having it look for the data in a database, there are a few ways to do that:

     ° Put the data into the stream.body parameter. If it's small, perhaps less than a megabyte, then this approach is fine. The limit is configured with the multipartUploadLimitInKB setting in solrconfig.xml, defaulting to 2GB. If you're tempted to increase this limit, you should reconsider your approach.

     ° Refer to the data through either a local file on the Solr server using the stream.file parameter or a URL that Solr will fetch through the stream.url parameter. These choices are a feature that Solr calls remote streaming.
  Here is an example of the first choice. Let's say we have a Solr Update-XML file named artists.xml in the current directory. We can post it to Solr using the following command line:

>> curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary @artists.xml
  If it succeeds, then you'll have output that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int><int name="QTime">128</int>
</lst>
</response>
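
  For readers who would rather script this than use curl, here is a minimal Python sketch of the same idea: build a POST request whose entire body is the XML, then read the status out of a response shaped like the one above. The function names build_update_request and response_status are made up for this illustration, and the URL is simply the one used in the curl example.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_update_request(update_url, xml_bytes):
    """Build (but do not send) a POST whose whole body is the Update-XML."""
    return urllib.request.Request(
        update_url,
        data=xml_bytes,
        # Same content type as in the curl example.
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def response_status(response_xml):
    """Return the integer status from a Solr XML response given as bytes."""
    root = ET.fromstring(response_xml)
    node = root.find("./lst[@name='responseHeader']/int[@name='status']")
    if node is None:
        raise ValueError("no responseHeader/status element in the response")
    return int(node.text)


if __name__ == "__main__":
    # Hypothetical usage: assumes artists.xml exists and Solr is running.
    with open("artists.xml", "rb") as f:
        request = build_update_request("http://localhost:8983/solr/update", f.read())
    with urllib.request.urlopen(request) as response:
        print("status:", response_status(response.read()))
```

  A status of 0 corresponds to the successful response shown above.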
  To use the stream.body feature for the preceding example, you would do this:

curl http://localhost:8983/solr/update -F stream.body=@artists.xml
   In both cases, the @ character instructs curl to read the data from the named file instead of sending the text @artists.xml literally. If the XML is short, then you can just as easily specify it literally on the command line:

curl http://localhost:8983/solr/update -F stream.body=' <commit />'
   Notice the leading space in the value. This was intentional. At the start of a -F value, curl gives @ and < special meanings (read the content from a file) that we don't want here, and the leading space prevents that. In this case, it might be more appropriate to use --form-string instead of -F. However, it's more typing, and I'm feeling lazy.
  Remote streaming

In the preceding examples, we've given Solr the data to index in the HTTP message. Alternatively, the POST request can give Solr a pointer to the data in the form of either a file path accessible to Solr or an HTTP URL to it.
  Just as before, the originating request does not return a response until Solr has finished processing it. If the file is of a decent size or is already at some known URL, then you may find remote streaming faster and/or more convenient, depending on your situation.
  Here is an example of Solr accessing a local file:

curl http://localhost:8983/solr/update -F stream.file=/tmp/artists.xml
   To use a URL, the parameter would change to stream.url, and we'd specify a URL. We're passing a name-value parameter (stream.file and the path), not the actual data.
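
  The name-value style of request can also be sketched in Python. One difference to be aware of: curl -F sends multipart/form-data, while this sketch sends the pairs URL-encoded in the body, on the assumption that Solr reads the stream.* parameters either way. The helper name build_stream_request is invented for this example.

```python
import urllib.request
from urllib.parse import urlencode


def build_stream_request(update_url, params):
    """Build (but do not send) a form-style POST carrying stream.* parameters."""
    body = urlencode(params).encode("ascii")
    return urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Point Solr at a file on the Solr server (remote streaming).
by_file = build_stream_request(
    "http://localhost:8983/solr/update",
    {"stream.file": "/tmp/artists.xml"},
)

# Or put a short message directly into stream.body. No leading space is
# needed here, because that trick only works around curl's -F parsing.
by_body = build_stream_request(
    "http://localhost:8983/solr/update",
    {"stream.body": "<commit />"},
)
```

  Passing either request to urllib.request.urlopen would send it; only the parameter names and the path go over the wire in the stream.file case, not the file contents.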
  Solr's Update-XML format
  Using an XML-formatted message, you can supply documents to be indexed, tell Solr to commit changes, optimize the index, and delete documents. Here is a sample XML file you can HTTP POST to Solr that adds (or replaces) a couple of documents:

<add overwrite="true">
<doc boost="2.0">
<field name="id">5432a</field>
<field name="type" ...</field>
<field name="a_name" boost="0.5"></field>
<!-- the date/time syntax MUST look just like this -->
<field name="begin_date">2007-12-31T09:40:00Z</field>
</doc>
<doc>
<field name="id">myid</field>
<field name="type" ...
<field name="begin_date">2007-12-31T09:40:00Z</field>
</doc>
<!-- more doc elements here as needed -->
</add>
  The overwrite attribute defaults to true to guarantee the uniqueness of values in the field that you have designated as the unique field in the schema, assuming you have such a field. If you were to add another document that has the same value for the unique field, then this document would overwrite the previous document. You will not get an error.
  The boost attribute affects the scores of matching documents in order to affect ranking in score-sorted search results. Providing a boost value, whether at the document or field level, is optional. The default value is 1.0, which is effectively a non-boost. Technically, documents are not boosted, only fields are. The effective boost value of a field is that specified for the document multiplied by that specified for the field.
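
  Writing such a file by hand invites escaping mistakes, so it may help to generate it. The following is a rough sketch using Python's standard xml.etree.ElementTree; the function build_add_xml and its argument layout are assumptions of this example, not anything Solr provides.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs, overwrite=True):
    """Build an <add> message.

    docs is a list of (fields, doc_boost) pairs, where fields is a list of
    (name, value) pairs and doc_boost is a number or None for no boost.
    """
    add = ET.Element("add", {"overwrite": "true" if overwrite else "false"})
    for fields, doc_boost in docs:
        attrs = {} if doc_boost is None else {"boost": str(doc_boost)}
        doc = ET.SubElement(add, "doc", attrs)
        for name, value in fields:
            field = ET.SubElement(doc, "field", {"name": name})
            field.text = value  # special characters are escaped on output
    return ET.tostring(add, encoding="unicode")


message = build_add_xml([
    ([("id", "5432a"), ("begin_date", "2007-12-31T09:40:00Z")], 2.0),
    ([("id", "myid"), ("begin_date", "2007-12-31T09:40:00Z")], None),
])
```

  Field-level boosts are left out to keep the sketch short; they would be one more attribute on the field element.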
  Deleting documents
  You can delete a document by its unique field. Here we delete two documents:

<delete><id>Artist:11604</id><id>Artist:11603</id></delete>
  To more flexibly specify which documents to delete, you can alternatively use a Lucene/Solr query:

<delete><query>timestamp:[* TO NOW-12HOUR]</query></delete>
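
  Both delete forms are easy to generate the same way. A small sketch, again with the standard library only; delete_by_ids and delete_by_query are names chosen for this example.

```python
import xml.etree.ElementTree as ET


def delete_by_ids(ids):
    """Build a <delete> message with one <id> element per unique-field value."""
    delete = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")


def delete_by_query(query):
    """Build a <delete> message that removes every document matching query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


print(delete_by_ids(["Artist:11604", "Artist:11603"]))
print(delete_by_query("timestamp:[* TO NOW-12HOUR]"))
```

  The two print calls reproduce the two messages shown above.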
  Commit
  Data sent to Solr is not immediately searchable, nor do deletions take immediate effect. Like a database, changes must be committed first. The easiest way to do this is to add a commit=true request parameter to a Solr update URL. The request can be the same one that carries the data to be indexed, or it can be an otherwise empty request; it doesn't matter. For example, you can visit this URL to issue a commit: http://localhost:8983/solr/update?commit=true. You can also commit changes using the XML syntax by simply sending this to Solr:

<commit />
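
  If update URLs are being assembled in a script, the commit=true parameter can be added without disturbing any query string already present. This is only a sketch; with_commit is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_commit(update_url):
    """Return update_url with commit=true set in its query string."""
    parts = urlsplit(update_url)
    # Drop any existing commit parameter, keep everything else in order.
    params = [(k, v) for k, v in parse_qsl(parts.query) if k != "commit"]
    params.append(("commit", "true"))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(with_commit("http://localhost:8983/solr/update"))
```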
