solr (4): Indexing Files with Extract Metadata

84366992 · Posted 2016-12-16 07:45:29

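Suppose a client uploads a standard mp3 file. The server then needs to do the following:
1) Save the file on the server and store the related author name, work title, URL, and so on in the database.
2) Build an index, using solr of course.
This looks straightforward, but the catch is that we may not know who the author of the mp3 is, and the title of the work may differ from the file name. With a large number of files, checking them one by one is not feasible. solr's Extract Metadata feature takes care of this; the steps are as follows.
1: Define a few fields in ${catalina_home}\solr_config\solr\collection1\conf\schema.xml (skip any that already exist).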

<field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>
<field name="url" type="text_general" indexed="true" stored="true"/>
<field name="description" type="text_general" indexed="true" stored="true"/>
<dynamicField name="ignored_*" type="string" multiValued="true"/>
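The `ignored_*` dynamic field is there to catch metadata keys that do not match any declared field. As a rough illustration of where names such as `ignored_xmpdm_releasedate` in the result further down come from, here is a simplified model of the naming step. It assumes the extract handler runs with `lowernames=true` and `uprefix=ignored_`, as in the stock example solrconfig.xml; the class and method names are hypothetical and this is not the handler's actual code.

```java
import java.util.Set;

// Simplified model of Solr Cell field naming; not the handler's real code.
class ExtractFieldNames {

    // Assumed request default (uprefix), as in the stock example config.
    static final String UNKNOWN_PREFIX = "ignored_";

    // Models lowernames=true: letters and digits are lowercased,
    // every other character becomes an underscore.
    static String lowerName(String tikaName) {
        StringBuilder sb = new StringBuilder();
        for (char ch : tikaName.toCharArray()) {
            sb.append(Character.isLetterOrDigit(ch) ? Character.toLowerCase(ch) : '_');
        }
        return sb.toString();
    }

    // Models uprefix: names that the schema does not declare get the prefix,
    // so they land in the ignored_* dynamic field.
    static String targetField(String tikaName, Set<String> schemaFields) {
        String name = lowerName(tikaName);
        return schemaFields.contains(name) ? name : UNKNOWN_PREFIX + name;
    }

    public static void main(String[] args) {
        Set<String> schema = Set.of("title", "author", "url", "description", "content_type");
        String[] tikaNames = {"xmpDM:releaseDate", "Content-Type", "Author", "dc:creator"};
        for (String n : tikaNames) {
            System.out.println(n + " -> " + targetField(n, schema));
        }
    }
}
```

Under these assumptions, `Author` maps straight to the declared `author` field, while `dc:creator` ends up in `ignored_dc_creator`.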
Run the program; the code is as follows:

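import java.io.File;
import java.io.IOException;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.client.solrj.response.QueryResponse;

// Sends one file to Solr Cell (/update/extract) and prints the indexed document.
public static void indexFilesSolrCell(String fileName, String solrId)
        throws IOException, SolrServerException {
    String urlString = "http://localhost:8080/solr";
    HttpSolrServer solr = new HttpSolrServer(urlString);
    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
    up.addFile(new File(fileName), "audio/mp3");
    // literal.* values are stored as given, alongside the extracted metadata.
    up.setParam("literal.id", solrId);
    up.setParam("literal.url", "http://189.256.23.10:8080/UploadServer/upload/Woman.mp3");
    //up.setParam("literal.image", "http://189.256.23.10:8080/UploadServer/upload/Woman.jpg");
    up.setParam("literal.description", "这是mp3的简介");
    // Commit right away so the query below can see the new document.
    up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
    solr.request(up);
    // This query assumes the method was called with solrId "Woman.mp3".
    QueryResponse rsp = solr.query(new SolrQuery("id:Woman.mp3"));
    System.out.println(rsp);
}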
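For readers who want to see what the request above amounts to on the wire, the following sketch builds a roughly equivalent URL using only the JDK. It is an approximation: the file itself would still be sent as the POST body, and `commit=true` stands in for the `setAction` call. The class name `ExtractRequestUrl` and its methods are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the /update/extract URL with literal.* parameters.
class ExtractRequestUrl {

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Each map entry becomes a literal.<field>=<value> query parameter.
    static String build(String solrBase, Map<String, String> literals) {
        StringBuilder url = new StringBuilder(solrBase).append("/update/extract?commit=true");
        for (Map.Entry<String, String> e : literals.entrySet()) {
            url.append('&').append(enc("literal." + e.getKey()))
               .append('=').append(enc(e.getValue()));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> literals = new LinkedHashMap<>();
        literals.put("id", "Woman.mp3");
        literals.put("description", "这是mp3的简介");
        System.out.println(build("http://localhost:8080/solr", literals));
    }
}
```

A `LinkedHashMap` keeps the parameters in insertion order, which makes the resulting URL easier to compare against server logs.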
The output is as follows:

{
responseHeader={
status=0,
QTime=0,
params={
q=id:Woman.mp3,
wt=javabin,
version=2
}
},
response={
numFound=1,
start=0,
docs=[
SolrDocument{
ignored_meta=[
xmpDM:releaseDate,
2013-04-08,
dc:creator,
张靓颖,
xmpDM:album,
OneWoman,
Author,
张靓颖,
xmpDM:artist,
张靓颖,
creator,
张靓颖,
xmpDM:audioCompressor,
MP3,
meta:author,
张靓颖,
stream_content_type,
audio/mp3,
stream_size,
null,
Content-Type,
audio/mpeg,
dc:title,
OneWoman
],
url=http://189.256.23.10:8080/UploadServer/upload/Woman.mp3,
description=这是mp3的简介,
id=Woman.mp3,
ignored_image=[
http://189.256.23.10:8080/UploadServer/upload/Woman.jpg
],
ignored_xmpdm_releasedate=[
2013-04-08
],
ignored_xmpdm_audiochanneltype=[
Stereo
],
ignored_dc_creator=[
张靓颖
],
ignored_xmpdm_album=[
OneWoman
],
author=张靓颖,
author_s=张靓颖,
ignored_xmpdm_artist=[
张靓颖
],
ignored_channels=[
2
],
ignored_xmpdm_audiosamplerate=[
44100
],
ignored_version=[
MPEG3LayerIIIVersion1
],
ignored_creator=[
张靓颖
],
ignored_xmpdm_audiocompressor=[
MP3
],
title=[
OneWoman
],
title_copy=OneWoman,
ignored_samplerate=[
44100
],
ignored_meta_author=[
张靓颖
],
ignored_stream_content_type=[
audio/mp3
],
ignored_stream_size=[
null
],
content_type=[
audio/mpeg
],
ignored_dc_title=[
OneWoman
],
content=[
OneWomanOneWoman张靓颖OneWoman2013-04-08
],
content_copy=OneWomanOneWoman张靓颖OneWoman2013-04-08,
_version_=1431728246751232000
}
]
}
}
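As you can see, a single mp3 file yields a lot of indexed content, so storing the data in the database afterwards is no longer a problem. You can also run the same query directly from the Query page in the web UI; the result is the same.
Note: because all of our solr config files were copied from the example, some paths need to be changed. I have switched them to absolute paths for now. Check the <lib> tags in solrconfig.xml and make sure the paths are correct.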
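As a sketch of the "store it in the database" step, the code below pulls the author, title, and URL out of a returned document. It relies on SolrJ's `SolrDocument` being readable as a `Map<String, Object>`, and uses a plain map here so the example runs without a server. The `Mp3Record` class, the `mp3_file` table, and its columns are hypothetical and not part of the original setup.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical value object for one row of a hypothetical mp3_file table.
class Mp3Record {

    // Parameterized insert; table and column names are illustrative only.
    static final String INSERT_SQL =
        "INSERT INTO mp3_file (id, title, author, url) VALUES (?, ?, ?, ?)";

    final String id;
    final String title;
    final String author;
    final String url;

    Mp3Record(String id, String title, String author, String url) {
        this.id = id;
        this.title = title;
        this.author = author;
        this.url = url;
    }

    // multiValued fields (such as title) come back as collections,
    // so take the first element; single values are used directly.
    static String firstValue(Object value) {
        if (value instanceof Collection) {
            Collection<?> c = (Collection<?>) value;
            return c.isEmpty() ? null : String.valueOf(c.iterator().next());
        }
        return value == null ? null : value.toString();
    }

    static Mp3Record fromDocument(Map<String, Object> doc) {
        return new Mp3Record(
            firstValue(doc.get("id")),
            firstValue(doc.get("title")),
            firstValue(doc.get("author")),
            firstValue(doc.get("url")));
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of(
            "id", "Woman.mp3",
            "title", List.of("OneWoman"),
            "author", "张靓颖",
            "url", "http://189.256.23.10:8080/UploadServer/upload/Woman.mp3");
        Mp3Record r = fromDocument(doc);
        System.out.println(r.id + " | " + r.title + " | " + r.author + " | " + r.url);
    }
}
```

The four fields could then be bound, in order, to a `PreparedStatement` created from `INSERT_SQL`.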
