[ Solr memo ]

tile · posted on 2015-7-17 09:53:28

　　1. Notes on the "sorting by popularity" problem
　　While reading the documentation I found: ExternalFileField is handy for cases where you want to update a particular field in many documents more often than you want to update the rest of the documents. For example, suppose you have some kind of document rank based on the number of views. You might want to update the rank of all the documents daily or hourly, while the rest of the contents of the documents might be updated much less frequently.
　　Without ExternalFileField, you would need to update each document just to change the rank. Using ExternalFileField is much more efficient because all document values for a particular field are stored in an external file that can be updated as frequently as you wish.
　　An attribute in the field type declaration, valType, specifies the actual type of the values that will be found in the file. Note that only pfloat fields are currently supported.

　　The file itself is located in Solr's index directory, which by default is data/index in the Solr home directory.
　　The file contains entries that map a key field, on the left of the equals sign, to a value, on the right.　　
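
The key=value layout described above can be generated with a few lines of Java. This is only a minimal sketch, not Solr's own tooling: the class name ExternalRankFile, the field name "rank" and the target directory are hypothetical, and the file name external_<fieldname> is the naming convention I understand Solr to expect, so check it against the documentation for your version.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: turns a map of (key field value -> rank) into an external file.
final class ExternalRankFile {

    /** Renders one "key=value" line per document, ordered by key. */
    static List<String> toLines(Map<String, Float> rankByDocId) {
        List<String> lines = new ArrayList<>();
        // TreeMap gives a stable, sorted order so repeated runs produce identical files.
        for (Map.Entry<String, Float> e : new TreeMap<>(rankByDocId).entrySet()) {
            lines.add(e.getKey() + "=" + e.getValue());
        }
        return lines;
    }

    /** Writes the lines to dir/external_<fieldName> (assumed naming) and returns the path. */
    static Path write(Path dir, String fieldName, Map<String, Float> rankByDocId)
            throws IOException {
        Path target = dir.resolve("external_" + fieldName);
        return Files.write(target, toLines(rankByDocId), StandardCharsets.UTF_8);
    }
}
```

A scheduled job could call ExternalRankFile.write(dir, "rank", ranks) hourly or daily with freshly computed view counts, which is the update pattern the quoted passage has in mind. The documents themselves are never re-indexed.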
　　(20120215)
　　
　　2. Use cases for field type properties
　　Field Type Properties by Use Case

Use Case                                      indexed  stored  multiValued  omitNorms  termVectors  termPositions
search within field                           true
retrieve contents                                      true
use as unique key                             true             false
sort on field                                 true             false        true
use field boosts                                                            false
document boosts affect searches within field                                false
highlighting                                  true     true                            true         true
faceting                                      true
add multiple values, maintaining order                         true
field length affects doc score                                              false
MoreLikeThis                                                                           true

　　(20120215)
　　
　　3. Lucene's near-real-time (NRT) search is fast!
　　Near Realtime!
　　Near realtime search means that documents are available for search almost immediately after being indexed - additions and updates to documents are seen in 'near' realtime.
　　[lucene wiki http://wiki.apache.org/lucene-java/NearRealtimeSearch]
　　...One goal of the near realtime search design is to make NRT as transparent as possible to the user. Another is to minimize the latency between making an update and performing a search that includes the update...IndexWriter manages the subreaders internally so there is no need to call reopen; instead getReader may be used. NRT adds an internal RAM directory (LUCENE-1313) to IndexWriter where documents are flushed to before being merged to disk. This technique decreases the turnaround time required for updating the index when calling getReader...

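// Fragment from the wiki excerpt above; IndexWriter setup is omitted there as well.
IndexWriter writer; // create an IndexWriter here
Document doc = new Document(); // add the document's fields here (a null doc cannot be indexed)
writer.addDocument(doc); // add the new document to the index
IndexReader reader = writer.getReader(); // near-real-time reader that already sees the new doc
Document addedDoc = reader.document(0); // fetch the document with internal id 0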

　　[solr wiki http://wiki.apache.org/solr/NearRealtimeSearch]
Near realtime search means that documents are available for search almost immediately after being indexed - additions and updates to documents are seen in 'near' realtime.
Near realtime search will be added to Solr in version 4.0 and is currently available on trunk.
You can now modify a commit command to be a 'soft' commit. A soft commit will avoid parts of the standard commit that can be costly. You still will want to do normal commits to ensure that documents are on stable storage, but soft commits allow users to see a very near realtime view of the index in the meantime. Be sure to pay special attention to cache and autowarm settings as they can have a significant impact on NRT performance.
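
As an illustration of "modifying a commit command", the sketch below builds the XML update message for a soft or a hard commit and posts it to an update handler using only the JDK. Treat it as a sketch under assumptions rather than a reference client: the softCommit attribute is the one I recall from the UpdateXmlMessages page linked in this note, and the class name SolrCommitSketch and the URL http://localhost:8983/solr/update are hypothetical placeholders for your own setup.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr or SolrJ.
final class SolrCommitSketch {

    /** Builds the XML update message for a soft commit (soft=true) or a normal commit. */
    static String commitMessage(boolean soft) {
        return soft ? "<commit softCommit=\"true\"/>" : "<commit/>";
    }

    /** Posts an XML update message to the given update handler URL; returns the HTTP status. */
    static int post(String updateUrl, String xml) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }
}
```

Against a running instance, SolrCommitSketch.post("http://localhost:8983/solr/update", SolrCommitSketch.commitMessage(true)) would make recent additions visible, while an occasional commitMessage(false) covers the durability side that the paragraph above asks for.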

You can read about soft commits here: http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
You can see how to auto soft commit here: http://wiki.apache.org/solr/SolrConfigXml?#Update_Handler_Section

A common configuration might be to 'hard' auto commit every 1-10 minutes and 'soft' auto commit every second. With this configuration, new documents will show up within about a second of being added, and if the power goes out, you will be certain to have a consistent index up to the last 'hard' commit.
There is also a blog post detailing some of the current improvements in this area on trunk located here: http://www.lucidimagination.com/blog/2011/07/11/benchmarking-the-new-solr-%E2%80%98near-realtime%E2%80%99-improvements/
Other references:
　　http://java.dzone.com/news/lucenes-near-real-time-search
　　(20120220)