设为首页 收藏本站
查看: 666|回复: 0

[经验分享] Full-text Search for SQLite

[复制链接]

尚未签到

发表于 2016-11-29 08:02:57 | 显示全部楼层 |阅读模式


Full-text Search for SQLite

(as of 2006-09-29)





Introduction
  The module fts1
(see also FtsTwo

and FtsUsage

)
adds full-text indexing capabilities to SQLite. It is not yet included
in the SQLite binary distribution; you can find it in the SQLite CVS
tree in the ext/fts1
directory.


  Building fts1


  fts1 can be built either as a standalone shared library, or statically linked into the SQLite library.


  As a shared library


  To build the module as a shared library, compile all source
files in the fts1 directory into a shared library (.so or .dll) on your
platform. (Sorry - there's no makefile checked in yet. Coming soon.)


  Statically linked


  To statically link fts1 into SQLite, add all .c files from the
fts1 directory to the Makefile you use to compile SQLite so that they
will be linked into the SQLite image. You must define the preprocessor
symbols SQLITE_CORE and SQLITE_ENABLE_FTS1 when compiling these files.




  Initializing fts1


  When fts1 is built as a shared library, you can load it into SQLite using the ".load" shell command.



  sqlite> .load fts1


  Or you can load it using a SELECT statement:



  SELECT load_extension('fts1');


  Note that you may need to call sqlite3_enable_load_extension before loading the extension; see the SQLite LoadableExtensions


documentation.

  In a static build, fts1 is always available; there's no need to load or otherwise initialize it.


  Using fts1


  Full-text tables store one or more columns of fully indexed
text. You can create a full-text table using the CREATE VIRTUAL TABLE
statement. For example, the following creates a table with columns name
and ingredients
:



  sqlite>create virtual table recipe using fts1(name, ingredients);


  You can insert rows into a full-text table in the same way as into an ordinary table with columns of type TEXT:



  sqlite>insert into recipe (name, ingredients) values ('broccoli stew', 'broccoli peppers cheese tomatoes');

sqlite>insert into recipe (name, ingredients) values ('pumpkin stew', 'pumpkin onions garlic celery');

sqlite>insert into recipe (name, ingredients) values ('broccoli pie', 'broccoli cheese onions flour');

sqlite>insert into recipe (name, ingredients) values ('pumpkin pie', 'pumpkin sugar flour butter');


  The MATCH operator performs a full-text match on a column in a full-text table:



  sqlite> select rowid, name, ingredients from recipe where name match 'pie';

3|broccoli pie|broccoli cheese onions flour

4|pumpkin pie|pumpkin sugar flour butter

sqlite>


  As can be seen in the preceding output, every row in a full-text table has a unique rowid, just as in any other SQLite table.


  A query may contain multiple terms, in which case it will return only documents containing all of the terms:



  sqlite> select name, ingredients from recipe where ingredients match 'onions cheese';

broccoli pie|broccoli cheese onions flour

sqlite>


  OR queries


  Inside a query, the OR operator may be used to retrieve documents containing either of two terms:



  sqlite> select name, ingredients from recipe where ingredients match 'onions OR cheese';

broccoli stew|broccoli peppers cheese tomatoes

pumpkin stew|pumpkin onions garlic celery

broccoli pie|broccoli cheese onions flour

sqlite>


  Note that the OR in this query must be capitalized.


  The OR operator binds more tightly than the implicit AND between
two adjacent terms. Thus, the query 'onions OR cheese pumpkin' matches
text which contains either "onions" or "cheese", and also contains
"pumpkin":



  sqlite> select name, ingredients from recipe where ingredients match 'onions OR cheese pumpkin';

pumpkin stew|pumpkin onions garlic celery

sqlite>


  fts1 does not currently provide any grouping operator (i.e. parentheses) for overriding this default precedence.


  Excluding terms


  The - operator excludes any documents containing the term which follows it:



  sqlite> select name, ingredients from recipe where ingredients match 'onions -celery';

broccoli pie|broccoli cheese onions flour

sqlite>


  Note that a query must contain at least one non-excluded term:



  sqlite> select name, ingredients from recipe where ingredients match '-celery';  /* invalid! */

SQL error: SQL logic error or missing database

sqlite>


  Phrase searches


  Phrases may be enclosed in double quotes:



  sqlite> select name, ingredients from recipe where ingredients match '"broccoli cheese"';

broccoli pie|broccoli cheese onions flour

sqlite>


  Querying multiple columns


  To query all columns in a full-text table simultaneously, use the table's name on the left-hand side of the MATCH operator:



  sqlite> select name, ingredients from recipe where recipe match 'pie';

broccoli pie|broccoli cheese onions flour

pumpkin pie|pumpkin sugar flour butter

sqlite>


  When an all-column query contains multiple terms, a row will match even if the terms appear in different columns in the row:



  sqlite> select name, ingredients from recipe where recipe match 'sugar pie';

pumpkin pie|pumpkin sugar flour butter

sqlite>


  Any term in a query string may be preceded by the name of a particular column to use for matching that term:



  sqlite> select name, ingredients from recipe where recipe match 'name:pie ingredients:onions';

broccoli pie|broccoli cheese onions flour

sqlite>


  The following are entirely equivalent:



  sqlite> select name from recipe where ingredients match 'sugar';

sqlite> select name from recipe where recipe match 'ingredients:sugar';


  When a specific column name appears to the left of the MATCH
operator, that column is used for matching any term without an explicit
column qualifier. Thus, the following are equivalent:



  sqlite> select name from recipe where recipe match 'name:pie ingredients:onions';

sqlite> select name from recipe where name match 'pie ingredients:onions';


  Note that fts1 currently requires a query to contain at most one
instance of the MATCH operator. This means that to match multiple
specific columns you must use field specifiers as described above; the
following won't work:



  sqlite> select name, ingredients from recipe

...>        where name match 'pie' and ingredients match 'onions';   /* invalid! */

SQL error: unable to use function MATCH in the requested context

sqlite>


  Retrieving offset information


  When returning full-text matches, fts1 can return the character
offsets of individual term matches. To retrieve offset information in a
query, call the offsets
function and pass it the name of your full-text table:



  sqlite> select name, ingredients, offsets(recipe) from recipe where recipe match 'sugar pie';

pumpkin pie|pumpkin sugar flour butter|0 1 8 3 1 0 8 5

sqlite>


  offsets
returns a string containing a series of
integers separated by spaces. Each match is represented by a series of
four consecutive integers:





  • The index of the column containing the match.  Columns are numbered starting from 0.


  • The term in the query expression which was matched.  Terms are numbered starting from 0.


  • The byte offset of the first character of the matching phrase, measured from the beginning of the column's text.


  • Number of bytes in the match.

  For example, in the query above two matches are returned, which we can decode as follows:





  • In column 0 (name), term 1 ("pie") matched at byte offset 8; the match is 3 bytes long (the length of "pie").


  • In column 1 (ingredients), term 0 ("sugar") matched at byte offset 8; the match is 5 bytes long (the length of "sugar").

  Generating snippets


  fts1 can generate snippets
of text surrounding each match returned in a full-text query.  To produce a snippet, call the snippet
function and pass it the name of your full-text table:



  sqlite> create virtual table poem using fts1(name, text);

sqlite> insert into poem values ('ozymandias', 'i met a traveller from an antique land who said: two vast and trunkless legs of stone stand in the desert');

sqlite> select name, snippet(poem) from poem where text match 'land';

ozymandias|i met a traveller from an antique <b>land</b> who said: two vast and trunkless legs of <b>...</b>

sqlite>


  By default, each matching term in a snippet is surrounded with the
delimiters "<b>" and "</b>", and an ellipsis "..."
indicates text not included in a snippet. (Note that the ellipsis is
itself surrounded by delimiters "<b>" and "</b>".)


  You can specify your own term delimiters and ellipsis text by specifying extra arguments to the snippet
function.  Its arguments are as follows:





  • The name of the virtual table being queried.


  • Markup to put before each matching word. Default: <b>


  • Markup to put after each matching word. Default: </b>


  • Ellipsis markup. Default: <b>...</b>

  For example:



  sqlite> select name, snippet(poem, '[', ']', '%%') from poem where text match 'land';

ozymandias|i met a traveller from an antique [land] who said: two vast and trunkless legs of %%

sqlite>


  Joining full-text data


  A full-text table stores only full-text-indexed strings. To
store full-text-indexed values along with other values, it's convenient
to use both a full-text table and an ordinary SQLite table, joined by
rowid. For example, suppose that you'd like to store a set of messages;
each message has a sender and priority, which are not full-text
indexed, and a subject and body, which are full-text indexed. You can
use the following schema:



  create table email(sender text, priority integer);

create virtual table email_text using fts1(subject, body);


  You can insert a new message as follows:



  insert into email (sender, priority) values ('ashley@foo.com', 4);

insert into email_text (rowid, subject, body) values (last_insert_rowid(), 'update', 'coming home now');


  To find the sender of all messages containing the word "jam" you can issue a query joining the email and email_text tables:



  select sender, subject from email join email_text on email.rowid = email_text.rowid

where body match 'jam';


  Tokenization


  As the module indexes a piece of text, it converts the text to a
sequence of tokens. Each token becomes a term in the index and can be
matched using a full-text query.


  The module currently uses the following generic tokenization
mechanism. A token is a contiguous sequence of alphanumeric ASCII
characters (A-Z, a-z and 0-9). All non-ASCII characters are ignored.
Each token is converted to lowercase before it is stored in the index,
so all full-text searches are case-insensitive. The module does not
perform stemming of any sort.


  Soon, we hope to allow applications to define their own
tokenizers (we in fact already have a generic tokenizer mechanism in
our code; we just have yet to expose it to the outside world).


  Performance


  There are two steps you can take to greatly improve performance
when inserting documents into a full-text index. First, you can set the
synchronous pragma to OFF:



   sqlite>pragma synchronous = off;


  In our testing we've found that this dramatically increases indexing
speed. Of course, you should study the SQLite documentation (see http://www.sqlite.org/pragma.html
)
to understand the safety/speed tradeoff which this pragma controls and
to determine the setting which is right for your application.


  Secondly, you can index more than one document per transaction.
In our testing, we've found throughput to be best when we index at
least 50 or so documents in each transaction; this dramatically
improves performance over the one-document-per-transaction case.


  We're still in the process of assessing and improving performance.


  Internals


  Every full-text table contains a column with the same name as
the table itself. This self-named column is used in multi-column
queries as described above, and will appear in queries for * :



  sqlite> create virtual table foo using fts1(name, address);

sqlite> insert into foo (name, address) values ('amanda', '43 elm avenue');

sqlite> .header on

sqlite> select * from foo;

name|address|foo

amanda|43 elm avenue|8&#232;5

sqlite>


  This column's value is private to fts1. When querying a full-text
table, you probably want to specify column names explicitly rather than
using *, to avoid seeing this self-named column.


  When you create a full-text table, fts1 creates two backing tables which hold the table's contents and full-text index:



  sqlite> .tables

foo foo_content foo_term

sqlite>


  You should not access the backing tables directly; their format is internal to fts1 and is subject to change.


  Caveats


  Please note that the full-text database format is subject to
change at any time. We are not planning to implement backward
compatibility in updates in the near future, so new code releases may
fail spectacularly with old databases. Of course this will change at
some future point once our data structures become more stable.


  Missing features


  The full-text module is still in an early development phase.  The following features are missing but hopefully coming soon:





  • We plan to support prefix queries (e.g. "foo*").


  • Applications will be able to specify custom tokenizers.

  Requests/ideas/suggestions





  • Some way to return how many matches we had, or some "relevance" value (lalo dot martins at gmail dot com, 07-aug-04
    )

运维网声明 1、欢迎大家加入本站运维交流群:群②:261659950 群⑤:202807635 群⑦870801961 群⑧679858003
2、本站所有主题由该帖子作者发表,该帖子作者与运维网享有帖子相关版权
3、所有作品的著作权均归原作者享有,请您和我们一样尊重他人的著作权等合法权益。如果您对作品感到满意,请购买正版
4、禁止制作、复制、发布和传播具有反动、淫秽、色情、暴力、凶杀等内容的信息,一经发现立即删除。若您因此触犯法律,一切后果自负,我们对此不承担任何责任
5、所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其内容的准确性、可靠性、正当性、安全性、合法性等负责,亦不承担任何法律责任
6、所有作品仅供您个人学习、研究或欣赏,不得用于商业或者其他用途,否则,一切后果均由您自己承担,我们对此不承担任何法律责任
7、如涉及侵犯版权等问题,请您及时通知我们,我们将立即采取措施予以解决
8、联系人Email:admin@iyunv.com 网址:www.yunweiku.com

所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其承担任何法律责任,如涉及侵犯版权等问题,请您及时通知我们,我们将立即处理,联系人Email:kefu@iyunv.com,QQ:1061981298 本贴地址:https://www.yunweiku.com/thread-306875-1-1.html 上篇帖子: SQLITE源码剖析(10) 下篇帖子: Sqlite 数据库源码
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

扫码加入运维网微信交流群X

扫码加入运维网微信交流群

扫描二维码加入运维网微信交流群,最新一手资源尽在官方微信交流群!快快加入我们吧...

扫描微信二维码查看详情

客服E-mail:kefu@iyunv.com 客服QQ:1061981298


QQ群⑦:运维网交流群⑦ QQ群⑧:运维网交流群⑧ k8s群:运维网kubernetes交流群


提醒:禁止发布任何违反国家法律、法规的言论与图片等内容;本站内容均来自个人观点与网络等信息,非本站认同之观点.


本站大部分资源是网友从网上搜集分享而来,其版权均归原作者及其网站所有,我们尊重他人的合法权益,如有内容侵犯您的合法权益,请及时与我们联系进行核实删除!



合作伙伴: 青云cloud

快速回复 返回顶部 返回列表