A Quick Guide to SQLite and Ruby
I spent last night playing with SQLite
and am convinced that this is a tool which could prove incredibly useful to
coders and a great tool for learners to check out SQL.
The problem is that there isn’t enough documentation for Ruby users who want to
take advantage of SQLite’s features.
So, let’s talk about SQLite’s handsome features:
- SQLite is swift. In my own testing, I have found it to be speedy.
Some speed comparisons with MySQL and PostgreSQL are here.
- SQLite is not a large database server, such as MySQL. You don’t connect to the database. Using
SQLite, you access a database file. Everything happens in-process.
- SQLite is an ACID database. It supports transactions and triggers.
- SQLite is public domain. Absolutely no licensing issues.
- SQLite is typeless. Any type or length of data may be stored in a
column, regardless of the declared type. This allows extreme flexibility and
avoidance of type errors.
- SQLite allows custom functions and aggregates. This is my favorite
feature of SQLite, which we will explore shortly.
Getting Started with SQLite
SQLite is available for most platforms (Linux, BSD,
Windows) from the download
page. SQLite comes with a command-line tool for managing databases. You can
find a decent tutorial for starting with SQLite here.
In RAA, you’ll find several Ruby
libraries for using SQLite. The ruby-dbi module is great if you want
your code to work if you switch databases, but you’re hampered in using some of
SQLite’s features. (If you plan on using ruby-dbi, be aware of how SQLite’s SQL compares to SQL-92, so that your queries stay portable as well.)
The other two libraries (ruby-sqlite and sqlite-ruby) have custom APIs for
accessing SQLite, which will allow us to add custom functions and aggregates, as
well as set table metadata. I suggest sqlite-ruby, as it is a bit more
feature complete. Either will work fine, but the rest of this tutorial will
focus on using sqlite-ruby.
Creating a Database
To open a new database (or an existing one), simply instantiate a SQLite::Database object with the name of the database
file:
require 'sqlite'
db = SQLite::Database.new( 'sample.db', 0644 )
According to SQLite
docs, the second argument passed to the constructor is “intended to signal
whether the database is going to be used for reading and writing or just for
reading.” But in current implementations, this argument is ignored. All
databases are opened for both reading and writing, though it is anticipated that
readonly databases could be added in the future.
SQLite stores all of the data for a database inside a single file. This
encompasses all indices, tables and schemas for the entire database. The
advantage is that this single file can be easily transported wherever you like.
The same database file can be included with your software and accessed on
Windows, Linux, or any other supported platform.
The disadvantage to a single database file is that this file can grow quite
large. Even after you’ve deleted rows or entire tables, your file may not
decrease in size. To free the disk space once again, you’ll need to execute the
VACUUM statement, which cleans up tables and
indices. The VACUUM statement can be run
alone to clean the whole database.
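Here is a minimal sketch of running VACUUM from Ruby and checking how much space it gave back. The helper name vacuum_and_report is made up for this example, and it assumes only that db responds to execute, as the SQLite::Database object above does.

```ruby
# Hypothetical helper: run VACUUM and return how many bytes the file shrank by.
# Assumes `db` responds to #execute like the SQLite::Database object above,
# and that `path` is the database file that `db` has open.
def vacuum_and_report(db, path)
  before = File.size(path)
  db.execute("VACUUM;")
  before - File.size(path)
end
```

Calling vacuum_and_report( db, 'sample.db' ) after a large delete should give a rough idea of whether the cleanup was worth it.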
Passing Queries to SQLite
The execute method can be used to pass
queries to your database, once it is open.
db.execute <<SQL
  CREATE TABLE sites (
    idx INTEGER PRIMARY KEY,
    url VARCHAR(255)
  );
SQL
You can also test the completeness of your SQL
statements with the complete?
method.
>> db.complete? "SELECT *"
=> false
>> db.complete? "SELECT * FROM email;"
=> true
On its own, execute simply returns
an Array of Hashes as the result set. Passing a block to execute causes the block to be called once for each
row as it is loaded. In that case it becomes a sort of “each_row” for
a query, each time receiving a Hash of field-value pairs.
db.execute( "SELECT * FROM sites;" ) do |site|
  puts "-> Site #%d %s" % [ site['idx'], site['url'] ]
end
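Without a block, the return value can be handled like any other Array. The sketch below uses a hand-written Array of Hashes as a stand-in for what execute returns. The rows are made-up sample data, and storing the values as Strings is an assumption on my part.

```ruby
# Stand-in for the result of db.execute( "SELECT * FROM sites;" ) without a
# block: an Array of Hashes keyed by column name. The rows are invented, and
# the values being Strings is an assumption.
rows = [
  { 'idx' => '1', 'url' => 'http://example.com/' },
  { 'idx' => '2', 'url' => 'http://example.org/' }
]

# Ordinary Enumerable methods work on the whole result set.
urls   = rows.map { |site| site['url'] }
by_idx = rows.each_with_object({}) { |site, h| h[site['idx']] = site['url'] }
```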
Vital Pragma
SQLite has a few features enabled by default that you might consider
disabling. Turning them off is an optimization with consequences, so I present them for
your careful thought. I am giving you the basics. Further optimizations can be
had at the SQLite
Optimization FAQ.
The cache_size setting determines how
many database pages can be kept in memory. The default setting is 2000, counted in 1KB chunks. Consider increasing this
before executing queries on large sets of data (especially updates to large
tables). This setting can dramatically speed up such situations. Use PRAGMA cache_size to set it.
By calling PRAGMA
default_synchronous=OFF;, you can turn off the intensive
synchronization of the database. When synchronization is on, SQLite waits for data to
be completely written to disk before continuing. On truly mission-critical apps, this may
be necessary, but generally you can turn it off.
If you’re not worried about how many rows are affected following an
UPDATE or INSERT, consider using PRAGMA count_changes=OFF;, which will disable
counting of affected rows. A smaller speedup in this case, but still worth
noting.
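The three pragmas above can be issued through execute like any other statement. This is a sketch, not a recommendation: the helper name tune_for_bulk_work and the cache size of 10000 are my own inventions, and db is assumed to respond to execute as SQLite::Database does.

```ruby
# Hypothetical helper: apply the three pragmas discussed above.
# Assumes `db` responds to #execute like SQLite::Database in this guide.
# The default of 10_000 pages is an arbitrary example value.
def tune_for_bulk_work(db, cache_pages = 10_000)
  [
    "PRAGMA cache_size=#{Integer(cache_pages)};",
    "PRAGMA default_synchronous=OFF;",
    "PRAGMA count_changes=OFF;"
  ].each { |pragma| db.execute(pragma) }
end
```

Keeping the statements in one place makes it easy to drop the synchronous line again if the data turns out to matter more than the speed.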
Custom Functions
SQLite comes with a variety of common functions for forming
expressions. For example, you may want to uppercase a field you are reading:
db.execute( "SELECT UPPER(url) FROM sites;" )
You can add your own Ruby functions to SQLite by using the create_function method. To make our own function for
reversing a field’s contents:
db.create_function( 'revers', 1,
  proc { |ctx,s| s.to_s.reverse }, nil )
The first parameter we pass in is the name of the function to create. SQLite
will ignore casing of this string. The second parameter indicates the number of
parameters to send to the function. The third parameter is a Proc object. The
fourth parameter should allow you to pass further data into the Proc, but
doesn’t appear to be implemented at the time of this writing.
The proc object you create should receive an extra initial argument, listed
above as ctx. This is a SQLite::Context object, which allows you to store data
between calls. I’ve found this object to be quite buggy when used in functions.
But, hey, it’s there.
To call our new revers function:
db.execute( "SELECT REVERS(url) FROM sites;" )
One thing to note about the create_function method is that your proc should not
return any sort of object which is a collection (Array, Hash, etc.). The object
won’t make the translation in and out of the database.
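One way around the collection limit is to flatten the collection into a String before returning it. The proc below has the same shape as the one passed to create_function above. The name url_parts and the separator are made up for this sketch.

```ruby
# A proc shaped like the ones passed to create_function above: the first
# argument is the context (unused here), the second is the column value.
# `url_parts` is a made-up example name.
url_parts = proc do |ctx, s|
  parts = s.to_s.split('/').reject { |part| part.empty? }
  parts.join(' | ')   # return a String, not the Array itself
end

url_parts.call(nil, 'http://example.com/docs/sqlite')
# => "http: | example.com | docs | sqlite"
```

It would be registered the same way as revers, with db.create_function( 'url_parts', 1, url_parts, nil ).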
As in Ruby, you may also override the current set of functions. For example,
the Y LIKE X syntax is syntactic sugar for
the like(X,Y) function. If you want to support regular
expressions in your LIKE statement, you could override
LIKE to do so:
like_function = proc do |ctx, x, y|
  1 if /#{ x }/ =~ y
end
db.create_function( 'like', 2, like_function, nil )
db.execute( "SELECT url FROM sites WHERE url LIKE '^http:'" )
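The argument order is easy to get backwards, so here is the same proc exercised directly, outside of SQLite. The sample URLs are made up.

```ruby
# The same proc as above, called by hand. For `Y LIKE X`, SQLite calls
# like(X, Y), so `x` is the pattern and `y` is the column value.
like_function = proc do |ctx, x, y|
  1 if /#{ x }/ =~ y
end

like_function.call(nil, '^http:', 'http://example.com/')   # => 1
like_function.call(nil, '^http:', 'ftp://example.com/')    # => nil
```

Note that the pattern is interpolated into the Regexp as-is, so a malformed pattern such as '(' will raise a RegexpError from inside the query.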
Custom Aggregates
Aggregates are similar to functions, but their result is accumulated over a set
of rows. If you’ve used much SQL, you’ve seen these
before in the form of the count, avg, or sum
functions.
To create an aggregate, you provide two procs. The first is called for each
row, like a function. The second is called upon completion of the query and
provides a final total.
sum_up_1 = proc do |ctx, a|
  ctx.properties["sum"] ||= 0
  ctx.properties["sum"] += a.length
end
sum_up_2 = proc do |ctx|
  ctx.properties["sum"]
end
db.create_aggregate( 'letter_count', 1,
  sum_up_1, sum_up_2, nil )
db.execute( "SELECT LETTER_COUNT(address) FROM email" )
The above code totals the letter count for all of the address fields in a set
of rows.
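To see the two-proc flow without a database, the sketch below drives the same procs by hand. The fake_context Struct is a hypothetical stand-in for SQLite::Context. The only thing it assumes about the real object is a Hash-like properties method, and the addresses are made up.

```ruby
# `fake_context` is a hypothetical stand-in for SQLite::Context; all it
# provides is a Hash-like #properties, which is all these procs use.
fake_context = Struct.new(:properties)

sum_up_1 = proc do |ctx, a|
  ctx.properties["sum"] ||= 0
  ctx.properties["sum"] += a.length
end

sum_up_2 = proc do |ctx|
  ctx.properties["sum"]
end

# Simulate one query: the step proc runs once per row, the final proc once.
ctx = fake_context.new({})
addresses = ['a@example.com', 'bob@example.org']   # made-up sample rows
addresses.each { |address| sum_up_1.call(ctx, address) }
total = sum_up_2.call(ctx)   # => 28
```

If the step proc never runs (no rows), the final proc here returns nil rather than 0, which may or may not be what you want.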
So how does SQLite do this? Remember that since SQLite is executed
in-process, you can pass memory addresses to it. A function pointer is passed
inside the SQLite extension, which calls your proc. I haven’t done any
benchmarking, but I imagine the figures are pretty tight for these calls.
Storing Binary Data
Storing binary data is a big use case for SQLite. If I were going to write an
adventure game in Ruby, I would lodge all my scenes and characters in a SQLite
database.
But remember I said that SQLite was typeless? This means that you
can’t get away with storing binary data in a BLOB.
BLOBs, CHARs, and TEXTs are all the
same datatype, which only stores null-terminated strings. SQLite comes with two
API functions, sqlite_encode_binary and sqlite_decode_binary, but these aren’t implemented in
any Ruby APIs currently.
A quick solution is to use Ruby’s base64 library. Really, base64 is a bit much, since we only need to
escape ’\000’ (which is what sqlite_encode_binary does). Until we can get those
functions exposed, though, certainly use base64.
Let’s declare our table with a BLOB to indicate that
we plan to store binary data and to give our table some degree of
portability.
db.execute <<SQL
  CREATE TABLE scenes (
    idx INTEGER PRIMARY KEY,
    background_png BLOB
  );
SQL
To store binary data in our table:
require 'base64'
# Read in binary mode, and let the block close the file for us.
background_png = File.open( 'background.png', 'rb' ) { |f| f.read }
db.execute( "INSERT INTO scenes (background_png) VALUES " +
  "('#{ Base64.encode64( background_png ) }');" )
To read binary data from our table and write it out to files:
db.execute( "SELECT * FROM scenes" ) do |scene|
  background_png = Base64.decode64( scene['background_png'] )
  # Name each file after the row's idx and write it in binary mode.
  File.open( "back-#{ scene['idx'] }.png", "wb" ) do |back_out|
    back_out << background_png
  end
end
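If you want to convince yourself that the round trip is lossless before involving the database, a quick check like the one below does it. The payload is a made-up byte string that happens to contain NUL bytes, not a real image.

```ruby
require 'base64'

# Made-up binary payload containing the NUL bytes that a null-terminated
# string column would choke on.
original = "\x89PNG\r\n\x1A\n\x00\x00\x00\rIHDR".b

encoded = Base64.encode64(original)   # plain ASCII text, safe to store
decoded = Base64.decode64(encoded)    # what you would do after the SELECT

decoded == original        # => true
encoded.include?("\0")     # => false
```

The encoded text also contains no single quotes, which is why it can be dropped straight into the quoted SQL literal above.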
Alternatively (if you’re mental), you could load the schema for your database
and parse out the blobs. Try this query, after creating the scenes table:
SELECT sql FROM
(SELECT * FROM sqlite_master UNION ALL
SELECT * FROM sqlite_temp_master)
WHERE tbl_name = 'scenes' AND type != 'meta'
You’ll receive the CREATE TABLE
statement we used to create the table. BLOBs could be
parsed out when the database is loaded and handled differently. (To myself:
why am I even suggesting this?! Probably to demonstrate metadata access
without having to write a new section on it!)
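For the curious, here is a naive sketch of what parsing out the BLOBs might look like. It assumes the query hands back the CREATE TABLE text as it was written above, and the variable names are my own. It will not cope with constraints or types that contain commas.

```ruby
# The CREATE TABLE text, assumed to come back from the sqlite_master query
# above exactly as it was written.
create_sql = <<SQL
CREATE TABLE scenes (
  idx INTEGER PRIMARY KEY,
  background_png BLOB
);
SQL

# Naive sketch: treat each comma-separated entry between the outer
# parentheses as "name type ..." and keep the names declared as BLOB.
column_defs  = create_sql[/\((.*)\)/m, 1].split(',')
blob_columns = column_defs.map { |defn| defn.strip.split(/\s+/, 3) }
                          .select { |name, type| type.to_s.upcase == 'BLOB' }
                          .map { |name, _type| name }
# blob_columns => ["background_png"]
```

With that list in hand, a reader loop could decode only the columns named in blob_columns and leave the rest alone.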
Conclusion
Hopefully this is a fitting introduction to SQLite in Ruby. If not, please
contact me and spew wisdom.