Efficient Pagination Using MySQL
Surat Singh Bhati (surat@yahoo-inc.com)
Rick James (rjames@yahoo-inc.com)
Yahoo Inc
Percona Performance Conference 2009
- 2 -
Outline
1. Overview
– Common pagination UI pattern
– Sample table and typical solution using OFFSET
– Techniques to avoid large OFFSET
– Performance comparison
– Concerns
- 3 -
Common Patterns
- 4 -
Basics
First step toward having efficient pagination over large data set
– Use index to filter rows (resolve WHERE)
– Use same index to return rows in sorted order (resolve ORDER)
Step zero
– http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html
– http://dev.mysql.com/doc/refman/5.1/en/order-by-optimization.html
– http://dev.mysql.com/doc/refman/5.1/en/limit-optimization.html
- 5 -
Using Index
KEY a_b_c (a, b, c)
ORDER may get resolved using Index
– ORDER BY a
– ORDER BY a,b
– ORDER BY a, b, c
– ORDER BY a DESC, b DESC, c DESC
WHERE and ORDER both resolved using index:
– WHERE a = const ORDER BY b, c
– WHERE a = const AND b = const ORDER BY c
– WHERE a = const ORDER BY b, c
– WHERE a = const AND b > const ORDER BY b, c
ORDER will not get resolved uisng index (file sort)
– ORDER BY a ASC, b DESC, c DESC /* mixed sort direction */
– WHERE g = const ORDER BY b, c /* a prefix is missing */
– WHERE a = const ORDER BY c /* b is missing */
– WHERE a = const ORDER BY a, d /* d is not part of index */
- 6 -
Sample Schema
CREATE TABLE `message` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`user_id` int(11) NOT NULL,
`content` text COLLATE utf8_unicode_ci NOT NULL,
`create_time` int(11) NOT NULL,
`thumbs_up` int(11) NOT NULL DEFAULT '0', /* Vote Count */
PRIMARY KEY (`id`),
KEY `thumbs_up_key` (`thumbs_up`,`id`)
) ENGINE=InnoDB
mysql> show table status like 'message' \G
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 50000040 /* 50 Million */
Avg_row_length: 565
Data_length: 28273803264 /* 26 GB */
Index_length: 789577728 /* 753 MB */
Data_free: 6291456
Create_time: 2009-04-20 13:30:45
Two use case:
Paginate by time, recent message one page one
Paginate by thumps_up, largest value on page one
- 7 -
Typical Query
1. Get the total records
SELECT count(*) FROM message
2. Get current page
SELECT * FROM message
ORDER BY> http://domain.com/message?page=1
ORDER BY> http://domain.com/message?page=2
ORDER BY> http://domain.com/message?page=3
ORDER BY>
Note:> –
- 8 -
Explain
mysql> explain SELECT * FROM message
ORDER BY> LIMIT 10000, 20\G
***************** 1. row **************
id: 1
select_type: SIMPLE
table: message
type: index
possible_keys: NULL
key: PRIMARY
key_len: 4
ref: NULL
rows: 10020
Extra:
1 row in set (0.00 sec)
– it can read rows using index scan and execution will stop as soon as it finds
required rows.
– LIMIT 10000, 20 means it has to read 10020 and throw away 10000 rows, then
return next 20 rows.
- 9 -
Performance Implications
– Larger OFFSET is going to increase active data set, MySQL has to bring data
in memory that is never returned to caller.
– Performance issue is more visible when your have database that can't fit in
main memory.
– Small percentage of request with large OFFSET would be able to hit disk I/O
Disk I/O bottleneck
– In order to display “21 to 40 of 1000,000” , some one has to count 1000,000
rows.
- 10 -
Simple Solution
– Do not display total records, does user really care?
– Do not let user go to deep pages, redirect him
http://en.wikipedia.org/wiki/Internet_addiction_disorder after certain number of
pages
- 11 -
Avoid Count(*)
1. Never display total messages, let user see more message by clicking
'next'
2. Do not count on every request, cache it, display stale count, user do not
care about 324533 v/s 324633
3. Display 41 to 80 of Thousands
4. Use pre calculated count, increment/decrement value as insert/delete
happens.
- 12 -
Solution to avoid offset
1. Change User Interface
– No direct jumps to Nth page
2. LIMIT N is fine, Do not use LIMIT M,N
– Provide extra clue about from where to start given page
– Find the desired records using more restricted WHERE using given clue and
ORDER BY and LIMIT N without OFFSET)
- 13 -
Find the clue
150
111
102 Page One
101
100
98
97
96 Page Two
95
94
93
92
91 Page Three
90
89
Next
Prev
Prev
Next
Next
- 14 -
Solution using clue
Next Page:
http://domain.com/forum?page=2&last_seen=100&dir=next
WHERE>
ORDER BY> Prev Page:
http://domain.com/forum?page=1&last_seen=98&dir=prev
WHERE>
ORDER BY> Reverse given 10 rows before sending to user
- 15 -
Explain
mysql> explain
SELECT * FROM message
WHERE>
ORDER BY> *************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: message
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
Rows: 25000020 /* ignore this */
Extra: Using where
1 row in set (0.00 sec)
- 16 -
What about order by non unique values?
We can't do:
WHERE thumbs_up < 98
ORDER BY thumbs_up DESC /* It will return few seen rows */
Can we say this:
WHERE thumbs_up - 18 -
Solution
First Page
SELECT thumbs_up,> FROM message
ORDER BY thumbs_up DESC,> LIMIT $page_size
+-----------+----+