Home » Performance » MySQL » SELECT FOR UPDATE using secondary index => 50x times slower than forcing full table scan
SELECT FOR UPDATE using secondary index => 50x times slower than forcing full table scan [message #644] Sun, 28 January 2007 11:54 Go to previous message
philippe  is currently offline philippe
Messages: 7
Registered: January 2007
Location: Paris
Junior Member
Hi all! (Please be kind with my poor english...)

THANK YOU for this blog! :o)

I am developing on my spare time a little CMS storing data in a hierarchical manner. After googling and testing a while, I end up using the "materialized path" solution despite its unefficient storage requirement.

Here is the (simplified) tree table:

create table site_node		(
    id		MEDIUMINT	UNSIGNED	NOT NULL	AUTO_INCREMENT,

    # ---

    path	VARBINARY(32)			NOT NULL,
    children	SMALLINT	UNSIGNED	NOT NULL,

    parent	MEDIUMINT	UNSIGNED	NOT NULL,
    alias	VARBINARY(32)			NOT NULL,

    # ---

    PRIMARY KEY			(id),
    UNIQUE INDEX ak_path	(path),
    UNIQUE INDEX ak_parent	(parent, alias),

    FOREIGN KEY fk_parent	(parent) REFERENCES site_node (id) ON UPDATE CASCADE ON DELETE RESTRICT
)
ENGINE=InnoDB;

"parent" field is used to select direct children (faster than using path index).

"children" field contains the number of direct children (0 => leaf node).

"path" field contains the node path encoded in hexadecimal, each level is 4 bytes long:

mysql> SELECT id, parent, children, path FROM site_node
       WHERE path >= "0001" AND path < "0002"
       ORDER BY path LIMIT 20;
+--------+--------+----------+------------------+
| id     | parent | children | path             |
+--------+--------+----------+------------------+
|      2 |      1 |      521 | 0001             |
|    102 |      2 |      348 | 00010001         |
|   5128 |    102 |      133 | 000100010001     |
| 258090 |   5128 |        0 | 0001000100010001 |
| 261033 |   5128 |        0 | 0001000100010002 |
| 263014 |   5128 |        0 | 0001000100010003 |
| 269425 |   5128 |        0 | 0001000100010004 |
| 272585 |   5128 |        0 | 0001000100010005 |
| 283558 |   5128 |        0 | 0001000100010006 |
| 284133 |   5128 |        0 | 0001000100010007 |
| 284925 |   5128 |        0 | 0001000100010008 |
| 293219 |   5128 |        0 | 0001000100010009 |
| 309769 |   5128 |        0 | 000100010001000A |
| 311042 |   5128 |        0 | 000100010001000B |
| 316655 |   5128 |        0 | 000100010001000C |
| 337223 |   5128 |        0 | 000100010001000D |
| 339034 |   5128 |        0 | 000100010001000E |
| 346686 |   5128 |        0 | 000100010001000F |
| 348222 |   5128 |        0 | 0001000100010010 |
| 363623 |   5128 |        0 | 0001000100010011 |
+--------+--------+----------+------------------+
20 rows in set (0.00 sec)

As expected, retrieving or counting all children from a node is trivial:

Cold:

mysql> SELECT COUNT(*) FROM site_node WHERE path >= "0001" AND path < "0002" ORDER BY path;
+----------+
| COUNT(*) |
+----------+
|   601591 |
+----------+
1 row in set (13.07 sec)

Warm:

mysql> SELECT COUNT(*) FROM site_node WHERE path >= "0001" AND path < "0002" ORDER BY path;
+----------+
| COUNT(*) |
+----------+
|   601591 |
+----------+
1 row in set (0.55 sec)

Here is my problem:

when I need to move or shift a node, I must lock all affected nodes before updating them. For instance, before shifting node id #2 from rank 1 to rank 2 and updating 601591 rows, I lock the whole subtree :

Note: should be path < "0003" but you get the idea.

mysql> EXPLAIN SELECT COUNT(*) FROM site_node WHERE path >= "0001" AND path < "0002" ORDER BY path FOR UPDATE;
+----+-------------+-----------+-------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table     | type  | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+----+-------------+-----------+-------+---------------+---------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | site_node | range | ak_path       | ak_path |      32 | NULL | 902702 | Using where; Using index |
+----+-------------+-----------+-------+---------------+---------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)

Index range scan should be fast (Handler_read_next going up) but it is not :

SELECT FOR UPDATE:

mysql> SELECT COUNT(*) FROM site_node
       WHERE path >= "0001" AND path < "0002"
       ORDER BY path FOR UPDATE;
+----------+
| COUNT(*) |
+----------+
|   601591 |
+----------+
1 row in set (20 min 41.33 sec)

Note: using SELECT COUNT(id) does not improve anything.

Innodb output:

LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 36156404, ACTIVE 1208 sec, process no 20386, OS thread id 1448221616 fetching rows, thread declared inside InnoDB 136
mysql tables in use 1, locked 1
26110 lock struct(s), heap size 1944896
MySQL thread id 629, query id 30002552 localhost root Sending data
SELECT COUNT(*) FROM site_node WHERE path >= "0001" AND path < "0002" ORDER BY path FOR UPDATE

BUFFER POOL AND MEMORY
----------------------
Total memory allocated 305426468; in additional pool allocated 1682688
Buffer pool size   16384
Free buffers       0
Database pages     16289
Modified db pages  1
Pending reads 1
Pending writes: LRU 0, flush list 0, single page 0
Pages read 1367173, created 54631, written 605910
228.55 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 883 / 1000

ROW OPERATIONS
--------------
1 queries inside InnoDB, 0 queries in queue
Main thread process no. 20386, id 1439226800, state: waiting for server activity
Number of rows inserted 5000000, updated 10000000, deleted 0, read 72571421
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 598.03 reads/s

What I do not understand is that using a read lock is much faster:

SELECT LOCK IN SHARE MODE:

mysql> SELECT COUNT(*) FROM site_node
       WHERE path >= "0001" AND path < "0002"
       ORDER BY path LOCK IN SHARE MODE;
+----------+
| COUNT(*) |
+----------+
|   601591 |
+----------+
1 row in set (13.57 sec)

After scratching my head a while, I tried this:

Full index scan:

mysql> EXPLAIN SELECT COUNT(*) FROM site_node USE INDEX(PRIMARY)
       WHERE id != 0 AND path >= "0001" AND path < "0002"
       ORDER BY path FOR UPDATE;
+----+-------------+-----------+-------+---------------+---------+---------+------+---------+-------------+
| id | select_type | table     | type  | possible_keys | key     | key_len | ref  | rows    | Extra       |
+----+-------------+-----------+-------+---------------+---------+---------+------+---------+-------------+
|  1 | SIMPLE      | site_node | range | PRIMARY       | PRIMARY |       3 | NULL | 2501352 | Using where |
+----+-------------+-----------+-------+---------------+---------+---------+------+---------+-------------+
1 row in set (0.04 sec)

mysql> SELECT COUNT(*) FROM site_node USE INDEX(PRIMARY)
       WHERE id != 0 AND path >= "0001" AND path < "0002"
       ORDER BY path FOR UPDATE;
+----------+
| COUNT(*) |
+----------+
|   601591 |
+----------+
1 row in set (23.97 sec)

Yep! Much better. :o|

And this:

Full table scan:

mysql> EXPLAIN SELECT COUNT(*) FROM site_node USE INDEX(PRIMARY)
       WHERE path >= "0001" AND path < "0002"
       ORDER BY path FOR UPDATE;
+----+-------------+-----------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table     | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+-----------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | site_node | ALL  | NULL          | NULL |    NULL | NULL | 5002703 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)

mysql> SELECT COUNT(*) FROM site_node USE INDEX(PRIMARY)
       WHERE path >= "0001" AND path < "0002"
       ORDER BY path FOR UPDATE;
+----------+
| COUNT(*) |
+----------+
|   601591 |
+----------+
1 row in set (20.21 sec)

Sigh... Better again...

The "site_node" table contains 5 millions rows and is arround 1Go (Data 396,288 Mo, Index 525,392 Mo, Total 921,680 Mo).

MySQL is running on a tiny box:

- CPU 3Ghz
- RAM 512 Mo
- HDD 80Go (IDE, 55 Mo/s)

skip-networking
table_cache			= 256

innodb_file_per_table		= 1

innodb_buffer_pool_size		= 256M (I cannot raise it up...)
innodb_additional_mem_pool_size	= 8M

innodb_log_file_size		= 256M
innodb_log_buffer_size		= 8M

innodb_flush_log_at_trx_commit	= 2

Could it be because ak_path index does not fit in buffer pool?

Note: update speed is quite normal (15 000 - 20 000 rows/s).

I have also tried to use "path" field as PRIMARY KEY hoping that it would be faster using a clustered index but in this case total index size grows up and update speed is slower because of B-TREE rebalancing.

Thank you for your help!

PS: No, I do not plan to use nested sets... ;o) (not yet!)

Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Previous Topic:Filesort/Query performace
Next Topic:deadlock mysql
Goto Forum:

  


Current Time: Thu Jul 9 20:35:32 EDT 2009

Total time taken to generate the page: 0.05746 seconds
.:: Contact :: Home ::.

Powered by: FUDforum 2.7.7.
Copyright ©2001-2007 FUD Forum Bulletin Board Software

MySQL is a trademark of Sun Microsystems.
InnoDB is a trademark of Oracle Corp.

Percona Performance Forums are a service of Percona, Inc.
Not affiliated with Sun Microsystems or Oracle Corp.