Home » Performance » MySQL » Clustered index decision in a large table
Clustered index decision in a large table [message #2210] Fri, 16 November 2007 12:13 Go to next message
eduardorochabr  is currently offline eduardorochabr
Messages: 4
Registered: November 2007
Junior Member
Hi,

There is a table (InnoDB) with 200 million rows, like this:

`id` int(11) NOT NULL auto_increment,
`user_id` int(11) NOT NULL,
`date` datetime NOT NULL,
`message` text,
PRIMARY KEY (`id`),
CONSTRAINT `users_fk` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE,
KEY `users_date_idx` (`user_id`,`date`)

This table is very INSERT and SELECT intensive, with few UPDATES and DELETE statements.

I am having trouble with AUTO_INC locks at INSERTs, but I think migrating to 5.1.23 can help this one (right?)

Also, I would like to improve SELECT speed without sacrificing INSERT's too much.

The SELECT is basically "SELECT text FROM t WHERE user_id = n ORDER BY date DESC";

I am wondering about changing the primary key to [`user_id`, `date`, `random_number`] and remove the `id` and the secondary index. The INSERTs are random concerning "user_id", so I am affraid that it will slow down them. However, this could improve SELECT performance.

Could someone post some advice on this change, having in mind that both INSERTs and SELECTs are important? Thanks!!!
Re: Clustered index decision in a large table [message #2211 is a reply to message #2210 ] Fri, 16 November 2007 18:31 Go to previous messageGo to next message
sterin  is currently offline sterin
Messages: 324
Registered: March 2007
Location: Sweden
Senior Member
Yes concurrent innodb inserts should be faster with 5.1.23.

How common is it that one user_id has more than one insert per second?
If that basically never happens then you can go right ahead and drop the id column and create the primary key on (user_id, date) and skip the random column (which is not such a good idea).
(if you have rare cases with conflicting (user_id, date) then you could just insert with a value of one second later, since usually a second here or there is not so important).

As for inserts going slower due to random user_id I think it is unlikely since there is one less index to update.
And the fact that you get 4*200 million (no id column) = 800MB less data in the database should also make you gain some speed.
Re: Clustered index decision in a large table [message #2212 is a reply to message #2210 ] Fri, 16 November 2007 18:47 Go to previous messageGo to next message
eduardorochabr  is currently offline eduardorochabr
Messages: 4
Registered: November 2007
Junior Member
There are about 1.3 million "users" (actually it's not a user, but it's a good analogy). There is one insert (multiple rows) per hour for each user, but there are about 50~100 threads doing inserts concurrently, and the table grows fast.

Good tip about the seconds in the date.

I am afraid about the downtime this migration will need. Because it's a PK change, MySQL will have to re-cluster the table, right? Is it better to export/import the table in this case?

Thanks!
Re: Clustered index decision in a large table [message #2213 is a reply to message #2212 ] Fri, 16 November 2007 21:27 Go to previous messageGo to next message
sterin  is currently offline sterin
Messages: 324
Registered: March 2007
Location: Sweden
Senior Member
Yes I think export/import is probably the better solution and when you do, export the data in user_id, date order.
Because then the import don't have to write anywhere else but at the end of the table and InnoDB has an optimization for that and since you already have that index in place in the current table it should be feasible.

But I would suggest that you start with exporting/importing a smaller batch like maybe 100,000 rows or something to begin with just so you can get an estimation on how long time the entire table will take.

To minimize downtime if you only insert once an hour then you could put a read lock on the table before you start exporting.
Then you import the data into a new table while the old one is still available for reading.
When the entire import is complete you rename the original table to something old.
And then rename the new table to the original name.
And then you release the read lock.
After this you can delete the old table.

This works since the renames are very fast.

Only disadvantage with this approach is that the InnoDB tablespace will grow to be twice the needed size since it will store the duplicate table.
But that is maybe not a problem?
Re: Clustered index decision in a large table [message #2219 is a reply to message #2213 ] Mon, 19 November 2007 09:46 Go to previous messageGo to next message
eduardorochabr  is currently offline eduardorochabr
Messages: 4
Registered: November 2007
Junior Member
Hi sterin, you're right, I don't think tablespace is a big concern. Thank you for the information. I will take it into account when doing this migration. Thanks!
Re: Clustered index decision in a large table [message #2254 is a reply to message #2213 ] Thu, 22 November 2007 18:29 Go to previous message
eduardorochabr  is currently offline eduardorochabr
Messages: 4
Registered: November 2007
Junior Member
Hi sterin,

Suppose I import the data in order (user_id, date). For example, I could get the following order in the clustered index:

user_id          date
      1   01 Mar 1990
      1   02 Mar 1990
      2   01 Mar 1990


However, the next insert will be from the user "1" again, then user "2", and this would be repeated many times. Can MySQL handle this well, inserts would be fast? I am afraid MySQL moves the rows from user 2 to make room for new records from user 1 and so on.

Thank you.
Previous Topic:Performance question
Next Topic:large table - index creation
Goto Forum:
  


Current Time: Fri Jul 10 03:17:46 EDT 2009

Total time taken to generate the page: 0.02900 seconds
.:: Contact :: Home ::.

Powered by: FUDforum 2.7.7.
Copyright ©2001-2007 FUD Forum Bulletin Board Software

MySQL is a trademark of Sun Microsystems.
InnoDB is a trademark of Oracle Corp.

Percona Performance Forums are a service of Percona, Inc.
Not affiliated with Sun Microsystems or Oracle Corp.