Home » Performance » MySQL » Selecting where id in (list of 1000 ids)
Selecting where id in (list of 1000 ids) [message #1086] Fri, 13 April 2007 03:02 Go to next message
oordopjes  is currently offline oordopjes
Messages: 15
Registered: January 2007
Junior Member
Hi,

I'm trying to do the following select:
SELECT * FROM table WHERE id IN
(1,2,3..999,1000); (with other id numbers of course)

However, this query takes about 2 seconds. Is there any way to speed this up?
Re: Selecting where id in (list of 1000 ids) [message #1099 is a reply to message #1086 ] Sun, 15 April 2007 14:38 Go to previous messageGo to next message
oordopjes  is currently offline oordopjes
Messages: 15
Registered: January 2007
Junior Member
Nobody who knows a solution?

SELECT * FROM table WHERE (entries.id IN (8631,...351485) LIMIT 1000
Takes: 14 sec

SELECT * FROM table LIMIT 1000
Takes: 0.8 sec
Re: Selecting where id in (list of 1000 ids) [message #1100 is a reply to message #1099 ] Sun, 15 April 2007 18:21 Go to previous messageGo to next message
carpii  is currently offline carpii
Messages: 48
Registered: November 2006
Member
Do you have a unique index on id?
Re: Selecting where id in (list of 1000 ids) [message #1104 is a reply to message #1100 ] Mon, 16 April 2007 15:23 Go to previous messageGo to next message
oordopjes  is currently offline oordopjes
Messages: 15
Registered: January 2007
Junior Member
Yep. It is the primary key..
Re: Selecting where id in (list of 1000 ids) [message #1105 is a reply to message #1086 ] Mon, 16 April 2007 18:06 Go to previous messageGo to next message
sterin  is currently offline sterin
Messages: 324
Registered: March 2007
Location: Sweden
Senior Member
Some questions:

How many entries did you have in the 14 second query?

Why do you need to single out and select so many entries?

How large is the database?

And how much memory have you reserved for key_cache (if MyISAM) or for innodb_buffer_size if you are using InnoDB tables?
Re: Selecting where id in (list of 1000 ids) [message #1110 is a reply to message #1105 ] Tue, 17 April 2007 07:34 Go to previous messageGo to next message
oordopjes  is currently offline oordopjes
Messages: 15
Registered: January 2007
Junior Member
sterin wrote on Tue, 17 April 2007 00:06

Some questions:

1. How many entries did you have in the 14 second query?

2. Why do you need to single out and select so many entries?

3. How large is the database?

4. And how much memory have you reserved for key_cache (if MyISAM) or for innodb_buffer_size if you are using InnoDB tables?



1. 1000 id's (entries) were selected..
2. I get the fulltext results from SphinxSearch. This returns only the id's .. I have to go over as many (at least 1000) to get a good result.
3. Half a million rows.
4. Table is MyISAM. key_buffer_size = 128M

Any suggestions?
Re: Selecting where id in (list of 1000 ids) [message #1111 is a reply to message #1110 ] Tue, 17 April 2007 12:41 Go to previous messageGo to next message
sterin  is currently offline sterin
Messages: 324
Registered: March 2007
Location: Sweden
Senior Member
1.
No, your query _returned_ 1000 rows because you have a LIMIT 1000 at the end.
What I wanted to know is how many entries did you have in the IN(...) expression?

3.
How many MB is the indexes as total in the DB?


And what kind of hardware are you running this on?


You should avoid using LIMIT and instead try to limit the number of rows in the WHERE clause.
The reason is that the LIMIT is the last thing that is performed in the execution of a query.
Which means that during the execution it will have to work on all records that match the WHERE part before you LIMIT away surplus rows.
Re: Selecting where id in (list of 1000 ids) [message #1112 is a reply to message #1111 ] Tue, 17 April 2007 14:46 Go to previous messageGo to next message
oordopjes  is currently offline oordopjes
Messages: 15
Registered: January 2007
Junior Member
Hi, sorry, the LIMIT is unnecessary.. it are exactly 1000 records (ids) that are in the IN(..) part.

I'm running on a dual xeon with 1Gb memory.

3. How do I check this? What do you mean? The table is 82.000KB, the index (which has a lot of fulltext indexes and indexes on many columns) is 262.000KB Very Happy This might be a little bit out of proportions.
Does this matter? Does MySQL put the COMPLETE index in memory??

[Updated on: Tue, 17 April 2007 14:49]

Re: Selecting where id in (list of 1000 ids) [message #1113 is a reply to message #1086 ] Wed, 18 April 2007 08:50 Go to previous messageGo to next message
sterin  is currently offline sterin
Messages: 324
Registered: March 2007
Location: Sweden
Senior Member
No MySQL doesn't put the complete index in memory.
But it will try to use as much of the cache as possible (until the limit that you have set is reached).
And you want it to use as much as possible.

The most ideal situation is when the entire DB with both index and data fits into RAM. Because as soon as it has to read from disk everything slows down by magnitudes.

So if you can fit the entire index into key_buffer_size while still maintaining about 75% free memory for OS cache for the tables is the best solution.

So RAM is generally considered a DBMS best friend.
As much as possible is the rule Smile.
Re: Selecting where id in (list of 1000 ids) [message #1126 is a reply to message #1086 ] Fri, 20 April 2007 18:23 Go to previous messageGo to next message
Peter  is currently offline Peter
Messages: 405
Registered: August 2006
Senior Member
Super Guru
First,

Using sphinx you should not retrieve all 1000 rows but use SetLimit to retrieve only ids you're going to show on the page.

Second the difference is expected - the query needs 1000 random IO operations for large data so it takes what it takes.

Other query does full table scan so it can find 1000 rows and stop pretty fast.


Peter Zaitsev, MySQL Performance Expert
MySQL Performance Blog - http://www.mysqlperformanceblog.com
MySQL Consulting http://www.mysqlperformanceblog.com/mysql-consulting/
Re: Selecting where id in (list of 1000 ids) [message #1227 is a reply to message #1112 ] Tue, 08 May 2007 15:31 Go to previous messageGo to next message
JGilbert  is currently offline JGilbert
Messages: 31
Registered: May 2007
Location: Chicago, IL
Member

oordopjes wrote on Tue, 17 April 2007 14:46

Hi, sorry, the LIMIT is unnecessary.. it are exactly 1000 records (ids) that are in the IN(..) part.

I'm running on a dual xeon with 1Gb memory.

3. How do I check this? What do you mean? The table is 82.000KB, the index (which has a lot of fulltext indexes and indexes on many columns) is 262.000KB Very Happy This might be a little bit out of proportions.
Does this matter? Does MySQL put the COMPLETE index in memory??


I don't know what Sphinx is, but if it were my table and my query I might try removing any indexes that weren't needed. It sounds like you may have too many. Ideally you can optimize away any with very low cardinality (phpMyAdmin shows this), or any with high null counts. The more unique the value, the better the column is to index.

RAM is cheap. you could always buy more.

You might try changing your query so it says SELECT * FROM BLAH WHERE id BETWEEN lowest_value AND highest_value AND id IN (list of ids)

I have no idea if that will help or not, but logically, it seems like if mysql did a table scan for each of the ids in the IN (list) then you could conceivably make that list smaller. Since those are primary keys, though, it may just be redundant code and possibly even slow things down. Tinkering is best. Still think it sounds like you have too many indexes.
Re: Selecting where id in (list of 1000 ids) [message #3123 is a reply to message #1099 ] Tue, 27 May 2008 01:59 Go to previous messageGo to next message
toddjnsn  is currently offline toddjnsn
Messages: 4
Registered: May 2008
Location: Grand Rapids, MI
Junior Member
I'm not a MYSQL expert, but I'm not a newbie, either.

Like said before, I'd first reduce your indexes and only have them where you need them. Additionally, knowing that Mysql will only use ONE index for a table per query, you should be able to examine your queries and decide which ones should go (if any).

Second... essentially you have 1,000 OR statements. I hate OR statements, almost as much as I refuse to use LEFT JOINs (I just use standard joins 100%).

I know how to speed it up (assuming you're doing PHP):

1) All those ID's you're searching for -- store them in an array, with their key as the ID *and* the value as the ID:
for ($a=0; $a<=$numberOfIDsToSearch; $a++)
{ $val = $myOrigIDArray[$a]; $newIDArray[$val] = $val; }

2) Before executing the query, get the lowest ID # and the highest ID #. Set them at $min, and $max variables. It's easy to grab them from an array thru PHP's built-in functions.

3) Now, we'll make your SQL Statement getting all IDs in between your highest and lowest ($max and $min). You'll fetch an array of all possibilities of where those 1,000 lye, and you saved them all in memory!

Then, just decide which one to get. Here, I'll put the code from the select statement on (do you really need *?? - that'll slow it down):

$getidData = "SELECT id, info1, info2, info3 FROM table WHERE id>=$min AND id<=$max";
$result = mysql_query($getidData, $db_conn) or die("I screwed up");
while ($row = mysql_fetch_array($result))
{
$id = $row[0];
if ($newIDArray[$id] == $id)
{ // ... I found it! I can build a new array of found ones here, to process later! }
} // end while

[Updated on: Tue, 27 May 2008 02:02]

Re: Selecting where id in (list of 1000 ids) [message #3124 is a reply to message #1086 ] Tue, 27 May 2008 02:48 Go to previous messageGo to next message
carpii  is currently offline carpii
Messages: 48
Registered: November 2006
Member
JGilbert wrote on Tue, 08 May 2007 15:31


I don't know what Sphinx is, but if it were my table and my query I might try removing any indexes that weren't needed. It sounds like you may have too many



But removing excessive indexes would not speed up selects. For write operations sure, carefully choose them to make sure theres no io wasted.

toddjnsn wrote on Tue, 27 May 2008 01:59


I hate OR statements, almost as much as I refuse to use LEFT JOINs (I just use standard joins 100%).



I dont understand this. If your query only needs an INNER JOIN then great, but I can think of many queries where a LEFT JOIN is necessary and desirable.

Dont be afraid of LEFT JOIN, it is more important to return the right results, and *then* worry about performance, than having a fast query that misses out some records.

toddjnsn wrote on Tue, 27 May 2008 01:59


3)Now, we'll make your SQL Statement getting all IDs in between your highest and lowest ($max and $min)



I think the IN (1,2,3..999) he pasted was an example Smile
The real IDs will are unlikely to be a nice contiguous range, since they are document ids which are returned by the Sphinx search daemon.

I agree with Peter, try to limit the amount of ids that Sphinx is returning. The Sphinx PHP API (and all other Sphinx APIs) have support for paging, and asking for 100 records from Sphinx and then fetching those 100 from the database should be much faster than using 1000. Unless you really *need* 1000 results?

Theres a few options you can consider.

If youre querying Sphinx for 1000 results, and then doing a JOIN to filter those results down further, you may be better off considering SphinxSE, which integrates with MySQL.
The bad news is that its not as scalable, and Sphinx must run on the same box as MySQL. Because of this, Id choose to avoid SphinxSE personally, but sometimes its a valid choice.

Secondly, could any of the content be cached either to filesystem, or to memcached? Typically most search features have a subset of basic queries which are run frequently, you may be able to cache these to alleviate the load.

I cant think of much else really. Youre essentially selecting a 'random' set of records where the ID is the primary key,so theres not much scope for improvement.

Maybe you need to alter your table schema, or hive some fields off into another table.

You could also consider innodb, which in my experience provides vastly improved performance for selects

[Updated on: Tue, 27 May 2008 02:51]

Re: Selecting where id in (list of 1000 ids) [message #3125 is a reply to message #3124 ] Tue, 27 May 2008 03:07 Go to previous messageGo to next message
toddjnsn  is currently offline toddjnsn
Messages: 4
Registered: May 2008
Location: Grand Rapids, MI
Junior Member
"I dont understand this. If your query only needs an INNER JOIN then great, but I can think of many queries where a LEFT JOIN is necessary and desirable.

Dont be afraid of LEFT JOIN, it is more important to return the right results, and *then* worry about performance, than having a fast query that misses out some records."

I avoid doing SLOW (left) joins when I'd have to do them on my system a lot. So I denormalize things and put redundant data on pieces that WON'T change after writing (ie linking the rows by association). That way, I avoid the left join by a once-in-a-while write, with a million queries w/ avoided left joins. I'd never do such a thing if I had to put a redudant piece somewhere and have to change it after the fact.

As far as my example -- I think it'd solve his problem just fine. I ran into situations where I had to do matching -- 100,000+ IDs to find matches of multiple tables w/ 100,000+ IDs. I couldn't do an OR statement -- that'd be ridiculous.

So I did what I showed by example. Turned 30-45 minutes of query time down to about 20-30 seconds (swallowed ram on the run, but only for that 20-30 seconds).

Re: Selecting where id in (list of 1000 ids) [message #3128 is a reply to message #3125 ] Tue, 27 May 2008 05:25 Go to previous messageGo to next message
sterin  is currently offline sterin
Messages: 324
Registered: March 2007
Location: Sweden
Senior Member
Quote:


Second... essentially you have 1,000 OR statements. I hate OR statements, almost as much as I refuse to use LEFT JOINs (I just use standard joins 100%).



OR's are bad indeed because MySQL can't optimize them very well which means that it can't use an index to solve them.
But IN(...) are not bad because MySQL can use an index to solve IN()'s.
So you should not be afraid of using IN() if you need to.
Reading in all ID's in a PHP array is not usually the answer because reading the values from the DB into a PHP array also takes time and CPU (sometimes a lot).

Quote:


I avoid doing SLOW (left) joins when I'd have to do them on my system a lot. So I denormalize things and put redundant data on pieces that WON'T change after writing (ie linking the rows by association). That way, I avoid the left join by a once-in-a-while write, with a million queries w/ avoided left joins. I'd never do such a thing if I had to put a redundant piece somewhere and have to change it after the fact.


You can't really assume that:
1. all LEFT JOIN's are slow
Yes you are forcing the join order with a LEFT JOIN but the execution time for an INNER JOIN and a LEFT JOIN that are performed in the same table order is the same.
Unless you add extra rows as in your suggestion then those will require extra reading that can be implicitly be set to NULL in a normal LEFT JOIN query.

2. that writes are only once-in-a-while
For web servers then yes this is often true but on a lot of other servers it is not true and writes can be very large part of the load on the machine and on these server every extra write is bad.

3. that there are more matching records than non matching records in the child table
Since if most of your records in the master table don't have a corresponding value in the child table it is a waste of space and speed to add data that doesn't even need to be there.
It will use up CPU cycles and cache memory that can be better spent on something else.
[quote]

Quote:


As far as my example -- I think it'd solve his problem just fine. I ran into situations where I had to do matching -- 100,000+ IDs to find matches of multiple tables w/ 100,000+ IDs. I couldn't do an OR statement -- that'd be ridiculous.


Yes it solves the problem, but if the min and max values are very large then it will take a lot of CPU to read all that data and a lot of RAM to store it before you throw it away.

The rule for DBMS optimization is to always think about how you can as fast as possible throw away surplus data so you don't need to shuffle it around in any way.

In this case a IN() is the much better choice.

I understand that you come from a programmer background and yes some problems are solved faster in an external language.

But when you have understood SQL and indexes and how the optimizer works in a DBMS then you can solve most things much faster than performing loops in your external programming language.

Re: Selecting where id in (list of 1000 ids) [message #3130 is a reply to message #3128 ] Tue, 27 May 2008 07:30 Go to previous messageGo to next message
toddjnsn  is currently offline toddjnsn
Messages: 4
Registered: May 2008
Location: Grand Rapids, MI
Junior Member
Quote:

But IN(...) are not bad because MySQL can use an index to solve IN()'s.


I'll keep that in mind, although virtually all (if not all) of my OR statements aren't indexed (not index worthy; executes the OR after the returning list has already been shaved down - i'm assuming! Hopefully they're not hurting me more than I think!!).

Quote:

Reading in all ID's in a PHP array is not usually the answer because reading the values from the DB into a PHP array also takes time and CPU (sometimes a lot)

I agree. But it's fast. It's a resort to take if you have a lot of RAM and it's not done too routinely and there's seemingly no other way out. If your SQL query's taking 14 seconds, that too is slowing things down (unless there's a better way besides both methods).

Quote:

For web servers then yes this is often true but on a lot of other servers it is not true and writes can be very large part of the load on the machine and on these server every extra write is bad.


I actually may have miscommunicated. I wasn't talking about adding extra rows, but an extra column - set to an ID of a cooresponding table that extends it. When both tables are huge, a left-join was killing me. So I added that column to have the ID of the other one (both having IDs of each). But those IDs don't change -- they're both auto-incremented of each other. In essence, when that extended or child table adds a row, spawning from it's parent, the parent get's the child's ID too -- otherwise it's set at 0, default originally. So I can search WHERE child_id=0, and I get the ones w/o a child, instead of a left join as I was told before.

I didn't mean it as a once-in-a-while write -- just an extra write after inserting a new child row -- and there can only be one child for the parent.

Quote:

Since if most of your records in the master table don't have a corresponding value in the child table it is a waste of space and speed to add data that doesn't even need to be there


I hope I clarified what I was doing above... again, the master table has a corresponding value in the child table too. When the child table's created, it has the parent ID put in. THEN, the "extra write" is to put that new child's ID in the parent table. The once-in-a-while is due to the fact that a larger pct. of activity when dealing with these tables, pound-for-pound, isn't begetting a new child, it's dealing with an already established parent-child situation.

Make sense? Would it make more sense to just have both in one table? The reason I split them was because the child table has heavier data in it and weighs more, and about 20% of the master table doesn't have a child.

Quote:

The rule for DBMS optimization is to always think about how you can as fast as possible throw away surplus data so you don't need to shuffle it around in any way


If this helps -- in my example, it's something neat where the PHP script doesn't have to sort, scan, or organize anything for a match when comparing against the huge result set. Initially, I showed him that he'd make a quick "copy" of his old array by setting the 'key' as the value itself. As far as processing power is concerned, it doesn't take much, as it just takes in the ID from the result set and sees if the $array[$id] == $idFromDB; if so, it found a match without having to run any search whatsoever. But YES, it takes a lot of ram! Depending on what extra junk he's bringing in from the table, I may advise against it. If he's bringing in a lot of varchars, datetimes, etc., and not just mainly a few integer column values, then I'd say he better have a lot of RAM to pull in all that junk into memory (because a lot will be ignored and just sitting in memory). Smile Of course, right after, he can delete the array carried in from the DB, so it'd only be temporary. And of course, depends how often that's run (routinely).

My main goal is to learn more about MySQL and speed up my DB. It's not running slow; in fact, I'm shocked that it's running fast on a 512MB DualCore system w/ both Apache & Mysql. Not a million users -- but a business site system doing all kinds of data gathering and writes and it's highly conditional stuff -- which I need to know more about separating data into different tables, knowing more about how Mysql uses indexes, etc., that doesn't follow the work-flow data activity of say, a forum, which one can quick grab a book on and know the game-plan easily.

[Updated on: Tue, 27 May 2008 07:36]

Re: Selecting where id in (list of 1000 ids) [message #3216 is a reply to message #1086 ] Wed, 25 June 2008 03:16 Go to previous message
ruben  is currently offline ruben
Messages: 1
Registered: June 2008
Junior Member
I don't think mysql optimizes the table at IN (at least in Mysql 5.1) I also get very slow queryes when using this approach..

However I found a workaround.... Create a temporary table that holds your id's and then use a JOIN statement... this works pretty fast.
Previous Topic:Integer overflow with UNSIGNED INT columns
Next Topic:More than 400 concurrent small queries
Goto Forum:
  


Current Time: Sun Jul 5 15:37:14 EDT 2009

Total time taken to generate the page: 0.01857 seconds
.:: Contact :: Home ::.

Powered by: FUDforum 2.7.7.
Copyright ©2001-2007 FUD Forum Bulletin Board Software

MySQL is a trademark of Sun Microsystems.
InnoDB is a trademark of Oracle Corp.

Percona Performance Forums are a service of Percona, Inc.
Not affiliated with Sun Microsystems or Oracle Corp.