Home » Performance » MySQL » Selecting where id in (list of 1000 ids)
| Selecting where id in (list of 1000 ids) [message #1086] |
Fri, 13 April 2007 03:02  |
oordopjes Messages: 15 Registered: January 2007 |
Junior Member |
|
|
Hi,
I'm trying to do the following select:
SELECT * FROM table WHERE id IN
(1,2,3..999,1000); (with other id numbers of course)
However, this query takes about 2 seconds. Is there any way to speed this up?
|
|
| | | | | | | | |
| Re: Selecting where id in (list of 1000 ids) [message #1126 is a reply to message #1086 ] |
Fri, 20 April 2007 18:23   |
Peter Messages: 405 Registered: August 2006 |
Senior Member Super Guru |
|
|
First,
Using sphinx you should not retrieve all 1000 rows but use SetLimit to retrieve only ids you're going to show on the page.
Second the difference is expected - the query needs 1000 random IO operations for large data so it takes what it takes.
Other query does full table scan so it can find 1000 rows and stop pretty fast.
Peter Zaitsev, MySQL Performance Expert
MySQL Performance Blog - http://www.mysqlperformanceblog.com
MySQL Consulting http://www.mysqlperformanceblog.com/mysql-consulting/
|
|
|
| Re: Selecting where id in (list of 1000 ids) [message #1227 is a reply to message #1112 ] |
Tue, 08 May 2007 15:31   |
|
| oordopjes wrote on Tue, 17 April 2007 14:46 | Hi, sorry, the LIMIT is unnecessary.. it are exactly 1000 records (ids) that are in the IN(..) part.
I'm running on a dual xeon with 1Gb memory.
3. How do I check this? What do you mean? The table is 82.000KB, the index (which has a lot of fulltext indexes and indexes on many columns) is 262.000KB This might be a little bit out of proportions.
Does this matter? Does MySQL put the COMPLETE index in memory??
|
I don't know what Sphinx is, but if it were my table and my query I might try removing any indexes that weren't needed. It sounds like you may have too many. Ideally you can optimize away any with very low cardinality (phpMyAdmin shows this), or any with high null counts. The more unique the value, the better the column is to index.
RAM is cheap. you could always buy more.
You might try changing your query so it says SELECT * FROM BLAH WHERE id BETWEEN lowest_value AND highest_value AND id IN (list of ids)
I have no idea if that will help or not, but logically, it seems like if mysql did a table scan for each of the ids in the IN (list) then you could conceivably make that list smaller. Since those are primary keys, though, it may just be redundant code and possibly even slow things down. Tinkering is best. Still think it sounds like you have too many indexes.
|
|
|
| Re: Selecting where id in (list of 1000 ids) [message #3123 is a reply to message #1099 ] |
Tue, 27 May 2008 01:59   |
toddjnsn Messages: 4 Registered: May 2008 Location: Grand Rapids, MI |
Junior Member |
|
|
I'm not a MYSQL expert, but I'm not a newbie, either.
Like said before, I'd first reduce your indexes and only have them where you need them. Additionally, knowing that Mysql will only use ONE index for a table per query, you should be able to examine your queries and decide which ones should go (if any).
Second... essentially you have 1,000 OR statements. I hate OR statements, almost as much as I refuse to use LEFT JOINs (I just use standard joins 100%).
I know how to speed it up (assuming you're doing PHP):
1) All those ID's you're searching for -- store them in an array, with their key as the ID *and* the value as the ID:
for ($a=0; $a<=$numberOfIDsToSearch; $a++)
{ $val = $myOrigIDArray[$a]; $newIDArray[$val] = $val; }
2) Before executing the query, get the lowest ID # and the highest ID #. Set them at $min, and $max variables. It's easy to grab them from an array thru PHP's built-in functions.
3) Now, we'll make your SQL Statement getting all IDs in between your highest and lowest ($max and $min). You'll fetch an array of all possibilities of where those 1,000 lye, and you saved them all in memory!
Then, just decide which one to get. Here, I'll put the code from the select statement on (do you really need *?? - that'll slow it down):
$getidData = "SELECT id, info1, info2, info3 FROM table WHERE id>=$min AND id<=$max";
$result = mysql_query($getidData, $db_conn) or die("I screwed up");
while ($row = mysql_fetch_array($result))
{
$id = $row[0];
if ($newIDArray[$id] == $id)
{ // ... I found it! I can build a new array of found ones here, to process later! }
} // end while
[Updated on: Tue, 27 May 2008 02:02]
|
|
|
| Re: Selecting where id in (list of 1000 ids) [message #3124 is a reply to message #1086 ] |
Tue, 27 May 2008 02:48   |
carpii Messages: 48 Registered: November 2006 |
Member |
|
|
| JGilbert wrote on Tue, 08 May 2007 15:31 |
I don't know what Sphinx is, but if it were my table and my query I might try removing any indexes that weren't needed. It sounds like you may have too many
|
But removing excessive indexes would not speed up selects. For write operations sure, carefully choose them to make sure theres no io wasted.
| toddjnsn wrote on Tue, 27 May 2008 01:59 |
I hate OR statements, almost as much as I refuse to use LEFT JOINs (I just use standard joins 100%).
|
I dont understand this. If your query only needs an INNER JOIN then great, but I can think of many queries where a LEFT JOIN is necessary and desirable.
Dont be afraid of LEFT JOIN, it is more important to return the right results, and *then* worry about performance, than having a fast query that misses out some records.
| toddjnsn wrote on Tue, 27 May 2008 01:59 |
3)Now, we'll make your SQL Statement getting all IDs in between your highest and lowest ($max and $min)
|
I think the IN (1,2,3..999) he pasted was an example 
The real IDs will are unlikely to be a nice contiguous range, since they are document ids which are returned by the Sphinx search daemon.
I agree with Peter, try to limit the amount of ids that Sphinx is returning. The Sphinx PHP API (and all other Sphinx APIs) have support for paging, and asking for 100 records from Sphinx and then fetching those 100 from the database should be much faster than using 1000. Unless you really *need* 1000 results?
Theres a few options you can consider.
If youre querying Sphinx for 1000 results, and then doing a JOIN to filter those results down further, you may be better off considering SphinxSE, which integrates with MySQL.
The bad news is that its not as scalable, and Sphinx must run on the same box as MySQL. Because of this, Id choose to avoid SphinxSE personally, but sometimes its a valid choice.
Secondly, could any of the content be cached either to filesystem, or to memcached? Typically most search features have a subset of basic queries which are run frequently, you may be able to cache these to alleviate the load.
I cant think of much else really. Youre essentially selecting a 'random' set of records where the ID is the primary key,so theres not much scope for improvement.
Maybe you need to alter your table schema, or hive some fields off into another table.
You could also consider innodb, which in my experience provides vastly improved performance for selects
[Updated on: Tue, 27 May 2008 02:51]
|
|
|
| Re: Selecting where id in (list of 1000 ids) [message #3125 is a reply to message #3124 ] |
Tue, 27 May 2008 03:07   |
toddjnsn Messages: 4 Registered: May 2008 Location: Grand Rapids, MI |
Junior Member |
|
|
"I dont understand this. If your query only needs an INNER JOIN then great, but I can think of many queries where a LEFT JOIN is necessary and desirable.
Dont be afraid of LEFT JOIN, it is more important to return the right results, and *then* worry about performance, than having a fast query that misses out some records."
I avoid doing SLOW (left) joins when I'd have to do them on my system a lot. So I denormalize things and put redundant data on pieces that WON'T change after writing (ie linking the rows by association). That way, I avoid the left join by a once-in-a-while write, with a million queries w/ avoided left joins. I'd never do such a thing if I had to put a redudant piece somewhere and have to change it after the fact.
As far as my example -- I think it'd solve his problem just fine. I ran into situations where I had to do matching -- 100,000+ IDs to find matches of multiple tables w/ 100,000+ IDs. I couldn't do an OR statement -- that'd be ridiculous.
So I did what I showed by example. Turned 30-45 minutes of query time down to about 20-30 seconds (swallowed ram on the run, but only for that 20-30 seconds).
|
|
| |
| Re: Selecting where id in (list of 1000 ids) [message #3130 is a reply to message #3128 ] |
Tue, 27 May 2008 07:30   |
toddjnsn Messages: 4 Registered: May 2008 Location: Grand Rapids, MI |
Junior Member |
|
|
| Quote: | But IN(...) are not bad because MySQL can use an index to solve IN()'s.
|
I'll keep that in mind, although virtually all (if not all) of my OR statements aren't indexed (not index worthy; executes the OR after the returning list has already been shaved down - i'm assuming! Hopefully they're not hurting me more than I think!!).
| Quote: | Reading in all ID's in a PHP array is not usually the answer because reading the values from the DB into a PHP array also takes time and CPU (sometimes a lot)
|
I agree. But it's fast. It's a resort to take if you have a lot of RAM and it's not done too routinely and there's seemingly no other way out. If your SQL query's taking 14 seconds, that too is slowing things down (unless there's a better way besides both methods).
| Quote: | For web servers then yes this is often true but on a lot of other servers it is not true and writes can be very large part of the load on the machine and on these server every extra write is bad.
|
I actually may have miscommunicated. I wasn't talking about adding extra rows, but an extra column - set to an ID of a cooresponding table that extends it. When both tables are huge, a left-join was killing me. So I added that column to have the ID of the other one (both having IDs of each). But those IDs don't change -- they're both auto-incremented of each other. In essence, when that extended or child table adds a row, spawning from it's parent, the parent get's the child's ID too -- otherwise it's set at 0, default originally. So I can search WHERE child_id=0, and I get the ones w/o a child, instead of a left join as I was told before.
I didn't mean it as a once-in-a-while write -- just an extra write after inserting a new child row -- and there can only be one child for the parent.
| Quote: | Since if most of your records in the master table don't have a corresponding value in the child table it is a waste of space and speed to add data that doesn't even need to be there
|
I hope I clarified what I was doing above... again, the master table has a corresponding value in the child table too. When the child table's created, it has the parent ID put in. THEN, the "extra write" is to put that new child's ID in the parent table. The once-in-a-while is due to the fact that a larger pct. of activity when dealing with these tables, pound-for-pound, isn't begetting a new child, it's dealing with an already established parent-child situation.
Make sense? Would it make more sense to just have both in one table? The reason I split them was because the child table has heavier data in it and weighs more, and about 20% of the master table doesn't have a child.
| Quote: | The rule for DBMS optimization is to always think about how you can as fast as possible throw away surplus data so you don't need to shuffle it around in any way
|
If this helps -- in my example, it's something neat where the PHP script doesn't have to sort, scan, or organize anything for a match when comparing against the huge result set. Initially, I showed him that he'd make a quick "copy" of his old array by setting the 'key' as the value itself. As far as processing power is concerned, it doesn't take much, as it just takes in the ID from the result set and sees if the $array[$id] == $idFromDB; if so, it found a match without having to run any search whatsoever. But YES, it takes a lot of ram! Depending on what extra junk he's bringing in from the table, I may advise against it. If he's bringing in a lot of varchars, datetimes, etc., and not just mainly a few integer column values, then I'd say he better have a lot of RAM to pull in all that junk into memory (because a lot will be ignored and just sitting in memory). Of course, right after, he can delete the array carried in from the DB, so it'd only be temporary. And of course, depends how often that's run (routinely).
My main goal is to learn more about MySQL and speed up my DB. It's not running slow; in fact, I'm shocked that it's running fast on a 512MB DualCore system w/ both Apache & Mysql. Not a million users -- but a business site system doing all kinds of data gathering and writes and it's highly conditional stuff -- which I need to know more about separating data into different tables, knowing more about how Mysql uses indexes, etc., that doesn't follow the work-flow data activity of say, a forum, which one can quick grab a book on and know the game-plan easily.
[Updated on: Tue, 27 May 2008 07:36]
|
|
|
| Re: Selecting where id in (list of 1000 ids) [message #3216 is a reply to message #1086 ] |
Wed, 25 June 2008 03:16  |
ruben Messages: 1 Registered: June 2008 |
Junior Member |
|
|
I don't think mysql optimizes the table at IN (at least in Mysql 5.1) I also get very slow queryes when using this approach..
However I found a workaround.... Create a temporary table that holds your id's and then use a JOIN statement... this works pretty fast.
|
|
|
Goto Forum:
Current Time: Sun Jul 5 15:45:54 EDT 2009
Total time taken to generate the page: 0.01797 seconds |