Home » Performance » MySQL » Help optimizing UPDATE statement
| Help optimizing UPDATE statement [message #1519] |
Sat, 07 July 2007 03:20  |
erland Messages: 4 Registered: June 2007 |
Junior Member |
|
|
Could someone see any way to make this query faster ?
Any help or ideas are welcome
UPDATE customscan_track_attributes
INNER JOIN tracks on
tracks.url=customscan_track_attributes.url and
customscan_track_attributes.musicbrainz_id is null
set customscan_track_attributes.track=tracks.id
where customscan_track_attributes.track!=tracks.id
I've tried this version also, but it is a little bit slower:
UPDATE tracks,customscan_track_attributes
SET customscan_track_attributes.track=tracks.id
where
customscan_track_attributes.musicbrainz_id is null and
tracks.url=customscan_track_attributes.url and
customscan_track_attributes.track!=tracks.id
The tables look like this:
CREATE TABLE `tracks` (
`id` int(10) unsigned NOT NULL auto_increment,
`url` text collate utf8_unicode_ci NOT NULL,
`title` blob,
`titlesort` text collate utf8_unicode_ci,
`titlesearch` text collate utf8_unicode_ci,
`customsearch` text collate utf8_unicode_ci,
`album` int(10) unsigned default NULL,
`tracknum` int(10) unsigned default NULL,
`content_type` varchar(255) collate utf8_unicode_ci default NULL,
`timestamp` int(10) unsigned default NULL,
`filesize` int(10) unsigned default NULL,
`audio_size` int(10) unsigned default NULL,
`audio_offset` int(10) unsigned default NULL,
`year` smallint(5) unsigned default NULL,
`secs` float unsigned default NULL,
`cover` blob,
`thumb` blob,
`vbr_scale` varchar(255) collate utf8_unicode_ci default NULL,
`bitrate` float unsigned default NULL,
`samplerate` int(10) unsigned default NULL,
`samplesize` int(10) unsigned default NULL,
`channels` tinyint(1) unsigned default NULL,
`block_alignment` int(10) unsigned default NULL,
`endian` tinyint(1) default NULL,
`bpm` smallint(5) unsigned default NULL,
`tagversion` varchar(255) collate utf8_unicode_ci default NULL,
`drm` tinyint(1) default NULL,
`rating` tinyint(1) unsigned default NULL,
`disc` tinyint(1) unsigned default NULL,
`playCount` int(10) unsigned default NULL,
`lastPlayed` int(10) unsigned default NULL,
`audio` tinyint(1) default NULL,
`remote` tinyint(1) default NULL,
`lossless` tinyint(1) default NULL,
`lyrics` text collate utf8_unicode_ci,
`moodlogic_id` int(10) unsigned default NULL,
`moodlogic_mixable` tinyint(1) default NULL,
`musicbrainz_id` varchar(40) collate utf8_unicode_ci default NULL,
`musicmagic_mixable` tinyint(1) default NULL,
`replay_gain` float default NULL,
`replay_peak` float default NULL,
PRIMARY KEY (`id`),
KEY `trackTitleIndex` (`title`(255)),
KEY `trackAlbumIndex` (`album`),
KEY `ctSortIndex` (`content_type`),
KEY `trackSortIndex` (`titlesort`(255)),
KEY `trackSearchIndex` (`titlesearch`(255)),
KEY `trackCustomSearchIndex` (`customsearch`(255)),
KEY `trackBitrateIndex` (`bitrate`),
KEY `trackDiscIndex` (`disc`),
KEY `trackFilesizeIndex` (`filesize`),
KEY `trackTimestampIndex` (`timestamp`),
KEY `trackTracknumIndex` (`tracknum`),
KEY `trackRatingIndex` (`rating`),
KEY `trackPlayCountIndex` (`playCount`),
KEY `trackLastPlayedIndex` (`lastPlayed`),
KEY `trackAudioIndex` (`audio`),
KEY `trackLyricsIndex` (`lyrics`(255)),
KEY `trackRemoteIndex` (`remote`),
KEY `trackLosslessIndex` (`lossless`),
KEY `urlIndex` (`url`(255)),
KEY `trackStatMBIndex` (`musicbrainz_id`),
CONSTRAINT `tracks_ibfk_1` FOREIGN KEY (`album`) REFERENCES `albums` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `customscan_track_attributes` (
`id` bigint(20) unsigned NOT NULL auto_increment,
`track` int(10) default NULL,
`url` varchar(511) collate utf8_unicode_ci NOT NULL,
`musicbrainz_id` varchar(40) collate utf8_unicode_ci default NULL,
`module` varchar(40) collate utf8_unicode_ci NOT NULL,
`attr` varchar(255) collate utf8_unicode_ci NOT NULL,
`value` varchar(255) collate utf8_unicode_ci default NULL,
`valuesort` varchar(255) collate utf8_unicode_ci default NULL,
`extravalue` varchar(255) collate utf8_unicode_ci default NULL,
`valuetype` varchar(255) collate utf8_unicode_ci default NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`),
KEY `track_attr_idx` (`track`,`module`,`attr`,`id`),
KEY `musicbrainzIndex` (`musicbrainz_id`),
KEY `urlIndex` (`url`(255)),
KEY `module_attr_value_idx` (`module`,`attr`,`value`),
KEY `attr_module_idx` (`attr`,`module`),
KEY `extravalue_attr_module_track_idx` (`extravalue`,`attr`,`module`,`track`),
KEY `track_module_attr_extravalue_idx` (`track`,`module`,`attr`,`extravalue`),
KEY `module_attr_extravalue_idx` (`module`,`attr`,`extravalue`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
When the tracks table contains about 40000 rows and the customscan_track_attributes table contains about 160000 rows the UPDATE statements takes about 5 seconds to execute on Linux, but on Windows it takes about 15 seconds. A user has also reported that it takes several minutes when the customscan_track_attributes table is larger. This is the case even if the update doesn't actually update any rows.
I tried to make a SELECT first to check if the update was required, like this:
SELECT count(*) FROM customscan_track_attributes
INNER JOIN tracks on
tracks.url=customscan_track_attributes.url and
customscan_track_attributes.musicbrainz_id is null
where customscan_track_attributes.track!=tracks.id
When running this select with explain I get the following result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tracks ALL urlIndex (null) (null) (null) 40541
1 SIMPLE customscan_track_attributes ref musicbrainzIndex,urlIndex urlIndex 767 slimserver65_large.tracks.url 1 Using where
This SELECT statement takes approximately the same time to execute as the UPDATE statement.
I have full control of the customscan_track_attributes table, so if some changes in column types or indexes would make it better that is possible. The tracks table is from an external product, I can add index to this but I can't change any column definitions.
|
|
| |
| Re: Help optimizing UPDATE statement [message #1521 is a reply to message #1520 ] |
Sat, 07 July 2007 08:24   |
erland Messages: 4 Registered: June 2007 |
Junior Member |
|
|
| sterin wrote on Sat, 07 July 2007 11:43 | Some pointers:
2.
Joining based on a large string like your url is not recommended since it degrades performance.
3.
Especially since your url columns are of different types. That requires type conversion during the join which is also degrading performance.
tracks table:
`url` text collate utf8_unicode_ci NOT NULL
customscan_track_attributes table:
`url` varchar(511) collate utf8_unicode_ci NOT NULL
Since I'm guessing that you don't really want to make any larger changes in the table structure I suggest that you start by change the tracks.url to VARCHAR instead.
Then create a combined index on tracks(url,id) and possibly on customscan_track_attributes(url, track) that way it can perform the entire join using one index.
|
Unfortunately I don't have a choice. The 'tracks' table is defined by an external product (SlimServer) which I can't change. The additional data I store is stored in my own table 'customscan_track_attributes'. So I cannot change the 'tracks' table and the only identifier that can be used is the 'url'.
I've tried to define customscan_track_attributes.url as a 'text', but that makes it even worse, on Windows MySQL locks the whole computer when joining two text columns in large tables so you need to use the power switch to get back the control.
However, I tried to create a temporary table where I defined the tracks.url as a varchar(511) and it didn't make it any faster. It looked like this:
create temporary table tmp_tracks (
id int(10),
url varchar(511))
select id,url from tracks;
UPDATE customscan_track_attributes
INNER JOIN tmp_tracks on
tmp_tracks.url=customscan_track_attributes.url and
customscan_track_attributes.track!=tmp_tracks.id
set customscan_track_attributes.track=tmp_tracks.id
where customscan_track_attributes.musicbrainz_id is null;
DROP temporary table tmp_tracks;
| sterin wrote on Sat, 07 July 2007 11:43 |
Also what is your innodb_buffer_pool_size set to?
Because when the DB starts to grow this gets very important to keep performance up.
|
innodb_buffer_pool_size was set to 15M
I tried changing it to 128M and this made it faster, the UPDATE took about 12 seconds with 15M and with 128M it takes 8 seconds the first time it is executed and about 3 seconds the second and third time it is executed.
When innodb_buffer_pool_size was set to 15M, there is always a lot of disk activity when I run the UPDATE. When changing the pool to 128M, there is disk activity the first time the update is run but the second time there doesn't seem to be any disc activity at all.
I you have any more ideas I really appreciate them.
|
|
| |
| Re: Help optimizing UPDATE statement [message #1524 is a reply to message #1523 ] |
Sun, 08 July 2007 07:05  |
erland Messages: 4 Registered: June 2007 |
Junior Member |
|
|
| sterin wrote on Sun, 08 July 2007 12:19 | Although the track table is defined by an external product you should still be able to run a alter table statement on it to change the column to VARCHAR.
And combine it with the combined index on (url, id) and (url,tracks) I suggested.
|
I'm sorry if I didn't give you the whole picture. The external product is SlimServer and I'm making a plugin for it. This plugin is installed by other users in their SlimServer installation. The reason I don't like to change the 'tracks' table is that this might change the behaviour of other things in the external product since it might depend on that the url column is defined as 'TEXT'.
However, I tested this:
create table tmp_tracks (
id int(10),
url varchar(511),
primary key (id),
index urlIndex (url,id)
) type=innodb
select id,url from tracks;
The result is that I have a tmp_tracks table that has the exact same type of its (id,url) columns as I have for the (track,url) columns in customscan_track_attributes. It also has an index for (url,id). I also added an (url,track) index to the customscan_track_attributes table.
I then changed the UPDATE to run towards the tmp_tracks table instead of the tracks table. The result is that the speed is exactly the same as when I used the standard tracks table with the 'TEXT' url column. So the datatype doesn't seem to matter, at least there isn't any difference between a TEXT and VARCHAR(511).
I also tested to remove the != directive in the UPDATE statement, but this made the query even slower.
Finally I also tested to move the != statement inside the JOIN and move out the musicbrainz_id part to the where statement like like below, but this also made the query a bit slower:
UPDATE customscan_track_attributes
JOIN tmp_tracks on
tmp_tracks.url=customscan_track_attributes.url and
customscan_track_attributes.track!=tmp_tracks.id
set customscan_track_attributes.track=tmp_tracks.id
where customscan_track_attributes.musicbrainz_id is null;
Thanks for all the ideas
|
|
|
Goto Forum:
Current Time: Fri Jul 10 04:19:56 EDT 2009
Total time taken to generate the page: 0.03009 seconds |