Home » Performance » MySQL » Help optimizing UPDATE statement
Help optimizing UPDATE statement [message #1519] Sat, 07 July 2007 03:20 Go to next message
erland  is currently offline erland
Messages: 4
Registered: June 2007
Junior Member
Could someone see any way to make this query faster ?
Any help or ideas are welcome

UPDATE customscan_track_attributes 
	INNER JOIN tracks on 
		tracks.url=customscan_track_attributes.url and 
		customscan_track_attributes.musicbrainz_id is null 
set customscan_track_attributes.track=tracks.id 
where customscan_track_attributes.track!=tracks.id


I've tried this version also, but it is a little bit slower:
UPDATE tracks,customscan_track_attributes 
SET customscan_track_attributes.track=tracks.id 
where 
	customscan_track_attributes.musicbrainz_id is null and 
	tracks.url=customscan_track_attributes.url and 
	customscan_track_attributes.track!=tracks.id


The tables look like this:
CREATE TABLE `tracks` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `url` text collate utf8_unicode_ci NOT NULL,
  `title` blob,
  `titlesort` text collate utf8_unicode_ci,
  `titlesearch` text collate utf8_unicode_ci,
  `customsearch` text collate utf8_unicode_ci,
  `album` int(10) unsigned default NULL,
  `tracknum` int(10) unsigned default NULL,
  `content_type` varchar(255) collate utf8_unicode_ci default NULL,
  `timestamp` int(10) unsigned default NULL,
  `filesize` int(10) unsigned default NULL,
  `audio_size` int(10) unsigned default NULL,
  `audio_offset` int(10) unsigned default NULL,
  `year` smallint(5) unsigned default NULL,
  `secs` float unsigned default NULL,
  `cover` blob,
  `thumb` blob,
  `vbr_scale` varchar(255) collate utf8_unicode_ci default NULL,
  `bitrate` float unsigned default NULL,
  `samplerate` int(10) unsigned default NULL,
  `samplesize` int(10) unsigned default NULL,
  `channels` tinyint(1) unsigned default NULL,
  `block_alignment` int(10) unsigned default NULL,
  `endian` tinyint(1) default NULL,
  `bpm` smallint(5) unsigned default NULL,
  `tagversion` varchar(255) collate utf8_unicode_ci default NULL,
  `drm` tinyint(1) default NULL,
  `rating` tinyint(1) unsigned default NULL,
  `disc` tinyint(1) unsigned default NULL,
  `playCount` int(10) unsigned default NULL,
  `lastPlayed` int(10) unsigned default NULL,
  `audio` tinyint(1) default NULL,
  `remote` tinyint(1) default NULL,
  `lossless` tinyint(1) default NULL,
  `lyrics` text collate utf8_unicode_ci,
  `moodlogic_id` int(10) unsigned default NULL,
  `moodlogic_mixable` tinyint(1) default NULL,
  `musicbrainz_id` varchar(40) collate utf8_unicode_ci default NULL,
  `musicmagic_mixable` tinyint(1) default NULL,
  `replay_gain` float default NULL,
  `replay_peak` float default NULL,
  PRIMARY KEY  (`id`),
  KEY `trackTitleIndex` (`title`(255)),
  KEY `trackAlbumIndex` (`album`),
  KEY `ctSortIndex` (`content_type`),
  KEY `trackSortIndex` (`titlesort`(255)),
  KEY `trackSearchIndex` (`titlesearch`(255)),
  KEY `trackCustomSearchIndex` (`customsearch`(255)),
  KEY `trackBitrateIndex` (`bitrate`),
  KEY `trackDiscIndex` (`disc`),
  KEY `trackFilesizeIndex` (`filesize`),
  KEY `trackTimestampIndex` (`timestamp`),
  KEY `trackTracknumIndex` (`tracknum`),
  KEY `trackRatingIndex` (`rating`),
  KEY `trackPlayCountIndex` (`playCount`),
  KEY `trackLastPlayedIndex` (`lastPlayed`),
  KEY `trackAudioIndex` (`audio`),
  KEY `trackLyricsIndex` (`lyrics`(255)),
  KEY `trackRemoteIndex` (`remote`),
  KEY `trackLosslessIndex` (`lossless`),
  KEY `urlIndex` (`url`(255)),
  KEY `trackStatMBIndex` (`musicbrainz_id`),
  CONSTRAINT `tracks_ibfk_1` FOREIGN KEY (`album`) REFERENCES `albums` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci


CREATE TABLE `customscan_track_attributes` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `track` int(10) default NULL,
  `url` varchar(511) collate utf8_unicode_ci NOT NULL,
  `musicbrainz_id` varchar(40) collate utf8_unicode_ci default NULL,
  `module` varchar(40) collate utf8_unicode_ci NOT NULL,
  `attr` varchar(255) collate utf8_unicode_ci NOT NULL,
  `value` varchar(255) collate utf8_unicode_ci default NULL,
  `valuesort` varchar(255) collate utf8_unicode_ci default NULL,
  `extravalue` varchar(255) collate utf8_unicode_ci default NULL,
  `valuetype` varchar(255) collate utf8_unicode_ci default NULL,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `id` (`id`),
  KEY `track_attr_idx` (`track`,`module`,`attr`,`id`),
  KEY `musicbrainzIndex` (`musicbrainz_id`),
  KEY `urlIndex` (`url`(255)),
  KEY `module_attr_value_idx` (`module`,`attr`,`value`),
  KEY `attr_module_idx` (`attr`,`module`),
  KEY `extravalue_attr_module_track_idx` (`extravalue`,`attr`,`module`,`track`),
  KEY `track_module_attr_extravalue_idx` (`track`,`module`,`attr`,`extravalue`),
  KEY `module_attr_extravalue_idx` (`module`,`attr`,`extravalue`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci


When the tracks table contains about 40000 rows and the customscan_track_attributes table contains about 160000 rows the UPDATE statements takes about 5 seconds to execute on Linux, but on Windows it takes about 15 seconds. A user has also reported that it takes several minutes when the customscan_track_attributes table is larger. This is the case even if the update doesn't actually update any rows.

I tried to make a SELECT first to check if the update was required, like this:
SELECT count(*) FROM customscan_track_attributes 
	INNER JOIN tracks on 
		tracks.url=customscan_track_attributes.url and 
		customscan_track_attributes.musicbrainz_id is null 
where customscan_track_attributes.track!=tracks.id


When running this select with explain I get the following result:
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	tracks	ALL	urlIndex	(null)	(null)	(null)	40541	
1	SIMPLE	customscan_track_attributes	ref	musicbrainzIndex,urlIndex	urlIndex	767	slimserver65_large.tracks.url	1	Using where


This SELECT statement takes approximately the same time to execute as the UPDATE statement.

I have full control of the customscan_track_attributes table, so if some changes in column types or indexes would make it better that is possible. The tracks table is from an external product, I can add index to this but I can't change any column definitions.
Re: Help optimizing UPDATE statement [message #1520 is a reply to message #1519 ] Sat, 07 July 2007 05:43 Go to previous messageGo to next message
sterin  is currently offline sterin
Messages: 324
Registered: March 2007
Location: Sweden
Senior Member
Some pointers:
1.
Your inner join and where should be written correctly as:
SELECT
  COUNT(*)
FROM customscan_track_attributes
  INNER JOIN tracks ON 
    tracks.url = customscan_track_attributes.url
    AND tracks.id != customscan_track_attributes.track
WHERE
  customscan_track_attributes.musicbrainz_id IS NULL


2.
Joining based on a large string like your url is not recommended since it degrades performance.

3.
Especially since your url columns are of different types. That requires type conversion during the join which is also degrading performance.

tracks table:
`url` text collate utf8_unicode_ci NOT NULL

customscan_track_attributes table:
`url` varchar(511) collate utf8_unicode_ci NOT NULL


Since I'm guessing that you don't really want to make any larger changes in the table structure I suggest that you start by change the tracks.url to VARCHAR instead.

Then create a combined index on tracks(url,id) and possibly on customscan_track_attributes(url, track) that way it can perform the entire join using one index.

Also what is your innodb_buffer_pool_size set to?
Because when the DB starts to grow this gets very important to keep performance up.
Re: Help optimizing UPDATE statement [message #1521 is a reply to message #1520 ] Sat, 07 July 2007 08:24 Go to previous messageGo to next message
erland  is currently offline erland
Messages: 4
Registered: June 2007
Junior Member
sterin wrote on Sat, 07 July 2007 11:43

Some pointers:
2.
Joining based on a large string like your url is not recommended since it degrades performance.

3.
Especially since your url columns are of different types. That requires type conversion during the join which is also degrading performance.

tracks table:
`url` text collate utf8_unicode_ci NOT NULL

customscan_track_attributes table:
`url` varchar(511) collate utf8_unicode_ci NOT NULL


Since I'm guessing that you don't really want to make any larger changes in the table structure I suggest that you start by change the tracks.url to VARCHAR instead.

Then create a combined index on tracks(url,id) and possibly on customscan_track_attributes(url, track) that way it can perform the entire join using one index.


Unfortunately I don't have a choice. The 'tracks' table is defined by an external product (SlimServer) which I can't change. The additional data I store is stored in my own table 'customscan_track_attributes'. So I cannot change the 'tracks' table and the only identifier that can be used is the 'url'.

I've tried to define customscan_track_attributes.url as a 'text', but that makes it even worse, on Windows MySQL locks the whole computer when joining two text columns in large tables so you need to use the power switch to get back the control.

However, I tried to create a temporary table where I defined the tracks.url as a varchar(511) and it didn't make it any faster. It looked like this:
create temporary table tmp_tracks (
	id int(10),
	url varchar(511)) 
	select id,url from tracks;

UPDATE customscan_track_attributes 
	INNER JOIN tmp_tracks on 
		tmp_tracks.url=customscan_track_attributes.url and 
		customscan_track_attributes.track!=tmp_tracks.id 
	set customscan_track_attributes.track=tmp_tracks.id 
	where customscan_track_attributes.musicbrainz_id is null;

DROP temporary table tmp_tracks;


sterin wrote on Sat, 07 July 2007 11:43


Also what is your innodb_buffer_pool_size set to?
Because when the DB starts to grow this gets very important to keep performance up.


innodb_buffer_pool_size was set to 15M
I tried changing it to 128M and this made it faster, the UPDATE took about 12 seconds with 15M and with 128M it takes 8 seconds the first time it is executed and about 3 seconds the second and third time it is executed.

When innodb_buffer_pool_size was set to 15M, there is always a lot of disk activity when I run the UPDATE. When changing the pool to 128M, there is disk activity the first time the update is run but the second time there doesn't seem to be any disc activity at all.

I you have any more ideas I really appreciate them.
Re: Help optimizing UPDATE statement [message #1523 is a reply to message #1519 ] Sun, 08 July 2007 06:19 Go to previous messageGo to next message
sterin  is currently offline sterin
Messages: 324
Registered: March 2007
Location: Sweden
Senior Member
Although the track table is defined by an external product you should still be able to run a alter table statement on it to change the column to VARCHAR.

And combine it with the combined index on (url, id) and (url,tracks) I suggested.

And another setting that you also want to do to speed up insert/updates.
innodb_flush_log_at_trx_commit = 0
this gives you less security against corruption but you will very often gain a lot in speed.
Re: Help optimizing UPDATE statement [message #1524 is a reply to message #1523 ] Sun, 08 July 2007 07:05 Go to previous message
erland  is currently offline erland
Messages: 4
Registered: June 2007
Junior Member
sterin wrote on Sun, 08 July 2007 12:19

Although the track table is defined by an external product you should still be able to run a alter table statement on it to change the column to VARCHAR.

And combine it with the combined index on (url, id) and (url,tracks) I suggested.


I'm sorry if I didn't give you the whole picture. The external product is SlimServer and I'm making a plugin for it. This plugin is installed by other users in their SlimServer installation. The reason I don't like to change the 'tracks' table is that this might change the behaviour of other things in the external product since it might depend on that the url column is defined as 'TEXT'.

However, I tested this:
create table tmp_tracks (
	id int(10),
	url varchar(511),
	primary key (id),
	index urlIndex (url,id)
) type=innodb 
select id,url from tracks;

The result is that I have a tmp_tracks table that has the exact same type of its (id,url) columns as I have for the (track,url) columns in customscan_track_attributes. It also has an index for (url,id). I also added an (url,track) index to the customscan_track_attributes table.

I then changed the UPDATE to run towards the tmp_tracks table instead of the tracks table. The result is that the speed is exactly the same as when I used the standard tracks table with the 'TEXT' url column. So the datatype doesn't seem to matter, at least there isn't any difference between a TEXT and VARCHAR(511).

I also tested to remove the != directive in the UPDATE statement, but this made the query even slower.

Finally I also tested to move the != statement inside the JOIN and move out the musicbrainz_id part to the where statement like like below, but this also made the query a bit slower:
UPDATE customscan_track_attributes 
	JOIN tmp_tracks on 
		tmp_tracks.url=customscan_track_attributes.url and 
		customscan_track_attributes.track!=tmp_tracks.id
	set customscan_track_attributes.track=tmp_tracks.id 
	where customscan_track_attributes.musicbrainz_id is null;


Thanks for all the ideas
Previous Topic:Multiple-column index not working
Next Topic:Fulltext index not being used
Goto Forum:
  


Current Time: Fri Jul 10 04:19:56 EDT 2009

Total time taken to generate the page: 0.03009 seconds
.:: Contact :: Home ::.

Powered by: FUDforum 2.7.7.
Copyright ©2001-2007 FUD Forum Bulletin Board Software

MySQL is a trademark of Sun Microsystems.
InnoDB is a trademark of Oracle Corp.

Percona Performance Forums are a service of Percona, Inc.
Not affiliated with Sun Microsystems or Oracle Corp.