Home » Performance » MySQL » 3 way LEFT JOIN with GROUP CONCAT, eliminate using filesort
| 3 way LEFT JOIN with GROUP CONCAT, eliminate using filesort [message #3727] |
Sun, 23 November 2008 13:09  |
batfastad Messages: 6 Registered: November 2008 Location: London, UK |
Junior Member |
|
|
Hi everyone
I'm relatively new to MySQL and gradually learning new stuff, but I'm having a problem with a particular query. I've asked around in the PHP DevNet forums and it seems I should be getting much better performance on this query, as my tables are pretty small and the query's not particularly advanced.
The structure is nicely normalised, something I've wanted to have for years. I'm gradually switching our company intranet subscribers database from filemaker to MySQL.
My server is Ubuntu Server 8.04 LTS, MySQL5, PHP5 and Apache2. On a pretty recent Intel Core 2 server with 4GB memory.
I'd really appreciate it if someone could give me some advice on this, as it's been driving me crazy!
contacts has about 36,000 records
companies: 38,000
tags_data: 42,000
tags: 980
My total DB size is about 26MiB, so it's tiny in comparison to many MySQL databases. I think this query should not be taking 32 seconds to complete!
Basically the tags table has various tags/classifications which can be applied to companies. So you can tag a company as MANUFACTURER, DISTRIBUTOR, SHOW EVENT ORGANISER... etc and there's 980 different tags/classifications to choose from. Each company can have multiple tags applied, and the tags_data table stores which tags are applied to which company.
Here's a quick diagram of part of the database which should help to visualise

green bold are the primary key columns. Red are the columns which link the data together.
Here's the query I'm trying to run:
SELECT `contacts`.`contact_name`, `companies`.`company_name`, GROUP_CONCAT(`tags`.`name` ORDER BY `tags`.`name` ASC SEPARATOR ',') AS `classifications`
FROM `contacts`
LEFT JOIN `companies` USING(`company_id`)
LEFT JOIN `tags_data` ON `companies`.`company_id`=`tags_data`.`company_id`
LEFT JOIN `tags` ON `tags_data`.`tag_id`=`tags`.`tag_id`
WHERE `contacts`.`subscribed`='VIP'
GROUP BY `contacts`.`contact_id`
The syntax on that might be improved I guess, but I don't know how to do that. That's the limit of my MySQL knowledge really.
I'm trying to get contact_name, company_name and a column called classifications which is a comma separated list of the tags/classifications names that are selected for that particular company.
contacts.subscribed can have 3 different values: NULL, FREE or VIP
This query only returns 777 records, but I'll be looking to return up to 20,000 with a variation of this query if I can get it running faster!
Initially this query took about 80 seconds, but I added a composite index (only just learnt about them) on the tags_data table (index called tags_companies) which reduced it to about 30 seconds.
I removed the ORDER BY or LIMIT from the query earlier as well, but it made very little difference.
Here's the output of EXPLAIN on this query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE contacts ref IX_subscribed IX_subscribed 15 const 1103 Using where; Using temporary; Using filesort
1 SIMPLE companies eq_ref PRIMARY PRIMARY 4 dwcintranet.contacts.company_id 1
1 SIMPLE tags_data index company_id tags_companies 26 NULL 41930 Using index
1 SIMPLE tags eq_ref PRIMARY PRIMARY 4 dwcintranet.tags_data.tag_id 1
So I think the problem is the Using filesort. I've tried taking the query apart to eliminate the filesort, and it seems it's the 2nd join that's causing the filesort: tags_data ON companies
But I don't know what else I could index to sort this out
Any ideas/suggestions?
I would really appreciate some help on this, I'm convinced that this query shouldn't be taking 30 seconds!
Thanks, Ben
[Updated on: Sun, 23 November 2008 13:18]
|
|
|
| Re: 3 way LEFT JOIN with GROUP CONCAT, eliminate using filesort [message #3728 is a reply to message #3727 ] |
Sun, 23 November 2008 13:14   |
batfastad Messages: 6 Registered: November 2008 Location: London, UK |
Junior Member |
|
|
Here's the SHOW CREATE TABLE of the 4 tables
contacts
CREATE TABLE `contacts` (
`contact_id` int(11) NOT NULL auto_increment,
`company_id` int(11) default NULL,
`address1_contact` varchar(255) default NULL,
`address2_contact` varchar(255) default NULL,
`address3_contact` varchar(255) default NULL,
`business_card` varchar(2) default NULL,
`contact_id_old` int(20) default NULL,
`contact_name` varchar(255) default NULL,
`county_state_contact` varchar(100) default NULL,
`create_name` varchar(60) default NULL,
`create_stamp` int(10) default NULL,
`email_contact` varchar(255) default NULL,
`email_contact_optout` tinyint(1) default NULL,
`fax_contact` varchar(255) default NULL,
`geo_contact` varchar(255) default NULL,
`job_title` varchar(255) default NULL,
`subscribed` varchar(4) default NULL,
`mail_contact` tinyint(1) default NULL,
`modify_count` int(7) default '0',
`modify_name` varchar(60) default NULL,
`modify_stamp` int(10) default NULL,
`note` text,
`postcode_zip_contact` varchar(50) default NULL,
`reg_stamp` int(10) default NULL,
`tel_cell` varchar(255) default NULL,
`tel_contact` varchar(255) default NULL,
`tel_ext` varchar(255) default NULL,
`tel_home` varchar(255) default NULL,
`temp` varchar(50) default NULL,
`town_city_contact` varchar(100) default NULL,
`view_count` int(7) default '0',
`view_name` varchar(60) default NULL,
`view_stamp` int(10) default NULL,
PRIMARY KEY (`contact_id`),
KEY `company_id` (`company_id`),
KEY `contact_name` (`contact_name`),
KEY `IX_subscribed` (`subscribed`),
KEY `companies_contacts` (`company_id`,`contact_id`),
KEY `contacts_companies` (`contact_id`,`company_id`)
) ENGINE=MyISAM AUTO_INCREMENT=37645 DEFAULT CHARSET=utf8
companies
CREATE TABLE `companies` (
`company_id` int(11) NOT NULL auto_increment,
`ad_lead` tinyint(1) default NULL,
`address_priv` tinyint(1) default NULL,
`address1` varchar(255) default NULL,
`address2` varchar(255) default NULL,
`address3` varchar(255) default NULL,
`advisory` varchar(255) default NULL,
`advisory_name` varchar(60) default NULL,
`advisory_stamp` int(10) default NULL,
`agency_address` varchar(255) default NULL,
`agency_company` varchar(255) default NULL,
`agency_contact` varchar(255) default NULL,
`agency_email` varchar(255) default NULL,
`agency_fax` varchar(255) default NULL,
`agency_id` int(11) default NULL,
`agency_tel` varchar(255) default NULL,
`budget_year1` varchar(3) default NULL,
`budget_year2` varchar(3) default NULL,
`company_name` varchar(255) default NULL,
`county_state` varchar(100) default NULL,
`create_name` varchar(60) default NULL,
`create_stamp` int(10) default NULL,
`currency` varchar(3) default NULL,
`dir_contact` varchar(255) default NULL,
`email` text,
`email_optout` tinyint(1) default NULL,
`email_priv` tinyint(1) default NULL,
`fax` varchar(255) default NULL,
`fax_priv` tinyint(1) default NULL,
`geo` varchar(2) default NULL,
`hanging_file` tinyint(1) default NULL,
`iso_code` varchar(20) default NULL,
`locked_name` varchar(60) default NULL,
`locked_stamp` int(10) default NULL,
`master_id` int(11) default NULL,
`modify_count` int(7) default '0',
`modify_name` varchar(60) default NULL,
`modify_stamp` int(10) default NULL,
`postcode_zip` varchar(50) default NULL,
`rep` varchar(10) default NULL,
`tel` varchar(255) default NULL,
`tel_priv` tinyint(1) default NULL,
`town_city` varchar(255) default NULL,
`vat_number` varchar(20) default NULL,
`view_count` int(7) default '0',
`view_name` varchar(60) default NULL,
`view_stamp` int(10) default NULL,
`website` text,
PRIMARY KEY (`company_id`),
KEY `geo` (`geo`),
KEY `company_name` (`company_name`)
) ENGINE=MyISAM AUTO_INCREMENT=39200 DEFAULT CHARSET=utf8
tags_data
CREATE TABLE `tags_data` (
`tagdata_id` int(11) NOT NULL auto_increment,
`tag_id` int(11) default NULL,
`company_id` varchar(6) default NULL,
`create_name` varchar(60) default NULL,
`create_stamp` int(10) default NULL,
PRIMARY KEY (`tagdata_id`),
KEY `tag_id` (`tag_id`),
KEY `company_id` (`company_id`),
KEY `tags_companies` (`tag_id`,`company_id`)
) ENGINE=MyISAM AUTO_INCREMENT=42205 DEFAULT CHARSET=utf8
tags
CREATE TABLE `tags` (
`tag_id` int(7) NOT NULL default '0',
`category` varchar(255) default NULL,
`description` varchar(255) default NULL,
`name` varchar(255) default NULL,
PRIMARY KEY (`tag_id`),
KEY `category` (`category`),
KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
I'd really appreciate it if someone could take a look through to see how I can speed this up!
Thanks so much, Ben
|
|
|
| Re: 3 way LEFT JOIN with GROUP CONCAT, eliminate using filesort [message #3730 is a reply to message #3727 ] |
Mon, 24 November 2008 11:44   |
vgatto Messages: 92 Registered: November 2006 |
Member |
|
|
I think your problem is here:
1 SIMPLE tags_data index company_id tags_companies 26 NULL 41930 Using index
The previous line should produce a set of company ids used to join into tags_data. Unfortunately, MySQL is doing an index scan for every row produced from the previous step. Basically, MySQL is evaluating 41930 * 1103 rows to produce your result set. That's over 40M, which explains why things are taking so long.
I think you can clear this up by eliminating the left join from tags_data into tags. Since this is a join table, you really shouldn't have orphaned rows. If you do, they should be easy to clean up.
You may also want another covering index on tags_data, on (company_id, tag_id). This will be used in joins from companies to tags. Your existing index only helps when joining from tags to companies. Since you're using MyISAM, and only indexes are cached, having (company_id, tag_id) instead of just company_id should help things move faster as well.
|
|
| | |
| Re: 3 way LEFT JOIN with GROUP CONCAT, eliminate using filesort [message #3733 is a reply to message #3727 ] |
Mon, 24 November 2008 15:33   |
batfastad Messages: 6 Registered: November 2008 Location: London, UK |
Junior Member |
|
|
Thanks for the reply.
I just changed that tags on tags_data.tag_id=tags.tag_id into a regular JOIN, rather than LEFT JOIN
And it ran much quicker, but it returned fewer results.
I do want to return companies even if they have no tags selected in tags_data. So I guess that has to stay a LEFT JOIN.
I added that extra covering index between tags_data and companies.
So now I have 2 on there: company_id, tag_id and tag_id, company_id
But the query takes about the same time.
Here's an updated EXPLAIN output after adding that 2nd index, but all still as LEFT JOINS as before:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE contacts ref IX_subscribed IX_subscribed 15 const 1103 Using where; Using temporary; Using filesort
1 SIMPLE companies ref PRIMARY,company_id_geo company_id_geo 4 dwcintranet.contacts.company_id 1
1 SIMPLE tags_data index company_id,companies_tags tags_companies 26 NULL 41930 Using index
1 SIMPLE tags eq_ref PRIMARY,tag_id_name PRIMARY 4 dwcintranet.tags_data.tag_id 1
Any idea on what else I can try indexing or changing?
Would it be quicker to use subquery/subselect instead of 3 joins? I've never tried using those before so not sure if that's possible.
Thanks for all your help so far!
Cheers, B
[Updated on: Mon, 24 November 2008 15:54]
|
|
| | |
| Re: 3 way LEFT JOIN with GROUP CONCAT, eliminate using filesort [message #3737 is a reply to message #3735 ] |
Tue, 25 November 2008 11:22   |
vgatto Messages: 92 Registered: November 2006 |
Member |
|
|
No index is being used according to that plan, which means that given how much data you have, MySQL thinks a table scan is faster than using the index. It could have used one of those indexes, but it decided not to.
See what the explain looks like for this:
SELECT tags_data.company_id, GROUP_CONCAT(`tags`.`name` ORDER BY `tags`.`name` ASC SEPARATOR ',') AS `classifications`
FROM tags_data INNER JOIN tags USING(tag_id)
GROUP BY tags_data.company_id
If that's not terrible (the product of the rows columns is not huge), you can try using it to eliminate the last left join.
SELECT `contacts`.`contact_name`, `companies`.`company_name`, company_tags.classifications
FROM `contacts`
LEFT JOIN `companies` USING(`company_id`)
LEFT JOIN
(
SELECT tags_data.company_id, GROUP_CONCAT(`tags`.`name` ORDER BY `tags`.`name` ASC SEPARATOR ',') AS `classifications`
FROM tags_data INNER JOIN tags USING(tag_id)
GROUP BY tags_data.company_id
) AS company_tags
ON companies.company_id = company_tags.company_id
WHERE `contacts`.`subscribed`='VIP'
However, I'm not sure this is a good idea, since that inner select doesn't get any benefit from the restriction applied to contacts and therefor companies. It will always load all your tags, which might be fine if that's the common case.
I also didn't try either of these myself, so you might have to clean up any syntactic problems I have.
The other thing you should consider since you're implementing tags and you're already using MyISAM, is using fulltext search. You'd end up just adding a text column to companies that contains a comma separated list of tags and creating a fulltext index on that column.
The query you're trying to write will then stop after the left join into companies, and your classifications will already be available to you without any grouping. As an added bonus, searching by tags, especially a combination of tags, can be done through the fulltext index instead of some awful looking query which joins the tags table over and over.
You'll need to change your code to maintain this column, which is probably best done outside of SQL in whatever language you're already using for your application.
|
|
|
| Re: 3 way LEFT JOIN with GROUP CONCAT, eliminate using filesort [message #3741 is a reply to message #3737 ] |
Wed, 26 November 2008 07:51  |
batfastad Messages: 6 Registered: November 2008 Location: London, UK |
Junior Member |
|
|
Hi vgatto
I tried out your suggestion, and it returned the desired records quickly. However it didn't seem to scale that well for larger datasets, but I'll be keeping that in the bank.
What's the terminology for that type of query?
Is that a sub-select?
I did actually solve my problem with the 3 LEFT JOINs.
On comp.databases.mysql someone noticed that I'd made an error in my table structure.
companies.company_id was INT(11)
tags_data.company_id was VARCHAR(6)
A while ago I did have company_id as VARCHAR(6) for compatibility with our filemaker solution, but I changed and forgot to change it in tags_data
Making that one change made the original query run at 0.12 seconds, rather than 32 seconds!!!
Thanks so much, B
|
|
|
Goto Forum:
Current Time: Sun Jul 5 17:27:25 EDT 2009
Total time taken to generate the page: 0.00855 seconds |