Home » Performance » MySQL » 3 way LEFT JOIN with GROUP CONCAT, eliminate using filesort
3 way LEFT JOIN with GROUP CONCAT, eliminate using filesort [message #3727] Sun, 23 November 2008 13:09 Go to next message
batfastad  is currently offline batfastad
Messages: 6
Registered: November 2008
Location: London, UK
Junior Member
Hi everyone

I'm relatively new to MySQL and gradually learning new stuff, but I'm having a problem with a particular query. I've asked around in the PHP DevNet forums and it seems I should be getting much better performance on this query, as my tables are pretty small and the query's not particularly advanced.
The structure is nicely normalised, something I've wanted to have for years. I'm gradually switching our company intranet subscribers database from filemaker to MySQL.
My server is Ubuntu Server 8.04 LTS, MySQL5, PHP5 and Apache2. On a pretty recent Intel Core 2 server with 4GB memory.

I'd really appreciate it if someone could give me some advice on this, as it's been driving me crazy!

contacts has about 36,000 records
companies: 38,000
tags_data: 42,000
tags: 980
My total DB size is about 26MiB, so it's tiny in comparison to many MySQL databases. I think this query should not be taking 32 seconds to complete!

Basically the tags table has various tags/classifications which can be applied to companies. So you can tag a company as MANUFACTURER, DISTRIBUTOR, SHOW EVENT ORGANISER... etc and there's 980 different tags/classifications to choose from. Each company can have multiple tags applied, and the tags_data table stores which tags are applied to which company.

Here's a quick diagram of part of the database which should help to visualise
http://www.highbeeches.plus.com/devnet_dbstructure.png
green bold are the primary key columns. Red are the columns which link the data together.

Here's the query I'm trying to run:
SELECT `contacts`.`contact_name`, `companies`.`company_name`, GROUP_CONCAT(`tags`.`name` ORDER BY `tags`.`name` ASC SEPARATOR ',') AS `classifications`
FROM `contacts`
LEFT JOIN `companies` USING(`company_id`)
LEFT JOIN `tags_data` ON `companies`.`company_id`=`tags_data`.`company_id`
LEFT JOIN `tags` ON `tags_data`.`tag_id`=`tags`.`tag_id`
WHERE `contacts`.`subscribed`='VIP'
GROUP BY `contacts`.`contact_id`

The syntax on that might be improved I guess, but I don't know how to do that. That's the limit of my MySQL knowledge really.

I'm trying to get contact_name, company_name and a column called classifications which is a comma separated list of the tags/classifications names that are selected for that particular company.
contacts.subscribed can have 3 different values: NULL, FREE or VIP

This query only returns 777 records, but I'll be looking to return up to 20,000 with a variation of this query if I can get it running faster!
Initially this query took about 80 seconds, but I added a composite index (only just learnt about them) on the tags_data table (index called tags_companies) which reduced it to about 30 seconds.
I removed the ORDER BY or LIMIT from the query earlier as well, but it made very little difference.

Here's the output of EXPLAIN on this query:
id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra
1 	SIMPLE 	contacts 	ref 	IX_subscribed 	IX_subscribed 	15 	const 	1103 	Using where; Using temporary; Using filesort
1 	SIMPLE 	companies 	eq_ref 	PRIMARY 	PRIMARY 	4 	dwcintranet.contacts.company_id 	1 	 
1 	SIMPLE 	tags_data 	index 	company_id 	tags_companies 	26 	NULL 	41930 	Using index
1 	SIMPLE 	tags 	eq_ref 	PRIMARY 	PRIMARY 	4 	dwcintranet.tags_data.tag_id 	1 	 


So I think the problem is the Using filesort. I've tried taking the query apart to eliminate the filesort, and it seems it's the 2nd join that's causing the filesort: tags_data ON companies
But I don't know what else I could index to sort this out

Any ideas/suggestions?
I would really appreciate some help on this, I'm convinced that this query shouldn't be taking 30 seconds!

Thanks, Ben

[Updated on: Sun, 23 November 2008 13:18]

Re: 3 way LEFT JOIN with GROUP CONCAT, eliminate using filesort [message #3728 is a reply to message #3727 ] Sun, 23 November 2008 13:14 Go to previous messageGo to next message
batfastad  is currently offline batfastad
Messages: 6
Registered: November 2008
Location: London, UK
Junior Member
Here's the SHOW CREATE TABLE of the 4 tables

contacts
CREATE TABLE `contacts` (
 `contact_id` int(11) NOT NULL auto_increment,
 `company_id` int(11) default NULL,
 `address1_contact` varchar(255) default NULL,
 `address2_contact` varchar(255) default NULL,
 `address3_contact` varchar(255) default NULL,
 `business_card` varchar(2) default NULL,
 `contact_id_old` int(20) default NULL,
 `contact_name` varchar(255) default NULL,
 `county_state_contact` varchar(100) default NULL,
 `create_name` varchar(60) default NULL,
 `create_stamp` int(10) default NULL,
 `email_contact` varchar(255) default NULL,
 `email_contact_optout` tinyint(1) default NULL,
 `fax_contact` varchar(255) default NULL,
 `geo_contact` varchar(255) default NULL,
 `job_title` varchar(255) default NULL,
 `subscribed` varchar(4) default NULL,
 `mail_contact` tinyint(1) default NULL,
 `modify_count` int(7) default '0',
 `modify_name` varchar(60) default NULL,
 `modify_stamp` int(10) default NULL,
 `note` text,
 `postcode_zip_contact` varchar(50) default NULL,
 `reg_stamp` int(10) default NULL,
 `tel_cell` varchar(255) default NULL,
 `tel_contact` varchar(255) default NULL,
 `tel_ext` varchar(255) default NULL,
 `tel_home` varchar(255) default NULL,
 `temp` varchar(50) default NULL,
 `town_city_contact` varchar(100) default NULL,
 `view_count` int(7) default '0',
 `view_name` varchar(60) default NULL,
 `view_stamp` int(10) default NULL,
 PRIMARY KEY  (`contact_id`),
 KEY `company_id` (`company_id`),
 KEY `contact_name` (`contact_name`),
 KEY `IX_subscribed` (`subscribed`),
 KEY `companies_contacts` (`company_id`,`contact_id`),
 KEY `contacts_companies` (`contact_id`,`company_id`)
) ENGINE=MyISAM AUTO_INCREMENT=37645 DEFAULT CHARSET=utf8



companies
CREATE TABLE `companies` (
 `company_id` int(11) NOT NULL auto_increment,
 `ad_lead` tinyint(1) default NULL,
 `address_priv` tinyint(1) default NULL,
 `address1` varchar(255) default NULL,
 `address2` varchar(255) default NULL,
 `address3` varchar(255) default NULL,
 `advisory` varchar(255) default NULL,
 `advisory_name` varchar(60) default NULL,
 `advisory_stamp` int(10) default NULL,
 `agency_address` varchar(255) default NULL,
 `agency_company` varchar(255) default NULL,
 `agency_contact` varchar(255) default NULL,
 `agency_email` varchar(255) default NULL,
 `agency_fax` varchar(255) default NULL,
 `agency_id` int(11) default NULL,
 `agency_tel` varchar(255) default NULL,
 `budget_year1` varchar(3) default NULL,
 `budget_year2` varchar(3) default NULL,
 `company_name` varchar(255) default NULL,
 `county_state` varchar(100) default NULL,
 `create_name` varchar(60) default NULL,
 `create_stamp` int(10) default NULL,
 `currency` varchar(3) default NULL,
 `dir_contact` varchar(255) default NULL,
 `email` text,
 `email_optout` tinyint(1) default NULL,
 `email_priv` tinyint(1) default NULL,
 `fax` varchar(255) default NULL,
 `fax_priv` tinyint(1) default NULL,
 `geo` varchar(2) default NULL,
 `hanging_file` tinyint(1) default NULL,
 `iso_code` varchar(20) default NULL,
 `locked_name` varchar(60) default NULL,
 `locked_stamp` int(10) default NULL,
 `master_id` int(11) default NULL,
 `modify_count` int(7) default '0',
 `modify_name` varchar(60) default NULL,
 `modify_stamp` int(10) default NULL,
 `postcode_zip` varchar(50) default NULL,
 `rep` varchar(10) default NULL,
 `tel` varchar(255) default NULL,
 `tel_priv` tinyint(1) default NULL,
 `town_city` varchar(255) default NULL,
 `vat_number` varchar(20) default NULL,
 `view_count` int(7) default '0',
 `view_name` varchar(60) default NULL,
 `view_stamp` int(10) default NULL,
 `website` text,
 PRIMARY KEY  (`company_id`),
 KEY `geo` (`geo`),
 KEY `company_name` (`company_name`)
) ENGINE=MyISAM AUTO_INCREMENT=39200 DEFAULT CHARSET=utf8


tags_data
CREATE TABLE `tags_data` (
 `tagdata_id` int(11) NOT NULL auto_increment,
 `tag_id` int(11) default NULL,
 `company_id` varchar(6) default NULL,
 `create_name` varchar(60) default NULL,
 `create_stamp` int(10) default NULL,
 PRIMARY KEY  (`tagdata_id`),
 KEY `tag_id` (`tag_id`),
 KEY `company_id` (`company_id`),
 KEY `tags_companies` (`tag_id`,`company_id`)
) ENGINE=MyISAM AUTO_INCREMENT=42205 DEFAULT CHARSET=utf8


tags
CREATE TABLE `tags` (
 `tag_id` int(7) NOT NULL default '0',
 `category` varchar(255) default NULL,
 `description` varchar(255) default NULL,
 `name` varchar(255) default NULL,
 PRIMARY KEY  (`tag_id`),
 KEY `category` (`category`),
 KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8


I'd really appreciate it if someone could take a look through to see how I can speed this up!

Thanks so much, Ben
Re: 3 way LEFT JOIN with GROUP CONCAT, eliminate using filesort [message #3730 is a reply to message #3727 ] Mon, 24 November 2008 11:44 Go to previous messageGo to next message
vgatto  is currently offline vgatto
Messages: 92
Registered: November 2006
Member
I think your problem is here:

1 	SIMPLE 	tags_data 	index 	company_id 	tags_companies 	26 	NULL 	41930 	Using index


The previous line should produce a set of company ids used to join into tags_data. Unfortunately, MySQL is doing an index scan for every row produced from the previous step. Basically, MySQL is evaluating 41930 * 1103 rows to produce your result set. That's over 40M, which explains why things are taking so long.

I think you can clear this up by eliminating the left join from tags_data into tags. Since this is a join table, you really shouldn't have orphaned rows. If you do, they should be easy to clean up.

You may also want another covering index on tags_data, on (company_id, tag_id). This will be used in joins from companies to tags. Your existing index only helps when joining from tags to companies. Since you're using MyISAM, and only indexes are cached, having (company_id, tag_id) instead of just company_id should help things move faster as well.


Re: 3 way LEFT JOIN with GROUP CONCAT, eliminate using filesort [message #3731 is a reply to message #3730 ] Mon, 24 November 2008 11:53 Go to previous messageGo to next message
batfastad  is currently offline batfastad
Messages: 6
Registered: November 2008
Location: London, UK
Junior Member
Thanks for the reply.
I figured that's what was happening behind the scenes.

I think I need the LEFT JOIN from tags_data into tags, because I want to access the tags.name field in my GROUP CONCAT()

Do I still need that particular LEFT JOIN to accomplish that?

Cheers, B
Re: 3 way LEFT JOIN with GROUP CONCAT, eliminate using filesort [message #3732 is a reply to message #3731 ] Mon, 24 November 2008 13:25 Go to previous messageGo to next message
vgatto  is currently offline vgatto
Messages: 92
Registered: November 2006
Member
No, an inner join there should be fine. Basically, with the left join from tags_data to tags, you're making sure rows from tags_data without matching rows in tags are included. I can't think of any reason why a row from tags_data has a tag_id which is not present in tags.

Right now, here's the degenerate rows that can be returned by your query:

A valid contact with no company and no tags
A valid contact with a valid company and no tags
A valid contact with a valid company, rows in tags_data but no tags

If you're certain this is what you want, then your left joins are correct. Otherwise, convert the left joins to inner joins and you'll save yourself some processing.


Re: 3 way LEFT JOIN with GROUP CONCAT, eliminate using filesort [message #3733 is a reply to message #3727 ] Mon, 24 November 2008 15:33 Go to previous messageGo to next message
batfastad  is currently offline batfastad
Messages: 6
Registered: November 2008
Location: London, UK
Junior Member
Thanks for the reply.

I just changed that tags on tags_data.tag_id=tags.tag_id into a regular JOIN, rather than LEFT JOIN
And it ran much quicker, but it returned fewer results.

I do want to return companies even if they have no tags selected in tags_data. So I guess that has to stay a LEFT JOIN.

I added that extra covering index between tags_data and companies.
So now I have 2 on there: company_id, tag_id and tag_id, company_id

But the query takes about the same time.

Here's an updated EXPLAIN output after adding that 2nd index, but all still as LEFT JOINS as before:
id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra
1 	SIMPLE 	contacts 	ref 	IX_subscribed 	IX_subscribed 	15 	const 	1103 	Using where; Using temporary; Using filesort
1 	SIMPLE 	companies 	ref 	PRIMARY,company_id_geo 	company_id_geo 	4 	dwcintranet.contacts.company_id 	1 	 
1 	SIMPLE 	tags_data 	index 	company_id,companies_tags 	tags_companies 	26 	NULL 	41930 	Using index
1 	SIMPLE 	tags 	eq_ref 	PRIMARY,tag_id_name 	PRIMARY 	4 	dwcintranet.tags_data.tag_id 	1 	 


Any idea on what else I can try indexing or changing?
Would it be quicker to use subquery/subselect instead of 3 joins? I've never tried using those before so not sure if that's possible.

Thanks for all your help so far!
Cheers, B

[Updated on: Mon, 24 November 2008 15:54]

Re: 3 way LEFT JOIN with GROUP CONCAT, eliminate using filesort [message #3734 is a reply to message #3733 ] Mon, 24 November 2008 18:46 Go to previous messageGo to next message
vgatto  is currently offline vgatto
Messages: 92
Registered: November 2006
Member
I'd be curious to see the results of this:

EXPLAIN SELECT companies.company_name, tags_data.*
FROM companies LEFT JOIN tags_data USING(company_id)


If the plan for that query chooses the (tag_id, company_id) index, then I'd try running ANALYZE TABLE to make sure the index statistics are up to date.

Re: 3 way LEFT JOIN with GROUP CONCAT, eliminate using filesort [message #3735 is a reply to message #3727 ] Tue, 25 November 2008 07:47 Go to previous messageGo to next message
batfastad  is currently offline batfastad
Messages: 6
Registered: November 2008
Location: London, UK
Junior Member
I ran that EXPLAIN and here's the output:

id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra
1 	SIMPLE 	companies 	ALL 	NULL 	NULL 	NULL 	NULL 	36557 	 
1 	SIMPLE 	tags_data 	ALL 	company_id,companies_tags 	NULL 	NULL 	NULL 	41930 	 


So it looks like it's using the company_id,tag_id index (companies_tags).
Re: 3 way LEFT JOIN with GROUP CONCAT, eliminate using filesort [message #3737 is a reply to message #3735 ] Tue, 25 November 2008 11:22 Go to previous messageGo to next message
vgatto  is currently offline vgatto
Messages: 92
Registered: November 2006
Member
No index is being used according to that plan, which means that given how much data you have, MySQL thinks a table scan is faster than using the index. It could have used one of those indexes, but it decided not to.


See what the explain looks like for this:

   SELECT tags_data.company_id, GROUP_CONCAT(`tags`.`name` ORDER BY `tags`.`name` ASC SEPARATOR ',') AS `classifications`
    FROM tags_data INNER JOIN tags USING(tag_id)
    GROUP BY tags_data.company_id



If that's not terrible (the product of the rows columns is not huge), you can try using it to eliminate the last left join.

SELECT `contacts`.`contact_name`, `companies`.`company_name`, company_tags.classifications
FROM `contacts`
LEFT JOIN `companies` USING(`company_id`)
LEFT JOIN 
  (
    SELECT tags_data.company_id, GROUP_CONCAT(`tags`.`name` ORDER BY `tags`.`name` ASC SEPARATOR ',') AS `classifications`
    FROM tags_data INNER JOIN tags USING(tag_id)
    GROUP BY tags_data.company_id
  ) AS company_tags
ON companies.company_id = company_tags.company_id
WHERE `contacts`.`subscribed`='VIP'


However, I'm not sure this is a good idea, since that inner select doesn't get any benefit from the restriction applied to contacts and therefor companies. It will always load all your tags, which might be fine if that's the common case.

I also didn't try either of these myself, so you might have to clean up any syntactic problems I have.

The other thing you should consider since you're implementing tags and you're already using MyISAM, is using fulltext search. You'd end up just adding a text column to companies that contains a comma separated list of tags and creating a fulltext index on that column.

The query you're trying to write will then stop after the left join into companies, and your classifications will already be available to you without any grouping. As an added bonus, searching by tags, especially a combination of tags, can be done through the fulltext index instead of some awful looking query which joins the tags table over and over.

You'll need to change your code to maintain this column, which is probably best done outside of SQL in whatever language you're already using for your application.





Re: 3 way LEFT JOIN with GROUP CONCAT, eliminate using filesort [message #3741 is a reply to message #3737 ] Wed, 26 November 2008 07:51 Go to previous message
batfastad  is currently offline batfastad
Messages: 6
Registered: November 2008
Location: London, UK
Junior Member
Hi vgatto

I tried out your suggestion, and it returned the desired records quickly. However it didn't seem to scale that well for larger datasets, but I'll be keeping that in the bank.

What's the terminology for that type of query?
Is that a sub-select?

I did actually solve my problem with the 3 LEFT JOINs.
On comp.databases.mysql someone noticed that I'd made an error in my table structure.

companies.company_id was INT(11)
tags_data.company_id was VARCHAR(6)

A while ago I did have company_id as VARCHAR(6) for compatibility with our filemaker solution, but I changed and forgot to change it in tags_data
Making that one change made the original query run at 0.12 seconds, rather than 32 seconds!!!

Thanks so much, B
Previous Topic:Need help with under-performing query: using where, temporary and filesort
Next Topic:MYSQL DISTINCT
Goto Forum:
  


Current Time: Sun Jul 5 17:27:25 EDT 2009

Total time taken to generate the page: 0.00855 seconds
.:: Contact :: Home ::.

Powered by: FUDforum 2.7.7.
Copyright ©2001-2007 FUD Forum Bulletin Board Software

MySQL is a trademark of Sun Microsystems.
InnoDB is a trademark of Oracle Corp.

Percona Performance Forums are a service of Percona, Inc.
Not affiliated with Sun Microsystems or Oracle Corp.