Home » Performance » MySQL » Query Optimization (Large Results)
Query Optimization (Large Results) [message #1901] Wed, 19 September 2007 08:08 Go to next message
MonkeyTech  is currently offline MonkeyTech
Messages: 4
Registered: September 2007
Junior Member
Hey,

Basically what i have is several large tables (14mill records or more each), people send in smaller tables (usually around 20 - 50,000 records) and we flag their records that match ours.

Here is an example of the largest table (16mill records):

Table Layout:
DROP TABLE IF EXISTS `dmd`.`gas`;
CREATE TABLE  `dmd`.`gas` (
  `FORENAME` varchar(20) NOT NULL,
  `SURNAME` varchar(20) NOT NULL,
  `ADDR1` varchar(30) NOT NULL,
  `ADDR2` varchar(30) NOT NULL,
  `ADDR3` varchar(30) NOT NULL,
  `TOWN` varchar(30) NOT NULL,
  `COUNTY` varchar(30) NOT NULL,
  `POSTCODE` varchar(8) NOT NULL,
  `URN` varchar(10) NOT NULL,
  PRIMARY KEY `POSTCODE` (`POSTCODE`),
  KEY `SURNAME` (`SURNAME`),
  KEY `ADDR1` (`ADDR1`),
  KEY `FORENAME` (`FORENAME`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 PACK_KEYS=1;

Keyname  	Type  	Cardinality  	Field
FORENAME 	INDEX 	42992  	        FORENAME
SURNAME 	INDEX 	142554   	SURNAME
ADDR1 	        INDEX 	2708531   	ADDRESS1
POSTCODE 	INDEX 	902843   	POSTCODE


A quick note on this, the URNS are used by the company that supply the data, they're effectively useless as our client data doesn't usually have a URN and were required to search on:
- POSTCODE
- First letter of FORENAME
- Complete SURNAME
- First 5 letters of address
Due to the quality of the data were sent.

Anyway, right now the query im using is:
CREATE TEMPORARY TABLE duplicates
	SELECT companytable.POSTCODE FROM companytable
	INNER JOIN GAS ON GAS.POSTCODE=companytable.POSTCODE 
	AND (LEFT(GAS.FORENAME, 1)=LEFT(companytable.FORENAME, 1))
	AND (GAS.SURNAME=companytable.SURNAME)
	AND (LEFT(GAS.ADDR1, 5)=LEFT(companytable.ADDRESS1, 5));
	UPDATE jewelry SET DUPE=1 WHERE POSTCODE IN (SELECT POSTCODE FROM duplicates);
DROP TABLE duplicates;

id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra
1 	SIMPLE 	companytable 	ALL 	NULL 	NULL 	NULL 	NULL 	80000 	 
1 	SIMPLE 	GAS 	ref 	POSTCODE,SURNAME 	POSTCODE 	10 	table.companytable.POSTCODE 	11 	Using where


Any help to optimize this query/table would be greatly appreciated, I've been going through the seminars and a million different sites for optimization.

The query does run fairly fast considering the records but it can hang "sending.." while selecting the data for the temp table and obviously i don't feel its quite up to par with the standards that it could be.

On another note about the data, were sent an amended version of the "suppression" tables monthly, they're the same layout but they're usually slightly bigger (around 20,000 or so records bigger).

Anyway, recent ideas I've had are:
- Replace LEFT() functions with INDEX(column())
- Partitioning the table (moving fields not checked against to another table for use when absolutely required)
- Possibly splitting the tables down into smaller tables
- Obviously tweaking server variables to fit such large queries
- Changing Hard-drive to RAID5 (for the reading speed, when updating the smaller table its usually no more than 1000 records changed)

Again, thanks for any suggestions/help in advance!

[Updated on: Wed, 19 September 2007 08:16]

Re: Query Optimization (Large Results) [message #1904 is a reply to message #1901 ] Wed, 19 September 2007 10:08 Go to previous messageGo to next message
allworknoplay  is currently offline allworknoplay
Messages: 58
Registered: September 2007
Location: New York
Member
Although I'm not a pro, I think it would be useful to have the specs of your hardware, operating system and your current configuration so we can see how your buffers/variables are configured...
Re: Query Optimization (Large Results) [message #1905 is a reply to message #1901 ] Wed, 19 September 2007 11:04 Go to previous messageGo to next message
MonkeyTech  is currently offline MonkeyTech
Messages: 4
Registered: September 2007
Junior Member
At the moment i've got it sat on a test server so im not expecting optimal performance from the hardware but eventualy when we upgrade to enterprise we'l be moving it into our main server room onto a IBM System x3850 and runing the sql with perl remotely.

Same applies to the variables, theyre default right now.


Im really looking more toward optimization of the tables and queries, im sure theres alot i can do, id just like advice from someone with perhaps more knowlage that me and a little direction - as i said it runs fairly efficiently as is but im always looking for ways to improve my work.
Re: Query Optimization (Large Results) [message #1906 is a reply to message #1905 ] Wed, 19 September 2007 11:32 Go to previous messageGo to next message
allworknoplay  is currently offline allworknoplay
Messages: 58
Registered: September 2007
Location: New York
Member
Well your company table seems to have a TYPE of ALL. So I don't think you want that, you'd want to have that as specific as possible. Even with that, it's only looking at 80,000 records so that's not that bad, but from what I remember, you don't want a Type=ALL since that will scan everything.

What does yout companytable look like?
Re: Query Optimization (Large Results) [message #1907 is a reply to message #1901 ] Wed, 19 September 2007 11:38 Go to previous message
MonkeyTech  is currently offline MonkeyTech
Messages: 4
Registered: September 2007
Junior Member
Its just the table im checking again, the format can and will change per client but the table is currently:

CREATE TABLE `company` (
  `BigBook` varchar(10) NOT NULL,
  `BBOOK` varchar(6) NOT NULL,
  `TAB` varchar(10) NOT NULL,
  `URN` int(30) NOT NULL auto_increment,
  `Title` varchar(5) NOT NULL,
  `forename` varchar(25) NOT NULL,
  `surname` varchar(25) NOT NULL,
  `ADDRESS1` varchar(40) NOT NULL,
  `ADDRESS2` varchar(40) NOT NULL,
  `ADDRESS3` varchar(40) NOT NULL,
  `ADDRESS4` varchar(40) NOT NULL,
  `ADDRESS5` varchar(40) NOT NULL,
  `ADDRESS6` varchar(40) NOT NULL,
  `POSTCODE` varchar(8) NOT NULL,
  `DOB` varchar(20) NOT NULL,
  PRIMARY KEY  (`URN`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=80002 ;


bad practice to put the primary on a URN i know, i just quickly wrote the statement before i stumbled on this forum.

One obvious problem i can see is using LEFT() is stopping mysql from using the indexes on ADDR1 and FORENAME, i've thought about restricting the size of the index on the column to the equivilent values that im using in LEFT() at the moment but i wonder if it will just search the whole word rather than the index.

[Updated on: Wed, 19 September 2007 11:43]

Previous Topic:mysql memory consumption, memory leak ?
Next Topic:View today / this week / this month / all time
Goto Forum:
  


Current Time: Fri Jul 10 03:21:25 EDT 2009

Total time taken to generate the page: 0.02746 seconds
.:: Contact :: Home ::.

Powered by: FUDforum 2.7.7.
Copyright ©2001-2007 FUD Forum Bulletin Board Software

MySQL is a trademark of Sun Microsystems.
InnoDB is a trademark of Oracle Corp.

Percona Performance Forums are a service of Percona, Inc.
Not affiliated with Sun Microsystems or Oracle Corp.