Home » Performance » MySQL » Query Optimization (Large Results)
| Query Optimization (Large Results) [message #1901] |
Wed, 19 September 2007 08:08  |
MonkeyTech Messages: 4 Registered: September 2007 |
Junior Member |
|
|
Hey,
Basically what i have is several large tables (14mill records or more each), people send in smaller tables (usually around 20 - 50,000 records) and we flag their records that match ours.
Here is an example of the largest table (16mill records):
Table Layout:
DROP TABLE IF EXISTS `dmd`.`gas`;
CREATE TABLE `dmd`.`gas` (
`FORENAME` varchar(20) NOT NULL,
`SURNAME` varchar(20) NOT NULL,
`ADDR1` varchar(30) NOT NULL,
`ADDR2` varchar(30) NOT NULL,
`ADDR3` varchar(30) NOT NULL,
`TOWN` varchar(30) NOT NULL,
`COUNTY` varchar(30) NOT NULL,
`POSTCODE` varchar(8) NOT NULL,
`URN` varchar(10) NOT NULL,
PRIMARY KEY `POSTCODE` (`POSTCODE`),
KEY `SURNAME` (`SURNAME`),
KEY `ADDR1` (`ADDR1`),
KEY `FORENAME` (`FORENAME`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 PACK_KEYS=1;
Keyname Type Cardinality Field
FORENAME INDEX 42992 FORENAME
SURNAME INDEX 142554 SURNAME
ADDR1 INDEX 2708531 ADDRESS1
POSTCODE INDEX 902843 POSTCODE
A quick note on this, the URNS are used by the company that supply the data, they're effectively useless as our client data doesn't usually have a URN and were required to search on:
- POSTCODE
- First letter of FORENAME
- Complete SURNAME
- First 5 letters of address
Due to the quality of the data were sent.
Anyway, right now the query im using is:
CREATE TEMPORARY TABLE duplicates
SELECT companytable.POSTCODE FROM companytable
INNER JOIN GAS ON GAS.POSTCODE=companytable.POSTCODE
AND (LEFT(GAS.FORENAME, 1)=LEFT(companytable.FORENAME, 1))
AND (GAS.SURNAME=companytable.SURNAME)
AND (LEFT(GAS.ADDR1, 5)=LEFT(companytable.ADDRESS1, 5));
UPDATE jewelry SET DUPE=1 WHERE POSTCODE IN (SELECT POSTCODE FROM duplicates);
DROP TABLE duplicates;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE companytable ALL NULL NULL NULL NULL 80000
1 SIMPLE GAS ref POSTCODE,SURNAME POSTCODE 10 table.companytable.POSTCODE 11 Using where
Any help to optimize this query/table would be greatly appreciated, I've been going through the seminars and a million different sites for optimization.
The query does run fairly fast considering the records but it can hang "sending.." while selecting the data for the temp table and obviously i don't feel its quite up to par with the standards that it could be.
On another note about the data, were sent an amended version of the "suppression" tables monthly, they're the same layout but they're usually slightly bigger (around 20,000 or so records bigger).
Anyway, recent ideas I've had are:
- Replace LEFT() functions with INDEX(column())
- Partitioning the table (moving fields not checked against to another table for use when absolutely required)
- Possibly splitting the tables down into smaller tables
- Obviously tweaking server variables to fit such large queries
- Changing Hard-drive to RAID5 (for the reading speed, when updating the smaller table its usually no more than 1000 records changed)
Again, thanks for any suggestions/help in advance!
[Updated on: Wed, 19 September 2007 08:16]
|
|
| | | |
| Re: Query Optimization (Large Results) [message #1907 is a reply to message #1901 ] |
Wed, 19 September 2007 11:38  |
MonkeyTech Messages: 4 Registered: September 2007 |
Junior Member |
|
|
Its just the table im checking again, the format can and will change per client but the table is currently:
CREATE TABLE `company` (
`BigBook` varchar(10) NOT NULL,
`BBOOK` varchar(6) NOT NULL,
`TAB` varchar(10) NOT NULL,
`URN` int(30) NOT NULL auto_increment,
`Title` varchar(5) NOT NULL,
`forename` varchar(25) NOT NULL,
`surname` varchar(25) NOT NULL,
`ADDRESS1` varchar(40) NOT NULL,
`ADDRESS2` varchar(40) NOT NULL,
`ADDRESS3` varchar(40) NOT NULL,
`ADDRESS4` varchar(40) NOT NULL,
`ADDRESS5` varchar(40) NOT NULL,
`ADDRESS6` varchar(40) NOT NULL,
`POSTCODE` varchar(8) NOT NULL,
`DOB` varchar(20) NOT NULL,
PRIMARY KEY (`URN`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=80002 ;
bad practice to put the primary on a URN i know, i just quickly wrote the statement before i stumbled on this forum.
One obvious problem i can see is using LEFT() is stopping mysql from using the indexes on ADDR1 and FORENAME, i've thought about restricting the size of the index on the column to the equivilent values that im using in LEFT() at the moment but i wonder if it will just search the whole word rather than the index.
[Updated on: Wed, 19 September 2007 11:43]
|
|
|
Goto Forum:
Current Time: Fri Jul 10 04:35:32 EDT 2009
Total time taken to generate the page: 0.01211 seconds |