Home » Performance » MySQL » Script to analyze all entries in a table and suggest optimal column data types.
Script to analyze all entries in a table and suggest optimal column data types. [message #1494] Wed, 27 June 2007 16:27 Go to next message
RichardBronosky  is currently offline RichardBronosky
Messages: 20
Registered: June 2007
Junior Member
I've been working on a bash script that does this. I thought it was going to be really short and simple, but it is turning out to be quite complex. Before I continue with it, I want to know if such a thing already exists.

Basically I do a lot of queries to test each column like:
"select 'false.', \`$field\` from $TABLE where lpad(cast(0+\`$field\` as binary), length(\`$field\`), '0') != \`$field\` limit 1;"

Which tests to see if the data is an INT ZEROFILL.

Is there already a solution for this? If not, would anyone else be interested in helping to develop this? I intended to create a source forge project for it once I got a working version. But, I'm not finding the time to complete it.



.+# Richard Bronosky #+.
MySQL DBA, MythTV User
Re: Script to analyze all entries in a table and suggest optimal column data types. [message #1504 is a reply to message #1494 ] Fri, 29 June 2007 13:44 Go to previous messageGo to next message
RichardBronosky  is currently offline RichardBronosky
Messages: 20
Registered: June 2007
Junior Member

Whould you be interested in the tool described here?[ 0 vote(s) ]
1.Yes, sounds cool. I'd like to help. 0 / 0%
2.Yes, sounds cool. I'd like to try it when you get it working. 0 / 0%
3.No, I do think it would work. 0 / 0%
4.No, your are a moron. Can't you just do a select * and do that in your head while 2 million rows of a 500 column wide table scroll by? Evil or Very Mad 0 / 0%
5.No, there is already a good tool for doing this. (please leave a link) 0 / 0%
6.Not unless you write it in lanuage... (leave a comment) 0 / 0%

Okay, since no one is interested in helping me with it.... Is anyone interested in using it when it is done?

Here is the basic outline. For each column in the table, analyze the values:

  1. Decide if the column is currently BINARY.
  2. Get the length of the longest value.
  3. If the values are numeric, then...
    1. Decide whether they require zerofill.
    2. Decide if they are unsigned.
    3. If they are all integers, decide what size is the best fit. Else...
    4. If they are [fixed point] DECIMAL, decide their significant digits and number of decimal places. Else...
    5. Decide if DOUBLE or single precision FLOAT is required.
    6. (optionally) Decide the significant digits and number of decimal places that best fit.
  4. If the values are temporal
    1. Decide which is the best fit. (YEAR, DATE, TIME, TIMESTAMP, or DATETIME in that order of precedence)
  5. If an ENUM would be appropriate, suggest it.
  6. If a CHAR would be most efficient, suggest it (BINARY is needed).
  7. If the length you came up with first is < 255 (or 65535 post-MySQL 5.0.3) use VARCHAR (VARBINARY is needed). Else...
  8. Find the TEXT/BLOB size that best fits.


What's missing:
  1. BIT type (I don't use it)
  2. SET type (I don't want to code it)
  3. Spatial Types (I hope I never need it)


I hope that draws some interest. I'll take your continued silence to mean that I'm an idiot and the only person in the community who would like a tool for doing this. Very Happy



.+# Richard Bronosky #+.
MySQL DBA, MythTV User
Re: Script to analyze all entries in a table and suggest optimal column data types. [message #1505 is a reply to message #1494 ] Sun, 01 July 2007 14:03 Go to previous messageGo to next message
Speeple  is currently offline Speeple
Messages: 91
Registered: August 2006
Member
The path I would take here is parsing the table schema, then contrasting the data type definitions with the actual data in the table.


Martin Gallagher | Speeple: The latest news
Re: Script to analyze all entries in a table and suggest optimal column data types. [message #1507 is a reply to message #1505 ] Sun, 01 July 2007 19:54 Go to previous messageGo to next message
RichardBronosky  is currently offline RichardBronosky
Messages: 20
Registered: June 2007
Junior Member
This is a good idea. It would be worth adding.

My original purpose for the script was to serve my own need. I'm currently working for a company whose idea of data storage was delimited text files or excel files. I've loaded dozens of tables into MySQL from comma/tab delimited files. In the interest of speed, I generally create the tables with all varchar fields. These tables are currently being used by individuals in the company. When it is decided that an application is going to be built against data in one of these tables, I either optimize the table or I pull the needed columns out of many tables and build one streamlined table.

I'd like to be able to take all these nasty tables that I have (and continue to create) and clean them up. So, as it stands, i don't much care about the current schema. I know what it is, and I know it is bad.



.+# Richard Bronosky #+.
MySQL DBA, MythTV User
Re: Script to analyze all entries in a table and suggest optimal column data types. [message #1539 is a reply to message #1494 ] Sun, 15 July 2007 07:53 Go to previous messageGo to next message
roman  is currently offline roman
Messages: 1
Registered: July 2007
Junior Member
I think, You probably can start with use of this query:
SELECT * FROM tbl PROCEDURE ANALYSE()
Re: Script to analyze all entries in a table and suggest optimal column data types. [message #1556 is a reply to message #1539 ] Wed, 25 July 2007 11:30 Go to previous message
RichardBronosky  is currently offline RichardBronosky
Messages: 20
Registered: June 2007
Junior Member
Unfortunately that almost always just suggests that I make everything an ENUM. I think that should be a REALLY powerful tool, but turns out to be basically useless. Sad



.+# Richard Bronosky #+.
MySQL DBA, MythTV User
Previous Topic:Slow one-row INSERTs
Next Topic:HELP: mysqld Threads and Memory Usage Reporting
Goto Forum:
  


Current Time: Sun Jul 5 13:09:45 EDT 2009

Total time taken to generate the page: 0.01299 seconds
.:: Contact :: Home ::.

Powered by: FUDforum 2.7.7.
Copyright ©2001-2007 FUD Forum Bulletin Board Software

MySQL is a trademark of Sun Microsystems.
InnoDB is a trademark of Oracle Corp.

Percona Performance Forums are a service of Percona, Inc.
Not affiliated with Sun Microsystems or Oracle Corp.