Thursday, July 23, 2009

ctas for mysql and postgres - create table as select ...

'Create table as select...' is a useful way of defining a new table based on data already in your database.

It might be that you have just started work on an existing system and your fresh eyes can see a table that can be normalised, or perhaps you are told that some of the tables are bloated with obsolete columns.

You might use 'create table as select' (CTAS) in the same way you might rework a file on your filesystem by going through copy, edit, move (where the final move replaces the original).
[ So what you did along the way was create a temporary copy and, lastly, overwrite the original ]

Completing the analogy: in database terms, CTAS might create your temporary table, and you might then look at how to change the schema so as to ignore the original copy and use your CTAS-created table in its place.
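The copy, edit, move idea can be sketched roughly as below. This is only an illustration under some assumptions: Python's built-in sqlite3 module stands in for mysql or postgres so that it runs without a server, and the table cpu and its legacy_note column are made up for the example (the two models and ratios are borrowed from the sample data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical original table carrying a column we consider obsolete.
cur.execute("CREATE TABLE cpu (model TEXT, speed_power_ratio REAL, legacy_note TEXT)")
cur.executemany(
    "INSERT INTO cpu VALUES (?, ?, ?)",
    [("550 Black Edition", 38.75, "unused"), ("9950", 18.57, "unused")],
)

# "Copy" and "edit" in one step: CTAS builds the reworked table from a select.
cur.execute("CREATE TABLE cpu_new AS SELECT model, speed_power_ratio FROM cpu")

# "Move": swap the names so the reworked table takes the original's place.
cur.execute("ALTER TABLE cpu RENAME TO cpu_old")
cur.execute("ALTER TABLE cpu_new RENAME TO cpu")
conn.commit()

# PRAGMA table_info is SQLite-specific; column 1 of each row is the column name.
columns = [row[1] for row in cur.execute("PRAGMA table_info(cpu)")]
row_count = cur.execute("SELECT count(*) FROM cpu").fetchone()[0]
print(columns, row_count)
```

Worth keeping in mind: CTAS copies columns and rows but not indexes, primary keys or other constraints, so those would need recreating on the new table before the swap. ALTER TABLE ... RENAME TO is accepted by mysql and postgres too, and mysql additionally has RENAME TABLE, which can do both renames in one statement.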

Temporary tables for whatever purpose can be created using CTAS or the less preferred 'select into'. From the postgres documentation: CREATE TABLE AS is the recommended syntax.
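Here is a small sketch of a temporary table made with CTAS. As assumptions: sqlite3 from the Python standard library stands in for a real server, the table name top_ratio and the '> 27' filter are just illustrative, and only two rows of the sample data are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE amd_bang_per_watt (model TEXT, speed_power_ratio REAL)")
cur.executemany(
    "INSERT INTO amd_bang_per_watt VALUES (?, ?)",
    [("550 Black Edition", 38.75), ("9950", 18.57)],
)

# CTAS with the TEMPORARY keyword: the table lives only for this connection.
cur.execute(
    "CREATE TEMPORARY TABLE top_ratio AS "
    "SELECT model, speed_power_ratio FROM amd_bang_per_watt "
    "WHERE speed_power_ratio > 27"
)

temp_rows = cur.execute("SELECT model FROM top_ratio").fetchall()

# SQLite-specific check: temporary tables are not listed in sqlite_master.
permanent = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
print(temp_rows, permanent)
```

The same CREATE TEMPORARY TABLE ... AS SELECT form is accepted by both mysql and postgres. The 'select into' alternative is a postgres thing; in mysql, SELECT ... INTO stores results in variables or a file rather than creating a table.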




The relevant portion of the mysql syntax is:
select_statement: [IGNORE | REPLACE] [AS] SELECT ...   (Some legal select statement)
which is not so easy to find amongst the complex syntactical explanation on the main documentation page.

For perl types who have csv data on the filesystem and want to use CTAS, look at the DBD::CSV module.

And now for some simple examples using mysql and the sample data from the recent postings:

My suggestion is to just do a standalone select first:

select model,speed_power_ratio,process_comments from amd_bang_per_watt where model not like '%¹%' and process_comments not like '%¹%' order by speed_power_ratio desc;

which gives this output (abbreviated)

+---------------------+-------------------+---------------------------------------------+
| model | speed_power_ratio | process_comments |
+---------------------+-------------------+---------------------------------------------+
| 550 Black Edition | 38.75 | 45nm Callisto Q3-2009 |
...
| 9950 | 18.57 | 65nm B3 |
+---------------------+-------------------+---------------------------------------------+
34 rows in set (0.00 sec)

By checking first that you get what you expected, you can avoid
messing around dropping and recreating your target table.
My target table will be named amd_bang_per_watt_summary and here is the create table as select (CTAS):

mysql> create table amd_bang_per_watt_summary as select model,speed_power_ratio,process_comments from \
-> amd_bang_per_watt where model not like '%¹%' and process_comments not like '%¹%' order by \
-> speed_power_ratio desc;
Query OK, 34 rows affected (0.01 sec)
Records: 34 Duplicates: 0 Warnings: 0

In order to form the sql statement above I needed to do a bit of UTF-8 lookup so that the 'superscript one' character could be included correctly in the like part of the statement.

To paste 'superscript one' onto the command line I used the charmap application and navigated to 'Latin-1 supplement', where 'superscript one' (0xB9) sits amongst the superscript numerals.
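If the statement is built from a script rather than typed at the prompt, the character can be written by code point instead of pasted. A rough sketch, assuming Python and its built-in sqlite3 module as a stand-in database; the two rows containing 'superscript one' are made up purely to exercise the filter.

```python
import sqlite3

# U+00B9 SUPERSCRIPT ONE, the same character charmap shows as 0xB9.
SUPERSCRIPT_ONE = "\u00b9"

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE amd_bang_per_watt (model TEXT, process_comments TEXT)")
cur.executemany(
    "INSERT INTO amd_bang_per_watt VALUES (?, ?)",
    [
        ("550 Black Edition", "45nm Callisto Q3-2009"),
        ("hypothetical A" + SUPERSCRIPT_ONE, "45nm"),  # made-up row
        ("hypothetical B", "45nm" + SUPERSCRIPT_ONE),  # made-up row
    ],
)

# Bind the pattern as a parameter so the character never has to be
# typed or pasted into the sql text itself.
pattern = f"%{SUPERSCRIPT_ONE}%"
kept = cur.execute(
    "SELECT model FROM amd_bang_per_watt "
    "WHERE model NOT LIKE ? AND process_comments NOT LIKE ?",
    (pattern, pattern),
).fetchall()
print(kept)

# One byte in Latin-1, two bytes once encoded as UTF-8.
latin1_bytes = SUPERSCRIPT_ONE.encode("latin-1")
utf8_bytes = SUPERSCRIPT_ONE.encode("utf-8")
```

The last two lines show why the lookup mattered: the single Latin-1 byte 0xB9 becomes the two bytes 0xC2 0xB9 in UTF-8, so the terminal and the database connection need to agree on the encoding.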

mysql> select count(*),substr(process_comments,1,4) as nm from amd_bang_per_watt_summary group by nm;
+----------+------+
| count(*) | nm |
+----------+------+
| 15 | 45nm |
| 19 | 65nm |
+----------+------+
2 rows in set (0.00 sec)

mysql> select * from amd_bang_per_watt_summary where speed_power_ratio > 27;
+--------------------+-------------------+---------------------------------------------+
| model | speed_power_ratio | process_comments |
+--------------------+-------------------+---------------------------------------------+
| 550 Black Edition | 38.75 | 45nm Callisto Q3-2009 |
| 705e | 38.46 | 45nm Heka Q3-2009 |
| 905e | 38.46 | 45nm Deneb Q3-2009 |
| 545 | 37.50 | 45nm Callisto Q3-2009 |
| 900e | 36.92 | 45nm Deneb 2009 |
| 945 | 31.58 | 45nm Deneb (part no HDX945WFK4DGI) Q3-2009 |
| 720 Black Edition | 29.47 | 45nm Heka Q1-2009 |
| 9100e | 27.69 | 65nm B2 |
| 9150e | 27.69 | 65nm B3 |
| 910 | 27.37 | 45nm Deneb Q1-2009 |
| 710 | 27.37 | 45nm Heka Q1-2009 |
| 810 | 27.37 | 45nm Deneb Q1-2009 |
+--------------------+-------------------+---------------------------------------------+
12 rows in set (0.00 sec)

The warning count reported by the CTAS is the only thing I might worry about, but zero is good, and the quick queries above demonstrated to me that all was well.
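Another quick check is to compare the row count of the standalone select with the row count of the table the CTAS produced. A minimal sketch, assuming sqlite3 as a stand-in database, three rows borrowed from the sample data, and a simplified '45nm%' filter that is not the one used in the real statement above.

```python
import sqlite3

# Simplified, hypothetical filter reused for both the check and the CTAS.
FILTERED_SELECT = (
    "SELECT model, speed_power_ratio, process_comments "
    "FROM amd_bang_per_watt WHERE process_comments LIKE '45nm%'"
)

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE amd_bang_per_watt "
    "(model TEXT, speed_power_ratio REAL, process_comments TEXT)"
)
cur.executemany(
    "INSERT INTO amd_bang_per_watt VALUES (?, ?, ?)",
    [
        ("550 Black Edition", 38.75, "45nm Callisto Q3-2009"),
        ("705e", 38.46, "45nm Heka Q3-2009"),
        ("9950", 18.57, "65nm B3"),
    ],
)

# Count what the standalone select returns (the alias keeps the
# subquery acceptable to mysql and postgres as well).
expected = cur.execute(
    f"SELECT count(*) FROM ({FILTERED_SELECT}) AS s"
).fetchone()[0]

# Create the target table from the very same select, then count again.
cur.execute(f"CREATE TABLE amd_bang_per_watt_summary AS {FILTERED_SELECT}")
actual = cur.execute(
    "SELECT count(*) FROM amd_bang_per_watt_summary"
).fetchone()[0]

print(expected, actual)
```

Keeping the select text in one place means the check and the CTAS cannot drift apart, which is the same idea as running the standalone select first.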

The mysql output above is fairly readable but a truer representation of what appeared on my display is in this publicly accessible googledoc.

The superscript one like '%¹%' bit of the sql might need a bit of explaining: the original wikipedia data
included some processors for which there was no official release date. My target table amd_bang_per_watt_summary will not contain those entries, which is what the filter is for.

Hopefully this posting will save you from having to trawl the documentation for CTAS, and save you the experience of:
  • mysql - the page is so long and complicated that the [AS] bit is easy to miss
  • postgres - whilst CTAS is given a page of its own, there is no link from the regular 'create table' page, so I personally thought it difficult to find.
  • both - The acronym CTAS whilst unofficial might be included as a tag purely for user convenience.
