Showing posts sorted by relevance for query mysql optionally. Sort by date Show all posts
Showing posts sorted by relevance for query mysql optionally. Sort by date Show all posts

Sunday, July 19, 2009

Sample data for import and xml export - mysql and postgres (part 1 of 3)

Export to xml is a feature of databases that I have never used until now.

I thought that a short guide might be useful to illustrate the results.

As a further task I will later look at --compatible option of mysqldump to see how postgres likes the results (a follow up post perhaps so see parts 2 and 3)

( If you came here just looking to getting data out of oocalc then into mysql, please shuffle along to paragraphs 11 to 13 where this is demonstrated )

There are projects out there to take mysql data into postgres, and vice versa and it will be a useful skill to have.

After reading up about freely available sample data for names and addresses and seeing if I might use any openoffice sample data, I decided on a slightly different tack.

It seems that most 'free' sample data has some sort of copyright restriction but there is plenty that is licensed under creative commons attribution or similar.

Reading up about AMD phenom processors earlier today I viewed the wikipedia table regarding socket am3 power usage and thought that this table of data when pasted into a spreadsheet is as good as any.

(For any reader who cannot get to spreadsheets.google.com or experiences an unexpected 'login to google' prompt, I have mirrored the spreadsheet in open document .ods format here)

It is right at this point to highlight again that any and all power usage data referenced in this blog posting directly/as xml/as a spreadsheet is not my data and I correctly credit Wikipedia explicitly as the source.

In the spreadsheet viewable at google docs I have added two comments to this effect.

Now to do something useful with it - firstly export it to .csv format by following instructions for openoffice calc csv export.

Now looking at the options for mysqlimport:
--fields-terminated-by Is how you will tell mysql that data is comma separated
--fields-optionally-enclosed-by Will be how I tell mysql that double quotes enclose filled fields.

Now openoffice calc (oocalc) export to/save as CSV does not wrap empty fields in double quotes, and so, a six field row with just first and last fields populated might just look like this "rhubarb",,,,,"custard"

Mysql is not going to like the series of commas without field delimiters and so in particular ,,,, will be a problem unless the word optionally in --fields-optionally-enclosed-by will come to my rescue.

This still leaves the question of escaping single quotes (I think mysql wants them all escaped) and the sed command might come to my aid there.

Part 2 will deal with getting the data into mysql using mysqlimport.
I will finish this posting now with a first attempt at a mysql table create that is suitable for the data being loaded.
DROP TABLE IF EXISTS amd_bang_per_watt;
SET character_set_client = utf8;
CREATE TABLE amd_bang_per_watt (model_family  VARCHAR(10) NOT NULL,
model   VARCHAR(30) NOT NULL default 'unknown',
clock_speed  VARCHAR(20) NOT NULL,
l2cache   VARCHAR(10),
l3cache   VARCHAR(10),
ht_bus_ghz  VARCHAR(10),
voltage   VARCHAR(10),
socket   VARCHAR(10),
tdp_watts  VARCHAR(20),
process_comments VARCHAR(20),
speed_power_ratio DECIMAL(5,2) COMMENT 'bang per watt'
) ENGINE=MyISAM DEFAULT CHARSET=utf8 MIN_ROWS=50 CHECKSUM=1
COMMENT='Source: Wikipedia 200907http://en.wikipedia.org/wiki/List_of_CPU_power_dissipation#AMD_Phenom_.28Phenom_64.29';
So create a new mysql database and run that table create should enable you to verify the table using describe as follows:
root@d183z2g:~# mysql amd_power_dissipation -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 61
Server version: 5.0.75-0ubuntu10.2 (Ubuntu)
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> show tables;
+---------------------------------+
| Tables_in_amd_power_dissipation |
+---------------------------------+
| amd_bang_per_watt               |
+---------------------------------+
1 row in set (0.00 sec)
mysql> describe amd_bang_per_watt;
+-------------------+--------------+------+-----+---------+-------+
| Field             | Type         | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| model_family      | varchar(10)  | NO   |     | NULL    |       |
| model             | varchar(30)  | NO   |     | unknown |       |
| clock_speed       | varchar(20)  | NO   |     | NULL    |       |
| l2cache           | varchar(10)  | YES  |     | NULL    |       |
| l3cache           | varchar(10)  | YES  |     | NULL    |       |
| ht_bus_ghz        | varchar(10)  | YES  |     | NULL    |       |
| voltage           | varchar(10)  | YES  |     | NULL    |       |
| socket            | varchar(10)  | YES  |     | NULL    |       |
| tdp_watts         | varchar(20)  | YES  |     | NULL    |       |
| process_comments  | varchar(20)  | YES  |     | NULL    |       |
| speed_power_ratio | decimal(5,2) | YES  |     | NULL    |       |
+-------------------+--------------+------+-----+---------+-------+
11 rows in set (0.00 sec)
mysql>
I have deliberately defined the final column as decimal(5,2) as I am thinking ahead to the import to postgres (much later) and want to see how postgres handles that column.

In part 2 of this posting I will attempt to bring in the power data from the .csv file and refine the mysql table create if necessary.

Monday, July 20, 2009

Sample data for import and xml export - mysql and postgres (part 2 of 3)

Designing tables can be a hit and miss affair but looking at the warnings mysql has thrown up it does not look too far off:
+---------+------+----------------------------------------------------------------------+
| Level   | Code | Message                                                              |
+---------+------+----------------------------------------------------------------------+
| Note    | 1265 | Data truncated for column 'voltage' at row 1                         |
| Warning | 1265 | Data truncated for column 'process_comments' at row 1                |
| Note    | 1265 | Data truncated for column 'voltage' at row 2                         |
| Warning | 1265 | Data truncated for column 'process_comments' at row 2                |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 3  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 4  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 5  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 6  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 7  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 8  |
| Warning | 1265 | Data truncated for column 'process_comments' at row 9                |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 9  |
| Note    | 1265 | Data truncated for column 'voltage' at row 10                        |
| Note    | 1265 | Data truncated for column 'voltage' at row 11                        |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 13 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 14 |
| Warning | 1265 | Data truncated for column 'process_comments' at row 16               |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 16 |
| Warning | 1265 | Data truncated for column 'process_comments' at row 18               |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 18 |
| Note    | 1265 | Data truncated for column 'voltage' at row 23                        |
| Note    | 1265 | Data truncated for column 'voltage' at row 24                        |
| Note    | 1265 | Data truncated for column 'voltage' at row 25                        |
| Note    | 1265 | Data truncated for column 'voltage' at row 26                        |
| Note    | 1265 | Data truncated for column 'voltage' at row 27                        |
| Note    | 1265 | Data truncated for column 'voltage' at row 28                        |
| Note    | 1265 | Data truncated for column 'voltage' at row 29                        |
| Note    | 1265 | Data truncated for column 'voltage' at row 30                        |
| Warning | 1265 | Data truncated for column 'process_comments' at row 30               |
| Note    | 1265 | Data truncated for column 'voltage' at row 31                        |
| Warning | 1265 | Data truncated for column 'process_comments' at row 31               |
| Note    | 1265 | Data truncated for column 'voltage' at row 32                        |
| Note    | 1265 | Data truncated for column 'voltage' at row 33                        |
| Warning | 1265 | Data truncated for column 'process_comments' at row 33               |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 33 |
| Warning | 1265 | Data truncated for column 'process_comments' at row 36               |
| Warning | 1265 | Data truncated for column 'process_comments' at row 37               |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 37 |
| Note    | 1265 | Data truncated for column 'l2cache' at row 47                        |
| Warning | 1265 | Data truncated for column 'voltage' at row 47                        |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 48 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 49 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 50 |
+---------+------+----------------------------------------------------------------------+

Although the above looks a lot, it is really caused by just one or two issues which I will now address. I will provide more details on 'show warnings;' command that produced the above a bit later in this post.

Design alterations:
  • voltage to VARCHAR(15)
  • speed_power_ratio to get a default of '0.00' and might aswell make it not null in that case
  • l2cache and l3cache to VARCHAR(15)
  • process_comments to VARCHAR(50)
DROP TABLE IF EXISTS amd_bang_per_watt;
SET character_set_client = utf8;
CREATE TABLE amd_bang_per_watt (
model_family  VARCHAR(10) NOT NULL,
model   VARCHAR(30) NOT NULL default 'unknown',
clock_speed  VARCHAR(20) NOT NULL,
l2cache   VARCHAR(15),
l3cache   VARCHAR(15),
ht_bus_ghz  VARCHAR(10),
voltage   VARCHAR(15),
socket   VARCHAR(10),
tdp_watts  VARCHAR(20),
process_comments VARCHAR(50),
speed_power_ratio DECIMAL(5,2) NOT NULL default '0.00' COMMENT 'bang per watt'
) ENGINE=MyISAM DEFAULT CHARSET=utf8 MIN_ROWS=50 CHECKSUM=1 COMMENT='Source: Wikipedia 200907
http://en.wikipedia.org/wiki/List_of_CPU_power_dissipation#AMD_Phenom_.28Phenom_64.29';

And here I will show the way of importing the data so as to be able to issue show warnings to get some feedback on how the import went okay.
root@d183z2g:~# mysql amd_power_dissipation -u root -p
Enter password:                                   
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 66                       
Server version: 5.0.75-0ubuntu10.2-log (Ubuntu)      

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> LOAD DATA LOCAL INFILE '/tmp/amd_bang_per_watt.csv' INTO TABLE amd_bang_per_watt
FIELDS  TERMINATED BY ','  OPTIONALLY ENCLOSED BY '"'
LINES TERMINATEDBY '\n' IGNORE 1 LINES;
Query OK, 50 rows affected, 16 warnings (0.00 sec)
Records: 50  Deleted: 0  Skipped: 0  Warnings: 0

mysql> show warnings;
+---------+------+----------------------------------------------------------------------+
| Level   | Code | Message                                                              |
+---------+------+----------------------------------------------------------------------+
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 3  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 4  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 5  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 6  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 7  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 8  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 9  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 13 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 14 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 16 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 18 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 33 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 37 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 48 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 49 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 50 |
+---------+------+----------------------------------------------------------------------+
16 rows in set (0.00 sec)

mysql> select model,speed_power_ratio from amd_bang_per_watt limit 5;
+--------------------+-------------------+
| model              | speed_power_ratio |
+--------------------+-------------------+
| 550 Black Edition  |             38.75 |
| 545                |             37.50 |
| 6??? x¹           |              0.00 |
| 6??? x¹           |              0.00 |
| 6??? x¹           |              0.00 |
+--------------------+-------------------+
5 rows in set (0.00 sec)
Note: The LOAD DATA LOCAL INFILE command in the session above should be all on one line but I split it here to fit the blog posting.

In Summary: The only warnings relate to missing values for speed_power_ratio in the input .csv file and those will be defaulted to 0.00 by means of the table definition.

Now the point of all this importing was to demonstrate xml output from mysql.

Mysql makes it real simple as the following example illustrates (the xml output is in the next post labelled part 2a of 3):
useri@d183z2g:~$ mysql --xml amd_power_dissipation -u root -p -e "select * from amd_bang_per_watt where process_comments like '%allisto%';"
Enter password: