
Sunday, July 19, 2009

Sample data for import and xml export - mysql and postgres (part 1 of 3)

Export to XML is a database feature that I had never used until now.

I thought that a short guide might be useful to illustrate the results.

As a further task I will later look at the --compatible option of mysqldump to see how Postgres likes the results (a follow-up topic, so see parts 2 and 3).

( If you came here just looking to get data out of oocalc and into MySQL, please shuffle along to paragraphs 11 to 13 where this is demonstrated )

There are projects out there for taking MySQL data into Postgres, and vice versa, and it will be a useful skill to have.

After reading up about freely available sample data for names and addresses and seeing if I might use any openoffice sample data, I decided on a slightly different tack.

It seems that most 'free' sample data has some sort of copyright restriction, but there is plenty that is licensed under Creative Commons Attribution or similar.

Reading up about AMD Phenom processors earlier today, I viewed the Wikipedia table regarding Socket AM3 power usage and thought that this table of data, pasted into a spreadsheet, is as good as any.

(For any reader who cannot get to spreadsheets.google.com or experiences an unexpected 'login to google' prompt, I have mirrored the spreadsheet in open document .ods format here)

It is right at this point to highlight again that any and all power usage data referenced in this blog posting (directly, as XML, or as a spreadsheet) is not my data, and I explicitly credit Wikipedia as the source.

In the spreadsheet viewable at google docs I have added two comments to this effect.

Now to do something useful with it - first, export it to .csv format by following the instructions for OpenOffice Calc CSV export.

Now looking at the options for mysqlimport:
--fields-terminated-by is how you tell MySQL that the data is comma separated.
--fields-optionally-enclosed-by is how I tell MySQL that double quotes enclose filled fields.
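To show where those options are heading, here is a minimal sketch of the eventual invocation (the --local and --ignore-lines flags and the /tmp path are my assumptions; note that mysqlimport derives the table name from the file's basename, so the CSV is renamed to match the table):

mysqlimport --local --ignore-lines=1 \
  --fields-terminated-by=',' \
  --fields-optionally-enclosed-by='"' \
  -u root -p amd_power_dissipation /tmp/amd_bang_per_watt.csv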

Now OpenOffice Calc (oocalc) export to/save as CSV does not wrap empty fields in double quotes, so a six-field row with just the first and last fields populated might look like this: "rhubarb",,,,,"custard"

MySQL is not going to like the series of commas without field delimiters, so in particular ,,,, will be a problem unless the word optionally in --fields-optionally-enclosed-by comes to my rescue.

This still leaves the question of escaping single quotes (I think MySQL wants them all escaped), and the sed command might come to my aid there, as sketched below.
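Something along these lines might do it; an untested sketch, assuming backslash-escaped quotes are what the import wants, with placeholder filenames:

sed "s/'/\\\\'/g" amd_bang_per_watt.csv > amd_bang_per_watt.escaped.csv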

Part 2 will deal with getting the data into mysql using mysqlimport.
I will finish this posting with a first attempt at a MySQL table create suitable for the data being loaded.
DROP TABLE IF EXISTS amd_bang_per_watt;
SET character_set_client = utf8;
CREATE TABLE amd_bang_per_watt (
model_family      VARCHAR(10) NOT NULL,
model             VARCHAR(30) NOT NULL default 'unknown',
clock_speed       VARCHAR(20) NOT NULL,
l2cache           VARCHAR(10),
l3cache           VARCHAR(10),
ht_bus_ghz        VARCHAR(10),
voltage           VARCHAR(10),
socket            VARCHAR(10),
tdp_watts         VARCHAR(20),
process_comments  VARCHAR(20),
speed_power_ratio DECIMAL(5,2) COMMENT 'bang per watt'
) ENGINE=MyISAM DEFAULT CHARSET=utf8 MIN_ROWS=50 CHECKSUM=1
COMMENT='Source: Wikipedia 200907 http://en.wikipedia.org/wiki/List_of_CPU_power_dissipation#AMD_Phenom_.28Phenom_64.29';
Creating a new MySQL database and then running that table create should let you verify the table using describe, as in the session below.
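If you also need the database-creation step, a minimal sketch follows (the .sql filename is hypothetical; the database name matches the session below):

mysqladmin -u root -p create amd_power_dissipation
mysql -u root -p amd_power_dissipation < amd_bang_per_watt_create.sql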
root@d183z2g:~# mysql amd_power_dissipation -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 61
Server version: 5.0.75-0ubuntu10.2 (Ubuntu)
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> show tables;
+---------------------------------+
| Tables_in_amd_power_dissipation |
+---------------------------------+
| amd_bang_per_watt               |
+---------------------------------+
1 row in set (0.00 sec)
mysql> describe amd_bang_per_watt;
+-------------------+--------------+------+-----+---------+-------+
| Field             | Type         | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| model_family      | varchar(10)  | NO   |     | NULL    |       |
| model             | varchar(30)  | NO   |     | unknown |       |
| clock_speed       | varchar(20)  | NO   |     | NULL    |       |
| l2cache           | varchar(10)  | YES  |     | NULL    |       |
| l3cache           | varchar(10)  | YES  |     | NULL    |       |
| ht_bus_ghz        | varchar(10)  | YES  |     | NULL    |       |
| voltage           | varchar(10)  | YES  |     | NULL    |       |
| socket            | varchar(10)  | YES  |     | NULL    |       |
| tdp_watts         | varchar(20)  | YES  |     | NULL    |       |
| process_comments  | varchar(20)  | YES  |     | NULL    |       |
| speed_power_ratio | decimal(5,2) | YES  |     | NULL    |       |
+-------------------+--------------+------+-----+---------+-------+
11 rows in set (0.00 sec)
mysql>
I have deliberately defined the final column as decimal(5,2) because I am thinking ahead to the import into Postgres (much later) and want to see how Postgres handles that column.

In part 2 of this posting I will attempt to bring in the power data from the .csv file and refine the mysql table create if necessary.

Friday, July 1, 2011

Sample data in postgres

This article takes some sample data and imports it into PostgreSQL (i) using CSV import and (ii) using INSERTs from an export of MySQL.

Now to get the data into Postgres we could go back to the spreadsheet (CSV loading approach) or take a compatible export from MySQL and load that (compatible loading approach).

The data we will be using can be found here and I am about to start working with this file.

( If you are coming from a MySQL background and just want to load the sample 'employees' database into PostgreSQL, then the project at this link will help with that )

CSV loading approach:

In Postgres, use COPY FROM to get data from the filesystem into the database.
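The general shape is along these lines (a minimal sketch; the table and path are placeholders):

COPY tablename FROM '/absolute/path/to/file.csv' WITH CSV;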

But wait; surely you need to have already created a table to load into?
Yep, otherwise you would see Postgres complain about a missing relation like so...

postgres=# COPY amd_bang_per_watt FROM '/tmp/amdAM3clockSpeedsAndWattage__200907.csv' WITH CSV QUOTE AS E'\042';
ERROR: relation "amd_bang_per_watt" does not exist


The table does not exist so you need to 'create table'.

I cover this in more detail in the next section, 'Compatible loading approach', so either consult that or have a go yourself first.

I now assume you have the table amd_bang_per_watt created.
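If you would like a head start on that create, here is a minimal sketch, hand-translated from the MySQL version in part 1 (the NOT NULL and default on speed_power_ratio mirror what the mysqldump --compatible output carries, and are what trigger the constraint errors below):

CREATE TABLE amd_bang_per_watt (
model_family      varchar(10) NOT NULL,
model             varchar(30) NOT NULL DEFAULT 'unknown',
clock_speed       varchar(20) NOT NULL,
l2cache           varchar(10),
l3cache           varchar(10),
ht_bus_ghz        varchar(10),
voltage           varchar(10),
socket            varchar(10),
tdp_watts         varchar(20),
process_comments  varchar(20),
speed_power_ratio decimal(5,2) NOT NULL DEFAULT 0.00
);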

My table needed emptying first but you can likely ignore the next command:

: #root@156ns1(~) ;echo 'delete from amd_bang_per_watt' | psql amd_power_dissipation postgres
Password for user postgres:
DELETE 50

...and pick things up again here where I will try the COPY FROM:

amd_power_dissipation=# COPY amd_bang_per_watt FROM '/tmp/amdAM3clockSpeedsAndWattage__200907.csv' WITH CSV QUOTE AS E'\042';
ERROR: value too long for type character varying(10)
CONTEXT: COPY amd_bang_per_watt, line 1, column model_family: "Model Family"
amd_power_dissipation=#

and taking care of the header, we execute

amd_power_dissipation=# COPY amd_bang_per_watt FROM '/tmp/amdAM3clockSpeedsAndWattage__200907.csv' WITH CSV HEADER QUOTE AS E'\042';
ERROR: null value in column "speed_power_ratio" violates not-null constraint
CONTEXT: COPY amd_bang_per_watt, line 4: ""X2 ","6??? x¹ ",2.3,,,"3.4 HT3 ",,,45,"x¹ 2009 ","
amd_power_dissipation=# select count(*) from amd_bang_per_watt;
 count
-------
     0
(1 row)

which is still not completing as we would wish so attempt 3:

amd_power_dissipation=# COPY amd_bang_per_watt FROM '/tmp/amdAM3clockSpeedsAndWattage__200907.csv' WITH CSV HEADER QUOTE AS E'\042' FORCE NOT NULL speed_power_ratio;
ERROR: invalid input syntax for type numeric: ""
CONTEXT: COPY amd_bang_per_watt, line 4, column speed_power_ratio: ""

...attempt 4...

amd_power_dissipation=# ;COPY amd_bang_per_watt (1,2,3,4,5,6,7,8,9,10) FROM '/tmp/amdAM3clockSpeedsAndWattage__200907.csv' WITH CSV HEADER QUOTE AS E'\042';
ERROR: syntax error at or near "1"
LINE 1: COPY amd_bang_per_watt (1,2,3,4,5,6,7,8,9,10) FROM '/tmp/amd...

...attempt 5...

amd_power_dissipation=# COPY amd_bang_per_watt (model_family,model,clock_speed,l2cache,l3cache,ht_bus_ghz,voltage,socket,tdp_watts,process_comments) FROM '/tmp/amdAM3clockSpeedsAndWattage__200907.csv' WITH CSV HEADER QUOTE AS E'\042';
ERROR: extra data after last expected column
CONTEXT: COPY amd_bang_per_watt, line 2: ""X2 II ","550 Black Edition ",3.1,"2x512k ","6MB ",2,"1.15-1.425 ","AM3 ",80,"45nm Callisto Q3-2009 ..."

...attempt 6...

amd_power_dissipation=# COPY amd_bang_per_watt (model_family,model,clock_speed,l2cache,l3cache,ht_bus_ghz,voltage,socket,tdp_watts,process_comments,process_comments) FROM '/tmp/amdAM3clockSpeedsAndWattage__200907.csv' WITH CSV HEADER QUOTE AS E'\042';
ERROR: column "process_comments" specified more than once

...and still no success (the column list must use names, the file carries eleven fields so naming only ten cannot work, and naming a column twice is rejected), so back to the drawing board (or rather sed, in fact):

: #root@156ns1(tmp) ;sed 's/\,$/\,0/' < amdAM3clockSpeedsAndWattage__200907.csv > amdAM3clockSpeedsAndWattage__200907.csv-truncated
: #root@156ns1(tmp) ;sed 's/\,$/\,0/' < amdAM3clockSpeedsAndWattage__200907.csv > amdAM3clockSpeedsAndWattage__200907.csv-edited
: #root@156ns1(tmp) ;sed 's/\,\,\,/\,0\,\,/' < amdAM3clockSpeedsAndWattage__200907.csv-edited > amdAM3clockSpeedsAndWattage__200907.csv-edited2

Here is some dialogue to explain how I came to have three sed commands, of which the first is now redundant. In words: s/\,$/\,0/ appends a 0 to any line ending in a comma, filling the empty final speed_power_ratio field, and s/\,\,\,/\,0\,\,/ puts a 0 into the first field of the first run of three commas on a line, which on the offending line below is the NOT NULL clock_speed column:

amd_power_dissipation=# COPY amd_bang_per_watt FROM '/tmp/amdAM3clockSpeedsAndWattage__200907.csv-truncated' WITH CSV HEADER QUOTE AS E'\042';
ERROR: missing data for column "speed_power_ratio"
CONTEXT: COPY amd_bang_per_watt, line 4: ""X2 ","6??? x¹ ",2.3,,,"3.4 HT3 ",,,45,"x¹ 2009 ""
amd_power_dissipation=# COPY amd_bang_per_watt FROM '/tmp/amdAM3clockSpeedsAndWattage__200907.csv-edited' WITH CSV HEADER QUOTE AS E'\042';
ERROR: null value in column "clock_speed" violates not-null constraint
CONTEXT: COPY amd_bang_per_watt, line 10: ""X2 ","6??? x¹ ",,,"2MB ","DDR2 ",,,,"x¹ 65nm Kuma Q2/2008 ",0"
amd_power_dissipation=# COPY amd_bang_per_watt FROM '/tmp/amdAM3clockSpeedsAndWattage__200907.csv-edited2' WITH CSV HEADER QUOTE AS E'\042';
COPY 50

and now a few selects to check things look okay:

amd_power_dissipation=# select count(*) from amd_bang_per_watt;
 count
-------
    50
(1 row)

amd_power_dissipation=# select count(*),substr(process_comments,1,4) as nm from amd_bang_per_watt group by nm;
 count |  nm
-------+------
     2 | x¹
     3 | x¹ Q
     1 | x¹ 4
     6 | x¹ 2
    19 | 65nm
    15 | 45nm
     4 | x¹ 6
(7 rows)

All looks well here.

The CSV files (original, and edited using sed) are listed below.
I recommend saving these files to your filesystem and viewing them from there; if you instead open them directly in your browser and see mojibake rather than x¹, then try using Konqueror instead.

Konqueror is a clickable install on most Linux desktop distributions and is available to Windows users via the KDE on Windows project.


Compatible loading approach:

Here is an extract of some Ruby code which I looked up, so as to get an idea of which MySQL types map to which Postgres types:

def convert_type(type)
  case type
  when "tinyint(1)"
    "boolean"
  when /tinyint/
    "tinyint"
  when /int/
    "integer"
  when /varchar/
    "varchar"
  when /decimal/
    "decimal"
  else
    type
  end
end
You might find a more accessible read in the form of the Mark Slade migration guide for mysql to postgres. That guide includes some discussion about datatype mappings.

What I wanted to check was that decimal in mysql would mean decimal in postgres, and, yes it seems that it does.

One recommendation when using decimal fields in any database is to always specify both parameters (precision and scale) rather than relying on database defaults.
That way, if you do end up exporting/importing as part of some migration, you will not be tripped up by differing defaults for precision, as the comparison below illustrates.
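For example (the defaults quoted here are the ones I am aware of - verify against your own versions):

-- explicit: means the same in MySQL and Postgres, values up to 999.99
speed_power_ratio DECIMAL(5,2)
-- implicit: MySQL treats a bare DECIMAL as DECIMAL(10,0),
-- while Postgres accepts values of effectively any precision and scale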

Now in part 3 of the 'sample data and mysql' postings I used the --compatible=postgresql flag to produce these files:
Having followed the mysqldump documentation link through to the reading on server modes, it seems that a reasonable expectation is as follows:
  • MySQL is going to do what it can to help you when you ask for --compatible=somerdbms
  • MySQL will try to avoid giving you SQL output it knows for sure will cause somerdbms a problem.
  • MySQL is not promising 100% compatibility, but goes some way to making your task less onerous.
In particular you should not expect --compatible=postgresql to spit out SQL which Postgres can feed on right away - there is still some work to do.
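For reference, a sketch of the kind of invocation part 3 used; --compatible=postgresql is from that posting, while --skip-opt is my guess, reconstructed from the SansOpt suffix of the output filename:

: #root@156ns1(~) ;mysqldump --compatible=postgresql --skip-opt amd_power_dissipation amd_bang_per_watt > /tmp/amd_bang_per_watt.mysqldump4postgresSansOpt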

To give a quick illustration, I took the .mysqldump4postgresSansOpt file, removed the row inserts, and tried to get psql to execute it:

cat /tmp/amd_bang_per_watt.mysqldump4postgresCreateOnly | psql amd_power_dissipation postgres
Password for user postgres:
ERROR: syntax error at or near "@"
LINE 1: SET @saved_cs_client = @@character_set_client;
^
ERROR: unrecognized configuration parameter "character_set_client"
ERROR: syntax error at or near "COMMENT"
LINE 12: ...power_ratio" decimal(5,2) NOT NULL default '0.00' COMMENT 'b...
^
ERROR: syntax error at or near "@"
LINE 1: SET character_set_client = @saved_cs_client;

As you can see postgres is not happy.

So I clean it up a bit and retry:

: #root@156ns1(~) ;cat /tmp/amd_bang_per_watt.mysqldump4postgresCreateOnlyCleanedUp | psql amd_power_dissipation postgres
Password for user postgres:
CREATE TABLE

...better. What I had to do was get rid of the column comment for speed_power_ratio and of the SET statements MySQL had placed before and after the create table block.
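For the record, here is a rough sketch of that cleanup as a sed one-liner, assuming the only offenders are the SET lines and the single-quoted column COMMENT:

: #root@156ns1(~) ;sed -e '/^SET /d' -e "s/ COMMENT '[^']*'//" /tmp/amd_bang_per_watt.mysqldump4postgresCreateOnly > /tmp/amd_bang_per_watt.mysqldump4postgresCreateOnlyCleanedUp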

Here are the files as I worked on them:
So we have the table - let's get on and run the inserts as follows (abbreviated):

: #root@156ns1(~) ;cat /tmp/amd_bang_per_watt.mysqldump4postgresInsertsOnly | psql amd_power_dissipation postgres
Password for user postgres:
INSERT 0 1
...
INSERT 0 1

The 'INSERT 0 1' feedback above is what we should expect given our setup here: the first number is the OID of the inserted row (0 for a table created without OIDs) and the second is the row count.
Further reading of the Postgres documentation for INSERT, scanning down for oid, should make things clear.

A few quick selects to see if things look okay:

postgres@ns1:~$ psql
Password:
Welcome to psql 8.3.7, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

postgres=# \l
        List of databases
         Name          |  Owner   | Encoding
-----------------------+----------+----------
 amd_power_dissipation | postgres | UTF8
 postgres              | postgres | UTF8
 template0             | postgres | UTF8
 template1             | postgres | UTF8
(4 rows)

postgres=# \dt
No relations found.
postgres=# \c amd_power_dissipation
You are now connected to database "amd_power_dissipation".
amd_power_dissipation=# select count(*) from amd_bang_per_watt;
 count
-------
    50
(1 row)

amd_power_dissipation=# select count(*),substr(process_comments,1,4) as nm from amd_bang_per_watt group by nm;
 count |  nm
-------+------
     4 | x¹ 6
    19 | 65nm
     2 | x¹
     3 | x¹ Q
     1 | x¹ 4
     6 | x¹ 2
    15 | 45nm
(7 rows)

amd_power_dissipation=# select model,clock_speed,l3cache,tdp_watts,speed_power_ratio from amd_bang_per_watt where speed_power_ratio > 27;
       model        | clock_speed | l3cache | tdp_watts | speed_power_ratio
--------------------+-------------+---------+-----------+-------------------
 550 Black Edition  | 3.1         | 6MB     | 80        |             38.75
 545                | 3           | 6MB     | 80        |             37.50
 720 Black Edition  | 2.8         | 6MB     | 95        |             29.47
 710                | 2.6         | 6MB     | 95        |             27.37
 705e               | 2.5         | 6MB     | 65        |             38.46
 810                | 2.6         | 4MB     | 95        |             27.37
 900e               | 2.4         | 6MB     | 65        |             36.92
 905e               | 2.5         | 6MB     | 65        |             38.46
 910                | 2.6         | 6MB     | 95        |             27.37
 945                | 3           | 6MB     | 95        |             31.58
 9100e              | 1.8         | 2MB     | 65        |             27.69
 9150e              | 1.8         | 2MB     | 65        |             27.69
(12 rows)

Sidenote: Having tried the substr() function, it seems that the SQL statement shown below works unaltered in both MySQL and Postgres:

select count(*),substr(process_comments,1,4) as nm from amd_bang_per_watt group by nm;


Conclusion:

The --compatible option of mysqldump goes some way to getting your data into Postgres, and this article hopefully gives a flavour of what manual steps you might have to take.

Having worked through both options for getting the data in (CSV and --compatible), I have to say that I am much in favour of using --compatible=postgresql and going that way.