Saturday, December 26, 2009

Ubuntu on Dell Laptops - Auto shutdown at 5% battery

If your Dell laptop came preinstalled with Ubuntu, it probably had sensible auto-shutdown settings from day 1.

If any of the following apply then you may not have the auto shutdown setup ideally:
  • You installed Ubuntu on your Dell manually
  • You have upgraded your version of Ubuntu
  • The battery you are running with is not the original
(Explanations for the above are given at the end of this posting)

By default, the settings on my preinstalled Dell were such that at 2 minutes of battery life remaining, the machine would shut down.

I will now show the settings I recommend, then comment at the end on the defaults.

Firstly change
use_time_for_policy


so it is not checked...



...which has the effect of making your system look at percentages instead.

Now change the percentages to the following numbers:
  • percentage_low = 15
  • percentage_critical = 7
  • percentage_action = 5
And now your system is set to auto shutdown at 5% battery.

Some (optional) extra reading now follows, regarding the original settings.

Ubuntu/Gnome has three thresholds, 'low', 'critical', and 'action'. In normal running your system hits the 'low' threshold first; if the battery keeps draining, you then hit the 'critical' threshold.

Finally when the battery is almost out of juice entirely, the 'action' threshold (the point at which your system will shutdown) is encountered.
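As a small illustrative sketch (using my recommended 15/7/5 percentages rather than the defaults; the function name is my own), the three-threshold logic amounts to:

```python
def battery_state(percentage, low=15, critical=7, action=5):
    """Classify remaining battery percentage against the three
    power-manager thresholds; the lowest matching threshold wins."""
    if percentage <= action:
        return "action"    # the point at which the system shuts down
    if percentage <= critical:
        return "critical"
    if percentage <= low:
        return "low"
    return "ok"

print(battery_state(50))  # ok
print(battery_state(6))   # critical
print(battery_state(4))   # action
```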

Here are the unchanged settings that my system had originally:

...and frankly they are less than ideal for several reasons which I now elaborate upon.

(1) Firstly, by using time_action, time_critical, and time_low the system relies on estimates which are at best rough guesswork. Unchecking use_time_for_policy avoids these time-based thresholds altogether.

Time-based thresholds work well if you are running an unchanged system and your batteries are well known to the system*

(*Batteries in regular use which have had a couple of full charge and discharge cycles are just fine)

If however your batteries are replacements and/or the system has not got enough historic information, then any time remaining estimates are very much guesswork and using time thresholds may be unreliable.

If you have upgraded your system (and/or installed new versions of power manager) then the battery history information may have been emptied out. Without history, the power manager is pretty helpless at time-based estimation.

(2) Having auto shutdown at 2% is, in my opinion, skirting a bit close to the edge. Granted, you might have a battery that really does report its remaining percentage accurately. Even so, for a 3-hour battery (pretty typical), 2% remaining is 3.6 minutes, which is not a lot.

As your battery ages, its percentage reporting might become less reliable too. Nor would I be happy with a setting of 2 minutes (time_action = 120).

Having 5% for the percentage_action and 7% for the percentage_critical gives me 3 to 4 minutes to save my work nicely before the system does an auto shutdown.

I want to be warned at 15% (around half an hour to go) so that I can get to somewhere with a socket and plug in.
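To see what these percentages mean in minutes, here is the arithmetic for a 3-hour battery (an assumed runtime; adjust for your own):

```python
def minutes_left(percentage, full_runtime_minutes=180):
    """Minutes of runtime a given percentage represents,
    assuming a battery that lasts 3 hours (180 min) from full."""
    return full_runtime_minutes * percentage / 100

# the default action threshold of 2% is only 3.6 minutes
print(minutes_left(2))   # 3.6
# my preferred thresholds
print(minutes_left(5))   # 9.0  (action)
print(minutes_left(7))   # 12.6 (critical)
print(minutes_left(15))  # 27.0 (low - about half an hour)
```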

These settings are just my preferences, and you will have your own ideas of good percentages to use that suit your own working style.

Tuesday, August 4, 2009

postgres - xml out (part 3 of 3) table_to_xml

Earlier in the 'postgres xml out' articles we used query_to_xml() to get data out of postgres in xml format.

Postgres also has the function table_to_xml() which is perfect for exporting the whole table in xml format.

Here is the function signature from the postgres documentation page:
table_to_xml(tbl regclass, nulls boolean, tableforest boolean, targetns text)

The first parameter will be 'amd_bang_per_watt' for all the examples in this article.
The fourth parameter I will not be making use of, so '' will be supplied.

The parameters that I will discuss (and vary for illustration) are the boolean second and third parameters, nulls and tableforest respectively.

Four examples of this output are listed below, marked with the values I used for the booleans nulls and tableforest:
  1. amd_bang_per_watt-table_to_xmlFFheaderless.xml (tbl,False,False,'')
  2. amd_bang_per_watt-table_to_xmlFTheaderless.xml (tbl,False,True,'')
  3. amd_bang_per_watt-table_to_xmlTFheaderless.xml (tbl,True,False,'') ***
  4. amd_bang_per_watt-table_to_xmlTTheaderless.xml (tbl,True,True,'')
Option 3 has been marked with *** to indicate this is my preferred output form and I will be using the xml this form produces in the next paragraphs and in some later articles.

Using True for the parameter
nulls boolean
...gets you...
<voltage xsi:nil="true"/>
rather than omission (the voltage node being omitted altogether).

Using False for the parameter
tableforest boolean
means that each row will be wrapped in a
<row> and </row>
pairing which is nice and convenient.

Executing suitable queries in psql and chopping off the head and tail will produce files similar to the four files listed above. An example of *** (Option 3) is shown here:

psql amd_power_dissipation -c "select table_to_xml('amd_bang_per_watt',true,false,'');" | \
tail -n +3 | head -n -3 > amd_bang_per_watt-table_to_xmlTF.xml;


...and the commands in full are in this text file.
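The `tail -n +3 | head -n -3` trimming (drop the first two lines and the last three) can also be done with a plain list slice if you prefer Python to a shell pipeline; a sketch, where the sample lines are my own stand-in for psql's decorated output:

```python
def strip_head_tail(lines, head=2, tail=3):
    """Mimic `tail -n +3 | head -n -3`: drop the first `head`
    lines and the last `tail` lines of psql's decorated output."""
    return lines[head:-tail] if tail else lines[head:]

psql_output = ["query_to_xml", "---------", "<row>", "<model>545</model>",
               "</row>", "(1 row)", "", ""]
print(strip_head_tail(psql_output))
```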

Extracting the 1st row, 47th row, and 50th (final) row of sample data from the file amd_bang_per_watt-table_to_xmlTFheaderless.xml, I create row1headerless.xml containing:
<row>
<model_family>X2 II </model_family>
<model>550 Black Edition </model>
<clock_speed>3.1</clock_speed>
<l2cache>2x512k </l2cache>
<l3cache>6MB </l3cache>
<ht_bus_ghz>2</ht_bus_ghz>
<voltage>1.15-1.425 </voltage>
<socket>AM3 </socket>
<tdp_watts>80</tdp_watts>
<process_comments>45nm Callisto Q3-2009 </process_comments>
<speed_power_ratio>38.75</speed_power_ratio>
</row>
and row47headerless.xml containing:
<row>
<model_family>X4 II </model_family>
<model>965 Black Edition </model>
<clock_speed>3.4</clock_speed>
<l2cache>4x512K </l2cache>
<l3cache>6MB </l3cache>
<ht_bus_ghz>2</ht_bus_ghz>
<voltage>1.15-1.425 </voltage>
<socket>AM3 </socket>
<tdp_watts>125</tdp_watts>
<process_comments>x¹ 45nm Deneb Q3-2009 </process_comments>
<speed_power_ratio>0.00</speed_power_ratio>
</row>
and row50headerless.xml containing:
<row>
<model_family>FX </model_family>
<model>FX-82 x¹ </model>
<clock_speed>2.6</clock_speed>
<l2cache>4x512KB </l2cache>
<l3cache>2MB </l3cache>
<ht_bus_ghz>0</ht_bus_ghz>
<voltage xsi:nil="true"/>
<socket>AM2+ </socket>
<tdp_watts>125</tdp_watts>
<process_comments>x¹</process_comments>
<speed_power_ratio>0.00</speed_power_ratio>
</row>

You might recall from previous postings regarding this sample data that it contains the 'superscript one' character (¹), Unicode code point U+00B9 (sometimes appearing in code analysis output as the UTF-8 bytes \xc2\xb9).
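The code-point/byte relationship is easy to check directly in Python:

```python
# 'superscript one' is Unicode code point U+00B9;
# in UTF-8 it is encoded as the two bytes 0xC2 0xB9
c = "\u00b9"
print(c)                   # ¹
print(c.encode("utf-8"))   # b'\xc2\xb9'
```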

In particular, just glancing at the rows we output, I can see that row 47 and row 50 both contain superscript one in some of their fields.

The next post will attempt to deal with getting xml data into postgres from a file.

Saturday, August 1, 2009

postgres - xml out (part 2 of 3) - query_to_xml

Having shown previously several ways of getting data out of postgres in xml form, I thought there might be some benefit in looking at xmlforest, xmlattributes, and xmlelement.

The compact form for row data produced by pgAdmin's 'quick report' is a good candidate for a manual reconstruction, so I begin there.

First row of the sample data in compact form:
<row id="r1" number="1" c1="X2 II " c2="550 Black Edition " c3="3.1"
c4="2x512k " c5="6MB " c6="2" c7="1.15-1.425 " c8="AM3 " c9="80"
c10="45nm Callisto Q3-2009 " c11="38.75" />
could be generated by xmlelement and xmlattributes as follows:

drop sequence if exists row_nbr;create temporary sequence row_nbr;
select xmlelement (name row, xmlattributes (
'r' || nextval('row_nbr') as id,
currval('row_nbr') as number,
model_family as c1, model as c2,
clock_speed as c3, l2cache as c4,
l3cache as c5, ht_bus_ghz as c6,
voltage as c7, socket as c8,
tdp_watts as c9, process_comments as c10,
speed_power_ratio as c11
)) from amd_bang_per_watt limit 1;


Now it seems silly to write a query where we know all the column names then replace them with c1,c2,c3. However this is just an exercise in reconstruction, so as to illustrate xmlelement and xmlattributes.

Now I will illustrate how to obtain the same row, but this time using no attributes and treating every field as a node, to obtain:
<row>
<model_family>X2 II </model_family>
<model>550 Black Edition </model>
<clock_speed>3.1</clock_speed>
<l2cache>2x512k </l2cache>
<l3cache>6MB </l3cache>
<ht_bus_ghz>2</ht_bus_ghz>
<voltage>1.15-1.425 </voltage>
<socket>AM3 </socket>
<tdp_watts>80</tdp_watts>
<process_comments>45nm Callisto Q3-2009 </process_comments>
<speed_power_ratio>38.75</speed_power_ratio>
</row>

...we simply run this query which uses xmlforest:
select xmlelement(name row, xmlforest(
model_family, model, clock_speed, l2cache, l3cache,
ht_bus_ghz, voltage, socket, tdp_watts,
process_comments,
speed_power_ratio
)) from amd_bang_per_watt limit 1;
with a bit of sed postprocessing (query-output here being whichever file you saved the raw query result to):
sed 's/></>\n</g' < query-output > amd_bang_per_watt-xmlforested.txt
or, if you want to avoid the postprocessing, just use the handy postgres query_to_xml() function, shown here executing in psql:

amd_power_dissipation=# select query_to_xml('select * from amd_bang_per_watt
limit 1',true,true,'');
query_to_xml
---------------------------------------------------------------
<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<model_family>X2 II </model_family>
<model>550 Black Edition </model>
<clock_speed>3.1</clock_speed>
<l2cache>2x512k </l2cache>
<l3cache>6MB </l3cache>
<ht_bus_ghz>2</ht_bus_ghz>
<voltage>1.15-1.425 </voltage>
<socket>AM3 </socket>
<tdp_watts>80</tdp_watts>
<process_comments>45nm Callisto Q3-2009 </process_comments>
<speed_power_ratio>38.75</speed_power_ratio>
</row>

MySQL's --xml option produces row data in a 'node per field' form discussed previously. However, in that output the field name is an attribute, which makes parsing a trickier task than in our examples above.
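As a sketch of what such a conversion involves (the two-field sample row and helper name are my own), Python's bundled xml.etree can rewrite the attribute-named fields into the node-per-field form used above:

```python
import xml.etree.ElementTree as ET

mysql_row = """<row>
<field name="model">550 Black Edition </field>
<field name="tdp_watts">80</field>
</row>"""

def fields_to_nodes(row_xml):
    """Turn MySQL's <field name="x">v</field> style into
    the <x>v</x> node-per-field style."""
    row = ET.fromstring(row_xml)
    out = ET.Element("row")
    for field in row.findall("field"):
        child = ET.SubElement(out, field.get("name"))
        child.text = field.text
    return ET.tostring(out, encoding="unicode")

print(fields_to_nodes(mysql_row))
```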

If you are an XSL or XPath expert then you may well leap straight to more elegant solutions than those I have given here.
I have aimed this posting at postgres users who are not xml experts; however, comments showing xsl or xpath approaches are very welcome as additional insight.


The 'postgres xml out' postings continue with the next post dealing with more ways to get xml out of postgres.

In this post (sed command) I made use of the bash continuation character pair \newline as described at gnu.org

Thursday, July 30, 2009

postgres - xml in xml out (part 1 of 2)

I will be using familiar sample data and in fact will begin with the assumption that the data is already in postgres.

Hark back to the 'sample data' postings and use COPY amd_bang_per_watt FROM ... if you are coming to this fresh.

I will deal with 'xml out' first as it is a quick win for an already populated table (my situation).

Previous postings have been quite console-based - this one will be a bit more GUI, so as to let us explore features of the pgAdmin graphical tool.

There are several ways including:

  1. Using psql with the html flag and then munging the html into xml
  2. pgAdmin's table view, clicking 'context menu', and outputting with xml radio button
  3. pgAdmin 'Tools/Query Tool' then 'File/Quick report' (xml radio button selected)
I will ignore the first option (psql) and go for the pgAdmin methods, illustrating with some screenshots.

(Item 3 gave me the best results so feel free to skip ahead if you like)

My pgAdmin screenshots are wide and will not resize well for newspaper style columns. Instead I have put them on picasaweb with links in the text.

2. pgAdmin table view, context menu, Reports:
Selecting 'Reports' from the context menu above gives you six options:
  1. Properties report (an extract is shown below)
  2. DDL report (ddl for table recreation from a script)
  3. Data Dictionary report (an accurate description of what you get)
  4. Statistics Report (Sequential scans count is expectedly high in here)
  5. Dependencies Report (for our table this is empty in an xml style of emptiness)
  6. Dependents Report (whoops - incorrect spellcheck complaint in firefox)
(i) Properties report gives you xml - extract below:
<table>
<columns>
<column id="c1" number="1" name="Property" />
<column id="c2" number="2" name="Value" />
</columns>
<rows>
<row id="r1" number="1" c1="Name" c2="amd_bang_per_watt" />
<row id="r2" number="2" c1="OID" c2="16386" />
<row id="r3" number="3" c1="Owner" c2="postgres" />
<row id="r4" number="4" c1="Tablespace" c2="pg_default" />
<row id="r5" number="5" c1="ACL" c2="{postgres=arwdxt/postgres,someuser=arwd/postgres}" />
<row id="r6" number="6" c1="Primary key" c2="" />
<row id="r7" number="7" c1="Rows (estimated)" c2="50" />
<row id="r8" number="8" c1="Fill factor" c2="" />
<row id="r9" number="9" c1="Rows (counted)" c2="50" />
<row id="r10" number="10" c1="Inherits tables" c2="No" />
<row id="r11" number="11" c1="Inherited tables count" c2="0" />
<row id="r12" number="12" c1="Has OIDs?" c2="No" />
<row id="r13" number="13" c1="System table?" c2="No" />
<row id="r14" number="14" c1="Comment" c2="" />
</rows>
</table>

Click the links in the original six point list for googledocs of all the xml files in full.

What I did not find in any of the six reports was the actual row data from the table - I will come to that soon.

3. Tools/Query Tool then File/Quick report:

My pgAdmin screenshots are wide and will not resize well. Instead I have put them on picasaweb with links listed below:

So to get the actual row data my approach was to enter the Query Tool then write a 'select * from ...' type of query and Quick Report the results.

Having selected XML in Quick Report you will obtain the actual row data from the table.

Does this row data look the same as mysql --xml style output?
Not exactly. In pgAdmin xml output, columns are not treated as individual nodes but instead appear as attributes of a row node.

Sample first row from the full xml output:
<row id="r1" number="1" c1="X2 II " c2="550 Black Edition " c3="3.1"
c4="2x512k " c5="6MB " c6="2" c7="1.15-1.425 " c8="AM3 " c9="80"
c10="45nm Callisto Q3-2009 " c11="38.75" />

Is this a valid way of representing xml? Certainly.
Will it lead to parsing issues? - No. Any parser worth its salt handles nodes and attributes and exposes them in a useful way to the programmer.

Perhaps you might later use python-libxml2 or similar to construct an insert statement of the form INSERT INTO amd_bang_per_watt VALUES(_,_, from the column entries c1,c2,etc.
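A rough sketch of that idea, using Python's bundled xml.etree rather than python-libxml2 (the helper name and the truncated sample row are my own; real code should use parameterized queries instead of string building):

```python
import xml.etree.ElementTree as ET

compact_row = ('<row id="r1" number="1" c1="X2 II " c2="550 Black Edition " '
               'c3="3.1" c9="80"/>')

def row_to_insert(row_xml, table="amd_bang_per_watt", ncols=11):
    """Build an INSERT statement from the c1..cN attributes of a
    compact pgAdmin row. Illustration only - use parameterized
    queries in real code."""
    row = ET.fromstring(row_xml)
    values = [row.get("c%d" % i, "") for i in range(1, ncols + 1)]
    quoted = ", ".join("'%s'" % v.replace("'", "''") for v in values)
    return "INSERT INTO %s VALUES(%s);" % (table, quoted)

print(row_to_insert(compact_row))
```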

This 'more compact' format should save you space and may be doing you a favour if you harbour any thoughts of parsing the rows in jQuery or PHP.

One thing I should point out is that this compact form comes at a price. Should you later insert or remove columns from the table, then the column numbers c1, c2, c3, etc are going to be out of step with your altered table.

The facilities for xml output in your database can be useful, but are not always the quickest route to the goal. If your data is going out and then straight into another postgres database, you would likely want to use the backup option in pgAdmin.

Examples of such backup output are shown below:
I found the second backup format a refreshing alternative to some of the issues that can otherwise creep in to comma separated data.

xml in xml out (part 2 of 3) is the next post and will continue this theme.

Thursday, July 23, 2009

ctas for mysql and postgres - create table as select ...

'Create table as select...' is a useful way of defining a new table based on some stuff already in your database.

It might be that you have just started work on an existing system and your fresh eyes can see a table that can be normalised, or perhaps you are told that some of the tables are bloated with obsolete columns.

You might use 'create table as select' (CTAS) in the same way you might rework a file on your filesystem by going through copy, edit, move (where the final move replaces the original)
[ So what you did on the way was create a temporary copy and lastly overwriting the original ]

Completing the analogy - in database terms CTAS might create your temporary table and you might look at how to change the schema, so as to ignore the original copy and use your CTAS created table in its place.

Temporary tables for whatever purpose can be created using CTAS or the lesser preferred 'select into'. From postgres documentation: CREATE TABLE AS is the recommended syntax.
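CTAS is not specific to mysql or postgres; as a self-contained illustration, Python's bundled sqlite3 accepts the same syntax (the toy table and rows here are my own, not the sample data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (model TEXT, ratio REAL)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [("550", 38.75), ("9950", 18.57), ("unreleased", 0.0)])

# CTAS: define a new table from the rows of an existing one
con.execute("CREATE TABLE t_summary AS "
            "SELECT model, ratio FROM t WHERE ratio > 1")
print(con.execute("SELECT COUNT(*) FROM t_summary").fetchone()[0])  # 2
```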




The relevant portion of the mysql syntax is
select_statement:[IGNORE | REPLACE] [AS] SELECT ...   (Some legal select statement)
which is not so easy to find amongst the complex syntactical explanation on the main documentation page.

For perl types who have csv data on the filesystem and want to use CTAS look at DBD::CSV module.

And now for some simple examples using mysql and the sample data from the recent postings:

My suggestion is to just do a standalone select first:

select model,speed_power_ratio,process_comments from amd_bang_per_watt where model not like '%¹%' and process_comments not like '%¹%' order by speed_power_ratio;

which gives this output (abbreviated)

+---------------------+-------------------+---------------------------------------------+
| model | speed_power_ratio | process_comments |
+---------------------+-------------------+---------------------------------------------+
| 550 Black Edition | 38.75 | 45nm Callisto Q3-2009 |
...
| 9950 | 18.57 | 65nm B3 |
+---------------------+-------------------+---------------------------------------------+
34 rows in set (0.00 sec)

By doing such a check that you get what you expected, you can avoid messing around dropping and recreating your target table.
My target table will be named amd_bang_per_watt_summary and here is the create table as select (CTAS):

mysql> create table amd_bang_per_watt_summary as select model,speed_power_ratio,process_comments from \
-> amd_bang_per_watt where model not like '%¹%' and process_comments not like '%¹%' order by \
-> speed_power_ratio desc;
Query OK, 34 rows affected (0.01 sec)
Records: 34 Duplicates: 0 Warnings: 0

In order to form the sql statement above I needed to do a bit of UTF8 looking up so that the 'superscript one' could be included correctly in the like bit of the statement.

How I pasted 'superscript one' onto the command line: use the charmap application and navigate to 'Latin-1 Supplement', where 'superscript one' (0xB9) appears amongst the superscript numerals.

mysql> select count(*),substr(process_comments,1,4) as nm from amd_bang_per_watt_summary group by nm;
+----------+------+
| count(*) | nm |
+----------+------+
| 15 | 45nm |
| 19 | 65nm |
+----------+------+
2 rows in set (0.00 sec)

mysql> select * from amd_bang_per_watt_summary where speed_power_ratio > 27;
+--------------------+-------------------+---------------------------------------------+
| model | speed_power_ratio | process_comments |
+--------------------+-------------------+---------------------------------------------+
| 550 Black Edition | 38.75 | 45nm Callisto Q3-2009 |
| 705e | 38.46 | 45nm Heka Q3-2009 |
| 905e | 38.46 | 45nm Deneb Q3-2009 |
| 545 | 37.50 | 45nm Callisto Q3-2009 |
| 900e | 36.92 | 45nm Deneb 2009 |
| 945 | 31.58 | 45nm Deneb (part no HDX945WFK4DGI) Q3-2009 |
| 720 Black Edition | 29.47 | 45nm Heka Q1-2009 |
| 9100e | 27.69 | 65nm B2 |
| 9150e | 27.69 | 65nm B3 |
| 910 | 27.37 | 45nm Deneb Q1-2009 |
| 710 | 27.37 | 45nm Heka Q1-2009 |
| 810 | 27.37 | 45nm Deneb Q1-2009 |
+--------------------+-------------------+---------------------------------------------+
12 rows in set (0.00 sec)

The warning count reported by the CTAS is the only thing I might worry about, but it being zero is good, and the quick queries above demonstrated to me that all was well.

The mysql output above is fairly readable but a truer representation of what appeared on my display is in this publicly accessible googledoc.

The superscript one like '%¹%' bit of the sql might need a bit of explaining - the original wikipedia data
included some processors for which there was no official release date. My target table amd_bang_per_watt_summary will not contain those entries.

Hopefully this posting will save you from having to trawl the documentation for CTAS and saving you the experience of:
  • mysql - the page is so long and complicated that the [AS] bit is easy to miss
  • postgres - whilst CTAS is given a page of its own, there is no link from the regular 'create table' page, so I personally found it difficult to find.
  • both - The acronym CTAS whilst unofficial might be included as a tag purely for user convenience.

Tuesday, July 21, 2009

Sample data for import and xml export - mysql and postgres (part 3 of 3)

Following up on previous postings I begin by giving the full xml output which should be pastable to your computer and then imported to any database with an xml import facility.

amd_bang_per_watt__20090720.xml
amd_bang_per_watt__20090720.xml-mit-create

or just a couple of rows as appeared in the previous posting:
amd_bang_per_watt__20090720.xml-select

For two of the main open source databases I use, some graphical tools are listed here:
mysql-admin documentation available in English, Greek, Italian, Portuguese, Russian, Spanish, Ukrainian.

If you like a screenshotted introduction to the gui tools then here is a quick idea of the look and feel:



The version 6 release of mysql (maybe 2010) will include
load xml infile
which will be a step forward. Version 6 will follow version 5.4 of mysql, which is in testing as I write.


Commercial releases of mysql might show slightly different versions but in keeping with the title of this blog I tend to focus on gpl or free and open source flavours of software.


If you have the data in a variety of formats (usually the case if it is your data) then csv format and the formats mysqlimport has supported for the last 10 years should do fine to get the job done.


All major databases will likely soon offer xml import (if not already) which should help data interchange between retailers and suppliers providing xml standards are strictly adhered to. Newer xml proposed standards have public pages for further research at w3.org


If you are itching to bring xml data directly into mysql today then you might want to browse the following:


And now back to the data, which I give in mysql's own .sql format to aid anyone wanting to just load it directly rather than getting hung up on xml.
So at whatever point you have joined this posting thread, you can quickly pick up from here if you are interested in what comes next, which is all Postgres.

Before leaving mysql entirely, we should export the data using the --compatible flag, so that the sample data and postgres postings can discuss what (if anything) must be done to the --compatible sql output to make it fully postgres-compatible.

Here is what mysql produces from --compatible=postgresql:
Mark Slade had this to say regarding what to expect in his article about converting databases.
Despite how it sounds, this won't generate PostgreSQL compatible code, but it will get you halfway there.
Halfway sounds pretty good if you do not want to go down the 'csv out csv in' route. I explore both ways in the postgres postings.

Monday, July 20, 2009

Sample data for import and xml export - mysql and postgres (part 2a of 3) - amd_bang_per_watt-callisto

<?xml version="1.0"?>

<resultset statement="select * from amd_bang_per_watt where process_comments like '%allisto%'" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<field name="model_family">X2 II </field>
<field name="model">550 Black Edition </field>
<field name="clock_speed">3.1</field>
<field name="l2cache">2x512k </field>
<field name="l3cache">6MB </field>
<field name="ht_bus_ghz">2</field>
<field name="voltage">1.15-1.425 </field>
<field name="socket">AM3 </field>
<field name="tdp_watts">80</field>
<field name="process_comments">45nm Callisto Q3-2009 </field>
<field name="speed_power_ratio">38.75</field>
</row>

<row>
<field name="model_family">X2 II </field>
<field name="model">545</field>
<field name="clock_speed">3</field>
<field name="l2cache">2x512k </field>
<field name="l3cache">6MB </field>
<field name="ht_bus_ghz">2</field>
<field name="voltage">1.15-1.425 </field>
<field name="socket">AM3 </field>
<field name="tdp_watts">80</field>
<field name="process_comments">45nm Callisto Q3-2009 </field>
<field name="speed_power_ratio">37.50</field>
</row>
</resultset>

Sample data for import and xml export - mysql and postgres (part 2 of 3)

Designing tables can be a hit-and-miss affair, but looking at the warnings mysql has thrown up, my first attempt does not look too far off:
+---------+------+----------------------------------------------------------------------+
| Level   | Code | Message                                                              |
+---------+------+----------------------------------------------------------------------+
| Note    | 1265 | Data truncated for column 'voltage' at row 1                         |
| Warning | 1265 | Data truncated for column 'process_comments' at row 1                |
| Note    | 1265 | Data truncated for column 'voltage' at row 2                         |
| Warning | 1265 | Data truncated for column 'process_comments' at row 2                |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 3  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 4  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 5  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 6  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 7  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 8  |
| Warning | 1265 | Data truncated for column 'process_comments' at row 9                |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 9  |
| Note    | 1265 | Data truncated for column 'voltage' at row 10                        |
| Note    | 1265 | Data truncated for column 'voltage' at row 11                        |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 13 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 14 |
| Warning | 1265 | Data truncated for column 'process_comments' at row 16               |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 16 |
| Warning | 1265 | Data truncated for column 'process_comments' at row 18               |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 18 |
| Note    | 1265 | Data truncated for column 'voltage' at row 23                        |
| Note    | 1265 | Data truncated for column 'voltage' at row 24                        |
| Note    | 1265 | Data truncated for column 'voltage' at row 25                        |
| Note    | 1265 | Data truncated for column 'voltage' at row 26                        |
| Note    | 1265 | Data truncated for column 'voltage' at row 27                        |
| Note    | 1265 | Data truncated for column 'voltage' at row 28                        |
| Note    | 1265 | Data truncated for column 'voltage' at row 29                        |
| Note    | 1265 | Data truncated for column 'voltage' at row 30                        |
| Warning | 1265 | Data truncated for column 'process_comments' at row 30               |
| Note    | 1265 | Data truncated for column 'voltage' at row 31                        |
| Warning | 1265 | Data truncated for column 'process_comments' at row 31               |
| Note    | 1265 | Data truncated for column 'voltage' at row 32                        |
| Note    | 1265 | Data truncated for column 'voltage' at row 33                        |
| Warning | 1265 | Data truncated for column 'process_comments' at row 33               |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 33 |
| Warning | 1265 | Data truncated for column 'process_comments' at row 36               |
| Warning | 1265 | Data truncated for column 'process_comments' at row 37               |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 37 |
| Note    | 1265 | Data truncated for column 'l2cache' at row 47                        |
| Warning | 1265 | Data truncated for column 'voltage' at row 47                        |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 48 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 49 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 50 |
+---------+------+----------------------------------------------------------------------+

Although the above looks like a lot, it is really caused by just one or two issues which I will now address. I will give more details on the 'show warnings;' command that produced the above a bit later in this post.

Design alterations:
  • voltage to VARCHAR(15)
  • speed_power_ratio to get a default of '0.00'; might as well make it NOT NULL in that case
  • l2cache and l3cache to VARCHAR(15)
  • process_comments to VARCHAR(50)
DROP TABLE IF EXISTS amd_bang_per_watt;
SET character_set_client = utf8;
CREATE TABLE amd_bang_per_watt (
model_family  VARCHAR(10) NOT NULL,
model   VARCHAR(30) NOT NULL default 'unknown',
clock_speed  VARCHAR(20) NOT NULL,
l2cache   VARCHAR(15),
l3cache   VARCHAR(15),
ht_bus_ghz  VARCHAR(10),
voltage   VARCHAR(15),
socket   VARCHAR(10),
tdp_watts  VARCHAR(20),
process_comments VARCHAR(50),
speed_power_ratio DECIMAL(5,2) NOT NULL default '0.00' COMMENT 'bang per watt'
) ENGINE=MyISAM DEFAULT CHARSET=utf8 MIN_ROWS=50 CHECKSUM=1 COMMENT='Source: Wikipedia 200907
http://en.wikipedia.org/wiki/List_of_CPU_power_dissipation#AMD_Phenom_.28Phenom_64.29';

And here I will show a way of importing the data that allows us to issue 'show warnings;' afterwards for feedback on how the import went.
root@d183z2g:~# mysql amd_power_dissipation -u root -p
Enter password:                                   
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 66                       
Server version: 5.0.75-0ubuntu10.2-log (Ubuntu)      

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> LOAD DATA LOCAL INFILE '/tmp/amd_bang_per_watt.csv' INTO TABLE amd_bang_per_watt
FIELDS  TERMINATED BY ','  OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n' IGNORE 1 LINES;
Query OK, 50 rows affected, 16 warnings (0.00 sec)
Records: 50  Deleted: 0  Skipped: 0  Warnings: 0

mysql> show warnings;
+---------+------+----------------------------------------------------------------------+
| Level   | Code | Message                                                              |
+---------+------+----------------------------------------------------------------------+
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 3  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 4  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 5  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 6  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 7  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 8  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 9  |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 13 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 14 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 16 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 18 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 33 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 37 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 48 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 49 |
| Warning | 1366 | Incorrect decimal value: '' for column 'speed_power_ratio' at row 50 |
+---------+------+----------------------------------------------------------------------+
16 rows in set (0.00 sec)

mysql> select model,speed_power_ratio from amd_bang_per_watt limit 5;
+--------------------+-------------------+
| model              | speed_power_ratio |
+--------------------+-------------------+
| 550 Black Edition  |             38.75 |
| 545                |             37.50 |
| 6??? x¹           |              0.00 |
| 6??? x¹           |              0.00 |
| 6??? x¹           |              0.00 |
+--------------------+-------------------+
5 rows in set (0.00 sec)
Note: The LOAD DATA LOCAL INFILE command in the session above should be all on one line but I split it here to fit the blog posting.

In Summary: The only warnings relate to missing values for speed_power_ratio in the input .csv file and those will be defaulted to 0.00 by means of the table definition.
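As an aside, if you would rather the import produced no warnings at all, the empty values could be defaulted before mysql ever sees them. Here is a rough sketch in Python (it assumes speed_power_ratio is the last field on every row, which is the case for this csv):

```python
import csv
import io

# A sketch of pre-cleaning the csv so empty speed_power_ratio values
# (the last column) become '0.00' before LOAD DATA sees them,
# avoiding the 1366 warnings entirely.
def default_last_column(text):
    out = io.StringIO()
    writer = csv.writer(out)
    for row in csv.reader(io.StringIO(text)):
        if row and row[-1].strip() == '':
            row[-1] = '0.00'
        writer.writerow(row)
    return out.getvalue()

sample = '550 Black Edition,38.75\nsome future model,\n'
print(default_last_column(sample))
```

Note the writer only quotes fields that need it, so the "wrap filled fields in double quotes" style from oocalc is not preserved exactly; OPTIONALLY ENCLOSED BY copes with either style.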

Now the point of all this importing was to demonstrate xml output from mysql.

MySQL makes it really simple, as the following example illustrates (the xml output is in the next post, labelled part 2a of 3):
useri@d183z2g:~$ mysql --xml amd_power_dissipation -u root -p -e "select * from amd_bang_per_watt where process_comments like '%allisto%';"
Enter password:

Sunday, July 19, 2009

Sample data for import and xml export - mysql and postgres (part 1 of 3)

Export to xml is a feature of databases that I have never used until now.

I thought that a short guide might be useful to illustrate the results.

As a further task I will later look at the --compatible option of mysqldump to see how postgres likes the results (a follow-up post perhaps, so see parts 2 and 3)

( If you came here just looking to get data out of oocalc and into mysql, please shuffle along to paragraphs 11 to 13 where this is demonstrated )

There are projects out there to take mysql data into postgres and vice versa, and it will be a useful skill to have.

After reading up about freely available sample data for names and addresses and seeing if I might use any openoffice sample data, I decided on a slightly different tack.

It seems that most 'free' sample data has some sort of copyright restriction but there is plenty that is licensed under creative commons attribution or similar.

Reading up about AMD Phenom processors earlier today, I viewed the Wikipedia table regarding Socket AM3 power usage and thought that this table of data, when pasted into a spreadsheet, is as good as any.

(For any reader who cannot get to spreadsheets.google.com or experiences an unexpected 'login to google' prompt, I have mirrored the spreadsheet in open document .ods format here)

It is right at this point to highlight again that all of the power usage data referenced in this blog posting (directly, as xml, or as a spreadsheet) is not my data, and I explicitly credit Wikipedia as the source.

In the spreadsheet viewable at google docs I have added two comments to this effect.

Now to do something useful with it - firstly export it to .csv format by following instructions for openoffice calc csv export.

Now looking at the options for mysqlimport:
--fields-terminated-by is how you will tell mysql that the data is comma separated
--fields-optionally-enclosed-by is how I will tell mysql that double quotes enclose filled fields.

Now openoffice calc (oocalc) export to/save as CSV does not wrap empty fields in double quotes, and so a six-field row with just the first and last fields populated might look like this: "rhubarb",,,,,"custard"

Mysql is not going to like a series of commas without field delimiters, and so, in particular, ,,,, will be a problem unless the word 'optionally' in --fields-optionally-enclosed-by comes to my rescue.
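To convince yourself that such a row still splits into six fields, any csv parser that treats the quoting as optional will do; a quick sketch using Python's csv module:

```python
import csv
import io

# The oocalc style: only filled fields are wrapped in double quotes,
# empty fields are just consecutive commas.
line = '"rhubarb",,,,,"custard"\n'
row = next(csv.reader(io.StringIO(line)))
print(row)  # ['rhubarb', '', '', '', '', 'custard']
```

The empty fields come through as empty strings, which is exactly what the ,,,, runs turn into on the mysql side.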

This still leaves the question of escaping single quotes (I think mysql wants them all escaped) and the sed command might come to my aid there.
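Should the escaping turn out to be needed, the substitution itself is trivial; the sed version would be along the lines of sed "s/'/\\\\'/g", and here is the same idea sketched in Python:

```python
# Prefix each single quote with a backslash, the same substitution
# that the sed command above performs on a whole file.
def escape_quotes(text):
    return text.replace("'", "\\'")

print(escape_quotes("3.0 GHz 'Black Edition'"))  # 3.0 GHz \'Black Edition\'
```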

Part 2 will deal with getting the data into mysql using mysqlimport.
I will finish this posting now with a first attempt at a mysql table create that is suitable for the data being loaded.
DROP TABLE IF EXISTS amd_bang_per_watt;
SET character_set_client = utf8;
CREATE TABLE amd_bang_per_watt (model_family  VARCHAR(10) NOT NULL,
model   VARCHAR(30) NOT NULL default 'unknown',
clock_speed  VARCHAR(20) NOT NULL,
l2cache   VARCHAR(10),
l3cache   VARCHAR(10),
ht_bus_ghz  VARCHAR(10),
voltage   VARCHAR(10),
socket   VARCHAR(10),
tdp_watts  VARCHAR(20),
process_comments VARCHAR(20),
speed_power_ratio DECIMAL(5,2) COMMENT 'bang per watt'
) ENGINE=MyISAM DEFAULT CHARSET=utf8 MIN_ROWS=50 CHECKSUM=1
COMMENT='Source: Wikipedia 200907 http://en.wikipedia.org/wiki/List_of_CPU_power_dissipation#AMD_Phenom_.28Phenom_64.29';
Creating a new mysql database and running that table create should enable you to verify the table using describe as follows:
root@d183z2g:~# mysql amd_power_dissipation -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 61
Server version: 5.0.75-0ubuntu10.2 (Ubuntu)
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> show tables;
+---------------------------------+
| Tables_in_amd_power_dissipation |
+---------------------------------+
| amd_bang_per_watt               |
+---------------------------------+
1 row in set (0.00 sec)
mysql> describe amd_bang_per_watt;
+-------------------+--------------+------+-----+---------+-------+
| Field             | Type         | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| model_family      | varchar(10)  | NO   |     | NULL    |       |
| model             | varchar(30)  | NO   |     | unknown |       |
| clock_speed       | varchar(20)  | NO   |     | NULL    |       |
| l2cache           | varchar(10)  | YES  |     | NULL    |       |
| l3cache           | varchar(10)  | YES  |     | NULL    |       |
| ht_bus_ghz        | varchar(10)  | YES  |     | NULL    |       |
| voltage           | varchar(10)  | YES  |     | NULL    |       |
| socket            | varchar(10)  | YES  |     | NULL    |       |
| tdp_watts         | varchar(20)  | YES  |     | NULL    |       |
| process_comments  | varchar(20)  | YES  |     | NULL    |       |
| speed_power_ratio | decimal(5,2) | YES  |     | NULL    |       |
+-------------------+--------------+------+-----+---------+-------+
11 rows in set (0.00 sec)
mysql>
I have deliberately defined the final column as decimal(5,2) as I am thinking ahead to the import to postgres (much later) and want to see how postgres handles that column.
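For that later comparison it is worth noting what decimal(5,2) actually means: five digits in total, two of them after the point, so the largest magnitude it can hold is 999.99 (postgres numeric(5,2) has the same bounds). A quick illustration:

```python
from decimal import Decimal

# decimal(5,2): 5 digits total, 2 after the point -> range -999.99 .. 999.99
def fits_decimal_5_2(value):
    return abs(Decimal(value)) <= Decimal('999.99')

print(fits_decimal_5_2('38.75'))    # True
print(fits_decimal_5_2('1000.00'))  # False
```

All of the speed_power_ratio values in this data set sit comfortably inside that range.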

In part 2 of this posting I will attempt to bring in the power data from the .csv file and refine the mysql table create if necessary.