GnuByExample: August 2009

Tuesday, August 4, 2009

postgres - xml out (part 3 of 3) table_to_xml

Earlier in the 'postgres xml out' articles we used query_to_xml() to get data out of postgres in xml format.

Postgres also has the function table_to_xml() which is perfect for exporting the whole table in xml format.

Here is the function signature from the postgres documentation page:

table_to_xml(tbl regclass, nulls boolean, tableforest boolean, targetns text)

The first parameter will be 'amd_bang_per_watt' for all the examples in this article.
The fourth parameter I will not be making use of and so '' will be supplied.

The parameters that I will discuss (and vary for illustration) are the boolean second and third parameters, nulls and tableforest respectively.

Four examples of this output are listed below marked with the values I used for the booleans null and tableforest:

amd_bang_per_watt-table_to_xmlFFheaderless.xml (tbl,False,False,'')
amd_bang_per_watt-table_to_xmlFTheaderless.xml (tbl,False,True,'')
amd_bang_per_watt-table_to_xmlTFheaderless.xml (tbl,True,False,'') ***
amd_bang_per_watt-table_to_xmlTTheaderless.xml (tbl,True,True,'')

Option 3 has been marked with *** to indicate this is my preferred output form and I will be using the xml this form produces in the next paragraphs and in some later articles.

Using True for the parameter

nulls boolean

...gets you...

<voltage xsi:nil="true"/>

rather than omission (the voltage node being omitted altogether).

Using False for the parameter

tableforest boolean

means that each row will be wrapped in a

<row> and </row>

pairing which is nice and convenient.

Executing suitable queries in psql and chopping off the head and tail will produce files similar to the four files listed above. An example of *** (Option 3) is shown here:

psql amd_power_dissipation -c "select table_to_xml('amd_bang_per_watt',true,false,'');" | \

tail -n +3 | head -n -3 > amd_bang_per_watt-table_to_xmlTF.xml;

...and the commands in full are in this text file.

Extracting the 1st row, 47th row, and 50th (final) row of sample data from the file amd_bang_per_watt-table_to_xmlTFheaderless.xml, I create row1headerless.xml containing:

<row>
<model_family>X2 II </model_family>
<model>550 Black Edition </model>
<clock_speed>3.1</clock_speed>
<l2cache>2x512k </l2cache>
<l3cache>6MB </l3cache>
<ht_bus_ghz>2</ht_bus_ghz>
<voltage>1.15-1.425 </voltage>
<socket>AM3 </socket>
<tdp_watts>80</tdp_watts>
<process_comments>45nm Callisto Q3-2009 </process_comments>
<speed_power_ratio>38.75</speed_power_ratio>
</row>

and row47headerless.xml containing:

<row>
<model_family>X4 II </model_family>
<model>965 Black Edition </model>
<clock_speed>3.4</clock_speed>
<l2cache>4x512K </l2cache>
<l3cache>6MB </l3cache>
<ht_bus_ghz>2</ht_bus_ghz>
<voltage>1.15-1.425 </voltage>
<socket>AM3 </socket>
<tdp_watts>125</tdp_watts>
<process_comments>x¹ 45nm Deneb Q3-2009 </process_comments>
<speed_power_ratio>0.00</speed_power_ratio>
</row>

and row50headerless.xml containing:

<row>
<model_family>FX </model_family>
<model>FX-82 x¹ </model>
<clock_speed>2.6</clock_speed>
<l2cache>4x512KB </l2cache>
<l3cache>2MB </l3cache>
<ht_bus_ghz>0</ht_bus_ghz>
<voltage xsi:nil="true"/>
<socket>AM2+ </socket>
<tdp_watts>125</tdp_watts>
<process_comments>x¹</process_comments>
<speed_power_ratio>0.00</speed_power_ratio>
</row>

You might recall from previous postings regarding this sample data, that the data contains a superscript numeral character 'superscript one' (¹) having utf8 code 00B9 (sometimes appearing in code analysis output as \xc2\xb9).

In particular just glancing at the rows we just output I can see that row47 and row50 both contain superscript one in some of their fields.

The next post will attempt to deal with getting xml data into postgres from a file.

Saturday, August 1, 2009

postgres - xml out (part 2 of 3) - query_to_xml

Having shown previously several ways of getting data out of postgres in xml form, I thought there might be some benefit in looking at xmlforest, xmlattributes, and xmlelement.

The compact form for row data produced by 'quick report' of pgAdmin is a good candidate for a manual reconstruction, so i begin there.

First row of the sample data in compact form:

<row id="r1" number="1" c1="X2 II " c2="550 Black Edition " c3="3.1"
c4="2x512k " c5="6MB " c6="2" c7="1.15-1.425 " c8="AM3 " c9="80"
c10="45nm Callisto Q3-2009 " c11="38.75" />

could be generated by xmlelement and xmlattributes as follows:


drop sequence row_nbr;create temporary sequence row_nbr;
select xmlelement (name row, xmlattributes (
'r' || nextval('row_nbr') as id,
currval('row_nbr') as number,
model_family as c1, model as c2,
clock_speed as c3, l2cache as c4,
l3cache as c5, ht_bus_ghz as c6,
voltage as c7, socket as c8,
tdp_watts as c9, process_comments as c10,
speed_power_ratio as c11
)) from amd_bang_per_watt limit 1;

Now it seems silly to write a query where we know all the column names then replace them with c1,c2,c3. However this is just an exercise in reconstruction, so as to illustrate xmlelement and xmlattributes.

Now I will illustrate how to obtain the same row, but this time, using less of the attributes and treating every field as a node to obtain:

<row>
<model_family>X2 II </model_family>
<model>550 Black Edition </model>
<clock_speed>3.1</clock_speed>
<l2cache>2x512k </l2cache>
<l3cache>6MB </l3cache>
<ht_bus_ghz>2</ht_bus_ghz>
<voltage>1.15-1.425 </voltage>
<socket>AM3 </socket>
<tdp_watts>80</tdp_watts>
<process_comments>45nm Callisto Q3-2009 </process_comments>
<speed_power_ratio>38.75</speed_power_ratio>
</row>

...we simply run this query which uses xmlforest:

select xmlelement(name row, xmlforest(
model_family, model, clock_speed, l2cache, l3cache,
ht_bus_ghz, voltage, socket, tdp_watts,
process_comments,
speed_power_ratio
)) from amd_bang_per_watt limit 1;

with a bit of sed postprocessing:

sed 's/></>\n</g' <> amd_bang_per_watt-xmlforested.txt

or if you want to avoid the postprocessing just use the handy postgres query_to_xml() function show here executing in psql:


amd_power_dissipation=# select query_to_xml('select * from amd_bang_per_watt
limit 1',true,true,'');
              query_to_xml
---------------------------------------------------------------
<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<model_family>X2 II </model_family>
<model>550 Black Edition </model>
<clock_speed>3.1</clock_speed>
<l2cache>2x512k </l2cache>
<l3cache>6MB </l3cache>
<ht_bus_ghz>2</ht_bus_ghz>
<voltage>1.15-1.425 </voltage>
<socket>AM3 </socket>
<tdp_watts>80</tdp_watts>
<process_comments>45nm Callisto Q3-2009 </process_comments>
<speed_power_ratio>38.75</speed_power_ratio>
</row>

Mysql --xml option produces row data in a 'node per field' form discussed previously. However in that output, the field name is an attribute, making it a more tricky task than our examples above.

If you are an XSL or XPath expert then you may well leap straight to more elegant solutions than what I have given here.
I have aimed this posting at postgres users who are not xml experts, however comments showing xsl or xpath ways are very welcome as additional insight.

The 'postgres xml out' postings continue with the next post dealing with more ways to get xml out of postgres.

In this post (sed command) I made use of the bash continuation character pair \newline as described at gnu.org