GnuByExample: backup

Having worked with many businesses on backup procedures, I try to adopt best practice on my own computers.

With this in mind, I keep backup copies of entire hard drives, which I refresh once or twice a year. This means that in the event of a critical failure, I can quickly grab the backup hard drive and put my main desktop PC back to the state it was in around 6 months ago.
(I back up business-critical data using procedures in addition to those discussed here.)

Motherboards tend to have a RAID controller, with a RAID mirroring procedure accessible from the BIOS or startup screen.

With a similar-sized disk I just select source and mirror, and let the motherboard look after creating a backup (a one-time RAID mirror). This always worked until recently*, when I had to revert to using dd.

In defense of the 26 hour backup time, this was dd running from a system rescue USB stick on a 5 year old socket 754 motherboard, limited to SATA I rather than SATA II maximum throughput. Even so, 15.7 MB/s is shockingly slow, but for a one-time job maybe that is not such a problem.
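As a rough sanity check on those numbers, here is a minimal sketch of the arithmetic. It assumes the whole 1.5TB disk was copied, that the drive size is in decimal terabytes, and that the MB/s figure is in decimal megabytes. Those are my assumptions, not something the dd output confirms.

```python
# Rough check: how long does a full-disk copy take at a constant rate?
DISK_BYTES = 1.5e12        # 1.5 TB drive, decimal terabytes (assumption)
RATE_BYTES_PER_S = 15.7e6  # 15.7 MB/s, decimal megabytes (assumption)


def copy_hours(total_bytes, bytes_per_second):
    """Return the copy duration in hours for a constant transfer rate."""
    return total_bytes / bytes_per_second / 3600


hours = copy_hours(DISK_BYTES, RATE_BYTES_PER_S)
print(f"{hours:.1f} hours")
```

Under those assumptions the result lands between 26 and 27 hours, which agrees with the backup time I saw.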

*Having bought two matching Samsung F2 1.5TB drives, I was surprised to find the VIA onboard RAID controller complaining that the drive sizes did not match. I can only assume that a 2005 RAID controller might not, in all cases, properly recognise the huge terabyte-plus disks that we can buy today.

Using dd for Mirroring entire Partitions or Disks:

As demonstrated above, dd will do a job for you if your raid setup refuses to do a source -> mirror copy. Using modern hardware you should expect this to take a few hours.
[ using ancient hardware you will be there all day (literally) ]

If you prefer, you can use dd for partition-by-partition mirroring. In my screenshot, for example, I might have used if=/dev/sda1 of=/dev/sdb1 so as to copy just the first partition.

Note: Doing things partition by partition can be a bit more tricky, as you need to give thought to the partition structure on your output disk and, if necessary, create suitable partitions there prior to running dd.

Those who might prefer output to a file might like to consider of=/mnt/hugedisk/partition1.dd as a suitable parameter.
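For anyone curious about what a copy from an input to an output file amounts to, here is a minimal Python sketch of a block-by-block copy. It is an illustration of the idea only, not how dd is implemented and not a replacement for it. The function name block_copy and the file names are hypothetical, and the demonstration runs on a throwaway temporary file rather than a real partition.

```python
import os
import tempfile


def block_copy(src_path, dst_path, block_size=1024 * 1024):
    """Copy src_path to dst_path one block at a time; return bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # an empty read means end of input
                break
            dst.write(block)
            copied += len(block)
    return copied


# Demonstration on a temporary file, standing in for a partition.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "partition1.img")  # hypothetical stand-in for /dev/sda1
    image = os.path.join(tmp, "partition1.dd")    # hypothetical output image
    with open(source, "wb") as f:
        # Size is deliberately not a multiple of the block size.
        f.write(os.urandom(3 * 1024 * 1024 + 17))
    copied = block_copy(source, image)
    with open(source, "rb") as a, open(image, "rb") as b:
        identical = a.read() == b.read()
    print(copied, identical)
```

The resulting image file is a byte-for-byte copy of the input, which is the property you want from a partition image.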

Documentation for dd can be found in any of these places*:

For Debian and Ubuntu, in /usr/share/doc/coreutils/
In a terminal, by typing the command man dd
The manpage on the internet (gnu.org)

*Please make an effort to read the dd documentation above before making any comments on this article asking about how to use dd.

I will be using familiar sample data and in fact will begin with the assumption that the data is already in postgres.

Hark back to the 'sample data' postings and use COPY amd_bang_per_watt FROM ... if you are coming to this fresh.

I will deal with 'xml out' first, as it is a quick win for an already populated table (my situation).

Previous postings have been quite console-based. This one will be a bit more GUI, so as to allow us to explore features of the pgAdmin graphical tool.

There are several ways to get xml out, including:

1. Using psql with the html flag and then munging the html into xml
2. pgAdmin's table view, clicking 'context menu', and outputting with xml radio button
3. pgAdmin 'Tools/Query Tool' then 'File/Quick report' (xml radio button selected)

I will ignore the first option (psql) and go for the pgAdmin methods, illustrating with some screenshots.

(Item 3 gave me the best results so feel free to skip ahead if you like)

My pgAdmin screenshots are wide and will not resize well for newspaper style columns. Instead I have put them on picasaweb with links in the text.

2. Table view then context menu 'Reports':

Table view within pgAdmin
Context menu from Table view within pgAdmin

Selecting 'Reports' from the context menu above gives you six options:

Properties report (an extract is shown below)
DDL report (ddl for table recreation from a script)
Data Dictionary report (an accurate description of what you get)
Statistics Report (Sequential scans count is expectedly high in here)
Dependencies Report (for our table this is empty in an xml style of emptiness)
Dependents Report (whoops - incorrect spellcheck complaint in firefox)

(i) Properties report gives you xml - extract below:

<table>
<columns>
<column id="c1" number="1" name="Property" />
<column id="c2" number="2" name="Value" />
</columns>
<rows>
<row id="r1" number="1" c1="Name" c2="amd_bang_per_watt" />
<row id="r2" number="2" c1="OID" c2="16386" />
<row id="r3" number="3" c1="Owner" c2="postgres" />
<row id="r4" number="4" c1="Tablespace" c2="pg_default" />
<row id="r5" number="5" c1="ACL" c2="{postgres=arwdxt/postgres,someuser=arwd/postgres}" />
<row id="r6" number="6" c1="Primary key" c2="" />
<row id="r7" number="7" c1="Rows (estimated)" c2="50" />
<row id="r8" number="8" c1="Fill factor" c2="" />
<row id="r9" number="9" c1="Rows (counted)" c2="50" />
<row id="r10" number="10" c1="Inherits tables" c2="No" />
<row id="r11" number="11" c1="Inherited tables count" c2="0" />
<row id="r12" number="12" c1="Has OIDs?" c2="No" />
<row id="r13" number="13" c1="System table?" c2="No" />
<row id="r14" number="14" c1="Comment" c2="" />
</rows>
</table>

Click the links in the original six point list for googledocs of all the xml files in full.

What I did not find in any of the six reports was the actual row data from the table. I will come to that soon.

3. Tools/Query Tool then File/Quick report:

My pgAdmin screenshots are wide and will not resize well. Instead I have put them on picasaweb with links listed below:

Tools/Query Tool
Query Tool example query
Query Tool 'File/Quick Report' *** bingo ***

So to get the actual row data my approach was to enter the Query Tool then write a 'select * from ...' type of query and Quick Report the results.

Having selected XML in Quick Report you will obtain the actual row data from the table.

Does this row data look the same as mysql --xml style output?
Not exactly. What happens in pgAdmin xml output is that columns are not treated as individual nodes but instead are attributes of a row node.

Sample first row from the full xml output:

<row id="r1" number="1" c1="X2 II " c2="550 Black Edition " c3="3.1"
c4="2x512k " c5="6MB " c6="2" c7="1.15-1.425 " c8="AM3 " c9="80"
c10="45nm Callisto Q3-2009 " c11="38.75" />

Is this a valid way of representing xml? Certainly.
Will it lead to parsing issues? No. Any parser worth its salt handles nodes and attributes and exposes them in a useful way to the programmer.
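To illustrate, here is a minimal sketch that reads the sample row with Python's standard library xml.etree.ElementTree. Stripping the trailing spaces from the values is my own assumption about what you would want; the padding may matter for your data.

```python
import xml.etree.ElementTree as ET

# The sample first row from the pgAdmin Quick Report xml output.
row_xml = '''<row id="r1" number="1" c1="X2 II " c2="550 Black Edition " c3="3.1"
c4="2x512k " c5="6MB " c6="2" c7="1.15-1.425 " c8="AM3 " c9="80"
c10="45nm Callisto Q3-2009 " c11="38.75" />'''

row = ET.fromstring(row_xml)

# Column attributes are c1..c11; 'id' and 'number' are row bookkeeping.
column_keys = sorted(
    (key for key in row.attrib if key.startswith("c")),
    key=lambda key: int(key[1:]),  # numeric sort so that c10 follows c9
)

# Strip the trailing padding from each value (an assumption, see above).
values = [row.attrib[key].strip() for key in column_keys]
print(values)
```

The attributes come back as an ordinary dictionary, so the column entries are no harder to reach than child nodes would be.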

Perhaps you might later use python-libxml2 or similar to construct an insert statement of the form INSERT INTO amd_bang_per_watt VALUES (_, _, ...) from the column entries c1, c2, etc.
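A minimal sketch of that idea follows, using the standard library xml.etree.ElementTree in place of python-libxml2. The function name insert_from_row is hypothetical, the row is shortened to three columns for illustration, and the %s placeholder style is an assumption that suits drivers such as psycopg2. Nothing here connects to a database.

```python
import xml.etree.ElementTree as ET


def insert_from_row(row_xml, table="amd_bang_per_watt"):
    """Build a parameterised INSERT statement and its values from one <row />."""
    row = ET.fromstring(row_xml)
    # Keep only the c1, c2, ... attributes, in numeric column order.
    keys = sorted(
        (key for key in row.attrib if key.startswith("c")),
        key=lambda key: int(key[1:]),
    )
    params = [row.attrib[key].strip() for key in keys]
    # One placeholder per column; the values travel separately from the sql.
    placeholders = ", ".join(["%s"] * len(params))
    return f"INSERT INTO {table} VALUES ({placeholders})", params


# Shortened three-column row, for illustration only.
sql, params = insert_from_row(
    '<row id="r1" number="1" c1="X2 II " c2="550 Black Edition " c3="3.1" />'
)
print(sql)
print(params)
```

Keeping the values out of the sql string and passing them as parameters means quoting is left to the database driver rather than done by hand.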

This 'more compact' format should save you space and may be doing you a favour if you harbour any thoughts of parsing the rows in jQuery or PHP.

One thing I should point out is that this compact form comes at a price. Should you later insert or remove columns from the table, then the column numbers c1, c2, c3, etc are going to be out of step with your altered table.
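One way to soften this is to map c1, c2, etc. back to column names at parse time. The sketch below assumes the Quick Report output carries the same columns header as the Properties report extract shown earlier. The column names model_family, model and clock_ghz are hypothetical, as I have not listed the real ones here.

```python
import xml.etree.ElementTree as ET

# Shortened report; the column names are hypothetical.
report = '''<table>
<columns>
<column id="c1" number="1" name="model_family" />
<column id="c2" number="2" name="model" />
<column id="c3" number="3" name="clock_ghz" />
</columns>
<rows>
<row id="r1" number="1" c1="X2 II " c2="550 Black Edition " c3="3.1" />
</rows>
</table>'''

root = ET.fromstring(report)

# Map each column id (c1, c2, ...) to its name from the <columns> header.
names = {col.get("id"): col.get("name") for col in root.iter("column")}

# Rebuild each row as a dictionary keyed by column name, not position.
records = [
    {names[key]: value.strip() for key, value in row.attrib.items() if key in names}
    for row in root.iter("row")
]
print(records)
```

With the rows keyed by name, a later change to the table's column order matters much less to whatever consumes the data.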

The facilities for xml output in your database can be useful, but are not always the quickest route to goal. If your data is going out and then into another postgres database, you would likely want to use the backup option in pgAdmin.

Examples of such backup output are shown below:

Familiar 'one insert per line' form that this human finds easy to look at.
Postgres compact form

I found the second backup format a refreshing alternative to some of the issues that can otherwise creep into comma separated data.

xml in xml out (part 2 of 3) is the next post and will continue this theme.

Friday, June 4, 2010

When Raid mirroring just won't work - dd to the rescue

Thursday, July 30, 2009

postgres - xml in xml out (part 1 of 2)