Row-level version control with PostgreSQL on AWS

Originally authored in Feb 2014; refreshed in Jun 2015. See change log at the end of this article for a list of updates

Time Travel appears to have been a little known feature of PostgreSQL that fell out of favour as of version 6.2. This feature effectively provided row-level version history

a paper trail of all your changes to data in a table - useful when it comes to auditing, providing version history to users etc. This post covers how to achieve Time Travel-esque behaviour using an AWS PostgreSQL RDS instance via triggers.

Background

Time Travel achieves version history by means of two columns (which could incidentally be called anything) that track when the row was valid from, and when it was valid to. By means of an example (you can ignore the details for now, we come to them later), let us first create a piece of fruit:

myitcv=# insert into fruits (name) values ('apple');
myitcv=# select * from fruits;
 id | name  |          valid_from           | valid_to
----+-------+-------------------------------+----------
  1 | apple | 2015-02-25 19:27:18.011267+00 | infinity

Creating a new record inserts a new row into our table. valid_from = now() corresponds to the creation time. Notice the valid_to = infinity - this tells us that the row is current.

If we then change the name of the piece of fruit (notice we use the restriction valid_to = infinity to refer to the latest version):

myitcv=# update fruits set name = 'pear' where id = 1 and valid_to = 'infinity';
myitcv=# select * from fruits;
 id | name  |          valid_from           |           valid_to
----+-------+-------------------------------+-------------------------------
  1 | apple | 2015-02-25 19:27:18.011267+00 | 2015-02-25 19:27:50.242802+00
  1 | pear  | 2015-02-25 19:27:50.242802+00 | infinity

The update has been translated into two changes:

Update valid_to on the now old version to now()
Insert a new row to represent the new version, following the same logic for valid_from as for insert

And then finally delete the piece of fruit; we don’t need it any more:

myitcv=# delete from fruits where id = 1 and valid_to = 'infinity';
myitcv=# select * from fruits;
 id | name  |          valid_from           |           valid_to
----+-------+-------------------------------+-------------------------------
  1 | apple | 2015-02-25 19:27:18.011267+00 | 2015-02-25 19:27:50.242802+00
  1 | pear  | 2015-02-25 19:27:50.242802+00 | 2015-02-25 19:28:21.066919+00

Notice the row itself is not deleted, rather the valid_to date is simply updated from infinity -> now(). This is in effect a soft delete.

With appropriate indexes and restrictions on key constraints, this can in effect give you a version history, or paper trail if you will, of changes to rows in your table. Nothing gets deleted; we append to the table any changes, including deletes. Good for auditing… Good for all manner of things.

How do I use Time Travel?

It depends on where you are using PostgreSQL.

Recent version of PostgreSQL under your control

Time Travel lives on in later versions of PostgreSQL via the spi contrib module. If you are in control of your PostgreSQL installation, you can simply install the extension:

myitcv=# create extension timetravel;

then follow an example given in the PostgreSQL source to get up and running and start travelling back through time immediately. Eat your heart out, Marty McFly.

At this point you could stop reading this article because you have a working time machine! What follows might however be of interest: it is a trigger-based equivalent to the spi module that (importantly) brings the timetravel behaviour to PostgreSQL on AWS.

PostgreSQL on AWS

If you are using an AWS PostgreSQL RDS instance you will be disappointed to learn that timetravel is not a supported extension, as of 2014-02-25 at least.

As the legacy v6.3 documentation points out however, timetravel-like behaviour can be achieved using triggers. The goal of the remainder of this post, as mentioned earlier, is to try and achieve just that.

What follows assumes:

The you are familiar with PostgreSQL
That you have a PostgreSQL instance running on AWS (my tests were against a v9.3.2 instance in eu-west)
You can connect to that instance using the PostgreSQL CLI or some GUI tool (to execute arbitrary SQL)

Installing the main functions

To get up and running, you will need to execute the following SQL to create the two functions that simulate timetravel. One will be called from a before trigger, the other from an after trigger (we show how to create the triggers below):

The code itself is commented to give some motivation behind certain decisions. But the one point worth making regards deletes.

A delete could simply be achieved by intercepting a user request to delete and translating that to update the valid_to = now() on the corresponding row. We would therefore silently ignore the request to delete something. However, the effect of this is to return that 0 rows were affected by the delete. Client libraries using our table might well have problems with this (ActiveRecrod with optimistic locking does for example). Hence we have to go to the effort of inserting a new row to represent the old version (much like we do for an update) and then allow the original delete to continue.

The functions in action: fruit trigger

By themselves, the aforementioned functions are useless. So how do we use them? The following SQL gives just such an example:

DROP TABLE IF EXISTS fruits;

CREATE TABLE fruits (
  id SERIAL NOT NULL,
  name TEXT,
  valid_from TIMESTAMP WITH TIME ZONE NOT NULL,
  valid_to TIMESTAMP WITH TIME ZONE NOT NULL
);

DROP TRIGGER IF EXISTS fruits_before ON fruits;
CREATE TRIGGER fruits_before
BEFORE INSERT OR UPDATE OR DELETE ON fruits
  FOR EACH ROW EXECUTE PROCEDURE process_timetravel_before();

DROP TRIGGER IF EXISTS fruits_after ON fruits;
CREATE TRIGGER fruits_after
AFTER UPDATE OR DELETE ON fruits
  FOR EACH ROW EXECUTE PROCEDURE process_timetravel_after();

Again, execute this SQL to create the tables and triggers, at which point you should be in a position to run the SQL found earlier in this post where we inserted, updated and deleted an apple (pear).

Known limitations/problems

Feedback, corrections, suggestions etc. would be greatly appreciated. Indeed, please let me know if something should be added to this list:

The main functions assumes that the table columns that provide valid_from and valid_to are called just that. timetravel provided support for providing alternative columns (e.g. starting and ending); our version is limited in that respect
This whole process only safely works within a transaction; our version does not check that it is called within the context of a transaction

References

In writing this trigger-based alternative to timetravel, I’ve relied heavily on:

PostgreSQL’s trigger documentation
PostgreSQL’s explicit locking documentation
Documentation on PL/pgSQL - SQL Procedural Language
Various posts by Pavel Stěhule including this one
An important note about executing dynamic commands

Thanks to the many contributors to PostgreSQL and its community for what is a great DB.

Update 2015-03-22: reverted use of now() at time zone 'utc' because it’s superfluous. Added license details</br> Update 2015-02-25: updated to use now() at time zone 'utc' and TIMESTAMP WITH TIME ZONE for safe default behaviour