Row-level version control with PostgreSQL on AWS
Originally authored in Feb 2014; refreshed in Jun 2015. See change log at the end of this article for a list of updates
Time Travel appears to have been a little known feature of PostgreSQL that fell out of favour as of version
6.2
. This feature effectively provided row-level version history
- a paper trail of all your changes to data in a table - useful when it comes to auditing, providing version history to users etc. This post covers how to achieve Time Travel-esque behaviour using an AWS PostgreSQL RDS instance via triggers.
Background
Time Travel achieves version history by means of two columns (which could incidentally be called anything) that track when the row was valid from, and when it was valid to. By means of an example (you can ignore the details for now, we come to them later), let us first create a piece of fruit:
myitcv=# insert into fruits (name) values ('apple');
myitcv=# select * from fruits;
id | name | valid_from | valid_to
----+-------+-------------------------------+----------
1 | apple | 2015-02-25 19:27:18.011267+00 | infinity
Creating a new record inserts a new row into our table. valid_from = now()
corresponds to the creation time. Notice
the valid_to = infinity
- this tells us that the row is current.
If we then change the name of the piece of fruit (notice we use the restriction valid_to = infinity
to refer to the
latest version):
myitcv=# update fruits set name = 'pear' where id = 1 and valid_to = 'infinity';
myitcv=# select * from fruits;
id | name | valid_from | valid_to
----+-------+-------------------------------+-------------------------------
1 | apple | 2015-02-25 19:27:18.011267+00 | 2015-02-25 19:27:50.242802+00
1 | pear | 2015-02-25 19:27:50.242802+00 | infinity
The update has been translated into two changes:
- Update
valid_to
on the now old version tonow()
- Insert a new row to represent the new version, following the same logic for
valid_from
as for insert
And then finally delete the piece of fruit; we don’t need it any more:
myitcv=# delete from fruits where id = 1 and valid_to = 'infinity';
myitcv=# select * from fruits;
id | name | valid_from | valid_to
----+-------+-------------------------------+-------------------------------
1 | apple | 2015-02-25 19:27:18.011267+00 | 2015-02-25 19:27:50.242802+00
1 | pear | 2015-02-25 19:27:50.242802+00 | 2015-02-25 19:28:21.066919+00
Notice the row itself is not deleted, rather the valid_to
date is simply updated from infinity -> now()
.
This is in effect a soft delete.
With appropriate indexes and restrictions on key constraints, this can in effect give you a version history, or paper trail if you will, of changes to rows in your table. Nothing gets deleted; we append to the table any changes, including deletes. Good for auditing… Good for all manner of things.
How do I use Time Travel?
It depends on where you are using PostgreSQL.
Recent version of PostgreSQL under your control
Time Travel lives on in later versions of PostgreSQL via the spi contrib module. If you are in control of your PostgreSQL installation, you can simply install the extension:
myitcv=# create extension timetravel;
then follow an example given in the PostgreSQL source to get up and running and start travelling back through time immediately. Eat your heart out, Marty McFly.
At this point you could stop reading this article because you have a working time machine! What follows might however be
of interest: it is a trigger-based equivalent to the spi module that (importantly) brings the timetravel
behaviour to
PostgreSQL on AWS.
PostgreSQL on AWS
If you are using an AWS PostgreSQL RDS instance you will be disappointed to learn that timetravel
is not a supported
extension, as of 2014-02-25 at least.
As the legacy v6.3
documentation points out however,
timetravel
-like behaviour can be achieved using triggers. The goal of the remainder of this post, as mentioned
earlier, is to try and achieve just that.
What follows assumes:
- The you are familiar with PostgreSQL
- That you have a PostgreSQL instance running on AWS (my tests were against a
v9.3.2
instance ineu-west
) - You can connect to that instance using the PostgreSQL CLI or some GUI tool (to execute arbitrary SQL)
Installing the main functions
To get up and running, you will need to execute the following SQL to create the two functions that simulate
timetravel
. One will be called from a before trigger, the other from an after trigger (we show how to create the
triggers below):
The code itself is commented to give some motivation behind certain decisions. But the one point worth making regards deletes.
A delete could simply be achieved by intercepting a user request to delete and translating that to update the valid_to
= now()
on the corresponding row. We would therefore silently ignore the request to delete something. However,
the effect of this is to return that 0 rows were affected by the delete. Client libraries using our table might well have problems
with this (ActiveRecrod with optimistic
locking does for example). Hence we have to go
to the effort of inserting a new row to represent the old version (much like we do for an update) and then allow the
original delete to continue.
The functions in action: fruit trigger
By themselves, the aforementioned functions are useless. So how do we use them? The following SQL gives just such an example:
DROP TABLE IF EXISTS fruits;
CREATE TABLE fruits (
id SERIAL NOT NULL,
name TEXT,
valid_from TIMESTAMP WITH TIME ZONE NOT NULL,
valid_to TIMESTAMP WITH TIME ZONE NOT NULL
);
DROP TRIGGER IF EXISTS fruits_before ON fruits;
CREATE TRIGGER fruits_before
BEFORE INSERT OR UPDATE OR DELETE ON fruits
FOR EACH ROW EXECUTE PROCEDURE process_timetravel_before();
DROP TRIGGER IF EXISTS fruits_after ON fruits;
CREATE TRIGGER fruits_after
AFTER UPDATE OR DELETE ON fruits
FOR EACH ROW EXECUTE PROCEDURE process_timetravel_after();
Again, execute this SQL to create the tables and triggers, at which point you should be in a position to run the SQL found earlier in this post where we inserted, updated and deleted an apple (pear).
Known limitations/problems
Feedback, corrections, suggestions etc. would be greatly appreciated. Indeed, please let me know if something should be added to this list:
- The main functions assumes that the table columns that provide
valid_from
andvalid_to
are called just that.timetravel
provided support for providing alternative columns (e.g.starting
andending
); our version is limited in that respect - This whole process only safely works within a transaction; our version does not check that it is called within the context of a transaction
References
In writing this trigger-based alternative to timetravel
, I’ve relied heavily on:
- PostgreSQL’s trigger documentation
- PostgreSQL’s explicit locking documentation
- Documentation on PL/pgSQL - SQL Procedural Language
- Various posts by Pavel Stěhule including this one
- An important note about executing dynamic commands
Thanks to the many contributors to PostgreSQL and its community for what is a great DB.
Update 2015-03-22: reverted use of now() at time zone 'utc'
because it’s superfluous. Added license details</br>
Update 2015-02-25: updated to use now() at time zone 'utc'
and TIMESTAMP WITH TIME ZONE
for safe default
behaviour