Review – “PostgreSQL Server Programming”

Two months ago, I reviewed the Packt Publishing book “Instant PostgreSQL Starter” . And two weeks ago, Packt Publishing offered me to review their latest PostgreSQL book, “PostgreSQL Server Programming”.

I was looking forward reading that book. I’m, like a lot of PostgreSQL users, quite fond of using as much server-side features and functions as possible. And at work, when facing people long used to extremely simple queries (even a join seems weird to some of them), it’s quite difficult to teach that no, a SQL server is not a stupid spreadsheet. I thus was expecting that book to help me teach them other ways.

A simple warning : this book is not for beginners. It is aimed at moderate/advanced PostgreSQL users, people with some SQL knowledge already. It’s also not a SQL reference, covering every corner of the language. There is no CTE in this book for instance.

Now that we know what this book is not about… Let’s have a look at the table of contents.

The book is structured in 10 chapters.

PL/pgSQL programming basics

The first four teach you basics about server side programming (declaring variables in PL/pgSQL, returning more than a basic type…)

It begins with a simple introduction to the principles behind server side programming, the way one can view PostgreSQL as a framework, examples where server side shines, some teasing about triggers, and key reasons for server side programming… Then an entire chapter covers the reasons why the author choose PostgreSQL in the first place, the features it brings to you from a programmer point of view… These 40 pages are welcome for the high quantity of information they bring.

Then the fun begins with «Your First PL/pgSQL Function». We learn the basics about PostgreSQL functions writing, passing parameters, returning values, loops, conditional structures, returning a set… What you need to write your functions, you’ll find it here.

And the last «pure» PL/pgSQL chapter teach you some tricks about returning structured data, arrays, using IN/OUT/INOUT parameters, cursors, even a few words about the XML data type.

Triggers, debugging…

The two following chapters cover two must-have for server-side programming.

The triggers are probably the key feature of server-side programming, the feature you really can not do in another way. Not much to tell here. It covers the classical triggers use cases : auditing, disallowing features, modifying an update before it happens.

And the second chapte covers debugging, because like any real programmer, you will need it. It explains the classical debug through «prints», and a real debugger, integrated with pgadmin. Quite a nice tool I did not know about.

Other languages

The book has an entire chapter about unrestricted languages, especially PL/Python. It is really nice because it uncovers a new set a features with it being an unrestricted language, while it remains simple to use and to understand. A good choice for this book.

And a second chapter goes deeper in PostgreSQL with functions written using C. It explains of course why you would have to decide to write such functions (performances, some features…), and what you need to know to not get lost amid the PostgreSQL code.

Final chapters

The last two chapters cover two more advanced cases. The first one teachs you how to use PL/Proxy to improve your scalability by splitting big tables accross servers, using a simple chat system as example. This is the natural evolution after moving more logic in the server and getting too much data load for one server, you have to split up, and PL/Proxy is an easy way to distribute the load accross servers…

And the final chapter covers writing extensions. Extensions are by far the best way to distribute your PostgreSQL addons. Thanks to this book, you can learn how to write new PostgreSQL features, and that chapter just tells you how to distribute them.

 

To make this quick : this book meet my expectations. It’s a handy book to have when using PL/pgSQL and PL/python, it can help convincing people to use server side programming. It’s not a complete reference, it does not cover advanced queries (that would require an entire book, maybe more than one), but it’s more than enough to start and stay on your desk for a while when writing functions…

My first PostgreSQL patch/hack

After almost two years of lobbying at work to promote PostgreSQL instead of MySQL, after two years of tears against poor DB typing, lack of advanced SQL features, traps in what seemed like basic needs (select * from myview where a=5, if myview is not «simple», triggers a full scan on the view result)… we are finally starting to deploy some PostgreSQL databases.

I wrote my dear sysadmin friends documentations, argumentations. Among my «it’s simpler» arguments, I promoted application_name as a way to quickly know who is hurting that bad your SQL server, along with the so simple ps faux to know who is doing what on your server… That’s the simplest monitoring possible, but when you’re in a hurry at 3AM, it can help you… Sadly, the application_name is not displayed in the ps output. I was looking for a tiny thing to patch PostgreSQL in order to discover its code a bit, it seemed like a good opportunity.

And let’s share also how I did the patch to highlight how easy patching PostgreSQL is :) (the patch has been written together with this blog entry)

 

0) System setup

First thing, you need a minimal development environment for PostgreSQL. Your tools : your favorite compiler (GCC), make, git, $EDITOR, ack-grep and a bit of patience if your CPU is slow (postgresql compiles quite quickly).


~$ cd ~/projects
~/projects$ git clone git://git.postgresql.org/git/postgresql.git
~/projects$ cd postgresql
~/projects/postgresql$ git checkout -b application-name-in-proctitle

1) Exploring the code

We want to modify how the proctitle is constructed. Under BSD systems, it’s done through the setproctitle call, easier to find than the game with ARGV under linux. Soo… let’s search it.

–/projects/postgresql$ ack-grep --type cc setproctitle
src/backend/utils/misc/ps_status.c
42: * use the function setproctitle(const char *, ...)
265: * apparently setproctitle() already adds a `progname:' prefix to the ps
329: setproctitle("%s", ps_buffer);

src/include/pg_config.h
422:/* Define to 1 if you have the `setproctitle’ function. */

Well, only one call… it’s gonna be simple, let’s look at this function. I stripped out comments and ifdefs.

void
set_ps_display(const char *activity, bool force)
{
/* A few exit cases.... */
/* .................... */

/* Update ps_buffer to contain both fixed part and activity */
strlcpy(ps_buffer + ps_buffer_fixed_size, activity,
ps_buffer_size – ps_buffer_fixed_size);

ps_buffer_cur_len = strlen(ps_buffer);

setproctitle(“%s”, ps_buffer);
}

Well, this is quite simple, the activity is added to a buffer containing the fixed portion of the proc title. To prevent useless memory allocations, the same buffer is reused at each call.
Our application_name can change during one connexion, we will need it in this function to dynamically append it before the activity when available.
But where is application_name available ?

~/projects/postgresql$ ack-grep --type cc application_name

This gives us more results, but there is an obvious one :
src/include/utils/guc.h
228:extern char *application_name;

It’s a simple per process global variable…

2) Patching…

So our patch is very straigh-forward now, even for C beginners.

~/projects/postgresql$ $EDITOR src/backend/utils/misc/ps_status.c

The main part we will change is around here :
strlcpy(ps_buffer + ps_buffer_fixed_size, application_name,
ps_buffer_size - ps_buffer_fixed_size);
ps_buffer_dynamic_len = strlen(application_name);


~/projects/postgresql$ git diff

[ too long for a blog, http://pastebin.com/vF9S5CMB ]

It’s not the cleanest patch, but it’s a start.

3) Testing

Compiling and testing PostgreSQL is simple, and can be done along another running instance.

~/projects/postgresql$ ./configure --prefix $HOME/bin/psql-master --enable-debug
~/projects/postgresql$ make -j 6
~/projects/postgresql$ make -j 3 install

Et voilà…

Now that your PostgreSQL has been installed in ~/bin/psql-master, let’s create an instance and start it.
~/projects/postgresql$ cd ~/bin/psql-master/bin
~/bin/psql-master/bin$ ./initdb ../data


If you have a PostgreSQL running on port 5432, you’ll have to change the port in the ../data/postgresql.conf file. I changed it to 5435.
And now…

~/bin/psql-master/bin$ ./postgres -D ../data
LOG: database system was shut down at 2013-06-17 23:51:29 CEST
LOG: database system is ready to accept connections
LOG: autovacuum launcher started

Let’s test it, in another console :

~/bin/psql-master/bin$ ./psql -p 5435
psql (9.4devel)
Type "help" for help.
postgres=# select pg_sleep(10);

And while the sleep is running, we can do a ps :

snoopy 14791 0.0 0.0 164128 4056 ? Ss 23:53 0:00 | \_ postgres: snoopy postgres [local] psql SELECT

And if I change the application name…

snoopy 14791 0.0 0.0 164128 4320 ? Ss 23:53 0:00 | \_ postgres: snoopy postgres [local] my_buggy_script.pl SELECT

Our first patch is working !

But it’s not ready to be submitted yet… Maybe people are using the current ps output and awk/cut-ing it to extract data. Better not crash these by adding a configuration option maybe.

4) Back to code…

First, you must commit your current patch. Git is your friend, use it :)

~/projects/postgresql$ git add src/backend/utils/misc/ps_status.c
~/projects/postgresql$ git commit -m "Display the application_name in proctitle before the current activity"
[application-name-in-proctitle 28a6b94] Display the application_name in proctitle before the current activity
1 file changed, 17 insertions(+), 2 deletions(-)

Now, we can look at adding a new configuration option.
We know that a configuration option exists to disable update_process_title entirely. We can just look at how it works, and copy it.

Here is the code in set_ps_display :

/* update_process_title=off disables updates, unless force = true */
if (!force && !update_process_title)
return;

And where is that variable used/defined ?

~/projects/postgresql$ ack-grep --type cc update_process_title
src/backend/utils/misc/guc.c
1039: {"update_process_title", PGC_SUSET, STATS_COLLECTOR,
1043: &update_process_title,

src/backend/utils/misc/ps_status.c
35:bool update_process_title = true;
295: /* update_process_title=off disables updates, unless force = true */
296: if (!force && !update_process_title)


a few if (update_process_title)


src/include/utils/ps_status.h
15:extern bool update_process_title;

What is that guc thingy ? The file heading comment says :

* guc.c
*
* Support for grand unified configuration scheme, including SET
* command, configuration file, and command line options.

This is the file we are looking for.

Well, let’s just copy the way the update_process_title setting is created…

This is really straigh-forward, nothing interesting here. I named the setting application_in_process_title…


~/projects/postgresql$ git diff --stat src
src/backend/utils/misc/guc.c | 10 ++++++++++
src/backend/utils/misc/ps_status.c | 4 ++--
src/include/utils/ps_status.h | 2 ++
3 files changed, 14 insertions(+), 2 deletions(-)

The full patch is available here : http://pastebin.com/eg0NZGGr

And we have our configuration setting, allowing us to switch between application_name in the process title and the current behaviour.

5) Conclusion

In less than one hour, with no PostgreSQL code experience, it’s really easy to hack a simple patch like this one.
After having nightmares through the MySQL code, it’s a real pleasure to hack on PostgreSQL code : it’s simple, commented, logical…

Now I can start the review process for that patch maybe :)

Review – “Instant PostgreSQL Starter”

Thanks to Shaun M. Thomas, I have been offered a numeric copy of the “Instant PostgreSQL Backup” book from Packt publishing, and was provided with the “Instant PostgreSQL Starter” book to review. Considering my current work-situation, doing a lot of PostgreSQL advertising and basic teaching, I was interested in reviewing this one…

Like the Instant collection ditto says, it’s short and fast. I kind of disagree with the “focused” for this one, but it’s perfectly fine considering the aim of that book.

Years ago, when I was a kid, I discovered databases with a tiny MySQL-oriented book. It teaches you the basis : how to install, basic SQL queries, some rudimentary PHP integration. This book looks a bit like its PostgreSQL-based counterpart. It’s a quick travel through installation, basic manipulation, and the (controversy) “Top 9 features you need to know about”. And that’s exactly the kind of book we need.

So, what’s inside ? I’d say what you need to kick-start with PostgreSQL.

The installation part is straight forward : download, click, done. Now you can launch pgadmin, create an user, a database, and you’re done. Next time someone tells you PostgreSQL ain’t easy to install, show him that book.

The second part is a fast SQL discovery, covering a few PostgreSQL niceties. It’s damn simple : Create, Read, Update, Delete. You won’t learn about indexes, functions, advanced queries here. For someone discovering SQL, it’s what needs to be known to just start…

The last part, “Top 9 features you need to know about”, is a bit more hard to describe. PostgreSQL is a RDBMS with included batteries, choosing 9 features must have been a really hard time for the author, and I think nobody can be blamed for not choosing that or that feature you like : too much choice… The author spends some time on pg_crypto, the RETURNING clause with serial, hstore, XML, even recursive queries… This is, from my point of view, the troublesome part of the book : mentioning all these features means introducing complicated SQL queries. I would never teach someone how to do recursive queries before teaching him joins, it’s like going from elementary school to university in fourty pages. But the positive part is that an open-minded and curious reader will have a great teaser and nice tracks to follow to increase his knowledge of PostgreSQL. Mentioning hstore is really cool, that’s one of the PostgreSQL feature one have to know…

 

To sum up my point of view about this book : it’s a nice book for beginners, especially considering the current NoSQL movement and people forgetting about SQL and databases. It’s a bit sad we don’t have more books like this one about PostgreSQL. I really hope Packt publishing will try to have a complete collection, from introduction (this book) to really advanced needs (PostgreSQL High Performance comes to mind) through advanced SQL queries, administration tips and so on… They have a book about PostgreSQL Server Programming planned next month, I’m really looking forward to this one.

my-post-engine : a postgres-engine for MySQL

Introduction

For a while, I’ve been trying to push some people toward PostgreSQL. It covers their use-cases better than MySQL, should provide greater performance, and should help me keep the hairs on my head.

But most of the time, they are relunctant unless they can really try it. Sadly, when you have huge databases, huge piece of code depending on the MySQL behaviour, you’re in a dead-end.

So I tried to find a way for this MySQL-depending code to speak to a PostgreSQL database seemlessly, in order to ease possible migrations…

What is it ?

After many failed attempts translating MySQL queries to a PostgreSQL syntax on the fly, I decided only MySQL could understand MySQL queries, and hence I decided to take the problem in a different way.

MySQL has a notion of storage engine. It allows you to decide how your tables are stored.

Most known storage engines are MyISAM (the “famous”, transaction-less, lock-full engine) and InnoDB (with transaction support, but say farewell to full-text and spatial indexing)…
Other included, but less known storages, are Blackhole (aka /dev/null) and Federated, where your MySQL server exposes tables coming from other MySQL servers.

And that’s when the crazy idea came : why couldn’t I write a PostgreSQL engine for MySQL ?

How does it work ?

Well, the storage API for MySQL is quite simple : read row, search in an index, delete row, modify row, insert row… That’s really easy to implement that with several simple SQL queries. But… it can not be done with standard-compliant SQL queries. You need to be able to identify each row of a table with a guaranteed unique identifier. Most of tables have a primary key, perfect for that. But you don’t always have that luck. And when you don’t have it, you have to start relying on other tricks, specific to your database engine. MySQL supports limit on delete/update, even without any order by statement. PostgreSQL doesn’t. Instead, we can rely on hidden, system-specific columns like the ctid in PostgreSQL… So your engine can not be really generic.

Anyway… well, using mixed tricks, the end result is quite convincing.

First step, create a simple table with the postgres engine.

CREATE TABLE `todo_pg` ( 
  `id` int(11) NOT NULL AUTO_INCREMENT, 
  `domain` varchar(255) NOT NULL, 
  `date_todo` datetime NOT NULL,  
  `state` varchar(127) NOT NULL,  
  PRIMARY KEY (`id`)
) ENGINE=POSTGRES DEFAULT CHARSET=utf8 CONNECTION='host=localhost port=5432 dbname=todo password=mypass user=mysql'

Now, you can start playing with it.

mysql> select * from todo_pg;
Empty set (0.00 sec)

mysql> insert into todo_pg (domain, date_todo, state) values ('postgresql.org', now(), 'toVisit');
Query OK, 1 row affected (1.14 sec)

mysql> select * from todo_pg;
+----+----------------+---------------------+---------+
| id | domain         | date_todo           | state   |
+----+----------------+---------------------+---------+
|  4 | postgresql.org | 2013-04-01 01:05:52 | toVisit |
+----+----------------+---------------------+---------+
1 row in set (0.00 sec)

mysql> insert into todo_pg (domain, date_todo, state) values ('mysql.org', now(), 'toVisit');
Query OK, 1 row affected (0.08 sec)

mysql> select last_insert_id();
+------------------+
| last_insert_id() |
+------------------+
|                5 |
+------------------+
1 row in set (0.00 sec)

mysql> select * from todo_pg;
+----+----------------+---------------------+---------+
| id | domain         | date_todo           | state   |
+----+----------------+---------------------+---------+
|  4 | postgresql.org | 2013-04-01 01:05:52 | toVisit |
|  5 | mysql.org      | 2013-04-01 01:06:04 | toVisit |
+----+----------------+---------------------+---------+
2 rows in set (0.00 sec)

mysql> update todo_pg set state='done' where domain like '%.org';
Query OK, 2 rows affected (0.19 sec)
Rows matched: 2  Changed: 2  Warnings: 0

mysql> select * from todo_pg;
+----+----------------+---------------------+-------+
| id | domain         | date_todo           | state |
+----+----------------+---------------------+-------+
|  4 | postgresql.org | 2013-04-01 01:05:52 | done  |
|  5 | mysql.org      | 2013-04-01 01:06:04 | done  |
+----+----------------+---------------------+-------+
2 rows in set (0.00 sec)

mysql> truncate todo_pg;
Query OK, 0 rows affected (0.23 sec)

mysql> select * from todo_pg;
Empty set (0.05 sec)

It ain’t no joke, that really works.

Current status

Well, the current implementation is experimental, lacks broad testing, has possible performance issues, and installation is a real pain because MySQL supports plugins, but don’t provide neither build instructions nor .h files to help you build such plugins… It’s a pity writing plugins for MySQL, really… Installation is sucks, only distributions could provide it along their MySQL packages :/

The code is available in gitorious : https://gitorious.org/my-post-engine

Don’t hesitate, have fun testing it. Play with it, crash it, fork it… I only hope it will be usefull to someone :)