SQLreduce: Reduce Verbose SQL Queries to Minimal Examples

Developers often face very large SQL queries that raise errors. SQLreduce is a tool that reduces such a query to a minimal one that still raises the same error.

The Problem

Developers often have extremely large queries that fail, either when they are first written or after a server migration or data change. It can be difficult to read through a 100-line SQL statement to determine where the error occurred, so this tool helps you isolate the cause of the bug.

Reduce Complexity With SQLreduce

SQLreduce solves this problem. It takes an arbitrary SQL query as input and runs it against a PostgreSQL server. It then applies various simplification steps, checking after each step that the simplified query still triggers the same error from PostgreSQL. The end result is a SQL query with minimal complexity.

Note that SQLreduce does not try to derive a query that is semantically identical to the original, or one that produces the same query result. The input is assumed to be faulty, and we are looking for the minimal query that produces the same error message from PostgreSQL when run against a database. If the input query happens to produce no error, the minimal query output by SQLreduce will just be SELECT.
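To make the idea concrete, here is a minimal sketch of such a reduction loop. It is not SQLreduce's actual code: the query is modelled as a plain tuple of words instead of a parse tree, and `fake_server`, `drop_one_word`, and `reduce_query` are hypothetical names invented for this illustration. `fake_server` merely stands in for a real PostgreSQL connection.

```python
from typing import Callable, Iterable, Tuple

Query = Tuple[str, ...]  # toy stand-in for a parse tree: just a tuple of words


def drop_one_word(query: Query) -> Iterable[Query]:
    """Yield every query obtained by deleting exactly one word."""
    for i in range(len(query)):
        yield query[:i] + query[i + 1:]


def fake_server(query: Query) -> str:
    """Hypothetical stand-in for PostgreSQL: fails only if 'reltuples' is used."""
    if "reltuples" in query:
        return "ERROR: column reltuples does not exist"
    return "OK"


def reduce_query(query: Query,
                 candidates: Callable[[Query], Iterable[Query]],
                 run: Callable[[Query], str]) -> Query:
    """Greedily accept any simpler candidate that produces the same result."""
    expected = run(query)  # the result every accepted candidate must reproduce
    progress = True
    while progress:
        progress = False
        for candidate in candidates(query):
            if run(candidate) == expected:
                query = candidate  # keep the simpler query ...
                progress = True
                break              # ... and restart from the top
    return query


faulty = ("select", "reltuples", "from", "pg_database", "limit", "10")
print(reduce_query(faulty, drop_one_word, fake_server))

healthy = ("select", "1")
print(reduce_query(healthy, drop_one_word, fake_server))
```

In this toy model the faulty input shrinks to the single word that triggers the error, while the healthy input shrinks to the empty tuple, which plays the role of the bare SELECT mentioned above.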

How It Works

We’ll use a complicated query to demonstrate how SQLreduce works and which steps are taken to remove noise from the input. The query is bogus and contains a bit of clutter that we want to remove:

Let’s pass the query to SQLreduce:

SQLreduce starts by parsing the input using pglast and libpg_query, which expose the original PostgreSQL parser as a library with Python bindings. The result is a parse tree that is the basis for the next steps. The parse tree looks like this:

Pglast also contains a query renderer that can render back the parse tree as SQL, shown as the regenerated query below. The input query is run against PostgreSQL to determine the result, in this case ERROR: column pg_database.reltuples does not exist.

SQLreduce works by deriving new parse trees that are structurally simpler, generating SQL from that, and running these queries against the database. The first simplification steps work on the top level node, where SQLreduce tries to remove whole subtrees to quickly find a result. The first reduction tried is to remove LIMIT 10:

The query result is still ERROR: column pg_database.reltuples does not exist, indicated by a ✔ check mark. Next, ORDER BY 1 is removed, again successfully:

Now the entire target list is removed:

This shorter query is still equivalent to the original regarding the error message returned when it is run against the database. Now the first unsuccessful reduction step is tried, removing the entire FROM clause:

That query is also faulty, but triggers a different error message, so the previous parse tree is kept for the next steps. Again a whole subtree is removed, now the WHERE clause:

We have now reduced the input query so much that it no longer raises an error. The previous parse tree is still kept, which now looks like this:
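As a rough illustration of the top-level pass just described (and not a reproduction of the parse tree or of SQLreduce's real implementation), the sketch below models the statement as a dictionary keyed by PostgreSQL's SelectStmt field names. The clause contents are placeholders, and `fake_server` is a hypothetical oracle that simply replays the outcomes reported in this walkthrough. The text of the "other" error is invented.

```python
TARGET_ERROR = "ERROR: column pg_database.reltuples does not exist"

# Order in which the walkthrough tries to drop whole clauses.
ORDER = ("limitCount", "sortClause", "targetList", "fromClause", "whereClause")


def fake_server(tree: dict) -> str:
    """Hypothetical oracle replaying the results described in the text."""
    if "whereClause" not in tree:
        return "OK"                       # without WHERE nothing fails
    if "fromClause" not in tree:
        return "ERROR: some other error"  # placeholder for the different error
    return TARGET_ERROR


def remove_top_level_clauses(tree: dict, run, order=ORDER):
    """Try to drop each clause once; keep a removal only if the result is unchanged."""
    expected = run(tree)
    log = []
    for clause in order:
        if clause not in tree:
            continue
        candidate = {k: v for k, v in tree.items() if k != clause}
        same = run(candidate) == expected
        log.append((clause, same))
        if same:
            tree = candidate  # the clause was noise, continue with the smaller tree
    return tree, log


query = {
    "targetList": "<target list>",          # placeholder, contents not modelled
    "fromClause": ["pg_database", "pg_class"],
    "whereClause": "<where expression>",    # placeholder, contents not modelled
    "sortClause": "1",
    "limitCount": "10",
}

reduced, log = remove_top_level_clauses(query, fake_server)
for clause, same in log:
    print(("✔" if same else "✘"), "remove", clause)
print(sorted(reduced))
```

Only the FROM and WHERE clauses survive this pass, which matches the state the walkthrough has reached at this point.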

Now SQLreduce starts digging into the tree. There are several entries in the FROM clause, so it tries to shorten the list. First, pg_database is removed, but that doesn’t work, so pg_class is removed:

Since we have found a new minimal query, recursion restarts at the top level with another attempt to remove the WHERE clause. Since that doesn’t work, it tries to replace the expression with NULL, but that doesn’t work either.
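The list-shortening and NULL-replacement steps can be sketched as follows. This is an assumption-laden toy, not SQLreduce's code: `fake_server`, `shorten_list`, and `try_null` are hypothetical helpers, the WHERE expression is a placeholder string, and the oracle only mimics the outcomes described above.

```python
TARGET_ERROR = "ERROR: column pg_database.reltuples does not exist"


def fake_server(tree: dict) -> str:
    """Hypothetical oracle: the target error needs pg_database and a non-NULL WHERE."""
    if tree.get("whereClause") is None:
        return "OK"
    if "pg_database" not in tree.get("fromClause", []):
        return "ERROR: some other error"  # placeholder for a different error
    return TARGET_ERROR


def shorten_list(items: list, still_fails) -> list:
    """Drop entries one at a time, keeping each removal that preserves the failure."""
    i = 0
    while i < len(items):
        candidate = items[:i] + items[i + 1:]
        if still_fails(candidate):
            items = candidate  # entry was noise; index i now holds the next entry
        else:
            i += 1             # entry is required, move on
    return items


def try_null(tree: dict, key: str, run, expected: str) -> dict:
    """Replace one subtree by NULL (None here) if the result stays the same."""
    candidate = dict(tree, **{key: None})
    return candidate if run(candidate) == expected else tree


tree = {"fromClause": ["pg_database", "pg_class"],
        "whereClause": "<where expression>"}  # placeholder expression

tree["fromClause"] = shorten_list(
    tree["fromClause"],
    lambda items: fake_server(dict(tree, fromClause=items)) == TARGET_ERROR,
)
tree = try_null(tree, "whereClause", fake_server, TARGET_ERROR)
print(tree)
```

Removing pg_database changes the result, so it stays. Removing pg_class does not, so it goes. The NULL replacement is rejected because the query would stop failing.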

Now a new kind of step is tried: expression pull-up. We descend into the WHERE clause, where we replace A < B first by A and then by B.

The first try did not work, but the second one did. Since we simplified the query, we restart at the top level to check if the FROM clause can be removed, but it is still required.
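Expression pull-up can be pictured with a small toy in which expressions are nested tuples of the form (operator, left, right). The operands `lhs` and `rhs` below are placeholders, since the real operands are not modelled here, and `fake_server`, `mentions`, and `pull_up` are hypothetical names for this sketch only. Applied repeatedly, the same step also handles nested operators such as a division.

```python
TARGET_ERROR = "ERROR: column pg_database.reltuples does not exist"


def mentions(expr, name: str) -> bool:
    """True if the expression tree refers to the given column name."""
    if isinstance(expr, tuple):
        return any(mentions(arg, name) for arg in expr[1:])  # skip the operator
    return expr == name


def fake_server(expr) -> str:
    """Hypothetical oracle: fails whenever the bad column is referenced."""
    return TARGET_ERROR if mentions(expr, "pg_database.reltuples") else "OK"


def pull_up(expr, run):
    """Repeatedly replace an operator node by one of its operands if the result is unchanged."""
    expected = run(expr)
    steps = []
    while isinstance(expr, tuple):
        for operand in expr[1:]:
            same = run(operand) == expected
            steps.append((operand, same))
            if same:
                expr = operand  # the operand alone reproduces the error
                break
        else:
            break  # no operand works, keep the current node
    return expr, steps


where = ("<", "lhs", ("/", "pg_database.reltuples", "rhs"))
result, steps = pull_up(where, fake_server)
for operand, same in steps:
    print(("✔" if same else "✘"), operand)
print(result)
```

The left operand of the comparison is rejected, the right one is accepted, and a further pull-up leaves only the column reference that causes the error.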

From A / B, we can again pull up A:

SQLreduce has found the minimal query that still raises ERROR: column pg_database.reltuples does not exist with this parse tree:

At the end of the run, the query is printed along with some statistics:

This minimal query can now be inspected to fix the bug in PostgreSQL or in the application.

This blog was originally published on credativ.com