Query Routing

SQL Relay's query routing feature allows you to send one set of queries to one database, another set of queries to another.

How is this useful? Here are some examples of what you can do:

These are not the only things you can do. Use your imagination.

How it Works

Normally, SQL Relay maintains a set of persistent connections to a database or database cluster and distributes queries over the connections. The databases to connect to are defined in a configuration file.

When used in query routing mode, SQL Relay is configured to distribute queries over other instances of SQL Relay running in standard mode.

For instance, you can set up one instance of SQL Relay to maintain connections to a master database and another instance of SQL Relay to maintain connections to a pool of slaves, then set up a third instance of SQL Relay to route queries to the other 2 instances.

General Configuration

Each instance of SQL Relay that will be used as a query router must define dbase="route" in the instance tag and include a single routers tag, which must contain a set of router tags. Each router tag defines a router module to use and a connectionid to route queries to. It must also contain a set of tags defining rules for identifying which queries to route to the specified connectionid.

When deciding where to route a query, SQL Relay runs through the router tags in the order that they appear in the configuration file, and uses the specified module to identify where to route the query. For example, if a router tag specifies the regex module, and a query matches a regular expression defined in a pattern tag, then the query will be routed to the connectionid defined in the router tag.

See the examples below. Note that the regular expressions in the examples below aren't perfect. They are not case insensitive, for example, nor do they necessarily cover all possible cases. In some cases, better, or more regular expressions could be written. In other cases, developers may need to structure their queries so that they are guaranteed to match.

Sending Inserts, Updates and Deletes to a Master Database and Distributing Selects Over a Set of Slaves

Here's an example configuration for sending inserts, updates and delete's to a master database and distributing selects over a set of slaves.

In this example, 3 instances of SQL Relay are defined: one to maintain connections to the "master" database, another to maintain connections to a set of "slave" databases, and another to route queries over the "master" and "slave" instances. Clients would connect to the "router" instance (on port 9001 or to the /tmp/router.socket unix socket). In this example, all 3 instances of SQL Relay run on the same machine, but in reality, all 3 could run on separate machines.

<?xml version="1.0"?>
<instances>

        <!-- This instance maintains connections to the "master" MySQL database
                on the masterdb machine.  This instance only listens on the
                unix socket /tmp/master.socket and thus cannot be connected to
                by clients from another machine. -->
        <instance id="master" socket="/tmp/master.socket" dbase="mysql" connections="10">
                <users>
                        <user user="masteruser" password="masterpassword"/>
                </users>
                <connections>
                        <connection string="user=masteruser;password=masterpassword;host=masterdb;db=master;"/>
                </connections>
        </instance>


        <!-- This instance maintains connections to 4 "slave" MySQL databases
                on 4 slave machines.  This instance only listens on the unix
                socket /tmp/slave.socket and thus cannot be connected to by
                clients from another machine. -->
        <instance id="slave" socket="/tmp/slave.socket" dbase="mysql" connections="10">
                <users>
                        <user user="slaveuser" password="slavepassword"/>
                </users>
                <connections>
                        <connection string="user=slaveuser;password=slavepassword;host=slavedb1;db=slave;"/>
                        <connection string="user=slaveuser;password=slavepassword;host=slavedb2;db=slave;"/>
                        <connection string="user=slaveuser;password=slavepassword;host=slavedb3;db=slave;"/>
                        <connection string="user=slaveuser;password=slavepassword;host=slavedb3;db=slave;"/>
                </connections>
        </instance>


        <!-- This instance sends DML (insert,update,delete) and
                DDL (create/delete) queries to the "master" SQL Relay instance
                which, in turn, sends them to the "master" database.
                This instance sends any other queries to the "slave" SQL Relay
                instance which, in turn, distributes them over the "slave"
                databases. -->
        <instance id="router" dbase="router" connections="10">
                <users>
                        <user user="routeruser" password="routerpassword"/>
                </users>
                <routers>
                        <!-- send all DML/DDL queries to "master"  -->
                        <router module="regex" connectionid="master">
                                <pattern pattern="^drop "/>
                                <pattern pattern="^create "/>
                                <pattern pattern="^insert "/>
                                <pattern pattern="^update "/>
                                <pattern pattern="^delete "/>
                        </router>
                        <!-- send all other queries to "slave" -->
                        <router module="regex" connectionid="slave">
                                <pattern pattern=".*"/>
                        </router>
                </routers>
                <connections>
                        <connection connectionid="master" string="socket=/tmp/master.socket;user=masteruser;password=masterpassword"/>
                        <connection connectionid="slave" string="socket=/tmp/slave.socket;user=slaveuser;password=slavepassword"/>
                </connections>
                <notifications>
                        <notification module="events">
                                <events>
                                        <event event="integrity_violation"/>
                                </events>
                                <recipients>
                                        <recipient address="dba@firstworks.com"/>
                                </recipients>
                        </notification>
                </notifications>
        </instance>

</instances>

Note the use of a notification module to notify dba@firstworks.com if an integrity_violation event occurs. SQL Relay must maintain parallel transactions on all databases that a query may be routed to. An integrity violation occurs when a transaction control query (begin, commit, rollback, autocommit on, or autocommit off) succeeds on some of the backends but fails on others. See notifications for information about notification modules.

Provide a Single Point of Access to Multiple Databases

Here's an example configuration providing a single point of access to multiple databases.

In this example, we provide a single point of access to the MySQL database "mysqldb" and the PostgreSQL database "postgresqldb". Since both databases support the "database.tablename" syntax, we can use that format when specifying tables and send all queries containing "mysqldb." to the MySQL database and all queries containing "postgresqldb." to the PostgreSQL database.

<?xml version="1.0"?>
<instances>

        <!-- This instance maintains connections to a MySQL database -->
        <instance id="mysqldb" port="" socket="/tmp/mysqldb.socket" dbase="mysql" connections="10">
                <users>
                        <user user="mysqldbuser" password="mysqldbpassword"/>
                </users>
                <connections>
                        <connection string="user=mysqldbuser;password=mysqldbpassword;host=mysqldb;db=mysqldb;"/>
                </connections>
        </instance>


        <!-- This instance maintains connections to a PostgreSQL database -->
        <instance id="postgresqldb" port="" socket="/tmp/postgresqldb.socket" dbase="postgresql" connections="10">
                <users>
                        <user user="postgresqldbuser" password="postgresqldbpassword"/>
                </users>
                <connections>
                        <connection string="user=postgresqldbuser;password=postgresqldbpassword;host=postgresqldb;db=postgresqldb;"/>
                </connections>
        </instance>


        <!-- This instance sends queries containing "mysqldb." to the mysql
                database and "postgresqldb." to the postgresql database -->
        <instance id="router" dbase="router" connections="10">
                <users>
                        <user user="routeruser" password="routerpassword"/>
                </users>
                <routers>
                        <!-- send all mysqldb queries to "mysqldb" -->
                        <router module="regex" connectionid="mysqldb">
                                <pattern pattern="mysqldb\."/>
                        </router>
                        <!-- send all postgresqldb queries to "postgresqldb" -->
                        <router module="regex" connectionid="postgresqldb">
                                <pattern pattern="postgresqldb\."/>
                        </router>
                </routers>
                <connections>
                        <connection connectionid="mysqldb" string="socket=/tmp/mysqldb.socket;user=mysqldbuser;password=mysqldbpassword"/>
                        <connection connectionid="postgresqldb" string="socket=/tmp/postgresqldb.socket;user=postgresqldbuser;password=postgresqldbpassword"/>
                </connections>
                <notifications>
                        <notification module="events">
                                <events>
                                        <event event="integrity_violation"/>
                                </events>
                                <recipients>
                                        <recipient address="dba@firstworks.com"/>
                                </recipients>
                        </notification>
                </notifications>
        </instance>

</instances>

Note the use of a notification module to notify dba@firstworks.com if an integrity_violation event occurs. SQL Relay must maintain parallel transactions on all databases that a query may be routed to. An integrity violation occurs when a transaction control query (begin, commit, rollback, autocommit on, or autocommit off) succeeds on some of the backends but fails on others. See notifications for information about notification modules.

The examples above all use the regex router module to route queries based on regular expression pattern matching. Router modules are available to route queries based on other things too though, such as the user that ran the query, or the IP address of the client. See Router Modules for more information about router modules.

Quirks and Limitations

Query Normalization

To make pattern matching easier, SQL Relay "normalizes" the query before matching it against the pattern. The original query is run against the database but when matched against the pattern, whitespace is compresssed and the entire query is converted to lower-case.

When matching query operators, you must use lower-cased versions of them such as "select", "insert", "and", "or", etc. When matching table names, you must use a lower-cased version of the table-name.

Examples are Limited

The examples above are good introductions, but don't cover every scenario. For example, they expect the query to begin at the beginning of a line with no preceeding comments or spaces. You may have to write more complex patterns for your applications.

Perl Compatible Regular Expressions

SQL Relay is built upon the Rudiments library. Rudiments can be built with or without support for libpcre which provides support for Perl Compatible Regular Expressions. PCRE's are more powerful than standard posix regular expressions and have many more operators.

As such, if you copy an sqlrelay.conf file from a machine where Rudiments was compiled with PCRE support to a machine where Rudiments wasn't compiled with PCRE support, then it's possible that your patterns may not work on the new machine.

To make matters worse, sufficiently old versions of the posix regular expression functions had fewer operators than modern versions. So, even if Rudiments isn't using PCRE's, it's not impossible that after copying a sqlrelay.conf file from a fairly modern OS to an antique, the patterns won't work on the antique machine either.

The examples above ought to work with PCRE's and all versions of posix regular expressions.

Selects Not Showing Changes

In the scenario above where DML/DDL is sent to the master database and selects are distributed over slaves, an unintuitive thing can happen.

If you begin a transaction and do several inserts, updates and deletes, you'll find that if you do a select, you will not see your changes. This is because in a master-slave configuration, changes to the database are not pushed out to the slaves until the changes have been committed. Since your selects are being run against the slaves, you must first commit before your changes will be visible.

Commits and Rollbacks

Behind the scenes, the router maintains parallel transactions on each of the databases that it is routing queries to. When the client issues a begin, commit or rollback, the router issues a begin, commit or rollback to each of the databases. Similarly, if the client turns auto-commit on or off, the router turns auto-commit on or off on each of the databases.

There are scenarios where a commit, rollback or auto-commit on/off command could succeed on some of the databases and fail on others. Some databases have a 2-phase commit feature to handle these scenarios. With 2-phase commit, you can roll back a commit until you do second commit. Many databases don't support 2-phase commit though. At present, SQL Relay doesn't currently support 2-phase commit for any databases. So, currently, SQL Relay doesn't handle these scenarios very well. It returns false, indicating than the operation failed, but there's no good way to know which databases succeeded and which failed. Ideally, SQL Relay would remove the database that failed, stop sending queries to it, notify someone that it has gotten out of sync and log all of the queries that would have gone to it so they can be replayed later, or something like that. But it doesn't currently do any of that.

Since queries may be routed to different kinds of databases, the router has to employ some tricks to maintain parallel transactions. Some databases run in auto-commit mode by default and must be issued a "begin" query to start a transaction. Other databases implicitly start a new transaction when a client logs in and after each commit or rollback. If any of the databases being routed to require a "begin" query to start a transaction, then the ones that don't are put in auto-commit mode when the client logs in and after each commit or rollback and are taken out of auto-commit mode when the client sends a begin query. If none of the databases being routed to require a "begin" query to start a transaction, then the databases are not put in auto-commit mode when the client logs in or after each commit or rollback. Rather, transactions are implicitly started by the database. For example, if your client application is using a router which routes queries over both PostgreSQL and Oracle databases, then since PostgreSQL requires "begin" queries, you must use a "begin" query to start a transaction, even if your app only intends to send queries which would be run against Oracle. Conversely, if your client application is using a router which only routes queries over a set of Oracle databases, then you do not have to use "begin" queries.

Stored Procedures

It's possible to use stored procedures with SQL Relay's query routing feature. However, since stored procedures are run on the database, SQL Relay can't route the individual queries run inside the stored procedure. So, the stored procedure and all queries run inside of it will be run against whichever database it was routed to.