Extending database applications with a smart proxy
As its name suggests, a smart database proxy sits between database clients and database servers (SQL or NoSQL). Because it works at the wire protocol level, it can observe and modify the network traffic between clients and servers without requiring any changes to either side.
The "smart" part of the proxy is that it can execute custom logic on this traffic, giving you complete control over it. This makes the proxy a potential control point for all database clients.
Smart database proxies are generally used for three reasons:
Observation
By definition, a proxy sees everything that goes into and comes out of a database, and a lot can be learned by observing this traffic. This is used for intrusion detection, business intelligence, performance analysis, and more.
There are other ways to observe the activity of a database, like monitoring interfaces and log monitoring, but they usually require special access to the database, and they can put some load on the database. A proxy has no effect on the database server, can be put in place without special access to the database, and can be scaled independently.
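As an illustration of passive observation, here is a minimal Python sketch of a filter that times each query for performance analysis. It assumes a hypothetical proxy that calls `on_request` and `on_response` for every connection and allows only one outstanding request per connection. None of these names come from a real product. The filter only records timestamps and always passes traffic through unchanged.

```python
import time
from collections import defaultdict


class LatencyObserver:
    """Hypothetical passive filter: times each query and never modifies traffic."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.pending = {}                 # connection id -> (sql, start time)
        self.timings = defaultdict(list)  # sql text -> list of durations in seconds

    def on_request(self, conn_id, sql):
        # Assumes one outstanding request per connection (no pipelining).
        self.pending[conn_id] = (sql, self.clock())
        return sql                        # forwarded to the database unchanged

    def on_response(self, conn_id, rows):
        sql, start = self.pending.pop(conn_id)
        self.timings[sql].append(self.clock() - start)
        return rows                       # returned to the client unchanged


ticks = iter([0.0, 0.25])                 # fake clock for a repeatable example
obs = LatencyObserver(clock=lambda: next(ticks))
obs.on_request(1, "SELECT 1")
obs.on_response(1, [(1,)])
print(dict(obs.timings))                  # {'SELECT 1': [0.25]}
```

The clock is injectable so that the observer can be tested deterministically. In a live proxy it would default to the monotonic system clock.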
Control
Most database systems have some level of access control, but it usually operates at the semantic level: you get access to this data, but not that data. Sometimes, however, you need to reject certain requests because they're inefficient, because they arrive at the wrong time of day, or for some other reason.
The simplest application of this is basic query control, but control can also include rejecting certain connections, rate limiting, redirecting connections, and so on.
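Rate limiting is a good example of control that most databases do not offer natively. The following is a minimal Python sketch of a sliding-window limiter that a proxy could consult before forwarding each request. The class name and the `allow` method are hypothetical, and the sketch assumes the proxy identifies clients by a simple string key.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Hypothetical control filter: at most `limit` requests per client per `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.history = defaultdict(deque)  # client -> times of accepted requests

    def allow(self, client, now):
        recent = self.history[client]
        # Drop requests that have slid out of the current window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False                   # the proxy would reply with an error instead
        recent.append(now)
        return True


limiter = RateLimiter(limit=2, window=1.0)
print([limiter.allow("app", t) for t in (0.0, 0.1, 0.2, 1.05)])
# [True, True, False, True]
```

Rejected requests are not recorded in the window, so a client that keeps retrying is not penalized further. Counting rejections as well would be a stricter design choice.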
Modification
Because a database proxy has complete control over the network traffic between database clients and database servers, it can also modify this traffic.
For instance, an inefficient query can be rewritten to be more efficient before it reaches the database.
Similarly, a result set can be modified on the fly on its way back to the database client. This is particularly useful for fine-grained control, such as custom data masking, data classification enforcement, and the like.
Typical use cases
A smart database proxy is not a solution per se: it's a platform that enables you to create a solution. Each situation is unique, but the majority of uses fall into a few broad use cases.
Request modification
The most common use case for a smart database proxy is when you need to change how an application interacts with a database, but you cannot change that application -- usually a third-party application that you do not control, or an application that is no longer maintained.
This is the most common use case because there is no alternative: it's either that, or give up the application.
In practical terms, this is usually a fairly straightforward affair: you set up a filter in the proxy to catch a certain request and replace it with another request. In most cases, the replacement request is logically equivalent to the one it replaces, since it needs to result in something that can be consumed by the application. For instance, that could mean rephrasing a query to be more efficient or to behave differently, or replacing syntax or a feature that the database no longer supports.
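A query substitution filter of this kind can be sketched in a few lines of Python. This is a hypothetical example, not any particular proxy's API: it assumes the proxy hands the filter the query text and forwards whatever the filter returns. It also assumes a hypothetical indexed column `last_name_upper` that is kept equal to `UPPER(last_name)`, so the replacement returns the same rows as the original query.

```python
def normalize(sql: str) -> str:
    # Ignore differences in whitespace so formatting changes don't defeat the match.
    return " ".join(sql.split())


# Hypothetical substitution table: query the application sends -> query sent instead.
# Assumes a column last_name_upper, kept equal to UPPER(last_name) and indexed,
# so the replacement returns the same rows as the original.
SUBSTITUTIONS = {
    normalize("SELECT id, last_name FROM customers WHERE UPPER(last_name) = ?"):
        "SELECT id, last_name FROM customers WHERE last_name_upper = ?",
}


def rewrite_request(sql: str) -> str:
    """Return the query the proxy should forward to the database."""
    return SUBSTITUTIONS.get(normalize(sql), sql)


print(rewrite_request("SELECT id, last_name\n  FROM customers WHERE UPPER(last_name) = ?"))
# SELECT id, last_name FROM customers WHERE last_name_upper = ?
```

Any query that is not in the table passes through untouched, which keeps the filter safe to deploy incrementally.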
In some cases, you might even change a request so that it returns an error message (e.g. "This data is not available"), an empty result set, or even a data set from a different data source.
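One way to model these outcomes is to have the filter return a decision rather than a query: forward a query, answer directly, or fail. The Python sketch below assumes that hypothetical design. The table names `salaries` and `legacy_audit` are placeholders chosen for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Forward:
    sql: str          # pass this query on to the database


@dataclass
class Reply:
    columns: list
    rows: list = field(default_factory=list)  # answered by the proxy; the database never sees it


@dataclass
class Fail:
    message: str      # returned to the client as an error


BLOCKED = re.compile(r"\bfrom\s+salaries\b", re.IGNORECASE)      # hypothetical table
RETIRED = re.compile(r"\bfrom\s+legacy_audit\b", re.IGNORECASE)  # hypothetical table


def handle_request(sql: str):
    if BLOCKED.search(sql):
        return Fail("This data is not available")
    if RETIRED.search(sql):
        return Reply(columns=["id", "event"])  # empty result set with the expected columns
    return Forward(sql)


print(handle_request("SELECT * FROM salaries"))
# Fail(message='This data is not available')
```

The regular expressions are deliberately naive. For example, they would miss a table that appears only in a `JOIN`. A real filter would more likely rely on a SQL parser.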
Result set modification
This is the mirror image of request modification. There are situations in which you may need to change the data returned by some queries in a way that would be difficult or impossible to do by modifying the request. You may need to change one value in a large result set, or convert currencies, or mask certain values in ways that are not supported by the database. Whatever the reason, the proxy gives you the flexibility to change the results of queries at the most atomic level possible.
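The following Python sketch shows result set modification applied row by row: it masks a card number and converts an amount from USD to EUR. It assumes that rows reach the filter as dictionaries and that the column names `card_number` and `amount` exist. The fixed exchange rate is a hypothetical placeholder, not a real rate.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical fixed rate; a real filter would fetch the current rate from somewhere.
EUR_PER_USD = Decimal("0.92")


def mask_card(number: str) -> str:
    # Keep only the last four characters visible.
    return "*" * max(len(number) - 4, 0) + number[-4:]


def rewrite_row(row: dict) -> dict:
    out = dict(row)                      # never mutate the row we were given
    if "card_number" in out:
        out["card_number"] = mask_card(out["card_number"])
    if "amount" in out:
        # Expects the amount as a string or Decimal; converts USD to EUR in place.
        eur = Decimal(out["amount"]) * EUR_PER_USD
        out["amount"] = eur.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return out


def rewrite_result_set(rows):
    return [rewrite_row(row) for row in rows]


print(rewrite_result_set([{"card_number": "4111111111111111", "amount": "10.00"}]))
# [{'card_number': '************1111', 'amount': Decimal('9.20')}]
```

The column names stay the same, so the application receives a result set with the shape it expects.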
Fine-grained access control
Sometimes, it's impossible to express complex access control requirements using the database's mechanisms. You may need to specify that certain users should get only some access to some very specific data items some of the time, and most databases are simply not very good at that -- it's assumed to be the job of the database clients. Even the databases that do support it (such as Oracle with its fine-grained access control) tend to make it painful and expensive.
In this context, a proxy can implement extremely fine-grained access control for known queries, though it may not be able to do so for arbitrary queries.
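For a known query, a rule can be as specific as needed. The following minimal Python sketch allows one hypothetical payroll query only for named users and only during working hours. The policy structure, table, and user names are all assumptions made for illustration. As the text notes, this approach works only for queries the rule knows about.

```python
from datetime import datetime

# Hypothetical policy for one known query: who may run it, and during which hours.
POLICIES = {
    "SELECT name, salary FROM payroll WHERE dept = ?": {
        "users": {"alice", "bob"},
        "hours": range(9, 18),           # 09:00 to 17:59, proxy's local time
    },
}


def is_allowed(user: str, sql: str, now: datetime) -> bool:
    policy = POLICIES.get(" ".join(sql.split()))
    if policy is None:
        return True                      # not a governed query; other filters may still apply
    return user in policy["users"] and now.hour in policy["hours"]


query = "SELECT name, salary FROM payroll WHERE dept = ?"
print(is_allowed("alice", query, datetime(2024, 3, 4, 10, 30)))  # True
print(is_allowed("alice", query, datetime(2024, 3, 4, 22, 0)))   # False
print(is_allowed("carol", query, datetime(2024, 3, 4, 10, 30)))  # False
```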
Most applications access their database(s) in a relatively predictable manner, so you can record the requests during a period of time (the recording phase), and then lock down the system by rejecting any request that has not been seen before (the enforcement phase). This is easily done with a smart database proxy, with plenty of flexibility to accommodate the inevitable exceptions and idiosyncrasies that are to be expected in any non-trivial IT system.
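Here is a minimal Python sketch of the two phases. It reduces each query to a fingerprint by replacing literals with `?`, so that queries differing only in their values count as the same request. The fingerprinting rules and the class are hypothetical simplifications: real SQL dialects have more literal forms than these regular expressions handle.

```python
import re


def fingerprint(sql: str) -> str:
    """Reduce a query to its shape, so that only literal values may vary."""
    sql = re.sub(r"'(?:[^']|'')*'", "?", sql)     # string literals
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)  # numeric literals
    # Lowercasing assumes case-insensitive keywords and identifiers.
    return " ".join(sql.split()).lower()


class RecordThenEnforce:
    def __init__(self):
        self.known = set()
        self.enforcing = False

    def allow(self, sql: str) -> None:
        # Explicit exception added by an administrator, e.g. a rare month-end report.
        self.known.add(fingerprint(sql))

    def on_request(self, sql: str) -> bool:
        """Return True to forward the request, False to reject it."""
        if not self.enforcing:
            self.allow(sql)              # recording phase: learn every query shape
            return True
        return fingerprint(sql) in self.known  # enforcement phase: only known shapes


guard = RecordThenEnforce()
guard.on_request("SELECT name FROM users WHERE id = 1")
guard.enforcing = True
print(guard.on_request("SELECT name FROM users WHERE id = 99"))  # True
print(guard.on_request("SELECT name FROM users WHERE 1 = 1"))    # False
```

The `allow` method is where the inevitable exceptions would be handled: shapes that did not occur during recording can still be approved explicitly.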
Getting a reliable, real-time view of what an application is doing in a database can be surprisingly difficult. Some databases offer an interface that gives you visibility into their activities, but these interfaces tend to focus on monitoring and performance. A proxy can easily pluck out whatever type of database activity is relevant and record it or send it wherever it is needed. The selling points of the proxy are that this can be done without any special access to the database, without any effect on the database or the clients, and for just the subset of database accesses you care about.
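As a sketch of selective auditing, the following Python function emits a JSON line only for queries that touch a hypothetical set of sensitive tables. It assumes that the proxy supplies the authenticated user and a timestamp. Where the line goes (a file, a message queue, a security monitoring system) is left to the surrounding code.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical tables whose access must be audited.
AUDITED = re.compile(r"\b(customers|payments)\b", re.IGNORECASE)


def audit_record(user: str, sql: str, when: datetime):
    """Return a JSON audit line for queries that touch audited tables, else None."""
    if not AUDITED.search(sql):
        return None                      # not of interest; nothing is recorded
    return json.dumps({"time": when.isoformat(), "user": user, "sql": sql})


line = audit_record("alice", "SELECT * FROM payments",
                    datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc))
print(line)
# {"time": "2024-05-01T12:00:00+00:00", "user": "alice", "sql": "SELECT * FROM payments"}
```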
Factoring out database access logic
An emerging use of proxies is to build them into applications deliberately, rather than adding them after the fact. If you have many applications that need to access the same database with the same requirements, and you already have a proxy in place, it can make sense to farm out some of the database access logic to the proxy. For instance, specific queries can be marked so that the proxy will recognize them and modify them consistently across all applications.
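One possible marking convention is a SQL comment that names a shared definition maintained in the proxy. The Python sketch below assumes a hypothetical `/* proxy:<name> */` marker. When the proxy sees it, it substitutes the centrally maintained query, so every application gets the same definition of, say, an "active customer".

```python
import re

# Hypothetical marker convention: a SQL comment naming a shared query definition.
MARKER = re.compile(r"/\*\s*proxy:([\w-]+)\s*\*/")

# Maintained once, in the proxy, and applied the same way for every application.
SHARED_QUERIES = {
    "active-customers":
        "SELECT id, name FROM customers WHERE status = 'active' AND deleted_at IS NULL",
}


def expand_marked_query(sql: str) -> str:
    match = MARKER.search(sql)
    if match is None:
        return sql                       # unmarked queries pass through untouched
    name = match.group(1)
    if name not in SHARED_QUERIES:
        # The proxy would turn this into an error message for the client.
        raise ValueError(f"unknown shared query: {name}")
    return SHARED_QUERIES[name]


print(expand_marked_query("/* proxy:active-customers */ SELECT id, name FROM customers"))
# SELECT id, name FROM customers WHERE status = 'active' AND deleted_at IS NULL
```

Because the marker is an ordinary SQL comment, the application can keep a plain fallback query after it. That query would still run, with its simpler semantics, if the application ever connected without the proxy.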
In this context, you can think of the proxy as an extension of the database: you are knowingly accessing an extended database, rather than the database itself.
A smart database proxy is a platform that runs your logic against the database traffic, so you can get pretty creative if needed. For instance, you can do some lightweight integration by combining data from multiple sources into a single result set, generate test data dynamically, or encrypt and decrypt data on the fly. It's always interesting to see what creative people can do when you give them this kind of power.
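The lightweight integration case can be sketched as a result-set filter that adds a column looked up in a second source. In the Python example below, the second source is just a dictionary standing in for a hypothetical external system such as a CRM. Rows are assumed to arrive as dictionaries.

```python
def mesh(rows, key, other_source, column, default=None):
    """Add `column` to each database row, looked up in a second source by `key`."""
    return [{**row, column: other_source.get(row[key], default)} for row in rows]


db_rows = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
crm_tiers = {1: "gold"}                   # stands in for a hypothetical second data source
print(mesh(db_rows, "id", crm_tiers, "tier"))
# [{'id': 1, 'name': 'Acme', 'tier': 'gold'}, {'id': 2, 'name': 'Globex', 'tier': None}]
```

In an actual wire protocol, a new column typically also requires extending the column metadata that the server sends ahead of the rows. Otherwise the client will not expect the extra value.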
The risks of database proxies
With great power comes great responsibility. These capabilities do not always come for free.
A database proxy sees everything that goes into and comes out of a database, so it can be a sensitive point. Whoever controls the proxy controls the applications that go through it, so the proxy should be made as secure as the most secure application using it.
The additional complexity of a proxy is also something to take into account. It's one more system that must be planned, secured, and administered. In addition, any logic deployed to the proxy must be managed, tested, source-controlled, and so on. It's easy to get started -- a typical query substitution filter is usually no more than a couple of lines of code -- but as the proxy runs more and more logic, you'll need to manage that logic.
Finally, like any other tool, database proxies can be misused.
The most common issue is that it becomes too easy to defer application updates -- just let the proxy take care of it. The ability to modify requests and responses can be a life-saver in many cases, but it can also get out of control. If you find yourself swimming in too many filters, with unreasonable numbers of query rewrites and result set edits, it may be time to consider updating at least some of your apps. But it's up to you to determine when that number becomes unreasonable.
A smart database proxy can be a powerful addition to many IT projects. Once you realize that the connection between database clients and database servers can be opened and exploited, all sorts of interesting possibilities open up.
Most people start by using a database proxy as a point solution, often to fix a specific issue in a specific application, but once the proxy is in place, it can become a great leverage point.
About the Author:
Max Tardiveau is the founder of Gallium Data.
He has worked in enterprise software for 25 years. He co-founded Espresso Logic in 2012, which was acquired by CA Technologies in 2015.
He lives in Oakland, California.