As I’m learning MySQL and ProxySQL at my job, I’ve come across what appear to be two approaches to database proxying. I’d like to organize my understanding by drawing comparisons with PostgreSQL and its proxy solutions.
Underlying Differences
MySQL uses one thread per connection/client. PostgreSQL uses one process per connection/process. Choosing between the two could boil down to these revealing characteristics, as they indicate preferences around performance/resource usage and fault isolation levels. But to be clear, there are other foundational differences.
Transparency.
This approach focuses on being invisible, it doesn’t alter or interfere with database operations. The proxy is transparent and sits between the application and databases. If removed, everything would work the same exact way, with different connection characteristics.
e.g. pgBouncer. pgBouncer maintains a client(incoming connection) and server (actual postgreSQL connection) pool. It is protocol aware and checks transaction boundaries. It receives incoming requests from the client end, acquires an available database connection from the server pool, forwards the query, has postgreSQL perform, and then streams the response back to the client. It has the following pooling mode options: session(long-lived), transaction or statement(releases connection after each statement). How connection state is managed is maybe a TIL for another day.
Pros & Cons: Reliability and compatibility are advantages, but optimization options are limited. Do one thing and do it well.
Intelligence.
Here, the proxy is an active participant in query processing, has knowledge of the protocol and manipulates data flow. The focus is on feature richness and implementing it as a valued architectural component.
e.g. ProxySQL. In contrast, ProxySQL utilizes multiple workers in event-driven ways with queues which can potentially scale thousands of connections per thread. It has a frontend and backend, similarly to pgBouncer. Connection pool sizes are configurable. Things like connection warmimg, connection recycling and connection reuse based on connection usage are supported features. ProxySQL has deeper MySQL protocol awareness and parses prepared statements, multi statements, and stored procedures. It has the ability to rewrite queries on the fly and cache at the query level. Because it is threaded, connection state is easily persisted across connections. ProxySQL is more than pooling, it can also load balance (with failover) using query types. ProxySQL can be dynamically reconfigured, meaning no server restarts.
Pros & Cons: Powerful, but greatly increases complexity and risk as an extra POF.
Unfair comparison.
After comparing the two further, they’re almost different animals altogether with the only shared ability being connection handling. I think users might conflate the two types and label them all as proxies, because they do sit between the database and applications- not incorrect. With a slightly better understanding, I’ll now regard pgBouncer as a pooler to distinguish. Also, cool