Connections are the hidden mechanism using which computer systems talk to each other—and they've become so fundamental that we overlook how important they are, how they work, and when they fail. We're often ignorant of them until there's a problem, which usually shows up a massive failure when our systems are doing their most amount of work. But because they're present everywhere and are so important in pretty much every system, they're worth spending a little time understanding.
What are connections?
Connections are a link between two systems that allows them to exchange information as a sequence of zeroes and ones—to send and receive bytes.
Depending on where the systems are running relative to each other, a combination of underlying software and hardware will work hard to handle the physical movement of information, which abstracts it away. For example, if the communicating systems are two Unix processes, the IPC system will handle allocating memory for the data exchanged and will handle pick-up and delivery of the bytes on both sides. If the systems are running on different computers, they will likely communicate over TCP, which will handle moving the data over a wired or wireless system between the computers. The details of how computers work together to reliably handle, transmit and receive the data is more a standardization problem, and most systems use the building blocks provided by the UDP and TCP protocols. How these connections are handled at each end is a more relevant problem for application development, which is what we'll look at now.
Where do we use connections?
But the server also made connections of its own to give you this page. It used a connection to speak to a database, sending over the URL of this page inside a query and receiving the contents of the page in return. In this scenario, the application server was the client and the database server was the server. The application server might have also made connections to other third-party services, like a subscription or payment service, or a location service.
For static files, like the JS, CSS and images, there's a CDN system in between your browser and the blog server. A connection was made from your browser (client) to the CDN server (server) that's closest to you, and if the files weren't available in the cache near you there would have been another connection from the CDN server (client) to the blog server (server).
If you think carefully about all the systems you use or build, you'll see connections all over the place—but they're often hidden from view, and not understanding their invisible presence and limits will come back to bite you when (and where) you least expect it.
Why is connection handling important?
Understanding how connections are handled is important because the cost of connections is asymmetric—the cost is different on the client and server. In a peer to peer (P2P) system, like a torrent cloud, this is false, and connections have the same cost at both ends—but this is rarely the case. The common uses of connections have a client and a server, and the cost to the server is different from the cost to the client.
Before we look at how connections can be handled, we need to quickly review the different ways in which computers run programs and how programs do work in parallel. When you run a program, the operating system runs your code as one instance of a process. A process occupies one CPU core and some memory when it's running, and does not share its memory with any other processes. The process can start threads, which are like children of the process that can run concurrently. Threads share memory with the process that spawned them, and might allocate more memory for their use. Or the process might use an event loop, which is a single-processes system that keeps track of tasks it has to do and loops over all its tasks continuously and infinitely, each time doing the tasks that it can, or skipping them if they're blocked. Or the process might use internal constructs called fibers, green-threads, coroutines, or actors—each of these work a little differently and having varying costs—but they're all managed internally by the process and its threads.
Coming back to how connections are handled, let's look at database connections first—from your application server (the client in this case), you see a TCP connection paid for with a small memory buffer and one port allocation. On the server side, if you're using PostgreSQL, each connection is handled at the server by spawning a new process that handles all the queries being sent over that connection. This process occupies a CPU core, and about 10MB or more of memory in RAM. MySQL handles each connection by spawning a thread inside of a process. The RAM/memory requirements are much lower in a threaded model, but it comes at the cost of context switching those threads. Redis handles each connection as an iteration in an event loop, which makes the resource costs low—but a price is paid in having to loop over every connection's queries and serving them strictly one at a time.
Consider a request to an application server. Your browser initiates a TCP connection as a client, which is cheap (small memory buffer and one port). On the server, the story is different. If the server is using Ruby on Rails, each connection is handled by one thread spawned inside a fixed number of running processes (the Puma web server) or by one process (Unicorn). If it's using PHP, the CGI systems start a new PHP process for each connection, and the more popular FastCGI systems keep a few of the processes running to make handling the next connection faster. If you're using Go, one goroutine (a cheap and light thread-like structure, managed & scheduled internally by the Go runtime) will be spawned to handle each connection. If you're using NodeJS/Deno, the incoming connections are handled in an event loop by iterating over them and responding to requests one at a time. In systems like Erlang/Elixir, each connection will be handled by an actor, which is another internally scheduled lightweight thread-like construct.
Connection Handling Architectures
The examples of how connections are handled have a few common strategies which we can identify:
Processes: Each connection is handled by a separate process, either started exclusively for the connection (CGI, PostgreSQL), or maintained as part of a group of available processes (Unicorn, FastCGI).
Threads: Each connection is handled by a separate thread, either spawned exclusively for the connection or held in reserve after spawning. The threads might be spread over multiple processes, but all threads are equivalent (Puma/Ruby, Tomcat/Java, MySQL).
Event Loop: Each connection is an task in the event loop, and connections that have data to be read are processed by iterating over them (Node, Redis). These systems are normally single-process and single-threaded, but they may sometimes be multi-process, where each process acts as a semi-independent system with separate event loops.
Coroutines / Green-Threads / Fibers / Actors: Each connection is handled by a lightweight construct whose scheduling is managed internally (Go, Erlang, Scala/Akka).
Knowing how your server is handling connections is crucial to understanding what its limits and scaling patterns are. Even basic usage or configuration requires knowledge of how the connection handling works: Redis and PostgreSQL, for example, offer different transaction & locking semantics that are influenced by their respective connection handling mechanisms. Process & thread based servers can crash because of resource exhaustion if their max counts are not set to a reasonable limit, and when limits are set they might be horribly under-utilized because the limits are too low. Event-loop based systems may not benefit at all from running on 64-core CPUs, unless 64 copies of them are configured to run simultaneously—which works great for web servers but not very often with databases.
Each of these ways of handling connections perform differently when used in application servers and databases, because of the distributed or centralized nature of each system. Application servers, for instance, tend to be horizontally scalable—they work correctly and in the same way whether you have 1 server or 10 or 10,000. In these cases, Moving away from the process / thread model tends to result in higher performance, because we want as much work to be done with minimal memory usage and CPU context switching. Event loops like Node work great on single core servers, but need to be clustered correctly to use multi-core severs. Coroutine / actor based systems like Go or Erlang will utilise every core of the CPU much easier because they're designed to work that way, with many thousands of goroutines or actors running simultaneously on a single machine.
Centralized databases, on the other hand, benefit more from process / thread / event loop handling, because based on the transactional guarantees of the system we don't want multiple connections operating on the same data at the same time. The operations happening on multiple connections will have to lock during the transaction-sensitive parts of their work, or use other strategies like MVCC, and the fewer possible connection handlers there are the better. These systems support a few connections on a single machine. On large server, PostgreSQL might manage a few hundred connections, MySQL might handle a couple thousand. Redis can handle the highest number of connections (maybe tens of thousands) because it manages to keep data consistent using an event loop—but this means that only one operation can happen at a time, so it's not a free lunch.
Distributed databases can and will attempt to move away from the process and thread based model: because they're spreading data across multiple machines, they usually give up locking, embrace partitioning, and design for high connection volume across a large number of servers. AWS DynamoDB or Google Datastore for instance, or many distributed databases written in Go, will happily accept millions or billions of simultaneous connections. These decisions have consequences, though—they sacrifice a lot of the operations (joins, ad-hoc queries) and consistency guarantees provided by centralized / single-server databases. In return for this sacrifice, they can handle connections in a partitioned, horizontally scalable, practically unlimited way, allowing them to choose a design that supports many connections across many machines. This makes connections a non-issue here—each individual sever must worry about its connection handling, but in aggregate, across thousands or millions of machines with smart connection routing, these systems often behave like they're infinitely scalable.
[What's | Why] a Pool?
We need to be efficient and frugal with expensive connections. It often isn't easy to realize how expensive they are because of the asymmetry: from the client's point of view they look cheap—it's usually the server that has a problem with too many connections.
When a client performs work, it does not have the luxury of being able to use one connection for one piece of work. An application server, for instance, accesses a database as a client over a connection. If it were to initiate a new connection for each request, it becomes artificially limited in its own capacity by the connection capacity of the database that it's connecting to. There are a few cases where this usage is perfectly efficient—that might happen if the application server is purely a proxy for the database. In reality, the application servers are doing many other things: waiting for the request data to come in to the server, parsing the request, formulating a query, sending the query over the database connection, waiting for the results, parsing the results from the database connection protocol, reading or manipulating the result data, transforming the output into HTML/JSON/RPC formats, making network requests other services and more. The connection remains idle for most of this time, which means an expensive resource is not being used efficiently. And all this doesn't even take into consideration the setup (process start, authentication) and shutdown costs of the connection on the server side.
To increase the efficiency of connection usages, many database clients will use what's called a connection pool. A pool is an object that maintains a set of connections internally, without allowing direct access or usage. These connections are given out by the pool when communication with the database needs to happen, and returned to the pool when the communication is finished. The pool may be initialized with a configured number of connections, or it may be filled up lazily on-demand. The ideal usage of a connection pool would be that code requests a connection from the pool (called a checkout) just when it needs to use it, uses it, and puts it back in the pool (a release) immediately. This way, the code is not holding on to the connection while all the other non-connection related work is being done, greatly increasing efficiency. This allows many pieces of work to be done using one or a few connections. If all the connections in a pool are in use when a new checkout is requested, the requester will usually be made to wait (will block) until a connection has been released.
Different languages and frameworks will handle pooling differently. Ruby on Rails, for instance, handles checking out and release connections back into the pool automatically—but not understanding this leads to inefficient code. If you make a database request, followed by a long network request to another service, and another database request, the connection would have been held idle while the network request was happening (Rails auto-management needs to be conservative and cautious, therefore inefficient). Go has a standard library database driver that automatically uses a connection pool, but not realizing that connections are being released into the pool between database calls leads to surprising and hard-to-reproduce bugs. Developers sometimes assume that sequential operations in the same request will run on the same connection, but the automatic manager will swap connections out under you (Postgres advisory locks heisenbug spectacularly in Go).
Transactions compound this problem: databases often base transaction functionality on a connection (sometimes called a session). If you start a transaction, you can commit it or roll it back only on the same connection that you started it on. Automatic pool management needs to be careful not to release connections while a transaction is in progress. Depending on the database, other functions like locks and prepared statements might also have connection affinity.
So if we want to write efficient code, we need to know how connection pooling is happening in our framework, how much management is happening automatically, and when this management doesn't work or is counter-productive. One common tool that helps take this off your mind is a pooling proxy, like pgBouncer, Odyssey or AWS RDS Proxy. These systems allow you to make as many database connections as you want without worrying about management, because the connections they give you are fake & cheap simulated connections that they handle with low overhead. When you attempt to use one of these simulated connections, they pull out a real connection from an internal pool and map your fake connection on to the real one. After the proxy sees that you've finished using the connection, it keeps your fake connection open, but aggressively releases and re-uses the real connection. The connection counts and release-aggressiveness settings are configurable and help you tune for the gotchas like transactions, prepared statements, and locks.
Whether you choose to understand and manage connections efficiently in your code, or use a tool like pgBouncer, is ultimately a productivity and deployment complexity choice. Either option can work based on how much code you want to write, how easy connection management is in your chosen language, and how much efficiency you need in your project.
Pooling isn't limited to database clients. We've been referring to connections as cheap on the client side, but they are not zero cost. They do use memory and ports and file descriptors on the client, which are not infinite resources. For this reason many languages / libraries will have a connection pool for HTTP connections made to the same server, and will also use pooling for other scarce resources. These pools tend to be hidden out of sight until a system runs out of the scarce resource, at which point it usually crashes. Knowing this helps a lot with debugging problems—besides connections, the other usual suspects are file descriptors.
Configuring Common Pools
Now that we've seen how connections are commonly handled, we can talk about a few different application server + database combinations, and reason about how to maximize the number of requests we can handle on the application server while minimizing the database connection count. While not every combination is covered here, most systems will have the same characteristics as one of the examples we cover—so knowing how these systems work will help you understand your own system even if it's not covered here. If you want me to add more combinations or systems, do let me know.
Process & Thread Based Handling
Puma, a popular application server, runs Ruby applications with two kinds of pooling for the handlers of incoming HTTP requests. The first lever is the number of processes to start, represented by the
workers configuration directive. Each process of the server is distinct and loads the full application stack independently into memory—so if your application occupies N MB or RAM, you'll need to make sure you have at least
workers * N MB of RAM to run that many copies of it. There is a way to alleviate this: Ruby 2+ supports the copy-on-write feature, which allows multiple processes to start as a single process and fork into multiple processes without necessarily copying all the memory—a lot of common memory areas will be shared until they're modified in some way. Activating copy-on-write using the
preload_app! directive might help you use less memory than the full multiple of the application size and the number of workers—but don't count on it too much without testing how much of an advantage it gives you under sustained load.
Purely process based servers like Unicorn stop at this level of configuration, as do the popular servers for Python, PHP and other languages that use a global lock or assume single-threaded single-process execution. Each process can handle one request at a time, but this doesn't guarantee full utilization—if the process is waiting on a database query or a network request to another service, it's not going to pick up a new request, and the CPU core that it's on remains idle. To counter this wastage, you might start more processes than you have CPU cores (which results in context switching costs) or use threads.
Which brings us to the second lever Puma gives you—the number of threads to run in each of the processes / workers you've configured. Using the
threads directive allows you configure the minimum and maximum number of threads in each worker's thread pool. Using these two directives allows you to control the total number of threads that will act as simultaneous request handlers for you application—this is simply the number of workers multiplied by the number of threads.
A rule of thumb would be to place one worker for each CPU core you have available—assuming, of course, that you have enough memory to do so. This utilizes your RAM effectively, so you can plan how much RAM you need based on a few tests with this number. Now we want to fully utilize the CPU—and we do this by increasing the max thread count. You'll remember that threads share memory with their process, so they don't contribute much towards raising your RAM requirements—instead more threads will further utilize your CPU while allowing your to handle more requests simultaneously. This is helpful because when one thread is sleeping because it's waiting on a database query or network request, another thread from the same process can quickly be switched into the CPU core to do its work. But remember that many threads will also cause contention on the process locks, so you will discover a limit to how many threads you can add while still seeing a meaningful increase in performance.
How does all this configuration affect the number of database connections, though? Rails uses automatic connection management on the database, so each thread that you run will need its own database connection to function effectively, without waiting on others. It maintains a connection pool configured in the
database.yml file, and this configuration applies at a process / worker level. So if the default value of
5 is left as-is, Rails will maintain a maximum of 5 connections per worker. This won't work very well if you're changing the max thread count—your many threads will all fight over the 5 connections in the pool. A rule of thumb would be to configure the
pool count to be equal to the max
threads count, as noted in the Heroku Puma deployment guide.
Now this raises a new problem—having
workers * threads database connections is great for application server performance, but will wreak havoc on a database like PostgreSQL and sometimes on MySQL. Depending on how much RAM (in the case of Postgres) and how much CPU (in the case of MySQL) you have, this configuration might not work for you. You might decide to reduce the
pool or max
threads value to make sure you have fewer connections, or the
workers value, or both. Depending on your application, all are likely to have the same effect—if every request requires database queries, the number of database connections you have is the ultimate bottleneck for the number of requests you can handle. But if some requests can function without database access, you can get away with keeping the
threads count high, and the
pool number relatively low—that way many threads will service your requests, but only a smaller subset will actually fight over the database connections in the pool as and when they need them. You can also to this if you take over connection management in your code and make sure to checkout and release connections efficiently—this is especially important if you're making network requests between database calls.
If you find that it's difficult to properly mange connections or tune these numbers, and your application server is being artificially limited by the database connection limits, you can then reach for a tool like pgBouncer, Odyssey, AWS RDS Proxy. Setting up a pooling proxy will allow you to set your pool size to be equal to the max thread count, and be confident that the proxy will make things efficient for you.
When it comes to databases, PostgreSQL uses process based handlers, so you'd want to be relatively stringent about connection counts when using it. MySQL uses threads, so you could get away with making more connections to it—although this can result in performance drops because of context switching and locking.
Event-Loop Based Handling
Node / Deno is the first event-loop based server we're looking at. This implies that starting a server with the default configurations will very effectively use the one CPU core that it's running on, but will basically ignore the others. The internal subsystems and libraries may use the other cores, but for now we're more interested in directly using them—and the way we do this is clustering. Clustering is achieved by starting one process that accepts all incoming connections, which then acts as proxy and distributes the connections over other processes running on the same machine. Node has a standard library clustering module, and popular servers like PM2 will use it effectively for you. The rule of thumb would be to run as many processes as there are CPU cores available, assuming there's enough memory, of course.
Stripe also published the interesting Einhorn project, which is a connection manager that exists outside of the stack you're writing code in. It starts its own process that accepts connections, and distributes them to instances of your application that it starts and manages as child processes. A tool like this is very useful in even loop based systems, which will make sure they fully utilize one CPU core given the chance—but it's not as useful by itself with Ruby/Python, because while it would allow you to have multiple processes, the lack of threads mean that each processes would only be able to service one request at a time.
The clustered event loop approach is also used by systems that modify the default behaviour of a normally process based language. The Tornado server for Python, for example, converts Python request handling into an event-loop based system, also referred to as non-blocking I/O. It can also be configured to cluster itself on all available CPU cores, assuming enough memory.
A similar approach is also used in the Falcon web-server for Ruby. Newer versions of Ruby a kind of green-thread called Fibers, and Falcon handles each incoming request with one Ruby Fiber. Fibers do not automatically spread themselves across all CPU cores, so Falcon starts a copy of your application in every available CPU core—again assuming there's enough memory.
In all these cases, you'd want to configure your connection pools in your database adapters to limit the number of connections each process can make—or you might use a pooling proxy if your application servers are being limited by the database connection limits, or if managing connection checkout and release is getting difficult.
Redis handles connections with an event loop. This means that it can hold as many connections as its port, file descriptor, and memory resources allow—but it handles each operation from each connection one at a time.
Internally Managed / Custom Handling
The poster child for this kind of handling is Go, which unlike all the other examples is completely unconstrained in the way it handles requests—it will unleash itself on your CPU and RAM with no restrictions. Every incoming request is handled by a new goroutine, a lightweight thread-like construct that the Go runtime internally manages and schedules much more efficiently than threads or processes. Go will also automatically spread its goroutines across all the CPU cores you have, although you can rein it in a bit with the
runtime.GOMAXPROCS setting—and because this happens as part of the runtime, there is no memory copying happening. Go runs across all your CPU cores without having to start a new copy of your application on each of them.
Because Go is automatically geared for tens of thousands of simultaneous requests even on very small servers, pairing it with a process-based database like PosgreSQL is often like driving a race-car into a brick wall. If each goroutine is using the standard library SQL package, it will create as many connections as there are goroutines—because the default pool size is unlimited. The first thing you'd want to do on any Go application using a SQL DB is to configure the connection limits using
SetMaxOpenConns and the associated options. Go uses an internal pool on the
sql package, so each query you run will checkout a connection from the pool, use it and release it back immediately. This means that if you want to execute a transaction, you must use the special methods that give you an object that wraps the connection you started the transaction on, which is the only way you can later commit or rollback that transaction. This is also every important when using other functions that are connection specific, like prepared statements or advisory locks.
This automatic approach an unexpected benefit: it removes the need for pooling proxies, because the connections are already managed very aggressively. The database calls you make have efficient connection management by default, and when you do take over a connection it's just as efficient as using a pooling proxy anyway. Systems that work this way default to doing the efficient thing, at the cost of you having to learn about how to handle the edge cases where you do want manual control. Besides the usual prepared statements and advisory locks gotchas, manually managing connections this also has potential deadlock problems. If a max connection count is configured and some requests try to checkout more than one connection simultaneously to do their work, there's a chance they'll get stuck forever waiting for each other to release connections. The pool sizing notes on HikariCP have formulae that help address problems like this.
Other VM based languages, like Java, Scala/Akka, Clojure, Kotlin (all on the JVM) and Elixir/Erlang (on the BEAM VM) will function similarly, where using all available cores on a CPU will be possible without starting a new copy of the application on each one. Each specific system or database connection library will usually have slightly different implementations, but they should be understandable using one or more of the concepts described here.
If you'd me to add examples of how any other specific system works, have any other questions, or would like to point out a mistake or disagreement, you can reach me me at @sudhirj.