Simplicity: Unicorn is a traditional
UNIX prefork web server. No threads are used at all, this makes
applications easier to debug and fix. When your application goes awry, a
BOFH can just "kill -9" the runaway worker process without
worrying about tearing all clients down, just one. Only UNIX-like systems
supporting fork() and file descriptor inheritance are supported.
The Ragel+C HTTP parser is taken from Mongrel. This is the only non-Ruby
part and there are no plans to add any more non-Ruby components.
All HTTP parsing and I/O is done much like Mongrel:
1. read/parse HTTP request headers in full
2. call Rack application
3. write HTTP response back to the client
Like Mongrel, neither keepalive nor pipelining are supported. These
aren‘t needed since Unicorn is
only designed to serve fast, low-latency clients directly. Do one thing, do
it well; let nginx handle slow clients.
Configuration is purely in Ruby and eval(). Ruby is less ambiguous than
YAML and lets lambdas for before_fork/after_fork/before_exec hooks be
defined inline. An optional, separate config_file may be used to modify
supported configuration changes (and also gives you plenty of rope if you
RTFS :>)
One master process spawns and reaps worker processes. The Rack application
itself is called only within the worker process (but can be loaded within
the master). A copy-on-write friendly garbage collector like the one found
in Ruby 2.0.0dev or Ruby Enterprise Edition can be used to minimize memory
usage along with the "preload_app true" directive (see Unicorn::Configurator).
The number of worker processes should be scaled to the number of CPUs,
memory or even spindles you have. If you have an existing Mongrel cluster
on a single-threaded app, using the same amount of processes should work.
Let a full-HTTP-request-buffering reverse proxy like nginx manage
concurrency to thousands of slow clients for you. Unicorn scaling should only be concerned
about limits of your backend system(s).
Load balancing between worker processes is done by the OS kernel. All
workers share a common set of listener sockets and does non-blocking
accept() on them. The kernel will decide which worker process to give a
socket to and workers will sleep if there is nothing to accept().
Since non-blocking accept() is used, there can be a thundering herd when an
occasional client connects when application *is not busy*. The thundering
herd problem should not affect applications that are running all the time
since worker processes will only select()/accept() outside of the
application dispatch.
Additionally, thundering herds are much smaller than with configurations
using existing prefork servers. Process counts should only be scaled to
backend resources, never to the number of expected clients like is
typical with blocking prefork servers. So while we‘ve seen instances
of popular prefork servers configured to run many hundreds of worker
processes, Unicorn deployments are
typically only 2-4 processes per-core.
On-demand scaling of worker processes never happens automatically. Again,
Unicorn is concerned about scaling to
backend limits and should never configured in a fashion where it could be
waiting on slow clients. For extremely rare circumstances, we provide TTIN
and TTOU signal handlers to increment/decrement your process counts without
reloading. Think of it as driving a car with manual transmission: you have
a lot more control if you know what you‘re doing.
Blocking I/O is used for clients. This allows a simpler code path to be
followed within the Ruby interpreter and fewer syscalls. Applications that
use threads continue to work if Unicorn is only serving LAN or localhost
clients.
SIGKILL is used to terminate the timed-out workers from misbehaving apps as
reliably as possible on a UNIX system. The default timeout is a generous 60
seconds (same default as in Mongrel).
The poor performance of select() on large FD sets is avoided as few file
descriptors are used in each worker. There should be no gain from moving to
highly scalable but unportable event notification solutions for watching
few file descriptors.
If the master process dies unexpectedly for any reason, workers will notice
within :timeout/2 seconds and follow the master to its death.
There is never any explicit real-time dependency or communication between
the worker processes nor to the master process. Synchronization is handled
entirely by the OS kernel and shared resources are never accessed by the
worker when it is servicing a client.