Gearman is a system designed to distribute processing jobs to other machines or processes, which lets you add computing power as your application grows. Adding more machines instead of upgrading a single one is known as horizontal scaling. Gearman supports redundancy and failover, and it allows asynchronous processing, so resource-intensive work can be spread across multiple servers.
About Gearman
Gearman provides the infrastructure for signaling worker processes that work is available, so that each of them can grab a task for itself. On its own, Gearman does not provide much value: its sole purpose is to dispatch jobs that need to be executed.
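To make the dispatch idea concrete, here is a minimal in-process Python sketch of the three roles involved: clients submit jobs by function name, a job server queues them, and workers that have registered that function grab them. This is not Gearman's API or wire protocol; the names `ToyJobServer`, `ToyWorker`, `submit`, `register` and `grab` are hypothetical and exist only to illustrate the model.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Optional, Tuple


# Hypothetical, in-process illustration of the job-server idea; NOT Gearman's API.
class ToyJobServer:
    def __init__(self) -> None:
        # One FIFO queue of pending payloads per function name.
        self.queues: Dict[str, Deque[bytes]] = defaultdict(deque)

    def submit(self, function: str, payload: bytes) -> None:
        """Called by a client: enqueue a job for whichever worker can run it."""
        self.queues[function].append(payload)

    def grab(self, abilities: List[str]) -> Optional[Tuple[str, bytes]]:
        """Called by an idle worker: return the next job it is able to run."""
        for function in abilities:
            if self.queues[function]:
                return function, self.queues[function].popleft()
        return None  # nothing to do; a real worker would wait to be woken up


class ToyWorker:
    def __init__(self, server: ToyJobServer) -> None:
        self.server = server
        self.handlers: Dict[str, Callable[[bytes], bytes]] = {}

    def register(self, function: str, handler: Callable[[bytes], bytes]) -> None:
        """Advertise that this worker can execute `function`."""
        self.handlers[function] = handler

    def work_once(self) -> Optional[bytes]:
        """Grab at most one job this worker can handle and run it."""
        job = self.server.grab(list(self.handlers))
        if job is None:
            return None
        function, payload = job
        return self.handlers[function](payload)


if __name__ == "__main__":
    server = ToyJobServer()
    worker = ToyWorker(server)
    worker.register("reverse", lambda data: data[::-1])
    server.submit("reverse", b"gearman")
    print(worker.work_once())  # b'namraeg'
```

In a real deployment the client, the job server and the workers run as separate processes, usually on separate machines, and idle workers are notified when new work arrives instead of returning empty-handed.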
To be clear, Gearman is not the answer to every problem, and for most applications it would probably be unnecessary. For a simple website or platform with a few hundred or a few thousand visitors per day, a single hosting server will do just fine, as long as all it has to do is read data from the persistence layer and present it to users.
A good use case for Gearman arises when things start to slow down and scaling vertically (renting or buying a bigger server) is not an option because of the cost. In that case, using two or more small servers to handle your application load is a more viable alternative.
When to scale horizontally
Now let’s take a look at an example where Gearman may actually be used successfully.
Let’s imagine an application where users can upload images and all of their friends receive a notification email with each upload. For each uploaded image, you must also take steps to ensure it renders well on various devices. The flow might look like this: upload, resize, create a thumbnail and send the email. At first sight this is a straightforward plan, and it would probably work for a limited number of users.
Now let’s suppose the application is met with great enthusiasm and is booming. In no time, people will start complaining about how long resizing takes, emails will go undelivered, and image uploads will fail because of the number of concurrent uploads and the resulting exhaustion of resources.
The next step might be to approach things differently by making the resizing and email tasks asynchronous and implementing some sort of failover. One way to do this is to store the resizing and email tasks in a persistence layer (a database or Redis) and have a daemon periodically check for new tasks. The daemon tries to execute each task and, if it fails, marks the task accordingly and retries it later.
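The database-backed approach described above can be sketched with Python's built-in `sqlite3` module standing in for the persistence layer. The table layout, the `MAX_ATTEMPTS` limit and the functions `create_queue`, `enqueue` and `poll_once` are hypothetical choices made for illustration; a real daemon would call `poll_once` in a loop with a sleep between iterations.

```python
import sqlite3
from typing import Callable, Dict

MAX_ATTEMPTS = 3  # hypothetical retry limit before a task is marked as failed


def create_queue(conn: sqlite3.Connection) -> None:
    # `kind` says which handler runs the task, e.g. "resize" or "email".
    conn.execute(
        """CREATE TABLE tasks (
               id INTEGER PRIMARY KEY,
               kind TEXT NOT NULL,
               payload TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending',
               attempts INTEGER NOT NULL DEFAULT 0
           )"""
    )


def enqueue(conn: sqlite3.Connection, kind: str, payload: str) -> None:
    conn.execute("INSERT INTO tasks (kind, payload) VALUES (?, ?)", (kind, payload))
    conn.commit()


def poll_once(conn: sqlite3.Connection, handlers: Dict[str, Callable[[str], None]]) -> int:
    """Try every pending task once and return how many tasks were attempted."""
    rows = conn.execute(
        "SELECT id, kind, payload, attempts FROM tasks WHERE status = 'pending'"
    ).fetchall()
    for task_id, kind, payload, attempts in rows:
        try:
            handlers[kind](payload)
            conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
        except Exception:
            attempts += 1
            # Record the failure; retry on a later poll until MAX_ATTEMPTS is reached.
            status = "failed" if attempts >= MAX_ATTEMPTS else "pending"
            conn.execute(
                "UPDATE tasks SET status = ?, attempts = ? WHERE id = ?",
                (status, attempts, task_id),
            )
    conn.commit()
    return len(rows)
```

This single-daemon sketch does not address what happens when several daemons on different servers poll the same table: without row locking or an atomic "claim" step, two daemons could pick up the same task.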
This looks good, but what happens when one server is not enough? Image processing is CPU-intensive, and a steady stream of large images can easily keep your CPU at 100% around the clock. At this point scaling vertically is no longer feasible, so let’s think about spreading the image processing and email sending tasks across multiple servers.
Horizontal scaling techniques
At this point, the database solution is not very promising, as we would have to coordinate multiple daemons polling the same storage. Additionally, consider that when an application needs more than one server, another real problem emerges: servers and routers can fail, power can go out, someone can execute “sudo rm -Rf /” by mistake, or an engineer can trip over an Ethernet cable. The point is that when an application uses multiple servers, you must think about what happens when a server goes down or a connection is lost. Nobody can afford to have an application running on 20 servers go down because a single server is lost, and the more servers there are, the more likely it is that something like this will happen. And we cannot afford downtime, as users will most likely get upset and may stop using the application, causing revenue loss.
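The claim that more servers mean a higher chance of something going wrong can be made concrete with a simple model. Assuming, purely for illustration, that each of n servers is down independently with probability p at any given moment, the probability that at least one of them is down is:

```latex
P(\text{at least one server down}) = 1 - (1 - p)^{n}
% Illustrative values (assumed, not measured): p = 0.01, n = 20
% 1 - 0.99^{20} \approx 1 - 0.818 = 0.182
```

Under these assumptions, a single server is down 1% of the time, while a 20-server deployment has at least one server down roughly 18% of the time, which is why failover has to be part of the design rather than an afterthought.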
Several solutions exist for scaling with failover in mind. One of them is to have two types of daemons: worker daemons that do the actual work, and a manager daemon that receives tasks, delegates them to the workers, manages connections and deals with workers that go down. Sure, you could implement a solution like this yourself, but there is one thing that could backfire: bugs! Even the most skilled engineers write bugs, and they can only be found through extensive code review and testing, which costs money and time. One lesson worth learning is that distributed systems are very hard to test, especially their failover cases.
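For a sense of what such a manager involves, here is a deliberately small single-process sketch: the manager hands each task to the next worker in round-robin order and, if that worker turns out to be down, drops it and re-queues the task for another worker. The `Worker`, `Manager` and `WorkerDown` names and the `alive` flag are hypothetical stand-ins for real processes and network connections; this is not how Gearman is implemented.

```python
from collections import deque
from typing import Deque, List, Tuple


class WorkerDown(Exception):
    """Raised when a task is sent to a worker that cannot be reached."""


class Worker:
    def __init__(self, name: str, alive: bool = True) -> None:
        self.name = name
        self.alive = alive  # stands in for a live network connection
        self.done: List[str] = []

    def run(self, task: str) -> None:
        if not self.alive:
            raise WorkerDown(self.name)
        self.done.append(task)


class Manager:
    def __init__(self, workers: List[Worker]) -> None:
        self.workers: Deque[Worker] = deque(workers)
        self.pending: Deque[str] = deque()

    def submit(self, task: str) -> None:
        self.pending.append(task)

    def dispatch_all(self) -> List[Tuple[str, str]]:
        """Assign every pending task, dropping dead workers and retrying elsewhere."""
        log: List[Tuple[str, str]] = []
        while self.pending:
            if not self.workers:
                raise RuntimeError("no live workers left")
            task = self.pending.popleft()
            worker = self.workers[0]
            self.workers.rotate(-1)  # simple round-robin
            try:
                worker.run(task)
                log.append((task, worker.name))
            except WorkerDown:
                self.workers.remove(worker)    # stop sending work to it
                self.pending.appendleft(task)  # give the task to another worker
        return log
```

Note that a task is re-queued only when handing it over fails immediately. Everything this toy leaves out (a worker that hangs, a worker that dies halfway through a task, a manager that itself crashes) is exactly the kind of failover case that is hard to get right and hard to test.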
This is where the beauty of Gearman lies: it does precisely this for you, offering a daemon (the Gearman job server) that manages workers on different servers and handles connection problems. Best of all, it has been tested on the “battlefield”, so you can sleep much better at night, with far less risk of being woken by an alert because a home-grown worker manager crashed under high traffic.
The next part of this article details how to integrate Gearman with the sample application and how to handle the failure cases.