The success of a web application, for most common purposes, is determined by the number of visitors it attracts and retains. Put simply, more loyal visitors == good. There are numerous factors that affect the successful attainment of this business goal, ranging from the technical to marketing. In many cases however (twitter being the most recent famous example) the attainment of the business goal itself becomes a bane. A non scalable system attracting a large number of users could grind to a halt and make it difficult to get it back on its feet. That surely can't be good for business. The Art of Scalability (by Martin L. Abbott, Michael T. Fisher) is probably the best book around on this topic and a lot of our experiences are reflected in it. In this series of posts, however, with the assumption that scalability needs to be a key consideration for any website worth its bytes, we will try and stick to only those processes and techniques that we use at Srijan, that go into making a web application scalable to accommodate increasing traffic.
What do we mean by scaling in this context?
The ability of a web application to serve increasing numbers of visitors with incremental additions to the infrastructure, not requiring major changes in architecture.
Where to Scale? up or out?
Up involves adding resources to a single point of failure. Out involves plugging in resources at multiple points of failure. Up is easier to implement and manage. Out is more fault tolerant purely by virtue of having multiple points of failure. A well configured horizontal scaling architecture will be able to service clients even through a "hardware upgrade". Having said this, we find that scalability is more than throwing hardware at a problem. It starts with designing the application for scalability so that we can approximate how the application will perform under different conditions of load and prepare an infrastructure enhancement plan which can then be put into action as and when the situation demands it. In fact, "What hardware to buy?" should only be the last question on your mind when you are attempting to scale your application.
How do we tackle this scalability puzzle?
We admit that we are some distance away from creating that perfectly scalable application, where a sys-ad can painlessly plugin a node into the cluster just before that spike in traffic, but there are some processes we follow to help us along the way. Essentially, we identify potential points of load and hence failure, starting from the business logic and its translation into a workable design. We then test the capacity of these load points under load. Based on these metrics, we create a detailed infrastructure enhancement plan which can be put into effect depending on the load situation in the live environment. Let's look at each of these activities briefly.
Identifying load points in business logic:
When faced with a business proposal, our design team analyzes the business logic to identify areas which could be potential resource hogs. An example could help illustrate this: Resizing and optimization of images on the fly for a huge magazine site could potentially turn into a resource hog under heavy traffic. A possible solution would be to run the optimization process in a dedicated "thread" to minimize the impact on other sections of the site. Once this process has been forked out, it becomes possible to add resources to support this specific activity under conditions of increased traffic, thereby preventing it from slowing down the entire site. Identifying and formally listing the potential points of application failure at this early stage helps in developing an application architecture with scalability at the back of our mind at all times.
Creating application architecture that lends itself to scaling
At the very minimum, we ensure that the application can be run on multiple boxes with adequate "session" handling. In addition, for applications that could attract heavy traffic, the application architecture needs to taken into consideration the potential load points and allow flexibility in provisioning resources at these points. This predicates the need to be able to split an application into distinct streams (swimlanes as Abbott and Fisher call them).
Scalable data storage
Making data access reliably scalable is probably the most tricky part of this puzzle. While we use "standard" techniques to improve data access, such as caching using memcache (and APC) and using master-slave databases for write and read operations, there are a lot of cool concepts and technologies that break old worldviews and make you look at data in whole new ways. Some of the techniques that we have been experimenting with are de-normalization, sharding, distributed databases, distributed key-value storage, isolating and storing static files on a cluster and using content delivery networks. Our learning and tools based on the use of shards and distributed key-value stores such as Tokyo Tyrant is always based on internal research work, and we incorporate these into our projects at the first available opportunity.
Horizontally scalable hosting infrastructure
At Srijan, our preferred architecture for hosting large traffic web applications is a cluster of servers that are horizontally scalable. The diagram below illustrates the architecture that we use to host some of our most important sites. Essentially, we separate the web application server, static content and the database. Each of these can then be scaled horizontally by adding nodes behind a load balancer. In the next post in this series, we will look at how we do load testing, capacity planning and resource provisioning. We would love to hear your thoughts and ideas on web application scalability and your experiences in trying to achieve it.