Saving Session Data in Web Applications

There are many ways to store session data in web applications. They all differ in scalability, failover capabilities, and complexity. I'll give you a quick rundown on the major themes.

Session Data on the Client

You can often implement simple personalization features or workflows by storing state on the client. From a scalability point of view, it doesn't get any better than that. Not having to keep state on your servers saves you from a lot of trouble.

Cookies are a well-understood mechanism that can even be used in a client-only manner using JavaScript. They are useful for small pieces of data that aren't security-relevant. Informal polls or simple preferences like language selections are often implemented via cookies. Cookies also play a major supporting role in user tracking and identification, but that's a different story.
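
To make the language preference case concrete, here is a minimal sketch using Python's standard http.cookies module. The cookie name "lang", the one-year lifetime, and both function names are my own assumptions for illustration, not something a particular framework prescribes.

```python
from http.cookies import SimpleCookie


def language_cookie_header(lang, max_age_days=365):
    """Build the value of a Set-Cookie header that stores a language preference."""
    cookie = SimpleCookie()
    cookie["lang"] = lang                      # hypothetical cookie name
    cookie["lang"]["path"] = "/"               # valid for the whole site
    cookie["lang"]["max-age"] = max_age_days * 24 * 60 * 60
    return cookie["lang"].OutputString()


def read_language(cookie_header, default="en"):
    """Read the preference back from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("lang")
    return morsel.value if morsel is not None else default
```

Nothing here is stored on the server: the preference travels with every request, and the server only has to fall back to a default when the cookie is missing.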

Some frameworks save workflow state in hidden form fields (see Apache MyFaces). That data can be signed by the server so that tampering is detected, and encrypted if the client shouldn't be able to read it.
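
Tamper protection for state that travels through the client usually boils down to a message authentication code. The following is only a sketch of that idea with Python's hmac module; it is not how MyFaces does it, and the names sign_state and verify_state as well as the "payload.tag" token layout are assumptions of mine. Note that it only signs the state. It does not hide it from the client.

```python
import base64
import hashlib
import hmac
import json


def sign_state(state, key):
    """Serialize state and append an HMAC-SHA256 tag so changes can be detected."""
    payload = base64.urlsafe_b64encode(
        json.dumps(state, sort_keys=True).encode("utf-8")
    )
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + tag


def verify_state(token, key):
    """Return the original state, or None if the token was modified or malformed."""
    payload, sep, tag = token.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

The token returned by sign_state would go into the hidden field, and the server would call verify_state with the same secret key when the form comes back.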

Session Data on Your (Web-)Servers

For many developers, saving sessions on the front server (which is often a web server) is the most natural choice. That's what Java Servlet implementations usually do.

As soon as you need more than one front server, things can become quite tricky. Your environment may support some kind of clustering that provides session replication and failover. No matter which front server gets a request, the session data is already there or will be requested on demand. Depending on the implementation, a failing front server doesn't necessarily mean a loss of session data. Apache Tomcat has this feature, as do commercial products. As powerful as this may sound, the approach comes with a rather large price in complexity.

Fortunately, there's an easier way: sticky sessions. Your load balancer (an HTTP reverse proxy) assigns a cookie to each client on its first request. The cookie determines which front server subsequent client requests are directed to. That means each client is always directed to the same front server, so session data for a client is only needed on one front server. As a result, you don't need replication anymore.
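
The routing decision itself is small. Here is a toy sketch of it, assuming a hypothetical cookie named "route" that simply carries the name of the front server, and round-robin assignment for new clients. Real load balancers have their own configuration for this; the class below only illustrates the logic.

```python
import itertools

ROUTE_COOKIE = "route"  # hypothetical cookie name


class StickyBalancer:
    """Pins each client to one front server via a routing cookie."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._round_robin = itertools.cycle(self.backends)

    def route(self, cookies):
        """Return (backend, cookies_to_set) for a request with the given cookies."""
        backend = cookies.get(ROUTE_COOKIE)
        if backend in self.backends:
            # Known client: keep sending it to the server that holds its session.
            return backend, None
        # First request (or unknown server name): pick a server and remember it.
        backend = next(self._round_robin)
        return backend, {ROUTE_COOKIE: backend}
```

A client that presents the cookie keeps hitting the same server, so only new clients are actually balanced.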

The sticky sessions approach is easy to use, but it has two disadvantages. First, if one server crashes, you lose the session data held on that server. Second, load balancing may no longer be optimal because it is done on a per-session basis and no longer on a per-request basis.

In the Java EE world, session state could also be stored inside the application server (as opposed to the Servlet container). For example, the Seam framework stored conversational data using stateful session beans at some point. I don't know if this is still common practice.

Session Data Inside Your Database

Saving session state inside the database is common in lightweight web frameworks like Django. That way you can add as many front servers as you like without having to worry about session replication and other difficult stuff. You don't tie yourself to a certain web server, and you get persistence and all the other features databases provide for free. As far as I can tell, this works rather nicely for small to medium-sized websites.
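
To show how little is needed, here is a minimal sketch of a database-backed session store. It is not Django's implementation. The class name DbSessionStore, the table layout, the 30-minute default lifetime, and the use of SQLite as a stand-in for your real database are all assumptions for illustration.

```python
import json
import secrets
import sqlite3
import time


class DbSessionStore:
    """Keeps session data as JSON in a single table keyed by session id."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            "id TEXT PRIMARY KEY, data TEXT NOT NULL, expires REAL NOT NULL)"
        )

    def create(self, data, ttl=1800, now=None):
        """Store a new session and return its random, hard-to-guess id."""
        sid = secrets.token_urlsafe(32)
        self.save(sid, data, ttl, now)
        return sid

    def save(self, sid, data, ttl=1800, now=None):
        now = time.time() if now is None else now
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO sessions (id, data, expires) VALUES (?, ?, ?)",
                (sid, json.dumps(data), now + ttl),
            )

    def load(self, sid, now=None):
        """Return the session data, or None if the id is unknown or expired."""
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE id = ? AND expires > ?", (sid, now)
        ).fetchone()
        return json.loads(row[0]) if row else None
```

Every front server that can reach the database can serve every client; the only thing the client carries is the session id, typically in a cookie.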

The problem is the usual one: the database server may become your bottleneck. In that case your best bet may be to take a suitcase full of money to Oracle or IBM and buy yourself a database cluster.

Using a Dedicated Session Store

Sometimes, especially for high-traffic websites, it makes sense to store session data on a dedicated server or cluster. This takes complexity out of your web servers at the cost of increased overall system complexity. No matter what the SOA guys say, distributing your data usually won't make things any easier.

If, for example, you're currently storing sessions inside your application's database, you could move the session store part to its own database. That's about as simple as it gets.

I'm not aware of any off-the-shelf products, but since storing session data is typically not the most difficult problem, you could try to roll your own based on a caching product (memcached comes to mind, as do at least a dozen Java caching solutions).
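
A roll-your-own store on top of a cache could look roughly like the sketch below. TtlCache is a hypothetical in-process stand-in for a real cache server, reduced to get and set with a time-to-live; it is not the API of any actual memcached client. The "session:" key prefix and the function names are assumptions as well.

```python
import json
import time


class TtlCache:
    """In-process stand-in for a cache server; entries expire after ttl seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}

    def set(self, key, value, ttl):
        self._items[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires = entry
        if self._clock() >= expires:
            del self._items[key]  # expired entries behave like misses
            return None
        return value


def save_session(cache, sid, data, ttl=1800):
    """Serialize the session, as it would be for a remote cache."""
    cache.set("session:" + sid, json.dumps(data), ttl)


def load_session(cache, sid):
    """Return the session data, or an empty session on a cache miss."""
    raw = cache.get("session:" + sid)
    return json.loads(raw) if raw is not None else {}
```

The important design point is the miss case: a cache may drop entries at any time, so the application has to cope with a session that has silently disappeared.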

If you're particularly ambitious (traffic-wise), a data grid like Oracle Coherence is a solution to consider. Data grids come with everything you need to implement a high-performance distributed session store. And then some.

Conclusion

Use cookies for small pieces of possibly long-term data where security is not an issue. Cookies can also complement other approaches.

Otherwise, use whatever your web development framework offers. Sticky sessions are a powerful but simple solution when you're starting to scale out your system. I wouldn't turn to session replication if I could get away without it.

If things get rough, a dedicated session server or cluster may be your last chance. But then you shouldn't be needing my advice anyway.

For further information see Martin Fowler's Patterns of Enterprise Application Architecture. It has a short section on session state patterns that covers client, server, and database session state.
