How to replace session state in a scalable environment?

In all the projects I have worked on for various clients, we had to load, cache and make use of a user's profile data in the application. How to do that in an efficient and scalable way is something I care about a lot. Here are my thoughts and ideas on that topic.

Where to get user data from?

Before we can do anything with a customer's profile data, the data need to be loaded and cached.

Which user data to load depends entirely on the type of website. It could be an e-commerce site, a self-service portal, a business application or something else. But here are some ideas:

  • Basic data: names, address, email address, birthdate, gender, credit score, marketing permissions, date of registration etc.
  • Order history and invoices
  • Product interest tags
  • Support inquiries
  • Loyalty club points
  • Visited pages or products

How to load the data is beyond the scope of this post, as it differs for all the systems and environments out there. But your site could be loading data using e.g. REST, SOAP or a direct database connection. And your backend systems could be something custom built or a CRM system like Salesforce, Dynamics, SAP etc. So this is totally up to you.

For demonstration purposes my sample code will just return static mock data.

Where to store the user data?

Since we would almost never reload customer data from any backend system on every page load, the loaded data should be stored temporarily somewhere closer to our application.

Session state

Traditionally we would use ASP.Net session state for that, since it is easy to work with. However, as soon as session state is turned on (or not turned off), concurrent requests that belong to the same session will in fact be served sequentially, because each request takes an exclusive lock on the session. Depending on how the request pipeline is configured, this can even include static files such as images, scripts and stylesheets. For performance reasons, I personally do not find that acceptable.

Another thing is that by default this session state is configured to use in-process session state, which is by definition internal to the specific worker process on the specific web server. So if I were to scale out (more servers) or deploy to a cloud service, I would need to configure the load balancer to use sticky sessions to ensure that every single request related to a given session is handled by the same server that first got this session. This means that I cannot fully utilize the combined capacity of all the servers. Some of them might be very busy, some of them might not be.

I could change the configuration and use SQL Server to make the session state available for sharing between multiple servers. However, according to Microsoft this mode does not perform well. And, although we could improve performance with In-Memory tables, this session state provider would still be inserting and updating objects serialized into blobs in an OLTP table, highly optimized or not.

So what else can be done?

Redis

I propose using Redis for caching all the loaded data.

Redis is an open source NoSQL database server, available for Linux, Windows and in Microsoft's Azure cloud. It scales really well, from tiny Raspberry Pis to large server clusters. Stack Exchange is one of the proud users of Redis, with two instances performing billions of operations per day (Stack Exchange's own statistics).

Redis can store almost every kind of data in-memory in a very efficient way, so both data reads and writes are blazingly fast. This is something we can use for a lot of cases, for instance caching or data queuing.

So with an instance running on a dedicated server, preferably even a cluster of servers, all our webservers can use Redis as a shared cache. This way a webserver can load data from some kind of backend, cache it on Redis, and have it available for all webservers in the server cluster.

Microsoft actually released a session state provider for Redis, making it somewhat easy to replace the default in-process session state. It works and it performs really well. However, requests within the same session are still served sequentially, even across multiple servers. This is because the provider still implements locks to comply with the concept of a session state.

Looking at how the session data gets stored in Redis, the provider adds three keys per session:

  • Data
  • Internal
  • Write Lock

The data key contains all the session items, represented as one hash key-value per item. These are by default serialized to strings using the built-in .Net BinaryFormatter, which could be replaced with a custom formatter e.g. a JSON serializer.

The internal key contains one hash item, SessionTimeout. It stores the session timeout (in seconds), which is used to refresh the sliding expiration.

The write lock key holds a string value that temporarily stores a lock id, known by the thread currently locking the session state. As soon as the request that locks the state has ended, this key is deleted.

That is really just a way to use Redis as a shared dictionary: on each request to a webserver, all the session state items are loaded and deserialized on the webserver. We do not exploit the really cool and high-performing features Redis offers, such as retrieving specific entries, paging through lists, retrieving numeric or alphabetical ranges of data and much more. With this solution we would have to load everything to the webserver and let it perform these operations on the full sets of data.

Store data in Redis

If we do disable and forget about session state, how can we then use Redis more specifically?

Let's assume that the data model I get from loading the user profile from a CRM system is a large object with a few lists of other objects. That will not fit directly into the Redis database and will need some transformation in one of two ways:

  • Serialize it all into a JSON string
  • Transform it into multiple key-value collections

The JSON way is pretty easy, but then I would just be storing a long string in a distributed cache. And I know that Redis can do so much more if I use its specialized data types (introduction to the data types).

If I just need to look up a specific property from the customer profile, I can request the value of this single key instead of loading a full JSON string and deserializing it all to an object. Or if the customer has a history of 100 orders and I only want to show a paged list of 10 orders, I could normalize the data into a list and later query Redis to return just the specific range of objects I want to show.
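Here is a minimal sketch of those two lookups with StackExchange.Redis. The key names (user:{id}:basic, user:{id}:orders) and the helper class are my own assumptions for illustration, not part of the sample project.

```csharp
using StackExchange.Redis;

public static class ProfileQueries
{
    // Reads a single property from the basic-data hash (HGET)
    // without loading or deserializing the rest of the profile.
    public static string GetEmail(IDatabase db, long userId)
    {
        RedisValue email = db.HashGet($"user:{userId}:basic", "Email");
        return email.IsNull ? null : email.ToString();
    }

    // Returns one page of order ids from a list (LRANGE). 'page' is zero-based.
    public static RedisValue[] GetOrderIdPage(IDatabase db, long userId, int page, int pageSize)
    {
        long start = (long)page * pageSize;
        long stop = start + pageSize - 1; // the stop index of LRANGE is inclusive
        return db.ListRange($"user:{userId}:orders", start, stop);
    }
}
```

With 100 orders and a page size of 10, the second page would be requested as GetOrderIdPage(db, userId, 1, 10), which asks Redis for the list elements at index 10 through 19 only.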

The objects representing a customer profile might look like these (link to Gist):

using System;

public class BasicData
{
    public long UserId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public string PhoneNumber { get; set; }

    public string AddressLine1 { get; set; }
    public string PostalCode { get; set; }
    public string City { get; set; }
    public string Country { get; set; }

    public bool HasMarketingPermission { get; set; }
    public DateTime CreatedDate { get; set; }
}

public class SupportInquiry
{
    public long Id { get; set; }
    public bool IsClosed { get; set; }
    public string Description { get; set; }
    public DateTime SubmittedDate { get; set; }
    public string SubmitterEmail { get; set; }
    public string SubmitterName { get; set; }
}

My strategy for storing a user profile looks like this:

  • User token: Add the user id to a hash with a random key (user token). This token will be used as the id for all other Redis entries for a given user.
  • Customer basic data: Transform the simple properties into a hash with key-value entries.
  • Interests: Transform simple string tags into a set of strings.
  • Orders, invoices or support inquiries: For each of the objects, transform their properties into a hash map of key-values. Then store the id of these hashes in a list.
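The mapping from profile parts to Redis data types could be sketched roughly like this. The key layout, the ProfileStore class and the trimmed SupportInquiry class are assumptions made so the sketch compiles on its own. It is not the code from the Gists.

```csharp
using System;
using System.Collections.Generic;
using StackExchange.Redis;

// Trimmed copy of the model class, repeated here only to keep the sketch self-contained.
public class SupportInquiry
{
    public long Id { get; set; }
    public bool IsClosed { get; set; }
    public string Description { get; set; }
}

public static class ProfileStore
{
    public static void Store(IDatabase db, long userId, HashEntry[] basicData,
        IEnumerable<string> interestTags, IEnumerable<SupportInquiry> inquiries, TimeSpan ttl)
    {
        string prefix = $"user:{userId}"; // hypothetical key prefix

        // Basic data: one hash with one field per simple property.
        db.HashSet($"{prefix}:basic", basicData);
        db.KeyExpire($"{prefix}:basic", ttl);

        // Interests: a set of plain strings, so duplicates collapse automatically.
        foreach (string tag in interestTags)
        {
            db.SetAdd($"{prefix}:interests", tag);
        }
        db.KeyExpire($"{prefix}:interests", ttl);

        // Support inquiries: one hash per inquiry, plus a list holding the ids in order.
        db.KeyDelete($"{prefix}:inquiries"); // avoid duplicate ids when storing again
        foreach (SupportInquiry inquiry in inquiries)
        {
            string inquiryKey = $"{prefix}:inquiry:{inquiry.Id}";
            db.HashSet(inquiryKey, new[]
            {
                new HashEntry("Id", inquiry.Id),
                new HashEntry("IsClosed", inquiry.IsClosed),
                new HashEntry("Description", inquiry.Description ?? string.Empty),
            });
            db.KeyExpire(inquiryKey, ttl);
            db.ListRightPush($"{prefix}:inquiries", inquiry.Id);
        }
        db.KeyExpire($"{prefix}:inquiries", ttl);
    }
}
```

Each call is a separate round trip here. In a real application the writes could probably be grouped in a batch or transaction, but that is left out to keep the mapping itself easy to see.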

Code example

For my example of how to achieve all this, consider the following steps as a process for loading and storing data when a user logs in:

  1. Preload various user data from backend systems.
  2. Generate a GUID as a user token. Store it in Redis with the user id as value.
  3. Store the user data in Redis, using appropriate data types. The keys contain the user id.
  4. Add the user token to the user's claims identity for storing in a cookie.
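Steps 2 and 4 could look something like the following sketch. The key format token:{guid}, the claim type "user_token" and the "Cookies" authentication type are placeholders I picked for illustration. The real names depend on how the authentication middleware is set up.

```csharp
using System;
using System.Security.Claims;
using StackExchange.Redis;

public static class LoginSketch
{
    public static ClaimsIdentity CreateIdentity(IDatabase db, long userId, TimeSpan tokenTtl)
    {
        // Step 2: a random token stored as a small hash that points to the user id.
        string userToken = Guid.NewGuid().ToString("N");
        string tokenKey = $"token:{userToken}";
        db.HashSet(tokenKey, "UserId", userId);
        db.KeyExpire(tokenKey, tokenTtl); // the token disappears unless it is renewed

        // Step 4: the token travels in the authentication cookie as a claim.
        var identity = new ClaimsIdentity("Cookies");
        identity.AddClaim(new Claim(ClaimTypes.NameIdentifier, userId.ToString()));
        identity.AddClaim(new Claim("user_token", userToken));
        return identity;
    }
}
```

The returned identity would then be handed to whatever sign-in call the application uses to issue the cookie.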

Whenever a request hits the webserver, the following steps could be a way to validate the session and load the data:

  1. Look up the user token, from the claims cookie, in Redis. Validate that it exists. If not, log out the user.
  2. Load the user data from Redis (lazily or eagerly).
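The validation step could be sketched as below. It assumes the token was stored as a hash with a UserId field under token:{guid} and that the cookie carries a "user_token" claim. Both are illustrative names, not necessarily the ones used in the sample project.

```csharp
using System;
using System.Security.Claims;
using StackExchange.Redis;

public static class TokenValidationSketch
{
    // Returns false when the user should be logged out.
    public static bool IsValid(IDatabase db, ClaimsPrincipal principal, TimeSpan tokenTtl)
    {
        string userToken = principal.FindFirst("user_token")?.Value;
        string expectedUserId = principal.FindFirst(ClaimTypes.NameIdentifier)?.Value;
        if (userToken == null || expectedUserId == null)
        {
            return false;
        }

        string tokenKey = $"token:{userToken}";
        RedisValue storedUserId = db.HashGet(tokenKey, "UserId");

        // Unknown or expired token, or a token that belongs to a different user.
        if (storedUserId.IsNull || storedUserId.ToString() != expectedUserId)
        {
            return false;
        }

        // Renew the TTL so the token works as a sliding expiration.
        db.KeyExpire(tokenKey, tokenTtl);
        return true;
    }
}
```

Only the small token hash is read on every request. The profile data itself can still be loaded lazily, when a page actually needs it.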

In the following example, showing how I store the objects in Redis, I use the quite popular library StackExchange.Redis. This is a high-performing library that reflects the complete set of commands, without adding too much complexity.

First, the two model classes shown above are the ones I use in my sample code. They consist of simple-typed properties that can be transformed into Redis hash values.

This is the service class that performs the loading and storing of data in Redis:
Link to Gist

I used two extension methods, ToHashEntries and ConvertFromRedis, to serialize and deserialize any object to and from Redis hashes. They look like this:
Link to Gist
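For readers without access to the Gist, a pair of extension methods with these names could be sketched with reflection as follows. This is my own simplified version and may differ from the linked code. It assumes flat classes with simple property types (string, long, bool, DateTime and similar) and does not handle nullable or nested types.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Reflection;
using StackExchange.Redis;

public static class RedisHashSketch
{
    // Turns every readable, non-null public property into one hash field.
    public static HashEntry[] ToHashEntries(this object obj)
    {
        return obj.GetType()
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0 && p.GetValue(obj) != null)
            .Select(p => new HashEntry(
                p.Name,
                Convert.ToString(p.GetValue(obj), CultureInfo.InvariantCulture)))
            .ToArray();
    }

    // Rebuilds an object from hash fields; fields without a matching property are ignored.
    public static T ConvertFromRedis<T>(this HashEntry[] entries) where T : class, new()
    {
        var fields = entries.ToDictionary(e => e.Name.ToString(), e => e.Value.ToString());
        var result = new T();

        foreach (PropertyInfo property in typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (!property.CanWrite || !fields.TryGetValue(property.Name, out string text))
            {
                continue;
            }

            // Convert.ChangeType covers the simple types used in the model classes.
            // DateTime values lose sub-second precision in this text round trip.
            object value = Convert.ChangeType(text, property.PropertyType, CultureInfo.InvariantCulture);
            property.SetValue(result, value);
        }

        return result;
    }
}
```

Usage would be along the lines of db.HashSet(key, basicData.ToHashEntries()) when storing and db.HashGetAll(key).ConvertFromRedis<BasicData>() when loading.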

To show how the user profile is loaded and stored, have a look at this AccountController. The authentication part is a mockup made to show the concept of the login process:
Link to Gist

Here is a custom identity validator I created to extend the default cookie authentication provider. It ensures that Redis knows of a user token matching the one in the authentication cookie. If Redis does not have a user token hash key matching that user token, or if it points to another user id than expected, then the user will be logged out:
Link to Gist

Here is the startup class, showing registration of the custom identity validator and adding a simple role policy (for logged in users):
Link to Gist

This controller is made to show one simple way of consuming the stored user data:
Link to Gist

In my sample code the profile data would rarely expire, as the TTL is renewed on every load. And the TTL for the profile data is considerably longer than for the user token, which is renewed on every request.

But what if the profile data would have to be reloaded within an active session? And what if multiple requests for these data would be launched simultaneously?

With the current solution, this might trigger multiple simultaneous calls to backend systems and lead to race conditions in Redis. Therefore a simple "write lock flag" could be considered to signal that other threads should block their requests, while the first thread finishes loading and storing the data. The other threads can then try again loading the data from Redis.

This lock solution will potentially block requests if they need access to user data. Other threads will not be blocked or served sequentially. However, the blocked threads will only be blocked until the first thread completes loading and storing. After that, all threads are again served in parallel.
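A rough sketch of such a write lock flag is shown below, using the LockTake and LockRelease methods of StackExchange.Redis, where LockTake only succeeds if the lock key does not exist yet. The key names, the retry count, the wait time and the loadFromBackend delegate are assumptions for illustration.

```csharp
using System;
using System.Threading;
using StackExchange.Redis;

public static class ProfileReloadSketch
{
    public static HashEntry[] GetBasicData(IDatabase db, long userId,
        Func<long, HashEntry[]> loadFromBackend, TimeSpan dataTtl)
    {
        string dataKey = $"user:{userId}:basic";
        string lockKey = $"user:{userId}:loading";
        string lockId = Guid.NewGuid().ToString("N"); // identifies this caller as lock owner

        for (int attempt = 0; attempt < 50; attempt++)
        {
            // Another request may have stored the data in the meantime.
            HashEntry[] cached = db.HashGetAll(dataKey);
            if (cached.Length > 0)
            {
                return cached;
            }

            // Only one caller gets the lock; it expires by itself if that caller crashes.
            if (db.LockTake(lockKey, lockId, TimeSpan.FromSeconds(10)))
            {
                try
                {
                    HashEntry[] fresh = loadFromBackend(userId);
                    if (fresh.Length > 0)
                    {
                        db.HashSet(dataKey, fresh);
                        db.KeyExpire(dataKey, dataTtl);
                    }
                    return fresh;
                }
                finally
                {
                    db.LockRelease(lockKey, lockId);
                }
            }

            // Someone else is loading: wait briefly, then check Redis again.
            Thread.Sleep(100);
        }

        throw new TimeoutException("Profile data was not available in time.");
    }
}
```

Thread.Sleep keeps the sketch short. In a web application the asynchronous variants (LockTakeAsync, HashGetAllAsync, Task.Delay) would likely be the better fit, so waiting requests do not hold on to threads.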

Summing up

Now you know what I think about session state in ASP.Net applications, even about the shared ones. Replacing session state with a data caching layer is just one possible application of Redis. It could also be applied for output caching of responses, data queuing (replacing large message queues), counting stuff, doing leaderboards etc.

So, I personally find Redis very exciting as a scalability lever. I might even follow up with more applications of it.

And just one thing. Don't use my sample code as-is! It is purely made for demonstration purposes.

You can find the source code here.

Do you have Redis in your projects? If so, how do you use it?