#321 HTTP Caching pro

Jan 30, 2012 | 15 minutes | Performance, Views, Caching

With HTTP response headers you can control the cache in the user's browser and proxies. Etags, Last-Modified, Cache-Control and Rack::Cache are all covered here.

HTTP caching involves a cache that is managed through HTTP headers. This cache is often stored in the user’s web browser, but not always. If you’re unfamiliar with HTTP caching there’s an excellent tutorial by Mark Nottingham that goes through the basics and which is well worth taking the time to read.

This kind of caching can make a web site seem much faster because the browser is reading files from a cache that’s close to the user. How can we apply this kind of cache to a Rails application? Rails has some HTTP caching enabled by default so let’s start by seeing what it provides. We’ll use the curl command to see what headers are returned by the response to a request made to a local Rails app running in development mode.

          terminal
        
$ curl -I http://localhost:3000/
HTTP/1.1 200 OK 
Content-Type: text/html; charset=utf-8
X-Ua-Compatible: IE=Edge
Etag: "d3132af4f574ff53eb69e6fa5523fe2a"
Cache-Control: max-age=0, private, must-revalidate
X-Request-Id: 2cb31eeec2b1beda8e1342f8caa90acd
X-Runtime: 0.018216
Content-Length: 0
Server: WEBrick/1.3.1 (Ruby/1.9.2/2011-07-09)
Date: Fri, 03 Feb 2012 21:09:22 GMT
Connection: Keep-Alive
Set-Cookie: _store_session=BAh7B0kiD3Nlc3Npb25faWQGOgZFRkkiJWQ2N2U5MGEwMWZhNGU4ZTI2NzIxOTYxOWE1ODYyNzcxBjsAVEkiEF9jc3JmX3Rva2VuBjsARkkiMVhnSG9qT0ZiT0pJYURtNU15WXM0RkRxVUx1V05DZ0F2aFpoMkVIYW85YTA9BjsARg%3D%3D--7cf9c8c8b4f59426186fc8ae01a960a9a02378aa; path=/; HttpOnly

Etag

The header we’ll focus on first is Etag. This header’s value is a unique string whose value is based on the content in the response body. If we make a request to the same URL twice and the response body is different the second time then the Etag header will have a different value each time but if the two responses are identical then the Etag values will match. This header is generated by default in a Rails app.

Given this, if we run the curl command above again we might expect to get the same Etag value, but we don’t.

          terminal
        
$ curl -I http://localhost:3000/
HTTP/1.1 200 OK 
Content-Type: text/html; charset=utf-8
X-Ua-Compatible: IE=Edge
Etag: "2022a6964da4596038f5454b1929a2e1"
(rest of headers omitted)

The values are different because the response body changes slightly each time we make a request. If we view the source of the page, particularly the head section, we’ll see the part that changes.

          html
        
<head>
  <title>Store</title>
  <link href="/assets/application.css?body=1" media="all" rel="stylesheet" type="text/css" />
  <link href="/assets/products.css?body=1" media="all" rel="stylesheet" type="text/css" />
  <script src="/assets/jquery.js?body=1" type="text/javascript"></script>
  <script src="/assets/jquery_ujs.js?body=1" type="text/javascript"></script>
  <script src="/assets/products.js?body=1" type="text/javascript"></script>
  <script src="/assets/application.js?body=1" type="text/javascript"></script>
  <meta content="authenticity_token" name="csrf-param" />
  <meta content="wu8YWep8ZJvxAk0mjb0RbMYhVgivucMnorR94grDUnc=" name="csrf-token" />
</head>

There’s a meta tag in the head section of the page called csrf-token. This tag has a content attribute that contains a unique string which is different for each new visitor. If we reload the page in the browser, however, the value of this attribute remains the same each time. This is because the browser is sent a session cookie to identify its session and this is sent back to the server each time. The curl command doesn’t store or send cookies by default and so each request will be treated as a separate session and the csrf-token’s value will change each time. We can simulate the way a browser works by telling curl to store any cookies sent by the response in a cookie jar.

          terminal
        
$ curl -I http://localhost:3000/ -c cookies.txt
HTTP/1.1 200 OK 
Content-Type: text/html; charset=utf-8
X-Ua-Compatible: IE=Edge
Etag: "10537767ffa7661930cbb10aa243fc9c"
Cache-Control: max-age=0, private, must-revalidate
(rest of headers omitted)

We can then send these cookies back with the next request by using curl’s -b option. When we do this we should get the same Etag value as before.

          terminal
        
$ curl -I http://localhost:3000/ -b cookies.txt
HTTP/1.1 200 OK 
Content-Type: text/html; charset=utf-8
X-Ua-Compatible: IE=Edge
Etag: "10537767ffa7661930cbb10aa243fc9c"
Cache-Control: max-age=0, private, must-revalidate
(rest of headers omitted)

This time the two values are the same as the requests are considered to be part of the same session.
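
As a rough illustration of why the session matters, the sketch below derives a tag from a response body with an MD5 digest. The method name body_etag and the sample markup are made up for this example, and hashing the body this way is an assumption about the general approach rather than the exact code Rails runs.

```ruby
require "digest/md5"

# Hypothetical helper: derive a quoted Etag value from a response body.
def body_etag(body)
  %("#{Digest::MD5.hexdigest(body)}")
end

# Two bodies that differ only in the CSRF token, as happens across sessions.
page_for_session_a = '<meta content="token-a" name="csrf-token" />'
page_for_session_b = '<meta content="token-b" name="csrf-token" />'

puts body_etag(page_for_session_a) == body_etag(page_for_session_a) # true: same body, same tag
puts body_etag(page_for_session_a) == body_etag(page_for_session_b) # false: different token, different tag
```

Any change to the body, however small, produces a different tag, which is why the token in the head section was enough to change the header.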

So what does this tag have to do with caching? When the browser caches the response it will assign the value of the Etag header to it. If the user requests the same URL again the browser will send the Etag as a header called If-None-Match. If we include this header in the curl command we’ll get a different response.

          terminal
        
$ curl -I http://localhost:3000/ -b cookies.txt --header 'If-None-Match: "10537767ffa7661930cbb10aa243fc9c"'
HTTP/1.1 304 Not Modified 
X-Ua-Compatible: IE=Edge
Etag: "10537767ffa7661930cbb10aa243fc9c"
Cache-Control: max-age=0, private, must-revalidate
X-Request-Id: 3e97d449f952485c58e84cc72b507c6e
X-Runtime: 0.014820
Server: WEBrick/1.3.1 (Ruby/1.9.2/2011-07-09)
Date: Fri, 03 Feb 2012 21:52:54 GMT
Set-Cookie: _store_session=BAh7B0kiD3Nlc3Npb25faWQGOgZFRkkiJTBiY2NhOGNlNzhjZTIwYzMwZjQxMDhkNjc5ZDdiNWI3BjsAVEkiEF9jc3JmX3Rva2VuBjsARkkiMWZyN0s3VEZWR29VYUR1QlgvRXhZQ1FZNHEzaUtuQkMrc2lWeWUvV250K3c9BjsARg%3D%3D--db383e0d358dc803daa47904f20f47911c235c68; path=/; HttpOnly

When we include this header we get a 304 Not Modified response instead of 200 OK. This tells the browser that its cache is up to date and that it can read the file from there. This makes the site appear much faster to the user as the response is sent from the local cache instead of being downloaded again.

Even though this request now appears quicker to the end user, the server still has to generate the full response so that it can generate the Etag to send back, and so the request takes as long to process on the server as the uncached request. This type of caching won’t save any resources on the server; it just makes the response appear faster to the user.
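
The exchange can be sketched in plain Ruby. This is a simplified model, not Rails’ own code: conditional_get is a hypothetical method, and the block stands in for rendering the page. It shows why the server saves nothing here, as the body has to be built before its tag can be compared.

```ruby
require "digest/md5"

# Hypothetical stand-in for the server side of a conditional GET.
# The block plays the role of rendering the page; it always runs,
# because the tag is computed from the finished body.
def conditional_get(if_none_match)
  body = yield
  etag = %("#{Digest::MD5.hexdigest(body)}")
  if if_none_match == etag
    [304, { "ETag" => etag }, ""]   # the browser may reuse its cached copy
  else
    [200, { "ETag" => etag }, body] # send the full page
  end
end

renders = 0
status, headers, _body = conditional_get(nil) { renders += 1; "<h1>Store</h1>" }
puts status  # 200
status, _headers, _body = conditional_get(headers["ETag"]) { renders += 1; "<h1>Store</h1>" }
puts status  # 304
puts renders # 2: the body was built both times
```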

It’s possible to make things more efficient on the server by customizing the way an Etag is generated. We’ll do this in an application’s ProductsController’s show action. This is a page that displays information about a single product.

The code for this action looks like this:

          /app/controllers/products_controller.rb
        
def show
  @product = Product.find(params[:id])
end

This is a simple action that fetches a Product by its id. If we know that all of the dynamic content on the page is based on this product we can use it to generate the Etag. To do this we use the fresh_when method and pass it the etag option.

          /app/controllers/products_controller.rb
        
def show
  @product = Product.find(params[:id])
  fresh_when etag: @product
end

This will generate an Etag header based on that product instead of the page that would be rendered. Internally this will call cache_key on the product to determine the tag’s value, which is based on the product’s updated_at value. This means that when a product is updated its Etag will change and so any cache based on it will automatically be expired.

The fresh_when method does a couple of things. First it checks that the Etag passed in with the request matches the Etag for the product. If it does then the cache is fresh so the default renderer will be changed to send a 304 Not Modified response instead of rendering the actual template for the show action. If the Etags don’t match the page will be rendered as normal and returned to the client.
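
A minimal sketch of that flow, outside Rails, might look like the following. The Product struct, the cache_key format and the show method are all assumptions made for illustration; the real fresh_when lives in Action Controller and does more than this.

```ruby
require "digest/md5"

# Hypothetical stand-in for an Active Record model.
Product = Struct.new(:id, :updated_at) do
  # Assumed key format: model name, id and update timestamp.
  def cache_key
    "products/#{id}-#{updated_at.utc.strftime('%Y%m%d%H%M%S')}"
  end
end

def etag_for(record)
  %("#{Digest::MD5.hexdigest(record.cache_key)}")
end

# Returns 304 without calling the block (the "template") when the tag matches.
def show(product, if_none_match)
  etag = etag_for(product)
  return [304, { "ETag" => etag }, ""] if if_none_match == etag
  [200, { "ETag" => etag }, yield(product)]
end

product = Product.new(1, Time.utc(2012, 2, 3, 20, 53, 43))
status, headers, _body = show(product, nil) { |record| "Product #{record.id}" }
puts status # 200
status, _headers, _body = show(product, headers["ETag"]) { raise "template should not render" }
puts status # 304
product.updated_at = Time.utc(2012, 2, 4)
status, _headers, _body = show(product, headers["ETag"]) { |record| "Product #{record.id}" }
puts status # 200: the update changed the tag
```

The point of the design is that the tag is known before any rendering happens, so a matching request can be answered without touching the template.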

We can try this out with a curl request. We no longer need to send the cookies as the Etag is now dependent on the product and not the response body but we need to change the URL so that it points to the page for a product.

          terminal
        
$ curl -I http://localhost:3000/products/1
HTTP/1.1 200 OK 
Etag: "71c9d1d4a8215bd644a5910e17abba30"
Content-Type: text/html; charset=utf-8
Cache-Control: max-age=0, private, must-revalidate
X-Ua-Compatible: IE=Edge
X-Request-Id: b8cc044bdb08d6447d674c00ea976219
X-Runtime: 0.077276
Content-Length: 0
Server: WEBrick/1.3.1 (Ruby/1.9.2/2011-07-09)
Date: Fri, 03 Feb 2012 22:41:44 GMT
Connection: Keep-Alive
Set-Cookie: _store_session=BAh7B0kiD3Nlc3Npb25faWQGOgZFRkkiJWYzNDM2NGUxNTlmZWRmYjBlNzIxN2UwZTAzZWM4NWRlBjsAVEkiEF9jc3JmX3Rva2VuBjsARkkiMW9vWjdRVEdnYml5aDVndGtaUkxFUG1LbDhlbThMOWFOWkU0ejNkVVNZMTg9BjsARg%3D%3D--57ee4b7dd461e0fe728176a4557ff90894842ff5; path=/; HttpOnly

When we visit the page again the Etag remains the same as the product hasn’t been updated since the last request.

          terminal
        
$ curl -I http://localhost:3000/products/1
HTTP/1.1 200 OK 
Etag: "71c9d1d4a8215bd644a5910e17abba30"
Content-Type: text/html; charset=utf-8
Cache-Control: max-age=0, private, must-revalidate
(rest of headers omitted)

If we pass this Etag in an If-None-Match header it will return 304 Not Modified but this time the template won’t be rendered on the server because of the fresh_when method in the controller.

          terminal
        
$ curl -I http://localhost:3000/products/1 --header 'If-None-Match: "71c9d1d4a8215bd644a5910e17abba30"'
HTTP/1.1 304 Not Modified 
Etag: "71c9d1d4a8215bd644a5910e17abba30"
Cache-Control: max-age=0, private, must-revalidate
X-Ua-Compatible: IE=Edge
X-Request-Id: d119bff19b85bd9c8b0f7aac90d57cec
X-Runtime: 0.003060
Server: WEBrick/1.3.1 (Ruby/1.9.2/2011-07-09)
Date: Fri, 03 Feb 2012 22:49:49 GMT

We can see this by looking at the development log. This shows that the first request rendered the show template but that because we sent the right Etag value with the second request the show template wasn’t rendered and 304 Not Modified was returned which saved some processing on the server.

          terminal
        
$ tail -n 14 log/development.log 

Started HEAD "/products/1" for 127.0.0.1 at 2012-02-03 22:46:15 +0000
Processing by ProductsController#show as */*
  Parameters: {"id"=>"1"}
  Product Load (0.1ms)  SELECT "products".* FROM "products" WHERE "products"."id" = ? LIMIT 1  [["id", "1"]]
  Rendered products/show.html.erb within layouts/application (1.1ms)
Completed 200 OK in 9ms (Views: 7.6ms | ActiveRecord: 0.1ms)


Started HEAD "/products/1" for 127.0.0.1 at 2012-02-03 22:49:49 +0000
Processing by ProductsController#show as */*
  Parameters: {"id"=>"1"}
  Product Load (0.1ms)  SELECT "products".* FROM "products" WHERE "products"."id" = ? LIMIT 1  [["id", "1"]]

The fresh_when method works well in our show action because we’re using the default rendering behaviour. If we had a respond_to block in the action’s code this approach wouldn’t work as well because this explicitly renders something and it won’t fall back to the default renderer. For these cases there’s a stale? method that we can use. This will handle any explicit rendering if the Etags don’t match or return 304 Not Modified if they do.

          /app/controllers/products_controller.rb
        
def show
  @product = Product.find(params[:id])
  if stale? etag: @product
    respond_to do |format|
      #.....
    end
  end
end

We don’t need to use this in our show action so we’ll stick with fresh_when. One cool feature of it that we’ve not mentioned here is that it allows us to pass in an array of multiple objects. If a page’s dynamic content is dependent on multiple objects we can pass them in and the Etag will be based on all of them. If we want the Etag to be based on the product and also the current user we could write something like this.

          /app/controllers/products_controller.rb
        
def show
  @product = Product.find(params[:id])
  fresh_when etag: [@product, current_user]
end
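
To see why a tag built from several objects changes when any one of them does, here is a small sketch. The structs and the combined_etag helper are hypothetical, and the way the key is assembled is an assumption for illustration, not the format Rails uses.

```ruby
require "digest/md5"

# Hypothetical stand-ins for the model and the signed-in user.
Product = Struct.new(:id, :updated_at)
User    = Struct.new(:id)

# Hypothetical helper: fold several objects into a single tag.
# A nil entry (no signed-in user) contributes an empty segment.
def combined_etag(*objects)
  key = objects.map do |object|
    object.nil? ? "" : "#{object.class.name}/#{object.to_a.join('-')}"
  end.join("|")
  %("#{Digest::MD5.hexdigest(key)}")
end

product = Product.new(1, Time.utc(2012, 2, 3, 20, 53, 43))
alice   = User.new(1)
bob     = User.new(2)

puts combined_etag(product, alice) == combined_etag(product, alice) # true
puts combined_etag(product, alice) == combined_etag(product, bob)   # false: each user gets a tag of their own
puts combined_etag(product, alice) == combined_etag(product, nil)   # false: signed-out visitors differ too
```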

Last-Modified

That pretty much covers Etags but there’s another header that goes with Etag called Last-Modified. Like Etag, we can set this header through the fresh_when method and when we use it we should set it to the time that the document that’s returned was last modified.

          /app/controllers/products_controller.rb
        
def show
  @product = Product.find(params[:id])
  fresh_when etag: @product, last_modified: @product.updated_at
end

When we make a request to the product’s page now the response will include the Last-Modified header with a value showing the time at which that product was last modified.

          terminal
        
$ curl -I http://localhost:3000/products/1
HTTP/1.1 200 OK 
Etag: "71c9d1d4a8215bd644a5910e17abba30"
Last-Modified: Fri, 03 Feb 2012 20:53:43 GMT
(rest of headers omitted)

The browser can use this header in a similar way to Etags to determine if its cache is still up to date. It does this by sending a request header called If-Modified-Since.

          terminal
        
$ curl -I http://localhost:3000/products/1 --header 'If-Modified-Since: Fri, 03 Feb 2012 21:05:03 GMT'
HTTP/1.1 304 Not Modified 
Etag: "71c9d1d4a8215bd644a5910e17abba30"
Last-Modified: Fri, 03 Feb 2012 20:53:43 GMT
(rest of headers omitted)

As the page hasn’t been modified since the date we sent in If-Modified-Since we get a 304 Not Modified response. If we pass in a time earlier than the time that the product was last modified we’ll get a full 200 OK response.

          terminal
        
$ curl -I http://localhost:3000/products/1 --header 'If-Modified-Since: Fri, 03 Feb 2012 20:45:00 GMT'
HTTP/1.1 200 OK 
Etag: "71c9d1d4a8215bd644a5910e17abba30"
Last-Modified: Fri, 03 Feb 2012 20:53:43 GMT
(rest of headers omitted)
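
The comparison behind these two responses can be sketched with the standard library’s Time.httpdate. The not_modified? helper is hypothetical, and treating an unparseable date as a cache miss is an assumption of this sketch rather than documented Rails behaviour.

```ruby
require "time"

# Hypothetical helper mirroring the date comparison described above.
# An absent or unparseable header is treated as "no cached copy".
def not_modified?(last_modified, if_modified_since)
  return false if if_modified_since.nil?
  last_modified <= Time.httpdate(if_modified_since)
rescue ArgumentError
  false
end

updated_at = Time.utc(2012, 2, 3, 20, 53, 43)
puts not_modified?(updated_at, "Fri, 03 Feb 2012 21:05:03 GMT") # true:  304 Not Modified
puts not_modified?(updated_at, "Fri, 03 Feb 2012 20:45:00 GMT") # false: 200 OK
puts updated_at.httpdate                                         # Fri, 03 Feb 2012 20:53:43 GMT
```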

It’s a good idea to set both the etag and last_modified options when you can. If we’re using Rails 3.2 we can do this in a more concise way by passing the object directly in to fresh_when.

          /app/controllers/products_controller.rb
        
def show
  @product = Product.find(params[:id])
  fresh_when @product
end

There are some things that we can do with the Last-Modified header that we can’t do as easily with Etag. We’ll demonstrate this in the index action which displays a list of products. First, a slight tangent on how we go about fetching all the products.

          /app/controllers/products_controller.rb
        
def index
  @products = Product.all
end

In this action we use Product.all to fetch all the products. This performs the database query immediately in the controller. When we’re working with caching we should try to perform the database query as late as possible. We don’t need the list of products until they’re rendered out in the view and so we shouldn’t perform the query here. If we use the order method this will defer the actual query. Alternatively we can use scoped to fetch all the products. This is similar to all but doesn’t perform the database query until we try to work with the products. We can use this in conjunction with fresh_when to set the Last-Modified header to the updated_at time for the most recently updated product.

          /app/controllers/products_controller.rb
        
def index
  @products = Product.scoped
  fresh_when last_modified: @products.maximum(:updated_at)
end

This code does make a database query but it only fetches one quick value to determine if the cache is up-to-date. If it is then the action doesn’t need to render the whole view template and therefore we don’t need to fetch all the products from the database. This may or may not give us a big performance boost depending on how many requests come in from browsers that already have this page in their cache. It’s always a good idea to test and benchmark this kind of change to see if it has any real world benefits.
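
The saving can be modelled without a database. In the sketch below LazyProducts is a hypothetical stand-in for a deferred query, with a counter to show when the full list is loaded; the product names and the index method are made up for the example.

```ruby
# Hypothetical stand-in for a lazily loaded relation such as Product.scoped.
class LazyProducts
  attr_reader :full_loads

  def initialize(rows)
    @rows = rows
    @full_loads = 0
  end

  # Cheap aggregate, standing in for a query that selects MAX(updated_at).
  def maximum_updated_at
    @rows.map { |row| row[:updated_at] }.max
  end

  # Expensive full load, standing in for the query run when the view iterates.
  def to_a
    @full_loads += 1
    @rows.dup
  end
end

# Simplified index action: only touches the full list when the cache is stale.
def index(products, if_modified_since)
  last_modified = products.maximum_updated_at
  if if_modified_since && last_modified <= if_modified_since
    [304, ""]
  else
    [200, products.to_a.map { |row| row[:name] }.join(", ")]
  end
end

products = LazyProducts.new([
  { name: "Product A", updated_at: Time.utc(2012, 2, 3, 20, 53, 43) },
  { name: "Product B", updated_at: Time.utc(2012, 2, 1, 9, 0, 0) }
])

status, _body = index(products, Time.utc(2012, 2, 3, 21, 5, 3))
puts status              # 304
puts products.full_loads # 0: the product list was never loaded
```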

Cache-Control

So far we’ve covered two different response headers, Etag and Last-Modified. There’s one more that we’ll cover in this episode, Cache-Control. We can set a variety of options through this header to determine how the caching should behave. Rails sets some default values for this header automatically.

          text
        
Cache-Control: max-age=0, private, must-revalidate

We’ll look at the last value, must-revalidate, first. This value means that once the cached copy is stale the browser must check with the server before serving it, to ensure that it’s serving an up-to-date version. The max-age option specifies the number of seconds for which the cached copy can be served to the user before the browser contacts the server to revalidate it. With max-age set to 0 the copy is stale immediately, so the browser checks with the server every time. If this value were set to 30 rather than 0 then the cached file could be served to the user locally for 30 seconds after the file is first downloaded. After that, if the page is requested again the browser should contact the server to make sure that the cached version of the page is still valid. We can customize this max-age option by using the expires_in method and passing it a duration.

          /app/controllers/products_controller.rb
        
def show
  @product = Product.find(params[:id])
  expires_in 5.minutes
  fresh_when @product
end

When we make the same request again now we’ll see that the max-age option is now set to 300 seconds.

          terminal
        
$ curl -I http://localhost:3000/products/1 --header 'If-Modified-Since: Fri, 03 Feb 2012 20:45:00 GMT'
HTTP/1.1 200 OK 
Etag: "71c9d1d4a8215bd644a5910e17abba30"
Last-Modified: Fri, 03 Feb 2012 20:53:43 GMT
Content-Type: text/html; charset=utf-8
Cache-Control: max-age=300, private
(rest of headers omitted)

This means that the local cache will be considered fresh for 300 seconds. After that the browser will check with the server to see if the page is still valid. If it is then the cached version will be considered valid for another 300 seconds.
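
The browser’s side of this decision can be sketched as a simple age check. Both helpers below are hypothetical and ignore the other Cache-Control directives; they only show how max-age divides cached responses into fresh and stale.

```ruby
# Hypothetical helper: pull the max-age value (in seconds) out of a header.
def max_age_from(cache_control)
  seconds = cache_control[/max-age=(\d+)/, 1]
  seconds && seconds.to_i
end

# A cached copy is fresh while its age is below max-age.
def fresh?(cache_control, stored_at, now)
  max_age = max_age_from(cache_control)
  !max_age.nil? && (now - stored_at) < max_age
end

stored_at = Time.utc(2012, 2, 3, 21, 0, 0)
puts fresh?("max-age=300, private", stored_at, stored_at + 120)          # true:  served from the cache
puts fresh?("max-age=300, private", stored_at, stored_at + 301)          # false: revalidate with the server
puts fresh?("max-age=0, private, must-revalidate", stored_at, stored_at) # false: always revalidate
```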

We have one more option to cover: private. This value means that the cache should only be stored for this specific user, usually in their web browser, and not stored in a place where multiple people access it such as through a proxy. We can customize this behaviour in any of the caching methods: expires_in, fresh_when or stale?. All we need to do is use the public option and set it to true.

          /app/controllers/products_controller.rb
        
def show
  @product = Product.find(params[:id])
  expires_in 5.minutes
  fresh_when @product, public: true
end

When we fetch that page again we’ll see that the Cache-Control header now says public instead of private.

          terminal
        
$ curl -I http://localhost:3000/products/1
HTTP/1.1 200 OK 
Etag: "71c9d1d4a8215bd644a5910e17abba30"
Last-Modified: Fri, 03 Feb 2012 20:53:43 GMT
Content-Type: text/html; charset=utf-8
Cache-Control: max-age=300, public
(rest of headers omitted)

This means that this page can be cached in other places besides the user’s browser, such as in a proxy.

Rack::Cache

Speaking of proxies Rails now automatically includes Rack::Cache in production. This is known as a reverse proxy cache or gateway cache but don’t let the name confuse you, the core concept is fairly simple. There’s an excellent guide by Ryan Tomayko called “Things Caches Do” which explains this nicely. Rack::Cache sits on the server between the users’ requests and a Rails application. Normally HTTP caches are stored on the user’s side in their browser but if Rack::Cache is installed and the cache is marked as public the response will be stored inside Rack::Cache as well. There’s a nice example of this in Ryan’s article.

If Alice makes a request to a server and the response for this request isn’t cached, the request will pass through Rack::Cache to the Rails backend. The response is tagged with max-age=600 so the response is cached for ten minutes and, as the cache is marked as public, the response will be cached in Rack::Cache too before it’s sent to Alice.

Let’s say that within that ten minutes another user called Bob comes along and he makes the same request. Even though he doesn’t have this page cached in his browser Rack::Cache does so the Rails app won’t be hit as Rack::Cache knows that its cached version is still fresh.

When Rack::Cache has a fresh copy of the requested resource it will return it.

Rack::Cache also supports validation with the Last-Modified and Etag headers and there’s more information about these in Ryan’s article. We can think of Rack::Cache as a mini in-between browser. It will cache just like a browser does but it works for all users and only for responses marked as public.
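
The Alice and Bob scenario can be sketched with a toy middleware. TinyGatewayCache is a hypothetical class written for this illustration and is far simpler than Rack::Cache: it only looks at the path, ignores validation headers, and keeps everything in a hash. The injectable clock is an assumption added so that the passage of time can be simulated.

```ruby
# Hypothetical, heavily simplified gateway cache. Not Rack::Cache itself.
class TinyGatewayCache
  attr_reader :backend_hits

  def initialize(app, clock = -> { Time.now })
    @app = app
    @clock = clock
    @store = {}
    @backend_hits = 0
  end

  def call(env)
    key = env["PATH_INFO"]
    now = @clock.call
    entry = @store[key]
    # Serve the stored response while it is still fresh.
    return entry[:response] if entry && now < entry[:expires_at]

    @backend_hits += 1
    status, headers, body = @app.call(env)
    cache_control = headers["Cache-Control"].to_s
    max_age = cache_control[/max-age=(\d+)/, 1]
    # Only public responses with a max-age are stored for other users.
    if status == 200 && cache_control.include?("public") && max_age
      @store[key] = { response: [status, headers, body], expires_at: now + max_age.to_i }
    end
    [status, headers, body]
  end
end

backend = lambda do |_env|
  [200, { "Cache-Control" => "max-age=600, public" }, ["<h1>Product</h1>"]]
end

now = Time.utc(2012, 2, 3, 21, 0, 0)
cache = TinyGatewayCache.new(backend, -> { now })

cache.call("PATH_INFO" => "/products/1") # Alice: passes through to the backend
now += 300
cache.call("PATH_INFO" => "/products/1") # Bob, five minutes later: served from the cache
puts cache.backend_hits                  # 1
now += 301
cache.call("PATH_INFO" => "/products/1") # more than ten minutes on: the entry has expired
puts cache.backend_hits                  # 2
```

If the backend returned private instead of public, nothing would be stored and every request would reach the application.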

As we mentioned earlier Rack::Cache is only available in production mode by default but if we want to try it out we can enable it in development by setting the perform_caching option to true.

          /config/environments/development.rb
        
config.action_controller.perform_caching = true

With this option set if we run rake middleware we see that the first piece of middleware in the stack is Rack::Cache.

          terminal 
        
$ rake middleware
use Rack::Cache
use ActionDispatch::Static
use Rack::Lock
(rest of stack omitted)

We’ll need to restart the server after we’ve made this change but once we have, if we visit the page for a product in a browser Rack::Cache will cache it as we’ve set the public option to true in this action. The next time we visit this page it will be served by Rack::Cache instead of by the Rails application.

Caching Sensitive Data

While using a public cache can speed up our application we need to be careful. There’s a danger that sensitive information on a page, or information that changes depending on the user that’s visiting the page, will be cached for every user so that other users see information that they shouldn’t. If there’s user-specific information on a page that we don’t want stored in a public cache, such as the csrf_meta_tag in a layout file, we can hide it dynamically so that it’s not shown if the cache for the current response is public.

          /app/views/layouts/application.html.erb
        
<%= csrf_meta_tag unless response.cache_control[:public] %>

We should do this to the flash messages that are shown, too.

          /app/views/layouts/application.html.erb
        
<% unless response.cache_control[:public] %>
  <% flash.each do |name, msg| %>
    <%= content_tag :div, msg, id: "flash_#{name}" %>
  <% end %>
<% end %>

Now that we know all about these fancy HTTP caching techniques the big question is “when should we use them?” How do we know what cache to use and when? First of all we should avoid premature optimization and wait until we know which pages are going to be hit often, then focus on caching them first. If the page doesn’t change frequently and if it doesn’t matter if it’s a little out of date then the expires_in method is the way to go. This will lead to the fastest user response because the cache doesn’t have to be validated against the server. On the other hand if the page changes more frequently then we should consider fresh_when or stale? as these give us more control over when the cache expires. Finally we should only set public to true if we don’t have sensitive information on the page.