#137 Memoization (revised)
- Download:
- source codeProject Files in Zip (55.5 KB)
- mp4Full Size H.264 Video (13.8 MB)
- m4vSmaller H.264 Video (8.53 MB)
- webmFull Size VP8 Video (11.9 MB)
- ogvFull Size Theora Video (16.5 MB)
If you find yourself running a slow method multiple times in a single request where the output is consistent you might want to consider caching its return value using a technique called Memoization. Below is a screenshot from a simple example app which executes a method and displays its output and the time it takes to process.
We call this method twice and in a view template and it takes half a second to run each time. The method itself just sleeps for half a second to simulate a complex operation then returns a value. We’ll cache this value using memoization.
class Product < ActiveRecord::Base def filesize sleep 0.5 4815162342 end end
When we first covered this topic we used the ActiveSupport::Memoizable
module but this has since been deprecated. The deprecation message tells us to “simply use Ruby memoization pattern instead”. The most common approach to doing this is to store the result in an instance variable which we can do like this:
def filesize @filesize ||= begin sleep 0.5 4815162342 end end
As we’re caching the result of more than one line of code here we’ve used a begin
end
block here along with ||=
to store the result in the instance variable the first time the method is called. When we reload the page now the method still takes half a second to run the first time but runs almost instantly the second time.
Using a begin
block like this isn’t the best approach especially if the logic inside the block is more complex. In these scenarios it’s better to move the code out into a separate method like this:
class Product < ActiveRecord::Base def filesize @filesize ||= calculate_filesize end private def calculate_filesize sleep 0.5 4815162342 end end
Now that we have the code cleaned up a little we’ll focus on the ||=
operator. How does this work? The line @filesize ||= calculate_filesize
is roughly equivalent to @filesize || @filesize = calculate_filesize
. This returns the value of the @filesize
if it isn’t nil
or false
. Otherwise it calculates the value and stores the result in @filesize
. The two forms aren’t exactly equivalent, however, there are some minor differences between them. We’ll investigate these by using irb with warnings turned on. If we try using the long form to return or set an instance variable we’ll get a warning telling us the the instance variable hasn’t been initialized. If we use the short ||=
form we won’t see a warning.
>> @a || @a = "foo" (irb):1: warning: instance variable @a not initialized => "foo" >> @b ||= "bar" => "bar"
This is even more noticeable if we try to set a local variable this way as we’ll get an error if we use the long form.
>> a || a = "foo" NameError: undefined local variable or method `a' for main:Object from (irb):3 from /Users/eifion/.rbenv/versions/1.9.3-p194/bin/irb:12:in `<main>' >> b ||= "bar" => "bar"
Something we should always keep in mind when using this approach is that the value won’t be read from the cache if it is equal to false
or nil
. If we have the calculate_filesize
method return one of these values it’s called twice when we reload the page and the page still takes a full second to load. To get around this problem we can set the cache value unless it has already been defined, like this:
def filesize @filesize = calculate_filesize unless defined? @filesize @filesize end
Now the page caches the value the first time it’s calculated and the second time it’s read from the cache correctly.
Handling Methods That Take Arguments
We probably won’t run into this scenario very often so the ||=
operator will generally work well enough. Another scenario we might encounter is that when the returned value is dependent on an argument passed into the method. We can demonstrate this by having calculate_filesize
take an argument.
class Product < ActiveRecord::Base def filesize(*args) @filesize ||= calculate_filesize(*args) end private def calculate_filesize(num = 1) sleep 0.5 4815162342 * num false end end
In the view we’ll add another call to this method that passes an argument.
<h1>Memoization</h1> <%= report_time "@product.filesize" %> <%= report_time "@product.filesize" %> <%= report_time "@product.filesize(2)" %>
When we reload the page now we can see the problem: the last call to product.filesize
has returned the same value as the others as the cached value has been returned.
To solve this we need to keep a separate cache for each different argument that is passed in. One of the way to do this is to use a hash with the argument as the key.
def filesize(*args) @filesize ||= {} @filesize[args] ||= calculate_filesize(*args) end
When we reload the page now the final snippet has the correct value and takes half a second to run as it’s being calculated rather than being fetched from the cache.
If we take this approach we should keep the arguments that are passed in as simple objects so that they work as hash keys. While we’re on the subject of hashes we’ll show you another approach to memoization which uses the hash default value. We can call Hash.new
and pass in a block which will be executed when the key isn’t found. When we set a hash key with a given argument it will trigger a miss which will execute the block and cause the value to be cached. What’s nice about this approach is that if the method returns a false
or nil
value it will be cached and used again if the method is called with the same argument again.
A More General Solution
For fun let’s see what it takes to generalize this memoization approach so that we can easily add it to any method. We want to be able to revert to just having our filesize
method again and to have a memoize
class method that we can pass a method name to memoize it.
class Product < ActiveRecord::Base def self.memoize(name) alias_method "_unmemoized_#{name}", name define_method(name) do |*args| @_memoized ||= {} @_memoized[name] ||= Hash.new do |hash, key| hash[key] = send("_unmemoized_#{name}", *key) end @_memoized[name][args] end end def filesize(num = 1) sleep 0.5 4815162342 * num end memoize :filesize end
Our new memoize
method takes the name of the method as an argument and the first thing we do is alias the method so that we can use it later. Next we define a method with that name which overrides the existing method in the class. This overriding method has a @_memoized
instance variable which defaults to a new hash. We then create a new hash key with the same name as the method so that each memoized method can have its own key, and we store the value by setting a nested key with a key that’s the name we’ve passed in and with a value that’s determined by calling the original version of the method we’ve overridden. When we reload the page now to see if this works we see the same results as before with the second call to filesize
without any arguments being cached as we’d expect.
Doing this abstraction works as an exercise but it has limited practical use. There aren’t many situations where we’d want to do this kind of thing. Most of the time we want to craft the memoization around an application’s specific needs as different apps generally require different approaches. For our application the first approach we took is good enough and doesn’t involve a lot of code. If we do want a more general approach the Memoist gem is a direct extraction of ActiveSupport::Memoizable
. This is more feature complete that what we showed in our abstraction and includes functionality such as expiring caches.
Speaking of expiring the cache if we do need to make a cache stale with our approach we can simply clear the instance variable where the cached values are stored. Expiring the cache isn’t usually necessary, however, as the cache only hangs around for the duration of the object and this is often only a dingle request. If we do want to cache something that sticks around for a longer period we should consider using Rails’ built-in caching system. Also keep in mind that Memoization isn’t free: we’re saving CPU cycles but at the cost of memory. If we’re storing cache values based on arguments this can quickly grow out of hand if the method is called with a large number of different arguments. When in doubt it’s best to profile and benchmark different scenarios and avoid premature optimization.