#22 Eager Loading (revised)
- Download:
- source codeProject Files in Zip (64.7 KB)
- mp4Full Size H.264 Video (16.6 MB)
- m4vSmaller H.264 Video (9.25 MB)
- webmFull Size VP8 Video (7.53 MB)
- ogvFull Size Theora Video (26 MB)
Eager loading is a way to improve performance by preloading associated content from the database and we’ll show you what that means in this episode. Below is a page that displays a list of products, each of which belongs to a category.
The category isn’t just a column in the products
table but is an associated record. A Product
belongs to a Category
so to display the category name in the view we need to go through the association and call product.category.name
.
This works well but if performance is an issue and we want to optimize the list page a good place to start is to look at the database queries that are made. If we look in the development log we’ll see that there are a lot of database queries being made by this request.
Product Load (0.4ms) SELECT "products".* FROM "products" ORDER BY name Category Load (0.2ms) SELECT "categories".* FROM "categories" WHERE "categories"."id" = 2 LIMIT 1 CACHE (0.0ms) SELECT "categories".* FROM "categories" WHERE "categories"."id" = 2 LIMIT 1 Category Load (0.2ms) SELECT "categories".* FROM "categories" WHERE "categories"."id" = 1 LIMIT 1 CACHE (0.0ms) SELECT "categories".* FROM "categories" WHERE "categories"."id" = 2 LIMIT 1 CACHE (0.0ms) SELECT "categories".* FROM "categories" WHERE "categories"."id" = 1 LIMIT 1 CACHE (0.0ms) SELECT "categories".* FROM "categories" WHERE "categories"."id" = 2 LIMIT 1 CACHE (0.0ms) SELECT "categories".* FROM "categories" WHERE "categories"."id" = 1 LIMIT 1 Category Load (2.6ms) SELECT "categories".* FROM "categories" WHERE "categories"."id" = 5 LIMIT 1 CACHE (0.0ms) SELECT "categories".* FROM "categories" WHERE "categories"."id" = 2 LIMIT 1 CACHE (0.0ms) SELECT "categories".* FROM "categories" WHERE "categories"."id" = 1 LIMIT 1 CACHE (0.0ms) SELECT "categories".* FROM "categories" WHERE "categories"."id" = 2 LIMIT 1 ...
This is known as the “n+1 problem” which means that we have one query to fetch the initial products and then another for each of the products to fetch its category. Thankfully, ActiveRecord has some built-in caching support and if it detects a duplicate query it will return the cached result instead of querying the database again. This can help with performance in scenarios like this and we can see that most of the queries made above have returned a result from the cache.
There is still some unnecessary overhead in our request. In general the fewer database queries we need to make per request the better the performance is, although this isn’t always the case. Whenever we’re trying to improve the performance of a page it’s always a good idea to profile the request in a setup that’s a close to the production environment as possible as any little thing could be a factor. For example if the database server is on a separate machine from the web server it will have to communicate over the network each time a query is made and this can add overhead and make the number of queries made a bigger factor in the poor performance. Profiling was covered in episode 3681; here we’ll focus on how eager loading can help to reduce the number of queries performed. The products page triggers the index
action on the ProductsController
and in here we fetch the products ordered by their name.
def index @products = Product.order("name") end
One of the easiest ways to add eager loading to this is to call includes
to include the relevant category data.
def index @products = Product.order("name").includes(:category) end
Whenever we’re unsure as to whether we should pluralize the name in the includes
call we should remember that it needs to match the name of the association. In this case a Product
belongs to a Category
and so this is the name to use. If we reload the page now it behaves like it did before but the number of database queries made is hugely reduced.
Product Load (0.3ms) SELECT "products".* FROM "products" ORDER BY name Category Load (0.3ms) SELECT "categories".* FROM "categories" WHERE "categories"."id" IN (2, 1, 5, 4, 3)
We now have one query to fetch the products and another to fetch the relevant categories for those products. We might think that a single query might be better here but it’s often more efficient to make two simple queries instead of one more complex one. As we’re using two separate queries here it’s not a good idea to use the associated data within the same query. For example what is we want to order the products based on the category’s name instead of the product’s? We could write this:
def index @products = Product.order("categories.name").includes(:category) end
This works but it isn’t a good idea. If we look at the development log we’ll see that ActiveRecord is smart enough to detect that we’re using the categories
information in the products
query and uses a single query with joins.
SQL (44.3ms) SELECT "products"."id" AS t0_r0, "products"."name" AS t0_r1, "products"."price" AS t0_r2, "products"."category_id" AS t0_r3, "products"."created_at" AS t0_r4, "products"."updated_at" AS t0_r5, "categories"."id" AS t1_r0, "categories"."name" AS t1_r1, "categories"."created_at" AS t1_r2, "categories"."updated_at" AS t1_r3 FROM "products" LEFT OUTER JOIN "categories" ON "categories"."id" = "products"."category_id" ORDER BY categories.name
This approach is a little unreliable as it involves having to interpret the SQL string that’s passed in and it will be deprecated in Rails 4. If we need to use associated data in a query we should use joins
instead of includes
as this way we’re guaranteed to use a join in a single query.
def index @products = Product.order("categories.name").joins(:category) end
This won’t do quite as much for us as it won’t try to populate the associated data in ActiveRecord. For example in our view when we loop through the products and display its category’s name joins
won’t populate this information so a separate query will be triggered to get the category information for each product. We can, however, use joins
to improve performance further as it can be used to fetch just the data we need. We’ll often see joins
used with select
to limit the data that’s fetched and we’ll use them now to fetch all the products data but only the name column from the categories
table, which we’ll alias as category_name
.
def index @products = Product.order("categories.name").joins(:category).select('products.*, categories.name as category_name') end
The category name will now be an attribute on our Product
model and we can reference it directly in the view instead of having to go through the association.
<td><%= product.category_name %></td>
If we reload the page it still works as we’re loading each category name through the join. If we look at the log we’ll see that there’s only one database query made.
Product Load (0.4ms) SELECT products.*, categories.name as category_name FROM "products" INNER JOIN "categories" ON "categories"."id" = "products"."category_id" ORDER BY categories.name
If we do this and change the attributes on a record dynamically it’s a good idea to define a method with the same name as the aliased field in the model so that it’s always there and reliable so that we don’t have to rely on the dependency of the select
clause always being defined in the same way. If we define a category_name
method in our Product
model and use read_attribute
to fetch it through the database query if we’re using select
to fetch it and fall back to fetching it through the association if we fetch the products in another way.
def category_name read_attribute('category_name') || category.name end
We’ll finish off this episode by demonstrating a few tips in the console. Both the include
and joins
methods allow us to pass in associations and we can use multiple associations too. In our application a product has many reviews so we can pass this in along with the category and it will be added as a join to the query.
>> Product.joins(:category, :reviews) Product Load (0.1ms) SELECT "products".* FROM "products" INNER JOIN "categories" ON "categories"."id" = "products"."category_id" INNER JOIN "reviews" ON "reviews"."product_id" = "products"."id"
We can go deeper if we want to. A review belongs to a user and in cases like these we can use a hash to join that in.
>> Product.joins(:category, :reviews => :user) Product Load (0.1ms) SELECT "products".* FROM "products" INNER JOIN "categories" ON "categories"."id" = "products"."category_id" INNER JOIN "reviews" ON "reviews"."product_id" = "products"."id" INNER JOIN "users" ON "users"."id" = "reviews"."user_id"
Finally if we need more control over our joins we can just pass in an SQL query in so that if we need, say, a left join instead of an inner join, we can use one.
>> Product.joins("left outer join categories on category_id = categories.id") Product Load (0.3ms) SELECT "products".* FROM "products" left outer join categories on category_id = categories.id
We could use this if we want to select products that don’t have an associated category. If you want to be notified on when to use Eager Loading take a look at the Bullet gem which was covered in episode 372.