#181 Include vs Joins
- Download:
- source codeProject Files in Zip (99.3 KB)
- mp4Full Size H.264 Video (19.8 MB)
- m4vSmaller H.264 Video (13.3 MB)
- webmFull Size VP8 Video (30.6 MB)
- ogvFull Size Theora Video (30.2 MB)
The find method in ActiveRecord can take a number of different options. Two of these options, the ones we’ll focus on in this episode, are include
and joins
. These are often confused as they perform similar tasks, but there are key times when we should choose one over the other.
To explain the differences between include
and joins
we’ll use a Rails application in which users can write comments. The two models are joined with a standard has_many
/ belongs_to
relationship.
class User < ActiveRecord::Base has_many :comments end
class Comment < ActiveRecord::Base belongs_to :user end
The application has a page that shows all of the comments that have been made. For each comment the author’s name is shown, along with the text “(admin)” if that user is an admin.
Let’s say that we want to change the page to show only the comments that have been written by admin users. To do so we’ll have to modify the find in the index
action of the CommentsController
.
def index @comments = Comment.all(:order => "comments.created_at desc") end
We want to filter the comments on an attribute in the users table so we’ll have to join the users table into a single query with our Comments
find. But should we use joins
or include
for this? To work it out we’ll compare the two in the console.
First we’ll try using joins
, adding a condition so that only the admin users are returned.
>> c = Comment.all(:joins => :user, :conditions => { :users => { :admin => true } }) Comment Load (1.2ms) SELECT "comments".* FROM "comments" INNER JOIN "users" ON "users".id = "comments".user_id WHERE ("users"."admin" = 't') +----+-----------------------+---------+-----------------------+-----------------------+ | id | content | user_id | created_at | updated_at | +----+-----------------------+---------+-----------------------+-----------------------+ | 3 | Some people, when ... | 1 | 2009-09-28 19:00:3... | 2009-09-28 19:00:3... | | 5 | Write the code as ... | 2 | 2009-09-28 19:44:0... | 2009-09-28 19:44:0... | | 6 | Walking on water a... | 1 | 2009-09-28 19:46:2... | 2009-09-28 19:46:2... | | 8 | It should be noted... | 2 | 2009-09-28 19:49:3... | 2009-09-28 19:49:3... | +----+-----------------------+---------+-----------------------+-----------------------+ 4 rows in set
This performs a single query that only returns attributes from the comments table. If we try to get the first comment’s User
then another database call will have to be made as none of the users’ attributes were retrieved by the query above.
>> c.first.user User Load (0.3ms) SELECT * FROM "users" WHERE ("users"."id" = 1) +----+--------+-------+-------------------------+-------------------------+ | id | name | admin | created_at | updated_at | +----+--------+-------+-------------------------+-------------------------+ | 1 | Eifion | true | 2009-09-28 18:51:53 UTC | 2009-09-28 18:51:53 UTC | +----+--------+-------+-------------------------+-------------------------+ 1 row in set
So let’s try doing the same thing again, but with include
instead of joins
.
>> c = Comment.all(:include => :user, :conditions => { :users => { :admin => true } }) Comment Load Including Associations (0.7ms) SELECT "comments"."id" AS t0_r0, "comments"."content" AS t0_r1, "comments"."user_id" AS t0_r2, "comments"."created_at" AS t0_r3, "comments"."updated_at" AS t0_r4, "users"."id" AS t1_r0, "users"."name" AS t1_r1, "users"."admin" AS t1_r2, "users"."created_at" AS t1_r3, "users"."updated_at" AS t1_r4 FROM "comments" LEFT OUTER JOIN "users" ON "users".id = "comments".user_id WHERE ("users"."admin" = 't') +----+-----------------------+---------+-----------------------+-----------------------+ | id | content | user_id | created_at | updated_at | +----+-----------------------+---------+-----------------------+-----------------------+ | 3 | Some people, when ... | 1 | 2009-09-28 19:00:3... | 2009-09-28 19:00:3... | | 5 | Write the code as ... | 2 | 2009-09-28 19:44:0... | 2009-09-28 19:44:0... | | 6 | Walking on water a... | 1 | 2009-09-28 19:46:2... | 2009-09-28 19:46:2... | | 8 | It should be noted... | 2 | 2009-09-28 19:49:3... | 2009-09-28 19:49:3... | +----+-----------------------+---------+-----------------------+-----------------------+ 4 rows in set
This time a much more complicated SELECT query is run and columns from both the comments and users tables are retrieved. The related User
models are stored in memory so when we get the first comment’s User this time there’s no need to make another database call.
>> c.first.user +----+--------+-------+-------------------------+-------------------------+ | id | name | admin | created_at | updated_at | +----+--------+-------+-------------------------+-------------------------+ | 1 | Eifion | true | 2009-09-28 18:51:53 UTC | 2009-09-28 18:51:53 UTC | +----+--------+-------+-------------------------+-------------------------+ 1 row in set
Changing The Comments Page
Now that we have some understanding of the differences between include
and joins
, which one should we choose for our comments page? The question we need to ask is “are we using any of the related model’s attributes?” In our case the answer is “yes” as we’re showing the user’s name against each comment. This means that we want to get the users at the same time as we retrieve the comments and so we should be using include
here.
Back in our CommentsController
we’ll modify the index
action so that it gets the users along with the comments.
def index @comments = Comment.all(:include => :user, :conditions => { :users => { :admin => true} }, :order => "comments.created_at desc") end
This find
is looking a little complex now, so we’d probably move parts of it into a named scope if this was an application we were putting into production, but we’ll leave it as it is.
What if the situation was a little different and we weren’t showing the users’ names with their comments? Well let’s see. First we’ll remove the part that shows the user’s name (and whether they’re an admin) from the comment
partial.
<div class="comment"> <%= simple_format comment.content %> <span style="text-decoration: line-through;"><p class="author"></span> <span style="text-decoration: line-through;"><%= h comment.user.name %></span> <span style="text-decoration: line-through;"><% if comment.user.admin? %>(admin)<% end %></span> <span style="text-decoration: line-through;"></p></span> <p class="actions"> <%= link_to "edit", edit_comment_path(comment) %> | <%= link_to "destroy", comment, :method => :delete, :confirm => "Are you sure?" %> </p> </div>
Reloading the page now shows that the user names have disappeared.
We aren’t displaying any user information on the page any more so our include is now rather inefficient as we’re getting all of the related User
information but not using it. In this case the correct option to use is joins
; this way we’re not getting User
information we don’t need. All we have to do is replace include
with joins
in the find.
def index @comments = Comment.all(:joins => :user, :conditions => { :users => { :admin => true} }, :order => "comments.created_at desc") end
This way we’re only using the users table to perform conditions on so our comments page will be more performance and memory efficient.
Another Example
Let’s take another look at the SQL statement that was generated when we executed our find with include
in the console earlier.
SELECT "comments"."id" AS t0_r0, "comments"."content" AS t0_r1, "comments"."user_id" AS t0_r2, "comments"."created_at" AS t0_r3, "comments"."updated_at" AS t0_r4, "users"."id" AS t1_r0, "users"."name" AS t1_r1, "users"."admin" AS t1_r2, "users"."created_at" AS t1_r3, "users"."updated_at" AS t1_r4 FROM "comments" LEFT OUTER JOIN "users" ON "users".id = "comments".user_id WHERE ("users"."admin" = 't')
This statement is fairly complicated as it gets each column from both the comments and users table, renaming them all. This means that the include
option doesn’t really work with the select
option as we don’t have control over how the first part of the SELECT statement is generated. If you need control over the fields in the SELECT then you should use joins
over include
.
So where might this be useful? On our users page we have a list of the users and, against each user, a count of the number of the comments that user has made.
In the view file for the page the number of comments is retrieved using this line of code.
<%= pluralize user.comments.count, "comment" %>
This code will perform a separate query for every user in the list, which is not ideal. It would be better if we could retrieve the number of comments at the same time we get the rest of the information about the users.
We can do this by using joins
with the select
option. We’ll use the console again to demonstrate how.
This time we’ll fetch all of the users and use joins
to join with the comments table. We’ll use the select
option to restrict the columns to all of the columns from the users table and to get a count of the comments, and also the group
option that ActiveRecord provides to group the results by each user’s id
.
>> User.all(:joins => :comments, :select => "users.*, count(comments.id) as comments_count", :group => "users.id")
When we run this we get all of the details for the users and a count of the number of comments each one has made.
User Load (1.3ms) SELECT users.*, count(comments.id) as comments_count FROM "users" INNER JOIN "comments" ON comments.user_id = users.id GROUP BY users.id +----+--------+-------+------------------+------------------+----------------+ | id | name | admin | created_at | updated_at | comments_count | +----+--------+-------+------------------+------------------+----------------+ | 1 | Eifion | true | 2009-09-28 18... | 2009-09-28 18... | 2 | | 2 | Susan | true | 2009-09-28 18... | 2009-09-28 18... | 2 | | 3 | Paul | false | 2009-09-28 18... | 2009-09-28 18... | 3 | | 4 | John | false | 2009-09-28 18... | 2009-09-28 18... | 1 | +----+--------+-------+------------------+------------------+----------------+ 4 rows in set
Now that we can get the users and the number of comments each one has made in one query we can modify the users index page to use our new query. We’ll need to make two small changes. In the controller we’ll replace User.all
with our new find
.
def index @users = User.all(:joins => :comments, :select => "users.*, count(comments.id) as comments_count", :group => "users.id") end
And in the index view we can use the comments_count
field to display the number of comments each user has written, replacing the comments.count
code that caused another query to be made for each user on the page.
<%= pluralize user.comments_count, "comment" %>
The users page will look exactly the same when we reload it now, but will be much more efficient in the way it accesses the database as it’s performing everything in a single query.
Another Use For joins
We’ll finish this episode by showing another good candidate for using joins
over include
. Below are the User
and Comment
models we’ve been using, along with two new ones: Group
and Membership
.
class Group < ActiveRecord::Base has_many :memberships has_many :users, :through => :memberships end class Membership < ActiveRecord::Base belongs_to :user belongs_to :group end class User < ActiveRecord::Base has_many :memberships has_many :groups, :through => :memberships has_many :comments end class Comment < ActiveRecord::Base belongs_to :user end
In this setup User
and Group
have a many-to-many relationship with each other through Membership
. We want to show all of the comments that have been made by users who belong to a specific group. Ideally we’d want some kind of association between Group
and Comment
, maybe something like this.
class Group < ActiveRecord::Base has_many :membership has_many :users, :through => :memberships has_many :comments, :through => :users end
Ruby doesn’t support nested has_many :through
associations like this, though, so we’ll need to find another way to do this and happily we can use joins
again here.
Here’s what the page we’re building, the GroupController
’s show
view, looks like so far. We have a list of the groups members but we’ve not yet added their comments.
We’ll go back to the console again to work out the code we’ll need to display the comments. First we’ll get our group.
>> g = Group.first Group Load (0.4ms) SELECT * FROM "groups" LIMIT 1 +----+------------------+-------------------------+-------------------------+ | id | name | created_at | updated_at | +----+------------------+-------------------------+-------------------------+ | 1 | Musician's Guild | 2009-10-01 20:09:11 UTC | 2009-10-01 20:09:11 UTC | +----+------------------+-------------------------+-------------------------+ 1 row in set
And then we’ll use joins
to get the comments for the group’s members. We need to join the users and memberships tables so we can join both the User
and Memberships
associations. Then we can add conditions and restrict the memberships to those whose group_id
is the id
of our group.
>> Comment.all(:joins => { :user => :memberships }, :conditions => { :memberships => { :group_id => g.id } } ) Comment Load (0.7ms) SELECT "comments".* FROM "comments" INNER JOIN "users" ON "users".id = "comments".user_id INNER JOIN "memberships" ON memberships.user_id = users.id WHERE ("memberships"."group_id" = 1) +----+--------------------+---------+--------------------+--------------------+ | id | content | user_id | created_at | updated_at | +----+--------------------+---------+--------------------+--------------------+ | 1 | I have always w... | 3 | 2009-09-28 18:5... | 2009-09-28 18:5... | | 3 | Some people, wh... | 1 | 2009-09-28 19:0... | 2009-09-28 19:0... | | 4 | Java is to Java... | 3 | 2009-09-28 19:0... | 2009-09-28 19:0... | | 5 | Write the code ... | 2 | 2009-09-28 19:4... | 2009-09-28 19:4... | | 6 | Walking on wate... | 1 | 2009-09-28 19:4... | 2009-09-28 19:4... | | 7 | Never trust a c... | 3 | 2009-09-28 19:4... | 2009-09-28 19:4... | | 8 | It should be no... | 2 | 2009-09-28 19:4... | 2009-09-28 19:4... | +----+--------------------+---------+--------------------+--------------------+ 7 rows in set
Running the query shows all of the comments from the users in our group so we can use it to finish off our group page. We’ll use it in a new comments method in our Group
model.
class Group < ActiveRecord::Base has_many :memberships has_many :users, :through => :memberships def comments Comment.all(:joins => { :user => :memberships}, :conditions => { :memberships => { :group_id => id } } ) end end
We still need to display the comments on our page so we’ll have to update our view. We already have a comment
partial so all we need to do is render the comments.
<h2>Comments</h2> <%= render @group.comments %>
If we reload the page now we’ll see the comments below the list of users.
Looking again at our Group
model’s comments method it would appear to work if we replace the joins
with include
and these two options often appear to be interchangeable. Bear in mind though that using include
here would load the user and memberships associated models into memory which in this case we don’t want.
Our comments method is almost creating another association and in these cases we can used scoped
instead of all
. This will behave almost like a named scope, but one which is generated dynamically. The advantage is that we can then chain other scopes to it to further limit and change the limit of the find.
If you found this episode useful and you’d like more information on performing ActiveRecord queries Ryan Bates has produced a series of screencasts called “Everyday Active Record” that go into more depth on the areas we’ve covered.