#179 Seed Data

Sep 14, 2009 | 7 minutes | Active Record, Rails 2.3

Rails 2.3.4 includes a conventional way to add seed data to your application - no more including it in the migration files.

Rails has recently been updated to version 2.3.4. This release focuses mainly on security and bug fixes, but there are a couple of interesting new features too. One of them allows you to seed your application’s database with the data it needs to get up and running.

When you create a Rails application under Rails 2.3.4 a file called seeds.rb is created in the db directory. This is now the conventional place to define any initial data that your application needs. This data can then be created by running a new rake task: rake db:seed.

To give a quick demonstration of this we’ll add a puts statement to the seed file:

          ruby
        
puts "Seed data goes here."

Then we run the rake task and see the output.

          terminal
        
% rake db:seed
(in /Users/eifion/rails/apps_for_asciicasts/ep179/seeder)
Seed data goes here.

At first sight this might seem to be a simple new feature, and so it is. What makes it noteworthy is that there is now a conventional place to put seed data in an application.

Creating Seed Data

Let’s say that we’re writing an application where users have to choose which operating system they’re running when they register. To enable this we’ll create a model called OperatingSystem that has a name column. We’ll generate the model in the usual way.

          terminal
        
script/generate model operating_system name:string

The list of operating systems isn’t something that will be created by the users so we’ll need to define some initial data. But where should we do this? One place where Rails developers sometimes add seed data is within the migration files, like this:

          ruby
        
class CreateOperatingSystems < ActiveRecord::Migration
  def self.up
    create_table :operating_systems do |t|
      t.string :name

      t.timestamps
    end

    # Create the seed data
    ["Linux", "Mac OS X", "Windows"].each do |os|
      OperatingSystem.find_or_create_by_name os
    end
  end

  def self.down
    drop_table :operating_systems
  end
end

This works, but it isn’t really the best way to do this. Migrations are best left to the job they’re designed for: creating the structure of your database. Creating seed data in them can also lead to your seed data being scattered across several migration files.

From Rails 2.3.4 onwards we have a central place for seed data, so we can move the code that creates it from the migration file into the seeds.rb file.

          ruby
        
["Linux", "Mac OS X", "Windows"].each do |os|
  OperatingSystem.find_or_create_by_name os
end

Note that we’re using find_or_create_by_name so that the records are only created if they don’t already exist. This means that the seed file can be run more than once without repeatedly creating the same operating systems.
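To see why this matters, here is a minimal sketch in plain Ruby that runs without Rails or a database. FakeOperatingSystem and seed_operating_systems are hypothetical stand-ins invented for this illustration. They only mimic the find-or-create behaviour and are not how Active Record implements it.

```ruby
# Hypothetical in-memory stand-in for the OperatingSystem model, so that
# this sketch runs without Rails or a database.
class FakeOperatingSystem
  attr_reader :name

  @records = []

  class << self
    attr_reader :records

    # Mimics find_or_create_by_name: return the stored record with this
    # name if there is one, otherwise create and store a new one.
    def find_or_create_by_name(name)
      records.find { |record| record.name == name } || create(name)
    end

    def create(name)
      record = new(name)
      records << record
      record
    end
  end

  def initialize(name)
    @name = name
  end
end

# Hypothetical helper standing in for the body of seeds.rb.
def seed_operating_systems
  ["Linux", "Mac OS X", "Windows"].each do |os|
    FakeOperatingSystem.find_or_create_by_name os
  end
end

seed_operating_systems
seed_operating_systems # the second run finds the records and creates nothing
puts FakeOperatingSystem.records.size # prints 3
```

Had the helper called create directly, the second run would have left six records instead of three.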

Another example of the sort of data you might want to seed an application with is a list of countries for an address form. To do this we’ll generate another model for a country that has a name and a code.

          terminal
        
script/generate model country name:string code:string

Entering all of the country data would be fairly tedious, even if we only have to do it once. Fortunately there is a text file available online that contains a list of country codes and names separated by a vertical bar.

AF|Afghanistan
AL|Albania
DZ|Algeria
AS|American Samoa
AD|Andorra
…

We can use the data in this file to populate our Country model.

          ruby
        
Country.delete_all
open("http://openconcept.ca/sites/openconcept.ca/files/country_code_drupal_0.txt") do |countries|
  countries.read.each_line do |country|
    code, name = country.chomp.split("|")
    Country.create!(:name => name, :code => code)
  end
end

This time we’re populating the data in a slightly different way. First we delete any existing countries, then we open the text file and loop through each line in it, creating a country from the code and name. This gives us a quick and simple way to populate the countries table. The code above uses OpenURI to fetch the file, so we’ll need to require it at the top of seeds.rb.

          ruby
        
require 'open-uri'
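The parsing step can be tried out on its own, without Rails or a network connection. The following is a rough sketch under a few assumptions: parse_countries is a hypothetical helper that isn’t part of the seed file above, and the sample text is just the first few lines of the listing shown earlier.

```ruby
# Sample lines copied from the listing above. A real seed file would read
# the downloaded text instead.
SAMPLE = "AF|Afghanistan\nAL|Albania\nDZ|Algeria\n"

# Hypothetical helper: turns pipe-separated lines into attribute hashes,
# skipping blank or malformed lines rather than raising an error.
def parse_countries(text)
  rows = text.each_line.map do |line|
    code, name = line.chomp.split("|", 2)
    next if code.nil? || code.empty? || name.nil? || name.empty?
    { :code => code, :name => name }
  end
  rows.compact
end

parse_countries(SAMPLE).each do |attrs|
  puts "#{attrs[:code]} -> #{attrs[:name]}"
end
```

Each hash has the same shape as the arguments passed to Country.create! in the seed file, so the helper could sit in front of that call if the source data ever turns out to contain blank lines.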

Now that we’ve written our seed script we can run it to see if it works. Before we do we’ll need to run our migrations to create the two tables.

          terminal
        
rake db:migrate

Then we can run our seed task.

          terminal
        
rake db:seed

This will take a couple of seconds to run and when it finishes our database will be populated. We can check this by running script/console.

Our operating systems are there:

          terminal
        
>> OperatingSystem.all
+----+----------+-------------------------+-------------------------+
| id | name     | created_at              | updated_at              |
+----+----------+-------------------------+-------------------------+
| 1  | Linux    | 2009-09-14 20:55:20 UTC | 2009-09-14 20:55:20 UTC |
| 2  | Mac OS X | 2009-09-14 20:55:20 UTC | 2009-09-14 20:55:20 UTC |
| 3  | Windows  | 2009-09-14 20:55:20 UTC | 2009-09-14 20:55:20 UTC |
+----+----------+-------------------------+-------------------------+
3 rows in set

And so are the countries.

          terminal
        
>> Country.all

+-----+---------------------+------+---------------------+---------------------+
| id  | name                | code | created_at          | updated_at          |
+-----+---------------------+------+---------------------+---------------------+
| 1   | Afghanistan         | AF   | 2009-09-14 21:03... | 2009-09-14 21:03... |
| 2   | Albania             | AL   | 2009-09-14 21:03... | 2009-09-14 21:03... |
| 3   | Algeria             | DZ   | 2009-09-14 21:03... | 2009-09-14 21:03... |
| 4   | American Samoa      | AS   | 2009-09-14 21:03... | 2009-09-14 21:03... |
| 5   | Andorra             | AD   | 2009-09-14 21:03... | 2009-09-14 21:03... |

Fixtures

We’ll finish this episode with a final tip. If your application already has fixtures which contain the data you want to seed, you can use them as the basis for your seed data.

Say we have the following data in our /test/fixtures/operating_systems.yml file.

          yaml
        
# Read about fixtures at http://ar.rubyonrails.org/classes/Fixtures.html

windows:
 name: Windows

mac:
 name: Mac OS X

linux:
 name: Linux

We can import it by replacing the code that generates the operating systems in seeds.rb with this.

          ruby
        
require 'active_record/fixtures'

Fixtures.create_fixtures("#{Rails.root}/test/fixtures", "operating_systems")

If we re-run our seed task the operating system records will be recreated.

          terminal
        
>> OperatingSystem.all
+------------+----------+-------------------------+-------------------------+
| id         | name     | created_at              | updated_at              |
+------------+----------+-------------------------+-------------------------+
| 303122256  | Linux    | 2009-09-14 21:28:31 UTC | 2009-09-14 21:28:31 UTC |
| 387181413  | Mac OS X | 2009-09-14 21:28:31 UTC | 2009-09-14 21:28:31 UTC |
| 1676117404 | Windows  | 2009-09-14 21:28:31 UTC | 2009-09-14 21:28:31 UTC |
+------------+----------+-------------------------+-------------------------+
3 rows in set

There is one noticeable difference when getting the data from the fixture file: the ids are rather wonky. This is because fixtures that don’t specify an id explicitly are given one derived from a hash of the fixture’s label.
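The general idea can be sketched in a few lines of plain Ruby. This is an illustration only, not the code Rails runs: id_for_label is a hypothetical helper, and both the CRC32 checksum and the upper bound are assumptions chosen for the example, so the numbers it prints will not match the ids in the table above.

```ruby
require 'zlib'

# Assumed upper bound for generated ids, chosen for illustration.
MAX_ID = 2**30 - 1

# Hypothetical helper, not Rails's own code: derive a stable integer id
# from a fixture label by hashing it with a CRC32 checksum.
def id_for_label(label)
  Zlib.crc32(label.to_s) % MAX_ID
end

%w[windows mac linux].each do |label|
  puts "#{label}: #{id_for_label(label)}"
end
```

Because the id depends only on the label, the same fixture gets the same id every time it is loaded, which is what lets one fixture refer to another by name.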

There is some controversy among developers about what exactly constitutes seed data. For some it’s the minimum amount of data required to get an application functioning, while others like to add user records and other user-generated content. If you want to keep the data that is absolutely necessary to get an application set up separate from any other data it might need, you can put the extra data in a rake task. Episode 126 used this approach to populate a database with a large amount of test data. If you’re trying to test the performance of your application when it has a lot of data, or to simulate what it will look like with a lot of data, this is an excellent approach.

While we’re on the topic of seed data it’s worth looking at the seed-fu library which is another way of generating seed data. If you’re not using the latest version of Rails then this provides a useful alternative way of generating seed data. Alternatively the BootStrapper library is also well worth investigating.