#126 Populating a Database
Sep 08, 2008 | 8 minutes | Active Record, Plugins
Have you ever wanted to fill up a database with a lot of test data? See how to do that in this episode using the populator and faker gems.
Good work, Ryan. I've done something similar to the "populator" gem on the last few projects I've done, but your interface is much more pleasant, and I'll be using it in the future. Thanks.
This is great. For the longest time I relied on my spreadsheet-based approach to generating random names as test data.
http://www.markrichman.com/2007/09/26/generating-random-names-as-test-data/
Cool!
I wrote an article about this a while back (http://railspikes.com/2008/1/25/fuzzing-your-database) but it was a more random approach. I like the approach of using Faker to generate more realistic data.
Wow, I was just thinking about creating some loops to do this very thing to do some sort of stress testing on my slice to see how long different approaches would take. Perfect timing, thanks!
I'm getting this error:
rake aborted!
Could not find RubyGem echoe (>= 0)
I'm pretty sure I installed the gem properly.
Ren, I have the same problem, and I also believe I installed the gem correctly. Have you updated your rubygems system too?
Having the same problem here...
Try putting this in your environment.rb file:
Rails::Initializer.run do |config|
config.gem 'populator'
end
@Ren & @Neil, run gem install echoe; it should install and you'll be good to go. I had the same issue. Maybe the gem just isn't declaring echoe as a dependency, so it doesn't get installed automatically.
Looks nice. I usually do this with a fixture http://pastie.org/268729 , but you'd still need something for really random data... this works well. :)
@James
Thanks! Didn't even consider "echoe" to be a gem. It works. Thanks Ryan for an awesome tutorial/gem. Mucho helpful.
Echoe isn't supposed to be a gem dependency, but looks like something is hosed in my gem config. I think I fixed it, so the latest populator version (0.2.4) should work without echoe installed.
Thanks for the great screencast, Ryan. I have a question, however, concerning how you would populate a habtm (has_many :through is possible though) association? In your example, Category-to-Products is a one-to-many, but what if you modeled products to be part of many categories? Nesting the two like you have in your example wouldn't work in that case. Maybe you already came across this problem and have a nice solution? :)
@Greg, currently HABTM associations aren't supported by Populator, I'll work on adding that though. Thanks. :)
Great technique, just a question. Is there any support for counter caches?
sthapit: ping it:
PING railscasts.com (75.127.77.214) 56(84) bytes of data.
64 bytes from rm-75-127-77-214.railsmachina.com (75.127.77.214): icmp_seq=1 ttl=47 time=36.2 ms
Hi
If I add too many records at the same time, MySQL gives me this error during the task:
Mysql::Error: Got a packet bigger than 'max_allowed_packet' bytes
I know that max_allowed_packet is a MySQL parameter, but do you know if there is a way to tell Populator to insert the records in steps instead of doing one big insert at the end?
Thanks!
@Pascal: Just had the same problem. There is an option to #populate called :per_query that limits how many records are inserted per query.
@sthapit I used to use slicehost and had no problems, but Rails Machine was kind enough to host it for free. :)
@Fredd, you'll need to set the counter caches directly. Usually this is possible with a query at the end to count the associated models. I'll look into providing a more convenient way to do this.
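A minimal sketch of the "query at the end" approach for the Category/Product example: a single UPDATE with a correlated subquery recounts every category at once. The helper name `counter_cache_sql` is hypothetical, and `products_count` assumes Rails' default counter-cache column for `belongs_to :category, :counter_cache => true`. You would pass the resulting string to `ActiveRecord::Base.connection.execute` after the populate loops finish.

```ruby
# Hypothetical helper: builds one UPDATE that recomputes a counter-cache
# column from the child table, e.g. categories.products_count.
def counter_cache_sql(parent_table, child_table, foreign_key, counter_column)
  "UPDATE #{parent_table} SET #{counter_column} = " \
    "(SELECT COUNT(*) FROM #{child_table} " \
    "WHERE #{child_table}.#{foreign_key} = #{parent_table}.id)"
end

puts counter_cache_sql("categories", "products", "category_id", "products_count")
# UPDATE categories SET products_count = (SELECT COUNT(*) FROM products WHERE products.category_id = categories.id)
```

Because the subquery reads a different table than the one being updated, the same statement should be accepted by MySQL as well as SQLite.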
@Pascal, the :per_query option will do this, it defaults to 1000 so I may need to lower this. It will vary depending on the size of each record.
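To picture what :per_query changes, here is a plain-Ruby sketch (not Populator's code) that splits rows into multi-row INSERT statements of at most `per_query` rows each, so no single statement grows past MySQL's max_allowed_packet. The helper name `batched_inserts` and its naive quoting are assumptions for illustration only.

```ruby
# Hypothetical helper: groups rows into multi-row INSERT statements,
# with at most per_query rows in each statement.
def batched_inserts(table, columns, rows, per_query)
  rows.each_slice(per_query).map do |batch|
    values = batch.map do |row|
      # Naive quoting for illustration only; real code should rely on the
      # database adapter's quoting.
      quoted = row.map { |v| v.is_a?(String) ? "'#{v.gsub("'", "''")}'" : v.to_s }
      "(#{quoted.join(', ')})"
    end
    "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values.join(', ')}"
  end
end

rows = (1..2500).map { |i| ["Product #{i}", i] }
statements = batched_inserts("products", %w[name price], rows, 1000)
puts statements.size  # 3 (1000 + 1000 + 500 rows)
```

With Populator itself you shouldn't need any of this: per the replies above, passing a smaller :per_query to populate does the batching for you, and larger records call for smaller batches.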
@Ryan,
I'm at slicehost now and I'm happy with what I get for my $$, but I'm also curious about other providers. Ignoring the price difference (or what it would be if you had to pay for it) how would you compare the two for a production rails app? What kind of slice (size and OS) were you running railscasts on before?
@Carl, I'm just running on a 256 MB slice, both here and at slicehost. Both hosts fit my needs equally well since this is such a simple site. Rails Machine offers quite a bit more with their plan so you should look into it and see if it fits your needs.
Ryan, this is really a great screencast! Thanx for it and keep up the good work!
How can I use this to load up a bunch of dummy users using restful_authentication?
I can't seem to set user.password = 'foo' here:
Account.populate 200 do |account|
account.name = Faker::Company.name
account.email = Faker::Internet.email
account.created_at = 2.years.ago..Time.now
User.populate 10 do |user|
user.login = Faker::Internet.user_name
user.password = 'foo'
user.first_name = Faker::Name.name
user.last_name = Faker::Name.name
user.email = Faker::Internet.email
user.account_id = account.id
end
end
@Mark,
I wonder if that's because populator is working directly with the database for speed and bypassing the virtual attribute? Does it give you an error or just not set anything for the password, or just not create any users?
@Mark, Carl is correct: the "account" passed to the block is not a true instance of Account. You can only set columns directly, not through virtual attributes.
I'm thinking of changing how this is done so it allows you to use virtual attributes as well. But it doesn't work yet.
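A toy sketch of why `user.password = 'foo'` fails: if the object handed to the block only accepts setters for real table columns, as Ryan describes, a virtual attribute like `password=` has nowhere to go. The `ColumnRecord` class below is hypothetical and is not Populator's implementation.

```ruby
# Hypothetical stand-in for the record passed to a populate block:
# it only accepts reads and assignments for known database columns.
class ColumnRecord
  attr_reader :attributes

  def initialize(columns)
    @columns = columns.map(&:to_s)
    @attributes = {}
  end

  def method_missing(name, *args)
    column = name.to_s.chomp("=")
    if name.to_s.end_with?("=") && @columns.include?(column)
      @attributes[column] = args.first
    elsif @columns.include?(name.to_s)
      @attributes[name.to_s]
    else
      super # not a column, e.g. the virtual attribute password=
    end
  end

  def respond_to_missing?(name, include_private = false)
    @columns.include?(name.to_s.chomp("=")) || super
  end
end

user = ColumnRecord.new(%w[login email crypted_password salt])
user.login = "mark"
begin
  user.password = "foo"
rescue NoMethodError => e
  puts "rejected: #{e.name}"
end
```

The same picture plausibly explains the "country" error reported further down: if the column list was read before a migration added the column, the setter is missing until the column information is reset.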
Hey, this is a good idea. I just cloned Faker and spent a little time in the wee hours of the morning adding a couple features. I'll likely be adding a bit more too. I haven't created a patch yet for the SVN version, but I'll get around to it quickly.
Also I need to set up a gemspec so that it can be used with GitHub's gem thingy, but until then:
http://github.com/darrylring/faker/tree/master
It would be nice to have the ability to supply a max length to Populator.words, Populator.sentences etc. as otherwise I think you'd hit the limits on some columns.
I thought I was hitting that problem, but it appears that I'm hitting a problem with duplicate keys, although it doesn't make any sense: from what I can see, the queries it's complaining about aren't duplicating keys.
I figured out the duplicate key issue. It would be nice to be able to tell Populator which values should be unique within the database.
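A plain-Ruby sketch of both ideas from this comment: trimming generated text to a column's limit, and retrying a generator until it yields a value not seen before (useful for columns with a unique index). Both helpers, `fit_to_column` and `unique_value`, are hypothetical and not part of Populator; the retry cap is arbitrary.

```ruby
require "set"

# Hypothetical helper: trims generated text so it fits a column limit,
# e.g. a VARCHAR(255).
def fit_to_column(text, limit)
  text.length > limit ? text[0, limit] : text
end

# Hypothetical helper: calls the block until it returns a value not yet in
# `seen`, giving up after max_tries so a small value pool cannot loop forever.
def unique_value(seen, max_tries = 100)
  max_tries.times do
    value = yield
    return value if seen.add?(value) # add? returns nil for duplicates
  end
  raise "no unique value after #{max_tries} tries"
end

seen = Set.new
rng = Random.new(1)
logins = Array.new(20) { unique_value(seen) { "user#{rng.rand(1000)}" } }
puts logins.uniq.size                     # 20
puts fit_to_column("x" * 300, 255).length # 255
```

Since the `seen` set lives in the rake task, it only guards against duplicates generated in that run, not against rows already in the table.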
Hi, and thanks for the great screencast! I get the following error when I run the rake task:
rake aborted!
no such file to load -- spec/rake/spectask
So I read all the comments and tried:
sudo gem install echoe
and got:
ERROR: Error installing echoe:
echoe requires RubyGems version = 1.2
I am running this on a fresh Rails 2.1 app and have not had any problems with RubyGems before... Any suggestions?
Here's how I'm trying to populate now... http://pastie.org/276597
I know the last couple of lines won't work, and I know why. I'm just demonstrating how slick it would be if I could. Do you have a github project for the gem, or anything? I'd be happy to help accomplish that if you'd like.
Also, it would be slick if there was something available along the lines of "Populator.bool" that would return an array of true and false, so I don't have to set that up manually in my task.
Thanks a ton for this plugin, it's way slick. Hit me on twitter or GT if there's any way I can contribute. :)
And of course, as soon as I submit it, I find the github link. :)
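Judging by the `[Faker::Name.name, nil]` trick in a later comment, assigning an array makes Populator pick one element per record (and ranges such as `2.years.ago..Time.now` pick a value inside the range), so `[true, false]` should already work as a random boolean. A plain-Ruby sketch of that interpretation rule; `pick_value` is a hypothetical name, not Populator's internals.

```ruby
# Hypothetical sketch of the value rule: arrays yield a random element,
# numeric ranges a random value inside them, anything else is used as-is.
def pick_value(value, rng = Random.new)
  case value
  when Array then value[rng.rand(value.size)]
  when Range then rng.rand(value)
  else value
  end
end

rng = Random.new(7)
flags = Array.new(10) { pick_value([true, false], rng) }
p flags
p pick_value(1..100, rng)
p pick_value("fixed", rng) # "fixed"
```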
Is there any support for min/max or high/low fields? I have a model that has a price range that is stored as 2 separate fields. I can set pkg.min_price = 1..100 and pkg.max_price = 101..200, but that's not ideal.
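One way to keep a min/max pair consistent is to draw the minimum first and then add a positive spread to get the maximum. A plain-Ruby sketch with a hypothetical helper `price_range`; it assumes you compute both numbers in Ruby inside the populate block and then assign them to pkg.min_price and pkg.max_price, rather than assigning two independent ranges.

```ruby
# Hypothetical helper: returns [min, max] with min < max. The minimum is
# drawn from min_range and the gap to the maximum from spread_range.
def price_range(min_range, spread_range, rng = Random.new)
  min = rng.rand(min_range)
  [min, min + rng.rand(spread_range)]
end

rng = Random.new(42)
min_price, max_price = price_range(1..100, 1..100, rng)
puts "#{min_price}..#{max_price}"
```

Because the spread starts at 1, the maximum is always strictly above the minimum.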
restful_authentication x2: I would like to know the best approach for this as well.
Here's a quick example of how you can use Populator and Faker to create valid records for use in demos, acceptance testing, etc:
http://almosteffortless.com/2008/09/27/creating-valid-records-with-populator-and-faker/
Hello, great as usual!
I have a small issue, though, with the following code:
deal.name = Populator.words(4..9).titleize
deal.permalink = PermalinkFu.escape(deal.name)
This code does correctly set the permalink, but it also sets deal.name to be a permalink. Any thoughts why?
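One likely cause (an assumption, not checked against permalink_fu's source) is that the escape method modifies the string it is given in place, for example with gsub!, and deal.name is that same string object. The sketch below uses a hypothetical `escape_in_place` to show the aliasing; passing a copy, as in `PermalinkFu.escape(deal.name.dup)`, would avoid it if that is indeed the cause.

```ruby
# Hypothetical escape that mutates its argument, as some slug helpers do.
def escape_in_place(string)
  string.downcase!
  string.gsub!(/[^a-z0-9]+/, "-")
  string
end

name = "Great Summer Deal"
escape_in_place(name)
puts name      # great-summer-deal (the original was changed too)

name = "Great Summer Deal"
permalink = escape_in_place(name.dup)
puts name      # Great Summer Deal (only the copy was modified)
puts permalink # great-summer-deal
```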
This is awesome - can't wait to try it!
I do have one issue. This is more of a general question, but...
In one of my applications, a number of my models have one uploaded file (ex. User has one uploaded_image). I'm using attachment_fu, which provides an "uploaded_data=" setter that works perfectly via the web interface. However, I can't seem to get this working in the rake task. I'm pretty sure that uploaded_data= expects a binary file, but none of my attempts with File, Tempfile, StringIO, or ActionController::UploadedTempfile have worked.
Is there a way to create a model with attachments via a rake task?
Is there a way to get populator to add records to a HABTM relationship?
Two questions:
1) How do I ensure that db:populate only gets run in the 'test' environment? I don't want to accidentally run it on prod. :)
2) What about test scripts that rely on the data being the same? Is there a better way of writing tests that don't rely on the data itself in the DB (e.g. Selenium tests for the UI)?
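For question 1, a guard at the top of the rake task can refuse to run unless the environment is on an allow list. A minimal sketch, assuming the environment name is read from ENV["RAILS_ENV"]; the helper `assert_populate_allowed!` and the allow list are hypothetical and should match your own setup.

```ruby
# Hypothetical guard for a db:populate task: refuse to run outside an
# explicit allow list so production data can't be wiped by accident.
ALLOWED_POPULATE_ENVS = %w[development test].freeze

def assert_populate_allowed!(env)
  env = (env || "development").to_s # treat an unset env as development
  unless ALLOWED_POPULATE_ENVS.include?(env)
    raise ArgumentError, "db:populate refuses to run in #{env.inspect}"
  end
  env
end

# In the rake task this would run before any delete_all calls, e.g.
#   assert_populate_allowed!(ENV["RAILS_ENV"])
puts assert_populate_allowed!("test") # test
```

An allow list fails closed: a new environment such as "staging" is blocked until someone adds it deliberately.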
For whatever reason, I get a TON of failures during testing phase:
$ rake db:populate
Ftest
F
Finished in 0.872049 seconds.
1) Failure:
test_should_create_directory(ThingControllerTest)
[/usr/lib/ruby/gems/1.8/gems/activesupport-
... about 100 failures from this point.
You can make this work with restful authentication by setting user.salt and user.crypted_password directly. Pull the encryption methods from restful authentication:
require 'digest/sha1'
# These are needed to create the salt for user passwords
def secure_digest(*args)
  Digest::SHA1.hexdigest(args.flatten.join('--'))
end
def make_token
  secure_digest(Time.now, (1..10).map{ rand.to_s })
end
# This is needed to encrypt the user password.
# REST_AUTH_SITE_KEY and REST_AUTH_DIGEST_STRETCHES come from
# config/initializers/site_keys.rb, generated by restful_authentication.
def password_digest(password, salt)
  digest = REST_AUTH_SITE_KEY
  REST_AUTH_DIGEST_STRETCHES.times do
    digest = secure_digest(digest, salt, password, REST_AUTH_SITE_KEY)
  end
  digest
end
Then populate your user table using these methods:
User.populate 10 do |user|
user.email = Faker::Internet.email
user.salt = make_token
user.crypted_password = password_digest('mypassword',user.salt)
end
Thanks very much for this tutorial. I was just starting to use migrations to load a bunch of development data, and it was getting very frustrating.
I've been using Populator and Faker for a few weeks now and have noticed an issue with modifying tables in the schema. Populator seems to work fine on tables after their creation, but not after they are modified in future migrations.
For example, I have a `distributors` table I later added a `country` column to. I updated my populator to set the country value, but on running my drop/migrate/populate command, it kept failing saying "country" was not a method.
I fixed this problem by adding a new line after your "delete_all" line that clears out all your data, to call "reset_column_information" before running all of my populator loops.
populator_models = [Post, Distributor, ...]
populator_models.each(&:delete_all)
populator_models.each(&:reset_column_information)
On another note, I find it very useful to give certain data a chance to be nil. For example, when populating data for a `contacts` table:
contact.name = [Faker::Name.name, nil]
Or, when I want it to be rarer for a company to have a fax number:
contact.fax = Faker::PhoneNumber.phone_number unless rand(5) == 0
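Both tricks boil down to "use the value with some probability, otherwise nil": the `[value, nil]` array gives roughly 50%, and `unless rand(5) == 0` gives 80%. A small helper makes the odds explicit; `sometimes` is a hypothetical name you would define yourself in the rake file.

```ruby
# Hypothetical helper: returns the block's value with the given
# probability, otherwise nil. The block only runs when the value is kept.
def sometimes(probability, rng = Random.new)
  yield if rng.rand < probability
end

rng = Random.new(2008)
faxes = Array.new(1000) { sometimes(0.8, rng) { "555-0100" } }
puts faxes.count(&:nil?) # expected to be near 200 (20% of 1000)
```

Inside a populate block this would read like `contact.fax = sometimes(0.8) { Faker::PhoneNumber.phone_number }`.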
Cheers!
Thanks, Using it for current project.
It doesn't work with HABTM.
Ryan,
Thank you for this great plugin, I'm using it for a project I'm working on now. But I really miss the HABTM feature. What are your plans for implementing it?
Is it possible to populate a field with a Float value?
Thanks for the great video and gem!
off topic, but one of your ads says "Powered by Railsmachine". Isn't railscasts being run on slicehost?
Ryan, you might want to fix your RDOCs for the Gem. You have the same "from_now" mistake in them. :)
Otherwise, great screen cast, as always.
PS: I love Populator (and Faker).
Great screencast Ryan! Are there still any plans to support habtm relations since the last time it was suggested?
I came up with a pretty good workaround for the HABTM problem.
If you nest the relationships and then, at the end of the inner block, add an ActiveRecord::Base.connection.execute call like this one, http://gist.github.com/176652 , it creates a single record in the join table, which makes HABTM work well.
Thank you so much for your gist, Jason. Really helpful solution!
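A sketch in the same spirit (the gist's contents aren't reproduced here, so this is not Jason's code): build one multi-row INSERT for the join table from [category_id, product_id] pairs and run it with `ActiveRecord::Base.connection.execute`. The names `categories_products`, `category_id` and `product_id` follow Rails' default HABTM join-table naming for Category/Product and are assumptions about your schema; `habtm_insert_sql` is hypothetical.

```ruby
# Hypothetical helper: one INSERT that links many id pairs in a HABTM
# join table. Integer() rejects anything that isn't a numeric id.
def habtm_insert_sql(join_table, left_key, right_key, pairs)
  return nil if pairs.empty? # "VALUES" with no rows would be invalid SQL
  values = pairs.map { |left, right| "(#{Integer(left)}, #{Integer(right)})" }
  "INSERT INTO #{join_table} (#{left_key}, #{right_key}) VALUES #{values.join(', ')}"
end

# e.g. link product 7 to categories 1 and 3
sql = habtm_insert_sql("categories_products", "category_id", "product_id",
                       [[1, 7], [3, 7]])
puts sql
# INSERT INTO categories_products (category_id, product_id) VALUES (1, 7), (3, 7)
# In a rake task: ActiveRecord::Base.connection.execute(sql) unless sql.nil?
```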
Thank you - this is a great webcast!
One idea for improvement is to support counter cache. Say in your example above in the Product model you had:
belongs_to :category, :counter_cache => true
then it would be nice if while adding products to a category the counter would be incremented automatically.
Ryan,
Thanks for the populator gem. In case anyone else is looking for HTML formatted paragraphs, the following method addition to random.rb is what I use:
def hparagraphs(total)
  (1..interpret_value(total)).map do
    # Wrap each paragraph's sentences in <p> tags, ending with a period
    "<p>#{sentences(3..8).capitalize}.</p>"
  end.join
end
Hi!
I know this screencast is a bit old, but for those interested in Faker
The Faker gem is a bit outdated now, try the ffaker gem instead.
gem install ffaker
I have found some useful class methods (in the Faker module) in the documentation:
letterify, numerify and bothify, which replace the placeholders in a string (? for letters, # for digits) with random letters, digits, or both.
Example gist: http://gist.github.com/558734
Thus, you can really customize your test data to your needs.
:)
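For readers without the gem at hand, a plain-Ruby sketch of what these helpers do, assuming the usual Faker convention that "#" becomes a random digit and "?" a random lowercase letter. These are reimplementations for illustration, not ffaker's source.

```ruby
LETTERS = ("a".."z").to_a.freeze

# Replace each "#" with a random digit.
def numerify(pattern, rng = Random.new)
  pattern.gsub("#") { rng.rand(10).to_s }
end

# Replace each "?" with a random lowercase letter.
def letterify(pattern, rng = Random.new)
  pattern.gsub("?") { LETTERS[rng.rand(LETTERS.size)] }
end

# Apply both substitutions.
def bothify(pattern, rng = Random.new)
  letterify(numerify(pattern, rng), rng)
end

rng = Random.new(126)
puts bothify("???-####", rng) # three letters, a dash, four digits
```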
Bug with SQLite 3 ?
http://github.com/ryanb/populator/issuesearch?state=open&q=undefined#issue/8
Works for me !
no such file to load -- populator
I'm running on Mac OSX
same here
Same here... Any idea?
I have the error:
Don't know how to build task 'environment'
What can I do? And where can I specify my database and port?
I'm using this too, but I wonder:
What's the best way to prevent clobbering your production database with the db:populate task?
Would you still recommend this or Factory Girl? https://github.com/thoughtbot/factory_girl/
I noticed this screencast is pretty old. As some have mentioned, there's a faster faker (ffaker) now.
When I searched through repos on GitHub, I noticed a lot of people are using the ffaker gem in their specs/ directory.
I'm new to Rails, and I'm not familiar with the specs/ directory yet. Is the info in this screencast still a good way to populate your DB with random fake data?
@Dennis I'd also like to know this. I'm new to Rails and looking for the best way to populate my DB with fake data. Rails seems to have come a long way, so I'd like to be sure.
Just stumbled upon this - well done. Very helpful - thank you!