Valentin Gololobov's Profile

GitHub User: vgololobov

Comments by Valentin Gololobov


Because it's faster.
find_or_initialize_by_id is a dynamic finder: it uses a SQL request plus a method_missing chain to return the object.
Class hierarchy traversal (to hit the correct method_missing) is much, much slower than "|| new".
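
To make the difference concrete, here is a toy sketch in plain Ruby. ToyModel, STORE and find_by_id are hypothetical stand-ins and this is not how Active Record implements its finders; it only shows that the dynamic name is resolved through method_missing after normal method lookup fails, while the explicit "|| new" form calls ordinary methods.

```ruby
# Toy stand-in for a model class; NOT ActiveRecord's real implementation.
class ToyModel
  attr_reader :id

  STORE = {} # hypothetical in-memory "table", keyed by id

  def initialize(id: nil)
    @id = id
  end

  # Ordinary class method: found by normal method lookup.
  def self.find_by_id(id)
    STORE[id]
  end

  # Dynamic finder: only reached after normal lookup has failed
  # on the whole ancestor chain.
  def self.method_missing(name, *args)
    if name == :find_or_initialize_by_id
      find_by_id(args.first) || new(id: args.first)
    else
      super
    end
  end

  def self.respond_to_missing?(name, include_private = false)
    name == :find_or_initialize_by_id || super
  end
end

ToyModel::STORE[1] = ToyModel.new(id: 1)

# Same result, two routes: method_missing versus explicit "|| new".
via_dynamic_finder = ToyModel.find_or_initialize_by_id(2)
via_explicit_or    = ToyModel.find_by_id(2) || ToyModel.new(id: 2)
```

Both calls return an object with id 2; only the first one pays for the failed lookup and the method_missing dispatch.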


Delayed Job effectively solves only the response delay problem.

If you plan to use Active Record for data import, expect poor performance:
inserting 25k records using AR takes about 10 minutes.

A little bit better situation with
insertion of 25k records takes about 3 minutes

The fastest way is to build and execute a raw SQL request:
inserting 25k records takes about 30 seconds.
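
A minimal sketch of what "build a raw SQL request" can look like, assuming a multi-row INSERT statement. The helper names quote and build_bulk_insert, the products table and its columns are all made up for illustration, and the quoting here is deliberately naive. In a Rails app you would more likely quote values with ActiveRecord::Base.connection.quote and run the statement with ActiveRecord::Base.connection.execute.

```ruby
# Naive SQL literal quoting, for illustration only (hypothetical helper).
# In a real app, prefer the quoting provided by your database adapter.
def quote(value)
  return "NULL" if value.nil?
  "'" + value.to_s.gsub("'", "''") + "'"
end

# Build ONE multi-row INSERT statement instead of one INSERT per row.
def build_bulk_insert(table, columns, rows)
  values = rows.map do |row|
    "(" + row.map { |v| quote(v) }.join(", ") + ")"
  end
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values.join(', ')}"
end

# Rows as they might come out of a CSV parser (hypothetical data).
rows = [
  ["Widget", "10"],
  ["O'Neil", "20"]
]

sql = build_bulk_insert("products", ["name", "price"], rows)
```

For large files it is probably worth sending the rows in slices (for example with each_slice) so that a single statement does not grow beyond what the database accepts.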

Memory usage is about 1-3x the CSV size (slower methods use less memory, faster ones use more).
If you plan to import CSVs frequently, you will run into inefficient garbage collection. To free up the memory you need to kill the DJ worker.

We use the following scheme in production: spawn a DJ worker, import a couple of CSV files with raw queries, then respawn the worker.
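
The same idea can be sketched without Delayed Job at all. The example below swaps in a plain fork for the worker respawn: the heavy work runs in a short-lived child process, so the operating system reclaims all of its memory when the child exits. run_in_child and the fake import are hypothetical, and fork is not available on every platform (for example, not on Windows).

```ruby
# Run a block in a child process and return its result as a string.
# All memory allocated by the block is released when the child exits.
def run_in_child
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(yield.to_s) # send the block's result back to the parent
    writer.close
    exit!(0)                 # leave immediately, skipping at_exit handlers
  end
  writer.close
  result = reader.read       # reads until the child closes its end
  reader.close
  Process.wait(pid)
  result
end

# Hypothetical import job: allocates many rows, then reports a count.
imported = run_in_child do
  rows = Array.new(100_000) { |i| "row #{i}" }
  rows.size
end
```

The parent process only ever holds the small result string, never the rows themselves.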

And one more thing: don't use Delayed Job with Rails directly! It loads the full Rails environment for each DJ worker, which is a lot of memory for no reason.


This solution is not the best idea when you have large CSV files (40k+ rows, for example).

First of all, it parses large files synchronously, which means the web server response has to wait until parsing is done.

The second major problem is that each row is inserted in a separate transaction, which is obviously slow as hell.

So, as for me, I would not recommend using this technique for large files.

One more thing: if the gem you are using for CSV parsing loads the entire file into memory, this memory won't be freed up after conversion (this is one of Ruby's major problems).
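
One way to sidestep this, assuming the parser can read row by row, is to never load the whole file in the first place. A small sketch with Ruby's standard csv library follows; the sample file and its columns are invented for the example.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample file standing in for a large upload.
file = Tempfile.new(["products", ".csv"])
file.write("name,price\nWidget,10\nGadget,20\n")
file.close

row_count = 0
total = 0

# CSV.foreach yields one row at a time instead of building
# the whole table in memory first.
CSV.foreach(file.path, headers: true) do |row|
  row_count += 1
  total += row["price"].to_i
end

file.unlink # remove the temporary file
```

Only the current row is kept alive at any moment, so peak memory should stay roughly constant regardless of file size.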