Because it's faster
find_or_initialize_by_id is a dynamic finder: it issues a SQL query and goes through a method_missing chain to return the object.
That class-hierarchy traversal (to hit the correct method_missing) is much, much slower than a plain `|| new`.
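For illustration, a rough sketch of the two lookups (the `Product` model and the `row_id` variable are made up for the example); the second form is a plain method call and skips the dynamic-finder dispatch entirely:

```ruby
# Dynamic finder: resolved through method_missing in legacy Rails.
product = Product.find_or_initialize_by_id(row_id)

# Explicit lookup plus "|| new": no dynamic-finder dispatch involved.
product = Product.where(id: row_id).first || Product.new(id: row_id)
```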
Delayed Job effectively solves only the response-delay problem.
If you plan to use ActiveRecord for the data import itself, expect poor performance:
inserting 25k records through ActiveRecord takes about 10 minutes.
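That is the row-by-row pattern, roughly like this (`Product` and `rows` are assumptions for the example); every `create!` runs its own INSERT plus validations and callbacks:

```ruby
# Naive ActiveRecord import: one INSERT statement (and one transaction) per row.
rows.each do |attrs|
  Product.create!(attrs)
end
```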
The situation is a bit better with https://github.com/zdennis/activerecord-import:
inserting 25k records takes about 3 minutes.
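With activerecord-import the rows get batched into multi-row INSERT statements. A minimal sketch, assuming the gem is in your Gemfile and a `Product` model with `name`/`price` columns:

```ruby
# Collect the values once, then let activerecord-import issue multi-row INSERTs.
columns = [:name, :price]
values  = rows.map { |attrs| [attrs[:name], attrs[:price]] }

Product.import columns, values, validate: false
```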
The fastest way is to build and execute a raw SQL request:
inserting 25k records takes about 30 seconds.
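A minimal sketch of the raw-SQL approach, assuming a `products` table with `name` and `price` columns; values go through `connection.quote` to stay safe:

```ruby
conn = ActiveRecord::Base.connection

# One multi-row INSERT instead of thousands of single-row statements.
values_sql = rows.map do |attrs|
  "(#{conn.quote(attrs[:name])}, #{conn.quote(attrs[:price])})"
end.join(', ')

conn.execute("INSERT INTO products (name, price) VALUES #{values_sql}")
```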
Memory usage is roughly 1-3x the CSV size (the slower approaches use less memory, the faster ones use more).
If you plan to import CSVs frequently, you will run into Ruby's inefficient garbage collection: to actually free the memory you have to kill the DJ worker.
We use the following scheme in production: spawn a DJ worker, import a couple of CSV files with raw queries, then respawn the worker (sketched below).
And one more thing: don't use Delayed Job with Rails directly! It loads the full Rails environment for every DJ worker, and that's a lot of memory for no reason.
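A rough sketch of that respawn loop (the `import_worker.rb` script and the supervisor itself are assumptions, not part of delayed_job):

```ruby
# Hypothetical supervisor: run a short-lived import process, let it exit after a
# couple of files so its memory goes back to the OS, then start a fresh one.
# The worker script should load only the DB connection and the import code,
# not the full Rails environment.
loop do
  pid = Process.spawn('bundle exec ruby import_worker.rb')
  Process.wait(pid)
  sleep 5 # small pause before respawning
end
```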
This solution is not a good idea when you have large CSV files (40k+ rows).
First of all, it parses large files synchronously, which means you have to wait for the web server response until parsing is done.
The second major problem is that each row is inserted in a separate transaction, which is obviously slow as hell (see the batched sketch at the end of this answer).
So, personally, I would not recommend this technique for large files.
One more thing: if the gem you use for CSV parsing loads the entire file into memory, that memory won't be freed up after the conversion (it's one of Ruby's major problems).
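For large files, a streaming + batching sketch along those lines (table and column names are assumptions): `CSV.foreach` reads one row at a time instead of loading the whole file, and every slice becomes a single multi-row INSERT instead of one transaction per row:

```ruby
require 'csv'

conn = ActiveRecord::Base.connection

# Stream the CSV row by row and insert in batches of 1,000 rows.
CSV.foreach('products.csv', headers: true).each_slice(1_000) do |batch|
  values_sql = batch.map do |row|
    "(#{conn.quote(row['name'])}, #{conn.quote(row['price'])})"
  end.join(', ')

  conn.execute("INSERT INTO products (name, price) VALUES #{values_sql}")
end
```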