#396 Importing CSV and Excel
Dec 08, 2012 | 10 minutes |
Plugins
Allow users to import records into the database by uploading a CSV or Excel document. Here I show how to use Roo to parse these files and present a solution for validations.
- Download:
- source codeProject Files in Zip (90.8 KB)
- mp4Full Size H.264 Video (24.6 MB)
- m4vSmaller H.264 Video (12 MB)
- webmFull Size VP8 Video (14.8 MB)
- ogvFull Size Theora Video (29.5 MB)
I ran into a problem immediately, and from what I can tell it is because I am using a pg_search enabled Document model I am trying to import. I also have another form_tag that I use to search Documents on the index page. Everytime I try to import I am getting.
If I remove the include PgSearch section from the Document model the CSV import works perfectly fine. How can I resolve this issue?
The relevant portion from Document.rb:
Update. I went a little farther into the Screencast to see if the code would eventually work. After failing on the very first import, I spit out a CSV file with the IDs, and all columns and with the following code, it works now.
Thanks memoht
Why do
require 'csv'
in application.rb and not in product.rb where it's actually used?I have the same concern, is something about performance or better practices? Anyone?
Thanks for the episode!
Great example, how would we accommodate custom mapping vs static header mapping?
This solution is not the best idea for the situation where you have large csv files (like 40k+ rows)
First of all it will parse large files synchronously - which means you have to wait for the webserver response till parsing is done
The second major problem is that each row is inserted in separate transaction - which is obviously slow as hell
So,as for me, I would not recommend to use this technic for large files
One more thing - if gem you using for CSV parsing loads entire file into memory - this memory won't be freed up after conversion (it is one of the major Ruby's problems)
what do you think about using this solution mixed whit delayed job for large cvs?
Delayed Job effectively solves only response delay problem.
If you plan to use Active record for data import - expect poor performance
insertion of 25k records using AR takes about 10 minutes
A little bit better situation with https://github.com/zdennis/activerecord-import
insertion of 25k records takes about 3 minutes
Fastest way is to build and execute raw sql request
insertion of 25k records takes about 30 sec
Memory efficiency is about 1-3x CSV size (slower - less memory , faster - more memory)
If you plan to import CSVs frequently , you will face inefficient garbage collection problem. To free up memory you need to kill Dj worker
We are using following scheme in production - spawn Dj worker , import couple CSV files with raw queries, respawn worker
And one more thing - don't use Delayed job with rails directly ! it will load full rails environment for each Dj worker and it's a lot of memory for no reason
How do I not use delayed_job directly with rails?
Please expand on not using Delayed_job with rails directly
You might also want to check out this gem for importing CSV:
https://github.com/tilo/smarter_csv
it addresses the issue of reading large CSV files (I tried it with more than a million rows), and can process them in chunks, e.g. for creating Resque jobs (if you want to de-couple the processing from the CSV-file reading).
Why don't you use the find or create syntax in the product model instead of doing find || new ?
There is also a gem called CsvRecord that serves as a persistance layer between csv files and Ruby classes, either importing or exporting. It also implements validation and associations and it can be used alongside ActiveRecord.
https://github.com/lukasalexandre/csv_record
Why would you use
over
Because it's faster
find_or_initialize_by_id - is a dinamic finder, it uses SQL request + method_missing chain to return the object
Class hierarhy traversal (to hit correct method_missing) is much much slower than "|| new"
I built a Rails-based client portal for a client, and I initially used Roo to permit them to upload the spreadsheets full of data that the portal needed to parse. I later switched to RubyXL. I ran into severe constraints with both packages, on large spreadsheets (where "large" means more than 50,000 rows of data, in my case). Both packages performed poorly with large XLSX files, which stands to reason, given how those files are encoded. I had memory use issues and CPU spikes, severe at times. These issues were especially challenging on the production system (a Linode VPS).
My ultimate solution was to use the Gnumeric spreadsheet tool "ssconvert" to convert the XLSX files to CSV format; from there, the standard Ruby CSV module worked fine. The same Resque job that initially used Roo to parse the spreadsheets just runs "ssconvert" and produces CSVs in a temp directory, before continuing its work. This solution isn't always suitable, of course; if you need to preserve the formulae in the spreadsheets, CSV won't work. In my case, I just needed the data in the columns.
Roo and RubyXL are great for some purposes. In fact, I'd have preferred to use them, rather than my hack solution. But my wishes were thwarted by the sheer size of the spreadsheets I had to import. If you run into a problem like that, using "ssconvert" to convert the Excel spreadsheets to CSV might help.
Thanks for the tip. I'm also interested in parsing tens of thousands of lines of excel data in rails.
Could you say more about how you set up Gnumeric/ssconvert on your production server? We're running Ubuntu 12 and I'm not seeing an obvious way to get the command-line ssconvert program installed without installing Gnumeric and all of the Gnome packages it depends on.
I'm the only one who have encodings issue? There's a method to force encoding to roo?
Pass encoding options to CSV is possible,
but with roo?
I've also had many frustrating experiences with encoding issues in user generated files. You might want to look at https://github.com/chrisgogreen/roo which allows CSV options, or my fork which incorporates CSV options and implements filthy brute force encoding to UTF-8: https://github.com/aew/roo.
Speaking of which - I've only ever been able to overcome pc/mac encoding issues in user generated files using iconv. String#encode or force_encoding don't seem to do it. I must be missing something as I've run into this repeatedly on different projects and have yet to figure out the "rails way" of dealing with it. I guess I'll keep staring at iconv deprecation warnings for the foreseeable future.
Agree with other comments on large files - roo / active record will be slow and have a large memory footprint which can get you in trouble using delayed_job on Heroku.
You might also want to check out this gem for importing CSV:
https://github.com/tilo/smarter_csv
It gives more control over how the CSV is imported, e.g. can manipulate, re-write, or replace column-headers; it can read data in chunks for more efficient post-processing (e.g. with Resque); it can deal with non-standard record separators; and more..
Thanks Ryan!!! Nice Episode..
A small question when I was watching:
What plugin do you use to swap between different applications(Safari, Terminal, and so on)? Currently, I used Quicksilver, but sometimes it conflicts with VI command. Any clues?
Just wanted to point out to everyone - if you happen to be using the google-api-client Roo has a name conflict with 'Google' and breaks the api client.
Took a little while to track that down..
Great topic.
One problem I've had with importing CSV files from real users is often they aren't always valid, i.e., they include extended characters, are encoded weirdly, or are otherwise malformed leading to many parsing errors. Finally if they include headers at all they do not match that names of the attributes in the Rails model.
The users who send me files like these are also the least technical so hardest for them to clean up themselves or understand why files are invalid.
Would like to see more about error handling, or know what people do with these real world messy files. Thanks.
check out smarter_csv for easy renaming of the headers to match your model.
Agreed - this seems like a big issue per my comment above, and I'm really curious as to how an experienced Rails dev would tackle it.
This works excellent for Excel files.
But how do I import a csv file that have ";" as a column separator instead of the default value ","?
I'd like to know this also, I'm often given tab separated files and various text enclosures.
check out the other Gem I mentioned in a previous comment - it gives full control over delimiters, column headers, etc..
Thanks for this fork! https://github.com/chrisgogreen/roo
Thanks for uploading this video, but how do I parse tab delimited instead of comma ?
It'll be something like:
CSV.parse("text you are processing", column separator, row separator)
For anybody that's getting bitten by the "redundant UTF-8 sequence" bug when attempting to upload an Excel file (which is likely causing the error Ryan refers to), a fix has been implemented in the 3-2-stable branch of rails, but not released yet. In order to get this goodness into your own app, simply create this file:
...and be sure te
require
it. You should be good to go, after that.Im having problems getting it to import without adding in new records rather than updating existing. Sometimes it doesn't appear to be doing anything. I cant figure why though!
Do you mean that it adds the same data in the file again instead of updating? If so, Change the find_by in the model to another field instead of id. For me, I use first name & last name seeing as this would not be one of the changes made to the uploaded file.
I'm trying to import an excel xml that was created like 362-exporting-csv-and-excel http://railscasts.com/episodes/362-exporting-csv-and-excel?view=asciicast
I get the following error
c:/ruby/lib/ruby/gems/1.8/gems/parseexcel-0.5.2/lib/parseexcel/
olestorage.rb:354:in `get_header': OLE::UnknownFormatError
(OLE::UnknownFormatError)
ruby-roo/ parseexcel cannot read the excel xml file type?
hi there i m try too did you solve or find any way import product from large xml file if yes please let me share with thanks you can check my this link where i can import product form small file not from large file
http://stackoverflow.com/questions/36556532/how-to-import-20-thousand-product-from-large-xml-file-in-a-rails-app
I switched over to strong_parameters and this has broken the basic CSV import method outlined in this screencast I am using on a Documents model. Wondering what syntax I would need to use to fix this line to work with strong_parameters.
document.attributes = row.to_hash.slice(*accessible_attributes)
I don't know if you still need... I used with strong parameters with this:
and it worked. I don't know if it is optimal
I am trying to import largish Excel files (~2MB) that are generated by an analysis program that we run. I am running into an issue where the open operation is exhausting the memory on my computer (I start with about 8 GiB free).
I have tried both Roo and the underlying Spreadsheet gem. Has anyone had any luck opening larger files?
Turns out the issue was really with irb:
https://github.com/zdavatz/spreadsheet/issues/8
Seems to work fine from scripts.
After import I need to populate one of the columns with the foreign key so that I can associate each row uploaded from the file with the user that uploaded it. However, the foreign key (restuarant_id) is stored in a session when the user logs in. seeing that I can't or should't access sessions in the model, how do I accomplish this?
Try this, it worked for me.
http://stackoverflow.com/questions/17719660/rails-4-csv-import-and-setting-a-value-to-a-key-value
I get the error uninitialized constant Model::Excel. Anyone else have this problem?
I found a fix for it just now. I called up Excel/Csv classes from the Roo object. Seems to be working ok
Thank you for this! I was getting very frustrated with not being able to get even the example to work correctly and this solved my problem.
Thanks for this tip!
Yeah, i have it too, but with Excelx. The problem occurs with ruby 1.9.3, but not in 1.9.2 Anyone know about this?
Use:
Just checked on both 1.9.2 and 1.9.3, works fine.
Hi,
I tried
But i get the following error.
wrong number of arguments (3 for 2)
I keep getting the same error: wrong number of arguments (3 for 2). Did you end up finding a solution to this?
try to format it like this
when ".csv" then Roo::Csv.new(file.path)
this is the rubydoc for roo
http://rubydoc.info/gems/roo/frames
http://ericlondon.com/posts/246-importing-and-converting-csv-data-into-a-rails-model-with-a-slick-active-admin-interface
how to use this functionality to import data for model associations
This is my case
class SampleRequest < ActiveRecord::Base
has_one :sample_request_text_excerpt
accepts_nested_attributes_for :sample_request_text_excerpt
end
class SampleRequestTextExcerpt < ActiveRecord::Base
belongs_to :sample_request
end
In the sample_request model, asset type field would be text and image , for text type different excel sheet and for image type different excel sheet , only certain field would be extra in the sample_request_text_excerpt model.
For single model no issues , while try to use for associations
this line showing error
sample_request.attributes = row.to_hash
while uploading the text spreadsheet error displayed
put the below file 'import.rake' in lib/tasks
then 'rake import_hosts'
require 'csv'
task :import_hosts =>[:environment] do
file="db/tmp/hostssmall.csv"
additional_rows_to_skip = 1
CSV.foreach(file) do |row|
if additional_rows_to_skip > 0
additional_rows_to_skip-=1
else
p = Host.create!({
:ref => row[0],
:name => row[1],
:address => row[2],
}
)
end
end
end
the import is seemingly working ok for me but then the values of the attributes are saved as nil. any thoughts?
INSERT INTO "addresses" ("baths", "beds", "building_name", "city", "created_at", "date_available", "full_address", "laundry_in_building", "laundry_in_unit", "listing_notes", "parking", "pets", "price", "showing_contact", "showing_contact_number", "showing_instructions", "state", "status", "unit_number", "updated_at", "utilities_included", "zipcode") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22) RETURNING "id" [["baths", nil], ["beds", nil], ["building_name", nil], ["city", nil], ["created_at", Sun, 03 Mar 2013 16:52:17 UTC +00:00], ["date_available", nil], ["full_address", nil], ["laundry_in_building", nil], ["laundry_in_unit", nil], ["listing_notes", nil], ["parking", nil], ["pets", nil], ["price", nil], ["showing_contact", nil], ["showing_contact_number", nil], ["showing_instructions", nil], ["state", nil], ["status", nil], ["unit_number", nil], ["updated_at", Sun, 03 Mar 2013 16:52:17 UTC +00:00], ["utilities_included", nil], ["zipcode", nil]]
I just had this same issue. My problem was that I was searching for row["id"] but my spreadsheet had "ID" instead.
You know this is awesome sauce.
The most valuable thing however was
product.attributes = row.to_hash.slice(*accessible_attributes)
The thought is simple, it's mass assignment basics but would have never thought of it when importing excel files!
Super.
Great 'cast. . . I am attempting to use the same technique to import XML.
Does anyone have pointers on how to work with XML using this approach?
Thanks in advance.
hi there i m try too did you solve or find any way import product from large xml file if yes please let me share with thanks you can check my this link where i can import product form small file not from large file
http://stackoverflow.com/questions/36556532/how-to-import-20-thousand-product-from-large-xml-file-in-a-rails-app
correction with open_spreadsheet method
def self.open_spreadsheet(file)
case File.extname(file.original_filename)
when ".csv" then Roo::Csv.new(file.path, nil, :ignore)
when ".xls" then Roo::Excel.new(file.path, nil, :ignore)
when ".xlsx" then Roo::Excelx.new(file.path, nil, :ignore)
else raise "Unknown file type: #{file.original_filename}"
end
end
My import button is not showing up in any browsers. I used this snippet in my customer.rb
and this in my index.html.erb
Any thoughts?
Hammer, you need an equal sign after '<%', before 'submit_tag', like this:
'<%= submit tag "Import" %>'
Good luck!
I can't get it to work when i add this into my app.
undefined method `import' for
i have gone at it over and over -- and nothing.
-- though I can get it to work using the supplied source code.
please assist -- i have lots of content entered in good cspreadhseets taht I need to enter into my site.
thx,
sg
I'm using these steps with Rails 4 and I received this error:
undefined local variable or method `accessible_attributes' for #Class:0x007fbbd2134da0
This is because Rails 4 uses Strong Parameters checking in the controller instead of the attr_accessible method in the model.
In order to use the code in this cast for Rails 4, you'll need to replace the code snippet here:
With something that looks like this:
Disclaimer: I'm not sure this qualifies as 'The Rails Way'. If someone has a more elegant solution, please post a reply.
Hi,
What I did was
invoice.attributes = row.to_hash.slice(*Invoice.attribute_names())
I use Rails 3 and I get the error message "undefined method `original_filename' for nil:NilClass"
Thank you for your assistance
Im getting this too...
I'm new to ruby and need some guidance with a task. I have to import data from an excel spreadsheet. That currently works fine. Now I need to add new columns to the spreadsheet and those new columns belong to seperate tables in the database. So, first I need to look-up the foreign key of the value from excel and add it to that respective table. In total when the spreadsheet is imported I am dealing with three tables. How do I go about doing that? Thank you in advance for any guidance, help or examples.
I am only able to import the first column of the csv file Any suggestions?
hi all when i import csv or xls
i get below error
zip/zipfilesystem
hi all when i import csv or xls
i get below error
It's an issue between
roo gem
andrubyzip gem
.You can fix it by adding
to your
Gemfile
.I'm wondering about the best way to handle very large data imports that can take minutes to parse... The app I'm working on requires handling potentially huge imports consisting of tens/hundreds of thousands of rows.
I'm thinking it makes sense to store the CSV temporarily on upload, and do all the parsing via cron... and somehow notify the user that the import is processing.
I'm pretty new to rails, and I'm not sure the best way to execute.
Anyone have a suggestion?
Don't use Active Record but create an SQL statement so the data insert is done in a single statement:
This will take each array element and joins them with a comma to have a long single sql statement which is then executed. Make sure you don't have any single quotes in your incoming data. When using postgres you can replace them with two single quotes '' to allow them to be used as attribute values.
ya thanks sagescs
Now csv working fine but when i import Excel(xls) i get below error
invalid byte sequence in UTF-8
uploaded file disappears if validation fails. Is there anyway to handle this as It is not user friendly to ask user to upload file every time validation fails.
Anyone have any tips for adding a foreign key? Say in Ryan's example, Factory had many Products, and one wanted to upload Products from a CSV that did not contain a foreign key to the Factory each Product belonged to. Thanks!
My CSV file contains this type of symbol ᅠ and when i tried to import this file, the place of ᅠ is replaced with ? i.e, question mark and gets stored in database.
For example if this text is in my csv file "ᅠDinesh", upon import it gets stored as "?Dinesh".
Please help me out.
We created a gem at Continuum that provides a DSL on top of Roo for defining data importers. It's called active_importer and we think it could be useful for everyone. It's still a young project, and contributions are very much welcome.
I'm having a problem when I try to import a csv file with more than one column. I get a mass assignment security error in spite of the fact that I have all the attributes listed in my attar_accessible statement. The error seems to list all the attributes except the very first column of my csv file. Any ideas? My code is the same as Ryan's with the exception that I am using a model named "Customer.rb" instead of "Product.rb"
I figured out my problem. I had spaces after the commas in my csv file. Once removed, I no longer had the problem.
undefined method `valid?' for 2:Fixnum
I'm getting this error blocked in here, do you have any suggestion, i'm using rails 4
def save
if imported_departments.map(&:valid?).all?
imported_departments.each(&:save!)
true
else
Mohamed Sami there're 2 ways to pass your issue
just use
collect
instead ofeach
try adding each product to an array, then just return it
Hello,
I tried to import an excel file but I got this error:Couldn't find User without an ID. I am using rails 4 & ruby 2.1.1.
I have the very same problem. Did you find a solution?
I followed this video to import CSV file, but I cannot insert data from file to table:
SQL (0.6ms) INSERT INTO
products
(created_at
,id
,updated_at
) VALUES ('2014-04-07 03:16:07', 27, '2014-04-07 03:16:07')other fields were not insert in this command. I'm using rails 4 & ruby 2.0
Is .to_hash totally redundant? Isn't row already a hash when product.attributes is assigned it's value?
Exclusively in regard to excel sheets, I understand why it's necessary for csv files.
Is it possible to change the default line separator from newline '\n' or '\r\n' to some other character, e.g '|' while parsing csv
hi i am using rails 4.1.2, ruby 2.0.0
I upgraded the application form rails 3.2.14 to rails 4.1.2
ActiveModel::ForbiddenAttributesError in AAAAAImportsController#create
ActiveModel::ForbiddenAttributesError
in my controller and removed attr_accessible in model
Mr. Ryan Bates, may you please revisit this topic with a Rails4Ruby2-compatible solution, establishing strong parameters (without accessible_attributes) and up-to-date methods (without deprecated .save! method). Thank you in advance. Much appreciated!
Otherwise, at .save! position, I keep getting the following error: "Validation failed: User can't be blank, Content can't be blank"
If I change to .save, then everything goes through without errors, but NOTHING gets saved, since I have a table placed right under my Import button and there are no new entries visible. Also, PGAdmin Database entries confirm that nothing gets stored. Please help!
Can someone help me with this?
Extracted source (around line #3):
/app/models/product.rb:17: syntax error, unexpected $end, expecting keyword_end
class ProductsController < ApplicationController
def index
3 @products = Product.order(:name)
respond_to do |format|
format.html
format.csv { send_data @products.to_csv }
Great example! Is there a way to control macros in xls files?
I used this Code for Rails 4 and it is creating the rows in the DB, but all values (url, pr, etc) are nil. Any suggestions what I have done wrong?
i got the mistake. I installed the gem 'protected attributes'. I also forgot to define the attr_accessible. The working version in Rails 4 looks so:
Hi admin. I use Ruby 1.9.3 When I run this code. I met warning:
'''
Warning:C:/Users/T410/Desktop/chinsu/config/application.rb:3:in `': iconv will be deprecated in the future, use String#encode instead.'''
And when I lick button import in browser I met Error:
'''
Errno::EBADF in ProductImportsController#create
Bad file descriptor - C:/Users/T410/AppData/Local/Temp/RackMultipart20141104-5028-kjmbt6
app/models/product_import.rb:49:in
new'
open_spreadsheet'app/models/product_import.rb:49:in
app/models/product_import.rb:36:in
load_imported_products'
imported_products'app/models/product_import.rb:32:in
app/models/product_import.rb:18:in
save'
create'app/controllers/product_imports_controller.rb:8:in
'''
Hello!
Please help!
I got this error: https://www.evernote.com/shard/s447/sh/bc54becd-2c39-4ec7-b139-f9defc7651f0/9caea62319ac2ecad30ac0738931a649
Had to remove
require iconv
fromconfig/application.rb
to make this compile (with either ruby 1.9.2 or 2.0.0).Most works fine, but when reading back the xsl file written with the example, I will get
OLE2 signature is invalid
Ok, I learned that this is a format problem (still think it's funny that I writes a format it cannot read).
So I open the xls file with libreoffice, and save it as 2007 xls.
Now I can read the file (no OLE problem anymore), but I get
Row 2: Price can't be blank
and so forth for every other line.
When opening the file from libreoffice, it has the prices in the price row.
I've added all of the code up untill about 3:30 min in and it's successfully uploading the csv file but the rows aren't getting associated with a user.
I'm really struggling to figure out how to get this to work. Any help will be appreciated!
I've posted a full description of the issue to stack overflow.
http://stackoverflow.com/questions/28347026/associating-rows-from-uploaded-csv-files-with-a-user-in-rails
can someone help me?
How is it possible to add the filename as a new column in the database?
thanks for this, didn't expect I wouldn't need to create an unnecessary model and use any file storage gems. very informative on how params work as well.
Okay, so I am using the Carrierwave gem to upload the file, how would the process of importing the file then differ? He was just storing it in the temporary files, but I am storing the files using Carrierwave. I would this change the import method in the controller?
I was really struggling with CSV import and this thread has really helped me out... thanks a lot!
how i can upload product from XML file is there any advice or video thanks or some code thanks advance
Excelx.new(file.path, nil, :ignore) - has error!
Required <= 2: filename_or_stream, options ...i can't find a options names...
This episode has been updated to Rails 5 as a blog post Import Records from CSV and Excel in Rails 5
This is now outdated video. I am getting a lot of errors.
e.g undefined method `original_filename' for nil:NilClass.
It needs to be updated.
Hi. will this "Importing CSV and Excel source code" work on ruby version 1.8.7 and rails 3.0.10?
How to Import Image from excel file to Rails data base?
in excel file i store image(not Url).
like i have 2 column
Name Profile_pic
xyz Image of user
pls reply fast
undefined method `path' for "test.csv":String Did you mean? pathmap
def self.import(file)
CSV.foreach(file.path, headers: true) do |row|
puts "upload"
end
end