Don't use ActiveRecord for Bulk Operations
Everybody knows that this line :
Model.where(column: 'value').update_all(column: 'Another Value')
is better than the following lines :
Model.where(column: 'value').each do |item|
item.column = 'Another Value'
item.save
end
Not because it’s shorter, but because it will get executed as one UPDATE
statement, Not as multiple UPDATE
statements.
But it seems that I did this mistake while working on a worker that should bulk insert thousands of records.
My task was to clone N amount of records and update one value for the new clones (records), So I started to work and here was my code :
ActiveRecord::Base.transaction do
Model.where(column: 'value').each do |item|
new_item = item.dup # Duplicating the Object
new_item.column = 'Another Value'
new_item.save
end
end
So even though I am using DB Transactions
thinking that It would be the most optimized solution since I’m committing my records at once but I forgot the fact that I’m executing multiple INSERT
statements.
So the solution is to execute one SQL INSERT statement
but unfortunately Rails
doesn’t support that or ActiveRecords
doesn’t. So I had to use a 3rd party gem to do so, and the gem is bulk_insert.
bulk_insert
allows you to insert multiple records in one single INSERT
statement along with some other features where you can check in this README file. And to test it, I created a small Rails
App with a local PostgreSQL
DB to benchmark the alternative solutions. The benchmark was testing the time to bulk insert 100k records.
Solution | Time to insert 100K records |
---|---|
Multiple INSERT statements with DB Transaction | ~ 88 seconds |
Using bulk_insert gem | ~ 8 seconds |
I admit that the ORMs
spoiled me and I need to pay more attention to what’s happening behind the scenes.