Recently I decided to do some work on a sqlsnapshot utility and tried to figure out the best way to do a bulk update on the database.

Here are the test scripts I used to generate the SQL:

for i in {1..9000}; do echo "UPDATE test SET name = 'test$i' WHERE id = $i;"; done > update.sql
for i in {1..9000}; do echo "($i,'test$i'), "; done > replace.sql

SQL Schema:

CREATE TABLE `test` (
	`id` INT(10) NOT NULL AUTO_INCREMENT,
	`name` VARCHAR(50) NOT NULL DEFAULT 'None',
	PRIMARY KEY (`id`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB;

System specs:

mysqld Ver 5.5.31-0ubuntu0.12.10.1 for debian-linux-gnu on i686 ((Ubuntu))
Ubuntu 12.10
Running on a VM in VMWare Player

First off: single SQL statements are bad.

For example:

UPDATE test SET name = 'test1' WHERE id = 1;
UPDATE test SET name = 'test2' WHERE id = 2;
UPDATE test SET name = 'test3' WHERE id = 3;
UPDATE test SET name = 'test4' WHERE id = 4;
UPDATE test SET name = 'test5' WHERE id = 5;
UPDATE test SET name = 'test6' WHERE id = 6;
UPDATE test SET name = 'test7' WHERE id = 7;
UPDATE test SET name = 'test8' WHERE id = 8;
UPDATE test SET name = 'test9' WHERE id = 9;
-- ...

This SQL should look like this:

REPLACE INTO test VALUES (1,'test1'),
(2,'test2'),
(3,'test3'),
(4,'test4'),
(5,'test5'),
(6,'test6'),
(7,'test7'),
(8,'test8'),
(9,'test9')
-- ....

On average the mass UPDATE took anywhere from 20-30 seconds to execute! The replace only took 0.30-0.12 seconds!

Now of course, different schemas may yield different results - but even the delete/insert was 0.20-0.12 seconds (ie DELETE FROM test, INSERT INTO...). From what I see it appears that MySQL doesn't wait for the user to stop entering SQL queries. There really wouldn't be a good way to detect this though, besides assuming that if a user hasn't entered a SQL statement for the past second to run the transaction.

Wait transactions....let's take a look at those. In transactions you can run a bunch of queries, and a SQL will run them in a shadow table, and when you tell the SQL engine "COMMIT" it will commit all of the queries you just made.

So I modified the update SQL script.

SET AUTOCOMMIT=0;
START TRANSACTION;
UPDATE test SET name = 'test1' WHERE id = 1;
UPDATE test SET name = 'test2' WHERE id = 2;
-- ...
COMMIT;

Running it now takes about 0.70 seconds instead of 20 seconds! What a difference. Adding transaction to the replace script turned it into a consistent 0.12 seconds. However, it didn't seem to do anything for deleting/inserting and inserting data.

Transactions are good for:

* UPDATES
* REPLACES

Transactions do no improve:

* DELETE/INSERT
* INSERT

And when you can use REPLACE over UPDATE.