So there I was. I worked my full day of client meetings and looking for new office space, only to face a late-night data migration, transferring data between two tables. I did not expect it to be an overnight endeavor.
The Groovy script I had authored seemed simple and straightforward to me. I logged in, cd’d to its location on the server, typed the name of the script, and hit enter.
Six hours later, the prompt returned.
You can imagine I was not a happy camper, constantly executing queries on the database to answer the question: “Is it really doing anything? Is it hung?”
Row by row the data was deposited. Six long hours later, I wearily logged off and crawled into bed just as the sun began to peek over the horizon.
I did not want to repeat this. So, I did a little research. Hopefully what I learned can help others avoid this very costly oversight.
Enter Groovy’s batchUpdate feature. Perhaps I’m just a little late to this party, but from a novice’s perspective, Approach 1 (see listing) seems perfectly reasonable: for each row in one table, insert the data into another table.
Approach 1: “Brute Force”
// This took more than 6 Hours
mssql.withTransaction
{
cachesql.eachRow(”select * from HDL.TransactionJournal where TransactionDate between ‘2010-08-01′ and ‘2011-08-01′ “)
{ row ->
mssql.execute(”insert into[dbo].TransactionJournal (ID, TransactionDate,TxCode,TxType) values (?,?,?,?)”, [row.ID,row.TransactionDate, row.TxCode,row.TxType])
}
}
After running this script and painstakingly monitoring its progress overnight, I embarked on a quest to determine the better way to do this. After a bit of searching, I discovered Groovy’s batchUpdate.
I quickly rewrote my script, and the same data transfer took 3 minutes. That’s right – from 360 minutes to 3. How’s that for two orders of magnitude’s difference?
The concept is this: do a mass insert, and commit every x number of rows, instead of inserting and committing each row singly. The higher you set the parameter for committing, the faster it will run.
Approach 2: Same Data Shuffling, Much Faster Performance
// This took 3 mins
mssql.withTransaction
{
//it commits after each 10K insert
def updateCounts = mssql.withBatch(10000, “insert into[dbo].TransactionJournal (ID,TransactionDate,TxCode,TxType) values (?,?,?,?)”,
[row.ID,,row.TransactionDate, row.TxCode,row.TxType]“)
{ ps ->
cachesql.eachRow(”select * from dbo.TransactionJournal where TransactionDate between ‘2010-08-01′ and ‘2011-08-01′ “)
{ row ->
ps.addBatch([row.ID,row.TransactionDate, row.TxCode, row.TxType])
}
}
}
The underlying lesson learned here is a fairly standard one, which is, when doing something that looks like batch processing in a database, make sure to use batch processing (within a transaction) to ensure optimal performance.
Groovy’s batch processing feature was introduced in November 2010 with the release of 1.7, so anyone who picked up Groovy prior to that (like myself) may be accustomed to working around this previously missing feature.
Hopefully this article will help others avoid the mistake of quickly whipping up a script like this – only to sit and watch it execute, not-so-quickly.
(804) 504-1423



