Recently, we needed to iterate over a fairly large data set (on the order of millions) and do the ever-common If it’s not in the database, put it in. If it’s already there, just update some fields. It’s a pattern that is very common for things like log files (where, for example, only a timestamp needs to be updated in some cases).
The obvious way of doing a SELECT, followed by either an UPDATE or an INSERT is too slow for even moderately-large datasets. The better way to accomplish this is to use MySQL’s ON DUPLICATE KEY UPDATE directive. By simply creating a unique key on the fields that should be different per-row, this syntax provides two specific benefits:
- Allows batch (read: transaction) queries for large data
- Increases performance overall versus making two separate queries
These benefits are especially helpful when your dataset is too large to fit into memory. The obvious drawback to this method, however, is that it may put additional load on your database server. Like anything else, it’s worth testing out your individual situation but, for us, ON DUPLICATE KEY UPDATE was the way to go.
It’s been a long, busy holiday- and now time to resume blogging. My latest interest is testing our mysql servers using mysqlslap. While a great tool, it is unfortunately only distributed with MySQL 5.1.4 and above. However, many of our servers are in the 5.0 release, with some in the 4.x release. What I did was grab the latest 5.1.x version of mysql, compile it statically and test it out on a MYSQL 5.0.45 version, and it worked just fine. I have yet to test it on another machine, or against a version other than 5.0.45. I’ll update when I do.
Here are the steps I took:
1) wget “http://dev.mysql.com/get/Downloads/MySQL-5.1/mysql-5.1.30.tar.gz/from/http://mysql.llarian.net/”
2) tar -xf mysql-5.1.30.tar.gz&&cd mysql-5.1.30
3) ./configure –without-server –disable-shared
4) make&&cp -i ./client/mysqlslap /usr/local/bin/
One issue that seems to be very common when running a MySQL server is high iowait. As we have delved into our servers, one of the main causes we have found is that our table_cache was set too low, causing an ever increasing number of table opens
One issue that seems to be very common when running a MySQL server is high iowait. As we have delved into our servers, one of the main causes we have found is that our table_cache was set too low, causing an ever increasing number of table opens. As we increased the table_cache, and watched for higher hit percentages, we were stumped regarding how many table opens in a given time period were acceptable. A number which I use as a guideline was given to me during a MySQL training course this year. We use a guideline of no more than 10 table opens per hour. More than that- and we need to increase table_cache.