Recently, we needed to iterate over a fairly large data set (on the order of millions of rows) and perform the ever-common pattern: if it’s not in the database, insert it; if it’s already there, just update some fields. This comes up constantly with things like log files, where in many cases only a timestamp needs to be updated.
The obvious approach – a SELECT followed by either an UPDATE or an INSERT – is too slow for even moderately large datasets. A better way to accomplish this is MySQL’s ON DUPLICATE KEY UPDATE clause. By creating a unique key on the fields that should be unique per row, this syntax provides two specific benefits:
- It allows batching many rows into a single query (and transaction) for large data sets
- It improves overall performance versus making two separate queries per row
These benefits are especially helpful when your dataset is too large to fit into memory. The obvious drawback, however, is that it may put additional load on your database server. Like anything else, it’s worth testing in your own situation, but for us, ON DUPLICATE KEY UPDATE was the way to go.
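The pattern described above can be sketched as follows. The table and column names here are hypothetical, chosen to match the log-file example:

```sql
-- Hypothetical log table: one row per URL, with a unique key
-- so that re-inserting the same URL triggers the update path.
CREATE TABLE log_entries (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
    url       VARCHAR(255) NOT NULL,
    hits      INT UNSIGNED NOT NULL DEFAULT 1,
    last_seen DATETIME NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uniq_url (url)
);

-- Insert a new row, or bump the counter and timestamp if the
-- URL is already present -- all in a single round trip.
INSERT INTO log_entries (url, last_seen)
VALUES ('http://example.com/page', NOW())
ON DUPLICATE KEY UPDATE
    hits = hits + 1,
    last_seen = VALUES(last_seen);
```

Because the INSERT can carry multiple VALUES tuples, many rows can be batched into one statement (and one transaction), which is where the performance win over per-row SELECT-then-UPDATE really shows.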
Recently, I was populating a database with lines from a number of log files. One of the key pieces of information in each of these log lines was a URL. Because URLs can be pretty much as long as they want to be (or can they?), I decided to make the URL field a TEXT type in my schema. Then, because I wanted fast lookups, I tried to add an index (key) on this field and ran into this guy:

ERROR 1170 (42000): BLOB/TEXT column 'field_name' used in key specification without a key length
It turns out that MySQL can only index the first N characters of a TEXT field – but for a URL, that’s not good enough. After talking it over with my team members, we decided to instead add a field – url_md5. By storing the md5sum of each URL, we could index on the hash field and have both fast lookups and avoid worrying about domains like this fitting into a
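The url_md5 approach can be sketched roughly like this; the table name and URL are hypothetical:

```sql
-- Hypothetical schema: keep the full URL in a TEXT column, but
-- index a fixed-width MD5 of it for fast, length-safe lookups.
CREATE TABLE log_entries (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    url     TEXT NOT NULL,
    url_md5 CHAR(32) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uniq_url_md5 (url_md5)
);

-- Let MySQL compute the hash at insert time...
INSERT INTO log_entries (url, url_md5)
VALUES ('http://example.com/some/very/long/path',
        MD5('http://example.com/some/very/long/path'));

-- ...and look rows up by the hash instead of the unindexable TEXT column.
SELECT url
FROM log_entries
WHERE url_md5 = MD5('http://example.com/some/very/long/path');
```

The index stays a predictable 32 characters wide no matter how long the URL gets, and the unique key on url_md5 also makes the ON DUPLICATE KEY UPDATE trick from the previous post applicable.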
How can we transparently migrate MySQL from one server to another when we don’t want to disrupt end users? That was the question posed as we came to the final phase of decommissioning a server. We have transitioned almost all services away from the older server (CHIMAY), but there is one external cron job, not under our control, that we can see in the logs generating several MySQL queries. Therefore, we need to transparently route MySQL through another server (TECATE). Here’s the scenario: Continue reading “Transparent MySQL migration using MySQL proxy”
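One way to sketch this kind of setup with MySQL Proxy is to have the proxy listen on the old server’s MySQL port and forward connections to the new backend. The hostnames and ports below are hypothetical, and the exact option names should be checked against your mysql-proxy version:

```ini
; mysql-proxy configuration sketch (passed via --defaults-file).
; Listens where clients already connect, forwards to the new server.
[mysql-proxy]
proxy-address = chimay:3306
proxy-backend-addresses = tecate:3306
daemon = true
```

With something like this in place, the untouched external cron keeps pointing at CHIMAY while its queries are actually served by TECATE.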
Often, you may want to run a script that uses a username and password to access data in a MySQL database. Securely running a script like this manually is easy – simply use the ‘-p’ option for the MySQL client, and it will prompt you for the password. However, this is not an option if you want to automate the script.
There are several ways to provide the password in a way that can be used with automated scripts, but only one that is both flexible and secure. You can specify the password on the command line itself (e.g. ‘mysql -ppassword’); however, this allows the password to be seen by other users who run commands like ‘ps’. Another option is setting the environment variable “MYSQL_PWD” to the password, but this can also be seen by other users. Continue reading “Securely specify mysql credentials in automated scripts”
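A common approach that avoids both problems (and may well be the one the full post describes) is to put the credentials in a MySQL option file that only the script’s user can read. The filename, user, and password here are placeholders:

```ini
; /home/scriptuser/.my.cnf -- restrict with: chmod 600 ~/.my.cnf
; The mysql client reads the [client] section automatically;
; a non-default path can be passed via --defaults-extra-file.
[client]
user     = scriptuser
password = example_password
```

The automated script then simply runs ‘mysql dbname’ (or ‘mysql --defaults-extra-file=/path/to/creds.cnf dbname’), and the password never appears on the command line or in the environment.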
One issue that seems to be very common when running a MySQL server is high iowait. As we have delved into our servers, one of the main causes we have found is that our table_cache was set too low, causing an ever-increasing number of table opens. As we increased the table_cache and watched for higher hit percentages, we were stumped about how many table opens in a given time period were acceptable. A guideline I was given during a MySQL training course this year: no more than 10 table opens per hour. More than that, and we need to increase table_cache.
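Checking this guideline against a running server might look like the following; the cache size shown in the SET statement is purely illustrative:

```sql
-- Opened_tables counts table opens since server startup; sample it
-- an hour apart and compare the difference against the guideline.
SHOW GLOBAL STATUS LIKE 'Opened_tables';

-- Inspect the current cache size (the variable was renamed to
-- table_open_cache in MySQL 5.1; older servers use table_cache).
SHOW VARIABLES LIKE 'table%cache%';

-- Raise it at runtime while tuning, then persist the value in my.cnf.
SET GLOBAL table_cache = 2048;
```

If the Opened_tables delta between two hourly samples stays above roughly 10, that is the signal to bump table_cache further.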