Schema Migrations in Large-Scale MySQL Databases
On small MySQL databases, making schema changes is as simple as running an ALTER query. On large databases, however, it's not so simple: a single ALTER on a big, busy table can run for hours and disrupt production. In this article you will learn how to tackle this challenge.
Problems of schema migrations
When dealing with large-scale databases, we often encounter the following challenges during schema migrations:
Downtime and Service Disruption - Schema changes often require taking the database offline, resulting in downtime for applications that depend on it.
Data Integrity Issues - Data consistency can be problematic during a migration, especially when altering large tables.
Performance Degradation - Adding indexes, altering large tables, or adding columns can cause performance bottlenecks, slow queries, and increased load on the database during the migration itself.
Locking Issues - Many schema changes take table-level locks, blocking writes to the table for the duration of the change.
Renaming approach
A common approach to these kinds of problems is to take advantage of MySQL's online operations and build a solution around them. One such instantaneous operation is RENAME TABLE, and here is how we can use it to perform a schema migration with zero downtime:
Step 1: Copy the schema of Table1 and create an empty Table1_tmp
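A minimal sketch of this step (table names are illustrative):

```sql
-- Print the full definition of Table1 (columns, indexes, foreign keys),
-- then run the printed statement with the name changed to Table1_tmp.
SHOW CREATE TABLE Table1;

-- (CREATE TABLE Table1_tmp LIKE Table1 also works, but note that it
-- does not copy foreign key definitions.)
```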
Step 2: Write down and drop foreign keys from Table1_tmp
The reason we do this is that foreign key checks add a lot of extra processing while the database is inserting data. Since we are about to sync all the data into this new table, we want to keep that overhead to a minimum so the sync finishes as fast as possible.
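A sketch of this step; the constraint name fk_table2_id is hypothetical, so list your actual foreign keys first:

```sql
-- Write down the current foreign key definitions so they can be
-- re-added after the migration finishes
SHOW CREATE TABLE Table1_tmp;

-- Drop each foreign key by its constraint name
ALTER TABLE Table1_tmp DROP FOREIGN KEY fk_table2_id;
```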
Step 3: Apply schema migration on Table1_tmp
Make sure that if you are adding a column, you specify the default value that will be used for existing rows while inserting data from Table1 into Table1_tmp.
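As a hypothetical example of such a migration, adding a column with an explicit default:

```sql
-- Hypothetical new column; the DEFAULT guarantees that rows copied
-- over from Table1 (which lack this column) get a well-defined value
ALTER TABLE Table1_tmp
  ADD COLUMN status VARCHAR(16) NOT NULL DEFAULT 'active';
```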
Step 4: Start syncing data from Table1 to Table1_tmp
During the sync, Table1 can remain in use. This means we are free to keep using the database, but we need to make sure the binlog is enabled, so that write queries can be replayed on the new Table1_tmp and it does not end up out of sync once the initial copy from Table1 finishes.
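Before starting the sync, it is worth confirming the binlog settings this approach relies on:

```sql
-- The binary log must be enabled so that writes to Table1 can be
-- replayed on Table1_tmp during the sync
SHOW VARIABLES LIKE 'log_bin';        -- expect ON
SHOW VARIABLES LIKE 'binlog_format';  -- tools such as gh-ost expect ROW
```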
Step 5: Once we are in sync, just rename tables
Renaming is an instantaneous operation, so all we need to do is rename Table1 to Table1_old, for example, and right after that rename Table1_tmp to Table1.
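The swap itself can be done in a single statement, which MySQL performs atomically:

```sql
-- Both renames happen in one atomic operation, so no client ever
-- observes a moment where Table1 is missing
RENAME TABLE Table1     TO Table1_old,
             Table1_tmp TO Table1;
```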
Step 6: Add the dropped foreign keys
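A sketch of restoring the keys recorded in step 2; constraint, column, and table names here are hypothetical:

```sql
-- Re-create the foreign keys that were dropped in step 2
ALTER TABLE Table1
  ADD CONSTRAINT fk_table2_id
  FOREIGN KEY (table2_id) REFERENCES Table2 (id);
```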
Available tooling
To help with the process, a few open source tools are already available today. One of them is gh-ost (https://github.com/github/gh-ost); another is pt-online-schema-change (https://docs.percona.com/percona-toolkit/pt-online-schema-change.html).
Gh-ost snippets
To save you some time, here are a few snippets we usually run when doing migrations with gh-ost, so you can adapt them to your liking and test things yourself.
Running gh-ost
To run gh-ost you will need to install it on a server that has access to your database. Replace all ${VARS} with the appropriate details for your database, and replace the "ADD INDEX …" clause with the schema change statement you wish to perform on the table.
gh-ost \
--assume-rbr \
--user="${DB_USER}" \
--password="${DB_PASS}" \
--host="${DB_HOST}" \
--allow-on-master \
--database="${DB_NAME}" \
--table="${DB_TABLE}" \
--verbose \
--alter="ADD INDEX idx_example (column_one,column_two,datetime)" \
--initially-drop-ghost-table \
--skip-foreign-key-checks \
--max-load=Threads_running=30 \
--chunk-size=50 \
--cut-over=default \
--exact-rowcount \
--concurrent-rowcount \
--serve-socket-file=/tmp/ghost.sock \
--panic-flag-file=/tmp/ghost.panic.flag \
--postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
--execute
This command will create a copy of the schema of your ${DB_TABLE} table, run the --alter statement on the newly created table, and start syncing data from the main table to the new one.
Needless to say, you should run this command in screen or something similar that you can detach from, since it will probably take a long time.
For further information on flags that you can/should choose, have a look at the cheatsheet.
Checking status
To check the status of the sync, gh-ost conveniently serves it over a Unix socket.
echo status | nc -U /tmp/ghost.sock
# Migrating `mydb`.`mytable`; Ghost table is `mydb`.`_mytable_gho`
# Migrating ip-172-27-0-49:3306; inspecting ip-172-27-0-72:3306; executing on ghost
# Migration started at Wed Jan 09 16:25:10 +0000 2024
# chunk-size: 500; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: Threads_running=30; critical-load: ; nice-ratio: 1.500000
# throttle-additional-flag-file: /tmp/ghost.throttle
# postpone-cut-over-flag-file: /tmp/ghost.postpone.flag [set]
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/ghost.sock
Copy: 22404000/81210224 27.6%; Applied: 4052; Backlog: 0/1000; Time: 2h21m25s(total), 2h21m24s(copy); streamer: mysql-bin-changelog.003876:31236708; State: migrating; ETA: 6h11m11s
Throttling
If you see impact on production due to this process, you can set niceness and reduce burden on the database:
echo "nice-ratio=1.5" | nc -U /tmp/ghost.sock
Perform rename
Once your tables are in sync, you are free to perform the cut-over. To do that, gh-ost constantly checks for the postpone flag file; once you remove it, the running process will perform the cut-over and exit, finishing the migration.
rm /tmp/ghost.postpone.flag
More commands…
If you are interested in more interactive commands, you can check here:
https://github.com/github/gh-ost/blob/master/doc/interactive-commands.md
Final thoughts
Although pt-online-schema-change is a powerful tool, we prefer using gh-ost for the job. Its additional features, which are essential for managing large databases, make it invaluable. Of course, you are free to test both and choose the one you prefer.