Optimizing Replication: Selective Data Transfer with binlog-do-db and replicate-do-db in MySQL and MariaDB
In MySQL and MariaDB, master-slave replication allows you to keep a secondary server (slave) in sync with a primary server (master). This ensures data redundancy and scalability for read-heavy workloads or disaster recovery scenarios.
Binary Logging
A crucial aspect of replication is binary logging. The master server records all data-modification statements (like INSERT, UPDATE, and DELETE) in a binary log file. This log becomes the source of truth for keeping the slave server updated.
Filtering Replication with binlog-do-db
and replicate-do-db
These options enable you to selectively replicate data based on specific databases. However, they have distinct functionalities:
-
binlog-do-db
(on the Master):- Configured on the master server.
- Instructs the master to only write statements that modify tables within the specified databases to its binary log.
- Statements affecting excluded databases won't be logged, preventing replication for those databases.
-
replicate-do-db
(on the Slave):- Instructs the slave to only apply statements from its relay log (a temporary log storing binary log events received from the master) that modify tables within the specified databases.
- Statements targeting excluded databases on the master will still be replicated, but the slave will ignore them.
Key Differences:
- Target:
binlog-do-db
controls what goes into the master's binary log, whilereplicate-do-db
filters what gets applied on the slave. - Effect on Master:
binlog-do-db
affects the master's binary log content, potentially impacting other slaves replicating from the same master.replicate-do-db
only affects the specific slave where it's configured.
Important Considerations:
- Alternatives: Consider using whitelisting with
binlog-ignore-db
orreplicate-ignore-db
to explicitly include only the databases you want replicated. This can be more secure and avoid unintended exclusions. - Backup Implications: Filtering with these options can prevent capturing all data changes in the binary log. This might affect your ability to perform point-in-time recovery from backups.
- Use Cases: Use these options cautiously for scenarios like:
- Offloading read traffic to a slave for specific databases.
- Migrating data from one database to another server selectively.
In summary:
binlog-do-db
(master): Only log statements for specific databases on the master.replicate-do-db
(slave): Only apply statements from the relay log for specific databases on the slave.
Command Line:
mysqld_safe --binlog-do-db=mydatabase
Option File:
[mysqld]
binlog-do-db=mydatabase
Explanation:
In both methods, we're specifying mydatabase
as the database to include in the binary log. The master server will only write statements that modify tables within mydatabase
to its binary log.
Command Line (after stopping the slave thread):
STOP SLAVE;
SET GLOBAL replicate_do_db='mydatabase';
START SLAVE;
[mysqld]
replicate-do-db=mydatabase
- We first stop the slave thread using
STOP SLAVE
. - Then, we set the
replicate_do_db
system variable tomydatabase
. This instructs the slave to only apply statements from the relay log that modify tables withinmydatabase
. - Finally, we restart the slave thread using
START SLAVE
.
Remember:
- Replace
mydatabase
with the actual database name you want to replicate or filter. - Consider using
binlog-ignore-db
orreplicate-ignore-db
for whitelisting to be more explicit about what gets replicated/ignored. - Be cautious when using filtering options, as they can impact backup and recovery strategies.
- Introduced in MySQL 5.6 and MariaDB 10.0, logical replication allows replicating specific tables or even specific DML statements (INSERT, UPDATE, DELETE) within a table.
- Pros:
- More granular control over what gets replicated.
- Doesn't affect the master's binary log content.
- Cons:
- Requires newer MySQL/MariaDB versions.
- May have higher overhead compared to statement-based filtering.
User-Defined Functions (UDFs) in Triggers:
- Create triggers on the master that use UDFs to determine if a statement should be replicated based on specific criteria (e.g., database name, table name, user permissions).
- The UDF logic can be customized to your needs.
- Pros:
- Cons:
- Can be complex to set up and maintain.
- Might introduce performance overhead on the master.
Database-Level Permissions:
- Grant appropriate permissions (SELECT, INSERT, UPDATE, DELETE) to users on specific databases on the slave. Users can then access/modify only the authorized databases.
- Pros:
- Cons:
- Less granular control than other methods.
- Requires additional user management.
Filtering Based on Replication Channels:
- MySQL 8.0 introduced replication channels, which allow defining logical replication streams with specific filtering criteria (e.g., databases, tables, DML statements).
- Pros:
- Granular control over replication using channels.
- Cons:
- Requires MySQL 8.0 or later.
- Might be a more advanced setup for some users.
Choosing the Right Method:
The best method depends on your specific requirements for filtering, complexity tolerance, and MySQL/MariaDB version. Here's a general guideline:
- Logical replication: Ideal if you need very granular control and are using a compatible MySQL/MariaDB version.
- UDFs in triggers: Consider this if you have specific filtering logic beyond database names and need flexibility.
- Database-level permissions: Choose this for a simple setup where filtering by database is sufficient.
- Replication channels: A good option for MySQL 8.0 users who want granular control and channel-based replication management.
mysql mariadb master-slave