Optimizing Replication: Selective Data Transfer with binlog-do-db and replicate-do-db in MySQL and MariaDB

2024-07-27

In MySQL and MariaDB, master-slave replication allows you to keep a secondary server (slave) in sync with a primary server (master). This ensures data redundancy and scalability for read-heavy workloads or disaster recovery scenarios.

Binary Logging

A crucial aspect of replication is binary logging. The master server records all data-modification statements (like INSERT, UPDATE, and DELETE) in a binary log file. This log becomes the source of truth for keeping the slave server updated.

Filtering Replication with binlog-do-db and replicate-do-db

These options enable you to selectively replicate data based on specific databases. However, they have distinct functionalities:

  • binlog-do-db (on the Master):

    • Configured on the master server.
    • Instructs the master to only write statements that modify tables within the specified databases to its binary log.
    • Statements affecting excluded databases won't be logged, preventing replication for those databases.
  • replicate-do-db (on the Slave):

    • Instructs the slave to only apply statements from its relay log (a temporary log storing binary log events received from the master) that modify tables within the specified databases.
    • Statements targeting excluded databases on the master will still be replicated, but the slave will ignore them.

Key Differences:

  • Target: binlog-do-db controls what goes into the master's binary log, while replicate-do-db filters what gets applied on the slave.
  • Effect on Master: binlog-do-db affects the master's binary log content, potentially impacting other slaves replicating from the same master. replicate-do-db only affects the specific slave where it's configured.

Important Considerations:

  • Alternatives: Consider using whitelisting with binlog-ignore-db or replicate-ignore-db to explicitly include only the databases you want replicated. This can be more secure and avoid unintended exclusions.
  • Backup Implications: Filtering with these options can prevent capturing all data changes in the binary log. This might affect your ability to perform point-in-time recovery from backups.
  • Use Cases: Use these options cautiously for scenarios like:
    • Offloading read traffic to a slave for specific databases.
    • Migrating data from one database to another server selectively.

In summary:

  • binlog-do-db (master): Only log statements for specific databases on the master.
  • replicate-do-db (slave): Only apply statements from the relay log for specific databases on the slave.



Command Line:

mysqld_safe --binlog-do-db=mydatabase

Option File:

[mysqld]
binlog-do-db=mydatabase

Explanation:

In both methods, we're specifying mydatabase as the database to include in the binary log. The master server will only write statements that modify tables within mydatabase to its binary log.

Command Line (after stopping the slave thread):

STOP SLAVE;
SET GLOBAL replicate_do_db='mydatabase';
START SLAVE;
[mysqld]
replicate-do-db=mydatabase
  • We first stop the slave thread using STOP SLAVE.
  • Then, we set the replicate_do_db system variable to mydatabase. This instructs the slave to only apply statements from the relay log that modify tables within mydatabase.
  • Finally, we restart the slave thread using START SLAVE.

Remember:

  • Replace mydatabase with the actual database name you want to replicate or filter.
  • Consider using binlog-ignore-db or replicate-ignore-db for whitelisting to be more explicit about what gets replicated/ignored.
  • Be cautious when using filtering options, as they can impact backup and recovery strategies.



  • Introduced in MySQL 5.6 and MariaDB 10.0, logical replication allows replicating specific tables or even specific DML statements (INSERT, UPDATE, DELETE) within a table.
  • Pros:
    • More granular control over what gets replicated.
    • Doesn't affect the master's binary log content.
  • Cons:
    • Requires newer MySQL/MariaDB versions.
    • May have higher overhead compared to statement-based filtering.

User-Defined Functions (UDFs) in Triggers:

  • Create triggers on the master that use UDFs to determine if a statement should be replicated based on specific criteria (e.g., database name, table name, user permissions).
  • The UDF logic can be customized to your needs.
  • Pros:
  • Cons:
    • Can be complex to set up and maintain.
    • Might introduce performance overhead on the master.

Database-Level Permissions:

  • Grant appropriate permissions (SELECT, INSERT, UPDATE, DELETE) to users on specific databases on the slave. Users can then access/modify only the authorized databases.
  • Pros:
  • Cons:
    • Less granular control than other methods.
    • Requires additional user management.

Filtering Based on Replication Channels:

  • MySQL 8.0 introduced replication channels, which allow defining logical replication streams with specific filtering criteria (e.g., databases, tables, DML statements).
  • Pros:
    • Granular control over replication using channels.
  • Cons:
    • Requires MySQL 8.0 or later.
    • Might be a more advanced setup for some users.

Choosing the Right Method:

The best method depends on your specific requirements for filtering, complexity tolerance, and MySQL/MariaDB version. Here's a general guideline:

  • Logical replication: Ideal if you need very granular control and are using a compatible MySQL/MariaDB version.
  • UDFs in triggers: Consider this if you have specific filtering logic beyond database names and need flexibility.
  • Database-level permissions: Choose this for a simple setup where filtering by database is sufficient.
  • Replication channels: A good option for MySQL 8.0 users who want granular control and channel-based replication management.

mysql mariadb master-slave



Example Code (Schema Changes Table)

Create a table in your database specifically for tracking changes. This table might have columns like version_number (integer...


Visualize Your MySQL Database: Reverse Engineering and ER Diagrams

Here's a breakdown of how it works:Some popular tools for generating MySQL database diagrams include:MySQL Workbench: This free...


Level Up Your MySQL Skills: Exploring Multiple Update Techniques

This is the most basic way. You write separate UPDATE statements for each update you want to perform. Here's an example:...


Retrieving Your MySQL Username and Password

Understanding the Problem: When working with MySQL databases, you'll often need to know your username and password to connect...


Managing Databases Across Development, Test, and Production Environments

Developers write scripts containing SQL statements to define the database schema (structure) and any data changes. These scripts are like instructions to modify the database...



mysql mariadb master slave

Optimizing Your MySQL Database: When to Store Binary Data

Binary data is information stored in a format computers understand directly. It consists of 0s and 1s, unlike text data that uses letters


Enforcing Data Integrity: Throwing Errors in MySQL Triggers

MySQL: A popular open-source relational database management system (RDBMS) used for storing and managing data.Database: A collection of structured data organized into tables


Bridging the Gap: Transferring Data Between SQL Server and MySQL

SSIS is a powerful tool for Extract, Transform, and Load (ETL) operations. It allows you to create a workflow to extract data from one source


Replacing Records in SQL Server 2005: Alternative Approaches to MySQL REPLACE INTO

SQL Server 2005 doesn't have a direct equivalent to REPLACE INTO. You need to achieve similar behavior using a two-step process:


When Does MySQL Slow Down? It Depends: Optimizing for Performance

Hardware: A beefier server with more RAM, faster CPU, and better storage (like SSDs) can handle much larger databases before slowing down