Big Data 265 - Canal Deployment

Canal Introduction

Canal is Alibaba's open-source tool for parsing and synchronizing MySQL incremental logs. It presents itself to the MySQL master as a slave (replica), receives the master's binlog stream, and makes those changes available for real-time capture and transmission. It is commonly used for data migration, cache updates, and search engine synchronization.

Environment Preparation

  • Operating system: Linux/Windows/macOS (Linux recommended)
  • Java environment: JDK 1.8 or above
  • Database environment: MySQL 5.6 or above, with binlog enabled
  • ZooKeeper cluster: for distributed coordination (optional in standalone mode)

Canal Installation

My MySQL is on node h122, so I placed Canal on node h123.

Download Project

https://github.com/alibaba/canal/releases

You can download it in whatever way you prefer. For convenience, I use wget here, with version 1.1.4.

cd /opt/software
wget https://github.com/alibaba/canal/releases/download/canal-1.1.4/canal.deployer-1.1.4.tar.gz

Configure Project

mkdir -p /opt/servers/canal
tar -zxvf canal.deployer-1.1.4.tar.gz -C /opt/servers/canal

Modify Configuration

conf/canal.properties

cd /opt/servers/canal
vim conf/canal.properties

This file holds Canal's general server settings. The main one to note is the port (canal.port), which defaults to 11111 if left unchanged. The modified content is as follows:

# Configure zookeeper address
canal.zkServers = h121.wzk.icu:2181,h122.wzk.icu:2181,h123.wzk.icu:2181
# tcp, kafka, RocketMQ
canal.serverMode = kafka
# Configure kafka address
canal.mq.servers = h121.wzk.icu:9092,h122.wzk.icu:9092,h123.wzk.icu:9092
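Each entry in these comma-separated lists should carry an explicit port, and a missing one is easy to overlook. The following is a minimal sketch of a pre-start check, not part of Canal: the class `EndpointCheck` and the method `entriesWithoutPort` are hypothetical names, and the sample values in `main` are inlined only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper, not part of Canal: flags endpoints written without ":port". */
final class EndpointCheck {

    /** Returns the entries of a comma-separated endpoint list that lack a numeric port. */
    static List<String> entriesWithoutPort(String endpoints) {
        List<String> bad = new ArrayList<>();
        if (endpoints == null) {
            return bad;
        }
        for (String raw : endpoints.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue;
            }
            // Expect "host:port" with a 1-5 digit port and no further colons.
            if (!entry.matches("[^:\\s]+:\\d{1,5}")) {
                bad.add(entry);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        // Sample values only; in practice, load conf/canal.properties with a FileReader.
        Properties props = new Properties();
        props.load(new StringReader(
            "canal.zkServers = h121.wzk.icu:2181,h122.wzk.icu:2181,h123.wzk.icu:2181\n"
          + "canal.mq.servers = h121.wzk.icu:9092,h122.wzk.icu:9092,h123.wzk.icu\n"));
        for (String key : new String[] {"canal.zkServers", "canal.mq.servers"}) {
            System.out.println(key + " entries without a port: "
                + entriesWithoutPort(props.getProperty(key)));
        }
    }
}
```

Running a check like this before `sh bin/startup.sh` is cheaper than diagnosing a connection error from the logs afterwards.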

conf/example/instance.properties

cd /opt/servers/canal
vim conf/example/instance.properties

This file configures the MySQL instance that Canal tracks:

# Address of the MySQL server to track
canal.instance.master.address = h122.wzk.icu:3306
# MySQL account that Canal connects with
canal.instance.dbUsername = canal
canal.instance.dbPassword = canal
# mq config: Kafka topic that this instance's changes are sent to
canal.mq.topic=dwshow

Start Service

sh bin/startup.sh

Stop Service

sh bin/stop.sh

Notes

The configuration above runs Canal in standalone mode. Canal can also run as a cluster: distribute the Canal directory to the other machines and start Canal separately on each node. Coordination through ZooKeeper provides high availability, not load balancing. At any point in time, only one canal-server node monitors a given data source. While that node works normally, the other canal-server nodes for the same data source stay on standby. Once the working node stops, the remaining nodes compete to take over.
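The active/standby behaviour can be pictured with a toy model. This sketch is an in-memory illustration only: it is not Canal's implementation and involves no ZooKeeper. A map entry stands in for the lock that servers compete for, and the class name, method names, and server names (`canal-a`, `canal-b`) are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy in-memory model of active/standby ownership; not Canal code, no ZooKeeper involved. */
final class RunningLockModel {
    // destination (data source) -> server currently allowed to monitor it
    private final Map<String, String> active = new ConcurrentHashMap<>();

    /** A server competes for a destination; only the first claimant becomes active. */
    boolean tryAcquire(String destination, String server) {
        return active.putIfAbsent(destination, server) == null;
    }

    /** The active server stops; the destination becomes free for a standby to claim. */
    boolean release(String destination, String server) {
        return active.remove(destination, server);
    }

    /** Returns the server currently monitoring the destination, or null if none. */
    String activeServer(String destination) {
        return active.get(destination);
    }

    public static void main(String[] args) {
        RunningLockModel model = new RunningLockModel();
        System.out.println("canal-a active? " + model.tryAcquire("example", "canal-a"));
        // The second server cannot claim the same destination and stays on standby.
        System.out.println("canal-b active? " + model.tryAcquire("example", "canal-b"));
        // The working node stops, so the standby can now win the competition.
        model.release("example", "canal-a");
        System.out.println("canal-b active after failover? "
            + model.tryAcquire("example", "canal-b"));
    }
}
```

The model makes the trade-off visible: adding a second server for the same data source adds a spare, not extra throughput.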

Canal Cluster Deployment

Cluster Mode Introduction

Canal cluster mode typically uses ZooKeeper for distributed coordination. For a single data source this provides high availability rather than load balancing, since only one node is active at a time. The following are three common deployment options:

  • Standalone instance mode: Multiple Canal instances run independently, suitable for small-scale scenarios.
  • HA mode: Implements master-standby switching based on ZooKeeper to improve reliability.
  • Distributed mode: Combined with Kafka/RocketMQ to achieve higher throughput distributed synchronization.

Common Issues and Solutions

Connection Failure

Ensure the database binlog is enabled and uses ROW format. The following query should return ROW:

SHOW VARIABLES LIKE 'binlog_format';

Ensure Database Permissions are Correct

-- The account must already exist, e.g. CREATE USER 'canal'@'%' IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';

Cannot Get Binlog Data

Confirm that binary logging is turned on in MySQL. The following query should return ON:

SHOW VARIABLES LIKE 'log_bin';

High Kafka Consumer Latency

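Increase the number of partitions on the topic and run enough consumers to read them in parallel. Adding replicas improves fault tolerance but does not by itself speed up consumption. On the Canal side, canal.mq.partitionsNum and canal.mq.partitionHash in instance.properties control how changes are spread across partitions.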
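More partitions help only when messages are actually spread across them, and ordering then holds per partition rather than across the whole topic. The sketch below illustrates generic key-hash partitioning. It is not Canal's or Kafka's exact algorithm, and `PartitionSketch`, its methods, and the sample keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustration only: generic key-hash partitioning, not Canal's or Kafka's exact algorithm. */
final class PartitionSketch {

    /** Maps a key to a partition index in [0, partitions). */
    static int partitionFor(String key, int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Groups keys by the partition they would be sent to. */
    static Map<Integer, List<String>> assign(List<String> keys, int partitions) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            byPartition
                .computeIfAbsent(partitionFor(key, partitions), p -> new ArrayList<>())
                .add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Hypothetical keys of the form schema.table:primaryKey
        List<String> keys = Arrays.asList(
            "shop.orders:1", "shop.orders:2", "shop.users:1", "shop.users:2");
        System.out.println("1 partition:  " + assign(keys, 1));
        System.out.println("3 partitions: " + assign(keys, 3));
    }
}
```

Because the same key always maps to the same partition, changes to one row stay in order while different rows can be consumed in parallel.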

Testing and Monitoring

Client Testing

Add the Canal client SDK as a Maven dependency

<dependency>
    <groupId>com.alibaba.otter</groupId>
    <artifactId>canal.client</artifactId>
    <version>1.1.6</version>
</dependency>

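Write code to consume data. This client connects to the Canal server over TCP, so it requires canal.serverMode = tcp. With canal.serverMode = kafka as configured earlier, consume the Kafka topic instead.

import java.net.InetSocketAddress;
import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry.Entry;
import com.alibaba.otter.canal.protocol.Message;

public class CanalClientTest {
    public static void main(String[] args) throws InterruptedException {
        CanalConnector connector = CanalConnectors.newSingleConnector(
            new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
        connector.connect();
        connector.subscribe(".*\\..*"); // all schemas, all tables
        connector.rollback();           // resume from the last acknowledged position
        while (true) {
            Message message = connector.getWithoutAck(100); // fetch up to 100 entries
            if (message.getId() == -1 || message.getEntries().isEmpty()) {
                Thread.sleep(1000); // nothing new; avoid a busy loop
                continue;
            }
            for (Entry entry : message.getEntries()) {
                System.out.println(entry.toString());
            }
            connector.ack(message.getId()); // confirm the batch was processed
        }
    }
}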
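The subscription string passed to `subscribe` is a comma-separated list of regular expressions matched against `schema.table` names. The sketch below approximates that matching with `java.util.regex`. It is an assumption-laden illustration rather than Canal's own filter code, and `SubscribeFilterSketch` and the sample schema and table names are hypothetical.

```java
import java.util.regex.Pattern;

/** Approximation of a schema.table filter using java.util.regex; not Canal's own filter. */
final class SubscribeFilterSketch {

    /** True if "schema.table" fully matches any of the comma-separated regexes. */
    static boolean matches(String filter, String schema, String table) {
        String name = schema + "." + table;
        // Note: splitting on commas means a regex that itself contains a comma is not supported.
        for (String regex : filter.split(",")) {
            String trimmed = regex.trim();
            if (!trimmed.isEmpty() && Pattern.matches(trimmed, name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // ".*\\..*" in Java source is the regex .*\..* (anything, a literal dot, anything).
        System.out.println(matches(".*\\..*", "shop", "orders"));
        // Only tables in the hypothetical "shop" schema.
        System.out.println(matches("shop\\..*", "mysql", "user"));
        // Several patterns combined with commas.
        System.out.println(matches("shop\\..*,mysql.user", "mysql", "user"));
    }
}
```

Narrowing the pattern to the schemas that matter reduces the volume of entries the client has to process.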

Status Monitoring

Check Canal service status (the process appears as CanalLauncher in the jps output)

jps

View log error messages

tail -f logs/canal/canal.log

Use Prometheus to collect Canal service metrics and Grafana to visualize them.

Summary

Canal is a powerful data synchronization tool. Combined with clustering and message queues, it can meet complex real-time data synchronization needs. When configuring cluster mode, plan how ZooKeeper, Kafka, and the Canal instances work together, and use monitoring tools to keep the deployment available and stable.