MySQL Sharding-JDBC: Physical, Logical & Binding Tables E...

Core Concepts

Physical Table

Physical table is a table that actually exists in the database, the entity table where data is finally stored. In database sharding scenarios, these tables usually have numeric suffixes to indicate shards. For example:

b_order0: Represents the first shard of the order table
b_order1: Represents the second shard of the order table
b_order_item0: Represents the first shard of the order item table

These physical tables are distributed across different database instances or data sources, with each table storing partial data.

Logical Table

Logical table is an abstract representation of the same type of table structure, the table name used in application programs. It represents a collection of sharded tables. For example:

b_order: Represents the logical name of all order shard tables
b_order_item: Represents the logical name of all order item shard tables

In SQL queries, developers only need to use the logical table name, and the sharding middleware will automatically route the query to the corresponding physical tables.

Data Node

Data node is the basic data unit after sharding, consisting of a data source and physical table, usually formatted as datasource_name.physical_table_name. For example:

ds0.b_order1: Represents b_order1 table in datasource ds0
ds1.b_order_item2: Represents b_order_item2 table in datasource ds1

Binding Table

Binding table refers to master table and child table relationships with completely consistent sharding rules. They have the same sharding key and sharding algorithm, ensuring associated data is stored on the same shard.

Binding Table Advantages

Join queries can be completed directly on a single shard, avoiding cross-shard queries
Eliminates Cartesian product problems, improving query efficiency
Guarantees transaction consistency, related data on the same shard

Binding Table Configuration Example

spring:
  shardingsphere:
    sharding:
      binding-tables:
        - b_order,b_order_item
        - t_user,t_user_address

Cartesian Product Problem

Without binding table configuration, Cartesian product relationship joins generate 4 SQLs:

SELECT * FROM b_order0 o
JOIN b_order_item0 i ON(o.order_id=i.order_id)
WHERE o.order_id IN(10,11);

SELECT * FROM b_order0 o
JOIN b_order_item1 i ON(o.order_id=i.order_id)
WHERE o.order_id IN(10,11);

SELECT * FROM b_order1 o
JOIN b_order_item0 i ON(o.order_id=i.order_id)
WHERE o.order_id IN(10,11);

SELECT * FROM b_order1 o
JOIN b_order_item1 i ON(o.order_id=i.order_id)
WHERE o.order_id IN(10,11);

After binding table configuration, generates 2 SQLs:

SELECT * FROM b_order0 o
JOIN b_order_item0 ON(o.order_id=i.order_id)
WHERE o.order_id IN(10,11);

SELECT * FROM b_order1 o
JOIN b_order_item1 ON(o.order_id=i.order_id)
WHERE o.order_id IN(10,11);

Broadcast Table

Broadcast table is a special data sharding strategy, mainly used for handling small data tables that don’t need sharding but require frequent join queries.

Characteristics

Small data scale: Data volume usually ranges from hundreds to tens of thousands of records
High-frequency join query requirements: Need to JOIN with massive data
High data consistency requirements: Content is relatively static, low update frequency

Typical Use Cases

Basic data tables: Country/region code tables, product category tables
System configuration tables: System parameter tables, user role tables
Dimension tables: Time dimension tables, organization tables

Sharding Algorithm Details

1. Precise Sharding Algorithm (PreciseShardingAlgorithm)

Use case: Handles equal value queries (IN operator) using single sharding key
Typical query example: WHERE user_id IN (1001, 1002, 1003)

2. Range Sharding Algorithm (RangeShardingAlgorithm)

Use case: Handles range queries using single sharding key
Supported operators: BETWEEN AND, >, <, >=, <=

3. Complex Sharding Algorithm (ComplexKeysShardingAlgorithm)

Use case: Handles complex sharding logic with multiple sharding key combinations
Features: Supports multiple sharding key combination conditions

4. Hint Sharding Algorithm (HintShardingAlgorithm)

Use case: Special scenarios where sharding field is not determined by SQL but by external conditions
Implementation methods:
- Java API method
- SQL comment method

// Java API method
HintManager.getInstance().addDatabaseShardingValue("logicTable", "shardingValue");

/* Sharding:table=employee_01 */ SELECT * FROM employee

Sharding Strategy Details

1. Standard Sharding Strategy (StandardShardingStrategy)

Only supports single sharding key, suitable for most conventional sharding scenarios. Requires PreciseShardingAlgorithm (required) and RangeShardingAlgorithm (optional).

2. Complex Sharding Strategy (ComplexShardingStrategy)

Supports multiple sharding key combinations, suitable for complex business scenarios.

3. Inline Sharding Strategy (InlineShardingStrategy)

A lightweight sharding solution based on Groovy expressions, no Java code writing required.

# Shard into 8 tables by user ID modulo
t_user_$->{u_id % 8}

4. Hint Sharding Strategy (HintShardingStrategy)

Specifies sharding values through programming, completely bypassing SQL parsing process.

5. None Sharding Strategy (NoneShardingStrategy)

Explicitly declares no sharding, suitable for small tables that don’t need sharding.

Strategy Configuration

Data Source Sharding Strategy

Mainly used to define how data records are allocated to different physical data source nodes.

Table Sharding Strategy

Responsible for defining data record distribution rules within the target data source at the table level, must be based on data source sharding strategy results.