Core Concepts
Physical Table
Physical table is a table that actually exists in the database, the entity table where data is finally stored. In database sharding scenarios, these tables usually have numeric suffixes to indicate shards. For example:
b_order0: Represents the first shard of the order tableb_order1: Represents the second shard of the order tableb_order_item0: Represents the first shard of the order item table
These physical tables are distributed across different database instances or data sources, with each table storing partial data.
Logical Table
Logical table is an abstract representation of the same type of table structure, the table name used in application programs. It represents a collection of sharded tables. For example:
b_order: Represents the logical name of all order shard tablesb_order_item: Represents the logical name of all order item shard tables
In SQL queries, developers only need to use the logical table name, and the sharding middleware will automatically route the query to the corresponding physical tables.
Data Node
Data node is the basic data unit after sharding, consisting of a data source and physical table, usually formatted as datasource_name.physical_table_name. For example:
ds0.b_order1: Represents b_order1 table in datasource ds0ds1.b_order_item2: Represents b_order_item2 table in datasource ds1
Binding Table
Binding table refers to master table and child table relationships with completely consistent sharding rules. They have the same sharding key and sharding algorithm, ensuring associated data is stored on the same shard.
Binding Table Advantages
- Join queries can be completed directly on a single shard, avoiding cross-shard queries
- Eliminates Cartesian product problems, improving query efficiency
- Guarantees transaction consistency, related data on the same shard
Binding Table Configuration Example
spring:
shardingsphere:
sharding:
binding-tables:
- b_order,b_order_item
- t_user,t_user_address
Cartesian Product Problem
Without binding table configuration, Cartesian product relationship joins generate 4 SQLs:
SELECT * FROM b_order0 o
JOIN b_order_item0 i ON(o.order_id=i.order_id)
WHERE o.order_id IN(10,11);
SELECT * FROM b_order0 o
JOIN b_order_item1 i ON(o.order_id=i.order_id)
WHERE o.order_id IN(10,11);
SELECT * FROM b_order1 o
JOIN b_order_item0 i ON(o.order_id=i.order_id)
WHERE o.order_id IN(10,11);
SELECT * FROM b_order1 o
JOIN b_order_item1 i ON(o.order_id=i.order_id)
WHERE o.order_id IN(10,11);
After binding table configuration, generates 2 SQLs:
SELECT * FROM b_order0 o
JOIN b_order_item0 ON(o.order_id=i.order_id)
WHERE o.order_id IN(10,11);
SELECT * FROM b_order1 o
JOIN b_order_item1 ON(o.order_id=i.order_id)
WHERE o.order_id IN(10,11);
Broadcast Table
Broadcast table is a special data sharding strategy, mainly used for handling small data tables that don’t need sharding but require frequent join queries.
Characteristics
- Small data scale: Data volume usually ranges from hundreds to tens of thousands of records
- High-frequency join query requirements: Need to JOIN with massive data
- High data consistency requirements: Content is relatively static, low update frequency
Typical Use Cases
- Basic data tables: Country/region code tables, product category tables
- System configuration tables: System parameter tables, user role tables
- Dimension tables: Time dimension tables, organization tables
Sharding Algorithm Details
1. Precise Sharding Algorithm (PreciseShardingAlgorithm)
- Use case: Handles equal value queries (IN operator) using single sharding key
- Typical query example:
WHERE user_id IN (1001, 1002, 1003)
2. Range Sharding Algorithm (RangeShardingAlgorithm)
- Use case: Handles range queries using single sharding key
- Supported operators: BETWEEN AND, >, <, >=, <=
3. Complex Sharding Algorithm (ComplexKeysShardingAlgorithm)
- Use case: Handles complex sharding logic with multiple sharding key combinations
- Features: Supports multiple sharding key combination conditions
4. Hint Sharding Algorithm (HintShardingAlgorithm)
- Use case: Special scenarios where sharding field is not determined by SQL but by external conditions
- Implementation methods:
- Java API method
- SQL comment method
// Java API method
HintManager.getInstance().addDatabaseShardingValue("logicTable", "shardingValue");
/* Sharding:table=employee_01 */ SELECT * FROM employee
Sharding Strategy Details
1. Standard Sharding Strategy (StandardShardingStrategy)
Only supports single sharding key, suitable for most conventional sharding scenarios. Requires PreciseShardingAlgorithm (required) and RangeShardingAlgorithm (optional).
2. Complex Sharding Strategy (ComplexShardingStrategy)
Supports multiple sharding key combinations, suitable for complex business scenarios.
3. Inline Sharding Strategy (InlineShardingStrategy)
A lightweight sharding solution based on Groovy expressions, no Java code writing required.
# Shard into 8 tables by user ID modulo
t_user_$->{u_id % 8}
4. Hint Sharding Strategy (HintShardingStrategy)
Specifies sharding values through programming, completely bypassing SQL parsing process.
5. None Sharding Strategy (NoneShardingStrategy)
Explicitly declares no sharding, suitable for small tables that don’t need sharding.
Strategy Configuration
Data Source Sharding Strategy
Mainly used to define how data records are allocated to different physical data source nodes.
Table Sharding Strategy
Responsible for defining data record distribution rules within the target data source at the table level, must be based on data source sharding strategy results.