E-R Table Sharding Strategy Explained

1. E-R Relationship Sharding Principle

MyCat’s E-R relationship sharding strategy is an innovative solution designed based on database entity-relationship models. The core idea is to store data of associated parent and child tables in the same data shard.

2. Table Grouping Mechanism

  • Group tables with association relationships into the same logical group
  • Ensure associated data rows are always on the same physical node

3. Typical Application Scenarios

  1. Order-Order Detail scenario: Order table as parent, order detail table as child
  2. User-User Extension Information scenario
  3. Product-Product Category scenario

4. Configuration Example

<table name="orders" primaryKey="order_id" dataNode="dn1,dn2" rule="mod-long">
    <childTable name="order_items" joinKey="order_id" parentKey="order_id"/>
</table>

5. Advantage Analysis

  1. Query performance improvement: Join queries do not need cross-node execution
  2. Transaction simplification: Related table operations can complete on the same node
  3. Data consistency: Avoids complexity from distributed transactions

Global Table

Concept and Definition

Global table is a special type of data table in distributed database systems, mainly used for storing dictionary data or configuration data.

Typical Characteristics

  1. Low data change frequency: Very few update operations
  2. Stable data scale: Number of records basically remains constant
  3. Moderate data volume: Single table data volume usually does not exceed 100MB

Solution

MyCat uses data redundancy mechanism for dictionary table join queries:

  • All shards save complete copies of global table data
  • Any update operation synchronizes to all shards

Design Principles

  1. Identification criteria: Data change cycle greater than 1 month, record count less than 100,000
  2. Implementation notes: Need to configure automatic synchronization mechanism
  3. Performance optimization suggestions: Establish appropriate indexes for global tables

Shard Node

Shard node refers to the actual database instance where each data shard is stored after splitting a single database table according to specific rules.

  1. Sharding mechanism: Data rows are allocated to different physical databases according to preset sharding rules
  2. Node characteristics: Each shard node is a complete database instance
  3. Typical deployment scenario: E-commerce system order table sharded by user ID

Node Host

Node host refers to the physical server or virtual machine that carries one or more shard nodes.

  1. Relationship between sharding and host: One node host can run multiple shard instances
  2. Concurrency limitation issue: When multiple high-load shards concentrate on the same host, resource bottlenecks easily occur
  3. Load balancing strategy: Shard distribution, read-write separation, weight allocation

Sharding Rules

Importance of Sharding Rules

  1. Even data distribution: Good sharding rules ensure data is evenly distributed across all shard nodes
  2. Query efficiency optimization: Reasonable sharding rules can reduce cross-shard queries
  3. Scalability guarantee: When expansion is needed, good sharding rules support smooth expansion

Common Sharding Rule Types

  1. Hash sharding: Hash calculate the sharding key to determine data distribution
  2. Range sharding: Shard by numeric range
  3. List sharding: Shard by enumerated values
  4. Complex sharding: Combine multiple sharding strategies