Overview

ShardingSphere’s core sharding flow includes six major stages: SQL parsing, query optimization, SQL routing, SQL rewriting, SQL execution and result merging.

1. SQL Parsing

1. Lexical Analysis

  • Split original SQL statement into basic syntax units (Tokens)
  • Identify keywords in SQL string (such as SELECT, FROM, etc.)
  • Identify identifiers (such as table names, column names)
  • Identify operators (such as =, >, etc.)
  • Identify constants (such as numbers, strings)

Example: SELECT id FROM user WHERE age > 18 is split into: [SELECT, id, FROM, user, WHERE, age, >, 18]

2. Syntax Analysis

  • Validate statement structure according to SQL grammar rules, generate Abstract Syntax Tree (AST)
  • Check if keyword order complies with grammar rules
  • Verify statement structure completeness

3. Sharding-JDBC’s Parser Implementation

Sharding-JDBC provides dedicated parser implementations for different database dialects:

  • MySQL parser: Supports MySQL-specific syntax like ON DUPLICATE KEY UPDATE, handles LIMIT pagination syntax
  • Oracle parser: Parses ROWNUM pseudo-column, handles DUAL virtual table
  • SQLServer parser: Parses TOP syntax, handles WITH(NOLOCK) table hints
  • PostgreSQL parser: Parses RETURNING clause, supports WINDOW window functions

2. Query Optimization

  1. Condition merge processing: Merge multiple conditions on the same field into more efficient expressions
  2. Sharding condition optimization: Identify sharding conditions that can be pushed down
  3. Execution plan improvement: Reduce cross-partition data transmission volume
  4. Special scenario handling: Handle OR conditions with subqueries, optimize complex conditions with JOINs

3. SQL Routing

SQL routing is the core component in distributed database systems, responsible for routing query requests to correct data nodes based on SQL parsing results and user-configured sharding strategies.

1. Sharding Strategy Matching

The system matches based on table’s sharding configuration (such as sharding key, sharding algorithm, etc.) and conditions in the SQL statement.

2. Routing Type Judgment

  • Shard Routing: Route SQL to specific one or multiple shard nodes
  • Broadcast Routing: Send SQL to all shard nodes simultaneously

3. Performance Optimization Mechanism

  • Routing result caching
  • Sharding metadata local caching
  • Smart routing selection algorithm

4. SQL Rewriting

SQL rewriting converts the original SQL statement into a form that can be correctly executed in the actual database environment.

1. Correctness Rewriting

  • Syntax adaptation: Convert database-specific syntax to target database-supported syntax
  • Function replacement: Replace incompatible functions with target database-supported equivalent functions
  • Data type conversion: Adjust data type definitions to adapt to target database

2. Optimization Rewriting

  • Query rewriting: Convert subqueries to join queries
  • Pagination optimization: Use more efficient pagination for large data volumes

5. SQL Execution

1. Execution Flow

  1. SQL parsing: First perform syntax analysis and semantic check on SQL statement
  2. Execution plan generation: Generate optimal execution plan based on SQL type
  3. Task distribution: Submit SQL execution tasks to thread pool queue
  4. Async execution: Worker threads in thread pool fetch tasks from queue and execute
  5. Result return: After execution completes, return results through callback function

2. Performance Optimization

  • Use PreparedStatement to reduce parsing overhead
  • Use pagination queries for large result sets
  • Set reasonable timeout for time-consuming operations

6. Result Merging

Result merging combines and processes result sets returned from multiple execution nodes.

1. Streaming Merge

  • Use iterator pattern to fetch data row by row
  • Typical use cases: ORDER BY sorting merge, GROUP BY grouping merge
  • Advantages: Low memory usage, suitable for large data volume processing

2. Memory Merge

  • Need to load all result data into memory for processing
  • Applicable to: Aggregate function calculations (SUM/AVG/MAX, etc.)

3. Decorator Pattern Additional Merge

  • Add functionality to original result set without changing it
  • Result set metadata merge, pagination merge, data verification merge