MySQL ShardingSphere: SQL Parse, Route, Rewrite & Execute...

Overview

ShardingSphere’s core sharding flow includes six major stages: SQL parsing, query optimization, SQL routing, SQL rewriting, SQL execution and result merging.

1. SQL Parsing

1. Lexical Analysis

Split original SQL statement into basic syntax units (Tokens)
Identify keywords in SQL string (such as SELECT, FROM, etc.)
Identify identifiers (such as table names, column names)
Identify operators (such as =, >, etc.)
Identify constants (such as numbers, strings)

Example: SELECT id FROM user WHERE age > 18 is split into: [SELECT, id, FROM, user, WHERE, age, >, 18]

2. Syntax Analysis

Validate statement structure according to SQL grammar rules, generate Abstract Syntax Tree (AST)
Check if keyword order complies with grammar rules
Verify statement structure completeness

3. Sharding-JDBC’s Parser Implementation

Sharding-JDBC provides dedicated parser implementations for different database dialects:

MySQL parser: Supports MySQL-specific syntax like ON DUPLICATE KEY UPDATE, handles LIMIT pagination syntax
Oracle parser: Parses ROWNUM pseudo-column, handles DUAL virtual table
SQLServer parser: Parses TOP syntax, handles WITH(NOLOCK) table hints
PostgreSQL parser: Parses RETURNING clause, supports WINDOW window functions

2. Query Optimization

Condition merge processing: Merge multiple conditions on the same field into more efficient expressions
Sharding condition optimization: Identify sharding conditions that can be pushed down
Execution plan improvement: Reduce cross-partition data transmission volume
Special scenario handling: Handle OR conditions with subqueries, optimize complex conditions with JOINs

3. SQL Routing

SQL routing is the core component in distributed database systems, responsible for routing query requests to correct data nodes based on SQL parsing results and user-configured sharding strategies.

1. Sharding Strategy Matching

The system matches based on table’s sharding configuration (such as sharding key, sharding algorithm, etc.) and conditions in the SQL statement.

2. Routing Type Judgment

Shard Routing: Route SQL to specific one or multiple shard nodes
Broadcast Routing: Send SQL to all shard nodes simultaneously

3. Performance Optimization Mechanism

Routing result caching
Sharding metadata local caching
Smart routing selection algorithm

4. SQL Rewriting

SQL rewriting converts the original SQL statement into a form that can be correctly executed in the actual database environment.

1. Correctness Rewriting

Syntax adaptation: Convert database-specific syntax to target database-supported syntax
Function replacement: Replace incompatible functions with target database-supported equivalent functions
Data type conversion: Adjust data type definitions to adapt to target database

2. Optimization Rewriting

Query rewriting: Convert subqueries to join queries
Pagination optimization: Use more efficient pagination for large data volumes

5. SQL Execution

1. Execution Flow

SQL parsing: First perform syntax analysis and semantic check on SQL statement
Execution plan generation: Generate optimal execution plan based on SQL type
Task distribution: Submit SQL execution tasks to thread pool queue
Async execution: Worker threads in thread pool fetch tasks from queue and execute
Result return: After execution completes, return results through callback function

2. Performance Optimization

Use PreparedStatement to reduce parsing overhead
Use pagination queries for large result sets
Set reasonable timeout for time-consuming operations

6. Result Merging

Result merging combines and processes result sets returned from multiple execution nodes.

1. Streaming Merge

Use iterator pattern to fetch data row by row
Typical use cases: ORDER BY sorting merge, GROUP BY grouping merge
Advantages: Low memory usage, suitable for large data volume processing

2. Memory Merge

Need to load all result data into memory for processing
Applicable to: Aggregate function calculations (SUM/AVG/MAX, etc.)

3. Decorator Pattern Additional Merge

Add functionality to original result set without changing it
Result set metadata merge, pagination merge, data verification merge