TL;DR
- Scenario: Incremental build produces multiple small Segments, need to merge to optimize query performance and clean up expired data
- Conclusion: Auto merge + retention policy simplifies operations, JDBC facilitates BI tool integration
- Output: Manual/auto merge operations, retention policy config, JDBC query examples
Manual Merge (Merge Segment)
Operation Steps
- Login to Kylin Web UI
- Select target Cube
- Click Action → Merge
- Select continuous Segments to merge
- Submit Merge Job
Merge Conditions
- Can only merge continuous Segments
- All Segments participating in merge must be in READY status
- No other build tasks allowed during merge
Merge Job Flow
- Extract Fact Table - Read each Segment data
- Merge Cuboid - Merge Cuboid data
- Convert to HBase - Generate new HBase table
- Cleanup - Clean temporary files
Auto Merge
Config Location
Configure in Cube Designer Refresh Settings:
Multi-level Threshold Strategy
Auto Merge Thresholds:
- 7 days: small Segments per day automatically merged into 7 days
- 28 days: Segments per 7 days automatically merged into 28 days
Trigger Mechanism
- Triggers when new Segment becomes READY status
- Attempts merge from largest threshold downward
- Example: Already have 30 days of data, new Segment triggers attempt to merge into 28 days
Config Example
{
"auto_merge_time_ranges": "7,28",
"retention_threshold": 90
}
Retention Threshold
Purpose
Automatically clean up expired Segments no longer in use, reduce storage costs.
Calculation Logic
Difference = Latest Segment end time - Current Segment end time
Difference >= threshold → auto delete
Config Example
{
"retention_threshold": 90 // Retain latest 90 days data
}
Delete Segment Flow
Steps
- Disable Segment - Disable Segment
- Delete Segment - Delete Segment
Notes
- Must Disable before Delete
- After disabling, Segment doesn’t participate in queries
- After deletion, data cannot be recovered
JDBC Connection Query
Maven Dependency
<dependency>
<groupId>org.apache.kylin</groupId>
<artifactId>kylin-jdbc</artifactId>
<version>3.1.1</version>
</dependency>
JDBC Connection Example
import java.sql.*;
public class KylinJDBCDemo {
public static void main(String[] args) throws SQLException {
// Connection URL format: jdbc:kylin://host:port/project
String url = "jdbc:kylin://h122.wzk.icu:7070/wzk_project";
String user = "ADMIN";
String password = "KYLIN";
try (Connection conn = DriverManager.getConnection(url, user, password)) {
String sql = "SELECT dt, region, SUM(amount) AS total_amount " +
"FROM fact_sales " +
"WHERE dt >= '2024-01-01' AND dt <= '2024-01-31' " +
"GROUP BY dt, region " +
"ORDER BY total_amount DESC";
try (Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery(sql)) {
while (rs.next()) {
System.out.printf("%s | %s | %.2f%n",
rs.getString("dt"),
rs.getString("region"),
rs.getDouble("total_amount"));
}
}
}
}
}
Query Examples
-- Aggregation query
SELECT dt, category, SUM(amount) AS revenue, COUNT(*) AS orders
FROM fact_sales
JOIN dim_product ON fact_sales.product_id = dim_product.product_id
WHERE dt >= '2024-01-01'
GROUP BY dt, category
ORDER BY revenue DESC
LIMIT 10;
-- Subquery
SELECT region, AVG(avg_amount) AS avg_order_value
FROM (
SELECT region, dt, SUM(amount) AS avg_amount
FROM fact_sales
WHERE dt >= '2024-01-01'
GROUP BY region, dt
) t
GROUP BY region;
Error Quick Reference
| Symptom | Root Cause Location | Fix |
|---|---|---|
| Merge option grayed out | Segments not continuous or status not READY | Check Segment time ranges |
| Auto merge not triggered | Threshold config or trigger condition not met | Check auto_merge_time_ranges config |
| Retention policy not cleaning | Misunderstanding of difference calculation | Confirm threshold vs latest Segment time difference |
| Delete fails | Didn’t execute Disable first | Disable first then Delete |
| JDBC connection failed | Driver/URL/permission issue | Check kylin-jdbc version and network |
| Query returns empty | Project name or Cube name wrong | Confirm project name in connection URL |