TL;DR

  • Scenario: Incremental build produces multiple small Segments, need to merge to optimize query performance and clean up expired data
  • Conclusion: Auto merge + retention policy simplifies operations, JDBC facilitates BI tool integration
  • Output: Manual/auto merge operations, retention policy config, JDBC query examples

Manual Merge (Merge Segment)

Operation Steps

  1. Login to Kylin Web UI
  2. Select target Cube
  3. Click ActionMerge
  4. Select continuous Segments to merge
  5. Submit Merge Job

Merge Conditions

  • Can only merge continuous Segments
  • All Segments participating in merge must be in READY status
  • No other build tasks allowed during merge

Merge Job Flow

  1. Extract Fact Table - Read each Segment data
  2. Merge Cuboid - Merge Cuboid data
  3. Convert to HBase - Generate new HBase table
  4. Cleanup - Clean temporary files

Auto Merge

Config Location

Configure in Cube Designer Refresh Settings:

Multi-level Threshold Strategy

Auto Merge Thresholds:
- 7 days: small Segments per day automatically merged into 7 days
- 28 days: Segments per 7 days automatically merged into 28 days

Trigger Mechanism

  1. Triggers when new Segment becomes READY status
  2. Attempts merge from largest threshold downward
  3. Example: Already have 30 days of data, new Segment triggers attempt to merge into 28 days

Config Example

{
  "auto_merge_time_ranges": "7,28",
  "retention_threshold": 90
}

Retention Threshold

Purpose

Automatically clean up expired Segments no longer in use, reduce storage costs.

Calculation Logic

Difference = Latest Segment end time - Current Segment end time
Difference >= threshold → auto delete

Config Example

{
  "retention_threshold": 90  // Retain latest 90 days data
}

Delete Segment Flow

Steps

  1. Disable Segment - Disable Segment
  2. Delete Segment - Delete Segment

Notes

  • Must Disable before Delete
  • After disabling, Segment doesn’t participate in queries
  • After deletion, data cannot be recovered

JDBC Connection Query

Maven Dependency

<dependency>
    <groupId>org.apache.kylin</groupId>
    <artifactId>kylin-jdbc</artifactId>
    <version>3.1.1</version>
</dependency>

JDBC Connection Example

import java.sql.*;

public class KylinJDBCDemo {
    public static void main(String[] args) throws SQLException {
        // Connection URL format: jdbc:kylin://host:port/project
        String url = "jdbc:kylin://h122.wzk.icu:7070/wzk_project";
        String user = "ADMIN";
        String password = "KYLIN";

        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            String sql = "SELECT dt, region, SUM(amount) AS total_amount " +
                        "FROM fact_sales " +
                        "WHERE dt >= '2024-01-01' AND dt <= '2024-01-31' " +
                        "GROUP BY dt, region " +
                        "ORDER BY total_amount DESC";

            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {

                while (rs.next()) {
                    System.out.printf("%s | %s | %.2f%n",
                        rs.getString("dt"),
                        rs.getString("region"),
                        rs.getDouble("total_amount"));
                }
            }
        }
    }
}

Query Examples

-- Aggregation query
SELECT dt, category, SUM(amount) AS revenue, COUNT(*) AS orders
FROM fact_sales
JOIN dim_product ON fact_sales.product_id = dim_product.product_id
WHERE dt >= '2024-01-01'
GROUP BY dt, category
ORDER BY revenue DESC
LIMIT 10;

-- Subquery
SELECT region, AVG(avg_amount) AS avg_order_value
FROM (
    SELECT region, dt, SUM(amount) AS avg_amount
    FROM fact_sales
    WHERE dt >= '2024-01-01'
    GROUP BY region, dt
) t
GROUP BY region;

Error Quick Reference

SymptomRoot Cause LocationFix
Merge option grayed outSegments not continuous or status not READYCheck Segment time ranges
Auto merge not triggeredThreshold config or trigger condition not metCheck auto_merge_time_ranges config
Retention policy not cleaningMisunderstanding of difference calculationConfirm threshold vs latest Segment time difference
Delete failsDidn’t execute Disable firstDisable first then Delete
JDBC connection failedDriver/URL/permission issueCheck kylin-jdbc version and network
Query returns emptyProject name or Cube name wrongConfirm project name in connection URL