Article Overview
This article presents a complete walkthrough for exporting Hive ADS-layer tables to MySQL using DataX's hdfsreader + mysqlwriter combination.
Architecture Description
The article describes the overall offline data warehouse process: ODS -> DWD -> DWS -> ADS -> DataX export to MySQL. The ADS layer has 4 tables to export:
- ads.ads_member_active_count (Active Member Count)
- ads.ads_member_retention_count (Member Retention Count)
- ads.ads_member_retention_rate (Member Retention Rate)
- ads.ads_new_member_cnt (New Member Count)
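The hdfsreader job later in the article assumes each of these is a partitioned text table in Hive. The original only shows the MySQL side; as a sketch, the Hive DDL for the first table might look like this (column names mirror the MySQL DDL below; the storage clauses are assumptions chosen to match the DataX job's fileType and fieldDelimiter):

```sql
-- Hypothetical Hive-side DDL; not shown in the original article
create table if not exists ads.ads_member_active_count(
  day_count int comment 'Member count for the day',
  week_count int comment 'Member count for the week',
  month_count int comment 'Member count for the month'
) partitioned by (dt string)          -- dt is a partition column, not a data column
row format delimited fields terminated by ','  -- matches fieldDelimiter in the DataX job
stored as textfile;                            -- matches fileType "text"
```

Because dt is a partition column, it does not appear in the data files; that is why the DataX reader injects it as a constant value column rather than reading it by index.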
MySQL Table Creation Statement
-- Create database and table
create database dwads;
-- Active member count table
create table dwads.ads_member_active_count(
`dt` varchar(10) COMMENT 'Statistics date',
`day_count` int COMMENT 'Member count for the day',
`week_count` int COMMENT 'Member count for the week',
`month_count` int COMMENT 'Member count for the month',
primary key (dt)
);
-- New member count table
create table dwads.ads_new_member_cnt(
`dt` varchar(10) COMMENT 'Statistics date',
`cnt` int,
primary key (dt)
);
-- Member retention count table
create table dwads.ads_member_retention_count(
`dt` varchar(10) COMMENT 'Statistics date',
`add_date` varchar(10) comment 'Addition date',
`retention_day` int comment 'Retention days up to current date',
`retention_count` bigint comment 'Retention count',
primary key (dt, add_date)  -- composite key: one row per (statistics date, addition date)
);
-- Member retention rate table
create table dwads.ads_member_retention_rate(
`dt` varchar(10) COMMENT 'Statistics date',
`add_date` varchar(10) comment 'Addition date',
`retention_day` int comment 'Retention days up to current date',
`retention_count` bigint comment 'Retention count',
`new_mid_count` bigint comment 'New member count for that day',
`retention_ratio` decimal(10,2) comment 'Retention rate',
primary key (dt, add_date)  -- composite key: one row per (statistics date, addition date)
);
DataX JSON Configuration Example
{
"job": {
"setting": {
"speed": {
"channel": 1
}
},
"content": [{
"reader": {
"name": "hdfsreader",
"parameter": {
"path": "/user/hive/warehouse/ads.db/ads_member_active_count/dt=$do_date/*",
"defaultFS": "hdfs://h121.wzk.icu:9000",
"column": [
{"type": "string", "value": "$do_date"},
{"index": 0, "type": "string"},
{"index": 1, "type": "string"},
{"index": 2, "type": "string"}
],
"fileType": "text",
"encoding": "UTF-8",
"fieldDelimiter": ","
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"writeMode": "replace",
"username": "hive",
"password": "hive@wzk.icu",
"column": ["dt","day_count","week_count","month_count"],
"preSql": [""],
"connection": [{
"jdbcUrl": "jdbc:mysql://h122.wzk.icu:3306/dwads?useUnicode=true&characterEncoding=utf-8",
"table": ["ads_member_active_count"]
}]
}
}
}]
}
}
Execution Command
python datax.py -p "-Ddo_date=2020-07-21" /opt/wzk/datax/export_member_active_count.json
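At runtime DataX resolves the $do_date placeholder in the JSON from the -p "-Ddo_date=..." argument. That substitution happens inside DataX itself; as a rough local illustration, the same replacement can be mimicked with sed to preview the HDFS path the reader will scan:

```shell
# Mimic DataX's variable substitution to preview the resolved reader path.
# This is only a sketch of the mechanism; DataX performs it internally.
do_date=2020-07-21
template='/user/hive/warehouse/ads.db/ads_member_active_count/dt=$do_date/*'
resolved=$(printf '%s' "$template" | sed "s/\$do_date/$do_date/")
echo "$resolved"
# prints /user/hive/warehouse/ads.db/ads_member_active_count/dt=2020-07-21/*
```

Previewing the resolved path, then listing it with hdfs dfs -ls, is a quick way to catch the "job succeeds but zero records exported" symptom described in the troubleshooting table.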
Shell Script Template
#!/bin/bash
JSON=/opt/wzk/datax
source /etc/profile
if [ -n "$1" ]; then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" $JSON/export_member_active_count.json
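Since all four ADS tables follow the same pattern, the template above can be generalized into a single driver. A sketch, assuming each table has its own job JSON named after it (only export_member_active_count.json appears in the article; the other three filenames are hypothetical, following the same naming convention):

```shell
#!/bin/bash
# Hypothetical driver that exports all four ADS tables for one date.
# It previews each command with echo; change echo to eval to actually
# invoke DataX once the job JSONs exist.
JSON=/opt/wzk/datax
do_date=${1:-$(date -d "-1 day" +%F)}

jobs="export_member_active_count export_member_retention_count export_member_retention_rate export_new_member_cnt"
count=0
for job in $jobs; do
  echo python "$DATAX_HOME/bin/datax.py" -p "-Ddo_date=$do_date" "$JSON/$job.json"
  count=$((count + 1))
done
```

The echo-first design makes dry runs cheap: you can confirm the date argument and file paths before any data moves.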
Error Troubleshooting Table
| Symptom | Root Cause | Fix |
|---|---|---|
| DataX runs successfully but MySQL has no data | HDFS path doesn’t match partition directory | Check DataX log reader side record count; verify HDFS directory exists; confirm -Ddo_date parameter passed |
| FileNotFoundException | path wrong/partition directory missing | Verify path with hdfs dfs -ls; ensure Hive table has corresponding partition data |
| Unknown database/table | MySQL database/table not created | Create database and table first; keep jdbcUrl database name consistent |
| Data truncation error | Column type/order mismatch | Verify reader column index matches writer column order |
| replace unexpectedly overwrites rows | Primary key does not match the data grain | Use a composite primary key for multi-dimensional detail tables, or switch writeMode to insert |
| JSON parsing failed | JSON format invalid | Write jdbcUrl as single-line string; validate JSON with jq |
| Communications link failure | Network/port/permission issue | Test port with telnet; check account password and authorization |
| datax.py: command not found | DATAX_HOME not set or not exported | Confirm the environment variable; use the absolute path to datax.py |
| Data read but dt is wrong | Parameter not passed or variable name inconsistent | Check -Ddo_date matches variable name in JSON |
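After a run completes, a quick check on the MySQL side confirms the load actually landed (the date below is an example; substitute the do_date you exported):

```sql
-- The exported partition should appear as exactly one row here
select * from dwads.ads_member_active_count where dt = '2020-07-21';
-- Row counts per table help catch silent zero-record runs
select count(*) from dwads.ads_member_retention_count where dt = '2020-07-21';
```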
Important Notes
- Version Compatibility: DataX 3.x plugin parameters may vary slightly between distributions
- Storage Format: hdfsreader supports text, ORC, RC, sequence, and CSV files (Parquet is not supported in stock DataX 3.0); the fileType in the job must match how the Hive table is actually stored, which in this article is text
- Kerberos Scenario: additional authentication and HA-related parameters must be configured on the hdfsreader side
- writeMode Selection: replace relies on a primary key or unique key, so the table structure must be designed accordingly