Overview
This is a practical verification article for member theme and advertising business pipeline based on Hadoop + Hive + HDFS + DataX + MySQL.
1. Data Testing
1.1 Active Members DWS
sh dws_load_member_start.sh 2020-07-21
1.2 Active Members ADS
sh ads_load_member_active.sh 2020-07-21
1.3 New Members DWS
sh dws_load_member_add_day.sh 2020-07-21
1.4 New Members ADS
sh ads_load_member_add.sh 2020-07-21
1.5 Member Retention DWS
sh dws_load_member_retention_day.sh 2020-07-21
1.6 Member Retention ADS
sh ads_load_member_retention.sh 2020-07-21
2. Export Testing
2.1 HDFS Export
Data is stored on HDFS and can be viewed using HDFS commands.
2.2 DataX Export
sh /opt/wzk/hive/export_member_active_count.sh 2020-07-21
2.3 MySQL Verification
After running data from 07-21 to 07-31, you can query results in MySQL for verification.
3. Advertising Business ODS/DWD/ADS
3.1 Basic Introduction
Internet platforms attract users with free basic services and monetize through advertising business. E-commerce platforms are naturally suitable for product promotion. Advertising focuses on impressions, clicks, purchases, click-through rate, conversion rate and other metrics.
3.2 Event Log Data Sample
{
"wzk_event": [{
"name": "goods_detail_loading",
"json": {
"entry": "3",
"goodsid": "0",
"loading_time": "80",
"action": "4",
"staytime": "68",
"showtype": "4"
},
"time": 1596225273755
}, {
"name": "ad",
"json": {
"duration": "17",
"ad_action": "0",
"shop_id": "786",
"event_type": "ad",
"ad_type": "4",
"show_style": "1",
"product_id": "2772",
"place": "placeindex_left",
"sort": "0"
},
"time": 1596278404415
}],
"attr": {
"area": "拉萨",
"uid": "2F10092A86",
"app_v": "1.1.12",
"device_id": "1FB872-9A10086",
"os_type": "4.1",
"channel": "KS",
"brand": "xiaomi-2"
}
}
3.3 Collected Event Types
- goods_detail_loading: Goods detail page loading
- loading: Goods list
- notification: Message notification
- comment: Product comment
- favorites: Favorites
- praise: Like/Praise
- ad: Advertisement
3.4 Advertising Field Description
| Field | Description |
|---|---|
| action | User behavior: 0 impression, 1 click, 2 purchase |
| duration | Stay duration |
| shop_id | Merchant id |
| ad_type | Ad type: 1 JPG, 2 PNG, 3 GIF, 4 SWF |
| show_style | Display style: 0 static image, 1 dynamic image |
| product_id | Product id |
| place | Ad placement: 1 homepage, 2 left, 3 right, 4 list page |
3.5 Requirements Metrics
Click Count Statistics (Hourly Statistics):
- Impression count, distinct user ID count, distinct user count
- Click count, distinct user ID count, distinct user count
- Purchase count, distinct user ID count, distinct user count
Conversion Rate - Funnel Analysis:
- Click-through rate = Click count / Impression count
- Purchase rate = Purchase count / Click count
Campaign Impression Effect Evaluation:
- By time period, ad placement, product, count the top N with most impressions.
3.6 ODS Layer Table Creation
use ods;
drop table if exists ods.ods_log_event;
CREATE EXTERNAL TABLE ods.ods_log_event(
`str` string
) PARTITIONED BY (`dt` string)
STORED AS TEXTFILE
LOCATION '/user/data/logs/event';
3.7 ODS Layer Script
#!/bin/bash
source /etc/profile
if [ -n "$1" ]
then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
alter table ods.ods_log_event add partition (dt='$do_date');
"
hive -e "$sql"
4. Common Problem Troubleshooting
| Symptom | Root Cause | Fix |
|---|---|---|
| ADS script execution error | Script name spelling error | Correct to match actual filename |
| Hive cannot find today’s data | ODS partition not added | Execute ods_load_event_log.sh first to add partition |
| DataX successful but MySQL has no data | Source path has no files or date inconsistent | Verify HDFS path, DataX config and target table |
| Advertising metrics caliber doesn’t match | action field value not mapped with SQL logic | Standardize caliber: 0=impression, 1=click, 2=purchase |