Overview

This is a practical verification article for member theme and advertising business pipeline based on Hadoop + Hive + HDFS + DataX + MySQL.

1. Data Testing

1.1 Active Members DWS

sh dws_load_member_start.sh 2020-07-21

1.2 Active Members ADS

sh ads_load_member_active.sh 2020-07-21

1.3 New Members DWS

sh dws_load_member_add_day.sh 2020-07-21

1.4 New Members ADS

sh ads_load_member_add.sh 2020-07-21

1.5 Member Retention DWS

sh dws_load_member_retention_day.sh 2020-07-21

1.6 Member Retention ADS

sh ads_load_member_retention.sh 2020-07-21

2. Export Testing

2.1 HDFS Export

Data is stored on HDFS and can be viewed using HDFS commands.

2.2 DataX Export

sh /opt/wzk/hive/export_member_active_count.sh 2020-07-21

2.3 MySQL Verification

After running data from 07-21 to 07-31, you can query results in MySQL for verification.

3. Advertising Business ODS/DWD/ADS

3.1 Basic Introduction

Internet platforms attract users with free basic services and monetize through advertising business. E-commerce platforms are naturally suitable for product promotion. Advertising focuses on impressions, clicks, purchases, click-through rate, conversion rate and other metrics.

3.2 Event Log Data Sample

{
  "wzk_event": [{
    "name": "goods_detail_loading",
    "json": {
      "entry": "3",
      "goodsid": "0",
      "loading_time": "80",
      "action": "4",
      "staytime": "68",
      "showtype": "4"
    },
    "time": 1596225273755
  }, {
    "name": "ad",
    "json": {
      "duration": "17",
      "ad_action": "0",
      "shop_id": "786",
      "event_type": "ad",
      "ad_type": "4",
      "show_style": "1",
      "product_id": "2772",
      "place": "placeindex_left",
      "sort": "0"
    },
    "time": 1596278404415
  }],
  "attr": {
    "area": "拉萨",
    "uid": "2F10092A86",
    "app_v": "1.1.12",
    "device_id": "1FB872-9A10086",
    "os_type": "4.1",
    "channel": "KS",
    "brand": "xiaomi-2"
  }
}

3.3 Collected Event Types

  • goods_detail_loading: Goods detail page loading
  • loading: Goods list
  • notification: Message notification
  • comment: Product comment
  • favorites: Favorites
  • praise: Like/Praise
  • ad: Advertisement

3.4 Advertising Field Description

FieldDescription
actionUser behavior: 0 impression, 1 click, 2 purchase
durationStay duration
shop_idMerchant id
ad_typeAd type: 1 JPG, 2 PNG, 3 GIF, 4 SWF
show_styleDisplay style: 0 static image, 1 dynamic image
product_idProduct id
placeAd placement: 1 homepage, 2 left, 3 right, 4 list page

3.5 Requirements Metrics

Click Count Statistics (Hourly Statistics):

  • Impression count, distinct user ID count, distinct user count
  • Click count, distinct user ID count, distinct user count
  • Purchase count, distinct user ID count, distinct user count

Conversion Rate - Funnel Analysis:

  • Click-through rate = Click count / Impression count
  • Purchase rate = Purchase count / Click count

Campaign Impression Effect Evaluation:

  • By time period, ad placement, product, count the top N with most impressions.

3.6 ODS Layer Table Creation

use ods;
drop table if exists ods.ods_log_event;
CREATE EXTERNAL TABLE ods.ods_log_event(
  `str` string
) PARTITIONED BY (`dt` string)
STORED AS TEXTFILE
LOCATION '/user/data/logs/event';

3.7 ODS Layer Script

#!/bin/bash
source /etc/profile
if [ -n "$1" ]
then
  do_date=$1
else
  do_date=`date -d "-1 day" +%F`
fi
sql="
alter table ods.ods_log_event add partition (dt='$do_date');
"
hive -e "$sql"

4. Common Problem Troubleshooting

SymptomRoot CauseFix
ADS script execution errorScript name spelling errorCorrect to match actual filename
Hive cannot find today’s dataODS partition not addedExecute ods_load_event_log.sh first to add partition
DataX successful but MySQL has no dataSource path has no files or date inconsistentVerify HDFS path, DataX config and target table
Advertising metrics caliber doesn’t matchaction field value not mapped with SQL logicStandardize caliber: 0=impression, 1=click, 2=purchase