1. Requirements Analysis

Advertising AD Field Description

  • action: User behavior; 0 impression; 1 click after impression; 2 purchase
  • duration: Stay duration
  • shop_id: Merchant id
  • event_type: “ad”
  • ad_type: Format type; 1 JPG; 2 PNG; 3 GIF; 4 SWF
  • show_style: Display style, 0 static image; 1 dynamic image
  • product_id: Product id
  • place: Ad placement; homepage=1, left=2, right=3, list page=4
  • sort: Sort position

Hourly Statistics Requirements

  • Impression count, distinct user ID count (uid), distinct user count (device_id)
  • Click count, distinct user ID count, distinct user count
  • Purchase count, distinct user ID count, distinct user count

2. ADS Layer Table Creation and Data Loading

1. Create ADS Layer Table (Advertising Show Statistics)

USE ods;

DROP TABLE IF EXISTS ads.ads_ad_show;

CREATE TABLE ads.ads_ad_show (
    cnt BIGINT,
    u_cnt BIGINT,
    device_cnt BIGINT,
    ad_action TINYINT,
    hour STRING
)
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

2. Load ADS Data Script

Filename: ads_load_ad_show.sh

#!/bin/bash

source /etc/profile

if [ -n "$1" ]; then
  do_date=$1
else
  do_date=$(date -d "-1 day" +%F)
fi

sql="
INSERT OVERWRITE TABLE ads.ads_ad_show
PARTITION (dt='$do_date')
SELECT
  COUNT(1) AS cnt,
  COUNT(DISTINCT uid) AS u_cnt,
  COUNT(DISTINCT device_id) AS device_cnt,
  ad_action,
  hour
FROM dwd.dwd_ad
WHERE dt='$do_date'
GROUP BY ad_action, hour;
"

hive -e "$sql"

3. Funnel Analysis (Click Rate Purchase Rate)

1. Key Metrics Description

  • Conversion Rate: Current stage user count / Previous stage user count
  • Drop-off Rate: 1 - Conversion Rate
  • Click Rate = Click Count / Impression Count
  • Purchase Rate = Click Count / Click Count

2. Create ADS Layer Table (Funnel Rate)

use ads;
drop table if exists ads.ads_ad_show_rate;
create table ads.ads_ad_show_rate(
  hour string,
  click_rate double,
  buy_rate double
) PARTITIONED BY (`dt` string)
row format delimited fields terminated by ',';

3. Row-to-Column Implementation Methods

Method 1: Using SUM

SELECT
    SUM(CASE WHEN ad_action='0' THEN cnt END) AS show_cnt,
    SUM(CASE WHEN ad_action='1' THEN cnt END) AS click_cnt,
    SUM(CASE WHEN ad_action='2' THEN cnt END) AS buy_cnt,
    hour
FROM ads.ads_ad_show
WHERE dt='2020-08-02' AND hour='01'
GROUP BY hour;

Method 2: Using MAX

SELECT
    MAX(CASE WHEN ad_action='0' THEN cnt END) AS show_cnt,
    MAX(CASE WHEN ad_action='1' THEN cnt END) AS click_cnt,
    MAX(CASE WHEN ad_action='2' THEN cnt END) AS buy_cnt,
    hour
FROM ads.ads_ad_show
WHERE dt='2020-08-02' AND hour='01'
GROUP BY hour;

4. Load ADS Layer Data Script

Filename: ads_load_ad_show_rate.sh

#!/bin/bash

source /etc/profile

if [ -n "$1" ]; then
  do_date=$1
else
  do_date=$(date -d "-1 day" +%F)
fi

sql="
WITH tmp AS (
  SELECT
    MAX(CASE WHEN ad_action='0' THEN cnt END) AS show_cnt,
    MAX(CASE WHEN ad_action='1' THEN cnt END) AS click_cnt,
    MAX(CASE WHEN ad_action='2' THEN cnt END) AS buy_cnt,
    hour
  FROM ads.ads_ad_show
  WHERE dt='$do_date'
  GROUP BY hour
)
INSERT OVERWRITE TABLE ads.ads_ad_show_rate
PARTITION (dt='$do_date')
SELECT
  hour,
  CASE WHEN show_cnt > 0 THEN click_cnt / show_cnt ELSE 0 END AS click_rate,
  CASE WHEN click_cnt > 0 THEN buy_cnt / click_cnt ELSE 0 END AS buy_rate
FROM tmp;
"

hive -e "$sql"

4. Advertising Effect Analysis (Top100 Ranking)

1. Requirements Analysis

By time period, ad placement, product, count impression counts, get top 100 products with most impressions for each ad placement per hour.

2. Create ADS Layer Table

use ads;
drop table if exists ads.ads_ad_show_place;
create table ads.ads_ad_show_place(
  ad_action tinyint,
  hour string,
  place string,
  product_id int,
  cnt bigint
)PARTITIONED BY (`dt` string)
row format delimited fields terminated by ',';

drop table if exists ads.ads_ad_show_place_window;
create table ads.ads_ad_show_place_window(
  hour string,
  place string,
  product_id int,
  cnt bigint,
  rank int
)PARTITIONED BY (`dt` string)
row format delimited fields terminated by ',';

3. Load ADS Layer Data Script

Filename: ads_load_ad_show_page.sh

#!/bin/bash

source /etc/profile

if [ -n "$1" ]; then
  do_date=$1
else
  do_date=$(date -d "-1 day" +%F)
fi

sql="
INSERT OVERWRITE TABLE ads.ads_ad_show_place
PARTITION (dt='$do_date')
SELECT
  ad_action,
  hour,
  place,
  product_id,
  COUNT(1) AS cnt
FROM dwd.dwd_ad
WHERE dt='$do_date'
GROUP BY
  ad_action,
  hour,
  place,
  product_id;
"

hive -e "$sql"

Filename: ads_load_ad_show_page_window.sh

#!/bin/bash

source /etc/profile

if [ -n "$1" ]; then
  do_date=$1
else
  do_date=$(date -d "-1 day" +%F)
fi

sql="
INSERT OVERWRITE TABLE ads.ads_ad_show_place_window
PARTITION (dt='$do_date')
SELECT *
FROM (
  SELECT
    hour,
    place,
    product_id,
    cnt,
    ROW_NUMBER() OVER (PARTITION BY hour, place, product_id ORDER BY cnt DESC) AS rank
  FROM ads.ads_ad_show_place
  WHERE dt='$do_date' AND ad_action='0'
) t
WHERE rank <= 100;
"

hive -e "$sql"

5. Error Troubleshooting Table

SymptomRoot CauseFix
Clearly writing ADS table but first USE ods;Database context mixedUnify to USE ads; or use explicit database name throughout
Top100 result incorrect, rank almost always 1Window partition dimension wrongPARTITION BY hour, place not including product_id
ad_action='0' runs but not standardField is TINYINT but compared as stringChange to ad_action = 0
Funnel statistics missing results in NULLCASE WHEN...END didn’t write ELSE 0Add ELSE 0
Click rate/purchase rate results unstableInteger aggregation has type inference differenceExplicitly use CAST(click_cnt AS DOUBLE) / show_cnt
Script filename inconsistent with business meaningpage/place naming confusedUnify to ..._place.sh

6. Summary

This article details the advertising business ADS layer analysis practice based on Hive offline data warehouse, including:

  1. Hourly Statistics: Basic statistics for advertising impressions, clicks, purchases
  2. Funnel Analysis: Click-through rate (CTR) and purchase rate (CVR) calculation
  3. Top100 Ranking: Using window functions to implement Top100 products by impressions for each ad placement per hour

It also summarizes common engineering problems and solutions to ensure accuracy in data analysis and troubleshooting.