1. Requirements Analysis
Advertising AD Field Description
- action: User behavior; 0 impression; 1 click after impression; 2 purchase
- duration: Stay duration
- shop_id: Merchant id
- event_type: “ad”
- ad_type: Format type; 1 JPG; 2 PNG; 3 GIF; 4 SWF
- show_style: Display style, 0 static image; 1 dynamic image
- product_id: Product id
- place: Ad placement; homepage=1, left=2, right=3, list page=4
- sort: Sort position
Hourly Statistics Requirements
- Impression count, distinct user ID count (uid), distinct user count (device_id)
- Click count, distinct user ID count, distinct user count
- Purchase count, distinct user ID count, distinct user count
2. ADS Layer Table Creation and Data Loading
1. Create ADS Layer Table (Advertising Show Statistics)
USE ods;
DROP TABLE IF EXISTS ads.ads_ad_show;
CREATE TABLE ads.ads_ad_show (
cnt BIGINT,
u_cnt BIGINT,
device_cnt BIGINT,
ad_action TINYINT,
hour STRING
)
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
2. Load ADS Data Script
Filename: ads_load_ad_show.sh
#!/bin/bash
source /etc/profile
if [ -n "$1" ]; then
do_date=$1
else
do_date=$(date -d "-1 day" +%F)
fi
sql="
INSERT OVERWRITE TABLE ads.ads_ad_show
PARTITION (dt='$do_date')
SELECT
COUNT(1) AS cnt,
COUNT(DISTINCT uid) AS u_cnt,
COUNT(DISTINCT device_id) AS device_cnt,
ad_action,
hour
FROM dwd.dwd_ad
WHERE dt='$do_date'
GROUP BY ad_action, hour;
"
hive -e "$sql"
3. Funnel Analysis (Click Rate Purchase Rate)
1. Key Metrics Description
- Conversion Rate: Current stage user count / Previous stage user count
- Drop-off Rate: 1 - Conversion Rate
- Click Rate = Click Count / Impression Count
- Purchase Rate = Click Count / Click Count
2. Create ADS Layer Table (Funnel Rate)
use ads;
drop table if exists ads.ads_ad_show_rate;
create table ads.ads_ad_show_rate(
hour string,
click_rate double,
buy_rate double
) PARTITIONED BY (`dt` string)
row format delimited fields terminated by ',';
3. Row-to-Column Implementation Methods
Method 1: Using SUM
SELECT
SUM(CASE WHEN ad_action='0' THEN cnt END) AS show_cnt,
SUM(CASE WHEN ad_action='1' THEN cnt END) AS click_cnt,
SUM(CASE WHEN ad_action='2' THEN cnt END) AS buy_cnt,
hour
FROM ads.ads_ad_show
WHERE dt='2020-08-02' AND hour='01'
GROUP BY hour;
Method 2: Using MAX
SELECT
MAX(CASE WHEN ad_action='0' THEN cnt END) AS show_cnt,
MAX(CASE WHEN ad_action='1' THEN cnt END) AS click_cnt,
MAX(CASE WHEN ad_action='2' THEN cnt END) AS buy_cnt,
hour
FROM ads.ads_ad_show
WHERE dt='2020-08-02' AND hour='01'
GROUP BY hour;
4. Load ADS Layer Data Script
Filename: ads_load_ad_show_rate.sh
#!/bin/bash
source /etc/profile
if [ -n "$1" ]; then
do_date=$1
else
do_date=$(date -d "-1 day" +%F)
fi
sql="
WITH tmp AS (
SELECT
MAX(CASE WHEN ad_action='0' THEN cnt END) AS show_cnt,
MAX(CASE WHEN ad_action='1' THEN cnt END) AS click_cnt,
MAX(CASE WHEN ad_action='2' THEN cnt END) AS buy_cnt,
hour
FROM ads.ads_ad_show
WHERE dt='$do_date'
GROUP BY hour
)
INSERT OVERWRITE TABLE ads.ads_ad_show_rate
PARTITION (dt='$do_date')
SELECT
hour,
CASE WHEN show_cnt > 0 THEN click_cnt / show_cnt ELSE 0 END AS click_rate,
CASE WHEN click_cnt > 0 THEN buy_cnt / click_cnt ELSE 0 END AS buy_rate
FROM tmp;
"
hive -e "$sql"
4. Advertising Effect Analysis (Top100 Ranking)
1. Requirements Analysis
By time period, ad placement, product, count impression counts, get top 100 products with most impressions for each ad placement per hour.
2. Create ADS Layer Table
use ads;
drop table if exists ads.ads_ad_show_place;
create table ads.ads_ad_show_place(
ad_action tinyint,
hour string,
place string,
product_id int,
cnt bigint
)PARTITIONED BY (`dt` string)
row format delimited fields terminated by ',';
drop table if exists ads.ads_ad_show_place_window;
create table ads.ads_ad_show_place_window(
hour string,
place string,
product_id int,
cnt bigint,
rank int
)PARTITIONED BY (`dt` string)
row format delimited fields terminated by ',';
3. Load ADS Layer Data Script
Filename: ads_load_ad_show_page.sh
#!/bin/bash
source /etc/profile
if [ -n "$1" ]; then
do_date=$1
else
do_date=$(date -d "-1 day" +%F)
fi
sql="
INSERT OVERWRITE TABLE ads.ads_ad_show_place
PARTITION (dt='$do_date')
SELECT
ad_action,
hour,
place,
product_id,
COUNT(1) AS cnt
FROM dwd.dwd_ad
WHERE dt='$do_date'
GROUP BY
ad_action,
hour,
place,
product_id;
"
hive -e "$sql"
Filename: ads_load_ad_show_page_window.sh
#!/bin/bash
source /etc/profile
if [ -n "$1" ]; then
do_date=$1
else
do_date=$(date -d "-1 day" +%F)
fi
sql="
INSERT OVERWRITE TABLE ads.ads_ad_show_place_window
PARTITION (dt='$do_date')
SELECT *
FROM (
SELECT
hour,
place,
product_id,
cnt,
ROW_NUMBER() OVER (PARTITION BY hour, place, product_id ORDER BY cnt DESC) AS rank
FROM ads.ads_ad_show_place
WHERE dt='$do_date' AND ad_action='0'
) t
WHERE rank <= 100;
"
hive -e "$sql"
5. Error Troubleshooting Table
| Symptom | Root Cause | Fix |
|---|---|---|
Clearly writing ADS table but first USE ods; | Database context mixed | Unify to USE ads; or use explicit database name throughout |
| Top100 result incorrect, rank almost always 1 | Window partition dimension wrong | PARTITION BY hour, place not including product_id |
ad_action='0' runs but not standard | Field is TINYINT but compared as string | Change to ad_action = 0 |
| Funnel statistics missing results in NULL | CASE WHEN...END didn’t write ELSE 0 | Add ELSE 0 |
| Click rate/purchase rate results unstable | Integer aggregation has type inference difference | Explicitly use CAST(click_cnt AS DOUBLE) / show_cnt |
| Script filename inconsistent with business meaning | page/place naming confused | Unify to ..._place.sh |
6. Summary
This article details the advertising business ADS layer analysis practice based on Hive offline data warehouse, including:
- Hourly Statistics: Basic statistics for advertising impressions, clicks, purchases
- Funnel Analysis: Click-through rate (CTR) and purchase rate (CVR) calculation
- Top100 Ranking: Using window functions to implement Top100 products by impressions for each ad placement per hour
It also summarizes common engineering problems and solutions to ensure accuracy in data analysis and troubleshooting.