一、需求分析
广告 AD 字段说明
- action: 用户行为;0 曝光;1 曝光后点击;2 购买
- duration: 停留时长
- shop_id: 商家 id
- event_type: “ad”
- ad_type: 格式类型;1 JPG;2 PNG;3 GIF;4 SWF
- show_style: 显示风格,0 静态图;1 动态图
- product_id: 产品 id
- place: 广告位置;首页=1,左侧=2,右侧=3,列表页=4
- sort: 排序位置
分时统计需求
- 曝光次数、不同用户 ID 数(uid)、不同用户数(device_id)
- 点击次数、不同用户 ID 数、不同用户数
- 购买次数、不同用户 ID 数、不同用户数
二、ADS 层建表与数据装载
1. 创建 ADS 层表(广告展示统计)
USE ods;
DROP TABLE IF EXISTS ads.ads_ad_show;
CREATE TABLE ads.ads_ad_show (
cnt BIGINT,
u_cnt BIGINT,
device_cnt BIGINT,
ad_action TINYINT,
hour STRING
)
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
2. 加载 ADS 数据脚本
文件名: ads_load_ad_show.sh
#!/bin/bash
source /etc/profile
if [ -n "$1" ]; then
do_date=$1
else
do_date=$(date -d "-1 day" +%F)
fi
sql="
INSERT OVERWRITE TABLE ads.ads_ad_show
PARTITION (dt='$do_date')
SELECT
COUNT(1) AS cnt,
COUNT(DISTINCT uid) AS u_cnt,
COUNT(DISTINCT device_id) AS device_cnt,
ad_action,
hour
FROM dwd.dwd_ad
WHERE dt='$do_date'
GROUP BY ad_action, hour;
"
hive -e "$sql"
三、漏斗分析(点击率购买率)
1. 关键指标说明
- 转化率(Conversion Rate):当前阶段用户数 / 上一阶段用户数
- 流失率(Drop-off Rate):1 - 转化率
- 点击率 = 点击次数 / 曝光次数
- 购买率 = 点击次数 / 点击次数
2. 创建 ADS 层表(漏斗率)
use ads;
drop table if exists ads.ads_ad_show_rate;
create table ads.ads_ad_show_rate(
hour string,
click_rate double,
buy_rate double
) PARTITIONED BY (`dt` string)
row format delimited fields terminated by ',';
3. 行转列实现方法
方法一:使用 SUM
SELECT
SUM(CASE WHEN ad_action='0' THEN cnt END) AS show_cnt,
SUM(CASE WHEN ad_action='1' THEN cnt END) AS click_cnt,
SUM(CASE WHEN ad_action='2' THEN cnt END) AS buy_cnt,
hour
FROM ads.ads_ad_show
WHERE dt='2020-08-02' AND hour='01'
GROUP BY hour;
方法二:使用 MAX
SELECT
MAX(CASE WHEN ad_action='0' THEN cnt END) AS show_cnt,
MAX(CASE WHEN ad_action='1' THEN cnt END) AS click_cnt,
MAX(CASE WHEN ad_action='2' THEN cnt END) AS buy_cnt,
hour
FROM ads.ads_ad_show
WHERE dt='2020-08-02' AND hour='01'
GROUP BY hour;
4. 加载 ADS 层数据脚本
文件名: ads_load_ad_show_rate.sh
#!/bin/bash
source /etc/profile
if [ -n "$1" ]; then
do_date=$1
else
do_date=$(date -d "-1 day" +%F)
fi
sql="
WITH tmp AS (
SELECT
MAX(CASE WHEN ad_action='0' THEN cnt END) AS show_cnt,
MAX(CASE WHEN ad_action='1' THEN cnt END) AS click_cnt,
MAX(CASE WHEN ad_action='2' THEN cnt END) AS buy_cnt,
hour
FROM ads.ads_ad_show
WHERE dt='$do_date'
GROUP BY hour
)
INSERT OVERWRITE TABLE ads.ads_ad_show_rate
PARTITION (dt='$do_date')
SELECT
hour,
CASE WHEN show_cnt > 0 THEN click_cnt / show_cnt ELSE 0 END AS click_rate,
CASE WHEN click_cnt > 0 THEN buy_cnt / click_cnt ELSE 0 END AS buy_rate
FROM tmp;
"
hive -e "$sql"
四、广告效果分析(Top100 排名)
1. 需求分析
按时间段、广告位、商品,统计曝光次数,获取每小时内每个广告位曝光最多的前 100 个商品。
2. 创建 ADS 层表
use ads;
drop table if exists ads.ads_ad_show_place;
create table ads.ads_ad_show_place(
ad_action tinyint,
hour string,
place string,
product_id int,
cnt bigint
)PARTITIONED BY (`dt` string)
row format delimited fields terminated by ',';
drop table if exists ads.ads_ad_show_place_window;
create table ads.ads_ad_show_place_window(
hour string,
place string,
product_id int,
cnt bigint,
rank int
)PARTITIONED BY (`dt` string)
row format delimited fields terminated by ',';
3. 加载 ADS 层数据脚本
文件名: ads_load_ad_show_page.sh
#!/bin/bash
source /etc/profile
if [ -n "$1" ]; then
do_date=$1
else
do_date=$(date -d "-1 day" +%F)
fi
sql="
INSERT OVERWRITE TABLE ads.ads_ad_show_place
PARTITION (dt='$do_date')
SELECT
ad_action,
hour,
place,
product_id,
COUNT(1) AS cnt
FROM dwd.dwd_ad
WHERE dt='$do_date'
GROUP BY
ad_action,
hour,
place,
product_id;
"
hive -e "$sql"
文件名: ads_load_ad_show_page_window.sh
#!/bin/bash
source /etc/profile
if [ -n "$1" ]; then
do_date=$1
else
do_date=$(date -d "-1 day" +%F)
fi
sql="
INSERT OVERWRITE TABLE ads.ads_ad_show_place_window
PARTITION (dt='$do_date')
SELECT *
FROM (
SELECT
hour,
place,
product_id,
cnt,
ROW_NUMBER() OVER (PARTITION BY hour, place, product_id ORDER BY cnt DESC) AS rank
FROM ads.ads_ad_show_place
WHERE dt='$do_date' AND ad_action='0'
) t
WHERE rank <= 100;
"
hive -e "$sql"
五、错误速查表
| 症状 | 根因 | 定位修复 |
|---|---|---|
明明在写 ADS 表,却先 USE ods; | 库上下文混用 | 统一使用 USE ads;,或全程显式库名 |
| Top100 结果不对,rank 几乎总是 1 | 窗口分区维度错误 | PARTITION BY hour, place 而非包含 product_id |
ad_action='0' 可跑但不规范 | 字段为 TINYINT 却按字符串比较 | 改为 ad_action = 0 |
| 漏斗统计缺失时结果为 NULL | CASE WHEN...END 未写 ELSE 0 | 添加 ELSE 0 |
| 点击率/购买率结果不稳定 | 整型聚合存在类型推断差异 | 显式 CAST(click_cnt AS DOUBLE) / show_cnt |
| 脚本文件名与业务含义不一致 | page/place 命名混乱 | 统一改为 ..._place.sh |
六、总结
本文详细介绍了基于 Hive 离线数仓的广告业务 ADS 层分析实战,包括:
- 分时统计:广告曝光、点击、购买的基础统计
- 漏斗分析:点击率(CTR)和购买率(CVR)计算
- Top100 排名:利用窗口函数实现每小时每广告位的曝光 TOP100 商品
同时总结了常见的工程问题与解决方案,确保数据分析和排障的准确性。