一、需求分析

广告 AD 字段说明

  • action: 用户行为;0 曝光;1 曝光后点击;2 购买
  • duration: 停留时长
  • shop_id: 商家 id
  • event_type: “ad”
  • ad_type: 格式类型;1 JPG;2 PNG;3 GIF;4 SWF
  • show_style: 显示风格,0 静态图;1 动态图
  • product_id: 产品 id
  • place: 广告位置;首页=1,左侧=2,右侧=3,列表页=4
  • sort: 排序位置

分时统计需求

  • 曝光次数、不同用户 ID 数(uid)、不同用户数(device_id)
  • 点击次数、不同用户 ID 数、不同用户数
  • 购买次数、不同用户 ID 数、不同用户数

二、ADS 层建表与数据装载

1. 创建 ADS 层表(广告展示统计)

USE ods;

DROP TABLE IF EXISTS ads.ads_ad_show;

CREATE TABLE ads.ads_ad_show (
    cnt BIGINT,
    u_cnt BIGINT,
    device_cnt BIGINT,
    ad_action TINYINT,
    hour STRING
)
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

2. 加载 ADS 数据脚本

文件名: ads_load_ad_show.sh

#!/bin/bash

source /etc/profile

if [ -n "$1" ]; then
  do_date=$1
else
  do_date=$(date -d "-1 day" +%F)
fi

sql="
INSERT OVERWRITE TABLE ads.ads_ad_show
PARTITION (dt='$do_date')
SELECT
  COUNT(1) AS cnt,
  COUNT(DISTINCT uid) AS u_cnt,
  COUNT(DISTINCT device_id) AS device_cnt,
  ad_action,
  hour
FROM dwd.dwd_ad
WHERE dt='$do_date'
GROUP BY ad_action, hour;
"

hive -e "$sql"

三、漏斗分析(点击率购买率)

1. 关键指标说明

  • 转化率(Conversion Rate):当前阶段用户数 / 上一阶段用户数
  • 流失率(Drop-off Rate):1 - 转化率
  • 点击率 = 点击次数 / 曝光次数
  • 购买率 = 点击次数 / 点击次数

2. 创建 ADS 层表(漏斗率)

use ads;
drop table if exists ads.ads_ad_show_rate;
create table ads.ads_ad_show_rate(
  hour string,
  click_rate double,
  buy_rate double
) PARTITIONED BY (`dt` string)
row format delimited fields terminated by ',';

3. 行转列实现方法

方法一:使用 SUM

SELECT
    SUM(CASE WHEN ad_action='0' THEN cnt END) AS show_cnt,
    SUM(CASE WHEN ad_action='1' THEN cnt END) AS click_cnt,
    SUM(CASE WHEN ad_action='2' THEN cnt END) AS buy_cnt,
    hour
FROM ads.ads_ad_show
WHERE dt='2020-08-02' AND hour='01'
GROUP BY hour;

方法二:使用 MAX

SELECT
    MAX(CASE WHEN ad_action='0' THEN cnt END) AS show_cnt,
    MAX(CASE WHEN ad_action='1' THEN cnt END) AS click_cnt,
    MAX(CASE WHEN ad_action='2' THEN cnt END) AS buy_cnt,
    hour
FROM ads.ads_ad_show
WHERE dt='2020-08-02' AND hour='01'
GROUP BY hour;

4. 加载 ADS 层数据脚本

文件名: ads_load_ad_show_rate.sh

#!/bin/bash

source /etc/profile

if [ -n "$1" ]; then
  do_date=$1
else
  do_date=$(date -d "-1 day" +%F)
fi

sql="
WITH tmp AS (
  SELECT
    MAX(CASE WHEN ad_action='0' THEN cnt END) AS show_cnt,
    MAX(CASE WHEN ad_action='1' THEN cnt END) AS click_cnt,
    MAX(CASE WHEN ad_action='2' THEN cnt END) AS buy_cnt,
    hour
  FROM ads.ads_ad_show
  WHERE dt='$do_date'
  GROUP BY hour
)
INSERT OVERWRITE TABLE ads.ads_ad_show_rate
PARTITION (dt='$do_date')
SELECT
  hour,
  CASE WHEN show_cnt > 0 THEN click_cnt / show_cnt ELSE 0 END AS click_rate,
  CASE WHEN click_cnt > 0 THEN buy_cnt / click_cnt ELSE 0 END AS buy_rate
FROM tmp;
"

hive -e "$sql"

四、广告效果分析(Top100 排名)

1. 需求分析

按时间段、广告位、商品,统计曝光次数,获取每小时内每个广告位曝光最多的前 100 个商品。

2. 创建 ADS 层表

use ads;
drop table if exists ads.ads_ad_show_place;
create table ads.ads_ad_show_place(
  ad_action tinyint,
  hour string,
  place string,
  product_id int,
  cnt bigint
)PARTITIONED BY (`dt` string)
row format delimited fields terminated by ',';

drop table if exists ads.ads_ad_show_place_window;
create table ads.ads_ad_show_place_window(
  hour string,
  place string,
  product_id int,
  cnt bigint,
  rank int
)PARTITIONED BY (`dt` string)
row format delimited fields terminated by ',';

3. 加载 ADS 层数据脚本

文件名: ads_load_ad_show_page.sh

#!/bin/bash

source /etc/profile

if [ -n "$1" ]; then
  do_date=$1
else
  do_date=$(date -d "-1 day" +%F)
fi

sql="
INSERT OVERWRITE TABLE ads.ads_ad_show_place
PARTITION (dt='$do_date')
SELECT
  ad_action,
  hour,
  place,
  product_id,
  COUNT(1) AS cnt
FROM dwd.dwd_ad
WHERE dt='$do_date'
GROUP BY
  ad_action,
  hour,
  place,
  product_id;
"

hive -e "$sql"

文件名: ads_load_ad_show_page_window.sh

#!/bin/bash

source /etc/profile

if [ -n "$1" ]; then
  do_date=$1
else
  do_date=$(date -d "-1 day" +%F)
fi

sql="
INSERT OVERWRITE TABLE ads.ads_ad_show_place_window
PARTITION (dt='$do_date')
SELECT *
FROM (
  SELECT
    hour,
    place,
    product_id,
    cnt,
    ROW_NUMBER() OVER (PARTITION BY hour, place, product_id ORDER BY cnt DESC) AS rank
  FROM ads.ads_ad_show_place
  WHERE dt='$do_date' AND ad_action='0'
) t
WHERE rank <= 100;
"

hive -e "$sql"

五、错误速查表

症状根因定位修复
明明在写 ADS 表,却先 USE ods;库上下文混用统一使用 USE ads;,或全程显式库名
Top100 结果不对,rank 几乎总是 1窗口分区维度错误PARTITION BY hour, place 而非包含 product_id
ad_action='0' 可跑但不规范字段为 TINYINT 却按字符串比较改为 ad_action = 0
漏斗统计缺失时结果为 NULLCASE WHEN...END 未写 ELSE 0添加 ELSE 0
点击率/购买率结果不稳定整型聚合存在类型推断差异显式 CAST(click_cnt AS DOUBLE) / show_cnt
脚本文件名与业务含义不一致page/place 命名混乱统一改为 ..._place.sh

六、总结

本文详细介绍了基于 Hive 离线数仓的广告业务 ADS 层分析实战,包括:

  1. 分时统计:广告曝光、点击、购买的基础统计
  2. 漏斗分析:点击率(CTR)和购买率(CVR)计算
  3. Top100 排名:利用窗口函数实现每小时每广告位的曝光 TOP100 商品

同时总结了常见的工程问题与解决方案,确保数据分析和排障的准确性。