Big Data 244 - Offline Data Warehouse: Hive ODS Layer

ODS Layer Overview

ODS (Operational Data Store) is an important component in data warehouses, primarily used to store raw, lightly processed data.

Function Positioning:

  • Serves as the staging layer before the data warehouse
  • Receives data extracted from business systems or external sources
  • Data usually not deeply processed or aggregated, preserving the original business state

Data Characteristics:

  • Rawness: Preserves business system field formats
  • Timeliness: Compared to data warehouse, ODS data updates more frequently
  • Short-term Storage: ODS data storage cycle is usually short, only keeping recent data (e.g., 7 days, 30 days)

ODS Role:

  • Buffer Layer: Avoids pressure on business systems from direct data extraction
  • Data Integration and Cleaning: Lightly cleans raw data
  • Support Real-time Queries: Provides relatively real-time data query services
  • Data Traceability: Preserves original state of business data

ODS Layer Table Creation (7 Tables)

1. ods_trade_orders (Order Table)

use ods;
DROP TABLE IF EXISTS `ods.ods_trade_orders`;
CREATE EXTERNAL TABLE `ods.ods_trade_orders`(
  `orderid` int,
  `orderno` string,
  `userid` bigint,
  `status` tinyint,
  `productmoney` decimal(10, 0),
  `totalmoney` decimal(10, 0),
  `paymethod` tinyint,
  `ispay` tinyint,
  `areaid` int,
  `tradesrc` tinyint,
  `tradetype` int,
  `isrefund` tinyint,
  `dataflag` tinyint,
  `createtime` string,
  `paytime` string,
  `modifiedtime` string)
COMMENT 'Order Table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by ','
location '/user/data/trade.db/orders/';

2. ods_trade_order_product (Order Detail Table)

use ods;
DROP TABLE IF EXISTS `ods.ods_trade_order_product`;
CREATE EXTERNAL TABLE `ods.ods_trade_order_product`(
  `id` string,
  `orderid` decimal(10,2),
  `productid` string,
  `productnum` string,
  `productprice` string,
  `money` string,
  `extra` string,
  `createtime` string)
COMMENT 'Order Detail Table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by ','
location '/user/data/trade.db/order_product/';

3. ods_trade_product_info (Product Information Table)

use ods;
DROP TABLE IF EXISTS `ods.ods_trade_product_info`;
CREATE EXTERNAL TABLE `ods.ods_trade_product_info`(
  `productid` bigint,
  `productname` string,
  `shopid` string,
  `price` decimal(10,0),
  `issale` tinyint,
  `status` tinyint,
  `categoryid` string,
  `createtime` string,
  `modifytime` string)
COMMENT 'Product Information Table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by ','
location '/user/data/trade.db/product_info/';

4. ods_trade_product_category (Product Category Table)

use ods;
DROP TABLE IF EXISTS `ods.ods_trade_product_category`;
CREATE EXTERNAL TABLE `ods.ods_trade_product_category`(
  `catid` int,
  `parentid` int,
  `catname` string,
  `isshow` tinyint,
  `sortnum` int,
  `isdel` tinyint,
  `createtime` string,
  `level` tinyint)
COMMENT 'Product Category Table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by ','
location '/user/data/trade.db/product_category';

5. ods_trade_shops (Merchant Store Table)

use ods;
DROP TABLE IF EXISTS `ods.ods_trade_shops`;
CREATE EXTERNAL TABLE `ods.ods_trade_shops`(
  `shopid` int,
  `userid` int,
  `areaid` int,
  `shopname` string,
  `shoplevel` tinyint,
  `status` tinyint,
  `createtime` string,
  `modifytime` string)
COMMENT 'Merchant Store Table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by ','
location '/user/data/trade.db/shops';

6. ods_trade_shop_admin_org (Merchant Region Organization Table)

use ods;
DROP TABLE IF EXISTS `ods.ods_trade_shop_admin_org`;
CREATE EXTERNAL TABLE `ods.ods_trade_shop_admin_org`(
  `id` int,
  `parentid` int,
  `orgname` string,
  `orglevel` tinyint,
  `isdelete` tinyint,
  `createtime` string,
  `updatetime` string,
  `isshow` tinyint,
  `orgType` tinyint)
COMMENT 'Merchant Region Organization Table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by ','
location '/user/data/trade.db/shop_org/';

7. ods_trade_payments (Payment Method Table)

use ods;
DROP TABLE IF EXISTS `ods.ods_trade_payments`;
CREATE EXTERNAL TABLE `ods.ods_trade_payments`(
  `id` string,
  `paymethod` string,
  `payname` string,
  `description` string,
  `payorder` int,
  `online` tinyint)
COMMENT 'Payment Method Table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by ','
location '/user/data/trade.db/payments/';

ODS Layer Data Loading Script

#!/bin/bash
source /etc/profile
if [ -n "$1" ] ;then
  do_date=$1
else
  do_date=`date -d "-1 day" +%F`
fi

# Create directories
hdfs dfs -mkdir -p /user/data/trade.db/product_category/dt=$do_date
hdfs dfs -mkdir -p /user/data/trade.db/shops/dt=$do_date
hdfs dfs -mkdir -p /user/data/trade.db/shop_org/dt=$do_date
hdfs dfs -mkdir -p /user/data/trade.db/payments/dt=$do_date
hdfs dfs -mkdir -p /user/data/trade.db/orders/dt=$do_date
hdfs dfs -mkdir -p /user/data/trade.db/order_product/dt=$do_date
hdfs dfs -mkdir -p /user/data/trade.db/product_info/dt=$do_date

# Data migration
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /opt/wzk/datax/product_category.json
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /opt/wzk/datax/shops.json
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /opt/wzk/datax/shop_org.json
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /opt/wzk/datax/payments.json
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /opt/wzk/datax/orders.json
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /opt/wzk/datax/order_product.json
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /opt/wzk/datax/product_info.json

# Load ODS layer data
sql="
alter table ods.ods_trade_orders add partition(dt='$do_date');
alter table ods.ods_trade_order_product add partition(dt='$do_date');
alter table ods.ods_trade_product_info add partition(dt='$do_date');
alter table ods.ods_trade_product_category add partition(dt='$do_date');
alter table ods.ods_trade_shops add partition(dt='$do_date');
alter table ods.ods_trade_shop_admin_org add partition(dt='$do_date');
alter table ods.ods_trade_payments add partition(dt='$do_date');
"
hive -e "$sql"

Execute command:

sh /opt/wzk/hive/ods_load_trade.sh 2020-07-12

Conclusion

Enabling fast queries of raw transaction records within 7 days demonstrates what core characteristics of the ODS layer:

This demonstrates the core characteristics of the ODS layer: short-term storage and support for real-time queries. The ODS layer is designed to retain only the most recent 7 days of raw data, allowing business users to quickly query the latest transaction records. Meanwhile, DataX synchronizes MySQL data to HDFS, which is then loaded into the ODS layer through Hive external tables, achieving rapid data access and querying.