TL;DR
- Scenario: An e-commerce business collects tracking data from multiple clients alongside transaction data, and needs an offline data warehouse to support operations and business analysis.
- Conclusion: Unify tracking definitions and the metric dictionary first, then build member, advertising, and transaction analysis by theme domain to avoid “definition drift” and warehouse rework.
- Output: A breakdown of the offline data warehouse architecture, a comparison of manual and automatic tracking, a methodology for building a metric system, and quick-reference cards for common errors.
Requirement Analysis
In recent years, China’s e-commerce has developed rapidly: transaction volume keeps hitting record highs, e-commerce applications keep expanding and deepening across fields, related businesses are flourishing, supporting systems keep improving, and innovation capability keeps growing. E-commerce is integrating with the real economy and entering a stage of large-scale development, and its impact on the economy and daily life continues to increase.
E-commerce Characteristics
- New technology
- Wide technology scope
- Distributed
- High concurrency, cluster, load balancing
- Massive data
- Complex business
- System security
Business Introduction
Like JD.com, Taobao, and Tmall, the e-commerce website uses a merchant onboarding model: merchants apply to join the platform, and once the platform approves the application, each merchant gets an independent management backend for entering product information. Products can be published after they pass review. The online mall is mainly divided into:
- Website Frontend: Website homepage, merchant homepage, product detail page, search page, member center, order and payment pages, flash sale (seckill) channel
- Operator Backend: Management platform for operations personnel. Main functions include merchant review, brand management, specification management, template management, product category management, product review, ad type management, ad management, order query, merchant settlement, etc.
- Merchant Management Backend: Management platform for onboarded merchants. Main functions include product management, order query and statistics, and funds settlement
Main Analysis
- Log Data: Startup logs, click logs
- Transaction Data from Business Database: User order placement, order submission, payment, refund and other core transaction data
Analysis Tasks
- Member Activity Analysis Theme: Daily new member count, daily/weekly/monthly active count, retained member count
- Advertising Business Analysis Theme: Advertising click count, advertising click purchase rate, advertising exposure count
- Core Transaction Analysis Theme: Order count, paid product count, payment amount
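As a minimal sketch of the member activity theme, the snippet below derives daily active count, daily new member count, and next-day retention from startup logs. The log layout (`(date, member_id)` tuples), the sample data, and the function names are assumptions made for illustration, not the actual warehouse schema. "New" here simply means "first seen in this log", which only holds if the log covers the full history.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical startup log: one (date, member_id) record per app launch.
startup_log = [
    (date(2024, 6, 1), "u1"), (date(2024, 6, 1), "u2"), (date(2024, 6, 1), "u1"),
    (date(2024, 6, 2), "u1"), (date(2024, 6, 2), "u3"),
]

def daily_active(log):
    """Map each day to the set of distinct members who launched the app."""
    active = defaultdict(set)
    for day, member in log:
        active[day].add(member)
    return dict(active)

def daily_new(active):
    """Map each day to the members seen for the first time on that day."""
    seen, new = set(), {}
    for day in sorted(active):
        new[day] = active[day] - seen
        seen |= active[day]
    return new

def next_day_retention(active, new, day):
    """Share of members new on `day` who are active again the following day."""
    cohort = new.get(day, set())
    if not cohort:
        return 0.0
    returned = cohort & active.get(day + timedelta(days=1), set())
    return len(returned) / len(cohort)

active = daily_active(startup_log)
new = daily_new(active)
print(len(active[date(2024, 6, 1)]))                      # daily active count
print(len(new[date(2024, 6, 2)]))                         # daily new member count
print(next_day_retention(active, new, date(2024, 6, 1)))  # next-day retention
```

Deduplicating by member ID before counting is the key design choice: repeated launches by the same member on one day count once.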
Data Tracking
Basic Concepts
Data tracking (event instrumentation) is a set of data collection methods for recording user browsing and click events. It records user behavior in apps and on websites so that teams can track how the application is used, optimize the product, and support operations with data such as visit count, visitor count, dwell time, page view count, and bounce rate. The collected information can be roughly divided into:
- Page statistics
- Operation behavior statistics
Tracking Process
In enterprise operations, data analysis that supports decision-making is very important, and tracking, which collects user behavior data, is its foundation. Without user behavior data there is nothing for the business to analyze. A typical tracking workflow is:
- Complete development according to tracking requirements (frontend development engineers)
- App or website collects user data
- Data reported to server
- Data cleaning, processing, storage (big data engineers)
- Perform data analysis to get corresponding metrics (big data engineers)
The people involved in this process can be divided by role:
- Tracking Requirements: The data product manager writes the requirements document and specifies which areas and user operations need tracking
- Tracking Collection: Frontend engineers use JS code to send user behavior requests to the server
- Data Cleaning, Processing, Storage: Big data engineers clean missing and misreported tracking data, process it into the structured data that business analysis requires, and store it in the data warehouse
- Data Analysis: Organize data in the data warehouse into the metrics the business cares about
- Frontend Display: Frontend and backend development
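The cleaning step in the workflow above can be sketched roughly as follows. This is an illustration under stated assumptions: the field names in `REQUIRED_FIELDS`, the `clean_events` helper, and the rule that a future timestamp counts as misreported are all hypothetical, and a real pipeline would apply the team's own rules.

```python
from datetime import datetime

# Assumed event schema; the field names are hypothetical.
REQUIRED_FIELDS = ("event_id", "member_id", "event_type", "ts")

def clean_events(raw_events, now):
    """Drop records with missing required fields or a timestamp in the future."""
    cleaned = []
    for event in raw_events:
        if any(event.get(name) in (None, "") for name in REQUIRED_FIELDS):
            continue  # missing data
        if event["ts"] > now:
            continue  # misreported data: the client clock is ahead of the server
        cleaned.append(event)
    return cleaned

now = datetime(2024, 6, 1, 12, 0)
raw = [
    {"event_id": "e1", "member_id": "u1", "event_type": "click", "ts": datetime(2024, 6, 1, 11, 0)},
    {"event_id": "e2", "member_id": "", "event_type": "click", "ts": datetime(2024, 6, 1, 11, 5)},
    {"event_id": "e3", "member_id": "u2", "event_type": "view", "ts": datetime(2024, 6, 2, 9, 0)},
]
cleaned = clean_events(raw, now)
print([e["event_id"] for e in cleaned])
```

Here `e2` is dropped for an empty member ID and `e3` for a timestamp later than `now`, so only `e1` reaches storage.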
Tracking Implementation
The mainstream tracking implementation methods are described below; the main difference between them is the frontend development workload.
Manual Tracking
Developers write the tracking code by hand, recording fields such as page ID, area ID, button ID, button position, and event type (exposure, view, click). This generally requires the company to develop its own tracking framework.
- Advantages: More accurate tracking data
- Disadvantages: Large workload, prone to errors
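To make the manual approach concrete, here is a minimal sketch of how a self-developed tracking framework might build and validate one event before reporting it. The `TrackingEvent` class, the `build_event` helper, and the sample IDs are hypothetical; only the field list and the three event types come from the description above.

```python
import json
import time
from dataclasses import dataclass, asdict

EVENT_TYPES = {"exposure", "view", "click"}  # event types named in the text

@dataclass
class TrackingEvent:
    page_id: str
    area_id: str
    button_id: str
    button_position: int
    event_type: str
    ts_ms: int

def build_event(page_id, area_id, button_id, button_position, event_type, ts_ms=None):
    """Validate and build one manually tracked event."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type!r}")
    if ts_ms is None:
        ts_ms = int(time.time() * 1000)  # default to the current time in milliseconds
    return TrackingEvent(page_id, area_id, button_id, button_position, event_type, ts_ms)

event = build_event("product_detail", "buy_box", "add_to_cart", 1, "click", ts_ms=1718000000000)
payload = json.dumps(asdict(event))  # body that would be reported to the server
print(payload)
```

Rejecting unknown event types at build time is one way to reduce the hand-written errors listed as the main disadvantage.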
Automatic Tracking
No tracking code needs to be written; device number, browser model, device type, and similar data are collected automatically. This approach mainly relies on third-party statistics tools such as Umeng, Baidu Mobile, and MoFang.
- Advantages: Simple and convenient
- Disadvantages: Tracking data is inconsistent, and not customized or accurate enough
Metric System
Metrics: Statistical values computed from data. Member count, active member count, retained member count, advertising click volume, order amount, and order count are all metrics.
Metric System: A systematic organization of metrics that classifies and layers them according to the business model and agreed standards.
Teams without a metric system often suffer from demand inflation and very frequent requirement changes. Everyone has their own perspective on the data and their own requests, and defines dimensions and metrics in ad hoc ways. Data analysts are buried in data requests, find it hard to distill a solution designed around business rules, and end up building a data warehouse that snowballs into something hard to maintain.
- Establishing a metric system means reaching consensus with requesters; it curbs unreliable requests and makes requirements organized and systematic
- The metric system is the cornerstone that guides data warehouse construction. Stable, systematic requirements make it easier to optimize the warehouse design and improve efficiency
A metric system is led by the product manager with assistance from business and IT, and is an implementable framework that reflects business status across dimensions. When establishing one, follow three selection principles: accurate, explainable, structured.
- Accurate: Core data must be well understood and accurate; the wrong metric must not be chosen
- Explainable: Every metric needs a clear, detailed business explanation. For example, does daily active mean users who opened the app, or users who stayed for a period of time, favorited something, or made a purchase?
- Structured: The metrics must be able to explain the business fully. A total new-user count alone is just a big number; you also need new users per channel, new-user conversion rate per channel, new-user value per channel, and so on
Before establishing a metric system, first understand how metrics are composed. Most metrics encountered at work are derived metrics, composed as follows:
- Basic Metric + [Modifier] + Time Period
- The modifier is optional; the basic metric and time period are required
- Basic metrics are indivisible metrics, such as transaction amount, payment amount, and order count
- Modifiers mostly describe a business scenario, such as transactions brought in through search
- The time period is the statistical time range, such as during Double 11 or during the 618 event
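The composition rule above can be sketched as a small data structure. This is an illustrative model rather than a standard API: `DerivedMetric`, `evaluate`, and the order records with `period` and `source` fields are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DerivedMetric:
    basic_metric: str               # required, e.g. "transaction amount"
    time_period: str                # required, e.g. "Double 11"
    modifier: Optional[str] = None  # optional scenario, e.g. "search"

    def name(self) -> str:
        scenario = f" via {self.modifier}" if self.modifier else ""
        return f"{self.basic_metric}{scenario} during {self.time_period}"

def evaluate(metric, orders):
    """Sum order amounts matching the metric's time period and optional modifier."""
    return sum(
        o["amount"] for o in orders
        if o["period"] == metric.time_period
        and (metric.modifier is None or o["source"] == metric.modifier)
    )

# Hypothetical order records.
orders = [
    {"amount": 100.0, "period": "Double 11", "source": "search"},
    {"amount": 50.0, "period": "Double 11", "source": "recommendation"},
    {"amount": 70.0, "period": "618", "source": "search"},
]
total = DerivedMetric("transaction amount", "Double 11")
via_search = DerivedMetric("transaction amount", "Double 11", modifier="search")
print(total.name(), evaluate(total, orders))
print(via_search.name(), evaluate(via_search, orders))
```

Both metrics share the same basic metric and time period; only the modifier narrows the second one to search-driven orders.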
Combining the three produces the derived metrics commonly used in business, such as transaction amount brought in through search on Double 11, or transaction amount on Double 11. Likewise, daily active, monthly active, next-day retention, and daily conversion rate are all derived metrics. After filtering for reasonable metrics, start establishing the corresponding metric system, which mainly takes four steps:
- Clarify business stages and requirements
- Determine core metrics
- Break down metrics by dimensions
- Implement metrics
Clarify Business Stages and Requirements
Enterprise development is often divided into three stages:
- Startup period
- Growth period
- Mature development period
The core metrics of concern differ from stage to stage.
- Startup period: User volume matters most, so the metric system should be built closely around growing user volume, with breakdowns along various dimensions
- Growth period: Besides the user volume trend, optimizing the structure of the existing user base matters more. For example, look at user retention; if retention is low, analyze further to find the reasons
- Mature development period: Product monetization and market share matter most. Pay attention to revenue metrics and the revenue of each commercialization model, while monitoring market share and competitors to prevent new entrants from seizing share
Determine Core Metrics
The most important task in this stage is finding the correct core metric. For example, suppose a product defines daily active as opening the app, and daily active is sizable and rising steadily. Analysis then shows that 25% of the users who opened the app bounced within 5 seconds, which is very unhealthy. The current core metric, daily active, is therefore flawed; a better core metric would be the number of users whose dwell time exceeds 5 seconds. Every app has different core metrics, so this choice deserves real thought. For an app like XX Toutiao, daily active and retention are bound to be high, but focusing only on those metrics would be a mistake; its real core metrics go beyond daily active and retention.
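The difference between the two candidate core metrics can be shown with a short sketch. The session records and the member-level definition of the 5-second bounce rate (a member bounces if none of their sessions exceeds 5 seconds) are assumptions for illustration; a team may define bounce per session instead.

```python
# Hypothetical sessions for one day: (member_id, dwell_seconds).
sessions = [("u1", 2), ("u2", 40), ("u3", 4), ("u4", 120), ("u2", 3)]

def daily_active(sessions):
    """Members who opened the app at least once."""
    return {member for member, _ in sessions}

def effective_daily_active(sessions, min_dwell=5):
    """Members with at least one session longer than min_dwell seconds."""
    return {member for member, dwell in sessions if dwell > min_dwell}

def bounce_rate(sessions, min_dwell=5):
    """Share of active members whose every session lasted min_dwell seconds or less."""
    active = daily_active(sessions)
    bounced = active - effective_daily_active(sessions, min_dwell)
    return len(bounced) / len(active) if active else 0.0

print(len(daily_active(sessions)))            # "opened the app" definition
print(len(effective_daily_active(sessions)))  # "dwell time > 5 seconds" definition
print(bounce_rate(sessions))
```

With this sample, four members opened the app but only two stayed longer than 5 seconds, so the "opened the app" count overstates engaged users by a factor of two.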
Core Metric Dimension Breakdown
Fluctuations in a core metric are always caused by fluctuations in some dimension, so monitoring a core metric essentially means monitoring it by dimension.
When analyzing the “users entering the app” metric, pay attention to channel conversion rate: where users come from and how they opened the app, e.g., by tapping the desktop icon, the notification bar, or a PUSH message.
When analyzing the “proportion of users with dwell time > 5 seconds” metric, focus on the dwell time distribution: how many users dwell 1-5 seconds, and how the user and behavior characteristics of users who dwell more than 5 seconds differ from those who dwell less.
E-commerce platforms focus on transaction amount. Before a transaction completes, the user has to open the app, select a product, confirm the order, and pay, which together form the transaction funnel. The key metric at each funnel step can be expressed as a breakdown formula, and the factors influencing it can then be analyzed from that formula.
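One common way to write the funnel breakdown as a formula is: transaction amount = users entering the funnel × the conversion rate of each step × average order value. The sketch below applies that decomposition; the step names follow the funnel described above, while the counts and `average_order_value` are made-up illustrative numbers.

```python
# Hypothetical daily funnel counts, listed in funnel order.
funnel = [
    ("open_app", 10000),
    ("select_product", 4000),
    ("confirm_order", 1000),
    ("pay_order", 800),
]
average_order_value = 120.0  # assumed average payment per paid order

def step_conversion(funnel):
    """Conversion rate of each step relative to the previous step."""
    rates = {}
    for (_, prev_count), (step, count) in zip(funnel, funnel[1:]):
        rates[step] = count / prev_count if prev_count else 0.0
    return rates

rates = step_conversion(funnel)

# Transaction amount = entry users x product of step conversion rates x average order value.
amount = funnel[0][1] * average_order_value
for rate in rates.values():
    amount *= rate

print(rates)
print(round(amount, 2))
```

Because the amount is a product of factors, a drop in transaction amount can be traced to the specific step whose conversion rate fell.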
Metric Communication, Archive, Implementation
After the metric system is established, complete the following communication and implementation steps:
- Full Company Communication and Confirmation
  - Organize cross-departmental meetings and invite all relevant business personnel (including the heads of operations, marketing, product, and technology)
  - Clarify metric ownership and keep a record through meeting minutes and email confirmation
  - Example: Create a metric responsibility matrix that states each metric’s owner, usage scenarios, and acceptance criteria
- Metric Definition Standardization
  - Write a detailed metric dictionary document containing:
    - Metric name (Chinese and English)
    - Business definition (e.g., “daily active users” means the number of users who completed core behaviors that day)
    - Calculation formula (with clear definitions of numerator and denominator)
    - Data source (tracking event name or database table field)
    - Statistics period (daily/weekly/monthly)
    - Exception handling rules (e.g., deduplication logic, null value processing)
  - Archive it in a knowledge management tool such as Confluence and set up a regular review mechanism
- Report System Building
  - Complete core report development 2 weeks before the version launch, including:
    - Real-time monitoring dashboards (e.g., big screens showing key metrics)
    - Daily operations reports (split by department/business line)
    - Automated alerting (alerts triggered when a threshold is crossed)
  - Real case: Before the 618 promotion, an e-commerce app built a monitoring system in advance for core metrics such as GMV and conversion rate
- Collaboration Mechanism Establishment
  - Clarify role responsibilities:
    - Data analysts: Responsible for metric logic design and data validation
    - Product managers: Coordinate business requirements and technical implementation
    - Business side: Provide understanding of business scenarios and confirm metric priority
  - Establish a metric review meeting mechanism; important metrics need three-party sign-off
  - Tool suggestion: Use a project management tool such as Jira to track the entire process from metric requirement to launch
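A metric dictionary entry can also be kept in a structured, checkable form alongside the written document. The sketch below is one possible shape, not a prescribed format: the `MetricDefinition` class, its field names, the `missing_fields` check, and the sample values (including the event name `app_start`) are hypothetical and simply mirror the dictionary contents listed above.

```python
from dataclasses import dataclass, field

@dataclass
class MetricDefinition:
    name_en: str
    name_zh: str
    business_definition: str
    formula: str
    data_source: str   # tracking event name or database table field
    period: str        # "daily", "weekly" or "monthly"
    exception_rules: list = field(default_factory=list)
    owner: str = ""    # filled in from the responsibility matrix
    version: int = 1   # bumped whenever the definition changes

REQUIRED = ("name_en", "name_zh", "business_definition", "formula",
            "data_source", "period", "owner")

def missing_fields(metric):
    """Return the names of required entries that are still empty."""
    return [name for name in REQUIRED if not getattr(metric, name)]

dau = MetricDefinition(
    name_en="daily active users",
    name_zh="日活跃用户数",
    business_definition="Number of users who completed core behaviors that day",
    formula="count(distinct member_id)",
    data_source="app_start",  # hypothetical tracking event name
    period="daily",
    exception_rules=["deduplicate by member_id", "drop records with null member_id"],
)
print(missing_fields(dau))
```

A check like `missing_fields` could run before a review meeting so that entries without an owner are flagged before sign-off.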
Error Quick Reference
| Symptom | Root Cause | How to Check | Fix |
|---|---|---|---|
| The same metric shows inconsistent values across reports | Metric definitions are not unified: different deduplication rules, different time windows | Compare against the metric dictionary (statistics period, numerator/denominator definitions, deduplication fields); spot-check detail records | Make the metric dictionary the single source of truth; unify deduplication keys and time definitions; have reports reference only metric layer outputs |
| Daily active is high but feels “inflated” | The wrong core metric was chosen (e.g., only “opened the app”) | Analyze bounce rate and dwell time distribution; look at the 1-5 second proportion | Redefine the core metric (e.g., users who stayed > 5 seconds); update tracking and metric calculation together |
| Retention/activity fluctuates but the cause cannot be found | Insufficient dimension breakdown; missing channel, touchpoint, or behavior path fields | Check whether the metric can be broken down by channel, entry, and touchpoint | Add dimension tracking: channel, entry (desktop/notification/PUSH/search), key path events |
| Advertising click and exposure data do not match | Unclear exposure/click event definitions, duplicate reporting, client retries | Spot-check event sequences; check whether the same session has multiple exposures | Clarify exposure trigger conditions; add an idempotent ID; define the deduplication and retry strategy on the reporting side |
| Order count/payment amount does not match the transaction database | Multi-client events conflict with the transaction fact table definition; refunds and closed orders are not handled | Reconcile by order status flow (submit/pay/refund) | Build a state machine for the transaction fact table; summarize payments and refunds by final status; backfill late-arriving data |
| Metric demand inflation; frequent definition changes cause rework | No metric review or responsibility matrix; no communication and archiving | Review requirement change records and identify points where the definition was never confirmed | Establish metric review meetings and three-party sign-off; archive the responsibility matrix and meeting minutes; version the metric dictionary |
| Tracking fields are not unified and event names are messy | Frontend lacks standards and automatic validation | Check whether event and field naming standards and a validation SDK exist | Define event naming and field standards; enforce strict validation in the SDK; run a tracking acceptance checklist before launch |
| Metric calculation performs poorly; batch runs take too long | Theme domains are not layered; detail and summary data are mixed | Check whether multiple aggregations run directly on raw detail data | Layer the warehouse: ODS→DWD→DWS→ADS; build common wide tables and a summary layer; reuse outputs with shared definitions |
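For the order reconciliation row, the idea of summarizing by final status can be sketched with a tiny state machine. Everything here is an assumption for illustration: the three statuses, the `ALLOWED` transition table, the event tuples, and the choice to count only orders whose final status is `paid` (so refunded orders are netted out). The real status flow should come from the transaction database.

```python
# Assumed order status flow: submitted -> paid -> refunded.
ALLOWED = {
    None: {"submitted"},
    "submitted": {"paid"},
    "paid": {"refunded"},
    "refunded": set(),
}

def final_status(events):
    """Replay (order_id, status, amount) events; skip illegal or repeated transitions."""
    state = {}
    for order_id, status, amount in events:
        current = state.get(order_id, (None, 0.0))[0]
        if status in ALLOWED[current]:
            state[order_id] = (status, amount)
    return state

def payment_summary(state):
    """Count and sum only the orders whose final status is 'paid'."""
    paid = [amount for status, amount in state.values() if status == "paid"]
    return len(paid), sum(paid)

# Hypothetical event stream, including a duplicated 'paid' report for o1.
events = [
    ("o1", "submitted", 100.0), ("o1", "paid", 100.0), ("o1", "paid", 100.0),
    ("o2", "submitted", 50.0), ("o2", "paid", 50.0), ("o2", "refunded", 50.0),
    ("o3", "submitted", 30.0),
]
state = final_status(events)
print(state)
print(payment_summary(state))
```

The duplicated `paid` event for `o1` is ignored because `paid -> paid` is not an allowed transition, which is the same effect an idempotent ID would have on the reporting side.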