Understanding GA4 Data in BigQuery: A Practical Guide for Ecommerce
A hands-on guide to GA4's BigQuery export schema for ecommerce teams. Learn how events, parameters, and ecommerce data are structured — and how to query them effectively.
Verity Team
·
February 15, 2026
·
16 min read
Why GA4 Data in BigQuery Matters for Ecommerce
If you run an ecommerce business and use Google Analytics 4, you already have access to a wealth of behavioral data: page views, product clicks, add-to-carts, purchases, and everything in between. The standard GA4 interface gives you dashboards and pre-built reports, but it has limits. Segments are constrained. Attribution windows are fixed. And the moment you want to join your analytics data with your CRM, your ad platforms, or your product catalog, you hit a wall.
That is where BigQuery comes in.
When you link GA4 to BigQuery, Google exports your raw, hit-level event data into a cloud data warehouse you fully control. No sampling. No aggregation. Every single event, with every parameter, lands in a table you can query with SQL. For ecommerce teams, this unlocks several critical capabilities:
- Unsampled data -- GA4's interface samples data for large date ranges or complex explorations. BigQuery gives you every row.
- Custom attribution -- Build your own attribution models instead of relying on GA4's defaults.
- Cross-platform joins -- Combine GA4 data with Meta Ads, Google Ads, Shopify orders, or your CRM to get a unified view of the customer journey.
- Historical analysis -- Query months or years of data without waiting for the GA4 interface to process it.
- Automated reporting -- Schedule queries, pipe results into dashboards, or feed them into machine learning models.
But there is a catch: GA4's BigQuery schema is not intuitive. If you are used to relational databases with clean columns like page_url, revenue, or campaign, you are in for a surprise. GA4 stores everything as events with nested, repeated key-value parameters. Understanding this structure is the first step toward getting real value from your data.
This guide walks you through the schema, explains how ecommerce data is structured, provides practical SQL queries you can run today, and highlights the pitfalls that trip up even experienced analysts.
The GA4 BigQuery Export Schema
How the Export Works
When you enable the BigQuery export in GA4, Google creates a dataset in your BigQuery project (typically named analytics_PROPERTY_ID). Inside that dataset, you get daily tables following the naming pattern events_YYYYMMDD -- for example, events_20260215. There is also an intraday table called events_intraday_YYYYMMDD that contains data from the current day, updated roughly every few minutes.
Each row in these tables represents a single event. Not a session. Not a user. A single event. This is a fundamental shift from Universal Analytics, where the basic unit was a "hit" within a session. In GA4, everything is an event, and sessions are reconstructed from those events after the fact.
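Before writing queries, it can help to confirm which daily and intraday tables the export has actually created. Here is a minimal sketch, assuming your dataset is named analytics_123456789 inside your-project (swap in your own project and property IDs):

```sql
-- List all GA4 export tables in the dataset, newest first
SELECT table_name
FROM `your-project.analytics_123456789`.INFORMATION_SCHEMA.TABLES
WHERE table_name LIKE 'events_%'
ORDER BY table_name DESC
```

If you see an events_intraday_ table for today but no finalized events_ table yet, that is expected: the daily table appears only after Google finalizes the day's data.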
The Events Table Structure
Every row in the events table has a consistent set of top-level columns. Here are the most important ones:
| Column | Type | Description |
|--------|------|-------------|
| event_date | STRING | The date of the event in YYYYMMDD format |
| event_timestamp | INTEGER | Unix timestamp in microseconds |
| event_name | STRING | The name of the event (e.g., page_view, purchase) |
| event_params | RECORD (REPEATED) | Array of key-value pairs containing event parameters |
| user_id | STRING | Your custom user ID (if set) |
| user_pseudo_id | STRING | GA4's anonymous client ID |
| user_properties | RECORD (REPEATED) | Array of key-value pairs for user-scoped properties |
| device | RECORD | Device info (category, browser, OS, etc.) |
| geo | RECORD | Geographic info (country, city, region) |
| traffic_source | RECORD | First-touch traffic source (source, medium, campaign) |
| ecommerce | RECORD | Ecommerce-specific data (transaction_id, value, etc.) |
| items | RECORD (REPEATED) | Array of items involved in the event |
The first thing to notice: event_params, user_properties, and items are all repeated records (arrays). This means a single event row can contain multiple parameters, multiple user properties, and multiple items. This nested structure is what makes GA4 BigQuery queries more complex than simple SELECT column FROM table statements.
How event_params Work
The event_params field is where most of the interesting data lives, and it is also where most confusion begins.
Each event parameter is stored as a struct with a key (string) and a value that has four possible sub-fields:
- value.string_value
- value.int_value
- value.float_value
- value.double_value
Only one of these sub-fields will be populated for any given parameter. For example, the page_location parameter stores its value in string_value, while ga_session_id uses int_value.
To extract a specific parameter, you need to unnest the array and filter by key. Here is the fundamental pattern:
```sql
SELECT
  event_name,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id
FROM
  `your-project.analytics_123456789.events_20260215`
WHERE
  event_name = 'page_view'
```

This subquery-within-SELECT pattern is something you will use constantly when working with GA4 data in BigQuery. It is verbose, but it is the standard approach.
Common GA4 Ecommerce Events
GA4 defines a set of recommended ecommerce events. If your site uses Google Tag Manager or gtag.js with the standard ecommerce data layer, these events are sent automatically. Here are the key ones:
| Event Name | When It Fires | Key Parameters |
|-----------|---------------|----------------|
| page_view | Every page load | page_location, page_title, page_referrer |
| view_item | Product detail page viewed | items array with product details |
| view_item_list | Product listing page viewed | items array, item_list_name |
| select_item | Product clicked in a list | items array, item_list_name |
| add_to_cart | Item added to cart | items array with product + quantity |
| remove_from_cart | Item removed from cart | items array |
| view_cart | Cart page viewed | items array |
| begin_checkout | Checkout process started | items array, value, currency |
| add_shipping_info | Shipping info submitted | items array, shipping_tier |
| add_payment_info | Payment info submitted | items array, payment_type |
| purchase | Transaction completed | items array, transaction_id, value, currency, tax, shipping |
| refund | Refund processed | items array, transaction_id, value |
Understanding which events fire at which stage of the funnel is essential for building accurate reports.
Ecommerce-Specific Data Structures
The Items Array
Whenever an ecommerce event fires, it typically includes an items array. Each element in this array represents a product involved in the event. The items array is a repeated record, meaning a single purchase event can contain multiple items.
Each item struct has these fields:
| Field | Type | Description |
|-------|------|-------------|
| item_id | STRING | Your product SKU or ID |
| item_name | STRING | Product name |
| item_brand | STRING | Brand name |
| item_variant | STRING | Variant (e.g., size, color) |
| item_category | STRING | Primary product category |
| item_category2 through item_category5 | STRING | Additional category levels |
| price | FLOAT | Unit price of the item |
| quantity | INTEGER | Number of units |
| coupon | STRING | Item-level coupon code, if any |
| affiliation | STRING | Store or affiliate name |
| item_list_name | STRING | The list where the item was shown |
| item_list_index | STRING | Position of the item in a list |
| promotion_id | STRING | Promotion ID if from a promo |
| promotion_name | STRING | Promotion name |
To query item-level data, you need to cross join the items array using UNNEST:
```sql
SELECT
  event_date,
  items.item_id,
  items.item_name,
  items.item_category,
  items.price,
  items.quantity,
  items.price * items.quantity AS item_revenue
FROM
  `your-project.analytics_123456789.events_*`,
  UNNEST(items) AS items
WHERE
  event_name = 'purchase'
  AND _TABLE_SUFFIX BETWEEN '20260201' AND '20260215'
ORDER BY
  item_revenue DESC
```

Note the use of events_* with _TABLE_SUFFIX -- this is how you query across multiple daily tables without listing each one individually.
Transaction Data
Transaction-level data is stored in two places:
- The ecommerce record on the event row, which contains transaction_id, purchase_revenue, shipping_value, tax_value, and unique_items.
- Event parameters, where transaction_id, value, currency, shipping, and tax are also available.
In practice, the event parameters are more reliably populated, especially if you are using a standard ecommerce data layer. Here is how to extract transaction-level purchase data:
```sql
SELECT
  event_date,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id') AS transaction_id,
  (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value') AS revenue,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'currency') AS currency,
  (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'shipping') AS shipping,
  (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'tax') AS tax
FROM
  `your-project.analytics_123456789.events_*`
WHERE
  event_name = 'purchase'
  AND _TABLE_SUFFIX BETWEEN '20260201' AND '20260215'
```

You might also find that value is stored in float_value rather than double_value depending on your implementation. When in doubt, use COALESCE to check both:
```sql
COALESCE(
  (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value'),
  (SELECT value.float_value FROM UNNEST(event_params) WHERE key = 'value'),
  CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'value') AS FLOAT64)
) AS revenue
```

This defensive approach prevents missing data due to inconsistent parameter types.
Practical Query Patterns
Now that you understand the schema, let us look at four queries you will use regularly as an ecommerce analyst.
1. Revenue by Day
This is the most fundamental ecommerce query. It aggregates purchase revenue by date.
```sql
SELECT
  event_date,
  COUNT(DISTINCT (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id')) AS transactions,
  SUM(ecommerce.purchase_revenue) AS total_revenue,
  ROUND(SAFE_DIVIDE(SUM(ecommerce.purchase_revenue),
    COUNT(DISTINCT (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id'))), 2) AS avg_order_value
FROM
  `your-project.analytics_123456789.events_*`
WHERE
  event_name = 'purchase'
  AND _TABLE_SUFFIX BETWEEN '20260101' AND '20260215'
GROUP BY
  event_date
ORDER BY
  event_date
```

This gives you a daily time series with transaction count, total revenue, and average order value. Replace the _TABLE_SUFFIX range with your desired date range.
2. Top Products by Revenue
To find your best-selling products, you need to unnest the items array from purchase events:
```sql
SELECT
  items.item_id,
  items.item_name,
  items.item_category,
  SUM(items.quantity) AS total_units_sold,
  ROUND(SUM(items.price * items.quantity), 2) AS total_revenue,
  COUNT(DISTINCT (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id')) AS order_count
FROM
  `your-project.analytics_123456789.events_*`,
  UNNEST(items) AS items
WHERE
  event_name = 'purchase'
  AND _TABLE_SUFFIX BETWEEN '20260101' AND '20260215'
GROUP BY
  items.item_id,
  items.item_name,
  items.item_category
ORDER BY
  total_revenue DESC
LIMIT 25
```

This query tells you which products drive the most revenue, how many units were sold, and how many distinct orders each product appeared in.
3. Conversion Funnel
One of the most valuable analyses for ecommerce is the conversion funnel: how many sessions progressed through each stage from browsing to purchase? Here is a query that builds this funnel:
```sql
WITH sessions AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
    event_name
  FROM
    `your-project.analytics_123456789.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260201' AND '20260215'
    AND event_name IN ('session_start', 'view_item', 'add_to_cart', 'begin_checkout', 'purchase')
),
funnel AS (
  SELECT
    user_pseudo_id,
    session_id,
    MAX(CASE WHEN event_name = 'session_start' THEN 1 ELSE 0 END) AS had_session,
    MAX(CASE WHEN event_name = 'view_item' THEN 1 ELSE 0 END) AS viewed_product,
    MAX(CASE WHEN event_name = 'add_to_cart' THEN 1 ELSE 0 END) AS added_to_cart,
    MAX(CASE WHEN event_name = 'begin_checkout' THEN 1 ELSE 0 END) AS began_checkout,
    MAX(CASE WHEN event_name = 'purchase' THEN 1 ELSE 0 END) AS purchased
  FROM sessions
  GROUP BY user_pseudo_id, session_id
)
SELECT
  COUNT(*) AS total_sessions,
  SUM(viewed_product) AS sessions_with_product_view,
  SUM(added_to_cart) AS sessions_with_add_to_cart,
  SUM(began_checkout) AS sessions_with_checkout,
  SUM(purchased) AS sessions_with_purchase,
  ROUND(SAFE_DIVIDE(SUM(viewed_product), COUNT(*)) * 100, 2) AS pct_viewed_product,
  ROUND(SAFE_DIVIDE(SUM(added_to_cart), SUM(viewed_product)) * 100, 2) AS pct_view_to_cart,
  ROUND(SAFE_DIVIDE(SUM(began_checkout), SUM(added_to_cart)) * 100, 2) AS pct_cart_to_checkout,
  ROUND(SAFE_DIVIDE(SUM(purchased), SUM(began_checkout)) * 100, 2) AS pct_checkout_to_purchase
FROM funnel
```

This query gives you both absolute counts and step-by-step conversion rates. You can see exactly where users drop off in your funnel and prioritize accordingly.
4. User Acquisition Channels
Understanding which channels drive valuable users requires combining traffic source data with purchase behavior:
```sql
SELECT
  traffic_source.source,
  traffic_source.medium,
  traffic_source.name AS campaign,
  COUNT(DISTINCT user_pseudo_id) AS users,
  COUNTIF(event_name = 'purchase') AS purchases,
  ROUND(SUM(CASE WHEN event_name = 'purchase' THEN ecommerce.purchase_revenue ELSE 0 END), 2) AS revenue,
  ROUND(SAFE_DIVIDE(
    SUM(CASE WHEN event_name = 'purchase' THEN ecommerce.purchase_revenue ELSE 0 END),
    COUNT(DISTINCT user_pseudo_id)
  ), 2) AS revenue_per_user
FROM
  `your-project.analytics_123456789.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260215'
GROUP BY
  traffic_source.source,
  traffic_source.medium,
  traffic_source.name
HAVING
  users > 10
ORDER BY
  revenue DESC
LIMIT 20
```

One important caveat: the traffic_source record in GA4's BigQuery export reflects the first-touch source for the user, not the session-level source. If you need session-scoped source/medium (the channel that drove a specific session), you need to extract it from event_params:
```sql
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS session_source,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS session_medium,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'campaign') AS session_campaign
```

These session-scoped parameters are typically set on session_start events.
Common Pitfalls and Gotchas
Working with GA4 data in BigQuery is powerful, but there are several traps that catch people regularly. Here is what to watch for.
Nested and Repeated Fields
The most common frustration for analysts new to GA4 BigQuery is the nested structure. You cannot simply write SELECT page_location FROM events because page_location is buried inside the event_params array. Every parameter extraction requires the UNNEST subquery pattern. This makes queries verbose and slower to write.
A practical tip: create a view or a flattened table for your most common queries. You can schedule a daily query that extracts the parameters you care about into a clean, flat table with columns like page_location, session_id, source, medium, etc. This saves time and makes ad hoc analysis much faster.
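As a sketch of that tip, the view below flattens a handful of commonly used parameters into plain columns. The view name events_flat and the particular parameters chosen are illustrative assumptions, not part of the standard export -- adapt both to your own implementation:

```sql
-- Illustrative flattening view; swap in your project, dataset, and parameters
CREATE OR REPLACE VIEW `your-project.analytics_123456789.events_flat` AS
SELECT
  event_date,
  event_timestamp,
  event_name,
  user_pseudo_id,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS session_source,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS session_medium
FROM
  `your-project.analytics_123456789.events_*`
```

Once the view exists, ad hoc questions become one-liners like SELECT page_location, COUNT(*) FROM events_flat GROUP BY 1. For heavy use, a scheduled query that materializes this into a real table will also cut scan costs, since the view still scans the raw export on every query.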
Session Scope vs. Event Scope
GA4 is event-based, but many business questions are session-based ("how many sessions resulted in a purchase?") or user-based ("how many users made a purchase this month?"). Bridging between these scopes requires careful query construction.
Sessions in GA4 BigQuery are identified by the combination of user_pseudo_id and ga_session_id (from event_params). There is no built-in session table. You have to construct sessions yourself by grouping events with the same user and session ID.
A common mistake is counting events when you mean to count sessions. For example, a user who views 10 pages in a session generates 10 page_view events. If you count page_view events, you get page views, not sessions. Always be explicit about what you are counting.
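To make the event-versus-session distinction concrete, here is a minimal sketch that computes both counts in one pass. It assumes ga_session_id is populated on your events; rows missing it drop out of the session count because the concatenated key becomes NULL:

```sql
-- Contrast raw event count with distinct-session count for the same event
SELECT
  COUNT(*) AS page_view_events,
  COUNT(DISTINCT CONCAT(
    user_pseudo_id,
    '-',
    CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
  )) AS sessions_with_page_view
FROM
  `your-project.analytics_123456789.events_20260215`
WHERE
  event_name = 'page_view'
```

The first number will typically be several times larger than the second; which one answers your business question depends entirely on whether the question is event-scoped or session-scoped.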
Sampling and Data Freshness
The BigQuery export is unsampled, which is one of its biggest advantages. However, there are freshness considerations:
- Daily tables (events_YYYYMMDD) are finalized once per day, usually within 24-48 hours after the day ends.
- Intraday tables (events_intraday_YYYYMMDD) are updated throughout the day but are deleted once the daily table is finalized.
- If you query the current day, you need to use the intraday table. If it does not exist yet, there may be a short delay after midnight.
A practical pattern for querying "yesterday plus today" uses a union:
```sql
SELECT * FROM `your-project.analytics_123456789.events_20260214`
UNION ALL
SELECT * FROM `your-project.analytics_123456789.events_intraday_20260215`
```

Event Parameter Extraction Patterns
Because parameters can be stored in different value fields (string_value, int_value, float_value, double_value), you need to know which field each parameter uses. There is no single reference for this -- it depends on your implementation.
Here is a diagnostic query that shows you which value fields are populated for each parameter key in your data:
```sql
SELECT
  params.key,
  COUNTIF(params.value.string_value IS NOT NULL) AS string_count,
  COUNTIF(params.value.int_value IS NOT NULL) AS int_count,
  COUNTIF(params.value.float_value IS NOT NULL) AS float_count,
  COUNTIF(params.value.double_value IS NOT NULL) AS double_count
FROM
  `your-project.analytics_123456789.events_20260215`,
  UNNEST(event_params) AS params
GROUP BY
  params.key
ORDER BY
  params.key
```

Run this once against your data and keep the results as a reference. It will save you hours of debugging.
Duplicate Events
GA4 can sometimes send duplicate events, especially during network instability or tag misconfiguration. Purchase events are particularly sensitive to this -- a duplicate purchase event means double-counted revenue.
Always use DISTINCT on transaction_id when counting transactions or summing revenue:
```sql
SELECT
  COUNT(DISTINCT (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id')) AS unique_transactions
FROM
  `your-project.analytics_123456789.events_*`
WHERE
  event_name = 'purchase'
  AND _TABLE_SUFFIX BETWEEN '20260201' AND '20260215'
```

BigQuery Costs
Each query you run in BigQuery scans data and costs money ($5 per TB scanned in on-demand pricing). GA4 tables can grow large quickly, especially for high-traffic ecommerce sites. To keep costs down:
- Always use _TABLE_SUFFIX to limit the date range you scan.
- Avoid SELECT * -- only select the columns you need.
- Consider partitioning or clustering frequently queried tables.
- Use BigQuery's query validator to estimate scan size before running large queries.
How AI and Natural Language BI Tools Can Help
By now, you can probably see the challenge. GA4's BigQuery schema is powerful but demanding. Writing correct SQL against nested, repeated fields requires deep familiarity with the schema. Building a conversion funnel query from scratch takes 30+ lines of SQL. And if you want to join GA4 data with your ad spend data from Meta or Google Ads, the complexity multiplies.
This is exactly the kind of problem that AI-powered analytics tools are built to solve. Instead of memorizing schema details and writing verbose SQL, imagine asking a question in plain English:
- "What was our conversion rate from add-to-cart to purchase last week, broken down by device type?"
- "Which products had the highest revenue but lowest return rates in January?"
- "Show me daily revenue by acquisition channel for the last 30 days."
A natural language BI tool that understands your GA4 BigQuery schema can translate these questions into correct, optimized SQL, run the query, and return the answer in seconds. It handles the UNNEST patterns, the parameter extraction, the session reconstruction, and the scope management automatically.
This does not replace the need to understand your data. Knowing what event_params are, how sessions work, and what the items array contains makes you a better analyst and helps you ask better questions. But it removes the bottleneck of writing and debugging complex SQL for every ad hoc question.
For ecommerce teams that need to move fast -- answering questions from marketing, merchandising, and leadership throughout the day -- the combination of a well-structured BigQuery dataset and an AI layer on top is transformative.
What Comes Next
This guide covered the foundations of GA4 ecommerce data in BigQuery: the event-based model, the nested parameter structure, the items array, and practical query patterns you can start using today. But GA4 is just one piece of the ecommerce data puzzle.
If you are interested in how natural language interfaces can make this kind of data more accessible to your entire team, read our guide on What is Natural Language BI? to understand how conversational analytics works in practice.
In future articles, we will cover the data structures behind Meta Ads, Google Ads, and Shopify -- and how they fit together with GA4 in a unified analytics layer. Each platform has its own schema quirks, and understanding those structures is the first step toward building a complete, queryable view of your ecommerce performance.
The data is already in your warehouse. The question is how fast you can turn it into decisions.
Stop Guessing. Start Asking.
Verity turns your data into a conversation. Ask questions in plain language, get trusted answers backed by your actual data.
Join the Waitlist