Understanding GA4 Data in BigQuery: A Practical Guide for Ecommerce


A hands-on guide to GA4's BigQuery export schema for ecommerce teams. Learn how events, parameters, and ecommerce data are structured — and how to query them effectively.

Verity Team · February 15, 2026 · 16 min read

Why GA4 Data in BigQuery Matters for Ecommerce

If you run an ecommerce business and use Google Analytics 4, you already have access to a wealth of behavioral data: page views, product clicks, add-to-carts, purchases, and everything in between. The standard GA4 interface gives you dashboards and pre-built reports, but it has limits. Segments are constrained. Attribution windows are fixed. And the moment you want to join your analytics data with your CRM, your ad platforms, or your product catalog, you hit a wall.

That is where BigQuery comes in.

When you link GA4 to BigQuery, Google exports your raw, hit-level event data into a cloud data warehouse you fully control. No sampling. No aggregation. Every single event, with every parameter, lands in a table you can query with SQL. For ecommerce teams, this unlocks several critical capabilities:

  • Unsampled data -- GA4's interface samples data for large date ranges or complex explorations. BigQuery gives you every row.
  • Custom attribution -- Build your own attribution models instead of relying on GA4's defaults.
  • Cross-platform joins -- Combine GA4 data with Meta Ads, Google Ads, Shopify orders, or your CRM to get a unified view of the customer journey.
  • Historical analysis -- Query months or years of data without waiting for the GA4 interface to process it.
  • Automated reporting -- Schedule queries, pipe results into dashboards, or feed them into machine learning models.

But there is a catch: GA4's BigQuery schema is not intuitive. If you are used to relational databases with clean columns like page_url, revenue, or campaign, you are in for a surprise. GA4 stores everything as events with nested, repeated key-value parameters. Understanding this structure is the first step toward getting real value from your data.

This guide walks you through the schema, explains how ecommerce data is structured, provides practical SQL queries you can run today, and highlights the pitfalls that trip up even experienced analysts.

The GA4 BigQuery Export Schema

How the Export Works

When you enable the BigQuery export in GA4, Google creates a dataset in your BigQuery project (typically named analytics_PROPERTY_ID). Inside that dataset, you get daily tables following the naming pattern events_YYYYMMDD -- for example, events_20260215. There is also an intraday table called events_intraday_YYYYMMDD that contains data from the current day, updated roughly every few minutes.

Each row in these tables represents a single event. Not a session. Not a user. A single event. This is a fundamental shift from Universal Analytics, where the basic unit was a "hit" within a session. In GA4, everything is an event, and sessions are reconstructed from those events after the fact.

The Events Table Structure

Every row in the events table has a consistent set of top-level columns. Here are the most important ones:

| Column | Type | Description |
|--------|------|-------------|
| event_date | STRING | The date of the event in YYYYMMDD format |
| event_timestamp | INTEGER | Unix timestamp in microseconds |
| event_name | STRING | The name of the event (e.g., page_view, purchase) |
| event_params | RECORD (REPEATED) | Array of key-value pairs containing event parameters |
| user_id | STRING | Your custom user ID (if set) |
| user_pseudo_id | STRING | GA4's anonymous client ID |
| user_properties | RECORD (REPEATED) | Array of key-value pairs for user-scoped properties |
| device | RECORD | Device info (category, browser, OS, etc.) |
| geo | RECORD | Geographic info (country, city, region) |
| traffic_source | RECORD | First-touch traffic source (source, medium, campaign) |
| ecommerce | RECORD | Ecommerce-specific data (transaction_id, value, etc.) |
| items | RECORD (REPEATED) | Array of items involved in the event |

The first thing to notice: event_params, user_properties, and items are all repeated records (arrays). This means a single event row can contain multiple parameters, multiple user properties, and multiple items. This nested structure is what makes GA4 BigQuery queries more complex than simple SELECT column FROM table statements.

How event_params Work

The event_params field is where most of the interesting data lives, and it is also where most confusion begins.

Each event parameter is stored as a struct with a key (string) and a value that has four possible sub-fields:

  • value.string_value
  • value.int_value
  • value.float_value
  • value.double_value

Only one of these sub-fields will be populated for any given parameter. For example, the page_location parameter stores its value in string_value, while ga_session_id uses int_value.

To extract a specific parameter, you need to unnest the array and filter by key. Here is the fundamental pattern:

SELECT
  event_name,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id
FROM
  `your-project.analytics_123456789.events_20260215`
WHERE
  event_name = 'page_view'

This subquery-within-SELECT pattern is something you will use constantly when working with GA4 data in BigQuery. It is verbose, but it is the standard approach.

Common GA4 Ecommerce Events

GA4 defines a set of recommended ecommerce events. If your site uses Google Tag Manager or gtag.js with the standard ecommerce data layer, these events are sent automatically. Here are the key ones:

| Event Name | When It Fires | Key Parameters |
|-----------|---------------|----------------|
| page_view | Every page load | page_location, page_title, page_referrer |
| view_item | Product detail page viewed | items array with product details |
| view_item_list | Product listing page viewed | items array, item_list_name |
| select_item | Product clicked in a list | items array, item_list_name |
| add_to_cart | Item added to cart | items array with product + quantity |
| remove_from_cart | Item removed from cart | items array |
| view_cart | Cart page viewed | items array |
| begin_checkout | Checkout process started | items array, value, currency |
| add_shipping_info | Shipping info submitted | items array, shipping_tier |
| add_payment_info | Payment info submitted | items array, payment_type |
| purchase | Transaction completed | items array, transaction_id, value, currency, tax, shipping |
| refund | Refund processed | items array, transaction_id, value |

Understanding which events fire at which stage of the funnel is essential for building accurate reports.
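Before building any funnel report, it is worth confirming which of these events your property actually sends, and in what volume. Here is a quick diagnostic sketch using the same placeholder project and table names as the rest of this guide:

```sql
-- Daily volume for the key ecommerce funnel events
SELECT
  event_name,
  COUNT(*) AS event_count,
  COUNT(DISTINCT user_pseudo_id) AS users
FROM
  `your-project.analytics_123456789.events_20260215`
WHERE
  event_name IN ('view_item', 'add_to_cart', 'begin_checkout', 'purchase')
GROUP BY
  event_name
ORDER BY
  event_count DESC
```

If an event is missing from the results, it is not being sent by your tagging setup, and any funnel built on it will silently undercount that stage.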

Ecommerce-Specific Data Structures

The Items Array

Whenever an ecommerce event fires, it typically includes an items array. Each element in this array represents a product involved in the event. The items array is a repeated record, meaning a single purchase event can contain multiple items.

Each item struct has these fields:

| Field | Type | Description |
|-------|------|-------------|
| item_id | STRING | Your product SKU or ID |
| item_name | STRING | Product name |
| item_brand | STRING | Brand name |
| item_variant | STRING | Variant (e.g., size, color) |
| item_category | STRING | Primary product category |
| item_category2 through item_category5 | STRING | Additional category levels |
| price | FLOAT | Unit price of the item |
| quantity | INTEGER | Number of units |
| coupon | STRING | Item-level coupon code, if any |
| affiliation | STRING | Store or affiliate name |
| item_list_name | STRING | The list where the item was shown |
| item_list_index | STRING | Position of the item in a list |
| promotion_id | STRING | Promotion ID if from a promo |
| promotion_name | STRING | Promotion name |

To query item-level data, you need to cross join the items array using UNNEST:

SELECT
  event_date,
  items.item_id,
  items.item_name,
  items.item_category,
  items.price,
  items.quantity,
  items.price * items.quantity AS item_revenue
FROM
  `your-project.analytics_123456789.events_*`,
  UNNEST(items) AS items
WHERE
  event_name = 'purchase'
  AND _TABLE_SUFFIX BETWEEN '20260201' AND '20260215'
ORDER BY
  item_revenue DESC

Note the use of events_* with _TABLE_SUFFIX -- this is how you query across multiple daily tables without listing each one individually.

Transaction Data

Transaction-level data is stored in two places:

  1. The ecommerce record on the event row, which contains transaction_id, purchase_revenue, shipping_value, tax_value, and unique_items.
  2. Event parameters, where transaction_id, value, currency, shipping, and tax are also available.

In practice, the event parameters are more reliably populated, especially if you are using a standard ecommerce data layer. Here is how to extract transaction-level purchase data:

SELECT
  event_date,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id') AS transaction_id,
  (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value') AS revenue,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'currency') AS currency,
  (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'shipping') AS shipping,
  (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'tax') AS tax
FROM
  `your-project.analytics_123456789.events_*`
WHERE
  event_name = 'purchase'
  AND _TABLE_SUFFIX BETWEEN '20260201' AND '20260215'

You might also find that value is stored in float_value rather than double_value depending on your implementation. When in doubt, use COALESCE to check both:

COALESCE(
  (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value'),
  (SELECT value.float_value FROM UNNEST(event_params) WHERE key = 'value'),
  CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'value') AS FLOAT64)
) AS revenue

This defensive approach prevents missing data due to inconsistent parameter types.

Practical Query Patterns

Now that you understand the schema, let us look at four queries you will use regularly as an ecommerce analyst.

1. Revenue by Day

This is the most fundamental ecommerce query. It aggregates purchase revenue by date.

SELECT
  event_date,
  COUNT(DISTINCT (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id')) AS transactions,
  SUM(ecommerce.purchase_revenue) AS total_revenue,
  ROUND(SAFE_DIVIDE(SUM(ecommerce.purchase_revenue),
    COUNT(DISTINCT (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id'))), 2) AS avg_order_value
FROM
  `your-project.analytics_123456789.events_*`
WHERE
  event_name = 'purchase'
  AND _TABLE_SUFFIX BETWEEN '20260101' AND '20260215'
GROUP BY
  event_date
ORDER BY
  event_date

This gives you a daily time series with transaction count, total revenue, and average order value. Replace the _TABLE_SUFFIX range with your desired date range.

2. Top Products by Revenue

To find your best-selling products, you need to unnest the items array from purchase events:

SELECT
  items.item_id,
  items.item_name,
  items.item_category,
  SUM(items.quantity) AS total_units_sold,
  ROUND(SUM(items.price * items.quantity), 2) AS total_revenue,
  COUNT(DISTINCT (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id')) AS order_count
FROM
  `your-project.analytics_123456789.events_*`,
  UNNEST(items) AS items
WHERE
  event_name = 'purchase'
  AND _TABLE_SUFFIX BETWEEN '20260101' AND '20260215'
GROUP BY
  items.item_id,
  items.item_name,
  items.item_category
ORDER BY
  total_revenue DESC
LIMIT 25

This query tells you which products drive the most revenue, how many units were sold, and how many distinct orders each product appeared in.

3. Conversion Funnel

One of the most valuable analyses for ecommerce is the conversion funnel: how many sessions progressed through each stage from browsing to purchase? Here is a query that builds this funnel:

WITH sessions AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
    event_name
  FROM
    `your-project.analytics_123456789.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20260201' AND '20260215'
    AND event_name IN ('session_start', 'view_item', 'add_to_cart', 'begin_checkout', 'purchase')
),
 
funnel AS (
  SELECT
    user_pseudo_id,
    session_id,
    MAX(CASE WHEN event_name = 'session_start' THEN 1 ELSE 0 END) AS had_session,
    MAX(CASE WHEN event_name = 'view_item' THEN 1 ELSE 0 END) AS viewed_product,
    MAX(CASE WHEN event_name = 'add_to_cart' THEN 1 ELSE 0 END) AS added_to_cart,
    MAX(CASE WHEN event_name = 'begin_checkout' THEN 1 ELSE 0 END) AS began_checkout,
    MAX(CASE WHEN event_name = 'purchase' THEN 1 ELSE 0 END) AS purchased
  FROM sessions
  GROUP BY user_pseudo_id, session_id
)
 
SELECT
  COUNT(*) AS total_sessions,
  SUM(viewed_product) AS sessions_with_product_view,
  SUM(added_to_cart) AS sessions_with_add_to_cart,
  SUM(began_checkout) AS sessions_with_checkout,
  SUM(purchased) AS sessions_with_purchase,
  ROUND(SAFE_DIVIDE(SUM(viewed_product), COUNT(*)) * 100, 2) AS pct_viewed_product,
  ROUND(SAFE_DIVIDE(SUM(added_to_cart), SUM(viewed_product)) * 100, 2) AS pct_view_to_cart,
  ROUND(SAFE_DIVIDE(SUM(began_checkout), SUM(added_to_cart)) * 100, 2) AS pct_cart_to_checkout,
  ROUND(SAFE_DIVIDE(SUM(purchased), SUM(began_checkout)) * 100, 2) AS pct_checkout_to_purchase
FROM funnel

This query gives you both absolute counts and step-by-step conversion rates. You can see exactly where users drop off in your funnel and prioritize accordingly.

4. User Acquisition Channels

Understanding which channels drive valuable users requires combining traffic source data with purchase behavior:

SELECT
  traffic_source.source,
  traffic_source.medium,
  traffic_source.name AS campaign,
  COUNT(DISTINCT user_pseudo_id) AS users,
  COUNTIF(event_name = 'purchase') AS purchases,
  ROUND(SUM(CASE WHEN event_name = 'purchase' THEN ecommerce.purchase_revenue ELSE 0 END), 2) AS revenue,
  ROUND(SAFE_DIVIDE(
    SUM(CASE WHEN event_name = 'purchase' THEN ecommerce.purchase_revenue ELSE 0 END),
    COUNT(DISTINCT user_pseudo_id)
  ), 2) AS revenue_per_user
FROM
  `your-project.analytics_123456789.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260101' AND '20260215'
GROUP BY
  traffic_source.source,
  traffic_source.medium,
  traffic_source.name
HAVING
  users > 10
ORDER BY
  revenue DESC
LIMIT 20

One important caveat: the traffic_source record in GA4's BigQuery export reflects the first-touch source for the user, not the session-level source. If you need session-scoped source/medium (the channel that drove a specific session), you need to extract it from event_params:

(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS session_source,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS session_medium,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'campaign') AS session_campaign

These session-scoped parameters are typically set on session_start events.
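As a sketch of how to use them (placeholder table names, and assuming your implementation does populate these parameters on session_start), you can build a one-row-per-session source table and join it to other session-level metrics on user_pseudo_id and session_id:

```sql
-- One row per session with its session-scoped source and medium
SELECT
  user_pseudo_id,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS session_source,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS session_medium
FROM
  `your-project.analytics_123456789.events_*`
WHERE
  event_name = 'session_start'
  AND _TABLE_SUFFIX BETWEEN '20260201' AND '20260215'
```

Joining this to the funnel CTE from the previous section (on the same two keys) gives you per-channel conversion rates rather than a single blended funnel.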

Common Pitfalls and Gotchas

Working with GA4 data in BigQuery is powerful, but there are several traps that catch people regularly. Here is what to watch for.

Nested and Repeated Fields

The most common frustration for analysts new to GA4 BigQuery is the nested structure. You cannot simply write SELECT page_location FROM events because page_location is buried inside the event_params array. Every parameter extraction requires the UNNEST subquery pattern. This makes queries verbose and slower to write.

A practical tip: create a view or a flattened table for your most common queries. You can schedule a daily query that extracts the parameters you care about into a clean, flat table with columns like page_location, session_id, source, medium, etc. This saves time and makes ad hoc analysis much faster.
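As a minimal sketch of that idea (the view name and the particular parameters flattened here are illustrative choices, not a standard):

```sql
-- Flatten the most commonly used event parameters into plain columns
CREATE OR REPLACE VIEW `your-project.analytics_123456789.events_flat` AS
SELECT
  event_date,
  event_timestamp,
  event_name,
  user_pseudo_id,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium
FROM
  `your-project.analytics_123456789.events_*`
```

Note that a view still scans the underlying daily tables on every query; if cost matters, a scheduled query that materializes the flattened output into a date-partitioned table each day is usually the cheaper option.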

Session Scope vs. Event Scope

GA4 is event-based, but many business questions are session-based ("how many sessions resulted in a purchase?") or user-based ("how many users made a purchase this month?"). Bridging between these scopes requires careful query construction.

Sessions in GA4 BigQuery are identified by the combination of user_pseudo_id and ga_session_id (from event_params). There is no built-in session table. You have to construct sessions yourself by grouping events with the same user and session ID.

A common mistake is counting events when you mean to count sessions. For example, a user who views 10 pages in a session generates 10 page_view events. If you count page_view events, you get page views, not sessions. Always be explicit about what you are counting.
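For example, here is a sketch of counting sessions per day correctly (same placeholder table names), by combining the two identifiers into one session key:

```sql
-- A session is identified by user_pseudo_id + ga_session_id, not by any single column
SELECT
  event_date,
  COUNT(DISTINCT CONCAT(
    user_pseudo_id, '-',
    CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
  )) AS sessions
FROM
  `your-project.analytics_123456789.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20260201' AND '20260215'
GROUP BY
  event_date
ORDER BY
  event_date
```

Counting DISTINCT user_pseudo_id alone gives you users; counting rows gives you events; only the combined key gives you sessions.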

Sampling and Data Freshness

The BigQuery export is unsampled, which is one of its biggest advantages. However, there are freshness considerations:

  • Daily tables (events_YYYYMMDD) are finalized once per day, usually within 24-48 hours after the day ends.
  • Intraday tables (events_intraday_YYYYMMDD) are updated throughout the day but are deleted once the daily table is finalized.
  • If you query the current day, you need to use the intraday table. If it does not exist yet, there may be a short delay after midnight.

A practical pattern for querying "yesterday plus today" uses a union:

SELECT * FROM `your-project.analytics_123456789.events_20260214`
UNION ALL
SELECT * FROM `your-project.analytics_123456789.events_intraday_20260215`

Event Parameter Extraction Patterns

Because parameters can be stored in different value fields (string_value, int_value, float_value, double_value), you need to know which field each parameter uses. There is no single reference for this -- it depends on your implementation.

Here is a diagnostic query that shows you which value fields are populated for each parameter key in your data:

SELECT
  params.key,
  COUNTIF(params.value.string_value IS NOT NULL) AS string_count,
  COUNTIF(params.value.int_value IS NOT NULL) AS int_count,
  COUNTIF(params.value.float_value IS NOT NULL) AS float_count,
  COUNTIF(params.value.double_value IS NOT NULL) AS double_count
FROM
  `your-project.analytics_123456789.events_20260215`,
  UNNEST(event_params) AS params
GROUP BY
  params.key
ORDER BY
  params.key

Run this once against your data and keep the results as a reference. It will save you hours of debugging.

Duplicate Events

GA4 can sometimes send duplicate events, especially during network instability or tag misconfiguration. Purchase events are particularly sensitive to this -- a duplicate purchase event means double-counted revenue.

Always use DISTINCT on transaction_id when counting transactions or summing revenue:

SELECT
  COUNT(DISTINCT (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id')) AS unique_transactions
FROM
  `your-project.analytics_123456789.events_*`
WHERE
  event_name = 'purchase'
  AND _TABLE_SUFFIX BETWEEN '20260201' AND '20260215'
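When you need full deduplicated purchase rows rather than just a count, one common pattern (sketched here with the same placeholder names) is to keep the earliest event per transaction_id using ROW_NUMBER:

```sql
-- Keep exactly one purchase row per transaction_id (the earliest by timestamp)
WITH purchases AS (
  SELECT
    event_date,
    event_timestamp,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id') AS transaction_id,
    ecommerce.purchase_revenue AS revenue
  FROM
    `your-project.analytics_123456789.events_*`
  WHERE
    event_name = 'purchase'
    AND _TABLE_SUFFIX BETWEEN '20260201' AND '20260215'
),

deduped AS (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY transaction_id ORDER BY event_timestamp) AS rn
  FROM purchases
)

SELECT event_date, transaction_id, revenue
FROM deduped
WHERE rn = 1
```

Summing revenue from this result is safe against double-fired purchase tags, which a plain SUM over the raw events table is not.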

BigQuery Costs

Each query you run in BigQuery scans data and costs money (on-demand pricing runs a few dollars per TiB scanned; check current Google Cloud pricing for the exact rate). GA4 tables can grow large quickly, especially for high-traffic ecommerce sites. To keep costs down:

  • Always use _TABLE_SUFFIX to limit the date range you scan.
  • Avoid SELECT * -- only select the columns you need.
  • Consider partitioning or clustering frequently queried tables.
  • Use BigQuery's query validator to estimate scan size before running large queries.

How AI and Natural Language BI Tools Can Help

By now, you can probably see the challenge. GA4's BigQuery schema is powerful but demanding. Writing correct SQL against nested, repeated fields requires deep familiarity with the schema. Building a conversion funnel query from scratch takes 30+ lines of SQL. And if you want to join GA4 data with your ad spend data from Meta or Google Ads, the complexity multiplies.

This is exactly the kind of problem that AI-powered analytics tools are built to solve. Instead of memorizing schema details and writing verbose SQL, imagine asking a question in plain English:

  • "What was our conversion rate from add-to-cart to purchase last week, broken down by device type?"
  • "Which products had the highest revenue but lowest return rates in January?"
  • "Show me daily revenue by acquisition channel for the last 30 days."

A natural language BI tool that understands your GA4 BigQuery schema can translate these questions into correct, optimized SQL, run the query, and return the answer in seconds. It handles the UNNEST patterns, the parameter extraction, the session reconstruction, and the scope management automatically.

This does not replace the need to understand your data. Knowing what event_params are, how sessions work, and what the items array contains makes you a better analyst and helps you ask better questions. But it removes the bottleneck of writing and debugging complex SQL for every ad hoc question.

For ecommerce teams that need to move fast -- answering questions from marketing, merchandising, and leadership throughout the day -- the combination of a well-structured BigQuery dataset and an AI layer on top is transformative.

What Comes Next

This guide covered the foundations of GA4 ecommerce data in BigQuery: the event-based model, the nested parameter structure, the items array, and practical query patterns you can start using today. But GA4 is just one piece of the ecommerce data puzzle.

If you are interested in how natural language interfaces can make this kind of data more accessible to your entire team, read our guide on What is Natural Language BI? to understand how conversational analytics works in practice.

In future articles, we will cover the data structures behind Meta Ads, Google Ads, and Shopify -- and how they fit together with GA4 in a unified analytics layer. Each platform has its own schema quirks, and understanding those structures is the first step toward building a complete, queryable view of your ecommerce performance.

The data is already in your warehouse. The question is how fast you can turn it into decisions.

Stop Guessing. Start Asking.

Verity turns your data into a conversation. Ask questions in plain language, get trusted answers backed by your actual data.
