Google Analytics BigQuery Export Part Two: Example SQL recipes


July 25, 2019

As mentioned in our previous post, “Google Analytics BigQuery Export Part One: Why Export Google Analytics Data?” Google Analytics BigQuery Export Part One: Why Export Google Analytics Data?”, beginning to work on GA data can be difficult as there are nuances to the way it’s stored. We’ve put together a list of recipes you can hopefully use to jumpstart even greater analysis from your GA BQ export.

All the examples below require modifying the from clause as follows to your specific project and date range.

  1. <DataSetId> with the desired dataset id 
  2. <ProjectId> with the project id the dataset belongs to in #1 (above)
  3. YYYYMMDD with desired year (YYYY), Month (MM) and Date (DD)

The ga_sessions_YYYYMMDD table has schema documented here where each and every column (field) is explained in detail.

 

Recipe: GA Basics

GA BQ export is not just for complex analysis. It’s also capable of reproducing any metrics and reports for your custom dashboards. For example, the query below calculates metrics:

  • Users
  • Visits
  • Pageviews
  • Bounces
  • Transactions
  • Revenue
  • Revenue per Visit

For the following dimensions:

  • Channel (traffic medium)
  • Device Category
  • Date

During August 1 & 2, 2016 for the sample Google Analytics dataset provided by Google.

Go ahead and try it out using the SQL recipe below and compare to your current GA session reports. You may have to adjust the fixed date range from 1-aug-2016 to 2-aug-2018 to something else specific to your website or use the dynamic date range (commented out with ‘/ * …. */’.   In case you find that the user metric totals are different, read this post.

 

#standardSQL

SELECT
  channelGrouping AS traffic_type,
  device.deviceCategory as device_type,
  date AS Aug_2016_day,
  _TABLE_SUFFIX AS Aug_2016_day2,
  COUNT(DISTINCT fullVisitorId) AS Users,
  SUM(totals.visits) AS visits,
  SUM(totals.pageviews) AS pageviews,
  SUM(totals.bounces) AS bounces,
  SUM(totals.transactions) AS transactions,
  SUM(totals.transactionRevenue)/1000000 AS revenue,
  ROUND((SUM(totals.transactionRevenue)/1000000)/(SUM(totals.visits)),2) AS revenue_per_visit
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801'  AND '20160802'
  /* another way to filter date range for a rolling date range, like past 30 days */
  /*
  _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
   AND
   FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
   */
GROUP BY
  channelGrouping,  device_type, date,  _TABLE_SUFFIX
ORDER BY
  channelGrouping DESC, device_type desc,  date ASC, _TABLE_SUFFIX ASC

The main benefits of exporting GA data to BQ is for more details and exhaustive queries to produce information not available in GA or through the many data connectors (APIs, Super Metrics, Tableau data connectors, etc.) that only retrieve aggregated data and not hit-level records (page views, events, social, products, etc.).

Every row in either the ga_sessions_yyyymmdd or the ga_sessions_intraday_yyyymmdd tables represents a session. A session can often contain multiple pageviews and events or other interactions. These interactions are stored as hits within the session. Hits contain a lot of information like page path, title, custom dimensions, etc. and come in the form of a record.  

Since multiple hits are contained within a session, the hit record is defined as repeated. So in order to see every hit as a single row in your query results, that record must be unnested using the UNNEST function. Take note of both the hits and customDimensions record in screenshot of the schema below. They both have the word REPEATED. Repeated fields must be unnested to have them appear in each and every row and not contained (nested) within a row.

 

Recipe: Promotion Performance

The following query unnests the hit repeated record and then promotion repeated record within each hit. The result calculates and reports promotion metrics for each promotion and the position it’s in. The time range is August 2016.

SELECT
  promo.promoName AS Promotion_Name,
  promo.promoCreative AS Promotion_Type,
  promo.promoPosition as Promotion_Position,
  COUNT(hit.promotionActionInfo.promoIsView) AS Promotion_Views,
  COUNT(hit.promotionActionInfo.promoIsClick) AS Promotion_Clicks,
  ROUND((COUNT(hit.promotionActionInfo.promoIsClick)
        /COUNT(hit.promotionActionInfo.promoIsView))*100,2) AS Promotion_CTR
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
  UNNEST(hits) AS hit,
  UNNEST(hit.promotion) AS promo
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20160831' -- date range
GROUP BY
  Promotion_Name,  Promotion_Type, Promotion_Position
HAVING
  Promotion_Views > Promotion_Clicks -- longtail cleanup
  AND Promotion_Views > 100 --threshold
ORDER BY
  Promotion_CTR DESC

Recipe: Hit Scoped Custom Dimensions

The following query will retrieve all hit-scoped custom dimensions for each page path in a session in long table orientation for the past 30 days.

#standardSQL
SELECT
  CONCAT(fullVisitorId, ".", CAST(visitId AS STRING)) AS Session_ID,
  hit.page.pagePath AS Page,
  cd.index AS cd_index,
  cd.value AS cd_value
FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_*`  session,
  UNNEST(session.hits) AS hit,
  UNNEST(hit.customDimensions) AS cd
WHERE
  _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))

Recipe: Reproducing goals

If there is any downside to GA data exported to BQ it is that goals are not exported. That means you need to replicate the logic used to configure goals in GA within your BQ query. 

The query below looks for specific events to count the number of leads a website receives. You’ll have to change the REGEXP_CONTAINS parameter values as per your event hierarchy to extract the goals you’re interested in.

SELECT
  CASE
    WHEN REGEXP_CONTAINS(hits.eventInfo.eventCategory, r'lead') AND REGEXP_CONTAINS(hits.eventInfo.eventAction, r'trial_signup|demo_request|contact_us') THEN "Web Leads"
    ELSE "null"
  END AS goal2,
  date,
  count (DISTINCT CONCAT(fullVisitorId, ".", CAST(visitId AS STRING))) AS Unique_sessions
FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
  UNNEST(hits) AS hits
WHERE
  _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
  AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY
  goal2,  date
ORDER BY
  goal2

Recipe: Dates and Times

There are many date and times available in the GA BQ dataset. These can be used for lookback or window based analytical function reporting, similar to segments in GA but far more scalable and open. Below is a query showcasing the different date/times and time-zones available in GA exports to BQ. The query has a hard-coded time range of April 1-4, 2017.

#standardSQL
SELECT
  _TABLE_SUFFIX AS table_suffix_date,
  visitStartTime AS Session_Start_Time_in_epoch_UTC,  --number of seconds since Jan 1/1970
  TIMESTAMP_SECONDS(visitStartTime) AS Session_Start_Time_in_UTC,
  TIMESTAMP_SECONDS(visitStartTime + CAST(hit.time/1000 AS int64)) AS Hit_Time_in_UTC,
  DATETIME(TIMESTAMP_SECONDS(visitStartTime + CAST(hit.time/1000 AS int64)),    "PST8PDT" ) AS Hit_Time_in_GAViewTimezone, --also use MST7MDT, CST6CDT, EST5EDT as desired
  hit.time AS hit_seconds_since_first_hit,
  hit.hour AS hour_hit_occured_in_GAView_TimeZone,
  hit.minute AS minute_hit_occured,
  hit.hitNumber AS hit_number_of_session,
  hit.type AS hit_type
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`,   /* change this to be client specifc <project>.<dataset_id>.<table_name> */
  UNNEST(hits) AS hit
WHERE
  _TABLE_SUFFIX BETWEEN '20170401'  AND '20170404'
ORDER BY
  table_suffix_date ASC,
  visitStartTime ASC,
  hit_number_of_session ASC

What’s Next?

Exporting session and hit data from a GA 360 account to BQ allows for limitless possibilities. The raw data logs is what you now have stored in a data warehouse and using SQL-like syntax to query it. When you export data to BQ, you own that data and can do with it what you please, including integrating with other data sets. Hopefully, this gives you a springboard to begin more advanced analysis on your hit-based data. We guarantee the data is there to produce the desired information, but please post in the comments any ways that you are using these queries or trouble you are having. And please feel free to contact us with any questions.