Full-Stack Data Scientist | Marketing
30 June 2019
To programmatically retrieve Google Analytics data for marketing analytics automation.
Accessing Google Analytics API to retrieve GA records is one of the quintessential requirements to build an end-to-end marketing analytics suite. We could achieve this objective through four major steps as listed here below:
1) Generate Client ID and Secret Key in Google Cloud.
2) Update .Renviron variables.
3) Import relevant libraries and refresh GA tokens locally.
4) Finally, build the GA dataset in R.
Step 1.1. Create a Google Cloud Project: Sign in to Google Cloud Console and create a project.
Step 1.2. Google Analytics Reporting API: Once when you have created the project, navigate to the project’s API’s and Services section, and enable ‘Google Analytics Reporting API’.
Step 1.3. Configure OAuth Consent Screen: If you are setting up Google Cloud Project for the first time, you will have to configure the OAuth Consent Screen before generating the credentials. Ensure that you choose the Analytics Reporting APIs in the project scope post entering the application name and support email in the consent screen.
Step 1.4. Create OAuth Client ID: Post configuring the OAuth consent screen, create OAuth Client ID credentials. Download the Client ID and Secret Key as a JSON file and store it in your current working directory.
Step 2.1. Open .Renviron using the following command in R and update the parameters:
usethis:: edit_r_environ()
googleAnalyticsR and googleAuthR are the must-have’s in the list. Ensure that you use ‘gar_set_client ()’ function before loading the googleAnalyticsR and googleAuthR packages. Also, point the location of the ‘refresh token’ inside the gar_auth() function.
googleAuthR::gar_set_client(
"C:\\Users\\Sree\\gcp_client.json")
library(googleAnalyticsR)
library(googleAuthR)
library(tidyverse)
library(lubridate)
library(dplyr)
googleAuthR::gar_auth(token = "sc_ga.httr-oauth")
Upon executing the aforementioned command, you would be able to successfully log into the Google Analytics Reporting Database via R through automated refresh tokens.
Step 4.1. Get the list of views from GA:
Identify the entire list of views that you need from Google Analytics. Get their ViewId’s either by using the ‘ga_account_list()’ function in googleAuthR package or by identifying them from the Google Analytics itself.
my_accounts <- ga_account_list()
viewId <- (my_accounts$viewId)
Step 4.2. Build the dataset by quering Google Analytics Reporting Server:
For the purpose of illustration, here I have built a loop that downloads data sequentially for all the views that have been set up in Google Analytics.
ga_data_final <- data.frame()
for (i in viewId) {
ga_data_temp <-
google_analytics(i,
date_range = c(GoogleA_Start, GoogleA_End),
metrics = c("sessions"),
dimensions = c("date"
,"channelGrouping"
,"deviceCategory"
,"source"),
anti_sample = TRUE,
#slow_fetch = TRUE,
max = c("-1"))
ga_data_temp$viewId <- i
ga_data_final <- rbind(ga_data_final, ga_data_temp)
}
Step 4.3. Clean the dataset:
GA_export <-
left_join(ga_data_final,my_accounts,
by = "viewId") %>%
select(date,channelGrouping,
deviceCategory,source,sessions,
Country) %>%
mutate(channelGrouping_refined =
if_else(
channelGrouping == 'Organic Search','SEO',
if_else(channelGrouping == 'Paid Search','PPC Search',
if_else(channelGrouping == 'Display', 'PPC Display',
if_else(channelGrouping == 'email,email','EDM',
if_else(channelGrouping == '(Other)',
'Unspecified', channelGrouping
)))))) %>% select(date,channelGrouping,
deviceCategory,source,sessions,
channelGrouping_refined,Country)
rm(my_accounts,ga_data_final,ga_data_temp,i, viewId)
Step 4.4. Publish the dataset:
Here’s a snapshot of the final output from the Google Analytics Reporting server.
Finally, publish the dataset with a timestamp either into a flat-file or into a cloud database.
GA_data <-
GA_export %>%
write_csv("GA_data.csv")
This is just the inception of marketing analytics automation. As we could do so much more on top of this like:
1) Automate the R scripts on a virtual machine and store the incremental data in a cloud database.
2) Repeat the same set of processes to build an automated cross channel data lake that brings in incremental Facebook Ads /Google Ads /Twitter Ads datasets.
3) Forecast the user sessions/metrics from each platform using time series forecasting.
4) Lastly, build a BI dashboard that retrieves data from the flat file/ cloud storage to visualize an end-to-end marketing analytics suite.
Feel free to reach out to me if you need any help in understanding the fundamentals of GA automation in R. Hope this helps:)