Montu Logo

Summary of Analytics Engineer Challenge

  • You will create dbt models that run on the GA4 public dataset.
  • The data modelling is done in dbt (the initial project setup is already complete), but a few steps are required to get it running in your local environment.
  • The data you will be using is described below and is based on raw clickstream events.

Dataset

Obfuscated Google Analytics 4 data emulating a web ecommerce implementation of Google Analytics. It's a great way to explore business data, experiment, and learn the benefits of analyzing Google Analytics data in BigQuery.

column_name data_type description
event_date STRING The date of the event
event_timestamp INT64 The timestamp of the event
event_name STRING The name of the event
event_params ARRAY<STRUCT<key STRING, value STRUCT<string_value STRING, int_value INT64, float_value FLOAT64, double_value FLOAT64>>> The parameters associated with the event
event_previous_timestamp INT64 The timestamp of the previous event
event_value_in_usd FLOAT64 The value of the event in USD
event_bundle_sequence_id INT64 The sequence ID of the event bundle
event_server_timestamp_offset INT64 The offset of the server timestamp
user_id STRING The ID of the user
user_pseudo_id STRING The pseudonymous ID of the user
privacy_info STRUCT<analytics_storage INT64, ads_storage INT64, uses_transient_token STRING> Privacy-related information
user_properties ARRAY<STRUCT<key STRING, value STRUCT<string_value STRING, int_value INT64, float_value FLOAT64, double_value FLOAT64, set_timestamp_micros INT64>>> Properties associated with the user
user_first_touch_timestamp INT64 The timestamp of the user's first touch
user_ltv STRUCT<revenue FLOAT64, currency STRING> The lifetime value of the user
device STRUCT<category STRING, mobile_brand_name STRING, mobile_model_name STRING, mobile_marketing_name STRING, mobile_os_hardware_model INT64, operating_system STRING, operating_system_version STRING, vendor_id INT64, advertising_id INT64, language STRING, is_limited_ad_tracking STRING, time_zone_offset_seconds INT64, web_info STRUCT<browser STRING, browser_version STRING>> Information about the device used
geo STRUCT<continent STRING, sub_continent STRING, country STRING, region STRING, city STRING, metro STRING> Geographic information
app_info STRUCT<id STRING, version STRING, install_store STRING, firebase_app_id STRING, install_source STRING> Information about the app
traffic_source STRUCT<medium STRING, name STRING, source STRING> Information about the traffic source
stream_id INT64 The ID of the stream
platform STRING The platform of the event
event_dimensions STRUCT Dimensions of the event
ecommerce STRUCT<total_item_quantity INT64, purchase_revenue_in_usd FLOAT64, purchase_revenue FLOAT64, refund_value_in_usd FLOAT64, refund_value FLOAT64, shipping_value_in_usd FLOAT64, shipping_value FLOAT64, tax_value_in_usd FLOAT64, tax_value FLOAT64, unique_items INT64, transaction_id STRING> E-commerce related information
items ARRAY<STRUCT<item_id STRING, item_name STRING, item_brand STRING, item_variant STRING, item_category STRING, item_category2 STRING, item_category3 STRING, item_category4 STRING, item_category5 STRING, price_in_usd FLOAT64, price FLOAT64, quantity INT64, item_revenue_in_usd FLOAT64, item_revenue FLOAT64>> Item-related information, including ID, brand, and other attributes
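
Note that the nested columns above (such as event_params) are ARRAYs of STRUCTs, so they need to be unnested before their values can be filtered or joined on. As a rough, illustrative sketch only (the table path points at the public bigquery-public-data.ga4_obfuscated_sample_ecommerce dataset, and the 20210131 date shard and page_location parameter are arbitrary choices, not part of the challenge), a query pulling one parameter out of event_params might look like:

-- Illustrative only: flatten event_params to extract page_location for purchase events
SELECT
  event_date,
  event_timestamp,
  event_name,
  (SELECT ep.value.string_value
     FROM UNNEST(event_params) AS ep
    WHERE ep.key = 'page_location') AS page_location,
  ecommerce.purchase_revenue_in_usd
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
WHERE event_name = 'purchase'
LIMIT 100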

Local Setup

GCP project and service account

Set up your local virtual Python environment for dbt:

Run the following commands in your terminal (install Anaconda Navigator first).

Step 1 (create the virtual environment):

conda create --name dbt python=3.10.11

Step 2 (activate the python environment):

conda activate dbt

Step 3 (Install dbt dependencies):

pip install -r requirements.txt

Step 4 (Navigate to the dbt project folder):

cd test

Challenge

Notes before you start

  • Access the sample data in the public BigQuery dataset ga4_obfuscated_sample_ecommerce; this will be the source data for this challenge (see the sketch after this list for one way of referencing it).
  • Make sure that your dbt environment can access BigQuery. Run dbt debug to check.
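
As a rough illustration of how the source could be wired into dbt (the source name ga4, the model name stg_ga4__events, and the column selection are placeholder choices for this sketch, not requirements of the challenge), a staging model might look like:

-- models/staging/stg_ga4__events.sql  (hypothetical file name)
-- Assumes a 'ga4' source has been declared in a sources .yml file with
-- database bigquery-public-data, schema ga4_obfuscated_sample_ecommerce,
-- and a table whose identifier is the sharded wildcard "events_*".
select
    parse_date('%Y%m%d', event_date)   as event_date,
    timestamp_micros(event_timestamp)  as event_ts,
    event_name,
    user_pseudo_id,
    ecommerce.purchase_revenue_in_usd  as purchase_revenue_usd
from {{ source('ga4', 'events') }}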

Create models

The details of the requirements will be shared separately.
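
For orientation only (the shared requirements will define the actual models), a downstream model built on a staging layer like the sketch above could, for example, aggregate daily purchase revenue:

-- models/marts/fct_daily_revenue.sql  (hypothetical example, not a requirement)
-- Counts purchase events and sums revenue per day from the staging sketch above.
select
    event_date,
    count(*)                   as purchase_events,
    sum(purchase_revenue_usd)  as purchase_revenue_usd
from {{ ref('stg_ga4__events') }}
where event_name = 'purchase'
group by event_date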