Added full suite of tinybird datasources and pipes (#20882)
ref https://linear.app/tryghost/issue/ANAL-27/setup-tinybird-project-and-cicd
ref https://github.com/tinybirdco/web-analytics-starter-kit/blob/main/tinybird/pipes/analytics_sessions.pipe

- These datasources and pipes work together to define the main endpoints
we need for our stats dashboard
- They are based on the web analytics starter kit from Tinybird
- We've updated them to handle site_uuid
- The member-related and post-related data still needs to be piped
through the system
ErisDS authored Aug 29, 2024
1 parent f79f547 commit 08bf49e
Showing 16 changed files with 602 additions and 0 deletions.
18 changes: 18 additions & 0 deletions ghost/tinybird/datasources/analytics_pages_mv.datasource
@@ -0,0 +1,18 @@
SCHEMA >
    `site_uuid` String,
    `member_uuid` String,
    `member_status` String,
    `post_uuid` String,
    `date` Date,
    `device` String,
    `browser` String,
    `location` String,
    `pathname` String,
    `visits` AggregateFunction(uniq, String),
    `hits` AggregateFunction(count),
    `logged_in_hits` AggregateFunction(count),
    `logged_out_hits` AggregateFunction(count)

ENGINE AggregatingMergeTree
ENGINE_PARTITION_KEY toYYYYMM(date)
ENGINE_SORTING_KEY date, device, browser, location, pathname, member_uuid, member_status, post_uuid, site_uuid
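The `AggregateFunction` columns above store intermediate aggregate states rather than final numbers, so a reader has to merge them at query time (in ClickHouse, with `-Merge` combinators such as `uniqMerge` and `countMerge`). The Python sketch below only illustrates that idea: `PageState` is a hypothetical model, and it uses an exact set where ClickHouse's `uniq` keeps an approximate sketch.

```python
from dataclasses import dataclass, field


@dataclass
class PageState:
    """Hypothetical partial aggregate for one sorting-key value."""
    sessions: set = field(default_factory=set)  # stands in for the uniq state
    hits: int = 0                               # stands in for the count state

    def add(self, session_id: str) -> None:
        """Fold one hit into this partial state."""
        self.sessions.add(session_id)
        self.hits += 1

    def merge(self, other: "PageState") -> "PageState":
        """Combine two partial states into one, leaving both inputs untouched."""
        return PageState(self.sessions | other.sessions, self.hits + other.hits)


# Two separate inserts for the same key each produce their own partial state.
part_a = PageState()
for sid in ["s1", "s2", "s1"]:
    part_a.add(sid)

part_b = PageState()
for sid in ["s2", "s3"]:
    part_b.add(sid)

# Only after merging do the states become final numbers.
merged = part_a.merge(part_b)
visits = len(merged.sessions)  # distinct sessions across both parts
hits = merged.hits             # total hits across both parts
print(visits, hits)
```

Summing the per-part distinct counts would double count session `s2`, which is why the state, not the final value, is what gets stored.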
14 changes: 14 additions & 0 deletions ghost/tinybird/datasources/analytics_sessions_mv.datasource
@@ -0,0 +1,14 @@
SCHEMA >
    `site_uuid` String,
    `date` Date,
    `session_id` String,
    `device` SimpleAggregateFunction(any, String),
    `browser` SimpleAggregateFunction(any, String),
    `location` SimpleAggregateFunction(any, String),
    `first_hit` SimpleAggregateFunction(min, DateTime),
    `latest_hit` SimpleAggregateFunction(max, DateTime),
    `hits` AggregateFunction(count)

ENGINE AggregatingMergeTree
ENGINE_PARTITION_KEY toYYYYMM(date)
ENGINE_SORTING_KEY date, session_id, site_uuid
13 changes: 13 additions & 0 deletions ghost/tinybird/datasources/analytics_sources_mv.datasource
@@ -0,0 +1,13 @@
SCHEMA >
    `site_uuid` String,
    `date` Date,
    `device` String,
    `browser` String,
    `location` String,
    `referrer` String,
    `visits` AggregateFunction(uniq, String),
    `hits` AggregateFunction(count)

ENGINE AggregatingMergeTree
ENGINE_PARTITION_KEY toYYYYMM(date)
ENGINE_SORTING_KEY date, device, browser, location, referrer, site_uuid
15 changes: 15 additions & 0 deletions ghost/tinybird/datasources/fixtures/README.md
@@ -0,0 +1,15 @@
# Datasource fixtures

The file mockingbird-schema.json is a schema for generating fake data using the Mockingbird CLI.

The CLI is installed via npm:

```
npm install -g @tinybirdco/mockingbird
```

The command I'm currently using to generate the data is:

```
mockingbird-cli tinybird --schema ghost/tinybird/datasources/fixtures/mockingbird-schema.json --endpoint gcp_europe_west3 --token xxxx --datasource analytics_events --eps 50 --limit 5000
```
72 changes: 72 additions & 0 deletions ghost/tinybird/datasources/fixtures/mockingbird-schema.json
@@ -0,0 +1,72 @@
{
"timestamp": {
"type": "mockingbird.datetimeBetween",
"params": [
{
"start": "2024-07-01T00:00:00.000Z",
"end": "2024-08-20T12:00:00.000Z"
}
]
},
"session_id": {
"type": "string.uuid"
},
"action": {
"type": "mockingbird.pick",
"params": [
{
"values": [
"page_hit"
]
}
]
},
"version": {
"type": "mockingbird.pick",
"params": [
{
"values": [
"1"
]
}
]
},
"payload": {
"type": "mockingbird.pickWeighted",
"params": [
{
"values": [
"{\"site_uuid\":\"mock_site_uuid\", \"user-agent\":\"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\", \"locale\":\"en-US\", \"referrer\":\"https://www.kike.io\", \"pathname\":\"/coming-soon/\", \"href\":\"https://web-analytics.ghost.is/coming-soon/\"}",
"{\"site_uuid\":\"mock_site_uuid\", \"user-agent\":\"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/104.0.5112.79 Safari/537.36\", \"locale\":\"en-US\", \"location\":\"IT\", \"referrer\":\"https://www.hn.com\", \"pathname\":\"/about/\", \"href\":\"https://web-analytics.ghost.is/about/\"}",
"{\"site_uuid\":\"mock_site_uuid\", \"user-agent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:103.0) Gecko/20100101 Firefox/103.0\", \"locale\":\"en-GB\", \"location\":\"ES\", \"referrer\":\"\", \"pathname\":\"/\", \"href\":\"https://web-analytics.ghost.is\"}",
"{\"site_uuid\":\"mock_site_uuid\", \"user-agent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:103.0) Gecko/20100101 Firefox/103.0\", \"locale\":\"en-US\", \"location\":\"US\", \"referrer\":\"https://www.google.com\", \"pathname\":\"/\", \"href\":\"https://web-analytics.ghost.is\"}",
"{\"site_uuid\":\"mock_site_uuid\", \"user-agent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36\", \"locale\":\"en-US\", \"location\":\"US\", \"referrer\":\"https://web-analytics.ghost.is/\", \"pathname\":\"/coming-soon/\", \"href\":\"https://web-analytics.ghost.is/coming-soon/\"}",
"{\"site_uuid\":\"mock_site_uuid\", \"user-agent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36\", \"locale\":\"en-US\", \"location\":\"US\", \"referrer\":\"https://www.google.com\", \"pathname\":\"/hello-world/\", \"href\":\"https://web-analytics.ghost.is/hello-world/\"}",
"{\"site_uuid\":\"mock_site_uuid\", \"user-agent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36\", \"locale\":\"en-US\", \"location\":\"IL\", \"referrer\":\"https://www.google.com\", \"pathname\":\"/hello-world/\", \"href\":\"https://web-analytics.ghost.is/hello-world/\"}",
"{\"site_uuid\":\"mock_site_uuid\", \"user-agent\":\"Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1\", \"locale\":\"es-ES\", \"location\":\"ES\", \"referrer\":\"https://www.twitter.com\", \"pathname\":\"/\", \"href\":\"https://web-analytics.ghost.is/\"}",
"{\"site_uuid\":\"mock_site_uuid\", \"user-agent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36\", \"locale\":\"en-US\", \"location\":\"GB\", \"referrer\":\"https://www.facebook.com\", \"pathname\":\"/\", \"href\":\"https://web-analytics.ghost.is/\"}",
"{\"site_uuid\":\"mock_site_uuid\", \"user-agent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36\", \"locale\":\"en-US\", \"location\":\"CH\", \"referrer\":\"https://www.qq.ch\", \"pathname\":\"/coming-soon/\", \"href\":\"https://web-analytics.ghost.is/coming-soon/\"}",
"{\"site_uuid\":\"mock_site_uuid\", \"user-agent\":\"Mozilla/5.0 (Linux; Android 13) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.5249.118 Mobile Safari/537.36\", \"locale\":\"en-US\", \"location\":\"US\", \"referrer\":\"https://www.yandex.com\", \"pathname\":\"/about/\", \"href\":\"https://web-analytics.ghost.is/about/\"}",
"{\"site_uuid\":\"mock_site_uuid\", \"user-agent\":\"Mozilla/5.0 (Linux; Android 13; SM-A102U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.5249.118 Mobile Safari/537.36\", \"locale\":\"en-US\", \"location\":\"FR\", \"referrer\":\"https://www.github.com\", \"pathname\":\"/coming-soon/\", \"href\":\"https://web-analytics.ghost.is/coming-soon/\"}",

"{\"site_uuid\":\"fake_site_id\", \"user-agent\":\"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\", \"locale\":\"en-US\", \"referrer\":\"https://www.kike.io\", \"pathname\":\"/products/\", \"href\":\"https://fake-site.ghost.is/products/\"}",
"{\"site_uuid\":\"fake_site_id\", \"user-agent\":\"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/104.0.5112.79 Safari/537.36\", \"locale\":\"en-US\", \"location\":\"IT\", \"referrer\":\"https://www.hn.com\", \"pathname\":\"/blog/\", \"href\":\"https://fake-site.ghost.is/blog/\"}",
"{\"site_uuid\":\"fake_site_id\", \"user-agent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:103.0) Gecko/20100101 Firefox/103.0\", \"locale\":\"en-GB\", \"location\":\"ES\", \"referrer\":\"\", \"pathname\":\"/contact/\", \"href\":\"https://fake-site.ghost.is/contact/\"}",
"{\"site_uuid\":\"fake_site_id\", \"user-agent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:103.0) Gecko/20100101 Firefox/103.0\", \"locale\":\"en-US\", \"location\":\"US\", \"referrer\":\"https://www.google.com\", \"pathname\":\"/faq/\", \"href\":\"https://fake-site.ghost.is/faq/\"}",
"{\"site_uuid\":\"fake_site_id\", \"user-agent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36\", \"locale\":\"en-US\", \"location\":\"US\", \"referrer\":\"https://fake-site.ghost.is/\", \"pathname\":\"/services/\", \"href\":\"https://fake-site.ghost.is/services/\"}",
"{\"site_uuid\":\"fake_site_id\", \"user-agent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36\", \"locale\":\"en-US\", \"location\":\"US\", \"referrer\":\"https://www.google.com\", \"pathname\":\"/team/\", \"href\":\"https://fake-site.ghost.is/team/\"}",
"{\"site_uuid\":\"fake_site_id\", \"user-agent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36\", \"locale\":\"en-US\", \"location\":\"IL\", \"referrer\":\"https://www.google.com\", \"pathname\":\"/pricing/\", \"href\":\"https://fake-site.ghost.is/pricing/\"}",
"{\"site_uuid\":\"fake_site_id\", \"user-agent\":\"Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1\", \"locale\":\"es-ES\", \"location\":\"ES\", \"referrer\":\"https://www.twitter.com\", \"pathname\":\"/resources/\", \"href\":\"https://fake-site.ghost.is/resources/\"}",
"{\"site_uuid\":\"fake_site_id\", \"user-agent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36\", \"locale\":\"en-US\", \"location\":\"GB\", \"referrer\":\"https://www.facebook.com\", \"pathname\":\"/careers/\", \"href\":\"https://fake-site.ghost.is/careers/\"}",
"{\"site_uuid\":\"fake_site_id\", \"user-agent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36\", \"locale\":\"en-US\", \"location\":\"CH\", \"referrer\":\"https://www.qq.ch\", \"pathname\":\"/support/\", \"href\":\"https://fake-site.ghost.is/support/\"}",
"{\"site_uuid\":\"fake_site_id\", \"user-agent\":\"Mozilla/5.0 (Linux; Android 13) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.5249.118 Mobile Safari/537.36\", \"locale\":\"en-US\", \"location\":\"US\", \"referrer\":\"https://www.yandex.com\", \"pathname\":\"/partners/\", \"href\":\"https://fake-site.ghost.is/partners/\"}",
"{\"site_uuid\":\"fake_site_id\", \"user-agent\":\"Mozilla/5.0 (Linux; Android 13; SM-A102U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.5249.118 Mobile Safari/537.36\", \"locale\":\"en-US\", \"location\":\"FR\", \"referrer\":\"https://www.github.com\", \"pathname\":\"/events/\", \"href\":\"https://fake-site.ghost.is/events/\"}"
],
"weights": [
200, 300, 300, 300, 300, 300, 300, 300, 300, 300, 300, 400,
100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100
]
}
]
}
}
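The `payload` field pairs 24 values with 24 weights. Assuming `mockingbird.pickWeighted` matches them positionally and treats the weights as relative frequencies (an assumption about the tool, not something this commit states), the `mock_site_uuid` payloads would account for roughly three quarters of generated events (3600 of 4800). The sketch below shows that kind of weighted pick with Python's standard library; `pick_weighted` and the short payload names are hypothetical stand-ins.

```python
import random
from collections import Counter


def pick_weighted(values, weights, rng):
    """Return one value, chosen with probability proportional to its weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return rng.choices(values, weights=weights, k=1)[0]


# Hypothetical short stand-ins for the long payload strings in the schema.
values = ["payload_a", "payload_b", "payload_c"]
weights = [200, 300, 100]

rng = random.Random(42)  # seeded so repeated runs draw the same sample
sample = Counter(pick_weighted(values, weights, rng) for _ in range(6000))
print(sample.most_common())
```

The length check is the part worth keeping when editing the fixture by hand: adding a payload without a matching weight is an easy mistake in a list this long.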
71 changes: 71 additions & 0 deletions ghost/tinybird/pipes/analytics_hits.pipe
@@ -0,0 +1,71 @@
DESCRIPTION >
    Parsed `page_hit` events, implementing `browser` and `device` detection logic.

TOKEN "dashboard" READ

NODE parsed_hits
DESCRIPTION >
    Parse raw page_hit events

SQL >
    SELECT
        timestamp,
        action,
        version,
        coalesce(session_id, '0') as session_id,
        JSONExtractString(payload, 'locale') as locale,
        JSONExtractString(payload, 'location') as location,
        JSONExtractString(payload, 'referrer') as referrer,
        JSONExtractString(payload, 'pathname') as pathname,
        JSONExtractString(payload, 'href') as href,
        JSONExtractString(payload, 'site_uuid') as site_uuid,
        JSONExtractString(payload, 'member_uuid') as member_uuid,
        JSONExtractString(payload, 'member_status') as member_status,
        JSONExtractString(payload, 'post_uuid') as post_uuid,
        lower(JSONExtractString(payload, 'user-agent')) as user_agent
    FROM analytics_events
    where action = 'page_hit'

NODE endpoint
SQL >
    SELECT
        site_uuid,
        timestamp,
        action,
        version,
        session_id,
        case
            when member_uuid REGEXP '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
            then true
            else false
        END as logged_in,
        member_uuid,
        member_status,
        post_uuid,
        location,
        referrer,
        pathname,
        href,
        case
            when match(user_agent, 'wget|ahrefsbot|curl|urllib|bitdiscovery|\+https://|googlebot')
            then 'bot'
            when match(user_agent, 'android')
            then 'mobile-android'
            when match(user_agent, 'ipad|iphone|ipod')
            then 'mobile-ios'
            else 'desktop'
        END as device,
        case
            when match(user_agent, 'firefox')
            then 'firefox'
            when match(user_agent, 'chrome|crios')
            then 'chrome'
            when match(user_agent, 'opera')
            then 'opera'
            when match(user_agent, 'msie|trident')
            then 'ie'
            when match(user_agent, 'iphone|ipad|safari')
            then 'safari'
            else 'Unknown'
        END as browser
    FROM parsed_hits
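The `endpoint` node classifies each hit with ordered `case` branches, so the first matching pattern wins: a Googlebot user agent that also mentions Android is labelled `bot`, and Chrome is detected before Safari even though Chrome user agents contain both tokens. The Python sketch below mirrors that ordering using the patterns from the pipe. The function names are hypothetical, and Python's `re` is only an approximation of ClickHouse's regex engine.

```python
import re

# Same UUID shape the pipe uses to decide whether a hit is logged in.
UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)

# Ordered (pattern, label) pairs; the first match wins, as in a SQL CASE.
DEVICE_RULES = [
    (r"wget|ahrefsbot|curl|urllib|bitdiscovery|\+https://|googlebot", "bot"),
    (r"android", "mobile-android"),
    (r"ipad|iphone|ipod", "mobile-ios"),
]
BROWSER_RULES = [
    (r"firefox", "firefox"),
    (r"chrome|crios", "chrome"),
    (r"opera", "opera"),
    (r"msie|trident", "ie"),
    (r"iphone|ipad|safari", "safari"),
]


def first_match(rules, text, default):
    """Return the label of the first rule whose pattern occurs in text."""
    for pattern, label in rules:
        if re.search(pattern, text):
            return label
    return default


def classify(user_agent: str, member_uuid: str = "") -> dict:
    """Derive logged_in, device and browser for one hit."""
    ua = user_agent.lower()  # the pipe lowercases the user agent before matching
    return {
        "logged_in": bool(UUID_RE.match(member_uuid)),
        "device": first_match(DEVICE_RULES, ua, "desktop"),
        "browser": first_match(BROWSER_RULES, ua, "Unknown"),
    }


# One of the user agents from the fixture file.
iphone_ua = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1"
)
print(classify(iphone_ua))
```

Keeping the rules as ordered lists makes the precedence explicit, which is the part of the `case` expressions that is easiest to break when adding a new pattern.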
24 changes: 24 additions & 0 deletions ghost/tinybird/pipes/analytics_pages.pipe
@@ -0,0 +1,24 @@
NODE analytics_pages_1
DESCRIPTION >
    Aggregate by pathname and calculate session and hits

SQL >
    SELECT
        site_uuid,
        member_uuid,
        member_status,
        post_uuid,
        toDate(timestamp) AS date,
        device,
        browser,
        location,
        pathname,
        uniqState(session_id) AS visits,
        countState() AS hits,
        countStateIf(logged_in = true) AS logged_in_hits,
        countStateIf(logged_in = false) AS logged_out_hits
    FROM analytics_hits
    GROUP BY date, device, browser, location, pathname, member_uuid, member_status, post_uuid, site_uuid

TYPE MATERIALIZED
DATASOURCE analytics_pages_mv
20 changes: 20 additions & 0 deletions ghost/tinybird/pipes/analytics_sessions.pipe
@@ -0,0 +1,20 @@
NODE analytics_sessions_1
DESCRIPTION >
    Aggregate by session_id and calculate session metrics

SQL >
    SELECT
        site_uuid,
        toDate(timestamp) AS date,
        session_id,
        anySimpleState(device) AS device,
        anySimpleState(browser) AS browser,
        anySimpleState(location) AS location,
        minSimpleState(timestamp) AS first_hit,
        maxSimpleState(timestamp) AS latest_hit,
        countState() AS hits
    FROM analytics_hits
    GROUP BY date, session_id, site_uuid

TYPE MATERIALIZED
DATASOURCE analytics_sessions_mv
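The sessions pipe keeps one row per `(date, session_id, site_uuid)` with the earliest hit, the latest hit, and a hit count. Because `date` is part of the grouping, a session that crosses midnight ends up as two rows. The Python sketch below reproduces that grouping on a few made-up rows. The derived `duration_s` and `bounce` values are hypothetical examples of what a downstream query might compute from these columns; they are not defined in this commit.

```python
from datetime import datetime

# (site_uuid, session_id, timestamp): hypothetical sample hits
hits = [
    ("site-a", "s1", datetime(2024, 8, 1, 10, 0, 0)),
    ("site-a", "s1", datetime(2024, 8, 1, 10, 4, 30)),
    ("site-a", "s2", datetime(2024, 8, 1, 11, 0, 0)),
]

sessions = {}
for site_uuid, session_id, ts in hits:
    key = (ts.date(), session_id, site_uuid)  # same grouping key as the pipe
    s = sessions.setdefault(key, {"first_hit": ts, "latest_hit": ts, "hits": 0})
    s["first_hit"] = min(s["first_hit"], ts)    # like minSimpleState(timestamp)
    s["latest_hit"] = max(s["latest_hit"], ts)  # like maxSimpleState(timestamp)
    s["hits"] += 1                              # like countState()

# Hypothetical derived metrics built on top of the stored columns.
summary = {
    key: {
        "hits": s["hits"],
        "duration_s": (s["latest_hit"] - s["first_hit"]).total_seconds(),
        "bounce": s["hits"] == 1,
    }
    for key, s in sessions.items()
}
for key, row in summary.items():
    print(key, row)
```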
21 changes: 21 additions & 0 deletions ghost/tinybird/pipes/analytics_sources.pipe
@@ -0,0 +1,21 @@
NODE analytics_sources_1
DESCRIPTION >
    Aggregate by referral and calculate session and hits

SQL >
    WITH (SELECT domainWithoutWWW(href) FROM analytics_hits LIMIT 1) AS current_domain
    SELECT
        site_uuid,
        toDate(timestamp) AS date,
        device,
        browser,
        location,
        referrer,
        uniqState(session_id) AS visits,
        countState() AS hits
    FROM analytics_hits
    WHERE domainWithoutWWW(referrer) != current_domain
    GROUP BY date, device, browser, location, referrer, site_uuid

TYPE MATERIALIZED
DATASOURCE analytics_sources_mv
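The sources pipe drops hits whose referrer is the site itself, so only external referrers (and empty ones) are counted. The Python sketch below illustrates that filter on sample rows taken from the fixture. `domain_without_www` is a hypothetical approximation of ClickHouse's `domainWithoutWWW`. Note one deliberate difference: the sketch compares each hit's referrer with that hit's own `href`, whereas the pipe compares against a single `current_domain` read from one row, and with several sites in the same table those two approaches may not agree.

```python
from urllib.parse import urlsplit


def domain_without_www(url: str) -> str:
    """Return the host of url with a leading 'www.' removed, or '' if none."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def external_hits(hits):
    """Keep hits whose referrer domain differs from the hit's own domain."""
    return [
        h for h in hits
        if domain_without_www(h["referrer"]) != domain_without_www(h["href"])
    ]


sample = [
    {"referrer": "https://www.google.com", "href": "https://web-analytics.ghost.is"},
    {"referrer": "https://web-analytics.ghost.is/", "href": "https://web-analytics.ghost.is/coming-soon/"},
    {"referrer": "", "href": "https://web-analytics.ghost.is"},
]
kept = external_hits(sample)
print([h["referrer"] for h in kept])
```

The empty referrer survives the filter because its domain is the empty string, which never equals the site's own domain.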