-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSV exports (no UI) #3836
CSV exports (no UI) #3836
Conversation
lib/plausible/exports.ex
Outdated
|
||
import Ecto.Query | ||
|
||
# TODO time_on_page |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the last metrics that needs work.
e421301
to
86dc8ed
Compare
@@ -0,0 +1,57 @@ | |||
defmodule Plausible.Workers.ExportCSV do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to open a new PR which would add a UI action on the exports button that starts this job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -11,7 +11,7 @@ | |||
"cachex": {:hex, :cachex, "3.6.0", "14a1bfbeee060dd9bec25a5b6f4e4691e3670ebda28c8ba2884b12fe30b36bf8", [:mix], [{:eternal, "~> 1.2", [hex: :eternal, repo: "hexpm", optional: false]}, {:jumper, "~> 1.0", [hex: :jumper, repo: "hexpm", optional: false]}, {:sleeplocks, "~> 1.1", [hex: :sleeplocks, repo: "hexpm", optional: false]}, {:unsafe, "~> 1.0", [hex: :unsafe, repo: "hexpm", optional: false]}], "hexpm", "ebf24e373883bc8e0c8d894a63bbe102ae13d918f790121f5cfe6e485cc8e2e2"}, | |||
"castore": {:hex, :castore, "1.0.5", "9eeebb394cc9a0f3ae56b813459f990abb0a3dedee1be6b27fdb50301930502f", [:mix], [], "hexpm", "8d7c597c3e4a64c395980882d4bca3cebb8d74197c590dc272cfd3b6a6310578"}, | |||
"certifi": {:hex, :certifi, "2.12.0", "2d1cca2ec95f59643862af91f001478c9863c2ac9cb6e2f89780bfd8de987329", [:rebar3], [], "hexpm", "ee68d85df22e554040cdb4be100f33873ac6051387baf6a8f6ce82272340ff1c"}, | |||
"ch": {:hex, :ch, "0.2.4", "d510fbb5542d009f7c5b00bb1ecab73307b6066d9fb9b220600257d462cba67f", [:mix], [{:db_connection, "~> 2.0", [hex: :db_connection, repo: "hexpm", optional: false]}, {:decimal, "~> 2.0", [hex: :decimal, repo: "hexpm", optional: false]}, {:ecto, "~> 3.5", [hex: :ecto, repo: "hexpm", optional: true]}, {:jason, "~> 1.0", [hex: :jason, repo: "hexpm", optional: false]}, {:mint, "~> 1.0", [hex: :mint, repo: "hexpm", optional: false]}], "hexpm", "8f065d15aaf912ae8da56c9ca5298fb2d1a09108d006de589bcf8c2b39a7e2bb"}, | |||
"ch": {:hex, :ch, "0.2.5", "b8d70689951bd14c8c8791dc72cdc957ba489ceae723e79cf1a91d95b6b855ae", [:mix], [{:db_connection, "~> 2.0", [hex: :db_connection, repo: "hexpm", optional: false]}, {:decimal, "~> 2.0", [hex: :decimal, repo: "hexpm", optional: false]}, {:ecto, "~> 3.5", [hex: :ecto, repo: "hexpm", optional: true]}, {:jason, "~> 1.0", [hex: :jason, repo: "hexpm", optional: false]}, {:mint, "~> 1.0", [hex: :mint, repo: "hexpm", optional: false]}], "hexpm", "97de104c8f513a23c6d673da37741f68ae743f6cdb654b96a728d382e2fba4de"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This version improves streams in Ch.
fragment("toUInt64(round(countIf(?='pageview')*any(_sample_factor)))", e.name), | ||
:pageviews | ||
), | ||
# NOTE: are exits pageviews or any events? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#3870 might be relevant here. Currently, exit_page is set on all events, not only pageviews but that seems to be incorrect and leads to wrong calculation of other derivative metrics under some circumstances. We'll most likely switch to recording exit_pages only on events that are strictly pageviews.
bounces: bounces(s), | ||
visits: visits(s), | ||
visit_duration: visit_duration(s) | ||
# NOTE: can we use just sessions_v2 table in this query? sum(pageviews) and visitors(s)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While investigating issue that led to #3870, we have found cases where session.pageviews
is inconsistent with actual related events - session had 0
for for pageviews
but there was a pageview
event associated with it, along with numerous other custom events. That might have been some one-off issue though as session updates don't do anything fancy here
analytics/lib/plausible/session/cache_store.ex
Lines 53 to 55 in 83a46fb
pageviews: | |
if(event.name == "pageview", do: session.pageviews + 1, else: session.pageviews), | |
events: session.events + 1 |
lib/plausible/exports.ex
Outdated
visitors_sessions_q = | ||
from s in "sessions_v2", | ||
# NOTE: no smapling is used right now | ||
# hints: ["SAMPLE", unsafe_fragment(^@sample)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is sampling skipped because of visit duration calculation or just for better accuracy of exports in general?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first it was to get better accuracy... I'll re-enable it for full builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from e in "events_v2", | ||
# NOTE: no smapling is used right now | ||
# hints: ["SAMPLE", unsafe_fragment(^@sample)], | ||
where: e.site_id == ^site_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering it that shouldn't be limited to pageview events 🤔 If there's a custom event setup that for instance records custom events like "scrolled to the bottom", they might skew the time on page metric calculated like this. That's probably something that should be considered in the actual calculation logic at
analytics/lib/plausible/stats/breakdown.ex
Line 340 in 83a46fb
defp window_breakdown_time_on_page(site, query, pages) do |
) | ||
|> Plausible.S3.export_upload_multipart(s3_path) | ||
end, | ||
timeout: :infinity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, we will probably want some upper time limit on this 😅
# 5 MiB is the smallest chunk size AWS S3 supports | ||
|> chunk_into_parts(5 * 1024 * 1024) | ||
|> ExAws.S3.upload(bucket, s3_path, | ||
content_disposition: ~s|attachment; filename="Plausible.zip"|, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, it probably would be good to make the downloaded filename a bit more unique - we could at least embed current timestamp in it (ideally, a sanitized domain name the export is for would be nice too). But that's something to flesh out later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made some attempts at a better filename in #3921
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance to add at least one general integration test for the import routine? There was a helper function for running import/export cycle which could potentially be turned into one, I recall? Other than this, IMHO we could go ahead with this and work on further refinements in follow-ups as it's going to be hidden either way.
I'm going to add some tests next. |
5537647
to
2af250e
Compare
@@ -401,6 +397,157 @@ defmodule Plausible.Imported.CSVImporterTest do | |||
end | |||
end | |||
|
|||
describe "export -> import" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes
This PR adds CSV exports logic without any web or UI changes.
Tests
Changelog
Documentation
Dark mode