Find a permanent solution for analytics #10

Closed
Rikthepixel opened this issue Mar 27, 2024 · 23 comments · Fixed by #14
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@Rikthepixel
Collaborator

Rikthepixel commented Mar 27, 2024

Adblockers like uBlock that use EasyList block our self-hosted Plausible analytics. The (publicly available) analytics help us know how much effort should be put into the website as a whole.

For now a workaround has been used by naming the analytics domain plausible.thenewoil.org, but this will likely be blocked later on. A reliable way to do analytics that can't be blocked (and in our case must respect privacy) is server-side analytics. A few options are on the table:

  1. Caddy metrics: these are server-side and can be made public. I don't know how well Caddy metrics respect privacy or whether they conform to the GDPR, so that will have to be looked into.
  2. Manually calling the Plausible API. Somehow the server (Caddy) could call the Plausible API to register a page view. Idk if this is possible.
  3. (Last resort 💀) Automate a way to change the analytics domain. If all else fails, this is a (janky) last resort: add a hash in front of the domain, like aejklg.thenewoil.org. It could be automated, but that would be hard and janky. I advise against it, but it is technically possible.
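Option 2 is more feasible than it might sound, because Plausible documents an Events API that accepts page views submitted by a server. Below is a minimal sketch of the request a server-side hook would make, assuming a self-hosted instance at the hypothetical address `https://plausible.example.org`. A real Caddy module would do this in Go; Python is used here only to show the shape of the call. The helper name `build_pageview_request` is illustrative, and the request is built but not sent.

```python
import json
import urllib.request


def build_pageview_request(base_url, domain, page_url, user_agent, client_ip, referrer=None):
    """Build (but do not send) a POST to Plausible's Events API for one page view.

    base_url is the root of a Plausible instance (hypothetical in the example).
    """
    body = {"name": "pageview", "domain": domain, "url": page_url}
    if referrer:
        body["referrer"] = referrer
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/event",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Plausible uses these two headers to count unique visitors,
            # so they must describe the visitor, not the server.
            "User-Agent": user_agent,
            "X-Forwarded-For": client_ip,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_pageview_request(
    "https://plausible.example.org",
    "thenewoil.org",
    "https://thenewoil.org/",
    "Mozilla/5.0 (X11; Linux x86_64)",
    "203.0.113.7",
)
# urllib.request.urlopen(req) would actually submit the event.
print(req.full_url, req.get_method())
```

The main design point is that `User-Agent` and `X-Forwarded-For` carry the visitor's values rather than the server's; otherwise every page view would look like it came from one visitor.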
@Rikthepixel Rikthepixel added bug Something isn't working help wanted Extra attention is needed labels Mar 27, 2024
@jonaharagon
Contributor

jonaharagon commented Mar 27, 2024

Somehow the server (Caddy) could call the Plausible API to register a page-view. Idk if this is possible.

I was literally just looking into this, because I want this functionality for my own personal reasons as well, so if I find/make a solution I will share.

Useful resources on this topic:

@tnonate
Owner

tnonate commented Mar 27, 2024

So Jonah was the one who suggested the first one to me, and based on that, here seems to be the basic order of operations:

  1. Enable metrics in the Caddyfile. The ideal recommendation is this one.
  2. Download and configure Prometheus (following the documentation provided by Caddy, I think? It seems straightforward enough)
  3. Connect Grafana to Prometheus as a front-end to easily and publicly view analytics.

Regarding privacy, I think this is a sample of what it can do: https://caddyserver.com/docs/metrics#caddys-metrics

@jonaharagon
Contributor

If you go the Grafana route you'll end up with a dashboard like this: https://grafana.com/grafana/dashboards/13460-caddy/ (this is a pre-made dashboard for Caddy, but you could also make your own with any Prometheus queries you want)

@tnonate
Owner

tnonate commented Mar 27, 2024

That doesn't seem as obvious and user-friendly to a non-techy. Even I'm kind of struggling to figure out exactly what I'm looking at.

I'm not criticizing you, and I appreciate the input; I'm more just thinking aloud that this seems like the less desirable option for my use case.

@jonaharagon
Contributor

Yeah, that is kind of why I thought you should see that beforehand. I guess the main issue with option 1 in this issue is that Caddy metrics really only capture numerical values (e.g. number of requests, how long they take, etc.) and not things like IP address, web browser, country, etc. that a typical web analytics product might collect.

So in the example above, the only graph which is probably relevant to you is the first one, which may or may not be enough data for your use-case.
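To make "only numerical values" concrete: Prometheus scrapes Caddy's metrics endpoint as plain-text lines of the form `name{labels} value`, and a request-count panel is essentially a sum over one counter. The sketch below does that sum by hand. The metric name `caddy_http_requests_total` follows Caddy's metrics documentation, though names and labels may differ between Caddy versions; the sample lines are made up for illustration, and the parser is deliberately simplified (it assumes label values contain no `}` characters).

```python
def sum_counter(exposition: str, metric: str) -> float:
    """Sum every sample of `metric` in Prometheus text exposition format."""
    total = 0.0
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :]
        else:
            name, rest = line.split(None, 1)
        if name == metric:
            # rest is "value" or "value timestamp"; keep only the value.
            total += float(rest.split()[0])
    return total


# Hypothetical scrape output; real numbers come from Caddy's /metrics endpoint.
sample = """\
# TYPE caddy_http_requests_total counter
caddy_http_requests_total{handler="file_server",server="srv0"} 1520
caddy_http_requests_total{handler="reverse_proxy",server="srv0"} 480
caddy_http_requests_in_flight{handler="file_server",server="srv0"} 3
"""

print(sum_counter(sample, "caddy_http_requests_total"))
```

Nothing in these lines identifies a page, browser, or country, which is why the pre-made dashboard mostly shows request counts and timings.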

@tnonate
Owner

tnonate commented Mar 27, 2024

I appreciate the heads-up. For the use case I have in mind, raw numbers are probably enough, assuming they're easy enough to make sense of, but certainly not ideal. Plausible is more than ideal, but we may have to face some serious questions about long-term sustainability that may force us into other less-than-ideal options like non-public server-side stats.

@jonaharagon
Contributor

jonaharagon commented Mar 27, 2024

Also, having spent an hour looking at Safing's solution and at Caddy's module code, I think that option 2 is 100% possible, but unfortunately I don't think I'm good enough at Go to create a Caddy middleware module that does this (and, mainly, it isn't enough of a priority for my personal projects for me to spend any more time on it myself).

But, if you could find or hire someone who does know Go, I think that adapting Safing's existing Go code to Caddy or just creating a Plausible middleware module for Caddy from scratch should be a pretty trivial task for them.

In that case, you'd basically have server-side analytics which feed directly into the Plausible dashboard, which I think we both agree is a very ideal interface for public stat sharing :)
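The middleware itself is small in shape: wrap the site's request handler, decide which requests are real page views, and hand those off to the analytics backend. Caddy modules are written in Go against Caddy's own module API, which this sketch does not attempt to reproduce; instead it shows the same idea as a Python WSGI middleware. `PageviewMiddleware`, the asset-suffix list, and the `record` callback are all hypothetical; in practice `record` would queue an event for Plausible's Events API so visitors never wait on analytics.

```python
from wsgiref.util import request_uri

# File types treated as assets rather than pages (hypothetical list).
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".svg", ".ico", ".woff2", ".xml")


class PageviewMiddleware:
    """Wrap a WSGI app and report each page view to `record(url, environ)`."""

    def __init__(self, app, record):
        self.app = app
        self.record = record

    def is_pageview(self, environ):
        # Only plain GETs of non-asset paths count as page views here.
        path = environ.get("PATH_INFO", "/").lower()
        return environ.get("REQUEST_METHOD") == "GET" and not path.endswith(ASSET_SUFFIXES)

    def __call__(self, environ, start_response):
        if self.is_pageview(environ):
            # request_uri rebuilds the full URL from Host, path and query string.
            self.record(request_uri(environ), environ)
        return self.app(environ, start_response)
```

Filtering on the request alone keeps the sketch simple; a fuller version might also skip error responses or known bots before recording.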

@tnonate
Owner

tnonate commented Mar 27, 2024

As an update, it seems like changing the domain name hasn't really fixed the issue: if it had, the numbers should be skyrocketing by comparison. I wonder if people are using other, more effective adblockers or something.

Server-side seems like an unavoidable solution in the long run.

@Rikthepixel
Collaborator Author

Rikthepixel commented Mar 28, 2024

So the options right now are:

  1. Custom Go Caddy middleware: ideal, but would take time
  2. Enabling Caddy metrics and publishing them via Grafana: less user-friendly, but accurate

I think the fix for now could be the second one plus the (unreliable) client-side Plausible statistics, until the first one is available.

To me it seems doable to make the middleware, but it will take time on my end (learning a new language, learning Caddy modules, etc.).

@tnonate
Copy link
Owner

tnonate commented Mar 28, 2024

I'm not opposed to hiring a developer on a short-term basis to build this, so long as they can also provide support (initial troubleshooting + maybe a few years of security updates if needed) and, of course, assuming I can afford their rate. I'd expect this to be a long-term investment.

@jonaharagon
Copy link
Contributor

Welp. I ended up coding a solution to this for privacyguides.org on my own after all: https://github.com/jonaharagon/caddy-umami

The only problem (for you) is that I ended up going with Umami instead of Plausible, mostly because it's available from Pikapods and partly because it is rather nice looking. So if you folks decide to switch to Umami, I can write up a guide on how to get it installed quickly on your site too. Otherwise, I'm very confident you'll be able to figure out your own custom solution; it was relatively easy to do.

Umami Demo: https://stats.privacyguides.net/share/nVWjyd2QfgOPBhMF/www.privacyguides.org

@tnonate
Copy link
Owner

tnonate commented Apr 2, 2024

Yeah those numbers look pretty accurate for PG lol

Is there any reason I may not want to use Umami? I mean, it looks nearly identical (it actually collects even more info, but still nothing I would consider overly invasive), it can also be self-hosted, etc. Are there any drawbacks apart from less brand recognition compared to Plausible?

@jonaharagon
Copy link
Contributor

Nothing comes to mind for me. It might collect more data than Plausible if you're using the JavaScript code (just screen resolution?), but with the server-side approach we're taking, the server is limited in what data it can collect (the server can't ask the browser what its screen resolution is), so the data collected in either case is identical.
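The reason the data ends up identical is that a server only ever receives the request line and headers. A sketch in WSGI terms of everything a server-side hook can draw on; the helper name `server_visible_fields` and the sample request are hypothetical. Browser and OS details are parsed out of the User-Agent string, while screen size is simply not part of the request.

```python
def server_visible_fields(environ):
    """Collect what a server-side analytics hook can learn from one request.

    Everything here comes from the HTTP request itself. Screen resolution
    lives only in the browser, so no server-side hook can report it.
    """
    return {
        "path": environ.get("PATH_INFO", "/"),
        "user_agent": environ.get("HTTP_USER_AGENT", ""),  # browser/OS are parsed from this
        "referrer": environ.get("HTTP_REFERER", ""),
        "language": environ.get("HTTP_ACCEPT_LANGUAGE", ""),
        # Address of the direct peer; behind a reverse proxy this is the proxy itself.
        "remote_addr": environ.get("REMOTE_ADDR", ""),
    }


request = {
    "PATH_INFO": "/blog/",
    "HTTP_USER_AGENT": "Mozilla/5.0 (X11; CrOS x86_64 14541.0.0)",
    "HTTP_REFERER": "https://search.example/",
    "REMOTE_ADDR": "172.18.0.1",
}
print(server_visible_fields(request))
```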

@tnonate
Copy link
Owner

tnonate commented Apr 3, 2024

Well, I say it seems to collect more data because it shows, for example, the OS (down to the version), but like I said, I don't think that's fingerprinting users or tracking them from site to site. So it's really not a big deal. More of a note than a complaint or anything.

Yeah, this seems like a great solution. Thanks for your work and for sharing. I'll look into how to implement this.

@jonaharagon
Copy link
Contributor

methinks you may not have explored Plausible enough 😅

[Two screenshots from Plausible]

@Rikthepixel
Collaborator Author

Damn Chrome OS, sounds accurate and totally not like a spoofed user agent 😂

@tnonate
Copy link
Owner

tnonate commented Apr 5, 2024

methinks you may not have explored Plausible enough 😅

Methinks you are correct lmao

@tnonate
Owner

tnonate commented Apr 5, 2024

So for actually deploying this, I assume I use this documentation to install the public-facing side (e.g. "stats.thenewoil.org") and then what?

@jonaharagon
Contributor

In the "Configure Umami" step of the install, you will need to add a line like this at the bottom of the .env file it has you create:

CLIENT_IP_HEADER=X-Forwarded-For
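`CLIENT_IP_HEADER` tells Umami which request header holds the visitor's address. That matters because, behind a reverse proxy like Caddy, every request reaches Umami from the proxy itself. An X-Forwarded-For value can hold a comma-separated chain of addresses (client first, then each proxy). A minimal sketch of how such a value is typically interpreted; the function name is hypothetical and this is not Umami's own code.

```python
import ipaddress


def client_ip_from_xff(header_value, fallback):
    """Return the leftmost valid IP in an X-Forwarded-For value, else `fallback`."""
    for candidate in header_value.split(","):
        candidate = candidate.strip()
        try:
            return str(ipaddress.ip_address(candidate))
        except ValueError:
            continue  # skip empty or malformed entries
    return fallback


# Visitor 203.0.113.7 reached the app through a proxy at 172.18.0.1.
print(client_ip_from_xff("203.0.113.7, 172.18.0.1", "172.18.0.1"))
```

The leftmost entry is only trustworthy if the proxy in front sets or sanitizes the header rather than passing a client-supplied value through unchanged, so it is worth checking how the proxy handles incoming X-Forwarded-For headers.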

@tnonate
Copy link
Owner

tnonate commented Apr 6, 2024

I never expected to see the day when Jonah would be recommending a bare metal install instead of docker :P

@jonaharagon
Copy link
Contributor

I mean, I would definitely use Docker if you're up for it 😉

@tnonate
Copy link
Owner

tnonate commented Apr 7, 2024

I'm down for whatever. It seemed to work fine with Plausible, but I'm not sure how to edit env variables. Rik sent me a config file on Signal that seemed to work until I tried to bring in Caddy for TLS, and then I started getting issues.

@jonaharagon
Copy link
Contributor

Not sure why it'd be different from Plausible, what's the config?

3 participants