Discussion Wanted: Truly anonymised telemetry for prowler OSS. #6389

Open
Wants to merge 1 commit into base: master

Conversation

metahertz
Contributor

@metahertz metahertz commented Jan 8, 2025

Context

This PR contains code showing how I'd like to collect anonymous usage data for prowler.
Metrics and telemetry are a sensitive topic, so I wanted to "show my working" with respect to privacy. I believe I've been careful to collect only non-sensitive data that can help improve the tool.

Please take a look and feel free to discuss either in this Pull Request or in our community slack at https://goto.prowler.com/slack 👍

In the PR's code, I've suggested collecting the following data.

Basic execution info:
  • Provider type (AWS/Azure/GCP/Kubernetes)
  • Number of checks executed
  • Execution duration
  • Prowler version
  • Operating system type (no specifics)
  • Python version (major/minor)
Aggregated result numbers:
  • Total number of findings by status (PASS/FAIL/ERROR)
  • Number of compliance frameworks used
  • Output formats used (csv/json/html etc.)
  • Whether features like quick inventory or fixer were used
Anonymous feature usage:
  • Which command line flags were used (without their values)
  • Categories/services scanned (counts only)
  • Number of custom checks used (integer, no details)
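The data listed above could be assembled along these lines. This is a minimal sketch, not the code in this PR: `flag_names`, `build_payload` and every field name are hypothetical, chosen only to show how flags can be recorded without their values and how findings reduce to counts.

```python
import platform
import sys
from collections import Counter


def flag_names(argv):
    """Return the option names found in argv, dropping every value."""
    names = set()
    for arg in argv:
        if arg == "--":
            break  # everything after "--" is positional, never a flag
        # Keep "-p" / "--region" style tokens, skip values such as "-1"
        if arg.startswith("-") and arg.lstrip("-")[:1].isalpha():
            # "--flag=value" becomes "--flag"; separate values are skipped
            names.add(arg.split("=", 1)[0])
    return sorted(names)


def build_payload(provider, prowler_version, checks_executed,
                  duration_seconds, finding_statuses, argv):
    """Build a payload holding only coarse, non-identifying values."""
    return {
        "provider": provider,
        "prowler_version": prowler_version,
        "checks_executed": checks_executed,
        "duration_seconds": round(duration_seconds),
        "os_type": platform.system(),  # e.g. "Linux", no release details
        "python_version": f"{sys.version_info.major}.{sys.version_info.minor}",
        "findings_by_status": dict(Counter(finding_statuses)),
        "flags_used": flag_names(argv),
    }
```

Values passed as separate arguments (a profile name or a region, for example) never reach the payload, because only tokens that look like option names are kept.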

On top of this, I'd like to collect a list of failed checks containing ONLY the check_id name. The reasoning for this is to:

  • Identify which checks are failing most frequently
  • Prioritize which checks need better documentation or fixes
  • Understand which security issues are most common
  • Guide the development of new features and improvements

To my mind, the check IDs are generic identifiers (like "iam_user_mfa_enabled") and don't contain any sensitive information about the actual resources or findings. Conversely, we do not collect custom check IDs, as these may be named more sensitively.
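One way to keep custom check IDs out of the payload is an allowlist of built-in IDs. The sketch below assumes findings are available as `(check_id, status)` pairs and that the set of built-in check IDs is known; both the function name and that data shape are hypothetical.

```python
def failed_builtin_check_ids(findings, builtin_check_ids):
    """Return the sorted, de-duplicated IDs of failed built-in checks.

    Any check ID that is not in builtin_check_ids is treated as custom
    and is never reported, whatever its status.
    """
    failed = {
        check_id
        for check_id, status in findings
        if status == "FAIL" and check_id in builtin_check_ids
    }
    return sorted(failed)
```

An allowlist fails safe: a custom check that is missing from the list is simply dropped rather than reported by accident.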

Automatic and manual disabling of telemetry

Continuing the "open, transparent, privacy-first" theme, the code is designed to sacrifice telemetry before anything else:

  • Performance: If the telemetry POST takes more than 2 seconds, it is abandoned.
  • Performance: The whole telemetry section is currently wrapped in a try/except/pass block, so any problem with telemetry collection fails silently rather than annoying the user.
  • Privacy: Higher-security environments usually block outbound internet access via default routes/gateways and only allow access via proxies. To that end, the code automatically disables telemetry if any environment variable relating to proxies is set.
  • A new global option, --no-telemetry, also disables telemetry.
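The four rules above can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the list of proxy variable names, the function names and the `url` parameter are all hypothetical. Note too that the `timeout` argument of `urllib.request.urlopen` bounds blocking socket operations rather than total wall-clock time.

```python
import json
import os
import urllib.request

# Assumed list of proxy-related variables, matched case-insensitively
PROXY_ENV_VARS = ("http_proxy", "https_proxy", "all_proxy", "no_proxy")


def telemetry_enabled(no_telemetry_flag, environ=None):
    """Return False if the user opted out or a proxy variable is set."""
    environ = os.environ if environ is None else environ
    if no_telemetry_flag:
        return False
    set_names = {name.lower() for name, value in environ.items() if value}
    return not any(var in set_names for var in PROXY_ENV_VARS)


def send_telemetry(payload, url, opener=urllib.request.urlopen, timeout=2):
    """POST the payload as JSON; return False on any failure, never raise."""
    try:
        request = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with opener(request, timeout=timeout):
            return True
    except Exception:
        # Telemetry must never interrupt or slow down the scan itself
        return False
```

Taking `environ` and `opener` as parameters keeps both functions testable without touching the real environment or the network.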

Description

For easier readability and discussion, the telemetry code currently lives in a function in prowler/prowler/main.py and is called at the end of the file, before the exit() block. It will be moved to a utils file before the PR is considered for merging.

Tests are being worked on, as is a new documentation page which will be added to this PR.

The "receiving end" of the telemetry will also be open source, so that users can (a) see the global trends of prowler for themselves and (b) confirm, from both the client and server sides of the codebase, that no other data is being collected.

Checklist

License

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@metahertz metahertz self-assigned this Jan 8, 2025
@metahertz metahertz requested review from a team as code owners January 8, 2025 01:19