Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfc(decision): Document sensitive data collected #70

Merged
merged 8 commits into from
Feb 10, 2023
3 changes: 2 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,9 +27,10 @@ This repository contains RFCs and DACIs. Lost?
- [0042-gocd-succeeds-freight-as-our-cd-solution](text/0042-gocd-succeeds-freight-as-our-cd-solution.md): Plan to replace freight with GoCD
- [0043-instruction-addr-adjustment](text/0043-instruction-addr-adjustment.md): new StackTrace Protocol field that controls adjustment of the `instruction_addr` for symbolication
- [0044-heartbeat](text/0044-heartbeat.md): Heartbeat monitoring
- [0046-ttfd-automatic-transaction-span](text/0046-ttfd-automatic-transaction-span.md): Provide a new `time-to-full-display` span to the automatic UI transactions
- [0046-ttfd-automatic-transaction-span](text/0046-ttfd-automatic-transaction-span.md): Provide a new `time-to-full-display` span to the automatic UI transactions
- [0047-introduce-profile-context](text/0047-introduce-profile-context.md): Add Profile Context
- [0048-move-replayid-out-of-tags](text/0048-move-replayid-out-of-tags.md): Plan to replace freight with GoCD
- [0062-controlling-pii-and-credentials-in-sd-ks](text/0062-controlling-pii-and-credentials-in-sd-ks.md): Controlling PII and Credentials in SDKs
- [0063-sdk-crash-monitoring](text/0063-sdk-crash-monitoring.md): SDK Crash Monitoring
- [0070-document-sensitive-data-collected](text/0070-document-sensitive-data-collected.md): Document sensitive data collected
- [0071-continue-trace-over-process-boundaries](text/0071-continue-trace-over-process-boundaries.md): Continue trace over process boundaries
87 changes: 87 additions & 0 deletions text/0070-document-sensitive-data-collected.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
- Start Date: 2023-01-27
- RFC Type: decision
- RFC PR: https://github.com/getsentry/rfcs/pull/70
- RFC Status: approved

# Summary

We need an exact but concise documentation on what sensitive data our SDKs collect. This should be available in the SDKs documentation on docs.sentry.io and be specific to all the integrations that each SDK supports.

This RFC is related to [RFC-0062 Controlling PII and Credentials in SDKs](https://github.com/getsentry/rfcs/pull/62).

# Motivation

We collect a lot of data, and transparency creates trust. This documentation will make it easier for customers to choose Sentry because they know that their users data is in good hands.
It will also make it easier for our customers to be GDPR compliant. Companies that deal with data related to persons in the european union need to create a record of what data they are processing.
This documentation will make our customers lifes way easier while creating these records.
This will probably be a big selling point for larger customers.

# Background

After a data incident and a meeting with legal, we said that we need to take data issues to the next level.

# Decision

We will start with implementing **Option A)**.

# Options Considered

## A) Table in docs of each integration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like A a lot - also for the reason that we could use this to play around with the idea of keeping a "feature list" of our SDKs documented somewhere, so people can compare which SDKs have what features.

B seems like a lot of effort which may not be worth it at the current time. Perhaps we can re-evaluate when we have more docs personnel?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like it'll be hard to get a nice B without a properly implemented A, so IMO option A is a good start, and then we might consider implement B for languages/frameworks that have relevant tooling and conventions (versus starting to build a one-fits-all solutions and then regretting it).


Have a hand written (and maintained) table in the description that shows people in an easy to grasp way what data is collected. It also shows how the data collection is changed when certain options (like `sendDefaultPII`) are changed.

Here a example on how this could look like. (After talking with our designer Jesse, having two tables makes it way easier to ingest the information)

For issues:
https://sentry-docs-git-antonpirker-python-fastapi-sensitive-data.sentry.dev/platforms/python/guides/fastapi/#data-collected-by-issue-reporting

And for performance:
https://sentry-docs-git-antonpirker-python-fastapi-sensitive-data.sentry.dev/platforms/python/guides/fastapi/#data-collected-measuring-performance

The elements in the table can be different for different kinds (frontend, backend, mobile) of SDKS.

Here a list of all sensitive data that is collected:

- HTTP Headers (`event.request.headers`)
- HTTP Cookies (`event.request.cookies`)
- HTTP Request Body (`event.request.data`)
- Log Entry Params (`event.logentry.params`)
- Logged in User (`event.user`)
- Users IP address (`event.user`)
- Breadcrumb Values (`event.breadcrumbs.values -> value.data`)
- Local vars in Exceptions (`event.exception.values -> value.stacktrace.frames -> frame.vars`)
- Span Data (`event.spans -> span.data`)
- ... more to be defined ...

Pros:

- Easy understandable and nice to read documentation

Cons:

- Documentation need to be kept up to date with seperate PR in `sentry-docs` repo when changes to SDK are made
- Documentation for different versions of the SDK not solved yet

## B) Automatic documentation creation
Copy link
Member

@philipphofmann philipphofmann Jan 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to work on such a project, not just for this specific case. Our docs often need to be updated when we add new options / features to our SDKs. Ideally, they would be linked, so they are always up to date, but I think that's a complex problem to solve. The book Living Documentation has great tips on automatic doc creation.


If we go with _Option B)_ in [RFC-0062 Controlling PII and Credentials in SDKs](https://github.com/getsentry/rfcs/pull/62) we could add doc strings in the code of the implemented `EventScrubber` and then generate documentation from this code to render a table similar to the one in Option A) in this RFC.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it have to live in docs or can it simply be the code itself documenting what we consider sensitive data? Docs generation could be a future improvement if enough people ask for plain text version vs just looking at the code. This would also auto solve the problem of versioning the docs for people stuck on older versions of the SDKs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This RFC was created because people were asking for documentation on what data is collected, so only having it in the code is not enough. :-)

But I guess when we have the EventScrubber we can have a documentation on how the event scrubbing works in general and then point to the default EventScrubbers code to the people that want to know the details.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was referring to the other RFCs outcome of having a default filtering logic which could then also serve as docs (at least for people able to read the code).


Pros:

- Generated from code, so it should be always up to date
Copy link
Member

@philipphofmann philipphofmann Jan 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a process to ensure that developers mark code collecting PII as such. A PR template checkbox as suggested in #70 (comment) could help.

- Possible to render docs for different versions of the SDK

Cons:

- Doc strings in code need to be kept up to date.
- Need to write tooling for exporting doc string from all SDKs to be able to include the generated documentation into docs.sentry.io

# Drawbacks

People tend to forget about documentation and then we end up with outdated documentation, which is kind of worse than having no documentation at all.

# Unresolved questions

- How do we guarantee, that the documentation stays up to date with the implementation?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- Do we need documentation tied to different versions of SDKs?
- We should probably add some checks in CI that make sure that code changes need to be documented as well?