[Feature Request]: obfuscation of values that is PII or PCI data #2116

Open
32bit opened this issue Apr 22, 2022 · 9 comments
Labels
Feature Request (request for new functionality to support use cases not already covered), Needs Investigation

Comments

32bit commented Apr 22, 2022

🔎 Search Terms

obfuscating log data for PII or PCI

The vision

Provide a feature to obfuscate or replace the PII/PCI data so it won't get logged.

Use case

To avoid writing and maintaining custom code that parses out sensitive fields, would it be possible to add a "filter" that removes or obfuscates chosen fields, in a JSON payload for example?
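
As an illustration of the kind of filter being requested, here is a minimal sketch of key-based redaction over a nested JSON-like object. The helper name `redactFields`, the field list, and the `[REDACTED]` placeholder are all assumptions made for this example; none of them are part of winston.

```javascript
// Hypothetical helper, not a winston API. Field names are illustrative
// and are compared case-insensitively, so the set holds lowercase names.
const SENSITIVE_FIELDS = new Set(["password", "ssn", "cardnumber", "cvv"]);

function redactFields(value, fields = SENSITIVE_FIELDS, replacement = "[REDACTED]") {
  if (Array.isArray(value)) {
    return value.map((item) => redactFields(item, fields, replacement));
  }
  if (value !== null && typeof value === "object") {
    // Builds a copy from enumerable string keys only; the input is not mutated.
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = fields.has(key.toLowerCase())
        ? replacement
        : redactFields(inner, fields, replacement);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}

const entry = {
  level: "info",
  message: "payment accepted",
  user: { name: "Ada", ssn: "123-45-6789" },
  cards: [{ cardNumber: "4111111111111111", cvv: "123" }],
};
console.log(JSON.stringify(redactFields(entry)));
```

Because the sketch copies only enumerable string keys, anything else attached to the object (symbol-keyed properties, class instances such as `Date`) would be lost or flattened, so a real formatter would probably need to mutate in place or handle those cases explicitly.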

Additional information

NA

32bit added the Feature Request and Needs Investigation labels Apr 22, 2022
Contributor

wbt commented Apr 28, 2022

How do you identify what data is PII/PCI? I would suggest just not passing that data to your logging functions, as what constitutes PII/PCI is pretty application-specific. The custom formatting extensibility might also be a good place to put a function that scans for patterns you consider PII/PCI and replaces them with whatever else you want.
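
A pattern-scanning function of the kind described here might look like the sketch below. The function name `scrubMessage` and the three regular expressions are assumptions chosen for illustration; they are deliberately loose and would need tuning for any real application. The function is plain JavaScript so it could be called from inside a custom format.

```javascript
// Hypothetical pattern list; the regexes are illustrative, not exhaustive.
// Each regex needs the `g` flag so every occurrence is replaced.
const PATTERNS = [
  { name: "email", regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { name: "ssn", regex: /\b\d{3}-\d{2}-\d{4}\b/g },
  // 13 to 16 digits, optionally separated by single spaces or hyphens
  { name: "card", regex: /\b(?:\d[ -]?){12,15}\d\b/g },
];

// Replaces every match of every pattern with a "[name]" placeholder.
function scrubMessage(message, patterns = PATTERNS) {
  return patterns.reduce(
    (text, { name, regex }) => text.replace(regex, `[${name}]`),
    message
  );
}

console.log(
  scrubMessage("Contact ada@example.com, SSN 123-45-6789, card 4111 1111 1111 1111")
);
// prints "Contact [email], SSN [ssn], card [card]"
```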

escodel commented Oct 28, 2023

Came across this feature request and had a working idea for a solution. Custom formatting makes sense, but perhaps packaged almost like a plugin for this type of data, with some options?

If this issue is still relevant, and contributions are welcome, I could try to turn the proof of concept into a PR.

Contributor

DABH commented Oct 28, 2023

Contributions are always welcome! The question would be whether something belongs in Winston itself or e.g. as a separate transport. We have a lot of transports in the ecosystem but there isn’t really a standard for or plugin repository of formatters. Perhaps something like this could live under examples? Or we should have an examples-like folder of useful formatters people have written? Open to ideas on how we’d best capture that kind of community knowledge somewhere people could find it.

escodel commented Oct 28, 2023

Makes sense, thanks! I could at least provide the example formatting and it could be grouped with other useful formats.

Thinking about it from the API payload/response scenario, it could be the approach of adding a flag similar to private: true but for masking, passing options with it.

I'll take a closer look at the transports and see how to integrate it as part of a formatting example.
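
One way to read the per-call flag idea is sketched below. Everything here is hypothetical: the `mask` property, its `fields`, `char`, and `keepLast` options, and the helper names are invented for this example and do not correspond to any existing winston option.

```javascript
// Hypothetical: masks a value, optionally leaving the last few characters visible.
function maskValue(value, { char = "*", keepLast = 0 } = {}) {
  const text = String(value);
  const visible = keepLast > 0 ? text.slice(-keepLast) : "";
  return char.repeat(Math.max(text.length - visible.length, 0)) + visible;
}

// Hypothetical: reads a `mask` option from the log entry, masks the listed
// top-level fields, and drops the option itself so it is never logged.
function applyMask(info) {
  const { mask, ...rest } = info;
  if (!mask) return rest;
  const { fields = [], ...options } = mask;
  for (const field of fields) {
    if (field in rest) rest[field] = maskValue(rest[field], options);
  }
  return rest;
}

const masked = applyMask({
  message: "card stored",
  cardNumber: "4111111111111111",
  mask: { fields: ["cardNumber"], keepLast: 4 },
});
console.log(masked.cardNumber); // prints "************1111"
```

Keeping the masking options next to the data being logged means callers decide per call what is sensitive, which fits the earlier point that PII/PCI is application-specific.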

Contributor

wbt commented Nov 8, 2023

The example PR above is merged. I would still be supportive of a PR that turns on, at first optionally and by default in the next breaking-change version, some reasonable secret obfuscator, ideally drawing on something widely used elsewhere (by GitHub itself?) instead of reinventing something to be maintained separately.

escodel commented Nov 8, 2023

@wbt right on, that would be super useful. I'll look at some examples and take a shot at a feature.

escodel commented Nov 13, 2023

> The example PR above is merged. I would still be supportive of a PR that turns on, at first optionally and by default in the next breaking-change version, some reasonable secret obfuscator, ideally drawing on something widely used elsewhere (by GitHub itself?) instead of reinventing something to be maintained separately.

So I've been thinking about this feature but want to stay on the right track.

I'm looking at the problem of doing this at near-instant logging speed, whereas secret detection services (such as GitHub's) seem to be git repo/history scanners: they look for hard-coded tokens, run as pre-commit hooks, send matches to an API for verification, and so on.

My thought process is to draw upon common regex patterns internally for performance, and maybe target common field names in dynamic input like in the formatted example from the PR.

If I follow what you're saying, should the source for those patterns be an existing outside library/service? I'm also thinking about config options for including or excluding certain patterns.
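
The include/exclude configuration mentioned above could be sketched roughly as follows. The factory name `createObfuscator`, the option names `include`, `exclude`, and `custom`, and the built-in pattern table are all assumptions for illustration; in practice the table might be sourced from an existing, maintained pattern set rather than written by hand.

```javascript
// Hypothetical built-in pattern table; names and regexes are illustrative only.
const BUILT_IN = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  ipv4: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,
  bearer: /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g,
};

// Builds a string obfuscator from named patterns.
// - include: names to enable (defaults to every built-in)
// - exclude: names to switch off
// - custom:  extra { name: regex } entries; regexes should carry the `g` flag
function createObfuscator({ include = Object.keys(BUILT_IN), exclude = [], custom = {} } = {}) {
  const available = { ...BUILT_IN, ...custom };
  const active = include
    .concat(Object.keys(custom))
    .filter((name, i, all) => all.indexOf(name) === i) // de-duplicate
    .filter((name) => !exclude.includes(name));
  for (const name of active) {
    if (!(name in available)) throw new Error(`Unknown pattern: ${name}`);
  }
  // Patterns are compiled once here, so the per-log cost is only the replaces.
  return (text) =>
    active.reduce((out, name) => out.replace(available[name], `<${name}>`), text);
}

const obfuscate = createObfuscator({ exclude: ["ipv4"] });
console.log(obfuscate("user ada@example.com from 10.0.0.1 sent Bearer abc.def"));
// prints "user <email> from 10.0.0.1 sent <bearer>"
```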

I might not be seeing the full picture yet so any guidance would be awesome. Thanks!

escodel commented Nov 18, 2023

Bumping for @wbt and any maintainers for input on the above, thanks 🙏

Contributor

wbt commented Dec 6, 2023

It seems like a reasonable approach. My main point is that we should try to avoid reinventing the wheel (and having to maintain the reinvention) to whatever extent things have already been done.
