Privacy concern: user data is being sneakily collected #415
Comments
What is the easy way to disable it? (besides commenting out those lines..)
Created a pull request: #423
Thanks for raising the concern, especially the broken link to the terms of use. Fixing ASAP. The terms are very short, so they should be easy to read. As for prompts being recorded: this was discussed publicly in Discord #general over the weekend. Since OpenAI is doing this already, we did not see an issue. Could anyone steelman the argument here for why a 20-page OpenAI ToS stating that data is collected is fine, but the terms of use here are not explicit enough?
The stealth addition of MITM spyware would constitute an issue under the code of conduct, or in any professional setting dealing with material that is likely commercial in confidence: trade secrets, IP, internal processes, etc. https://github.com/AntonOsika/gpt-engineer/blob/main/.github/CODE_OF_CONDUCT.md
I appreciate the open discussion about this. For the bulk of the issue I'm definitely to blame: we wrote a terms of use explaining the data collection and linked to it from the README when performing the telemetry update. I rushed getting this out and did not add the terms of use to version control. We do not have any automatic tests for broken links, as we have for code, and it was merged before I caught it. Many parts of the negative reaction, apart from my huge blunder with the broken link (the reaction to that is warranted), strike me as overly polarised against what is pretty standard product analytics. Two main reasons:
I am committed to doing what is best for the community here. This means striking the right balance between not invading privacy and building a useful tool. Without getting feedback, by default, on how well this tool works for users, it is very difficult to do a good job of improving it. My experience before this issue was created was that very few people are protective of sharing their prompts with external services (consider all the GPT Chrome extensions out there, where many also share IP, fingerprint, etc.).

Conclusions

I appreciate your contribution @Gamekiller48, and everyone. I know everyone here wants what is best for the users. I will merge your PR @Gamekiller48 (the opt out -> opt in PR). If someone, in addition, could make a PR so that the "CLI review flow" asks "is it OK to send data", that would be great. Furthermore, as a follow-up, I will ensure there is further review of what the right policy is for an application like this, and that an informed decision is made with input from experts and those with opinions from different sides. I will post in this issue again with the final conclusions, including whether we decide to change our stance on data collection. This way, everyone subscribed to this issue will be notified.

Personal note

I ask the community for support in building something useful and open source, and for constructive contributions, such as PRs that address concerns and improve the tool for everyone (some in this thread are role models here).
When using the OpenAI models through the API (as this project does), OpenAI explicitly does not collect user data to further train their models:
Though they do retain data for 30 days in case of abuse and misuse, which is completely different from retaining data to improve the service:
I don't know if I would call it sneaky. This repo has had so many things changed in the past 30 days that it is easy for a broken link to get overlooked.
#471 has been merged to explicitly ask for consent.
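For illustration, an explicit opt-in prompt in the CLI review flow could look roughly like this. This is only a sketch, not the code merged in #471; the function name `ask_collection_consent` and the exact wording are assumptions:

```python
def ask_collection_consent() -> bool:
    """Ask the user, before anything is sent, whether telemetry is OK.

    Hypothetical sketch of an explicit opt-in prompt; the prompt
    actually merged in #471 may differ in wording and storage.
    Pressing Enter without typing anything counts as "no".
    """
    answer = input(
        "Is it OK to send your prompts and run metadata to help "
        "improve gpt-engineer? [y/N] "
    )
    return answer.strip().lower() in ("y", "yes")
```

Defaulting to "N" makes the flow opt-in rather than opt-out, which matches the direction this thread converged on.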
GPT Engineer collects user data, namely user prompts among other metadata. This fact is not mentioned in the README, nor is it mentioned that you can opt out by setting the COLLECT_LEARNINGS_OPT_OUT environment variable. The ToS link in the README is broken. Therefore, the only way for users to become aware that their data is being sneakily collected is to read the code.

I consider this to be a violation of users' privacy, and propose one of the following to be implemented as soon as possible:
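As a rough sketch of what an environment-variable opt-out check like this looks like (the function name `consent_to_collect` is made up for illustration; the real logic lives in `gpt_engineer/collect.py` and may differ in detail):

```python
import os


def consent_to_collect() -> bool:
    """Return False when the user has opted out of telemetry.

    Sketch of the opt-out described in this issue: any non-empty
    value in the COLLECT_LEARNINGS_OPT_OUT environment variable
    disables collection. (Illustrative only; not the actual
    implementation in gpt_engineer/collect.py.)
    """
    return not os.environ.get("COLLECT_LEARNINGS_OPT_OUT")
```

Under this scheme a user would set the variable before running the tool, e.g. `export COLLECT_LEARNINGS_OPT_OUT=true`.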
gpt-engineer/gpt_engineer/collect.py, line 25 in 0596b07