Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use 'yaml.safe_load' in 'load_yaml_from_docstring' #278

Closed
lbeaufort opened this issue Sep 4, 2018 · 6 comments
Closed

Use 'yaml.safe_load' in 'load_yaml_from_docstring' #278

lbeaufort opened this issue Sep 4, 2018 · 6 comments

Comments

@lbeaufort
Copy link
Contributor

lbeaufort commented Sep 4, 2018

Hi there! 👋Thanks for providing a great project.

Because yaml.load() has known security issues, would it be feasible to use the existing yaml.safe_load() option instead of yaml.load() here for load_yaml_from_docstring? I'm happy to put in a PR if you agree. It looks like Ansible has been using only yaml.safe_load() since v1.1.

It doesn't look like there's been much activity by PyYAML to move forward with making load() safe by default. Here is the pending PyYAML 4.2 release plan: yaml/pyyaml#193

The PyYAML security vulnerability is being flagged for our FEC API.

Please let me know if you have any questions, and thanks!
Laura

@sloria
Copy link
Member

sloria commented Sep 6, 2018

Thanks for the report!

I think you're right re: using safe_load(). I'd certainly review and merge a PR for this.

If you don't have time right now, I can take a look this weekend.

@lafrech
Copy link
Member

lafrech commented Sep 6, 2018

Sorry, I had a look the other day because we had a warning from pyup.io/safety-ci and forgot to comment here.

Here's my understanding (can be incorrect / incomplete):

pyyaml has had two ways to load yaml docstrings for a while, the normal and the safe. The safe is meant to be used when loading yaml strings from unknown sources, because the normal allows code injection. Since users overlooked this and used the normal way to parse input data, they decided it was safer to make safe the default. This was done in 4.1 thanks to yaml/pyyaml#74. But then for some reasons, this lead to issues (yaml/pyyaml#187) and the 4.1 was sort of retracted and the commit was reverted (yaml/pyyaml#194) and they hope to find a nice way out for 4.2 (yaml/pyyaml#193).

Meanwhile, a CVS was reported to account for the fact that default load is not safe, which had been the case since the beginning. It was a known issue, since safe_load existed to address that.

And now, people using dependencies security issues tracking solutions get warnings because yaml.load is not safe, which is nothing new.

In our case, the docstring being parsed is not external data, it is written by the developer of the application, so I see no reason to use safe_load.

This said, I don't know exactly the downsides of using safe_load. It may have constraints, and induce a performance penalty. Assuming we're not affected by the constraints and since the time penalty is not an issue (it must be minimal, and the code is typically called only on application startup), then we could switch to safe_load.

Yet, assuming I understood everything correctly, this would only serve the purpose of silencing a spurious security warning.

I may be wrong and if it is simpler to use safe_load than getting to the bottom of this and/or silence those warnings, then I have no objection to this change.

@sloria
Copy link
Member

sloria commented Sep 6, 2018

Good points, @lafrech . @lbeaufort Do you know if there are trade-offs with using safe_load vs. load?

@lbeaufort
Copy link
Contributor Author

Thanks for your replies, @sloria and @lafrech. Everything I've read has indicated that it's best to use safe_load when possible, but I'm far from an expert on arbitrary code execution. I'm not aware of any trade-offs with using safe_load, but I'm not as familiar with your project. I defer to your judgement, and am happy to put in a PR if that's what you decide to do.

@sloria
Copy link
Member

sloria commented Sep 7, 2018

From my quick search through the issues @lafrech posted and some other sources, there doesn't appear to be any downside to using safe_load. In practice, it shouldn't make much of a difference since we're not loading data from external sources, but it doesn't hurt to be too safe in this case.

@lbeaufort Would you like to send the PR?

@lbeaufort
Copy link
Contributor Author

@sloria sounds good! I will put one in today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants
@lafrech @sloria @lbeaufort and others