Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce extension validity at parse time #5888

Merged
merged 2 commits into from
Aug 9, 2024
Merged

Enforce extension validity at parse time #5888

merged 2 commits into from
Aug 9, 2024

Conversation

charliermarsh
Copy link
Member

@charliermarsh charliermarsh commented Aug 7, 2024

Summary

This PR adds a DistExtension field to some of our distribution types, which requires that we validate that the file type is known and supported when parsing (rather than when attempting to unzip). It removes a bunch of extension parsing from the code too, in favor of doing it once upfront.

Closes #5858.

@charliermarsh charliermarsh added the internal A refactor or improvement that is not user-facing label Aug 7, 2024
@charliermarsh
Copy link
Member Author

I think this is a good change but it's more aggressive. It means we might error if a registry serves distribution kinds that we don't support, even if we'd never read them.

@charliermarsh
Copy link
Member Author

Oh nevermind, we wouldn't fail in that case due to some other gating.

+ tqdm==4.66.0 (from https://files.pythonhosted.org/packages/a5/d6/502a859bac4ad5e274255576cd3e15ca273cdb91731bc39fb840dd422ee9/tqdm-4.66.0-py3-none-any.whl)
+ urllib3==2.2.1 (from https://files.pythonhosted.org/packages/a2/73/a68704750a7679d0b6d3ad7aa8d4da8e14e151ae82e6fee774e6e0d05ec8/urllib3-2.2.1-py3-none-any.whl)
error: Failed to parse metadata from built wheel
Caused by: Expected direct URL dependency to include an extension: `https://example.org/does/not/exist`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we say "file extension"? Why did this succeed previously?

Copy link
Member

@zanieb zanieb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically down for this behavior.

@charliermarsh charliermarsh force-pushed the charlie/ext branch 2 times, most recently from 34c73c0 to c569843 Compare August 7, 2024 21:54
Deserialize,
rkyv::Archive,
rkyv::Deserialize,
rkyv::Serialize,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing definition was:

#[derive(
    Clone,
    Debug,
    PartialEq,
    Eq,
    Serialize,
    Deserialize,
    rkyv::Archive,
    rkyv::Deserialize,
    rkyv::Serialize,
)]
#[archive(check_bytes)]
#[archive_attr(derive(Debug))]
pub enum SourceDistExtension {
    Zip,
    TarGz,
    TarBz2,
}

I think this is safe without a cache bump? @BurntSushi?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmmmmmmmm. Likely fine, but I actually do not believe it is guaranteed. rkyv doesn't have great docs on what kinds of changes are compatible with existing serialized data. I would generally assume that any change requires a cache bump, even if in some cases, it isn't in practice required.

@charliermarsh charliermarsh force-pushed the charlie/ext branch 2 times, most recently from f394275 to 2407eea Compare August 7, 2024 22:24
@charliermarsh charliermarsh marked this pull request as ready for review August 7, 2024 22:26
{
Ok(Self::TarXz)
}
"zst"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below we serialize this as zstd

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's wrong!

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I like this!

Deserialize,
rkyv::Archive,
rkyv::Deserialize,
rkyv::Serialize,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to find where rkyv was used with DistExtension, but couldn't. Do we need rkyv support for DistExtension? If not, I'd drop the trait impls and archive annotations below.

Deserialize,
rkyv::Archive,
rkyv::Deserialize,
rkyv::Serialize,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmmmmmmmm. Likely fine, but I actually do not believe it is guaranteed. rkyv doesn't have great docs on what kinds of changes are compatible with existing serialized data. I would generally assume that any change requires a cache bump, even if in some cases, it isn't in practice required.

@charliermarsh charliermarsh force-pushed the charlie/ext branch 3 times, most recently from 17ef91a to ffb1e23 Compare August 8, 2024 17:05
@charliermarsh
Copy link
Member Author

I bumped the rkyv cache, to be conservative.

@charliermarsh charliermarsh merged commit 21408c1 into main Aug 9, 2024
57 checks passed
@charliermarsh charliermarsh deleted the charlie/ext branch August 9, 2024 01:39
zanieb pushed a commit that referenced this pull request Aug 9, 2024
This PR adds a `DistExtension` field to some of our distribution types,
which requires that we validate that the file type is known and
supported when parsing (rather than when attempting to unzip). It
removes a bunch of extension parsing from the code too, in favor of
doing it once upfront.

Closes #5858.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal A refactor or improvement that is not user-facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

uv add https://.../foo.git throws a confusing error
4 participants