Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: track version in fingerprint dep-info files #14751

Merged
merged 3 commits into from
Oct 31, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions crates/cargo-test-support/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1597,6 +1597,12 @@ pub fn assert_deps(project: &Project, fingerprint: &str, test_cb: impl Fn(&Path,
assert!(files.next().is_none(), "expected only 1 dep-info file");
let dep_info = fs::read(&info_path).unwrap();
let dep_info = &mut &dep_info[..];

// Consume the magic marker and version. Here they don't really matter.
read_usize(dep_info);
read_u8(dep_info);
read_u8(dep_info);

let deps = (0..read_usize(dep_info))
.map(|_| {
let ty = read_u8(dep_info);
Expand Down
152 changes: 147 additions & 5 deletions src/cargo/core/compiler/fingerprint/dep_info.rs
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,9 @@ use cargo_util::Sha256;
use crate::CargoResult;
use crate::CARGO_ENV;

/// The current format version of [`EncodedDepInfo`].
const CURRENT_ENCODED_DEP_INFO_VERSION: u8 = 1;

/// The representation of the `.d` dep-info file generated by rustc
#[derive(Default)]
pub struct RustcDepInfo {
Expand Down Expand Up @@ -61,20 +64,36 @@ pub enum DepInfoPathType {
/// Currently the format looks like:
///
/// ```text
/// +------------+------------+---------------+---------------+
/// | # of files | file paths | # of env vars | env var pairs |
/// +------------+------------+---------------+---------------+
/// +--------+---------+------------+------------+---------------+---------------+
/// | marker | version | # of files | file paths | # of env vars | env var pairs |
/// +--------+---------+------------+------------+---------------+---------------+
/// ```
///
/// Each field represents
///
/// * _Marker_ --- A magic marker to ensure that older Cargoes, which only
/// recognize format v0 (prior to checksum support in [`f4ca7390`]), do not
/// proceed with parsing newer formats. Since [`EncodedDepInfo`] is merely
/// an optimization, and to avoid adding complexity, Cargo recognizes only
/// one version of [`CURRENT_ENCODED_DEP_INFO_VERSION`].
/// The current layout looks like this
/// ```text
/// +----------------------------+
/// | [0x01 0x00 0x00 0x00 0xff] |
/// +----------------------------+
/// ```
/// These bytes will be interpreted as "one file tracked and an invalid
/// [`DepInfoPathType`] variant with 255" by older Cargoes, causing them to
/// stop parsing. This could prevent problematic parsing as noted in
/// rust-lang/cargo#14712.
/// * _Version_ --- The current format version.
/// * _Number of files/envs_ --- A `u32` representing the number of things.
/// * _File paths_ --- Zero or more paths of files the dep-info file depends on.
/// Each path is encoded as the following:
///
/// ```text
/// +-----------+-------------+------------+---------------+-----------+-------+
/// | Path type | len of path | path bytes | cksum exists? | file size | cksum |
/// | path type | len of path | path bytes | cksum exists? | file size | cksum |
/// +-----------+-------------+------------+---------------+-----------+-------+
/// ```
/// * _Env var pairs_ --- Zero or more env vars the dep-info file depends on.
Expand All @@ -84,7 +103,9 @@ pub enum DepInfoPathType {
/// | len of key | key bytes | value exists? | len of value | value bytes |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also have a sanity-check value on "len of key" We likely should never have a dep info key that is larger than 10k for instance

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and ture for any length we deal with)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. What if they do track that many environment variables?

An allocation error seems reasonable in that case. Cargo may crash in other places even it doesn't here.

Copy link
Member Author

@weihanglo weihanglo Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative is https://doc.rust-lang.org/nightly/std/vec/struct.Vec.html#method.try_with_capacity.
(not yet stable though)

though somehow the core dumpe happened with the first push, not when calling with_capacity.

/// +------------+-----------+---------------+--------------+-------------+
/// ```
#[derive(Default)]
///
/// [`f4ca7390`]: https://github.com/rust-lang/cargo/commit/f4ca739073185ea5e1148ff100bb4a06d3bf721d
#[derive(Default, Debug, PartialEq, Eq)]
pub struct EncodedDepInfo {
pub files: Vec<(DepInfoPathType, PathBuf, Option<(u64, String)>)>,
pub env: Vec<(String, Option<String>)>,
Expand All @@ -93,6 +114,12 @@ pub struct EncodedDepInfo {
impl EncodedDepInfo {
pub fn parse(mut bytes: &[u8]) -> Option<EncodedDepInfo> {
let bytes = &mut bytes;
read_magic_marker(bytes)?;
let version = read_u8(bytes)?;
if version != CURRENT_ENCODED_DEP_INFO_VERSION {
return None;
}

let nfiles = read_usize(bytes)?;
let mut files = Vec::with_capacity(nfiles);
for _ in 0..nfiles {
Expand Down Expand Up @@ -129,6 +156,18 @@ impl EncodedDepInfo {
}
return Some(EncodedDepInfo { files, env });

/// See [`EncodedDepInfo`] for why a magic marker exists.
fn read_magic_marker(bytes: &mut &[u8]) -> Option<()> {
let _size = read_usize(bytes)?;
let path_type = read_u8(bytes)?;
if path_type != u8::MAX {
// Old depinfo. Give up parsing it.
None
} else {
Some(())
}
}

fn read_usize(bytes: &mut &[u8]) -> Option<usize> {
let ret = bytes.get(..4)?;
*bytes = &bytes[4..];
Expand Down Expand Up @@ -162,6 +201,10 @@ impl EncodedDepInfo {
pub fn serialize(&self) -> CargoResult<Vec<u8>> {
let mut ret = Vec::new();
let dst = &mut ret;

write_magic_marker(dst);
dst.push(CURRENT_ENCODED_DEP_INFO_VERSION);

write_usize(dst, self.files.len());
for (ty, file, checksum_info) in self.files.iter() {
match ty {
Expand Down Expand Up @@ -189,6 +232,14 @@ impl EncodedDepInfo {
}
return Ok(ret);

/// See [`EncodedDepInfo`] for why a magic marker exists.
///
/// There is an assumption that there is always at least a file.
fn write_magic_marker(dst: &mut Vec<u8>) {
write_usize(dst, 1);
dst.push(u8::MAX);
}

fn write_bytes(dst: &mut Vec<u8>, val: impl AsRef<[u8]>) {
let val = val.as_ref();
write_usize(dst, val.len());
Expand Down Expand Up @@ -614,3 +665,94 @@ pub enum InvalidChecksum {
#[error("expected a string with format \"algorithm=hex_checksum\"")]
InvalidFormat,
}

#[cfg(test)]
mod encoded_dep_info {
use super::*;

#[track_caller]
fn gen_test(checksum: bool) {
let checksum = checksum.then_some((768, "c01efc669f09508b55eced32d3c88702578a7c3e".into()));
let lib_rs = (
DepInfoPathType::TargetRootRelative,
PathBuf::from("src/lib.rs"),
checksum.clone(),
);

let depinfo = EncodedDepInfo {
files: vec![lib_rs.clone()],
env: Vec::new(),
};
let data = depinfo.serialize().unwrap();
assert_eq!(EncodedDepInfo::parse(&data).unwrap(), depinfo);

let mod_rs = (
DepInfoPathType::TargetRootRelative,
PathBuf::from("src/mod.rs"),
checksum.clone(),
);
let depinfo = EncodedDepInfo {
files: vec![lib_rs.clone(), mod_rs.clone()],
env: Vec::new(),
};
let data = depinfo.serialize().unwrap();
assert_eq!(EncodedDepInfo::parse(&data).unwrap(), depinfo);

let depinfo = EncodedDepInfo {
files: vec![lib_rs, mod_rs],
env: vec![
("Gimli".into(), Some("Legolas".into())),
("Beren".into(), Some("Lúthien".into())),
],
};
let data = depinfo.serialize().unwrap();
assert_eq!(EncodedDepInfo::parse(&data).unwrap(), depinfo);
}

#[test]
fn round_trip() {
gen_test(false);
}

#[test]
fn round_trip_with_checksums() {
gen_test(true);
}

#[test]
fn path_type_is_u8_max() {
#[rustfmt::skip]
let data = [
0x01, 0x00, 0x00, 0x00, 0xff, // magic marker
CURRENT_ENCODED_DEP_INFO_VERSION, // version
0x01, 0x00, 0x00, 0x00, // # of files
0x00, // path type
0x04, 0x00, 0x00, 0x00, // len of path
0x72, 0x75, 0x73, 0x74, // path bytes ("rust")
0x00, // cksum exists?
0x00, 0x00, 0x00, 0x00, // # of env vars
];
// The current cargo doesn't recognize the magic marker.
assert_eq!(
EncodedDepInfo::parse(&data).unwrap(),
EncodedDepInfo {
files: vec![(DepInfoPathType::PackageRootRelative, "rust".into(), None)],
env: Vec::new(),
}
);
}

#[test]
fn parse_v0_fingerprint_dep_info() {
#[rustfmt::skip]
let data = [
0x01, 0x00, 0x00, 0x00, // # of files
0x00, // path type
0x04, 0x00, 0x00, 0x00, // len of path
0x72, 0x75, 0x73, 0x74, // path bytes: "rust"
0x00, 0x00, 0x00, 0x00, // # of env vars
];
// Cargo can't recognize v0 after `-Zchecksum-freshess` added.
assert!(EncodedDepInfo::parse(&data).is_none());
}
}