
Cache the compressed artifacts #27

Closed
drahnr opened this issue Feb 14, 2021 · 8 comments · Fixed by #32
Labels: enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@drahnr
Contributor

drahnr commented Feb 14, 2021

In order to include the .bin artifacts in a repository and craft releases / publish with cargo, the packaged sources may not be larger than 10 MB, or failures like:

error: api errors (status 200 OK): max upload size is: 10485760

will pop up.

The simplest path is to cache the compressed artifacts rather than the uncompressed ones and decompress at runtime. An optional builder API could be used to load either the compressed or the decompressed .bin variants.

@bminixhofer
Owner

bminixhofer commented Feb 14, 2021

cache the compressed artifacts rather than the uncompressed and decompress at runtime. An optional builder API could be used to load compressed or decompressed .bin variants.

Definitely! This is quite straightforward to do. It also confirms to me that the build crate was the way to go; this would've caused some issues with the previous approach.

Until this is implemented you can work around this by manually zipping all outputs:

build.rs

fn main() {
    println!("cargo:rerun-if-changed=build.rs");

    let builder = nlprule_build::BinaryBuilder::new(
        Some(&["en"]),
        std::env::var("OUT_DIR").expect("OUT_DIR is set when build.rs is running"),
    )
    .build()
    .validate();

    for output in builder.outputs() {
        // `output` is the full path to each binary the builder created, you can gzip here
    }
}
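For instance, the gzip step in that loop could look roughly like this (a minimal sketch assuming the flate2 crate as a build dependency; the exact crate, file handling, and compression level are up to you):

// build.rs (continued) — hypothetical gzip step; flate2 is an assumed build dependency
use std::{fs::File, io::copy};
use flate2::{write::GzEncoder, Compression};

for output in builder.outputs() {
    // `output` is the full path to one of the built binaries
    let mut input = File::open(&output).expect("built binary exists");
    let gz = File::create(format!("{}.gz", output.display())).expect("can create .gz file");
    let mut encoder = GzEncoder::new(gz, Compression::default());
    copy(&mut input, &mut encoder).expect("compression succeeds");
    encoder.finish().expect("gzip stream is finalized");
}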

main.rs

// [...]
let mut rules_bytes: &'static [u8] = include_bytes!(concat!(
    env!("OUT_DIR"),
    "/",
    concat!(rules_filename!("en"), ".gz")
));
// [...]
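Those bytes then need to be decompressed at runtime before building the Rules. A minimal sketch, assuming flate2 for decoding and nlprule's Rules::from_reader (exact signatures may differ):

// main.rs (continued) — decompress the embedded .gz before deserializing
use flate2::read::GzDecoder;
use nlprule::Rules;

let mut decoder = GzDecoder::new(rules_bytes);
let rules = Rules::from_reader(&mut decoder).expect("valid gzipped rules binary");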

nlprule-build will get a compress option to handle this. One thing that is not trivial is how / if the rules_filename!("en") should switch between compressed / not compressed. rules_filename!("en", compressed) (or rules_filename!("en", compressed=true)) would be nice but I'm not sure if the macro syntax allows that.
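For what it's worth, macro_rules! does allow an optional second argument via an extra arm, so a purely hypothetical sketch (not the actual nlprule macro or filename pattern) could look like:

// Hypothetical sketch only — the real rules_filename! macro and file naming may differ.
macro_rules! rules_filename {
    ($lang:literal) => {
        concat!($lang, "_rules.bin")
    };
    // optional second argument for a compressed variant's extension
    ($lang:literal, $ext:literal) => {
        concat!($lang, "_rules.bin", $ext)
    };
}

// rules_filename!("en") == "en_rules.bin"
// rules_filename!("en", ".gz") == "en_rules.bin.gz"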

bminixhofer added the enhancement (New feature or request) and good first issue (Good for newcomers) labels on Feb 14, 2021
@bminixhofer
Owner

I will call this .gzip instead of .compress so that concat!(rules_filename!("en"), ".gz") is unambiguous and there is no need to change the filename macro.

By the way, I saw you were committing some of the nlprule binaries in the cache to the cargo-spellcheck repo, which is of course not ideal. Do you still have problems with downloading from GH releases? I could add a max_retries config option to hopefully fix that if it is still an issue (I myself unfortunately was never able to reproduce any timeouts).

@drahnr
Contributor Author

drahnr commented Feb 16, 2021

I will call this .gzip instead of .compress so that concat!(rules_filename!("en"), ".gz") is unambiguous and there is no need to change the filename macro.

That's one option, but I would prefer to be able to pass an arbitrary compressor and define the file extension that way. This is mostly motivated by keeping the files contained in the repository as small as possible without needing upstream changes.

By the way, I saw you were committing some of the nlprule binaries in the cache to the cargo-spellcheck repo, which is of course not ideal. Do you still have problems with downloading from GH releases? I could add a max_retries config option to hopefully fix that if it is still an issue (I myself unfortunately was never able to reproduce any timeouts).

I do not have any errors at this point; I think it was only a version mismatch between git and the last release before the 0.4.x releases in the first place.
Adding the files just makes the whole repo self-contained. Ideally I want to make them part of crates.io releases (hence this issue), so that the repository is self-contained and does not require network access other than the initial git pull or crates.io checkout. This is also partially motivated by the rather frequent partial GitHub outages. Note that the initial clone might not necessarily come from GitHub, or from the network at all.

@bminixhofer
Owner

pass an arbitrary compressor and define the file extension that way

Oh right, that's better! Just have the file extension as the second argument and pass that same file extension to concat!(..). I didn't think of that; let's do it that way.

I do not have any errors at this point and I think it was only a version mismatch between git and the last release before 0.4.x

Great, I won't add any retry logic then until someone has issues.

@drahnr
Contributor Author

drahnr commented Feb 16, 2021

This seems like the one (blocking) crate that covers a lot of the relevant algorithms: https://lib.rs/crates/smush

@bminixhofer
Owner

There is now an implementation of this: #32. I didn't want to commit to using smush, so I added a generic postprocess method. It is still reasonably concise, e.g.:

use std::io::Write;

BinaryBuilder::new(Some(&["en"]), tempdir)
    .build()
    .validate()
    .postprocess(
        |buffer, mut writer| {
            // gzip the built binary with smush and write it back out
            Ok(writer.write_all(&smush::encode(
                &buffer,
                smush::Codec::Gzip,
                smush::Quality::Default,
            )?)?)
        },
        |p| {
            // append ".gz" to the original file name
            let mut path = p.as_os_str().to_os_string();
            path.push(".gz");
            path
        },
    )?;
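At runtime the .gz produced above would then be included (as in the earlier main.rs snippet, assuming the builder writes into OUT_DIR) and decoded with the same codec. A rough sketch, assuming smush::decode and nlprule's Rules::from_reader:

// Sketch of the runtime counterpart: `gz_bytes` is the include_bytes! of the ".gz" path.
let decoded = smush::decode(gz_bytes, smush::Codec::Gzip).expect("valid gzip data");
let rules = nlprule::Rules::from_reader(&mut decoded.as_slice()).expect("valid rules binary");

Any other codec the chosen compressor supports could be plugged into postprocess in the same way.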

What do you think?

@drahnr
Contributor Author

drahnr commented Feb 16, 2021

That's even better!

@bminixhofer
Owner

This works as of v0.4.4 now!
