Building from a local sdist file url broken in 0.21.0 #1045

Closed
beenje opened this issue Sep 4, 2024 · 46 comments

@beenje
Contributor

beenje commented Sep 4, 2024

Support for local source file url scheme was added in #177 and working in version 0.5.0.

I hadn't tested that in a while. When trying to build a recipe using a local file as source with rattler-build 0.21.0, it fails.

Issue can be reproduced with:

context:
  version: "13.4.2"

package:
  name: "rich"
  version: ${{ version }}

source:
  - url: file:///tmp/rich/rich-13.4.2.tar.gz
    sha256: d653d6bccede5844304c605d5aac802c7cf9621efd700b46c7ec2b51ea914898

build:
  # Thanks to `noarch: python` this package works on all platforms
  noarch: python
  script:
    - python -m pip install . -vv --no-deps --no-build-isolation

requirements:
  host:
    - pip
    - poetry-core >=1.0.0
    - python 3.10
  run:
    # sync with normalized deps from poetry-generated setup.py
    - markdown-it-py >=2.2.0
    - pygments >=2.13.0,<3.0.0
    - python 3.10
    - typing_extensions >=4.0.0,<5.0.0

tests:
  - python:
      imports:
        - rich
      pip_check: true

about:
  homepage: https://github.com/Textualize/rich
  license: MIT
  license_file: LICENSE
  summary: Render rich text, tables, progress bars, syntax highlighting, markdown and more to the terminal
  description: |
    Rich is a Python library for rich text and beautiful formatting in the terminal.

    The Rich API makes it easy to add color and style to terminal output. Rich
    can also render pretty tables, progress bars, markdown, syntax highlighted
    source code, tracebacks, and more — out of the box.
  documentation: https://rich.readthedocs.io
  repository: https://github.com/Textualize/rich
$ rattler-build build
...
 ╭─ Running build for recipe: rich-13.4.2-pyh4616a5c_0
 │
 │ ╭─ Fetching source code
 │ │ Validated SHA256 values of the downloaded file!
 │ │ Using local source file.
 │ │ Copying source from url: "/tmp/rich/rich-13.4.2.tar.gz" to "/tmp/rich/output/bld/rattler-build_rich_1725435886/work"
...
 │ ╭─ Running build script
 │ │ + python -m pip install . -vv --no-deps --no-build-isolation
 │ │ Using pip 24.2 from $PREFIX/lib/python3.10/site-packages/pip (python 3.10)
 │ │ Non-user install because user site-packages disabled
 │ │ Ignoring indexes: https://pypi.org/simple
 │ │ Created temporary directory: /tmp/pip-build-tracker-l3rer2po
 │ │ Initialized build tracking at /tmp/pip-build-tracker-l3rer2po
 │ │ Created build tracker: /tmp/pip-build-tracker-l3rer2po
 │ │ Entered build tracker: /tmp/pip-build-tracker-l3rer2po
 │ │ Created temporary directory: /tmp/pip-install-149wgmke
 │ │ ERROR: Directory '.' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.
...
$ ls /tmp/rich/output/bld/rattler-build_rich_1725435886/work
build_env.sh  conda_build.sh  rich-13.4.2.tar.gz

The local file was copied to the work directory but wasn't unarchived.

@wolfv
Member

wolfv commented Sep 4, 2024

Thanks! There are a few workarounds, of course (e.g. making pip unarchive the file). I would also be interested if path: /tmp/rich-.tar.gz works differently?

Lastly, I do think you are right and this file should be un-archived to have the same behavior as fetching from a URL.
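
As a rough illustration of the behaviour described above (a local `file://` source being treated like a downloaded archive), here is a minimal Rust sketch. It is not rattler-build's code: the helper names `local_path_from_file_url` and `looks_like_archive`, and the list of extensions, are assumptions made for this example, and only the simple `file:///absolute/path` form is handled.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: turn a `file://` URL into a local path.
/// Only `file:///absolute/path` is supported (empty host, no percent-decoding).
fn local_path_from_file_url(url: &str) -> Option<PathBuf> {
    let rest = url.strip_prefix("file://")?;
    // With an empty host, what follows the scheme must be an absolute path.
    if rest.starts_with('/') {
        Some(PathBuf::from(rest))
    } else {
        None
    }
}

/// Hypothetical helper: guess from the file name whether a source is an archive
/// that should be extracted into the work directory rather than copied as-is.
fn looks_like_archive(path: &Path) -> bool {
    let name = match path.file_name().and_then(|n| n.to_str()) {
        Some(n) => n.to_ascii_lowercase(),
        None => return false,
    };
    // Assumed list of extensions, for illustration only.
    [".tar.gz", ".tgz", ".tar.bz2", ".tar.xz", ".tar", ".zip"]
        .iter()
        .any(|ext| name.ends_with(ext))
}
```

With the recipe above, `file:///tmp/rich/rich-13.4.2.tar.gz` would map to `/tmp/rich/rich-13.4.2.tar.gz` and be flagged as an archive, so it could go through the same extraction step as a downloaded file.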

@beenje
Contributor Author

beenje commented Sep 4, 2024

It's working with path: :-)

 ╭─ Running build for recipe: rich-13.4.2-pyh4616a5c_0
 │
 │ ╭─ Fetching source code
 │ │ Fetching source from path: "/tmp/rich/rich-13.4.2.tar.gz"
 │ │ Extracted to "/tmp/rich/output/bld/rattler-build_rich_1725441621/work"
 │ │
 │ ╰─────────────────── (took 0 seconds)

We can see in the logs that it is extracted.

Using path: instead of url: file:// is fine for me.

Would still be nice to fix the file url behaviour as you said.

@beenje beenje changed the title Building from a local sdist broken in 0.21.0 Building from a local sdist file url broken in 0.21.0 Sep 4, 2024
@DimitriPapadopoulos

DimitriPapadopoulos commented Oct 11, 2024

Same here, but unfortunately path: also fails:

  • An url: key followed by a file:// URL fails to build:

    source:
      url: file:///path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip
    rattler-build just copies the ZIP file, does not unarchive it
     │ ╭─ Fetching source code
     │ │ Validated SHA256 values of the downloaded file!
     │ │ Using local source file.
     │ │ Copying source from url: "/path/to//matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip" to "/tmp/channel/bld
     │ │ /rattler-build_matlab-runtime_1728648393/work"
     │ │
     │ ╰─────────────────── (took 69 seconds)
    
  • An url: key followed by an https:// URL works just fine:

    source:
      url: https://ssd.mathworks.com/supportfiles/downloads/R2019b/Release/9/deployment_files/installer/complete/glnxa64/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip
    rattler-build unarchives the ZIP file
     │ ╭─ Fetching source code
     │ │ Validated SHA256 values of the downloaded file!
     │ │ Found valid source cache file.
     │ │ Using extracted directory from cache: "/tmp/channel/src_cache/MATLAB_Runtime_R2019b_Update_9_glnxa64_d213e296"
     │ │ Copying source from url: "/tmp/channel/src_cache/MATLAB_Runtime_R2019b_Update_9_glnxa64_d213e296" to "/tmp/channel/bld/rattler-
     │ │ build_matlab-runtime_1728648935/work"
     │ │
     │ ╰─────────────────── (took 32 seconds)
    
  • A path: key initially seemed to work equally fine, but rattler-build keeps unarchiving forever:

      source:
        path: /path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip
    rattler-build attempts to unarchive the ZIP file, but unzipping lasts forever...
     │ ╭─ Fetching source code
     │ │ Fetching source from path: "/path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip"
     │ │ ⠤ Extracting zip       [00:04:53] [━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╾──────] 2.16 GiB @ 7.56 MiB/s  
    

The issue might be that rattler-build is unable to handle ZIP files larger than 2 GB:

$ ls -lh /path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip
-rwxrwx---+ 1 username nogroup 2.6G Aug 12  2021 /path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip
$ 

@wolfv
Member

wolfv commented Oct 11, 2024

@DimitriPapadopoulos thank you for the detailed write-up!

4:53 doesn't sound like forever to me. Also the indicator is still going at 7.50 MiB/s. I am wondering if it's just slow? Do you have a reference for how long it should take to extract?

Ah, I see that in the URL case it takes only 30 seconds so something is wrong. I'll have to take a look.

@DimitriPapadopoulos

While /path/to is indeed on a network (NFS) share, our workstations have 1 Gb/s network interfaces and our storage infrastructure is a CephFS cluster with quite decent throughput:

$ rsync --progress /path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip /tmp/
MATLAB_Runtime_R2019b_Update_9_glnxa64.zip
  2,786,688,287 100%  448.76MB/s    0:00:05 (xfr#1, to-chk=0/1)
$ 

@DimitriPapadopoulos

I'm not used to building/running Rust programs, but chances are function extract_zip stalls in our context:

/// `.zip` files archived with compression other than deflate would fail.
pub(crate) fn extract_zip(
    archive: impl AsRef<Path>,
    target_directory: impl AsRef<Path>,
    log_handler: &LoggingOutputHandler,
) -> Result<(), SourceError> {
    let archive = archive.as_ref();
    let target_directory = target_directory.as_ref();
    fs::create_dir_all(target_directory)?;

    let len = archive.metadata().map(|m| m.len()).unwrap_or(1);
    let progress_bar = log_handler.add_progress_bar(
        indicatif::ProgressBar::new(len)
            .with_finish(indicatif::ProgressFinish::AndLeave)
            .with_prefix("Extracting zip")
            .with_style(log_handler.default_bytes_style()),
    );

    let mut archive = zip::ZipArchive::new(progress_bar.wrap_read(
        File::open(archive).map_err(|_| SourceError::FileNotFound(archive.to_path_buf()))?,
    ))
    .map_err(|e| SourceError::InvalidZip(e.to_string()))?;

    let tmp_extraction_dir = tempfile::Builder::new().tempdir_in(target_directory)?;
    archive
        .extract(&tmp_extraction_dir)
        .map_err(|e| SourceError::ZipExtractionError(e.to_string()))?;

    move_extracted_dir(tmp_extraction_dir.path(), target_directory)?;
    progress_bar.finish_with_message("Extracted...");

    Ok(())
}

Could it be that MATLAB_Runtime_R2019b_Update_9_glnxa64.zip is "archived with compression other than deflate"?

@wolfv
Member

wolfv commented Oct 11, 2024

Would you be able to try with the file on the same filesystem? It could be related to NFS, after all.

@DimitriPapadopoulos

Will try next week.

By the way, the compression method is either defX or stor for all entries in the ZIP file, nothing exotic here:

$ zipinfo -l /path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip | grep -v -e ' defX '  -e ' stor '
Archive:  /path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip
Zip file size: 2786688287 bytes, number of entries: 5487
5487 files, 2989357849 bytes uncompressed, 2785399227 bytes compressed:  6.8%
$ 

@DimitriPapadopoulos

DimitriPapadopoulos commented Oct 11, 2024

My workstation was updated from Ubuntu 22.04 to Ubuntu 24.04 a few days ago. I wonder whether a filesystem issue could plague it. After "heavy use" (typically running rattler-build to build from simple but large sources) Google Chrome starts complaining (without reason) about invalid site certificates or identifies other sites as non-existent. I couldn't find anything suspicious in the system logs. I will try on a machine still running Ubuntu 22.04, this might be totally unrelated to rattler-build — could be a Linux kernel bug.

@wolfv
Member

wolfv commented Oct 11, 2024

That sounds strange. rattler-build itself should not modify anything system-wide. Of course, I don't know what the build scripts are doing.

@DimitriPapadopoulos

Oh, I mean it wouldn't be a rattler-build issue, rather a Linux kernel bug triggered by something specific to rattler-build operation, perhaps manipulating lots of hardlinks.

@DimitriPapadopoulos

The scripts are very simple, they just unzip and don't even test. For example:
https://github.com/neurospin/neuro-forge/pull/15/files

@DimitriPapadopoulos

DimitriPapadopoulos commented Oct 14, 2024

My issue was probably a Linux kernel issue, or more generally a system issue. Today, ZIP extraction works just fine, either from the local file system:

 │ ╭─ Fetching source code
 │ │ Fetching source from path: "/tmp/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip"
 │ │ Extracted zip to "/tmp/channel/bld/rattler-build_matlab-runtime_1728884183/work"
 │ │
 │ ╰─────────────────── (took 32 seconds)

or the NFS share:

 │ ╭─ Fetching source code
 │ │ Fetching source from path: "/path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip"
 │ │ Extracted zip to "/tmp/channel/bld/rattler-build_matlab-runtime_1728885071/work"
 │ │
 │ ╰─────────────────── (took 31 seconds)

DimitriPapadopoulos added a commit to neurospin/neuro-forge that referenced this issue Oct 29, 2024
Allow multiple versions of the Matlab Runtime

For now, keep the package non-relocatable. The `patchelf` tool fails
with obscure error messages on MATLAB Runtime binaries.

While developing, we retrieve the binary locally because https:// is
too damn slow and NFS breaks rattler-builder:
prefix-dev/rattler-build#1045 (comment)
DimitriPapadopoulos added a commit to neurospin/neuro-forge that referenced this issue Oct 29, 2024
Allow multiple versions of the Matlab Runtime

For now, keep the package non-relocatable. The `patchelf` tool fails
with obscure error messages on MATLAB Runtime binaries.

While developing, we retrieve from local disk because retrieving the
binary from MathWorks using HTTPS is damn slow and NFS breaks rattler-builder:
prefix-dev/rattler-build#1045 (comment)
@DimitriPapadopoulos

DimitriPapadopoulos commented Oct 29, 2024

Unfortunately, I am again having freezing issues with path: pointing to an NFS share. Yet, unzipping from that same NFS share works without problem:

$ time unzip /path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip
Archive:  /path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip
  inflating: sys/os/glnxa64/libgcc_s.so.1  
  inflating: sys/os/glnxa64/README.libstdc++  
    linking: sys/os/glnxa64/libstdc++.so.6  -> libstdc++.so.6.0.22 
  inflating: sys/os/glnxa64/libstdc++.so.6.0.22  
 extracting: sys/java/jre/glnxa64/jre/LICENSE  
 extracting: sys/java/jre/glnxa64/jre/bin/ControlPanel  
 .
 .
 .
 .
 .
  inflating: productdata/35212.txt   
finishing deferred symbolic links:
  sys/os/glnxa64/libstdc++.so.6 -> libstdc++.so.6.0.22
  bin/glnxa64/libcrypto.so.1 -> libcrypto-mw.so.1.1
  bin/glnxa64/libssl.so.1 -> libssl-mw.so.1.1

real	0m28,605s
user	0m24,789s
sys	0m3,682s
$ 

I don't see anything relevant in the system logs.

@wolfv
Member

wolfv commented Oct 29, 2024

Hmm, maybe we need to use a BufReader or something like that somewhere ...

@wolfv
Member

wolfv commented Oct 29, 2024

@DimitriPapadopoulos it was indeed missing a BufReader: #1144 - I believe this will help nicely in your case.

@DimitriPapadopoulos

@wolfv Thank you very much for looking into this issue. I don't know much about Rust, but I understand it provides unbuffered I/O by default and that unbuffered I/O can be slow due to repeated system calls. Yet progress_bar.wrap_read really felt like it was frozen. Anyway, I probably won't have time to test a specific commit, but I will make sure to test the next release. Again, thank you very much.

@wolfv
Member

wolfv commented Oct 30, 2024

@DimitriPapadopoulos - the progress bar is just for showing the progress. The main problem was the unbuffered read, which results in many more system calls and is generally slow. I am very sure that this can be exacerbated by slow disks / NFS filesystems. We already had this optimization for the Tar-file reader but missed it for Zip.

I already made the release so you can try out 0.28.2 whenever you have time. I am quite sure that it should give you a decent improvement :)
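
To make the buffering point concrete, here is a small self-contained Rust sketch. It is not the change from #1144, only an illustration under stated assumptions: `CountingReader` and `count_underlying_reads` are hypothetical names, and an in-memory `Cursor` stands in for the file. It counts how often `read` reaches the underlying reader when small fields are read with and without `std::io::BufReader`; with a real `File`, each of those calls would typically be a system call.

```rust
use std::io::{BufReader, Cursor, Read};

/// Wrapper that counts how many times `read` reaches the underlying reader.
/// With a real `File`, each of those calls would typically be a system call.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Read `data` four bytes at a time (like parsing small header fields) and
/// return the number of calls that reached the underlying reader.
fn count_underlying_reads(data: &[u8], buffered: bool) -> usize {
    let counting = CountingReader { inner: Cursor::new(data), calls: 0 };
    let mut field = [0u8; 4];
    if buffered {
        // BufReader fetches large chunks and serves the small reads from memory.
        let mut reader = BufReader::new(counting);
        while reader.read_exact(&mut field).is_ok() {}
        reader.into_inner().calls
    } else {
        // Every small read goes straight to the underlying reader.
        let mut reader = counting;
        while reader.read_exact(&mut field).is_ok() {}
        reader.calls
    }
}
```

Calling `count_underlying_reads` on the same 64 KiB buffer with `buffered` set to `false` and then `true` should show the buffered variant reaching the underlying reader far less often. The exact numbers depend on `BufReader`'s default capacity.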

@DimitriPapadopoulos

DimitriPapadopoulos commented Oct 30, 2024

Just upgraded to 0.28., it's still slow. The throughput shown by the progress bar keeps dropping forever:

 ╭─ Running build for recipe: matlab-runtime-9.7-9-hb0f4dca_0
 │
 │ ╭─ Fetching source code
 │ │ Fetching source from path: /path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip
 │ │ ⠦ Extracting zip       [00:00:13] [━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╾] 2.54 GiB @ 195.72 MiB/s
 │ │ Fetching source from path: /path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip
 │ │ ⠉ Extracting zip       [00:01:13] [━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╾─] 2.49 GiB @ 34.93 MiB/s
 │ │ Fetching source from path: /path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip
 │ │ ⠦ Extracting zip       [00:33:10] [━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╾─────────] 1.95 GiB @ 1.00 MiB/s

@wolfv
Member

wolfv commented Oct 30, 2024

argh. Just to be sure - 0.28.2, right?

@DimitriPapadopoulos

Yes, it's 0.28.2 (I forgot to copy/paste the output of --version):

$ rattler-build --version
rattler-build 0.28.2
$ 

@DimitriPapadopoulos

DimitriPapadopoulos commented Oct 30, 2024

Image

When I start rattler-build, I see:

  • a surge of CPU use (with one proc at 100 %) without much network traffic,
  • then (receiving) network traffic kicks in and eventually oscillates well under 1000 KiB/s and CPU use drops to almost nothing (see screen capture), while the throughput displayed by rattler-build drops drastically,
  • when I forcibly stop rattler-build with Ctrl+C, network traffic immediately drops to 0.

In short, at the system level, network resources are not used as they should be. When running unzip /path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip, receiving network traffic steadily peaks at ~ 80 MiB/s, which is consistent with the 1 Gb/s link of the workstation.

Nothing in the system logs.

Note that file /path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip is > 2 GB, but then it's not a problem with path: /tmp/... or url: https://....

@wolfv
Member

wolfv commented Oct 30, 2024

Where is your output folder located and the corresponding src_cache folder? Is that also on the network drive?
I am not really sure what we're doing wrong .. I had high hopes for the BufReader! :)

@DimitriPapadopoulos

The output dir is /tmp/channel, it's the local disk.

@DimitriPapadopoulos

Now about the cache. We used to have home dirs on NFS servers, but that's not the case any more. Besides, even with home dirs on NFS servers, we used to point the environment variable XDG_CACHE_HOME to local disk. Where it gets interesting is that I run rattler-build through the script of a colleague, which executes env HOME=/tmp/channel rattler-build in an effort to make doubly sure the cache is local. Let me try to slim that down:

Initial command:

env HOME=/tmp/channel rattler-build build -r /local/disk/recipes/matlab-runtime-9.7 --output-dir /tmp/channel --experimental -c conda-forge -c bioconda

Slimmed-down command:

rattler-build build -r /local/disk/recipes/matlab-runtime-9.7 --output-dir /tmp/channel -c conda-forge

Unfortunately it remains as slow as before. I'm not sure how to further investigate. Do you have a Rust code snippet that unzips a file I could try to build and test locally? I wouldn't be surprised if it were a Rust bug.

@DimitriPapadopoulos

What does progress_bar.wrap_read really do? Could it be that it somehow adversely affects disk reads?

@wolfv
Member

wolfv commented Oct 30, 2024

I kicked off a build that you could try for debugging: #1146 ...

And when you run unzip locally, you also extract to that same /tmp/... folder?

@wolfv
Member

wolfv commented Oct 30, 2024

@DimitriPapadopoulos

I unzip in /tmp:

$ mkdir /tmp/channel
$ 
$ cd /tmp/channel/
$ 
$ time unzip /path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip 
Archive:  /path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip
  inflating: sys/os/glnxa64/libgcc_s.so.1  
  inflating: sys/os/glnxa64/README.libstdc++  
  .
  .
  .
  inflating: productdata/35212.txt   
finishing deferred symbolic links:
  sys/os/glnxa64/libstdc++.so.6 -> libstdc++.so.6.0.22
  bin/glnxa64/libcrypto.so.1 -> libcrypto-mw.so.1.1
  bin/glnxa64/libssl.so.1 -> libssl-mw.so.1.1

real	0m44,915s
user	0m30,763s
sys	0m6,814s
$ 

@DimitriPapadopoulos

DimitriPapadopoulos commented Oct 30, 2024

I do see a x86_64-unknown-linux-musl build, but am not sure how to install/run locally (I am new to Rust). Is it as simple as git clone and cargo build?

EDIT: Ah, just found the binaries.

@DimitriPapadopoulos

Here is the output, rattler-build gets stuck there "forever":

$ ~/Downloads/rattler-build-x86_64-unknown-linux-musl/rattler-build build -r /local/disk/recipes/matlab-runtime-9.7 --output-dir /tmp/channel -c conda-forge

 ╭─ Finding outputs from recipe
 │ Found 1 variants
 │ Build variant: matlab-runtime-9.7-9-hb0f4dca_0

 │ ╭─────────────────┬──────────╮
 │ │ Variant         ┆ Version  │
 │ ╞═════════════════╪══════════╡
 │ │ target_platform ┆ linux-64 │
 │ ╰─────────────────┴──────────╯

 ╰─────────────────── (took 0 seconds)

 ╭─ Running build for recipe: matlab-runtime-9.7-9-hb0f4dca_0

 │ ╭─ Fetching source code
 │ │ Fetching source from path: /path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip
 │ │ Starting zip extraction
 │ │ Zip file size: 2786688287
 │ │ Extracting zip file: "/path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip" to "/tmp/channel/bld/rat
 │ │ tler-build_matlab-runtime-9.7_1730300983/work"

I don't see anything in the extraction directory:

$ ls -a /tmp/channel/bld/rattler-build_matlab-runtime-9.7_1730300983/work/
.
..
$ 

@wolfv
Member

wolfv commented Oct 30, 2024

Hmm, this might be completely unrelated, but your version number is also broken. It should not contain a - ...

matlab-runtime-9.7-9-hb0f4dca_0 is illegal because conda expects to have the trailing pieces to be separated by dashes e.g. <name>-<version>-<buildstring>.<ext>. Can you change the version to read _9 instead? I wonder if that has an effect .. probably not. But it does change the folder to which things should get extracted
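
The `<name>-<version>-<buildstring>` convention mentioned above can be sketched in Rust as a split from the right. This is only an illustration: `split_package_id` is a hypothetical helper, not a function from rattler-build or conda. Because the last two dashes are taken as separators, a name may contain dashes but a version may not.

```rust
/// Hypothetical helper: split `<name>-<version>-<buildstring>` from the right.
/// The name may contain dashes; the version and build string may not.
fn split_package_id(id: &str) -> Option<(&str, &str, &str)> {
    // `rsplitn(3, '-')` yields at most three pieces, starting from the end.
    let mut parts = id.rsplitn(3, '-');
    let build = parts.next()?;
    let version = parts.next()?;
    let name = parts.next()?;
    Some((name, version, build))
}
```

Applied to `matlab-runtime-9.7-9-hb0f4dca_0`, this right-to-left split reads the name as `matlab-runtime-9.7` and the version as `9`. A version written as `9.7-9` could not be recovered this way.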

@DimitriPapadopoulos

DimitriPapadopoulos commented Oct 30, 2024

The package name is matlab-runtime-9.7. The version is 9. Is this illegal? Do you really want me to change the version to _9?

EDIT: At least for the sake of debugging, I guess I need to change the package name to matlab-runtime and the version to 9.7.9.

Our "requirement" is to be able to install multiple versions of the MATLAB Runtime package side by side. We need to share the MATLAB Runtime between packages that use the same version of the MATLAB Runtime, because it is really huge - latest versions weigh 4.5 GB. At the same time, not all packages depend on the same version of the MATLAB Runtime. I understand that the proper way to do that is to have each package bundle its own MATLAB Runtime instead of depending on a specific version of an external MATLAB Runtime package, but then we end up pulling successive dependencies weighing 5 GB each, which my colleagues feel becomes intractable.

@DimitriPapadopoulos

Mmmh... Your debug version doesn't work much better using a local copy of MATLAB_Runtime_R2019b_Update_9_glnxa64.zip. It gets stuck "forever" too:

$ ~/Downloads/rattler-build-x86_64-unknown-linux-musl/rattler-build build -r /local/disk/recipes/matlab-runtime-9.7 --output-dir /tmp/channel -c conda-forge

 ╭─ Finding outputs from recipe
 │ Found 1 variants
 │ Build variant: matlab-runtime-9.7-9-hb0f4dca_0

 │ ╭─────────────────┬──────────╮
 │ │ Variant         ┆ Version  │
 │ ╞═════════════════╪══════════╡
 │ │ target_platform ┆ linux-64 │
 │ ╰─────────────────┴──────────╯

 ╰─────────────────── (took 0 seconds)

 ╭─ Running build for recipe: matlab-runtime-9.7-9-hb0f4dca_0

 │ ╭─ Fetching source code
 │ │ Fetching source from path: /tmp/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip
 │ │ Starting zip extraction
 │ │ Zip file size: 2786688287
 │ │ Extracting zip file: "/tmp/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip" to "/tmp/channel/bld/rattler-build_matlab-runtime-9.7_17
 │ │ 30301459/work"

@DimitriPapadopoulos

Changed the naming/versioning scheme. Same thing, stuck again with a local /tmp/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip "source" file:

$ ~/Downloads/rattler-build-x86_64-unknown-linux-musl/rattler-build build -r /local/disk/recipes/matlab-runtime-9.7 --output-dir /tmp/channel -c conda-forge

 ╭─ Finding outputs from recipe
 │ Found 1 variants
 │ Build variant: matlab-runtime-9.7.9-hb0f4dca_0

 │ ╭─────────────────┬──────────╮
 │ │ Variant         ┆ Version  │
 │ ╞═════════════════╪══════════╡
 │ │ target_platform ┆ linux-64 │
 │ ╰─────────────────┴──────────╯

 ╰─────────────────── (took 0 seconds)

 ╭─ Running build for recipe: matlab-runtime-9.7.9-hb0f4dca_0

 │ ╭─ Fetching source code
 │ │ Fetching source from path: /tmp/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip
 │ │ Starting zip extraction
 │ │ Zip file size: 2786688287
 │ │ Extracting zip file: "/tmp/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip" to "/tmp/channel/bld/rattler-build_matlab-runtime_173030
 │ │ 2794/work"

Let me reboot though, just in case.

@wolfv
Member

wolfv commented Oct 30, 2024

I started a more isolated repository that we can test things on... https://github.com/wolfv/zippy

You can run this code locally with cargo r -- --help. I am downloading the big zip file right now.

@wolfv
Member

wolfv commented Oct 30, 2024

I am looking at a flamegraph and there is a lot of seeking before the extraction starts. I wonder if that's really slow over NFS (as far as I understand this tries to find all the entries of the archive before starting the extraction).

Image
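
For context on why a ZIP reader seeks before extracting anything: the index of entries (the central directory) is stored at the end of the file, so a reader first has to locate the end-of-central-directory record. The Rust sketch below shows that step at the format level only. It is not the zip crate's implementation, `find_eocd` and `Eocd` are hypothetical names, and ZIP64 archives are not handled.

```rust
use std::io::{self, Read, Seek, SeekFrom};

const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06]; // "PK\x05\x06"
const EOCD_MIN_SIZE: u64 = 22; // fixed part of the record, without the comment

/// Two fields from the end-of-central-directory record (no ZIP64 handling).
#[derive(Debug, PartialEq)]
struct Eocd {
    total_entries: u16,
    central_directory_offset: u32,
}

/// Hypothetical helper: locate the record by looking at the tail of the file.
fn find_eocd<R: Read + Seek>(reader: &mut R) -> io::Result<Option<Eocd>> {
    let file_len = reader.seek(SeekFrom::End(0))?;
    if file_len < EOCD_MIN_SIZE {
        return Ok(None);
    }
    // The record is followed only by an optional comment of at most 65535 bytes.
    let tail_len = file_len.min(EOCD_MIN_SIZE + u16::MAX as u64);
    reader.seek(SeekFrom::Start(file_len - tail_len))?;
    let mut tail = vec![0u8; tail_len as usize];
    reader.read_exact(&mut tail)?;

    // Scan backwards for the signature, starting at the last possible position.
    let last_start = tail.len() - EOCD_MIN_SIZE as usize;
    for start in (0..=last_start).rev() {
        if tail[start..].starts_with(&EOCD_SIGNATURE) {
            let rec = &tail[start..];
            return Ok(Some(Eocd {
                // Bytes 10..12: total number of entries (little endian).
                total_entries: u16::from_le_bytes([rec[10], rec[11]]),
                // Bytes 16..20: offset of the central directory (little endian).
                central_directory_offset: u32::from_le_bytes([
                    rec[16], rec[17], rec[18], rec[19],
                ]),
            }));
        }
    }
    Ok(None)
}
```

After this step a reader still has to seek to the central directory and walk its entries. On a slow or high-latency filesystem, many small seeks and reads at this stage can add up before any data is extracted.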

@wolfv
Member

wolfv commented Oct 30, 2024

@DimitriPapadopoulos - I think we're not alone: zip-rs/zip2#231 ... Hopefully it gets fixed upstream and we might also be able to dedicate a few cycles to this. We do have a lot of other things on our hands as well though, so I can't make big promises right now. There is a second issue with the zip crate in rattler-build and large archives though (#1147), so we might wanna prioritize this at some point.

Is it possible for you to work around this issue at the moment by e.g. copying the file to your disk before starting the build? Or is it a complete blocker?

@wolfv
Member

wolfv commented Oct 31, 2024

Hi @DimitriPapadopoulos - there is a chance that the new builds are much faster: https://github.com/prefix-dev/rattler-build/actions/runs/11607783736 - I pulled in the changes from the PR that I linked above. Would love to hear if that changes things for you and thank you again for all the help debugging this!

@DimitriPapadopoulos

For now I do work around this issue by copying the file locally.

Hopefully I understand enough of Rust to run the new branch. I'll spend some limited time on that task today.
https://github.com/prefix-dev/rattler-build/actions/runs/11607783736

@wolfv
Member

wolfv commented Oct 31, 2024

You don't need to learn any Rust. Just download the artifact for your platform, extract and try :) E.g. for linux-64 it would be this one: https://github.com/prefix-dev/rattler-build/actions/runs/11607783736/artifacts/2126986446

@DimitriPapadopoulos

DimitriPapadopoulos commented Oct 31, 2024

Good news, I get decent unzipping times using the latest experimental version.

Unzipping MATLAB_Runtime_R2019b_Update_9_glnxa64.zip from the local disk:

$ ~/Downloads/rattler-build-x86_64-unknown-linux-musl/rattler-build build -r /local/disk/recipes/matlab-runtime-9.7 --output-dir /tmp/channel -c conda-forge

 ╭─ Finding outputs from recipe
 │ Found 1 variants
 │ Build variant: matlab-runtime-9.7-9-hb0f4dca_0

 │ ╭─────────────────┬──────────╮
 │ │ Variant         ┆ Version  │
 │ ╞═════════════════╪══════════╡
 │ │ target_platform ┆ linux-64 │
 │ ╰─────────────────┴──────────╯

 ╰─────────────────── (took 0 seconds)

 ╭─ Running build for recipe: matlab-runtime-9.7-9-hb0f4dca_0

 │ ╭─ Fetching source code
 │ │ Fetching source from path: /tmp/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip
 │ │ Extracted zip to /tmp/channel/bld/rattler-build_matlab-runtime-9.7_1730380055/work
 │ │
 │ ╰─────────────────── (took 15 seconds)
[...]
$ 

Unzipping MATLAB_Runtime_R2019b_Update_9_glnxa64.zip from the NFS server:

$ ~/Downloads/rattler-build-x86_64-unknown-linux-musl/rattler-build build -r /local/disk/recipes/matlab-runtime-9.7 --output-dir /tmp/channel -c conda-forge

 ╭─ Finding outputs from recipe
 │ Found 1 variants
 │ Build variant: matlab-runtime-9.7-9-hb0f4dca_0

 │ ╭─────────────────┬──────────╮
 │ │ Variant         ┆ Version  │
 │ ╞═════════════════╪══════════╡
 │ │ target_platform ┆ linux-64 │
 │ ╰─────────────────┴──────────╯

 ╰─────────────────── (took 0 seconds)

 ╭─ Running build for recipe: matlab-runtime-9.7-9-hb0f4dca_0

 │ ╭─ Fetching source code
 │ │ Fetching source from path: /path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip
 │ │ Extracted zip to /tmp/channel/bld/rattler-build_matlab-runtime-9.7_1730380270/work
 │ │
 │ ╰─────────────────── (took 80 seconds)
[...]
$ 

Note that zip-rs remains largely suboptimal compared to unzip, I would have expected Rust to have a decent zip library:

$ time unzip -q /path/to/matlab-runtime/MATLAB_Runtime_R2019b_Update_9_glnxa64.zip 

real	0m30,287s
user	0m26,266s
sys	0m4,004s
$ 

Anyway, we're back to a tractable situation at least. Thank you very much for fixing this issue.

@wolfv
Member

wolfv commented Oct 31, 2024

great, thanks for testing. Seems a lot better than before. There is another zip library we could try, async-zip if things don't get better :)

@wolfv
Member

wolfv commented Nov 6, 2024

I finally got around to fixing the original issue in #1164 @beenje

@DimitriPapadopoulos the latest release of rattler-build (0.29.0) ships with a patched zip crate. Hopefully things get merged soon upstream so that we can switch back! Thank you for your help with the debugging of this issue!

@beenje
Contributor Author

beenje commented Nov 14, 2024

I tested the original issue with rattler-build 0.30.0 and confirm it's fixed. Thanks @wolfv !
Closing the issue.

@beenje beenje closed this as completed Nov 14, 2024
@wolfv
Member

wolfv commented Nov 14, 2024

Thanks!
