
document/best practice for tracking ABI apart from package version #2401

Open
minrk opened this issue Dec 10, 2024 · 8 comments

@minrk
Member

minrk commented Dec 10, 2024

Your question:

Many libraries version their ABI separately from their API/package version. ABI stability varies widely, with some being extremely stable (major package revisions don't break the ABI for years and are backward compatible), while others are the opposite (patch releases can break the ABI). Right now, we only have a standard of semver-pinning the package (with varying degrees of strictness, as appropriate), assuming the package version is a good-enough proxy. I think that works okay most of the time.

What I'd like is to document a standard/recommended approach for packagers who want to explicitly track the ABI, so that it's not the wild west of different strategies for different packages. I expect this to be a relatively small number of packages, only those that have explicit ABI stability plans / versioning.

Strategies include:

  • put the SOVERSION in the package name, as most Linux distros do (e.g. libzmq5). This technically makes it possible to install multiple ABI versions concurrently, if we really want that (I'm not sure we do)
  • track a versioned libsomething_abi package whose major version is the SOVERSION (the minor/patch version may not always be obvious, but could come from the package version or the library filename). There is some precedent in python_abi, though it has more to consider than the version.
  • others?
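To make the first strategy concrete, here is a minimal sketch of deriving a runtime package name from a library's SONAME. It assumes a Debian-style convention: the lowercased library name followed by the SOVERSION, with a hyphen inserted when the name already ends in a digit. The helper `package_name_from_soname` is hypothetical. It handles only single-component SOVERSIONs, so a SONAME like `libLLVM.so.19.1` is rejected.

```python
import re

# Hypothetical helper, assuming a Debian-style naming convention:
#   "libzmq.so.5"  -> "libzmq5"
#   "libfoo2.so.3" -> "libfoo2-3" (hyphen because the name ends in a digit)
SONAME_RE = re.compile(r"^(?P<name>lib[^/]+?)\.so\.(?P<soversion>\d+)$")


def package_name_from_soname(soname: str) -> str:
    match = SONAME_RE.match(soname)
    if match is None:
        # Multi-component SONAMEs such as libLLVM.so.19.1 are not handled here.
        raise ValueError(f"not a single-SOVERSION ELF SONAME: {soname!r}")
    name = match.group("name").lower()
    soversion = match.group("soversion")
    separator = "-" if name[-1].isdigit() else ""
    return f"{name}{separator}{soversion}"
```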

It would be nice to have written down:

  • a preferred strategy and package naming
  • how run_exports/constraints should be done
  • migration process for existing packages (repodata patches, if needed)

As the simplest possible example, libfabric has an explicitly versioned ABI. The upcoming 2.0 release drops some old APIs but keeps the ABI backward-compatible, so it is a minor ABI release (ABI 1.8). This cannot be represented as a constraint on the package version. <2.0 excludes a compatible version, and <3.0 may also exclude a compatible version; we cannot know until 3.0 is planned or released.

So the correct run_exports should make a build against libfabric 1.19.1 runtime-compatible with:

  • libfabric 1.14 (older, same abi)
  • libfabric 2.0 (newer, major package release, minor abi release)

but if/when there is an ABI break, the incompatible version must not be accepted at runtime, even though the package version in which the break will happen cannot be known in advance.
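The mismatch between package-version pins and ABI compatibility can be checked mechanically. The sketch below is minimal and rests on one assumption: an ABI version is a (major, minor) pair, and a build works at runtime with any library that has the same ABI major and at least the ABI minor it was built against. Two entries in the version table come from this thread: 1.19 is ABI 1.6 and 2.0 is ABI 1.8. The 1.14 entry is inferred from a later comment, and the 3.0 entry is a purely hypothetical future break.

```python
# Package version -> ABI version (major, minor).
ABI_BY_PACKAGE_VERSION = {
    (1, 14): (1, 6),  # assumption: inferred from the thread's ">=1.14.0" remark
    (1, 19): (1, 6),  # from the thread
    (2, 0): (1, 8),   # from the thread: 2.0 is a minor ABI release
    (3, 0): (2, 0),   # hypothetical future ABI break
}


def abi_compatible(built_against: tuple, runtime: tuple) -> bool:
    """ABI rule: same ABI major (no break) and no older ABI minor."""
    built_abi = ABI_BY_PACKAGE_VERSION[built_against]
    runtime_abi = ABI_BY_PACKAGE_VERSION[runtime]
    return runtime_abi[0] == built_abi[0] and runtime_abi[1] >= built_abi[1]


def semver_pin_allows(built_against: tuple, runtime: tuple) -> bool:
    """Typical package-version pin, e.g. '>=1.19,<2': same major, not older."""
    return runtime[0] == built_against[0] and runtime >= built_against
```

For a build against 1.19, the package-version pin rejects both 1.14 and 2.0. The ABI rule accepts both and rejects only the hypothetical 3.0.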

My current thought is to have:

  • an empty libfabric_abi metapackage, whose version matches the documented ABI version (set the build string to avoid/ensure duplicate uploads?)
  • libfabric has:
    ```yaml
    run_exports:
      - {{ pin_subpackage('libfabric_abi', max_pin='x') }}
      - libfabric  # unconstrained
    run:
      - libfabric_abi {{ abi_version }}.*
    ```

which would allow safe backward and forward ABI compatibility. One consequence of leaving libfabric unconstrained is that we would need repodata patches to make all published libfabric builds depend on libfabric_abi. Without them, the run_exports won't constrain libfabric builds that were published without that dependency. We could avoid the repodata patch if libfabric_abi also had a run_constrained on libfabric that excludes any previously published libfabric versions that are not ABI-compatible.

but I don't want to do this if other strategies are preferred and would be (or already are?) taken elsewhere in conda-forge.
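Why the repodata patch is needed can be seen in a toy model of the proposed scheme. This is a minimal sketch; the record type, the helper names and the old build's (1, 0) ABI are all hypothetical. A downstream package gets `libfabric` unconstrained plus a `libfabric_abi` range, roughly what `max_pin='x'` expresses. A libfabric build whose record has no `libfabric_abi` dependency is never tied to an ABI version, so it slips through. The run_constrained alternative mentioned above is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Version = Tuple[int, int]


@dataclass(frozen=True)
class LibfabricBuild:
    """Hypothetical, simplified stand-in for a libfabric repodata record."""
    version: Version
    abi: Optional[Version]  # from its libfabric_abi dependency; None if it has none


def downstream_accepts(build: LibfabricBuild, abi_lower: Version) -> bool:
    """Would a package built against ABI `abi_lower` accept this build?

    Assumed downstream requirements: 'libfabric' (unconstrained) and
    'libfabric_abi' >= abi_lower and below the next ABI major.
    """
    if build.abi is None:
        # Nothing ties this build to a libfabric_abi version: the solver can
        # install a matching libfabric_abi separately and still pick this build.
        return True
    return build.abi[0] == abi_lower[0] and build.abi >= abi_lower


def patch_record(build: LibfabricBuild, known_abi: Version) -> LibfabricBuild:
    """Stand-in for a repodata patch adding the missing libfabric_abi dependency."""
    if build.abi is not None:
        return build
    return LibfabricBuild(build.version, known_abi)


# Hypothetical old build, published before libfabric_abi existed, with an older ABI.
old = LibfabricBuild(version=(1, 0), abi=None)
print(downstream_accepts(old, (1, 6)))                        # True: slips through
print(downstream_accepts(patch_record(old, (1, 0)), (1, 6)))  # False once patched
```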

links:

This is related to #2326 in that it is another case where the package version doesn't adequately capture the ABI, but in that case there are other inputs to the ABI (the compiler version), whereas this is purely about the contents of the package.

minrk added the question label Dec 10, 2024
@isuruf
Member

isuruf commented Dec 10, 2024

@carterbox
Member

I actually wrote a CFEP a few years ago to propose tracking the ABI in the build string, but @isuruf convinced me that appending the soname to the package name is the better approach.

This approach has already been deployed by me and others in multiple feedstocks, including:

https://github.com/conda-forge/dav1d-feedstock
https://github.com/conda-forge/libavif-feedstock
https://github.com/conda-forge/llvmdev-feedstock
https://github.com/conda-forge/openlibm-feedstock
https://github.com/conda-forge/libsecp256k1-feedstock
https://github.com/conda-forge/libnvpl-fft-feedstock

@minrk
Member Author

minrk commented Dec 11, 2024

Great, thanks for linking to that prior discussion! Sorry I didn't find it in my search.

There does seem to be some inconsistency in how files are split among the packages that already do something like this. For example:

  • libgfortran is an empty metapackage, depending on an exact pin of libgfortran5, which contains libgfortran.so and libgfortran.so.5 (meaning libgfortran5 conflicts with libgfortran4)
  • llvmdev contains libLLVM.so and has an exact pin on libllvm19, which contains libLLVM-19.so and libLLVM.so.19.1. There is no libllvm package
  • dav1d 1.5 pins libdav1d7 only to >=1.5,<2; dav1d-dev contains libdav1d.so and has an exact pin on libdav1d7, which contains libdav1d.so.7

So these packages don't quite follow the same conventions as each other regarding which files go in which package, how the dependencies relate, and how the split outputs are named. I'm not suggesting any of these need to change, but they do suggest to me that it would be a good idea to have a canonical example for other packages to follow.

For the libfabric example, the ABI has its own version number. Would we use that version number anywhere, or only to look up the package version to use as the lower bound in run_exports? E.g., libfabric 1.19 is ABI 1.6, so it would have run_exports libfabric1 >=1.14.0? It shouldn't have an upper bound, since that is handled by the soversion in the name, and libfabric1 2.0 should be accepted as ABI-compatible with libfabric1 1.14.0.
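If the ABI version is used only to find the run_exports lower bound, the lookup is mechanical. This minimal sketch assumes a hand-maintained table recording the first package release for each ABI version, and that ABI minors are backward-compatible additions. The table and the helper name are hypothetical. The 1.6 → 1.14.0 pairing follows this comment; the 1.8 → 2.0.0 pairing assumes 2.0 is the first release with ABI 1.8.

```python
# Hypothetical table: first package release that shipped each ABI version.
FIRST_RELEASE_WITH_ABI = {
    (1, 6): (1, 14, 0),  # from this comment: ABI 1.6 -> ">=1.14.0"
    (1, 8): (2, 0, 0),   # assumption: 2.0 is the first release with ABI 1.8
}


def run_exports_lower_bound(abi: tuple) -> str:
    """Earliest package version providing `abi`. There is no upper bound,
    because the SOVERSION in the package name (libfabric1) covers ABI breaks."""
    version = FIRST_RELEASE_WITH_ABI[abi]
    return ">=" + ".".join(str(part) for part in version)


# A build against libfabric 1.19 (ABI 1.6) would export this requirement.
print("libfabric1 " + run_exports_lower_bound((1, 6)))  # prints: libfabric1 >=1.14.0
```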

How's this for a sketch to document:

  • libname is the 'development' package (it may end up called libname-devel or name-devel)
    • this is the package that should be in host dependencies
    • this should have the usual dev files like headers, but most notably this package should have the unversioned libname.so symlink
    • it should have exact pinning on libname{soversion}
    • it should have run_exports on libname{soversion}, and usually not on libname itself
    • the run_exports lower bound should default to the current version (the safest option), but may be set lower for a stable, backward-compatible ABI
    • run_exports should not have an upper bound (the soversion is already responsible for this)
  • libname{soversion} contains the actual dynamic library
    • only the library itself and the files required for runtime (typically libname.so.{SOVERSION} and libname.so.{SOVERSION}.x.y)
    • this has all the true runtime dependencies
    • this package shouldn't be depended on in host and doesn't need run_exports of its own
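The file split in this outline can be expressed as a simple classifier. This minimal sketch assumes Linux ELF naming and uses a hypothetical library `libfoo` with SOVERSION 5. In a real recipe, each output lists its own files instead.

```python
import re


def split_outputs(files: list, name: str, soversion: int) -> dict:
    """Assign installed files to 'lib{name}-devel' or 'lib{name}{soversion}'."""
    # Runtime files: lib{name}.so.{SOVERSION} and lib{name}.so.{SOVERSION}.x.y
    runtime_re = re.compile(rf"^lib/lib{re.escape(name)}\.so\.{soversion}(\.\d+)*$")
    devel, runtime = f"lib{name}-devel", f"lib{name}{soversion}"
    outputs = {devel: [], runtime: []}
    for path in files:
        if runtime_re.match(path):
            outputs[runtime].append(path)
        else:
            # Headers, static libraries and the unversioned lib{name}.so symlink.
            outputs[devel].append(path)
    return outputs


installed = [
    "include/foo.h",
    "lib/libfoo.a",
    "lib/libfoo.so",
    "lib/libfoo.so.5",
    "lib/libfoo.so.5.2.1",
]
for output, paths in split_outputs(installed, "foo", 5).items():
    print(output, paths)
```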

Any other suggestions on what to say about if/when this is worth doing? Personally, I'm mostly concerned with correctly expressing how broad ABI compatibility is in some packages I maintain, since the standard package pinning practice is currently far too strict for them. Or suggestions for example tests to verify that, e.g., the SONAME/install_name has the right value for this to work? I recently opened prefix-dev/rattler-build#1217 to suggest support for standardized tests for these things at the level of the build tool, but I imagine that's not a high priority.
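As a starting point for such a test, here is a minimal, Linux-only sketch that reads the SONAME from `readelf -d` output. That output reports the SONAME on a line like `0x000000000000000e (SONAME)  Library soname: [libfoo.so.5]`. The helper names and the `libfoo` example are hypothetical. On macOS, the analogous value is the install name, which `otool -D` prints; that case is not covered here.

```python
import re
import subprocess
from typing import Optional

SONAME_LINE = re.compile(r"\(SONAME\)\s+Library soname: \[(?P<soname>[^\]]+)\]")


def soname_from_readelf(output: str) -> Optional[str]:
    """Extract the SONAME from `readelf -d` text, or None if there is none."""
    match = SONAME_LINE.search(output)
    return match.group("soname") if match else None


def read_soname(path: str) -> Optional[str]:
    """Run binutils' readelf on a shared library (requires readelf on PATH)."""
    result = subprocess.run(
        ["readelf", "-d", path], capture_output=True, text=True, check=True
    )
    return soname_from_readelf(result.stdout)


def check_soname(soname: Optional[str], name: str, soversion: int) -> bool:
    """lib{name}{soversion} should ship a library whose SONAME is lib{name}.so.{soversion}."""
    return soname == f"lib{name}.so.{soversion}"
```

In a package test, `read_soname` could be pointed at the installed `lib/libfoo.so.5` and its result passed to `check_soname`.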

@isuruf
Member

isuruf commented Dec 11, 2024

libllvm is special because we want to allow multiple major versions to coexist, since it's so hard to keep downstream packages on a single LLVM version. LLVM is notorious for API/ABI breakage in major versions, which are released every 6 months (or less).

Here's what I think should happen:

  1. Disallow co-installability (99% of cases; special cases like LLVM do not fall into this)
```yaml
{% set libfoo_soversion = 5 %}
- name: libfoo-devel
  build:
    run_exports:
      - {{ pin_subpackage("libfoo", max_pin=None) }}
      - {{ pin_subpackage("libfoo" ~ libfoo_soversion, max_pin=None) }}
  requirements:
    run:
      - {{ pin_subpackage("libfoo" ~ libfoo_soversion, exact=True) }}
      - {{ pin_subpackage("libfoo", exact=True) }}
  files:
    - lib/libfoo.so
    - lib/libfoo.a
    - include/foo.h

- name: libfoo
  requirements:
    run:
      - {{ pin_subpackage("libfoo" ~ libfoo_soversion, max_pin=None) }}

- name: libfoo{{ libfoo_soversion }}
  requirements:
    host:
      - libbar
  files:
    - lib/libfoo.so.{{ libfoo_soversion }}
    - lib/libfoo.so.{{ full_version }}
```
  2. Allow co-installability (for special cases like LLVM; note that not all packages can be co-installable)
```yaml
{% set libfoo_soversion = 5 %}
- name: libfoo-devel
  build:
    run_exports:
      - {{ pin_subpackage("libfoo" ~ libfoo_soversion, max_pin=None) }}
  requirements:
    run:
      - {{ pin_subpackage("libfoo" ~ libfoo_soversion, exact=True) }}
      - {{ pin_subpackage("libfoo", exact=True) }}
  files:
    - lib/libfoo.so
    - lib/libfoo.a
    - include/foo.h

- name: libfoo{{ libfoo_soversion }}
  requirements:
    host:
      - libbar
  files:
    - lib/libfoo.so.{{ libfoo_soversion }}
    - lib/libfoo.so.{{ full_version }}
```

@minrk
Member Author

minrk commented Dec 11, 2024

Got it, thank you! It looks like libfoo is there for mutual exclusivity, but I'm not sure I fully get the difference in the libfoo dependencies. Why doesn't libfoo have an exact dependency on libfoo{so}? In case 1, libfoo does not pin libfoo{so} exactly (so libfoo 1.5 may install libfoo1 1.8). In case 2, there is no libfoo output, but libfoo-devel still depends on it in run (should I assume it's the same as in case 1, or isn't it meant to be there?).

What needs to be said about Windows, since, as I recall, it doesn't have SOVERSIONs? It always needs to be mutually exclusive, right, unless special care is taken? I see libllvm19 exists but is empty on Windows, but maybe that's not the best example. I also see dav1d-dev has run_exports on libdav1d7, but the file contents of libdav1d7 are not versioned on Windows. I believe this means that two packages may depend on libdav1d7 and libdav1d8, depending on the build-time dav1d-dev, and be co-installable with a clobber error on bin/dav1d.dll.

@carterbox
Copy link
Member

> I believe this means that two packages may depend on libdav1d7 and libdav1d8, depending on the build-time dav1d-dev, and be co-installable with a clobber error on bin/dav1d.dll.

The dav1d-feedstock has a mutex package for Windows which serves the same purpose as the unversioned library packages in @isuruf's examples.

> the file contents of libdav1d7 are not versioned on Windows

That's correct. Windows doesn't have a native mechanism for sonames, and including an ABI version in the library name hasn't quite caught on as a convention on Windows. For example, dav1d.dll could be dav1d7.dll or dav1d-7.dll. Some projects do this, but dav1d does not.
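The Windows situation can be modeled with two independent checks on a hypothetical environment. The first asks whether two packages ship the same path, which is a clobber at install time. The second asks whether a shared mutex package is required at two different versions. The solver rejects that case up front, because only one version of a given package name can be installed. This minimal sketch uses illustrative package contents: `libdav1d8` is the thread's hypothetical next SOVERSION, and the mutex name `dav1d-mutex` is made up rather than taken from the dav1d-feedstock. The `Library/bin` paths assume conda-forge's usual Windows DLL location.

```python
from collections import defaultdict

# Illustrative package contents and mutex requirements (names are hypothetical).
LINUX = {
    "libdav1d7": {"files": ["lib/libdav1d.so.7"], "mutex": None},
    "libdav1d8": {"files": ["lib/libdav1d.so.8"], "mutex": None},
}
WINDOWS = {
    "libdav1d7": {"files": ["Library/bin/dav1d.dll"], "mutex": ("dav1d-mutex", "7")},
    "libdav1d8": {"files": ["Library/bin/dav1d.dll"], "mutex": ("dav1d-mutex", "8")},
}


def clobbered_paths(env: dict) -> dict:
    """Map each path shipped by more than one package to the packages shipping it."""
    owners = defaultdict(list)
    for pkg, info in env.items():
        for path in info["files"]:
            owners[path].append(pkg)
    return {path: pkgs for path, pkgs in owners.items() if len(pkgs) > 1}


def mutex_conflict(env: dict) -> bool:
    """True if one mutex package name is required at two different versions."""
    required = defaultdict(set)
    for info in env.values():
        if info["mutex"] is not None:
            mutex_name, version = info["mutex"]
            required[mutex_name].add(version)
    return any(len(versions) > 1 for versions in required.values())
```

On Linux, the SOVERSION in the filename keeps the two packages disjoint. On Windows, they clash on the DLL, and the mutex turns that clash into a solver conflict instead.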

@carterbox
Copy link
Member

carterbox commented Dec 11, 2024

> Disallow co-installability (99% of cases; special cases like LLVM do not fall into this)

Disallowing co-installation implies channel-wide pinnings and migrators, because you want to keep all packages in sync so that older packages don't hold newer ones back. However, it (edit: it being co-installation) may result in larger environments because multiple copies of the same library are installed.

@minrk
Member Author

minrk commented Dec 12, 2024

> The dav1d-feedstock has a mutex package for Windows which serves the same purpose as the unversioned library packages in @isuruf's examples.

Thanks for pointing that out! This variety of approaches to the same goal is why I would like to have something written down. Not because there's anything wrong with any one approach, but to have at least a default reference pattern for new packages to adopt.

> However, it may result in larger environments because multiple copies of the same library are installed.

Allowing co-installation is what results in this, right? Not disallowing it? Maybe I misunderstood what "it" refers to here.

I'll try to summarize these things in a KB entry for you to review. Thanks for all your insights.

In the meantime, I've tried applying your suggestions to libfabric in conda-forge/libfabric-feedstock#16
