New Package Handling Proposal #1777

Closed
10 of 15 tasks
ferd opened this issue May 4, 2018 · 12 comments
Labels
enhancement new behaviour or additional functionality

Comments

@ferd
Collaborator

ferd commented May 4, 2018

This is a large laundry list to discuss and track all the work to do with future package management.

Objectives

  • Stay up to date with Hex API to be a good citizen
  • Reduce networking trips and disk storage required
  • Support mirrors as a geographic optimization (higher priority to local mirror)
  • Support mirrors as a backup (higher priority to hex)
  • Support corporate/private indexes (controlled environment, prevents default hex from being used)
  • Support partial indexes (controlled environment for private repositories)
  • Prepare the way for offline builds (on-disk index only, or controlled local cache)

Invariants

  • The lockfile format does not need to change
  • A user can keep alternating between rebar3 versions in different projects without them breaking each other
    • The cache format on disk doesn't need to be blown up, the disk index file is distinct
  • A user can alternate between rebar3 versions without conflict
    • if the lock format is the same and the cache is mostly compatible, things should keep working unless a new feature without equivalent in past versions is used
  • No major performance regression

Step 1

  • deprecate R16 once OTP-21 lands
  • Use hex_erl (https://github.com/hexpm/hex_erl) and vendor it
  • First develop copies of all providers under the 'unstable' or 'experimental' namespace so people can try them there
    • maybe some fancy macro stuff to avoid having to otherwise duplicate code for all dependencies since a non-namespaced dialyzer will need to depend on non-namespaced install_deps
  • start with an empty set of packages
  • whenever a new package is needed, fetch all of its versions and add them to an ETS table that gets dumped to disk as a quick index
    • we can now work from a partial index
  • rebar3 update only updates known packages in its local index, and ignores unseen ones
  • the rest of the code can mostly work the same
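The partial-index idea in Step 1 can be sketched as follows. This is only an illustration in Python, not rebar3 code: the `PartialIndex` class, the `fetch_versions` callable, and the JSON dump (standing in for the ETS table dumped to disk) are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


class PartialIndex:
    """Local index that only knows the packages that were actually requested."""

    def __init__(self, fetch_versions, path):
        # fetch_versions is a hypothetical callable: name -> {version: checksum}
        self.fetch_versions = fetch_versions
        self.path = Path(path)
        self.packages = {}  # start with an empty set of packages

    def versions(self, name):
        # On first use, fetch all versions of the package and persist them.
        if name not in self.packages:
            self.packages[name] = dict(self.fetch_versions(name))
            self.dump()
        return self.packages[name]

    def update(self):
        # Only refresh packages already in the local index; unseen ones are ignored.
        for name in list(self.packages):
            self.packages[name] = dict(self.fetch_versions(name))
        self.dump()

    def dump(self):
        # Stand-in for dumping the in-memory table to disk as a quick index.
        self.path.write_text(json.dumps(self.packages, sort_keys=True))


# Demo against a fake registry (made-up data, no network involved).
REGISTRY = {"uuid_erl": {"1.7.3": "c5df97"}, "other": {"0.1.0": "aa11"}}
requests = []


def fake_fetch(name):
    requests.append(name)
    return REGISTRY[name]


index = PartialIndex(fake_fetch, Path(tempfile.mkdtemp()) / "index.json")
index.versions("uuid_erl")  # first use: one fetch, then stored locally
index.versions("uuid_erl")  # served from the local index, no fetch
index.update()              # refreshes uuid_erl only; "other" was never seen
```

In this sketch the number of requests made by `update()` grows with the number of locally known packages, not with the size of the registry.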

Step 2

  • introduce a way to do a search on multiple package indexes;
    • allow an ordered list of package indexes, possibly with distinct public keys, to be used as a config mechanism
    • if the first one does not contain a package, look into a later one. Should be able to match based on a checksum in the lockfile to know when we're at the right spot. Only once the whole list of indexes is exhausted will the build fail
  • the list of preferences should be either global or only defined by the top-level app to prevent deps from doing shitty stuff, or allowing corporate control to work
  • it is possible that some packages will have multiple checksums for the same version (i.e. mypkg-1.0.0 could be published with different code on two indexes). Not quite sure how to handle that one yet
    • last one wins, quite unreliable, shouldn't use
    • maintain multiple valid checksums on conflict in the local index copy?
    • maintain a per-index keying of the table of the form {Index, ActualKey} which allows to do clean lookups on each
      • requires rebar3 update to update all indexes known, and maybe to specify per-index updates
      • probably the one we should use
      • when a known package with a known hash is desired but not found in an index, we have to pick between looking it up over the network in each index as we go through them, or first scanning all the known local indexes before reaching for the network, starting again from the first one. The latter is likely more efficient for network usage, and likely to be faster in practice
  • on a first fetch with no known hash, first one found wins and gets stored in the lock file
  • maybe (we need to discuss if it is necessary) add a way to specify a checksum in the dep declaration
    • {uuid, "1.7.3", {pkg, uuid_erl}} is shorthand for {uuid, {"1.7.3", undefined}, {pkg, uuid_erl}} which resolves after a search to {uuid, {"1.7.3", "c5df97"}, {pkg, uuid_erl}}?
      • no support required from hex.pm, since the hash check is for client-local validation. Displaying it could be done in the checksum field maybe?
    • {pkg, Name, Vsn, Hash} as an internal format must stay the same
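
The ordered lookup from Step 2 could look roughly like the sketch below. It assumes the per-index keying option (a table keyed by `{Index, ActualKey}`) and the "scan all local indexes first, then go to the network" strategy. It is written in Python for illustration only; `resolve`, `fetch`, and the index names are hypothetical and not part of rebar3 or Hex.

```python
def resolve(name, vsn, locked_hash, index_order, local, fetch):
    """Return (index, checksum) for a package, or raise once all indexes are exhausted.

    local: dict keyed by (index, (name, vsn)) -> checksum (per-index keying)
    fetch: hypothetical callable (index, name, vsn) -> checksum or None
    locked_hash: checksum from the lockfile, or None on a first fetch
    """
    def acceptable(checksum):
        # With no known hash, the first one found wins.
        return checksum is not None and locked_hash in (None, checksum)

    # Pass 1: scan every local index copy in priority order, without networking.
    for index in index_order:
        checksum = local.get((index, (name, vsn)))
        if acceptable(checksum):
            return index, checksum

    # Pass 2: start again from the first index, this time over the network.
    for index in index_order:
        checksum = fetch(index, name, vsn)
        if checksum is not None:
            local[(index, (name, vsn))] = checksum
        if acceptable(checksum):
            return index, checksum

    raise LookupError(f"{name}-{vsn} not found in any configured index")


# Demo: the same version is published with different code on two indexes.
INDEX_ORDER = ["corporate", "mirror1", "hexpm"]
REMOTE = {
    ("mirror1", "mypkg", "1.0.0"): "bbbb",
    ("hexpm", "mypkg", "1.0.0"): "aaaa",
}


def fake_fetch(index, name, vsn):
    return REMOTE.get((index, name, vsn))


first_seen = resolve("mypkg", "1.0.0", None, INDEX_ORDER, {}, fake_fetch)
locked = resolve("mypkg", "1.0.0", "aaaa", INDEX_ORDER, {}, fake_fetch)
```

Here `first_seen` comes from `mirror1` because no hash was known yet, while `locked` skips past the mismatching `mirror1` copy and settles on `hexpm`, which is the only index whose checksum matches the lockfile.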

Step 3

  • Fold back experimental providers to replace regular ones, deprecate old stuff
    • maybe fold back into mainline after each phase
  • do we want to add the old providers as 'deprecated' namespace in case of an issue?

Nice to have

  • allow the user to specify a local index file with a custom cache path? (global_rebar_dir reuse on a local config?)
  • Use partial local index files to write tests with arbitrary packages and index locations, rather than mocking
  • add a rebar3 vendor command that takes the current lockfile, fetches all the packages for them, and dumps a local copy of the index with only the required apps; this just lifts the local index
  • private hex repositories
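
As a rough illustration of the vendor idea above, the index part could amount to filtering the local index down to what the lockfile names. This is a Python sketch under assumptions: `vendor_index` is a hypothetical helper, the lock entries mirror the {pkg, Name, Vsn, Hash} shape as plain tuples, and fetching the package tarballs themselves is left out.

```python
def vendor_index(lock_entries, index):
    """Build a local index containing only the packages named in the lockfile.

    lock_entries: iterable of (name, vsn, checksum) tuples
    index: dict of name -> {vsn: checksum}, the current local index
    """
    vendored = {}
    for name, vsn, checksum in lock_entries:
        known = index.get(name, {}).get(vsn)
        if known != checksum:
            # Refuse to vendor something the index cannot vouch for.
            raise ValueError(f"{name}-{vsn}: lockfile and index disagree")
        vendored.setdefault(name, {})[vsn] = checksum
    return vendored


# Demo with made-up index contents.
INDEX = {
    "uuid_erl": {"1.7.2": "0a1b2c", "1.7.3": "c5df97"},
    "unused": {"0.1.0": "ffff"},
}
LOCK = [("uuid_erl", "1.7.3", "c5df97")]
vendored = vendor_index(LOCK, INDEX)
```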
@ferd ferd added the enhancement new behaviour or additional functionality label May 4, 2018
@ericmj
Contributor

ericmj commented May 4, 2018

rebar3 update only updates known packages in its local index, and ignores unseen ones

Will it do a full update of the index? That would mean you would have to do as many HTTP requests as you have known packages. If you allow top-level apps to configure repos does that mean rebar3 update will update differently depending on which project you are in?

@ferd
Collaborator Author

ferd commented May 4, 2018

The way I had it in mind is that when we update the package index, we take the list of all current locally known applications (i.e. all the projects you have developed on your system based on the cache index we maintain) and go fetch new versions for them. Hopefully over a single keepalive HTTPS connection to avoid handshaking super often, ideally with pipelining enabled so that we are less likely to suffer jittery head-of-line blocking in aggregate.

This is under a cost model I assumed to be based on bytes transferred; if you have 100,000 packages but only have seen 150, it would probably be cheaper to get the 150 rather than pruning 99% of them on reception, and would likely keep a smaller working memory size for the build tool on smaller devices. It is what I thought was the main incentive behind the Hex API v2.

If you pay hosting by the request though, I can see why that would be a concern since it flips the cost budget around.

@ericmj
Contributor

ericmj commented May 4, 2018

Assuming updating the index is fast, which it is in my experience, what's the benefit of rebar3 update over updating a partial index only when fetching/updating dependencies? When doing this you can also avoid making the request when a package is locked which is the common case.

I am not worried about costs, I am only looking from a user perspective.

@ericmj
Contributor

ericmj commented May 4, 2018

Hopefully over a single keepalive HTTPS connection to avoid handshaking super often, ideally with pipelining enabled so that we are less likely to suffer jittery head-of-line blocking in aggregate

We had to disable pipelining on httpc because it breaks on fast networks, there is some race condition that I haven't found the cause of yet. On hex we configure httpc to use 8 keep-alive connections instead.

@ferd
Collaborator Author

ferd commented May 4, 2018

The case where it might be useful that jumps to mind is one where your local copy has a reference to a package that was in its first hour of existence and has since changed checksums on a re-publish.
In such a case, we would see a local version with an existing checksum, say that it does not match, and defer to another index (if any) and fail if nothing is found. An update fixes the situation.

Another case could be one where you have an order of hex indices like [Corporate, Mirror1, OfficialHex], in which case the Corporate index may be more mutable than Hex's official one and may need more frequent updates. A related case is a Mirror1 that didn't yet have a package that existed in OfficialHex at the time of first fetch: if we do "lookup all local indexes before reaching the network", then we will never discover the package existing in a mirror, corporate or otherwise. Calling rebar3 update would let you re-fetch copies of a new version in the mirror to discover it, so that next time you clear your cache, you do get the closest copy.

Aside from that, my expectation is that rebar3 update will almost never be required since, as you said, we look up missing packages individually anyway.
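
To make the mirror case concrete, here is a small Python sketch of it. The names `update_all`, `closest`, and `fetch_versions`, and the data, are made up for the example. The local table is keyed per index, and an update refreshes every known package name against every configured index.

```python
def update_all(index_order, local, fetch_versions):
    """Refresh every locally known package name against every configured index.

    local: dict keyed by (index, name) -> {vsn: checksum}
    fetch_versions: hypothetical callable (index, name) -> {vsn: checksum} or None
    """
    known_names = {name for (_index, name) in local}
    for index in index_order:
        for name in known_names:
            versions = fetch_versions(index, name)
            if versions:
                local[(index, name)] = dict(versions)
            else:
                local.pop((index, name), None)


def closest(index_order, local, name, vsn):
    """Return the highest-priority index whose local copy has this version."""
    for index in index_order:
        if vsn in local.get((index, name), {}):
            return index
    return None


# Mirror1 did not have mypkg at the time of the first fetch, so only the
# official index is known locally. The mirror has since caught up.
INDEX_ORDER = ["mirror1", "hexpm"]
local = {("hexpm", "mypkg"): {"1.0.0": "aaaa"}}
REMOTE = {
    ("mirror1", "mypkg"): {"1.0.0": "aaaa"},
    ("hexpm", "mypkg"): {"1.0.0": "aaaa"},
}

before = closest(INDEX_ORDER, local, "mypkg", "1.0.0")
update_all(INDEX_ORDER, local, lambda index, name: REMOTE.get((index, name)))
after = closest(INDEX_ORDER, local, "mypkg", "1.0.0")
```

With local-first lookups, `before` stays at `hexpm` no matter how often the package is resolved; only the explicit update lets `after` move to `mirror1`.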

@tsloughter
Collaborator

We should get these broken into individual cards so they can go on the project board.

@ferd
Collaborator Author

ferd commented Jun 12, 2018

I've added the main ones in order, but skipped some of them for now.

@tsloughter
Collaborator

I'm not liking the idea of having to put all these providers under an unstable namespace.

I think a better idea is to simply make rc releases to get feedback and bug reports.

@ferd
Collaborator Author

ferd commented Aug 20, 2018

That can work. Should we have a diverging branch for it? Or are we freezing mainline dev until then? If so we should cut a release sooner rather than later.

@tsloughter
Collaborator

I think we can make that call once I get to the first PR.

@tsloughter
Collaborator

I've crossed out the task in Step 1 about using an experimental namespace as we discussed.

I've checked off the tasks completed by the branch hex_core.

@ferd
Collaborator Author

ferd commented Jun 8, 2019

I'll close this since the nice-to-haves were mostly pointing to some ad-hoc vendoring/monorepo mechanism that would be better tackled elsewhere. The rest was all completed.

@ferd ferd closed this as completed Jun 8, 2019