Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Find traits #267

Merged
merged 51 commits into from
Dec 5, 2021
Merged

Refactor Find traits #267

merged 51 commits into from
Dec 5, 2021

Conversation

Byron
Copy link
Member

@Byron Byron commented Nov 30, 2021

Untangle and simplify the Find trait

Tasks

  • add contains(oid) to trait
  • review the need for bundle::Location and if it's worth the effort. If so, see if it can be taken out of the Object type, which would allow it to move to git-object.
    • split into simple Find trait and packed::Find trait, the latter providing more information for speedups
    • assure pack-ids are static by assigning usize ids while remembering their hash, assuring the small ids don't ever conflict while assigning the same id to each pack (in design sketch), to simplify it.
    • remove pack-cache argument from packed::Find as well for consistency and accepting that handle-based implementations have advantages. This also means that a pack-cache must be leveraged internally and probably is trait-based to allow it to be a no-op.
    • Implement Find for the linked/static db
    • Figure out how to deal with this low-level bundle access in git_pack::Find - probably better to split it up so it's compatible with multi-pack indices
    • Add a 'handle' implementation for odb and use it everywhere
      • Make sure trait implementations are 'static, too, if they have to be, which means no internal OwnShared :/, must probably be generic over a Deref implementation like the linked iterator.
      • use in pack-create
      • Make object-cache an implementation detail for pack::Find, and make pack/create.rs enable it.
      • use it in git-repository state and make it configurable like before, maybe adapting the handle API even
      • It's clear that there only needs to be one Easy, thread-safe, and maybe an easy-local for the Rc based version. Keep the 'Access' abstraction though, because if this goes, State is Easy, and everything is compile time. Actually remove it, cargo smart-release shows the API in action and it seems like there is an unnecessary layer. The reason Repository stays is merely for having something that is Sync.
    • use the Find trait where possible.
    • consider making 'single-threadedness' opt-in, basically inverting the meaning of parallel. That way, it's the 'right' choice for most applications today, and can be opted in by some with other requirements.
      • doesn't work, cargo features are additive, and so are optional dependencies.
  • write docs

@Byron Byron force-pushed the find-refactor branch 2 times, most recently from 3bd9261 to 87930a1 Compare December 2, 2021 02:21
Byron added 20 commits December 3, 2021 08:03
The plan is to separate pack location entirely from Object and put
the location specific functions into a separate trait.
This is typed data baked by a slice for conversion into parsed ObjectRef's
for example.

This is usually the result of a `Find` operation on an object database.
It's meant for the next generation of object db handles which keep a
local cache of all the details of the actual object database.
… alter git_odb::Find trait (#266)

This will break a lot, but has to happen to prepare these traits for the
next generation of object databases.
With the new architecture this can be an implementation detail without
forcing it to be Sync.
Borrow is the manual form, Deref has allows for more automatic use
and more idiomatic looking code.
For completeness in case of single-threaded operations
It could easily be general over all kinds of store as long as there
is support for pack-caches, which might be helpful later for the 'final'
store type.

Ideally, this one now shows how to do it.
With this method it's easier to bypass local caches and control
the cache oneself entirely.
Byron added 7 commits December 3, 2021 08:31
We avoid using Bundle and boil it down to what it really needs to be
to build packs.
Now it should be mostly ready.
This saves virtual method calls and allocations, and probably
is cleaner too as one doesn't have to know that the default is
a no-op cache.
It's nearly there, but for some reason the boxed dyn traits don't get to
be Send even though it's specified.
Byron added 24 commits December 3, 2021 10:58
…utput::count::objects()` (#266)

It now is an implementation detail of the Find trait.
With the linked DB this is simply not possible anymore and we expect
these updates to happen automatically in future for greater convenience.

For now, in order to refresh a repository, one has to reopen it.
This way it's more inline with `Repository::refs`.
As the first step to remove the 'Easy' abstraction.
This great reduction of complexity allows for being multi-threading
capabie by default with the option to turn that off at compile time.

All `to|into_easy…()` methods are removed in favor of `to_easy()`
along with the removal of all `Easy` types in favor of the single
`easy::Handle`.
The `easy::Handle` is now a full (but shared) clone of the original
Rpeository with additional thread-local state, hence there is no more
need for a way to access the original repository.
…::Find*` (#266)

These are higher-level and generally more desirable.
The Find traits in `git-pack` are more useful internally when packs
have to be handled directly, for example when generating packs.
…ke a regression (#266)

Could be due to the new implementation going through a refcell, whereas
the previous one didn't.

That's fair, I guess, it gets quite a few calls, but also means that
counting with tree-diff got about 10% slower now that the cache is
built-in, maybe.

For max-speed, one would probably have a local cache to avoid
unnecessary object lookups for previously obtained information
alltogether.
It's a tiny bit slower, with or without cache, when going through
the handle.

It's interesting as it appears going through an Arc is measurable
on that level already.
@Byron Byron merged commit f71806f into main Dec 5, 2021
@Byron Byron deleted the find-refactor branch January 10, 2022 08:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant