Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve core.trustXXX config options #127

Open
msohn opened this issue Jan 10, 2025 · 18 comments
Open

Improve core.trustXXX config options #127

msohn opened this issue Jan 10, 2025 · 18 comments
Labels
enhancement New feature or request

Comments

@msohn
Copy link
Member

msohn commented Jan 10, 2025

Moving discussion started in change 1206684 to an issue for better visibility.

[Matthias]

Change 1206684 adds usage of FileSnapshot to FileReftableStack to improve performance of determining if the tables.list file, which contains the names of currently existing reftable files, is modified and needs to be reloaded. On top of that we need to add support for NFS using the usual workarounds we use in the RefDirectory implementation using the core.trustXXX config options.

Maybe it's time to consolidate proliferation of trustXXX config options ?

We already have

  • core.trustFolderStat
  • core.trustLooseRefStat
  • core.trustPackedRefsStat
    Should we add another option core.trustTablesListStat for tables.list or consolidate all of these options into a single one ?

Do we really need to configure these options independently ?

@msohn msohn added the enhancement New feature or request label Jan 10, 2025
@msohn
Copy link
Member Author

msohn commented Jan 10, 2025

[Nasser]

Unfortunately I think we probably do need them all.
We use these values for our Gerrit primaries in production:

    trustPackedRefsStat = after_open
    trustLooseRefStat = after_open
    trustFolderStat = false

and (for performance reasons) these values for our Gerrit replicas:

    trustPackedRefsStat = after_open
    trustLooseRefStat = always
    trustFolderStat = false

I don't know how we would express that replica config with a single option.

@msohn
Copy link
Member Author

msohn commented Jan 10, 2025

[Matthias]

ok, I see

I think these options still feel odd:

  • looking at TrustPackedRefsStat I think we don't need UNSET since that's effectively the same as NEVER assuming you set trustFolderStat = false on NFS
  • for LooseObjects we either have trustFolderStat=true or we implicitly use AFTER_OPEN opening an input stream on the parent directory
  • it seems for LooseObjects we shouldn't look at trustFolderStat but rather use TrustLooseRefStat.ALWAYS which is currently unused, and explicitly use TrustLooseRefStat.AFTER_OPEN` instead of assuming that implicitly (this value is currently also unused)
  • why does TrustLooseRefStat not have NEVER` ?
  • apparently we should also implement an AFTER_OPEN for PackDirectory#searchPacksAgainto avoid always rescanning the pack directory iftrustFolderStat = false` by opening an InputStream on the pack directory

Would it be clearer to remove trustFolderStat and add options for loose objects and the pack directory:

  • trustLooseRefStat
  • trustPackedRefStat
  • trustLooseObjStat
  • trustPackDirStat

Can we consolidate to a single TrustFileStat enum with the values NEVER, ALWAYS, AFTER_OPEN for possible values of all these options ?

And for RefTable reuse trustPackedRefStat option to configure how stats for tables.list are handled ?

@msohn
Copy link
Member Author

msohn commented Jan 10, 2025

[Nasser]

I think these options still feel odd:

  • looking at TrustPackedRefsStat I think we don't need UNSET since that's effectively the same as NEVER assuming you set trustFolderStat = false on NFS

I don't recall the use case for UNSET. Martin Fick do you recall?

  • for LooseObjects we either have trustFolderStat=true or we implicitly use AFTER_OPEN opening an input stream on the parent directory

I think the input stream opening only happens on failures though, correct?

  • it seems for LooseObjects we shouldn't look at trustFolderStat but rather use TrustLooseRefStat.ALWAYS which is currently unused, and explicitly use TrustLooseRefStat.AFTER_OPEN` instead of assuming that implicitly (this value is currently also unused)

I don't think we would want TrustLooseRefStat affecting loose objects, since those are fundamentally different things (objects are immutable).

  • why does TrustLooseRefStat not have NEVER` ?

I don't remember... Kaushik Lingarkar do you recall?

  • apparently we should also implement an AFTER_OPEN for PackDirectory#searchPacksAgainto avoid always rescanning the pack directory iftrustFolderStat = false` by opening an InputStream on the pack directory

Yeah, that might be a nice improvement. I don't know if you need to open the pack dir or the parent (objects/).

Would it be clearer to remove trustFolderStat and add options for loose objects and the pack directory:

  • trustLooseRefStat
  • trustPackedRefStat
  • trustLooseObjStat
  • trustPackDirStat

Yes, I think having consistent naming like this would be nice.

Can we consolidate to a single TrustFileStat enum with the values NEVER, ALWAYS, AFTER_OPEN for possible values of all these options ?

That sounds reasonable to me.

And for RefTable reuse trustPackedRefStat option to configure how stats for tables.list are handled ?

I don't know the reftable implementation well enough to know if tables.list conceptually maps better to packed refs or loose refs for purposes of file modification, but I think if we have these other 4 configs, a 5th for trustRefTablesListStat would be acceptable too.

@msohn
Copy link
Member Author

msohn commented Jan 10, 2025

[Matthias]

  • it seems for LooseObjects we shouldn't look at trustFolderStat but rather use TrustLooseRefStat.ALWAYS which is currently unused, and explicitly use TrustLooseRefStat.AFTER_OPEN` instead of assuming that implicitly (this value is currently also unused)

I don't think we would want TrustLooseRefStat affecting loose objects, since those are fundamentally different things (objects are immutable).

sorry, I went off the rails here, what I meant is we should introduce another trustLooseObjStat option instead of using trustFolderStat which seems not to be well defined.

And for RefTable reuse trustPackedRefStat option to configure how stats for tables.list are handled ?

I don't know the reftable implementation well enough to know if tables.list conceptually maps better to packed refs or loose refs for purposes of file modification, but I think if we have these other 4 configs, a 5th for trustRefTablesListStat would be acceptable too.

ok, then let's introduce another option trustRefTablesListStat

In FileReftableStack the tables.list file contains the file names of currently existing reftable files, if it was modified the stack of reftables needs to be reloaded. Updates are protected by creating a file lock on this file.

@mfick-nvidia
Copy link
Contributor

looking at TrustPackedRefsStat I think we don't need UNSET since that's effectively the same as NEVER assuming you set trustFolderStat = false on NFS

I believe the idea is to be able to use trustFolderStat (or some better named thing) to control everything in a single place eventually. i.e. imagine that trustFolerStat started to support after_open, then maybe no one would need to set trustLooseRefStat = after_open or trustPackedRefsStat = after_open, they could just set trustFolderStat = after_open.

@quic-nasserg
Copy link
Contributor

Ah, so UNSET means INHERIT? I agree that makes sense if we have a good "base" config option, maybe re-using that enum name Matthias suggested trustFileStat? Then if you want after_open for everything except loose refs (like my replica config above), then you could have:

trustFileStat = after_open
trustLooseRefStat = always

Did I get that right?

@msohn
Copy link
Member Author

msohn commented Jan 10, 2025

ok, so do we agree on this set of options:

  • trustFileStat general option, used if the more specific options aren't set (or set to INHERIT by default to make this very explicit)

and the more specific options

  • trustLooseRefStat for loose refs
  • trustPackedRefsStat for packed-refs
  • trustRefTablesListStat for tables.list with RefTable
  • trustLooseObjStat for loose objects
  • trustPackDirStat for pack directory
    which can be used to override the value of trustFolderStat for specific files

@msohn
Copy link
Member Author

msohn commented Jan 13, 2025

related Gerrit issue: https://issues.gerritcodereview.com/issues/40010601

@mfick-nvidia
Copy link
Contributor

How about trustStats instead of trustFileStat?

@mfick-nvidia
Copy link
Contributor

The reason I suggest trustStats instead of trusFileStat is because I believe that sometimes it is the folder stats and not the files stats that we are trusting.

I like all the other names that Matthias has suggested, and I like the idea of an explicit INHERIT setting and concept. It will be really nice to basically have only one setting to need to set. Any chance we can make the default for that setting AUTO, and use the FS type to set it to after_open on NFS?

@msohn
Copy link
Member Author

msohn commented Jan 13, 2025

I thought about the name of the linux fstat function "get file status" which can be also applied to files and folders. How about naming it trustfstat ?

Do you have a proposal how to reliably detect NFS from Java ?

@mfick-nvidia
Copy link
Contributor

I'm not sure I love adding the "f", but I'm OK with it. Would it be trustFStat?

As for NFS, the best I could find is to run

stat -fc %T

which seems to work on centOs. It may not be ideal, but it might still be worth it? After all auto would be a helper for when it is unset, if it is wrong, the user can configure it, in the meantime it would mostly just work for many people?

@msohn
Copy link
Member Author

msohn commented Jan 13, 2025

We can also call it trustStat after the stat function. Capitalization doesn't matter since keys of git config options are case-insensitive.

Tried this command and it works on Mac OS 15.2, Debian 11, Ubuntu 24.04 and SLES 15 :-)
Will implement this in FS_Posix.

@mfick-nvidia
Copy link
Contributor

Will implement this in FS_Posix.

Since NFS is not actually Posix, it might be better to have a superclass, maybe something like FS_nix which then has FS_Posix and FS_NFS as subclasses?

@msohn
Copy link
Member Author

msohn commented Jan 15, 2025

Pushed implementation for review https://eclipse.gerrithub.io/c/eclipse-jgit/jgit/+/1206686

Automatic configuration will be done in another change.

@mfick-nvidia
Copy link
Contributor

I'm confused, why are we eliminating the finer grained ref options if we are adding a global switch that can be inherited?

@msohn
Copy link
Member Author

msohn commented Jan 15, 2025

@mfick-nvidia
Copy link
Contributor

I see, sorry, I misread the deprecation notices, they were for code, not for the config items.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants