-
-
Notifications
You must be signed in to change notification settings - Fork 345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Auto-epoch based on queue message attribute #2824
Conversation
So basically we will "ignore" the epoch for most mods, only using it as the starting value for the epoch incrementation counter. So spec-wise, this means making |
Added a spec update to say that I don't want to add |
What am I talking about, of course |
My only comment is, that we would be putting a lot version logic into the scheduler. Potentially doing a fair amount of heavy lifting, to figure out the latest version. I know versions have historically been problematic, but I'm conscious of getting into a race condition of versioning here. I'd like to see us dig into the Metadata a bit and see big of a problem we're having and is it something we can solve by setting problematic mods to staging or having a different mechanism for flagging potential issues. One of the things we can potentially do with the new infrastructure is better signal when something is a problem and proactively raise an Issue. The code looks tidy with good test coverage and doco. If I've missed something or I'm being too conservative about the solution, I'm very happy to be corrected 🙂 |
Maybe a little personal background: I've been considering taking a break for a while, so when I fix issues such as those listed in the OP, I'm struck by how much manual intervention the current data model requires. CKAN needs a human to be actively inspecting metadata and reacting to problem reports from users, mostly due to out of order versions, where the manual clean-up is to increment the epoch and clean up the versions that were mis-filed under the old epoch, something that is eminently automatizable. Maybe I'm too much of a perfectionist, but if we ever want to get to a point where CKAN quietly runs itself in the background, this is the 80% case on which to focus.
How would you feel about a
No, the problematic mods are not known ahead of time (or to put it another way, all mods are potentially problematic). Our data integrity depends on mod authors to input version numbers sensibly, and we use
I'll give that a think. The committer component that listens to the Inflator's output would also be able to detect out of order versions, but I wanted to put the logic earlier than that so we could reduce the amount of manual intervention needed rather than just making it a smoother work flow. I'm happy to let this one simmer for a while. There's no point in merging it until the new infrastructure is almost ready to go live anyway, since these spec updates would be documenting things that don't in fact work yet. |
I've embarked upon a more in-depth survey, and we are not anywhere close to keeping up with the manual maintenance. Bad data is rife, but "out of sight out of mind." I'll continue tracking the fixes with links in the OP here (I'm going in alphabetical order, and at time of writing I'm just finishing up the A's). Another issue has emerged: The manual clean-up is itself error-prone. See KSP-CKAN/CKAN-meta#1477 and KSP-CKAN/NetKAN#7286; since we cannot test the sorting except by making a change and seeing what happens, it's easy to make a fix that causes another problem. Manually epoching old .ckan files is a bit fiddly too as it requires renaming them and editing the All of this would be a thing of the past with the idea in this PR. |
Okay, I think I can compromise on the notice in the spec. Most mod authors probably don't care about diverting epochs, else they would also care whether their mods are well-ordered.
Which could be added to the status json and be displayed on the status page. This way we would have an easy way to spot potential problems and can update the epoch in the netkans from time to time if the difference grows too high. |
Contributor burn out is something that is always at the forefront of my mind. If you need a break take one 🙂 You've been on the coal face for a while and it's easy for me to miss the number of manual interventions required when it isn't my primary focus. You've also made some incredible contributions and improved many long standing issues. One of the really big successes of the CKAN has been the ability to not force mod authors to do anything special, but it also means that we have to go to great lengths to ensure the metadata is correct. On reflection, there have been numerous metadata wranglers (including yourself) over the years that have done an amazing job to keep up. You're absolutely right about the mission to reduce this load as much as possible.
I think at this stage, lets keep it in python. It'll help us move fast, iterate and keep the scheduler light. It also will keep the burden lower on those who work on CKAN more focused on that part of the code base and I'm hoping moving to Python with a much tighter scope of each of the Infrastructure functions will make it easier for new contributors to get involved with the infrastructure side.
Versions are hard. Everyone has an opinion on how they work and it can be really hard to be consistent when doing things by hand. @HebaruSan has more than demonstrated that we have a problem to address. I'm pondering maybe we set a flag of some description (maybe staged? maybe it's own flag? epoched?), so we can raise PRs initially. Let me think on it, but for now I'm gunna give this the thumbs up because it fits our IfItsBetterMergedIt™ philosophy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From brief read of the code, this all looks good to me. I'll leave the full review to a more experienced C# contributor as the scope is a little wider than I'm familiar with.
I find that truly impressive as well. Debian assigns each package a maintainer who has to track patches and compile it and so on. CKAN not only uses the upstream packages unaltered, but it indexes them automatically! It's amazing that this works as well as it does.
Sounds good. My only concern is that the version comparison logic is fairly complex (hence the idea to re-use it directly), but whatever works is fine with me. CKAN/Core/Versioning/ModuleVersion.cs Lines 191 to 365 in 14ab152
For what it's worth, I come from an "adversarial code review" environment where the goal is to find as many problems as possible and address them to the satisfaction of the reviewer before anything gets merged, so I'm quite happy to field objections and questions. Please don't feel like you have to approve anything of mine before you're satisfied with it. 👌
I think most authors would be surprised that adding or removing a 'v' prefix would be significant, for example. I think most of them assume that we would just naturally sort versions in the order they were released chronologically, rather than according to version numbers.
Interesting idea, almost certainly do-able. Set a flag when the epoch incrementing code gets called, flow that through to the Staged attribute. As long as the downstream bot component handles that attribute, we'd get PRs for out of order releases. Nice! |
I did a short talk on indexing and archiving at LCA2018 -> https://www.youtube.com/watch?v=Y2e8wjcQo8g
No probs, if it gets too hard we can come back to it. But I think with some tests it should be ok, any edge cases that crop up we can write more tests. It's likely the first pass won't include the functionality, but once we're running and stable, adding it won't take much work.
Yeah, I think at least initially we might want to be involved with epochs and we'll already have the functionality to do it. |
I've seen it - who did you think that 'like' was from! 😁 |
Turns out that EDIT: Some Python code to take advantage of that: https://gist.github.com/HebaruSan/1fc031cbd97a9a1db2c7bf3c46e9894f |
Yeah, only problem is that calling subprocesses is expensive, which we'll have to do many times per mod. I have looked into binding directly to C# with python, but it looked very dependent on the code and what it's dependencies were. The version comparison is probably pretty tight in scope without dependencies, so building a ckan_version.dll is probably doable. But I think we can defer that for now. I've got the bulk of the harder logic in the new indexer done, mostly tying it all together left now. |
0755152
to
e72734b
Compare
Rebased to resolve conflicts now that the Scheduler might be ready soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall really like the concept. My only question, at least initially, do we want to set the staging flag + reason when an out of order release is detected? Might be good at least initially and if we're happy we can drop that requirement.
At least in the early stages, yes, I agree that would be a good idea. I'll make a note to add that. |
OK, an auto-epoch event should now turn on staging downstream. It was a little awkward because there wasn't an easy way to send data outside of the inflated metadata itself. I ended up adding the staging data to |
Before we merge this, we should make sure that
@DasSkelett, can you think of any others? Maybe Kerbalism? |
Kerbalism seems to only have multi-version releases lately, without need for "backports". It should be fine with auto-epoching. Historian-Expanded had two releases for different KSP versions closely after another a few days ago, but they were in the right order, and this hasn't happened before, so not sure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With staging turned on, I'm happy to give this a go (once the mentioned netkans have been changed).
This PR grew out of a sidebar discussion on #2789. Please discuss it here instead of there, to allow that issue to stay focused on its main goals.
Problem
As things stand, CKAN-meta will inevitably accumulate inaccurate metadata every time a mod author releases an out-of-order version, which includes things like dropping a 'v' prefix or changing it to/from capital 'V', numbering like
["0.6", "0.65", "0.7"]
, accidentally dropping the0.
prefix of a version, fat-fingering an extra '.' in a version number, etc, Users will get "stuck" on older versions because the newer release isn't considered newer by CKAN.There are no safeguards against such things (and no alerts that it has happened), and all require manual clean-up in the form of updating the
x_netkan_epoch
property in .netkan files and sometimes manually updating versions and filenames of .ckan files. It also relies on someone to notice the problem in the first place, which often takes a while. Recent examples of such manual fixes:This property is also prone to errors, as a few times an author has re-homed a netkan via a metanetkan and dropped
x_netkan_epoch
entirely because they don't know what it does, and we had to tell them to add it back because it's important.Changes
Now the Netkan queue Inflator from #2798 supports an additional optional input attribute:
The
version
property of a newly inflated module will be compared to this value:When we detect an out of order release, the version epoch is increased automatically until the new release's version is greater (new version) or equal (same version update). The "main" part of the version remains the same.
If/when the scheduler component chooses to set this attribute (by checking the existing files in CKAN-meta), then this method will index out-of-order releases with correct epochs automatically. This will fix the above-mentioned problems with bad metadata.
Tests are included.
Caveats
We will need a way to suppress this behavior. It makes sense for the typical mod, but a few mods (KSPInterstellarExtended and Kopernicus-Backports) intentionally post "older" releases that are meant to be treated as such. I imagine this would take the form of a new .netkan property:
If set, the scheduler would skip checking the CKAN-meta files to find the highest version and not set the HighestVersion attribute. The Inflator would then fall back to using the
x_netkan_epoch
property as it does today.