Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-epoch based on queue message attribute #2824

Merged
merged 2 commits into from
Nov 29, 2019

Conversation

HebaruSan
Copy link
Member

@HebaruSan HebaruSan commented Jul 2, 2019

This PR grew out of a sidebar discussion on #2789. Please discuss it here instead of there, to allow that issue to stay focused on its main goals.

Problem

As things stand, CKAN-meta will inevitably accumulate inaccurate metadata every time a mod author releases an out-of-order version, which includes things like dropping a 'v' prefix or changing it to/from capital 'V', numbering like ["0.6", "0.65", "0.7"], accidentally dropping the 0. prefix of a version, fat-fingering an extra '.' in a version number, etc, Users will get "stuck" on older versions because the newer release isn't considered newer by CKAN.

There are no safeguards against such things (and no alerts that it has happened), and all require manual clean-up in the form of updating the x_netkan_epoch property in .netkan files and sometimes manually updating versions and filenames of .ckan files. It also relies on someone to notice the problem in the first place, which often takes a while. Recent examples of such manual fixes:

This property is also prone to errors, as a few times an author has re-homed a netkan via a metanetkan and dropped x_netkan_epoch entirely because they don't know what it does, and we had to tell them to add it back because it's important.

Changes

Now the Netkan queue Inflator from #2798 supports an additional optional input attribute:

Attr HighestVersion: The highest version string of this module that is currently indexed

The version property of a newly inflated module will be compared to this value:

  • If it is higher, then we are indexing a new release with a good version string
  • If it is the same, then we are updating an existing release
  • If it is lower, then we have detected an out of order release!

When we detect an out of order release, the version epoch is increased automatically until the new release's version is greater (new version) or equal (same version update). The "main" part of the version remains the same.

If/when the scheduler component chooses to set this attribute (by checking the existing files in CKAN-meta), then this method will index out-of-order releases with correct epochs automatically. This will fix the above-mentioned problems with bad metadata.

Tests are included.

Caveats

We will need a way to suppress this behavior. It makes sense for the typical mod, but a few mods (KSPInterstellarExtended and Kopernicus-Backports) intentionally post "older" releases that are meant to be treated as such. I imagine this would take the form of a new .netkan property:

    "x_netkan_allow_out_of_order": true,

If set, the scheduler would skip checking the CKAN-meta files to find the highest version and not set the HighestVersion attribute. The Inflator would then fall back to using the x_netkan_epoch property as it does today.

@HebaruSan HebaruSan added Enhancement New features or functionality Pull request In progress We're still working on this Netkan Issues affecting the netkan data Tests Issues affecting the internal tests labels Jul 2, 2019
@DasSkelett
Copy link
Member

So basically we will "ignore" the epoch for most mods, only using it as the starting value for the epoch incrementation counter.
In this case I would prefer to drop the x_netkan_epoch fully from the netkan files (except those with x_netkan_allow_out_of_order), else this will lead to a lot of confusion for everyone who doesn't know the internals of the bot / indexing mechanism.

So spec-wise, this means making x_netkan_epoch dependent on x_netkan_allow_out_of_order, or combine the two to "x_netkan_use_manual_epoch" : <epoch_integer>.

@HebaruSan
Copy link
Member Author

Added a spec update to say that x_netkan_epoch specifies the minimum epoch and that it may be set or incremented automatically.

I don't want to add x_netkan_allow_out_of_order to the spec in this PR since its functionality cannot be implemented in this PR.

@HebaruSan
Copy link
Member Author

What am I talking about, of course x_netkan_allow_out_of_order can be implemented here. Added in the latest commit. What can't be implemented here is the optimization of the scheduler skipping the version scan step, but now that will be purely an optional optimization on top of netkan.exe's behavior rather than necessary to make x_netkan_allow_out_of_order work at all.

@techman83
Copy link
Member

My only comment is, that we would be putting a lot version logic into the scheduler. Potentially doing a fair amount of heavy lifting, to figure out the latest version.

I know versions have historically been problematic, but I'm conscious of getting into a race condition of versioning here.

I'd like to see us dig into the Metadata a bit and see big of a problem we're having and is it something we can solve by setting problematic mods to staging or having a different mechanism for flagging potential issues. One of the things we can potentially do with the new infrastructure is better signal when something is a problem and proactively raise an Issue.

The code looks tidy with good test coverage and doco. If I've missed something or I'm being too conservative about the solution, I'm very happy to be corrected 🙂

@HebaruSan
Copy link
Member Author

HebaruSan commented Jul 3, 2019

Maybe a little personal background: I've been considering taking a break for a while, so when I fix issues such as those listed in the OP, I'm struck by how much manual intervention the current data model requires. CKAN needs a human to be actively inspecting metadata and reacting to problem reports from users, mostly due to out of order versions, where the manual clean-up is to increment the epoch and clean up the versions that were mis-filed under the old epoch, something that is eminently automatizable. Maybe I'm too much of a perfectionist, but if we ever want to get to a point where CKAN quietly runs itself in the background, this is the 80% case on which to focus.

My only comment is, that we would be putting a lot version logic into the scheduler. Potentially doing a fair amount of heavy lifting, to figure out the latest version.

How would you feel about a netkan.exe --find-latest-version AwesomeMod command for this? We could avoid duplicating the version logic in Python by having the Scheduler run that command instead. It would use existing logic to load each .ckan file in a subfolder of CKAN-meta and compare the versions to find the maximum. Then all the Scheduler would need is copies of netkan.exe and CKAN-meta.

is it something we can solve by setting problematic mods to staging

No, the problematic mods are not known ahead of time (or to put it another way, all mods are potentially problematic). Our data integrity depends on mod authors to input version numbers sensibly, and we use x_netkan_epoch to work around the resultant "human error" problems.

having a different mechanism for flagging potential issues

I'll give that a think. The committer component that listens to the Inflator's output would also be able to detect out of order versions, but I wanted to put the logic earlier than that so we could reduce the amount of manual intervention needed rather than just making it a smoother work flow.

I'm happy to let this one simmer for a while. There's no point in merging it until the new infrastructure is almost ready to go live anyway, since these spec updates would be documenting things that don't in fact work yet.

@HebaruSan
Copy link
Member Author

HebaruSan commented Jul 3, 2019

I've embarked upon a more in-depth survey, and we are not anywhere close to keeping up with the manual maintenance. Bad data is rife, but "out of sight out of mind." I'll continue tracking the fixes with links in the OP here (I'm going in alphabetical order, and at time of writing I'm just finishing up the A's).

Another issue has emerged: The manual clean-up is itself error-prone. See KSP-CKAN/CKAN-meta#1477 and KSP-CKAN/NetKAN#7286; since we cannot test the sorting except by making a change and seeing what happens, it's easy to make a fix that causes another problem. Manually epoching old .ckan files is a bit fiddly too as it requires renaming them and editing the version property by hand.

All of this would be a thing of the past with the idea in this PR.

@DasSkelett
Copy link
Member

Okay, I think I can compromise on the notice in the spec. Most mod authors probably don't care about diverting epochs, else they would also care whether their mods are well-ordered.
But I would find a way to easily see especially big auto incrementations nice to have. I can imagine another attribute in the outgoing SQS:

Attr AutoIncrements: The number of automated epoch bumps for the generated CKAN

Which could be added to the status json and be displayed on the status page.

This way we would have an easy way to spot potential problems and can update the epoch in the netkans from time to time if the difference grows too high.

@techman83
Copy link
Member

Maybe a little personal background: I've been considering taking a break for a while, so when I fix issues such as those listed in the OP, I'm struck by how much manual intervention the current data model requires. CKAN needs a human to be actively inspecting metadata and reacting to problem reports from users, mostly due to out of order versions, where the manual clean-up is to increment the epoch and clean up the versions that were mis-filed under the old epoch, something that is eminently automatizable. Maybe I'm too much of a perfectionist, but if we ever want to get to a point where CKAN quietly runs itself in the background, this is the 80% case on which to focus.

Contributor burn out is something that is always at the forefront of my mind. If you need a break take one 🙂 You've been on the coal face for a while and it's easy for me to miss the number of manual interventions required when it isn't my primary focus. You've also made some incredible contributions and improved many long standing issues.

One of the really big successes of the CKAN has been the ability to not force mod authors to do anything special, but it also means that we have to go to great lengths to ensure the metadata is correct. On reflection, there have been numerous metadata wranglers (including yourself) over the years that have done an amazing job to keep up. You're absolutely right about the mission to reduce this load as much as possible.

How would you feel about a netkan.exe --find-latest-version AwesomeMod command for this? We could avoid duplicating the version logic in Python by having the Scheduler run that command instead. It would use existing logic to load each .ckan file in a subfolder of CKAN-meta and compare the versions to find the maximum. Then all the Scheduler would need is copies of netkan.exe and CKAN-meta.

I think at this stage, lets keep it in python. It'll help us move fast, iterate and keep the scheduler light. It also will keep the burden lower on those who work on CKAN more focused on that part of the code base and I'm hoping moving to Python with a much tighter scope of each of the Infrastructure functions will make it easier for new contributors to get involved with the infrastructure side.

Okay, I think I can compromise on the notice in the spec. Most mod authors probably don't care about diverting epochs, else they would also care whether their mods are well-ordered.
But I would find a way to easily see especially big auto incrementations nice to have. I can imagine another attribute in the outgoing SQS

Versions are hard. Everyone has an opinion on how they work and it can be really hard to be consistent when doing things by hand.

@HebaruSan has more than demonstrated that we have a problem to address. I'm pondering maybe we set a flag of some description (maybe staged? maybe it's own flag? epoched?), so we can raise PRs initially. Let me think on it, but for now I'm gunna give this the thumbs up because it fits our IfItsBetterMergedIt™ philosophy.

Copy link
Member

@techman83 techman83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From brief read of the code, this all looks good to me. I'll leave the full review to a more experienced C# contributor as the scope is a little wider than I'm familiar with.

@HebaruSan
Copy link
Member Author

One of the really big successes of the CKAN has been the ability to not force mod authors to do anything special

I find that truly impressive as well. Debian assigns each package a maintainer who has to track patches and compile it and so on. CKAN not only uses the upstream packages unaltered, but it indexes them automatically! It's amazing that this works as well as it does.

lets keep it in python. It'll help us move fast, iterate and keep the scheduler light

Sounds good. My only concern is that the version comparison logic is fairly complex (hence the idea to re-use it directly), but whatever works is fine with me.

public int CompareTo(ModuleVersion other)
{
Comparison stringComp(string v1, string v2)
{
var comparison = new Comparison { FirstRemainder = "", SecondRemainder = "" };
// Our starting assumptions are that both versions are completely
// strings, with no remainder. We'll then check if they're not.
var str1 = v1;
var str2 = v2;
// Start by walking along our version string until we find a number,
// thereby finding the starting string in both cases. If we fall off
// the end, then our assumptions made above hold.
for (var i = 0; i < v1.Length; i++)
{
if (char.IsNumber(v1[i]))
{
comparison.FirstRemainder = v1.Substring(i);
str1 = v1.Substring(0, i);
break;
}
}
for (var i = 0; i < v2.Length; i++)
{
if (char.IsNumber(v2[i]))
{
comparison.SecondRemainder = v2.Substring(i);
str2 = v2.Substring(0, i);
break;
}
}
// Then compare the two strings, and return our comparison state.
// Override sorting of '.' to higher than other characters.
if (str1.Length > 0 && str2.Length > 0)
{
if (str1[0] != '.' && str2[0] == '.')
{
comparison.CompareTo = -1;
}
else if (str1[0] == '.' && str2[0] != '.')
{
comparison.CompareTo = 1;
}
else if (str1[0] == '.' && str2[0] == '.')
{
if (str1.Length == 1 && str2.Length > 1)
{
comparison.CompareTo = 1;
}
else if (str1.Length > 1 && str2.Length == 1)
{
comparison.CompareTo = -1;
}
}
else
{
comparison.CompareTo = string.CompareOrdinal(str1, str2);
}
}
else
{
comparison.CompareTo = string.CompareOrdinal(str1, str2);
}
return comparison;
}
Comparison numComp(string v1, string v2)
{
var comparison = new Comparison { FirstRemainder = "", SecondRemainder = "" };
var minimumLength1 = 0;
for (var i = 0; i < v1.Length; i++)
{
if (!char.IsNumber(v1[i]))
{
comparison.FirstRemainder = v1.Substring(i);
break;
}
minimumLength1++;
}
var minimumLength2 = 0;
for (var i = 0; i < v2.Length; i++)
{
if (!char.IsNumber(v2[i]))
{
comparison.SecondRemainder = v2.Substring(i);
break;
}
minimumLength2++;
}
if (!int.TryParse(v1.Substring(0, minimumLength1), out var integer1))
integer1 = 0;
if (!int.TryParse(v2.Substring(0, minimumLength2), out var integer2))
integer2 = 0;
comparison.CompareTo = integer1.CompareTo(integer2);
return comparison;
}
if (other == null)
throw new ArgumentNullException(nameof(other));
if (other._epoch == _epoch && other._version.Equals(_version))
return 0;
// Compare epochs first.
if (_epoch != other._epoch)
return _epoch > other._epoch ? 1 : -1;
// Epochs are the same. Do the dance described in
// https://github.com/KSP-CKAN/CKAN/blob/master/Spec.md#version-ordering
var tuple = new Tuple<string, string>(_string, other._string);
if (ComparisonCache.TryGetValue(tuple, out var ret))
return ret;
Comparison comp;
comp.FirstRemainder = _version;
comp.SecondRemainder = other._version;
// Process our strings while there are characters remaining
while (comp.FirstRemainder.Length > 0 && comp.SecondRemainder.Length > 0)
{
// Start by comparing the string parts.
comp = stringComp(comp.FirstRemainder, comp.SecondRemainder);
// If we've found a difference, return it.
if (comp.CompareTo != 0)
{
ComparisonCache.TryAdd(tuple, comp.CompareTo);
return comp.CompareTo;
}
// Otherwise, compare the number parts.
// It's okay not to check if our strings are exhausted, because
// if they are the exhausted parts will return zero.
comp = numComp(comp.FirstRemainder, comp.SecondRemainder);
// Again, return difference if found.
if (comp.CompareTo != 0)
{
ComparisonCache.TryAdd(tuple, comp.CompareTo);
return comp.CompareTo;
}
}
// Oh, we've run out of one or both strings.
if (comp.FirstRemainder.Length == 0)
{
if (comp.SecondRemainder.Length == 0)
{
ComparisonCache.TryAdd(tuple, 0);
return 0;
}
// They *can't* be equal, because we would have detected that in our first test.
// So, whichever version is empty first is the smallest. (1.2 < 1.2.3)
ComparisonCache.TryAdd(tuple, -1);
return -1;
}
ComparisonCache.TryAdd(tuple, 1);
return 1;
}

it fits our IfItsBetterMergedIt™ philosophy

For what it's worth, I come from an "adversarial code review" environment where the goal is to find as many problems as possible and address them to the satisfaction of the reviewer before anything gets merged, so I'm quite happy to field objections and questions. Please don't feel like you have to approve anything of mine before you're satisfied with it. 👌

Versions are hard. Everyone has an opinion on how they work and it can be really hard to be consistent when doing things by hand.

I think most authors would be surprised that adding or removing a 'v' prefix would be significant, for example. I think most of them assume that we would just naturally sort versions in the order they were released chronologically, rather than according to version numbers.

I'm pondering maybe we set a flag of some description (maybe staged? maybe it's own flag? epoched?), so we can raise PRs initially.

Interesting idea, almost certainly do-able. Set a flag when the epoch incrementing code gets called, flow that through to the Staged attribute. As long as the downstream bot component handles that attribute, we'd get PRs for out of order releases. Nice!

@techman83
Copy link
Member

I find that truly impressive as well. Debian assigns each package a maintainer who has to track patches and compile it and so on. CKAN not only uses the upstream packages unaltered, but it indexes them automatically! It's amazing that this works as well as it does.

I did a short talk on indexing and archiving at LCA2018 -> https://www.youtube.com/watch?v=Y2e8wjcQo8g

Sounds good. My only concern is that the version comparison logic is fairly complex (hence the idea to re-use it directly), but whatever works is fine with me.

No probs, if it gets too hard we can come back to it. But I think with some tests it should be ok, any edge cases that crop up we can write more tests. It's likely the first pass won't include the functionality, but once we're running and stable, adding it won't take much work.

Interesting idea, almost certainly do-able. Set a flag when the epoch incrementing code gets called, flow that through to the Staged attribute. As long as the downstream bot component handles that attribute, we'd get PRs for out of order releases. Nice!

Yeah, I think at least initially we might want to be involved with epochs and we'll already have the functionality to do it.

@HebaruSan
Copy link
Member Author

I did a short talk on indexing and archiving at LCA2018 -> https://www.youtube.com/watch?v=Y2e8wjcQo8g

I've seen it - who did you think that 'like' was from! 😁

@HebaruSan
Copy link
Member Author

HebaruSan commented Jul 7, 2019

No probs, if it gets too hard we can come back to it. But I think with some tests it should be ok, any edge cases that crop up we can write more tests. It's likely the first pass won't include the functionality, but once we're running and stable, adding it won't take much work.

Turns out that mono ckan.exe compare --machine-readable versionA versionB already exposes the version comparison logic (thanks to #2773), so even if we decide it's too hard to re-do the logic in Python, there's no need for a new command.

EDIT: Some Python code to take advantage of that:

https://gist.github.com/HebaruSan/1fc031cbd97a9a1db2c7bf3c46e9894f

@techman83
Copy link
Member

Yeah, only problem is that calling subprocesses is expensive, which we'll have to do many times per mod. I have looked into binding directly to C# with python, but it looked very dependent on the code and what it's dependencies were. The version comparison is probably pretty tight in scope without dependencies, so building a ckan_version.dll is probably doable. But I think we can defer that for now. I've got the bulk of the harder logic in the new indexer done, mostly tying it all together left now.

@HebaruSan
Copy link
Member Author

Rebased to resolve conflicts now that the Scheduler might be ready soon.

Copy link
Member

@techman83 techman83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall really like the concept. My only question, at least initially, do we want to set the staging flag + reason when an out of order release is detected? Might be good at least initially and if we're happy we can drop that requirement.

@HebaruSan
Copy link
Member Author

do we want to set the staging flag + reason when an out of order release is detected?

At least in the early stages, yes, I agree that would be a good idea. I'll make a note to add that.

@HebaruSan
Copy link
Member Author

HebaruSan commented Nov 28, 2019

OK, an auto-epoch event should now turn on staging downstream. It was a little awkward because there wasn't an easy way to send data outside of the inflated metadata itself. I ended up adding the staging data to TransformOptions, which was the closest thing we had to an out-of-band channel.

@HebaruSan HebaruSan added In progress We're still working on this and removed In progress We're still working on this labels Nov 28, 2019
@HebaruSan
Copy link
Member Author

HebaruSan commented Nov 28, 2019

Before we merge this, we should make sure that x_netkan_allow_out_of_order is added to any mods that regularly do backports, otherwise they might get unnecessarily epoched (yes, the staging will help, but it would be better to get valid metadata and not have to clean it up manually). Short list off the top of my head:

  • KSPInterstellarExtended
  • Kopernicus-Backports

@DasSkelett, can you think of any others? Maybe Kerbalism?

@DasSkelett
Copy link
Member

Before we merge this, we should make sure that x_netkan_allow_out_of_order is added to any mods that regularly do backports, otherwise they might get unnecessarily epoched (yes, the staging will help, but it would be better to get valid metadata and not have to clean it up manually). Short list off the top of my head:

* KSPInterstellarExtended

* Kopernicus-Backports

@DasSkelett, can you think of any others? Maybe Kerbalism?

Kerbalism seems to only have multi-version releases lately, without need for "backports". It should be fine with auto-epoching.

Historian-Expanded had two releases for different KSP versions closely after another a few days ago, but they were in the right order, and this hasn't happened before, so not sure.
Otherwise the only two coming to my mind are the two you already mentioned.

Copy link
Member

@DasSkelett DasSkelett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With staging turned on, I'm happy to give this a go (once the mentioned netkans have been changed).

@HebaruSan
Copy link
Member Author

Auto PRs for future reference:

Manual clean-up/prep:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement New features or functionality In progress We're still working on this Netkan Issues affecting the netkan data Tests Issues affecting the internal tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants