Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More intelligent diffing in changes of file.serialize state #48609

Merged
merged 10 commits into from
Dec 11, 2018
Merged

More intelligent diffing in changes of file.serialize state #48609

merged 10 commits into from
Dec 11, 2018

Conversation

anlutro
Copy link
Contributor

@anlutro anlutro commented Jul 16, 2018

I wanted some feedback on this minor feature before I tidy it up, write tests etc. - I've confirmed it works with a local checkout of the salt codebase.

Basically I have a JSON config file being generated by a Go application, and I want to manage parts of it with Salt. However, because Python and Go order their JSON differently (order of keys mostly), the diff output is always massive, making it impossible to see what changed.

I found salt.utils.dictdiffer which does exactly what I want in this scenario. I added an optional argument to file.serializer which switches the behaviour of the state's changes output.

Some problems/questions which I'd like feedback on:

  • Should this behaviour be the default or is it best to keep it behind this optional argument? Is the name of the argument okay?
  • The changes output is potentially misleading, because even if the serialized data doesn't change, the file's contents (indentation or whatever) may change, which could screw up onchanges etc.
  • dictdiffer only works on dicts, but it's possible to serialize other data types such as lists.

@rallytime rallytime requested review from a team July 26, 2018 21:37
@DmitryKuzmenko
Copy link
Contributor

Basically I'm OK with the idea.

Should this behaviour be the default or is it best to keep it behind this optional argument?

False by default as for me, to not break this for others.

Is the name of the argument okay?

Maybe shorter? show_diff? But I like to hear an opinion of somebody else.

The changes output is potentially misleading, because even if the serialized data doesn't change, the file's contents (indentation or whatever) may change, which could screw up onchanges etc.

Maybe just document this point?

dictdiffer only works on dicts, but it's possible to serialize other data types such as lists.

Not sure it's a good idea but the simplest way is if it's not a dict make it a dict {'data': existing_data} or better implement a list diffing logic.

@anlutro
Copy link
Contributor Author

anlutro commented Jul 28, 2018

Isn't show_diff reserved as a global argument meant to control whether any diff at all is shown? I think it sounds a little bit misleading anyway - if it's false, the diff is still shown, it's just shown in a different way.

@DmitryKuzmenko
Copy link
Contributor

@anlutro I'm ok then. Sorry for misleading

@anlutro
Copy link
Contributor Author

anlutro commented Jul 30, 2018

Cool. I will work on the list stuff as well as docs a bit later today, hopefully.

Copy link
Contributor

@rallytime rallytime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am OK with this change, but the current behavior should be the default. I'd also like @terminalmage to take a look here as well.

@@ -6374,6 +6375,7 @@ def serialize(name,
encoding=None,
encoding_errors='strict',
serializer_opts=None,
diff_serializer_data=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be False by default to keep the previous behavior, yes?

@rallytime rallytime requested a review from terminalmage July 30, 2018 14:03
Copy link
Contributor

@terminalmage terminalmage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like anything that helps normalize changes. However, I don't think we need to hide this behavior behind an argument. I'm not sure I understand what changing the format of the diffs in the changes key of the return data is going to "break".

IMO this behavior should be the new default and we don't need to make it contingent on an argument passed to the state.

blind-review

@rallytime
Copy link
Contributor

OK, after talking to @terminalmage about this offline, I agree with him. We definitely need a mention of the change in the release notes though.

@anlutro
Copy link
Contributor Author

anlutro commented Jul 31, 2018

Okay, cool. I just need to figure out how to do the diffing, then. I can't find anything in dictdiffer that does list diffs in particular, and the listdiffer module seems to be aimed at lists of dictionaries in particular. Will I need to implement it myself? What would the module name be, since listdiffer is already taken?

@anlutro
Copy link
Contributor Author

anlutro commented Jul 31, 2018

On second thought, after considering how difficult smartly diffing lists is, maybe I should just not do that... But that might mean that data structures such as lists of dicts or lists of lists of dicts won't get smartly diffed.

Maybe I can simply wrap the dictdiff call in an if isinstance(data, dict) and call it a day?

@rallytime rallytime requested a review from terminalmage August 14, 2018 18:53
Copy link
Contributor

@terminalmage terminalmage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anlutro Can you elaborate on what you mean by list diffing? The recursive diffing in dictdiffer does show where lists differ.

To me this looks good as-is, but since I'm not sure where you're coming from, I want to make sure we're on the same page before I approve this PR for merging.

@anlutro
Copy link
Contributor Author

anlutro commented Aug 20, 2018

I mean if we have this JSON data: [1,2,3] and we change it to [1,2,4] we could show that 3 has been removed and 4 has been added. But when I thought more about it, it would be really difficult to implement.

@terminalmage
Copy link
Contributor

Oh OK, so if the data is being serialized and was formed as a list instead of a dict.

I guess you could always do type checking like you suggested, but we would still want there to be something in the changes key of the return dictionary if we were dealing with lists. the reason is that requisites such as onchanges look at the changes dict to detect whether or not any changes happened, so if we want these requisites to work properly, we do need to have some value there.

Copy link
Contributor

@isbm isbm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no much to add here what already has been said.

@anlutro anlutro changed the title [wip] implement file.serialize dict diffing More intelligent diffing in changes of file.serialize state Aug 21, 2018
@cachedout
Copy link
Contributor

@anlutro Are you ready for a final review here? I saw that you removed the WIP indicator from the PR title.

@anlutro
Copy link
Contributor Author

anlutro commented Aug 28, 2018 via email

@rallytime
Copy link
Contributor

@anlutro I think this test failure is related: https://jenkinsci.saltstack.com/job/pr-kitchen-ubuntu1604-py2/job/PR-48609/10/

Can you take a look?

Copy link
Contributor

@terminalmage terminalmage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still nothing in ret['changes'] if the data is non-dictionary. We need to have at least something in there for onchanges requisites to work.

@anlutro
Copy link
Contributor Author

anlutro commented Aug 29, 2018

@rallytime Ah - fixed that now. Let's see if other failures pop up.

@terminalmage ret['changes'] will be set by file.check_managed_changes or file.manage_file, we just overwrite it if we think we can.

Sorry for the late replies, I'm out travelling.

@terminalmage
Copy link
Contributor

Ah ok, thanks!

@rallytime
Copy link
Contributor

@cachedout
Copy link
Contributor

@anlutro Any updates here?

@anlutro
Copy link
Contributor Author

anlutro commented Oct 16, 2018

Hey, sorry for going quiet. I see there's a test failure but it doesn't make any sense to me. It looks like self.run_state is returning a boolean - why? I'm pretty sure none of the code I added/modified could trigger this behavior.

Copy link
Contributor

@terminalmage terminalmage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started new tests since the old results are aged out of Jenkins. Pending getting them fixed, I think this is good to merge.

@cachedout
Copy link
Contributor

@terminalmage
Copy link
Contributor

I just pushed a commit to this PR to fix the docs build issue, @anlutro. You'll need to pull before you make any more changes.

@cachedout
Copy link
Contributor

I'll rebase this PR in the mean time.

@terminalmage
Copy link
Contributor

I found and fixed a problem in this PR, we were including the dictdiffer object in the changes dict and msgpack couldn't serialize it. 1af036a should fix this.

@terminalmage terminalmage merged commit 63a75b3 into saltstack:develop Dec 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants