Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented the update_endtime and get_endtime functions for the etcd_returner from PR #51363. #55186

Closed
wants to merge 90 commits into from

Conversation

arizvisa
Copy link
Contributor

@arizvisa arizvisa commented Nov 2, 2019

What does this PR do?

This PR implements the update_endtime and get_endtime functions that are also implemented in the salt.returners.local_cache returner. It seems that these functions are updated when using the display_progress functionality in some of the runners. These aren't documented anywhere but within that returner, but this should implement everything that is supported collectively by all of the returners. The only thing it fails to implement is the get_register and load_register functions(?) which was only used by thorium which has since been deprecated.

The documentation for this PR has also been updated to describe the new ".endtime" addition to the schema.

Project 5 has a card mentioning that PR #51363 needs to be ported to master. Since I was the original author of that PR and needed these capabilities, I re-based it on top of master and added the new functionality. This PR is the result of that port with the new additions.

What issues does this PR fix or reference?

This is PR #51363, ported to master as mentioned in project https://github.com/saltstack/salt/projects/5#card-28249107, along with two new functions that make it consistent with the salt.returners.local_cache returner.

Previous Behavior

Previously these functions weren't implemented as none of the other returners seemed to define them. This might be for a particular case, but I figured that we might as well have a complete returner implementation that can be used as a reference.

New Behavior

Now jobs can be updated with the endtime that is passed as a string to the update_endtime function.

Tests written?

This uses the exact same interface as the previous etcd returner that was already merged, so the already existing tests should exhaustively test this.

Commits signed with GPG?

No

(edited to describe the new editions of this PR. Please review history of this comment to see prior reason for the PR)

@arizvisa
Copy link
Contributor Author

Force-pushed due to rebase.

Also implemented the get_endtime and update_endtime functions that are supported by the salt.returners.local_cache returner.

@arizvisa arizvisa changed the title Ported PR #51363 to master and fixed TypeError being raised in the etcd returner when logging debug-leveled messages in Python3 Implemented the update_endtime and get_endtime functions for the etcd_returner from PR #51363. Jan 22, 2020
@arizvisa
Copy link
Contributor Author

This PR's initial comment was also updated to reference the new additions. Please review the history if you wish to see the initial reason for the PR prior to the addition of the new functionality.

arizvisa added a commit to arizvisa/lolfuzz3 that referenced this pull request Jan 22, 2020
arizvisa added a commit to arizvisa/lolfuzz3 that referenced this pull request Jan 22, 2020
@Ch3LL Ch3LL removed the request for review from a team April 15, 2020 14:34
@Ch3LL Ch3LL self-assigned this Apr 15, 2020
@arizvisa arizvisa force-pushed the etcd-returner-is-busted branch from 3ac9974 to f161c64 Compare April 21, 2020 22:18
@arizvisa
Copy link
Contributor Author

force-pushed due to yet another rebase.

@arizvisa
Copy link
Contributor Author

what's going on with this, it's been 2 years back when you guys were asking people to merge into the develop branch?

This PR significantly improves the etcd returner whilst still retaining the original etcd functionality.

@sagetherage sagetherage assigned garethgreenaway and unassigned Ch3LL May 5, 2020
@dwoz dwoz added has-failing-test Needs-Testcase PR needs test cases written, or the issue is about a bug/feature that needs test cases labels May 13, 2020
arizvisa added 8 commits May 22, 2020 18:46
…d into salt.utils but didn't test anything and hid all the exceptions. To restore functionality we just pass through to the client attribute.
…table manner by ensuring None is always returned (the caller seems to depend on this) and empty strings aren't deserialized.
…hing the EtcdKeyNotFound exception, added a ton of comments, and debug + trace logs.
…organized the trace logs to output the key that was returned from EtcdResult, and re-worded the logs so they use the word "job" when referring to a jid.
@arizvisa
Copy link
Contributor Author

This was tagged with "needs-testcase". Not sure why here as it's backwards compatible with the already-existing etcd returner testcases.

@arizvisa
Copy link
Contributor Author

The errors before the prior rebase are all similar to the following which isn't related to the PR.

18:20:55 23:20:55  Connecting to https://api.github.com using salt-jenkins/****** (Service User for accessing github api (public_repo access only))
18:20:56  Connecting to https://api.github.com to check permissions of obtain list of arizvisa for saltstack/salt
18:20:56  Loading trusted files from base branch master at 1ce5f24c65169edeb071f59bf7d26fd3fa1102f5 rather than b1e59a4c8f830edf028c8a022a812461a7384a0e
18:20:57  ERROR: Could not do lightweight checkout, falling back to heavyweight
18:20:57  java.io.FileNotFoundException
18:20:57  	at jenkins.plugins.git.GitSCMFile.lambda$content$2(GitSCMFile.java:164)
18:20:57  	at jenkins.plugins.git.GitSCMFileSystem.lambda$invoke$e8567d7e$1(GitSCMFileSystem.java:182)
18:20:57  	at org.jenkinsci.plugins.gitclient.AbstractGitAPIImpl.withRepository(AbstractGitAPIImpl.java:29)
18:20:57  	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.withRepository(CliGitAPIImpl.java:81)
18:20:57  	at jenkins.plugins.git.GitSCMFileSystem.invoke(GitSCMFileSystem.java:182)
18:20:57  	at jenkins.plugins.git.GitSCMFile.content(GitSCMFile.java:158)
18:20:57  	at jenkins.scm.api.SCMFile.contentAsString(SCMFile.java:335)
18:20:57  	at org.jenkinsci.plugins.workflow.multibranch.SCMBinder.create(SCMBinder.java:107)
18:20:57  	at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:309)
18:20:57  	at hudson.model.ResourceController.execute(ResourceController.java:97)
18:20:57  	at hudson.model.Executor.run(Executor.java:428)
18:20:58  Checking out git https://github.com/saltstack/salt.git into /var/lib/jenkins/workspace/entos7-py3-pycryptodome_PR-55186@script to read .ci/kitchen-centos7-py3-pycryptodome

…this PR yet nobody has merged this for 2 years without mentioning any kind of reason.
@arizvisa
Copy link
Contributor Author

ci/py3/ubuntu1604 is erroring out because of:

java.io.FileNotFoundException: https://api.github.com/repos/saltstack/salt
18:58:22  	at com.squareup.okhttp.internal.huc.HttpURLConnectionImpl.getInputStream(HttpURLConnectionImpl.java:243)
18:58:22  	at com.squareup.okhttp.internal.huc.DelegatingHttpsURLConnection.getInputStream(DelegatingHttpsURLConnection.java:210)
18:58:22  	at com.squareup.okhttp.internal.huc.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:25)
18:58:22  	at org.kohsuke.github.GitHubHttpUrlConnectionClient$HttpURLConnectionResponseInfo.bodyStream(GitHubHttpUrlConnectionClient.java:197)
18:58:22  	at org.kohsuke.github.GitHubResponse$ResponseInfo.getBodyAsString(GitHubResponse.java:313)
18:58:22  	at org.kohsuke.github.GitHubResponse.parseBody(GitHubResponse.java:80)
18:58:22  	at org.kohsuke.github.Requester.lambda$fetch$1(Requester.java:71)
18:58:22  	at org.kohsuke.github.GitHubClient.createResponse(GitHubClient.java:406)
18:58:22  	at org.kohsuke.github.GitHubClient.sendRequest(GitHubClient.java:360)
18:58:22  Caused: org.kohsuke.github.GHFileNotFoundException: https://api.github.com/repos/saltstack/salt 
18:58:22  	at org.kohsuke.github.GitHubClient.interpretApiError(GitHubClient.java:437)
18:58:22  	at org.kohsuke.github.GitHubClient.sendRequest(GitHubClient.java:368)
18:58:22  	at org.kohsuke.github.GitHubClient.sendRequest(GitHubClient.java:312)
18:58:22  	at org.kohsuke.github.Requester.fetch(Requester.java:71)
18:58:22  	at org.kohsuke.github.GitHub.getRepository(GitHub.java:533)
18:58:22  	at org.jenkinsci.plugins.github_branch_source.GitHubSCMSource$DeferredPermissionsSource.fetch(GitHubSCMSource.java:2796)
18:58:22  	at org.jenkinsci.plugins.github_branch_source.GitHubSCMSourceRequest.getPermissions(GitHubSCMSourceRequest.java:473)
18:58:22  	at org.jenkinsci.plugins.github_branch_source.ForkPullRequestDiscoveryTrait$TrustPermission.checkTrusted(ForkPullRequestDiscoveryTrait.java:340)
18:58:22  	at org.jenkinsci.plugins.github_branch_source.ForkPullRequestDiscoveryTrait$TrustPermission.checkTrusted(ForkPullRequestDiscoveryTrait.java:323)
18:58:22  	at jenkins.scm.api.trait.SCMHeadAuthority.isTrusted(SCMHeadAuthority.java:101)
18:58:22  	at jenkins.scm.api.trait.SCMSourceRequest.isTrusted(SCMSourceRequest.java:213)
18:58:22  	at org.jenkinsci.plugins.github_branch_source.GitHubSCMSource.getTrustedRevision(GitHubSCMSource.java:1789)
18:58:22  	at org.jenkinsci.plugins.workflow.multibranch.SCMBinder.create(SCMBinder.java:102)
18:58:22  	at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:309)
18:58:22  	at hudson.model.ResourceController.execute(ResourceController.java:97)
18:58:22  	at hudson.model.Executor.run(Executor.java:428)
18:58:22  Finished: FAILURE

@arizvisa
Copy link
Contributor Author

arizvisa commented Jun 26, 2020

Not sure where to mention this as the original PR is in the process of being ported to master.

Despite the mered PR passing all of the tests from the original returner, I just realized that it's not "exactly" backwards-compatible with the previous etcd returner schema and this is due to the names for the keys not being plural. The schema for the previous returner was very minimal and not actually linking the ttl between jobs and events. This caused races when using the former returner that could result in orphaned events or jobs when querying them. Albeit verbose, I tried to document how this was fixed in the PR's module documentation where I explain the schema and the relationships between the records.

The aforementioned backwards-compatibility issue, however, is easily remedied as I wrote all the keys that are used into a Schema dictionary (immediately after the module's documentation) so that you guys can adjust it in whatever way you see fit. So the new returner can use the naming from the older returner's schema by changing the value for the Schema["job-cache"] key from "job" to "jobs". Then the Schema["minion-fun"] key's value should be renamed to "minions" instead of the "minion.job" name.

The usage of the "minion.job" name was originally chosen to show that there's a relationship between the job (job-cache) and the minion (event-returner) in that when one of them expires, the other one expires as well. The "lock" that's used is also tied to the events that are generated by the job using the unique index that etcd creates for keys. When the lock is expired (or removed), all the records related to it are then also expired. In the future this can be used to enumerate the events associated with a particular job (as I'm currently doing this w/ the etcd client to help troubleshoot events that are attached to my states)

Other than these naming issues I described how to fix, I think everything else should be backwards compatible with the schema used by the previous implementation of the etcd returner.

Copy link
Contributor

@Ch3LL Ch3LL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im not seeing etcd returner tests in the tests directory so this will need test coverage added, but only for the changes you are making.

@arizvisa
Copy link
Contributor Author

This is still waiting for the original PR to get merged as that one's been locked in project #5 for almost 2 years, hence there hasn't been any continued development on this until that occurs. I can write tests for this around then when that base PR gets merged, but since returners don't seem to have been shown much "love" wrt testing.. I really have no idea how to legitimately get to this component since even the local_cache returner is only testing clean_old_jobs.

On another note, none of the returners have implementations for update_endtime and get_endtime for that matter. As local_cache is the only one of the returners that implements these functions, and it seems to only e bused by job runners, is this functionality really a thing, or will it get phased out whenever you guys get to take a look at them?

Which one of the returners would you guys consider a reference implementation that all the returners should be modeled upon?

@waynew
Copy link
Contributor

waynew commented Sep 24, 2020

This is still waiting for the original PR to get merged as that one's been locked in project #5 for almost 2 years, hence there hasn't been any continued development on this until that occurs.

Hey @arizvisa I'm not sure which original PR you were referring to - I couldn't find it looking through these comments or the project board - do you have a PR number of the original PR that this is waiting on?

I can write tests for this around then when that base PR gets merged, but since returners don't seem to have been shown much "love" wrt testing.. I really have no idea how to legitimately get to this component since even the local_cache returner is only testing clean_old_jobs.

I took a glance at the changes here, but there's a lot of them, so a good method of testing was not immediately apparent. I'm happy to take a closer look if it's not obvious based on existing tests (I'm guessing not), as you point out there's not been much love - only 13 returner test files between tests/unit/returners and tests/integration/returners

On another note, none of the returners have implementations for update_endtime and get_endtime for that matter. As local_cache is the only one of the returners that implements these functions, and it seems to only e bused by job runners, is this functionality really a thing, or will it get phased out whenever you guys get to take a look at them?

I'm not super familiar with returners, and I'm not sure what the history is there - I'll put this on our team agenda and make sure that we're on the same page in regards to either fully supporting the functionality or deprecation/removal.

Which one of the returners would you guys consider a reference implementation that all the returners should be modeled upon?

I'll also ask this question.

Thanks for your efforts!

@arizvisa
Copy link
Contributor Author

@waynew, it's this one: #51363

@arizvisa
Copy link
Contributor Author

And I think you can find it on the project boards with this link: https://github.com/saltstack/salt/projects/5?card_filter_query=etcd

Thanks a ton for looking into this. Etcd3 is already out which has a different API and support both the Etcd2/Etcd3 apis, but we're still using Etcd2 at the moment while waiting for that PR to get merged.

@arizvisa
Copy link
Contributor Author

arizvisa commented Sep 24, 2020

I actually tried to write that PR to be a sort-of reference implementation with a ton of docs (and detailed description of its schema), and to take advantage of as many features of etcd as made sense (which is why I pushed for it so hard way back when). As you confirmed, the runners don't receive a lot of love.

For my systems the etcd returners combined with all the other etcd components in salt are a necessity since the systems are heavily CoreOS based w/ etcd seeding everything and their salt-masters always running in a container. Despite it likely being an uncommon configuration, I'd love to see salt's etcd support become more of a thing in similar setups, heh. That old PR is pretty much the first step.

@Ch3LL
Copy link
Contributor

Ch3LL commented Dec 13, 2021

Just following up here. Any bandwidth to finish this one up? Looks like test coverage and changelog needs to be added. Do you need help with the test coverage?

@arizvisa
Copy link
Contributor Author

Just following up here. Any bandwidth to finish this one up? Looks like test coverage and changelog needs to be added. Do you need help with the test coverage?

me?

@Ch3LL
Copy link
Contributor

Ch3LL commented Dec 16, 2021

Yes, that was my understanding of the status of this PR, but please correct me if i'm wrong.

@arizvisa
Copy link
Contributor Author

I attempted to merge this just over 3 years ago, with #51363 which was previously accepted. However, then it dead-ended into project #5 where I've tried to maintain contact with the maintainer to no avail. I tried to accelerate this with this particular PR back in 2019. However, due to a suggestion, I ended up adding more features as a follow up to the original because only one of the other returners was using the particular update_endtime and get_endtime querying features. I thought it'd be a good idea to have a well-documented standardized returner to be used as a reference implementation since all of them are very poorly documented. As you can see in the commits for the original #51363, I've added a ton of docs, made sure that the schema was backwards compatible w/ the previous one, and even used etcd-specific features for ordering revisions and properly expiring the job data that is linked together.

Throughout this entire process, I've pinged you guys multiple times with no responses (see #51363), and even did the same in this PR and was left hanging on this one since September of last year. I only have just now gotten a polite followup as of 4 days ago. Now the etcd v2 api, which is what this code was originally written for, is being deprecated as is noted in #61034. Essentially the 82 commits of maintaining this PR over the past 3 years and trying literally everything to get it merged has been for naught.

So, nah. I'm sorry, I know it's not your fault, but unfortunately I'm over it. It is hellacious to get code merged into this project and I'm completely tired of fighting for it. As much as I preferred saltstack compared to ansible or any of the other solutions (and even still prefer it), I've been forced to move on as this software isn't even in my field of expertise and so I can't even make shit up to my boss saying "oh, but we really need this" because it's entirely unrelated to my normal job and the project that I originally needed this for was from years ago.

As Etcdv2 API is being deprecated, I highly recommend you actually put a dever on this returner as really, the only ones in saltstack that are viable are the "local_cache" and the relational database ones. Etcd has a lot of capabilities that make it an elegant manager for job results which is why I worked on this returner way back when. Also keep in mind the special "req" job id as well (whenever you guys write it), as only my "etcd" returner and your "local_cache" returner actually seem to support it properly.

@arizvisa
Copy link
Contributor Author

See #51809 for the special handling of the "req" job id.

@Ch3LL
Copy link
Contributor

Ch3LL commented Jan 13, 2022

I apologize for the inactivity that you have dealt with on the PR and issue. After reviewing them both I can understand your frustrations. I have personally updated my github notification to ensure I am getting notified of more PRs and issues that I need to be responding to. I understand you do not have the time to continue the efforts on this PR. Did you want to close? If someone else wants to pick it up they can surely start from this PR.

Also, regarding etcd2v2 being deprecated I did notice this issue #61034 which is slated for the next release so that should be handled soon.

@arizvisa
Copy link
Contributor Author

yes, any further maintenance of this PR after 1600 lines of code + documentation, after being previously approved yet lost in some fake project, and attempted continued maintenance.. will definitely not happen. even moreso as it's being completely deprecated in favor of an official saltstack employee's different design, and even most-so because development and administration is not-at-all my profession.

the only other returner that actually has tests is local_cache as the rest are basically smoke tests due to them using a write-only medium. if anybody by sheer luck learns about the deprecated etcd returner, they might be able to use it via the awesome module synchronization system you guys fortunately merged half a decade ago.

just do me a favor make your version of the etcd returner not suck. etcd has a lot of features that make it pretty ideal as a r/w distributed returner with the ability to trigger other forms of automation.. such as being able to link and time out keys containing job information automatically after a given period of time (no need to manually spend cpu time expiring old jids). the majority of the returners are entirely incomplete, and the only one that can be considered a reference implementation just thrashes the disk i/o in default configurations which is entirely not ideal on virtualized infrastructure.

@Ch3LL
Copy link
Contributor

Ch3LL commented Jan 18, 2022

Alrighty, I'll close this PR and if you want to add any other suggestions in this issue #61034 to ensure the etcd3 support is adequate I would appreciate it. Please let me know if you want me to re-open.

@Ch3LL Ch3LL closed this Jan 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has-failing-test master-port Needs-Testcase PR needs test cases written, or the issue is about a bug/feature that needs test cases
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants