[JENKINS-67681] ProcessTree$Windows#killAll is slow on Windows #6236

Merged
merged 5 commits into from
Sep 27, 2022

Conversation

jtnord
Member

@jtnord jtnord commented Jan 31, 2022

Before the winp update, the test (buildStabilityReports(hudson.model.FreeStyleProjectTest)) sometimes timed out on my machine and sometimes completed in 90+ seconds. It really depends on the number of processes running at the time, which normally depends on the number of Chrome tabs I have open.

After the update, using the SNAPSHOT version, the test takes 22 seconds.

See JENKINS-67681.
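
The dependence on the number of running processes can be illustrated with a minimal sketch. It assumes that killing a build's leftover processes works by looking up the environment of every running process and comparing it with a set of marker variables; this is an assumption about the general approach, not code taken from Jenkins or winp. The class name, method names, and the `BUILD_COOKIE` variable are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class KillAllSketch {
    /** True when every marker variable is present in the process environment with the same value. */
    static boolean hasMatchingEnvVars(Map<String, String> processEnv, Map<String, String> markerEnv) {
        if (markerEnv.isEmpty()) {
            return false; // nothing to match on: never select a process
        }
        for (Map.Entry<String, String> marker : markerEnv.entrySet()) {
            if (!marker.getValue().equals(processEnv.get(marker.getKey()))) {
                return false;
            }
        }
        return true;
    }

    /** Visits every process once, so the cost grows with the number of running processes. */
    static List<Integer> selectPidsToKill(Map<Integer, Map<String, String>> envByPid,
                                          Map<String, String> markerEnv) {
        List<Integer> selected = new ArrayList<>();
        for (Map.Entry<Integer, Map<String, String>> process : envByPid.entrySet()) {
            // In a real implementation this is where the (expensive) per-process
            // environment lookup would happen; here the data is already in memory.
            if (hasMatchingEnvVars(process.getValue(), markerEnv)) {
                selected.add(process.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical process table: pid -> environment.
        Map<Integer, Map<String, String>> envByPid = new LinkedHashMap<>();
        envByPid.put(100, Map.of("PATH", "C:\\Windows", "BUILD_COOKIE", "abc"));
        envByPid.put(200, Map.of("PATH", "C:\\Windows"));
        envByPid.put(300, Map.of("BUILD_COOKIE", "other"));

        System.out.println(selectPidsToKill(envByPid, Map.of("BUILD_COOKIE", "abc"))); // prints [100]
    }
}
```

Under this assumption, each additional running process (for example, one per browser tab) adds one more environment lookup, so a faster lookup shortens the whole scan.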

Proposed changelog entries

  • JENKINS-67681: Improve performance when killing processes at the end of a build on Windows.

Proposed upgrade guidelines

N/A

Submitter checklist

  • (If applicable) Jira issue is well described
  • Changelog entries and upgrade guidelines are appropriate for the audience affected by the change (users or developer, depending on the change). Examples
    • Fill-in the Proposed changelog entries section only if there are breaking changes or other changes which may require extra steps from users during the upgrade
  • Appropriate autotests or explanation to why this change has no tests
  • For dependency updates: links to external changelogs and, if possible, full diffs

Desired reviewers

@mention

Maintainer checklist

Before the changes are marked as ready-for-merge:

  • There are at least 2 approvals for the pull request and no outstanding requests for change
  • Conversations in the pull request are over OR it is explicit that a reviewer does not block the change
  • Changelog entries in the PR title and/or Proposed changelog entries are correct
  • Proper changelog labels are set so that the changelog can be generated automatically
  • If the change needs additional upgrade steps from users, upgrade-guide-needed label is set and there is a Proposed upgrade guidelines section in the PR title. (example)
  • If it would make sense to backport the change to LTS, a Jira issue must exist, be a Bug or Improvement, and be labeled as lts-candidate to be considered (see query).

@jtnord
Member Author

jtnord commented Jan 31, 2022

jenkinsci/winp#69

@timja
Member

timja commented Feb 1, 2022

Aw, I was hoping the tests would have sped up.

Still really slow :(

Linux = ~1hr 30
Windows = 3hr 51

@jtnord
Member Author

jtnord commented Feb 1, 2022

They did speed up, just not enough. The FreeStyleProjectTest now takes [approx 27s](https://ci.jenkins.io/job/Core/job/jenkins/job/PR-6236/2/testReport/hudson.model/FreeStyleProjectTest/Windows_jdk11___Windows_Build___Test___buildStabilityReports/), so it is now well within the 180s timeout. Disregard the Surefire report, as that includes the Jenkins core startup in the JenkinsRule. (At least I am hoping the timeout excludes starting Jenkins; on Linux it took ~15 seconds in CI.)

Starting Jenkins is much slower on Windows than on Linux. I am not sure exactly why yet, but that may still account for some portion of the difference.

That said, the cli module takes half the time on Linux that it does on Windows. I am not sure what the hardware specs are for both Linux agents and windows ones.

forkCount in the test module is 2; for the other modules it is 0.5C.

I am expecting some difference between the OSes, but almost twice as long across the board seems wrong (though it is not caused by this PR).
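
To make the forkCount comparison concrete: in Maven Surefire, a forkCount value ending in `C` is multiplied by the number of available CPU cores. The sketch below shows how the two settings mentioned above resolve on machines with different core counts. The class and method names are hypothetical, and the rounding rule (truncate, but use at least one fork for a positive result) is an assumption modeled on the documented behaviour rather than a copy of Surefire's code.

```java
final class ForkCountSketch {
    /** Resolves a Surefire-style forkCount string ("2", "0.5C", ...) to a number of forked JVMs. */
    static int effectiveForkCount(String forkCount, int availableCores) {
        String trimmed = forkCount.trim();
        if (trimmed.endsWith("C")) {
            // "0.5C" means 0.5 forks per available core.
            double multiplier = Double.parseDouble(trimmed.substring(0, trimmed.length() - 1));
            double scaled = multiplier * availableCores;
            // Assumed rounding: truncate, but never drop a positive result below one fork.
            return scaled > 0 ? Math.max((int) scaled, 1) : 0;
        }
        return Integer.parseInt(trimmed);
    }

    public static void main(String[] args) {
        System.out.println(effectiveForkCount("2", 4));     // fixed value, independent of cores
        System.out.println(effectiveForkCount("0.5C", 4));  // scales with the visible core count
        System.out.println(effectiveForkCount("0.5C", 16));
    }
}
```

With 4 visible cores, both settings resolve to 2 forks. The `C` form only yields more parallelism when the JVM sees more cores, which is why the agent hardware matters for the timing comparison.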

@timja
Member

timja commented Feb 1, 2022

> I am not sure what the hardware specs are for both Linux agents and windows ones.

Both are using 4 cores and 8gb ram

Linux hardware is https://aws.amazon.com/ec2/instance-types/m5/

ACI doesn't exactly say but generally good hardware by the looks of it:
https://docs.microsoft.com/en-us/azure/container-instances/container-instances-faq#what-underlying-infrastructure-does-aci-run-on-

@jtnord
Member Author

jtnord commented Feb 2, 2022

From @MarkEWaite

The env var technique worked. The Windows agents that were allocated have 4 cores. That's much fewer than the 16 cores and 64 GB RAM on the maven-11 machine.

So it looks like Linux is maybe using k8s or something with requests but maybe no limits hence the massive speed differential.

@timja
Member

timja commented Feb 2, 2022

> So it looks like Linux is maybe using k8s or something with requests but maybe no limits hence the massive speed differential.

17:51:13      resources:
17:51:13        limits:
17:51:13          memory: "8G"
17:51:13          cpu: "4"
17:51:13        requests:
17:51:13          memory: "8G"
17:51:13          cpu: "4"

https://ci.jenkins.io/job/Core/job/jenkins/job/PR-6236/3/consoleFull

limited to 8gb...

not sure what @MarkEWaite is saying, might be confused with highmem instances which are only used for ATH here...

@MarkEWaite
Contributor

MarkEWaite commented Feb 2, 2022

> not sure what @MarkEWaite is saying, might be confused with highmem instances which are only used for ATH here...

I reused the "check agent availability" acceptance test to report information on the agents that are allocated based on labels. I don't know how Kubernetes allocation limits affect the data reported in the /proc file system. It could be that the /proc file system is misleading compared to actual resources that Kubernetes allows an agent to use.

@timja
Member

timja commented Feb 2, 2022

> > not sure what @MarkEWaite is saying, might be confused with highmem instances which are only used for ATH here...
>
> I reused the "check agent availability" acceptance test to report information on the agents that are allocated based on labels. I don't know how Kubernetes allocation limits affect the data reported in the /proc file system. It could be that the /proc file system is misleading compared to actual resources that Kubernetes allows an agent to use.

Right, it seems like whatever that is doing is reporting the host compute, not what's actually available in the container.
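
A minimal sketch of that difference, assuming a Linux container whose CPU limit is exposed through cgroup v2 as a `cpu.max` file containing `<quota> <period>` (whether the CI agents actually use cgroup v2 is not stated in this thread). Inside a container, /proc/cpuinfo typically still lists the host's processors, while the quota reflects what the container may use. The class name, method names, and sample values are hypothetical; the parsers work on strings so the example runs anywhere.

```java
final class ContainerCpuSketch {
    /** Counts "processor : N" entries in /proc/cpuinfo-style text (the host view). */
    static int countProcCpuinfoProcessors(String cpuinfo) {
        int count = 0;
        for (String line : cpuinfo.split("\n")) {
            if (line.startsWith("processor")) {
                count++;
            }
        }
        return count;
    }

    /** Parses cgroup v2 cpu.max content; returns -1 when no limit is set ("max <period>"). */
    static double cgroupV2CpuLimit(String cpuMax) {
        String[] parts = cpuMax.trim().split("\\s+");
        if (parts.length != 2 || parts[0].equals("max")) {
            return -1;
        }
        // quota / period = number of CPUs' worth of time the container may use.
        return Double.parseDouble(parts[0]) / Double.parseDouble(parts[1]);
    }

    public static void main(String[] args) {
        // Illustrative data only: a 16-processor host and a 4-CPU container limit.
        StringBuilder cpuinfo = new StringBuilder();
        for (int i = 0; i < 16; i++) {
            cpuinfo.append("processor\t: ").append(i).append("\nmodel name\t: example\n\n");
        }
        String cpuMax = "400000 100000\n";

        System.out.println("host processors: " + countProcCpuinfoProcessors(cpuinfo.toString()));
        System.out.println("container limit: " + cgroupV2CpuLimit(cpuMax));
    }
}
```

On a real agent the two inputs would be read from /proc/cpuinfo and /sys/fs/cgroup/cpu.max; a tool that only looks at the former would report the host's core count.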

@timja
Member

timja commented May 9, 2022

Closing as this has demonstrated what it needed to

@timja timja closed this May 9, 2022
@basil basil reopened this Sep 26, 2022
@basil basil added the rfe label (For changelog: Minor enhancement. use `major-rfe` for changes to be highlighted) Sep 26, 2022
@basil basil marked this pull request as ready for review September 27, 2022 00:15
@basil basil changed the title from "demonstrating JENKINS-67681" to "[JENKINS-67681] ProcessTree$Windows#killAll is slow on Windows" Sep 27, 2022
Member

@basil basil left a comment

This PR is now ready for merge. We will merge it after approximately 24 hours if there is no negative feedback. Please see the merge process documentation for more information about the merge process. Thanks!

@basil basil added the ready-for-merge label (The PR is ready to go, and it will be merged soon if there is no negative feedback) Sep 27, 2022
@basil
Member

basil commented Sep 27, 2022

> Closing as this has demonstrated what it needed to

But regardless of demonstrating what it needed to, is not the ultimate goal to deliver value to end users? That goal has not yet been achieved.

@timja
Member

timja commented Sep 27, 2022

> > Closing as this has demonstrated what it needed to
>
> But regardless of demonstrating what it needed to, is not the ultimate goal to deliver value to end users? That goal has not yet been achieved.

It was a SNAPSHOT build that was in place for over 3 months with no progress and no sign the submitter was going to adopt it / release it.

@jtnord
Member Author

jtnord commented Sep 27, 2022

> > > Closing as this has demonstrated what it needed to
> >
> > But regardless of demonstrating what it needed to, is not the ultimate goal to deliver value to end users? That goal has not yet been achieved.
>
> It was a SNAPSHOT build that was in place for over 3 months with no progress and no sign the submitter was going to adopt it / release it.

I was waiting for the transfer of the upstream repo, as it was not releasable until that happened (at least no one who owned that repo wanted to merge and make a release), and had missed that the transfer had actually happened.

@basil
Member

basil commented Sep 27, 2022

Let us keep our sights focused on the delivery of value to end users through the timely development, review, merge, and release of code.

@basil basil merged commit 5263f20 into jenkinsci:master Sep 27, 2022