Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Jenkins scripts for running system tests on s390x nodes #6354

Merged
merged 13 commits into from
Mar 9, 2022

Conversation

kun-lu20
Copy link
Contributor

@kun-lu20 kun-lu20 commented Feb 11, 2022

Type of change

Select the type of your PR

  • Enhancement / new feature

Description

Please describe your pull request

This PR is to add s390x Jenkins scripts so that system tests could be run on s390x Jenkins nodes.

Currently this PR includes two patches on s390x. However, they will be removed soon once next version of corresponding artifacts are released.

  1. Patch for Kaniko image. We have raised PR to add s390x support. The Kaniko community has merged this PR and v1.8.0 will be released very soon. See Add s390x support to docker images GoogleContainerTools/kaniko#1749
  2. Patch for rocksdbjni jar file. We have raised PRs in rocksdb repo and Apache Kafka repo to add s390x support. The communities have merged our PRs. rocskdbjni-6.27.3.jar has already been released and Kafka v3.2.0 will include it. See Add support for building on s390x platform facebook/rocksdb#8962 and KAFKA-13599: Upgrade RocksDB to 6.27.3 apache/kafka#11690

We've run Jenkins pipeline internally for testing. It went smoothly with STs in acceptance and other profiles. The logs are attached below. Also we're making good progress on the public Jenkins set up and will share the link to show live result once we're done.

run_acceptance_profile_on_s390x.log
run_keycloak_related_STs_on_s390x.log
run_azp_kafka_oauth_profile_on_s390x.log
run_azp_operators_profile_on_s390x.log
run_azp_connect_mirrormaker_profile_on_s390x.log
run_azp_rolling_update_bridge_profile_on_s390x.log
run_azp_security_profile_on_s390x.log
run_azp_dynconfig_listeners_tracing_watcher_profile_on_s390x.log
run_azp_remaining_profile_on_s390x.log

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md
  • Supply screenshots for visual changes, such as Grafana dashboards

@strimzi-ci
Copy link

Can one of the admins verify this patch?

@kun-lu20
Copy link
Contributor Author

I noticed that Operator v0.28.0 is on its way. Could you please take a look at this PR? Thanks very much! @scholzj @Frawless

Signed-off-by: Kun-Lu <[email protected]>
@kun-lu20 kun-lu20 force-pushed the add_jenkins_scripts_for_s390x branch from ae48d74 to 6ae06c5 Compare February 11, 2022 21:19
@scholzj scholzj added this to the 0.29.0 milestone Feb 12, 2022
@scholzj scholzj requested a review from Frawless February 13, 2022 21:10
@kun-lu20 kun-lu20 force-pushed the add_jenkins_scripts_for_s390x branch from ec83560 to e85a11c Compare February 14, 2022 17:17
@kun-lu20 kun-lu20 requested a review from scholzj February 14, 2022 17:19
Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more minor comments, but LGTM otherwise I guess. I have pretty much zero experience with Groovy or Jenkins. So I think the review from @Frawless ill be more important than mine.

OPERATOR_IMAGE_PULL_POLICY = "IfNotPresent"
COMPONENTS_IMAGE_PULL_POLICY = "IfNotPresent"
JAVA_VERSION = "11"
TEST_HELM2_VERSION = "v2.17.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still needed? I don't think we use Helm 2 for anything anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is not needed. I've removed it.

PATH = "${env.JAVA_HOME}/bin:${env.PATH}"
DOCKER_BUILDX = "buildx"
DOCKER_BUILD_ARGS = "--platform linux/s390x --load"
BRIDGE_IMAGE = "quay.io/strimzi/kafka-bridge:latest"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this used for? I think the system tests should be normally using the pre-defined bridge version from the installation files and not latest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used version latest for testing before Bridge 0.21.4 which supports s390x was released. I've removed this line. Thanks @scholzj !

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, makes sense.

Copy link
Member

@Frawless Frawless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove a lot of duplicated code. For example jenkins-s390x.groovy contains a lot of code which is the same for x86.

You can also share parts of the pipeline itself and based on the parameter (node label) distinguish it.

failure {
println("Build failed")
script {
lib.sendMail(env.STRIMZI_MAILING_LIST, "failed", env.ghprbPullId, env.ghprbActualCommitAuthor, env.ghprbPullTitle, env.ghprbPullLink, env.BUILD_URL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to set STRIMZI_MAILING_LIST env in your Jenkins, otherwise it will fail on it I think

@kun-lu20
Copy link
Contributor Author

Thanks @Frawless ! I'll remove the duplicate code and get back to you soon.

@kun-lu20
Copy link
Contributor Author

kun-lu20 commented Feb 18, 2022

Hi @Frawless , thanks for your advice!

I've refactored the scripts and removed the redundancy. Could you please take a look? Thank you so much!

@kun-lu20 kun-lu20 requested a review from Frawless February 18, 2022 17:08
@kun-lu20
Copy link
Contributor Author

kun-lu20 commented Feb 18, 2022

Below is the log of running STs in acceptance profile in Jenkins pipeline on s390x after refactoring the scripts:
run_acceptance_profile_on_s390x_after_refactoring.log

@kun-lu20
Copy link
Contributor Author

Hi @Frawless , hope all is well.

Could you please have a look at the code change when you have some time? Thanks!

@scholzj
Copy link
Member

scholzj commented Feb 23, 2022

@kun-lu20 Jakub is on PTO this and next week. I'm afraid his review is really needed for this (as he is pretty much the only maintainer who knows Jenkins), so we might need to wait.

@kun-lu20
Copy link
Contributor Author

@scholzj Got it. Thanks for letting me know!

@@ -7,7 +7,7 @@ def DEFAULT_EXCLUDED_GROUPS = "loadbalancer,networkpolicies"
pipeline {
agent {
node {
label "strimzi-pr"
label "strimzi-pr || strimzi-s390x"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good idea. Which node will use it when jenkins will have both of the labels configured? I think we should have 2 different pipelines, one for s390x and one for amd64, and change specific parts like call different function to deploy minikube.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Frawless for your time and detailed comments!

OK, I'll adjust the scripts to use 2 different pipelines.

Comment on lines 52 to 55
if(env.TEST_ARCH.contains('s390x')){
//Use hardcoding for ghprbCommentBody since we are not setting up GitHub PRB integration.
env.ghprbCommentBody = "ok to test"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you don't want to run the tests from PR directly? If no, what value does this adds to us? Are you going to run some nightly builds? Is the jenkins instance available for the community to see the results?

Copy link
Contributor Author

@kun-lu20 kun-lu20 Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we were going to run nightly builds. But we could also run the tests from PR directly, I'll make adjustments to the pipeline script. The Jenkins link will be available for the community to see the results.

Comment on lines 95 to 99
if(env.TEST_ARCH.contains('s390x')){
env.DOCKER_TAG="latest"
}else{
env.DOCKER_TAG="pr"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you need to change tag here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do not need to change tag here since we're going to run the tests from PR directly as mentioned above.

Comment on lines 121 to 125
if(env.TEST_ARCH.contains('s390x')){
env.TEST_COUNT_RUNNING_IN_PARALLEL = "1"
}else{
env.TEST_COUNT_RUNNING_IN_PARALLEL = "3"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any specific reason for this?

Copy link
Contributor Author

@kun-lu20 kun-lu20 Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We found that when setting this arg to 3, some test cases would become unstable and it will cost extra time to rerun them. Probably it's because that we've set MINIKUBE_MEMORY to 8192.

So we followed the arg value used in community's Azure Pipelines and the unstable issue disappeared.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ye, that makes sense. @see-quick jfyi

Comment on lines 233 to 238
if(env.TEST_ARCH.contains('s390x')){
sh(script: "minikube stop || true")
sh(script: "minikube delete || true")
sh(script: "docker stop s390x-registry || true")
sh(script: "docker rm s390x-registry || true")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are going to use some static machine that you need this? If so, you should probably also remove all images

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we used a static VM to run the Jenkins pipeline. We could switch to run the pipeline in a docker container so that we do not need these commands.

1. Create a separate pipeline to run STs on s390x
2. Run STs in ub worker containers

Signed-off-by: Kun-Lu <[email protected]>
@kun-lu20
Copy link
Contributor Author

Hi @Frawless , I've updated the code as per your review comments. The main changes are:

  1. Create a separate pipeline (Jenkinsfile-pr-s390x) to run STs on s390x.
  2. Run STs in Jenkins worker containers instead of VMs.
  3. Can run nightly builds or builds triggered by PRs.

Thank you so much for taking the time to do the previous review. We appreciate that. Please have a look at the code changes if you have some time. Thanks again!

@kun-lu20 kun-lu20 requested a review from Frawless February 26, 2022 02:00
minikube addons enable registry
if [ "$ARCH" = "s390x" ]; then
git clone -b v1.9.11 --depth 1 https://github.com/kubernetes/kubernetes.git
cd kubernetes/cluster/addons/registry/images/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is needed because currently the registry addon doesn't have s390x support yet, we need to build kube-registry-proxy image locally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the cd command, wouldn't be better just to use relative path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it. I've changed the code to use relative path to build kube-registry-proxy image.

.jenkins/scripts/Dockerfile_base_s390x Show resolved Hide resolved
Comment on lines 170 to 171
sh(script: "sudo ln -s /opt/apache-maven-3.8.4/bin/mvn /usr/bin/mvn")
sh(script: "sudo ln -s /opt/apache-maven-3.8.4/bin/mvn /usr/local/bin/mvn")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't there some better solution for this? Maybe on container preparation level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I've moved this to container preparation level (Dockerfile_base_s390x).

.jenkins/Jenkinsfile-pr-s390x Outdated Show resolved Hide resolved
@kun-lu20
Copy link
Contributor Author

Hi @Frawless , I've made code changes as per your review comments. Please take a look when you have some time.

Thank you so much!

@kun-lu20 kun-lu20 requested a review from Frawless February 28, 2022 18:45
Comment on lines 200 to 209
if ("${env.TEST_PROFILE}".contains("upgrade")) {
println("[INFO] Update files for upgrade procedure")
sh(script: """
sed -i 's#:latest#:${env.DOCKER_TAG}#g' ${env.WORKSPACE}/systemtest/src/test/resources/upgrade/StrimziUpgradeST.json ${env.WORKSPACE}/install/cluster-operator/060-Deployment-strimzi-cluster-operator.yaml
sed -i 's#quay.io/strimzi/test-client:${env.DOCKER_TAG}#${env.DOCKER_REGISTRY}/strimzi/test-client:${env.DOCKER_TAG}#g' ${env.WORKSPACE}/systemtest/src/test/resources/upgrade/StrimziUpgradeST.json
sed -i 's#quay.io/strimzi/#${env.DOCKER_REGISTRY}/strimzi/#g' ${env.WORKSPACE}/install/cluster-operator/060-Deployment-strimzi-cluster-operator.yaml
sed -i 's#/opt/${env.DOCKER_REGISTRY}#/opt#g' ${env.WORKSPACE}/install/cluster-operator/060-Deployment-strimzi-cluster-operator.yaml
""")
sh(script: "cat ${env.WORKSPACE}/systemtest/src/test/resources/upgrade/StrimziUpgradeST.json")
sh(script: "cat ${env.WORKSPACE}/install/cluster-operator/060-Deployment-strimzi-cluster-operator.yaml")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can put this into a method and call it in this and in x86 pipeline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@Frawless
Copy link
Member

@kun-lu20 I think the code is fine now. Now let's talk about access rights:

1 - Can you share jenkins URL?
2 - Who will be able to trigger the jobs?
3 - Who is the admin of the Jenkins? Is this person somehow connected to our community to listen to community needs?
4 - Which Jenkins plugin do you use for it?
5 - Which mailing list are you going to use for results reporting?

For amd64 we basically have everything on azure that is available to everyone. Comment trigger is available only to maintainers due to security reasons set by azure.

1. Use relative path when building kube-registry-proxy image
2. Add function prepareUpgradeSTs() in jenkins.groovy for reuse

Signed-off-by: Kun-Lu <[email protected]>
@kun-lu20
Copy link
Contributor Author

kun-lu20 commented Mar 1, 2022

Hi @Frawless , thanks very much for your review comments. I've made the code change and here is the reply to your questions:

1 - The Jenkins url is https://148.100.84.136/job/kafka-op-build/job/add_jenkins_scripts_for_s390x/. We're still working on configurations and things might change in the future.

2 - We would like to start with running daily builds on acceptance profile. The pipeline will be triggered on a schedule. Since we are using a fixed number of VMs to host our Jenkins, we would like to limit triggering builds to maintainers as well.

3 - Our team will be the admin for Jenkins and we'll monitor the s390x builds and listen to community needs.

4 - We'll use PRB Jenkins plugin, I think the same plugin had been used on community's internal Jenkins.

5 - We're currently waiting for some ports to be opened on our vm's network so we don't have email reporting enabled yet. When it's working, can we use the same Strimzi mailing list as you do for internal Jenkins? Also do you think there are ways other than email that you can accept the results?

@kun-lu20 kun-lu20 requested a review from Frawless March 1, 2022 16:23
1. Replace hard-coded ids with build args in Dockerfile
2. Use $USER instead of env variable $ST_USER

Signed-off-by: Kun-Lu <[email protected]>
@kun-lu20
Copy link
Contributor Author

kun-lu20 commented Mar 7, 2022

Hi @Frawless , could you please take a look? Do let me know if you have any further questions. Thank you so much!

BTW, the Jenkins url has been changed to https://148.100.84.211/job/strimzi-kafka-operator/job/Strimzi_Kafka_Operator_ST/. Please use this new link when accessing our Jenkins instance.

@Frawless
Copy link
Member

Frawless commented Mar 7, 2022

@kun-lu20 Code changes are fine from my POV now. I have a few other points.

1 - The link should be available in some readme (I also expect it will be part of the checks on gh, but this is handled by the plugin itself)
2 - You need a user with contributor status in our organization, otherwise it won't be possible to scan the repo I think. Or have you tried it without credentials? For sure the results_info.sh won't work. Did you maybe think about using webhooks for triggering the jobs? It might be a better solution since your Jenkins is public
3 - mailing list - I think it might be possible, maybe better option to send it to PR author.

@kun-lu20
Copy link
Contributor Author

kun-lu20 commented Mar 8, 2022

Hi @Frawless , according to your questions:

  1. We've added a README-s390x.md in .jenkins folder.
  2. Can we start with nightly builds on acceptance profile? Thus we do not need webhooks at the beginning. Once everything is stable, could you help us to create them later?
  3. We do not have email connectivity yet and do not have an ETA. We're wondering if there are other ways to send you the results for now.

Could you please take a look? Thanks!

@Frawless
Copy link
Member

Frawless commented Mar 9, 2022

1 - Info about which tests, how and when will be executed will be fine
2 - Yes we can.
3 - You can use Strimzi CNCF dev mailing list. However, you own this feature so you should react to any failures/changes needed for it at first.

@kun-lu20
Copy link
Contributor Author

kun-lu20 commented Mar 9, 2022

Thanks @Frawless ! Reg your comments,

1 - Relevant info was added in README-s390x.md.
3 - Sure, we'll look into any failures/changes needed for it at first.

Copy link
Member

@Frawless Frawless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, PR looks good to me now. I assume you tried it and it's working without issues.

@kun-lu20
Copy link
Contributor Author

kun-lu20 commented Mar 9, 2022

Yes, we've tried it and it's working fine. Thanks @Frawless !

@kun-lu20
Copy link
Contributor Author

kun-lu20 commented Mar 9, 2022

Hi @scholzj , hope all is well.

Could you please merge this PR so that we can start to run our Jenkins nightly builds on main branch? Thank you very much!

@scholzj scholzj merged commit 7469c3c into strimzi:main Mar 9, 2022
@scholzj
Copy link
Member

scholzj commented Mar 9, 2022

@kun-lu20 Thanks for the PR.

BTW: I opened #6494 which updates Kaniko to the new version. I believe it has your s390x support.

@kun-lu20
Copy link
Contributor Author

kun-lu20 commented Mar 9, 2022

Thanks @scholzj !

Yes, Kaniko v1.8.0 has s390x support. Once #6494 is merged, I'll open another PR to drop the Kaniko patch from s390x Jenkins script.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants