nvidia mismatch recovery #915

Merged
merged 11 commits into from
May 11, 2023
55 changes: 54 additions & 1 deletion jenkins-scripts/dsl/_configs_/OSRFUNIXBase.groovy
@@ -38,6 +38,59 @@ class OSRFUNIXBase extends OSRFBase
git clone https://github.com/gazebo-tooling/release-tools scripts -b \$RTOOLS_BRANCH
""".stripIndent())
}
publishers {
postBuildScripts {
steps{
conditionalSteps {
condition {
expression("(.)* gpu-nvidia (.)*",'${NODE_LABELS}')
}
steps {
systemGroovyCommand('''\
import hudson.model.Cause.UpstreamCause;
import hudson.model.*;

def node = build.getBuiltOn()
def old_labels = node.getLabelString()

println("# BEGIN SECTION: NVIDIA MISMATCH RECOVERY")
if (!(build.getLog(1000) =~ "nvml error: driver/library version mismatch")) {
println(" NVIDIA driver/library version mismatch not detected in the log - Not performing any automatic recovery step")
return 1;
} else {
try {
println(" PROBLEM: NVIDIA driver/library version mismatch was detected in the log. Try to automatically resolve it:")
println("Removing labels and adding 'recovery-process' label to node")
node.setLabelString("recovery-process")
} catch (Exception ex) {
println("ERROR - CANNOT PERFORM RECOVERY ACTIONS FOR NVIDIA ERROR")
println("Restoring to previous state")
node.setLabelString(old_labels)
throw ex
}
}
println("# END SECTION: NVIDIA MISMATCH RECOVERY")
'''.stripIndent()
)
shell("""sudo shutdown -r +1""")
}
}
}
onlyIfBuildSucceeds(false)
onlyIfBuildFails(true)
}
// Insert the Naginator plugin XML manually because of https://issues.jenkins.io/browse/JENKINS-66458
configure { project ->
project / publishers / 'com.chikli.hudson.plugin.naginator.NaginatorPublisher' {
regexpForRerun("nvml error: driver/library version mismatch")
checkRegexp(true)
maxSchedule(1)
delay(class: 'com.chikli.hudson.plugin.naginator.FixedDelay') {
delay(70)
}
}
}
}
}
}
}
}
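
The recovery logic in this hunk can be tried outside Jenkins with a small standalone sketch. It is an illustration under stated assumptions, not the code that was merged: `FakeNode` and `recoverIfMismatch` are hypothetical names standing in for the real node returned by `build.getBuiltOn()` and for the inline system Groovy script, and the log is passed in as a plain list of lines. The last helper assumes the conditional step's `expression(...)` is applied as a full match against `${NODE_LABELS}`; if that holds, the spaces around `gpu-nvidia` in the pattern mean the label is only detected when it is neither first nor last in the label list.

```groovy
// Hypothetical stand-in for a Jenkins node. Groovy generates
// getLabelString()/setLabelString() for this property.
class FakeNode {
    String labelString
}

// Returns true when the 'recovery-process' label was applied,
// false when the mismatch message was not found in the log.
boolean recoverIfMismatch(List<String> logLines, FakeNode node) {
    String message = 'nvml error: driver/library version mismatch'
    if (!logLines.any { it.contains(message) }) {
        return false
    }
    String oldLabels = node.labelString
    try {
        // Same label swap as in the diff: take the node out of rotation.
        node.labelString = 'recovery-process'
        return true
    } catch (Exception ex) {
        // Restore the previous labels before propagating the failure.
        node.labelString = oldLabels
        throw ex
    }
}

// Assumption: the run condition behaves like a full regex match.
boolean matchesNvidiaLabel(String nodeLabels) {
    return nodeLabels ==~ /(.)* gpu-nvidia (.)*/
}

def node = new FakeNode(labelString: 'docker gpu-nvidia linux')
println(recoverIfMismatch(['nvml error: driver/library version mismatch'], node))
println(node.labelString)
```

If the full-match assumption is correct, a node whose label string begins or ends with `gpu-nvidia` would skip the recovery step, which may be worth checking against the actual agent configuration.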