Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCP: Fix panic while parsing a specific build job log #1380

Merged
merged 1 commit into from
Apr 29, 2021

Conversation

thozza
Copy link
Member

@thozza thozza commented Apr 27, 2021

The cloudbuildResourcesFromBuildLog() function from the internal GCP
package could cause panic while parsing Build job log which failed early
and didn't create any Compute Engine resources. The function relied on
the Regexp.FindStringSubmatch() method to always return a match
while being used on the build log. Accessing a member of a nil slice
would cause a panic in osbuild-worker, such as:

Stack trace of thread 185316:
#0  0x0000564e5393b5e1 runtime.raise (osbuild-worker)
#1  0x0000564e5391fa1e runtime.sigfwdgo (osbuild-worker)
#2  0x0000564e5391e354 runtime.sigtrampgo (osbuild-worker)
#3  0x0000564e5393b953 runtime.sigtramp (osbuild-worker)
#4  0x00007f37e98e3b20 __restore_rt (libpthread.so.0)
#5  0x0000564e5393b5e1 runtime.raise (osbuild-worker)
#6  0x0000564e5391f5ea runtime.crash (osbuild-worker)
#7  0x0000564e53909306 runtime.fatalpanic (osbuild-worker)
#8  0x0000564e53908ca1 runtime.gopanic (osbuild-worker)
#9  0x0000564e53906b65 runtime.goPanicIndex (osbuild-worker)
#10 0x0000564e5420b36e github.com/osbuild/osbuild-composer/internal/cloud/gcp.cloudbuildResourcesFromBuildLog (osbuild-worker)
#11 0x0000564e54209ebb github.com/osbuild/osbuild-composer/internal/cloud/gcp.(*GCP).CloudbuildBuildCleanup (osbuild-worker)
#12 0x0000564e54b05a9b main.(*OSBuildJobImpl).Run (osbuild-worker)
#13 0x0000564e54b08854 main.main (osbuild-worker)
#14 0x0000564e5390b722 runtime.main (osbuild-worker)
#15 0x0000564e53939a11 runtime.goexit (osbuild-worker)

Add a unit test testing this scenario.

Make the cloudbuildResourcesFromBuildLog() function more robust and
not blindly expect to find matches in the build log. As a result the
cloudbuildBuildResources struct instance returned from the function
may be empty. Subsequently make sure that the CloudbuildBuildCleanup()
method handles an empty cloudbuildBuildResources instance correctly.
Specifically the storageCacheDir.bucket may be an empty string and
thus won't exist. Ensure that this does not result in infinite loop by
checking for storage.ErrBucketNotExist while iterating the bucket
objects.

Signed-off-by: Tomas Hozza [email protected]

This pull request includes:

  • adequate testing for the new functionality or fixed issue
  • adequate documentation informing people about the change such as

The `cloudbuildResourcesFromBuildLog()` function from the internal GCP
package could cause panic while parsing Build job log which failed early
and didn't create any Compute Engine resources. The function relied on
the `Regexp.FindStringSubmatch()` method to always return a match
while being used on the build log. Accessing a member of a nil slice
would cause a panic in `osbuild-worker`, such as:

Stack trace of thread 185316:
 #0  0x0000564e5393b5e1 runtime.raise (osbuild-worker)
 #1  0x0000564e5391fa1e runtime.sigfwdgo (osbuild-worker)
 #2  0x0000564e5391e354 runtime.sigtrampgo (osbuild-worker)
 #3  0x0000564e5393b953 runtime.sigtramp (osbuild-worker)
 #4  0x00007f37e98e3b20 __restore_rt (libpthread.so.0)
 #5  0x0000564e5393b5e1 runtime.raise (osbuild-worker)
 #6  0x0000564e5391f5ea runtime.crash (osbuild-worker)
 #7  0x0000564e53909306 runtime.fatalpanic (osbuild-worker)
 #8  0x0000564e53908ca1 runtime.gopanic (osbuild-worker)
 #9  0x0000564e53906b65 runtime.goPanicIndex (osbuild-worker)
 #10 0x0000564e5420b36e github.com/osbuild/osbuild-composer/internal/cloud/gcp.cloudbuildResourcesFromBuildLog (osbuild-worker)
 #11 0x0000564e54209ebb github.com/osbuild/osbuild-composer/internal/cloud/gcp.(*GCP).CloudbuildBuildCleanup (osbuild-worker)
 #12 0x0000564e54b05a9b main.(*OSBuildJobImpl).Run (osbuild-worker)
 #13 0x0000564e54b08854 main.main (osbuild-worker)
 #14 0x0000564e5390b722 runtime.main (osbuild-worker)
 #15 0x0000564e53939a11 runtime.goexit (osbuild-worker)

Add a unit test testing this scenario.

Make the `cloudbuildResourcesFromBuildLog()` function more robust and
not blindly expect to find matches in the build log. As a result the
`cloudbuildBuildResources` struct instance returned from the function
may be empty. Subsequently make sure that the `CloudbuildBuildCleanup()`
method handles an empty `cloudbuildBuildResources` instance correctly.
Specifically the `storageCacheDir.bucket` may be an empty string and
thus won't exist. Ensure that this does not result in infinite loop by
checking for `storage.ErrBucketNotExist` while iterating the bucket
objects.

Signed-off-by: Tomas Hozza <[email protected]>
@thozza thozza added the bug Something isn't working label Apr 27, 2021
Copy link
Member

@ondrejbudai ondrejbudai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@thozza
Copy link
Member Author

thozza commented Apr 29, 2021

So I would call the CI as "combined" PASS...

The run #6 is stuck on EL8 aarch64 Base and has failed EL8.4 New OSTree. The later one was an infrastructure issue.

The run #4 was run on the same commit and these tests passed. It was stuck only on F32 OSTree test, which passed in the latest run #6.

@thozza thozza merged commit 5e591cc into osbuild:main Apr 29, 2021
@thozza thozza deleted the gcp-cloudbuild-panic branch April 29, 2021 12:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants