Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

if you retry flaky tests and the last attempt succeeded, the step should succeed #95

Merged

Conversation

benbitrise
Copy link
Contributor

@benbitrise benbitrise commented May 15, 2024

Checklist

  • I've read and followed the Contribution Guidelines
  • step.yml and README.md is updated with the changes (if needed)

Version

Requires a MAJOR/MINOR/PATCH version update

Context

Changes

My first ever go code, so be gentle
Based on feedback from a customer (and I agree), if you have flaky retries set and the final attempt succeeded, the step should be marked as success.
Also introduced some unit tests into the project

Investigation details

Decisions

main.go Outdated
func makeSortedCopyOfSteps(steps []*toolresults.Step) []*toolresults.Step {
// Make a copy of the original slice
stepsCopy := make([]*toolresults.Step, len(steps))
copy(stepsCopy, steps)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can simply sort the original array because nothing else is using it.

main.go Outdated
sortedSteps := makeSortedCopyOfSteps(responseModel.Steps)
doesRetriesForFlakiness := configs.FlakyTestAttempts > 0
for index, step := range sortedSteps {
isLastStep := index == len(responseModel.Steps)-1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this logic good? I mean the steps could contain multiple different test executions and should not we take the last one of the same test execution.?

If that is the case then maybe a simpler logic should be to check if the flaky retry was enabled like you did. Then if it was we could first deduplicate the array by keeping the last execution of the tests. And then iterate over them. This way we could even skip the many conditions in the getNewSuccessValue function.

@benbitrise
Copy link
Contributor Author

Manual tests -

Multiple Devices, Retries Enabled

Only One Device, Retries Enabled

Only One Device, No Retries

Multiple Devices, No Retries

@benbitrise benbitrise marked this pull request as ready for review May 18, 2024 16:05
main.go Outdated
Comment on lines 323 to 333
func getNewSuccessValue(currentOverallSuccess bool, stepWasSuccessful bool, wasLastStep bool, includedFlakyRetries bool) bool {
// Being overly cautious, by only setting success to true if it's the last try and flaky tests were enabled
// Doing this simply because there could be unaccounted for scenarios where it's not desirable to set successful to true
if !stepWasSuccessful {
return false
}
if stepWasSuccessful && wasLastStep && includedFlakyRetries {
return true
}
return currentOverallSuccess
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can get rid of this logic if we do the step preprocessing a bit different.

steps := responseModel.Steps
if configs.FlakyTestAttempts > 0 {
	steps, err = filteredSteps(steps)
	if err != nil {
		failf("Failed to filter latest steps, error: %s", err)
	}
}

for _, step := range steps {
...
}

func filteredSteps(steps []*toolresults.Step) ([]*toolresults.Step, error) {
	latestSteps := make(map[string]*toolresults.Step)

	for _, step := range steps {
		bytes, err := json.Marshal(step.DimensionValue)
		if err != nil {
			return nil, err
		}

		key := string(bytes)

		if previousStep, ok := latestSteps[key]; ok {
			if previousStep.CompletionTime != nil && step.CompletionTime != nil {
				if previousStep.CompletionTime.Seconds < step.CompletionTime.Seconds {
					latestSteps[key] = step
				}
			} else {
				// This should not really happen, but there is always a possibility that the API forgets to send a time.
				// I can see two actions here.
				//
				// One is to throw an error and treat it as an invalid case. Logically finished steps should always have
				// a finished at time.
				//
				// The second one is to keep the one without a time as a separate, individual entry.
			}
		} else {
			latestSteps[key] = step
		}
	}

	values := make([]*toolresults.Step, 0, len(latestSteps))
	for _, step := range latestSteps {
		values = append(values, step)
	}

	return values, nil
}

If the flaky retries are turned on then we can first go over the steps and keep only the last run of each step. That way we do not need to keep track of overall success or if this is the last step or if flaky retries were turned on. We can execute the same logic as before just on a different step array if the flaky retry was turned on.

main.go Outdated
if key != nil {
dimensionStr := string(key)
if groupedByDimension[dimensionStr] == nil || groupedByDimension[dimensionStr].Summary != "success" {
groupedByDimension[dimensionStr] = step.Outcome
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous iteration of this PR was being extra careful by only considering the last executed Step for a dimension. Turns out the API doesn't actually include any info about when the step was run in the payload! So instead, this code is simply accepting that if any of the Steps for a dimension were successful, we can count the dimension as successful. This is reasonable to do because it is nearly certain that a successful result is the final attempt--under what scenario would there be a retry if that isn't the case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense if the assumption is true. I am not fully aware what you can do on Android but because Apple and Google copy ideas of each other I want to highlight that Xcode has a test running mode called Up until maximum repetitions. In this mode it will retry the test as many times as the user requested and even the successful ones. In this mode the result of the last execution counts.

Can you double check that this is not possible here?

@benbitrise
Copy link
Contributor Author

E2E Tests for latest iteration:

Multiple Devices, Retries Enabled
both devices pass without retry
both devices pass after retry
both devices fail after retry
both devices pass, one retries
one device fails, the other succeeds

Only One Device, Retries Enabled
failure
success on first attempt
success on retry

Only One Device, No Retries
success
failure

Multiple Devices, No Retries
both succeed
one succeeds one fails
both fail

main_test.go Outdated
Comment on lines 23 to 30
expected := true
if err != nil {
t.Errorf("Expected no errors. Go %s", err)
}

if isSuccess != expected {
t.Errorf("Expected success to be %v, got %v", expected, isSuccess)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using the github.com/stretchr/testify to simply the assertions. Once you added it to the project you can simply this with

import (
	...
	"github.com/stretchr/testify/require"
)

func TestGetSuccessOfExecution_AllSucceed(t *testing.T) {
	...
	isSuccess, err := GetSuccessOfExecution(steps)
	require.NoError(t, err)
	require.True(t, isSuccess)
}

You can also add custom messages to the assertions if you want to.

main.go Outdated
if key != nil {
dimensionStr := string(key)
if groupedByDimension[dimensionStr] == nil || groupedByDimension[dimensionStr].Summary != "success" {
groupedByDimension[dimensionStr] = step.Outcome
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense if the assumption is true. I am not fully aware what you can do on Android but because Apple and Google copy ideas of each other I want to highlight that Xcode has a test running mode called Up until maximum repetitions. In this mode it will retry the test as many times as the user requested and even the successful ones. In this mode the result of the last execution counts.

Can you double check that this is not possible here?

main.go Outdated
return outcome
}

func GetSuccessOfExecution(steps []*toolresults.Step) (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be a public function? It is used in the same package so I think we can make this private.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it public for the unit tests to be able to use the function. To keep it public and make sense, I pulled it into a step package

@benbitrise
Copy link
Contributor Author

@tothszabi - I can't reply to a comment you made due to changes I pushed up, so starting a new thread

This makes sense if the assumption is true. I am not fully aware what you can do on Android but because Apple and Google copy ideas of each other I want to highlight that Xcode has a test running mode called Up until maximum repetitions. In this mode it will retry the test as many times as the user requested and even the successful ones. In this mode the result of the last execution counts.
Can you double check that this is not possible here?

These are different levels of abstraction. I ran some tests in FTL with an iOS project to confirm.

I ran flaky test with retries on. The test failed and then passed in the same FTL Step. Firebase set the outcome to FAILURE (which I'm not convinced is the right way to handle this). Keep in mind that this is different from Firebase's retry mechanism. They note:

  • The entire test execution runs again when a failure is detected. There’s no support for retrying only failed test cases.

So in the case of the xcodebuild parameter, if retry is needed, the retry happens for only the affected test case within the Step, and FTL sets outcome to failure. But when setting the FTL retry, it does an entire new rerun of all the tests (a new Step).

toolresults "google.golang.org/api/toolresults/v1beta3"
)

func GetSuccessOfExecution(steps []*toolresults.Step) (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now this function and the tests are in the same package. You do not need to make it public anymore as private functions of a package can be accessed from the tests.

This will also solve your linting issue.
(Failure on the CI: /bitrise/go/src/github.com/bitrise-steplib/steps-virtual-device-testing-for-android/step/ftl_result_processor.go:9:1: exported function GetSuccessOfExecution should have comment or be unexported)

@benbitrise benbitrise force-pushed the bb/dont_mark_retries_as_failures branch from 35e64a1 to ab217be Compare May 31, 2024 14:26
@tothszabi tothszabi merged commit 4fdef16 into bitrise-steplib:master Jun 4, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants