if you retry flaky tests and the last attempt succeeded, the step should succeed #95

benbitrise · 2024-05-15T21:39:29Z

Checklist

I've read and followed the Contribution Guidelines
step.yml and README.md is updated with the changes (if needed)

Version

Requires a MAJOR/MINOR/PATCH version update

Context

Changes

This is my first ever Go code, so be gentle.
Based on feedback from a customer (and I agree), if you have flaky retries set and the final attempt succeeded, the step should be marked as a success.
Also introduced some unit tests into the project.

Investigation details

Decisions

tothszabi · 2024-05-16T14:01:14Z

main.go

+func makeSortedCopyOfSteps(steps []*toolresults.Step) []*toolresults.Step {
+	// Make a copy of the original slice
+	stepsCopy := make([]*toolresults.Step, len(steps))
+	copy(stepsCopy, steps)


I think you can simply sort the original array because nothing else is using it.

tothszabi · 2024-05-16T14:11:27Z

main.go

+				sortedSteps := makeSortedCopyOfSteps(responseModel.Steps)
+				doesRetriesForFlakiness := configs.FlakyTestAttempts > 0
+				for index, step := range sortedSteps {
+					isLastStep := index == len(responseModel.Steps)-1


Is this logic correct? The steps could contain multiple different test executions, so shouldn't we take the last step of each test execution rather than the last step overall?

If that is the case, a simpler approach might be to check whether flaky retries are enabled, as you already do. If they are, we could first deduplicate the array by keeping only the last execution of each test, and then iterate over the result. That way we could even skip the many conditions in the getNewSuccessValue function.

benbitrise · 2024-05-18T16:05:34Z

Manual tests -

Multiple Devices, Retries Enabled

Only One Device, Retries Enabled

Only One Device, No Retries

success✅
failure✅

Multiple Devices, No Retries

tothszabi · 2024-05-20T12:32:45Z

main.go

+func getNewSuccessValue(currentOverallSuccess bool, stepWasSuccessful bool, wasLastStep bool, includedFlakyRetries bool) bool {
+	// Being overly cautious, by only setting success to true if it's the last try and flaky tests were enabled
+	// Doing this simply because there could be unaccounted for scenarios where it's not desirable to set successful to true
+	if !stepWasSuccessful {
+		return false
+	}
+	if stepWasSuccessful && wasLastStep && includedFlakyRetries {
+		return true
+	}
+	return currentOverallSuccess
+}


I think we can get rid of this logic if we do the step preprocessing a bit differently.

steps := responseModel.Steps
if configs.FlakyTestAttempts > 0 {
	steps, err = filteredSteps(steps)
	if err != nil {
		failf("Failed to filter latest steps, error: %s", err)
	}
}

for _, step := range steps {
	...
}

func filteredSteps(steps []*toolresults.Step) ([]*toolresults.Step, error) {
	latestSteps := make(map[string]*toolresults.Step)

	for _, step := range steps {
		bytes, err := json.Marshal(step.DimensionValue)
		if err != nil {
			return nil, err
		}

		key := string(bytes)
		if previousStep, ok := latestSteps[key]; ok {
			if previousStep.CompletionTime != nil && step.CompletionTime != nil {
				if previousStep.CompletionTime.Seconds < step.CompletionTime.Seconds {
					latestSteps[key] = step
				}
			} else {
				// This should not really happen, but there is always a possibility that the API forgets to send a time.
				// I can see two actions here.
				//
				// One is to throw an error and treat it as an invalid case. Logically finished steps should always have
				// a finished at time.
				//
				// The second one is to keep the one without a time as a separate, individual entry.
			}
		} else {
			latestSteps[key] = step
		}
	}

	values := make([]*toolresults.Step, 0, len(latestSteps))
	for _, step := range latestSteps {
		values = append(values, step)
	}

	return values, nil
}

If the flaky retries are turned on then we can first go over the steps and keep only the last run of each step. That way we do not need to keep track of overall success or if this is the last step or if flaky retries were turned on. We can execute the same logic as before just on a different step array if the flaky retry was turned on.

benbitrise · 2024-05-20T20:00:37Z

main.go

+		if key != nil {
+			dimensionStr := string(key)
+			if groupedByDimension[dimensionStr] == nil || groupedByDimension[dimensionStr].Summary != "success" {
+				groupedByDimension[dimensionStr] = step.Outcome


The previous iteration of this PR was being extra careful by only considering the last executed Step for a dimension. Turns out the API doesn't actually include any info about when the step was run in the payload! So instead, this code is simply accepting that if any of the Steps for a dimension were successful, we can count the dimension as successful. This is reasonable to do because it is nearly certain that a successful result is the final attempt--under what scenario would there be a retry if that isn't the case?
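The rule described above can be sketched roughly as follows. This is only an illustration, not the code in the PR: the `step` type, its `Dimension` and `Summary` fields, and the `successOfExecution` name are hypothetical stand-ins for the toolresults types, and treating an empty step list as a failure is an assumption made for the example.

```go
package main

import "fmt"

// step is a simplified, hypothetical stand-in for a toolresults step.
// Dimension is assumed to be a serialized form of the step's dimension values.
type step struct {
	Dimension string
	Summary   string
}

// successOfExecution reports whether every dimension has at least one
// successful step. An empty input is treated as a failure (an assumption).
func successOfExecution(steps []step) bool {
	succeeded := make(map[string]bool)
	for _, s := range steps {
		// Once a dimension has a success, later failures do not reset it.
		succeeded[s.Dimension] = succeeded[s.Dimension] || s.Summary == "success"
	}
	for _, ok := range succeeded {
		if !ok {
			return false
		}
	}
	return len(succeeded) > 0
}

func main() {
	steps := []step{
		{Dimension: "pixel", Summary: "failure"}, // first attempt
		{Dimension: "pixel", Summary: "success"}, // retry
		{Dimension: "nexus", Summary: "success"},
	}
	fmt.Println(successOfExecution(steps)) // prints true
}
```

Because the result does not depend on the order of the steps, this approach needs no completion time, at the cost of relying on the assumption that a success is never followed by another attempt.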


benbitrise · 2024-05-20T20:16:38Z

E2E Tests for latest iteration:

Multiple Devices, Retries Enabled
both devices pass without retry ✅
both devices pass after retry✅
both devices fail after retry✅
both devices pass, one retries✅
one device fails, the other succeeds✅

Only One Device, Retries Enabled
failure✅
success on first attempt✅
success on retry✅

Only One Device, No Retries
success✅
failure✅

Multiple Devices, No Retries
both succeed✅
one succeeds one fails✅
both fail✅

tothszabi · 2024-05-22T09:46:12Z

main_test.go

+	expected := true
+	if err != nil {
+		t.Errorf("Expected no errors. Go %s", err)
+	}
+
+	if isSuccess != expected {
+		t.Errorf("Expected success to be %v, got %v", expected, isSuccess)
+	}


We use github.com/stretchr/testify to simplify assertions. Once you have added it to the project, you can simplify this to:

import (
	...
	"github.com/stretchr/testify/require"
)

func TestGetSuccessOfExecution_AllSucceed(t *testing.T) {
	...
	isSuccess, err := GetSuccessOfExecution(steps)

	require.NoError(t, err)
	require.True(t, isSuccess)
}

You can also add custom messages to the assertions if you want to.

tothszabi · 2024-05-22T09:49:21Z

main.go

+		if key != nil {
+			dimensionStr := string(key)
+			if groupedByDimension[dimensionStr] == nil || groupedByDimension[dimensionStr].Summary != "success" {
+				groupedByDimension[dimensionStr] = step.Outcome


This makes sense if the assumption is true. I am not fully aware what you can do on Android but because Apple and Google copy ideas of each other I want to highlight that Xcode has a test running mode called Up until maximum repetitions. In this mode it will retry the test as many times as the user requested and even the successful ones. In this mode the result of the last execution counts.

Can you double check that this is not possible here?

tothszabi · 2024-05-22T09:51:18Z

main.go

+	return outcome
+}
+
+func GetSuccessOfExecution(steps []*toolresults.Step) (bool, error) {


Does this need to be a public function? It is used in the same package so I think we can make this private.

I made it public for the unit tests to be able to use the function. To keep it public and make sense, I pulled it into a step package

benbitrise · 2024-05-25T01:05:40Z

@tothszabi - I can't reply to a comment you made due to changes I pushed up, so starting a new thread

This makes sense if the assumption is true. I am not fully aware what you can do on Android but because Apple and Google copy ideas of each other I want to highlight that Xcode has a test running mode called Up until maximum repetitions. In this mode it will retry the test as many times as the user requested and even the successful ones. In this mode the result of the last execution counts.
Can you double check that this is not possible here?

These are different levels of abstraction. I ran some tests in FTL with an iOS project to confirm.

I ran flaky test with retries on. The test failed and then passed in the same FTL Step. Firebase set the outcome to FAILURE (which I'm not convinced is the right way to handle this). Keep in mind that this is different from Firebase's retry mechanism. They note:

The entire test execution runs again when a failure is detected. There’s no support for retrying only failed test cases.

So in the case of the xcodebuild parameter, if retry is needed, the retry happens for only the affected test case within the Step, and FTL sets outcome to failure. But when setting the FTL retry, it does an entire new rerun of all the tests (a new Step).

tothszabi · 2024-05-29T13:07:42Z

step/ftl_result_processor.go

+	toolresults "google.golang.org/api/toolresults/v1beta3"
+)
+
+func GetSuccessOfExecution(steps []*toolresults.Step) (bool, error) {


Now this function and the tests are in the same package. You do not need to make it public anymore as private functions of a package can be accessed from the tests.

This will also solve your linting issue.
(Failure on the CI: /bitrise/go/src/github.com/bitrise-steplib/steps-virtual-device-testing-for-android/step/ftl_result_processor.go:9:1: exported function GetSuccessOfExecution should have comment or be unexported)

tothszabi reviewed May 16, 2024

benbitrise marked this pull request as ready for review May 18, 2024 16:05

tothszabi reviewed May 20, 2024

benbitrise commented May 20, 2024

tothszabi reviewed May 22, 2024

tothszabi reviewed May 29, 2024

benbitrise added 17 commits May 31, 2024 08:06

if you retry flaky tests and the last attempt succeeded, the step should succeed

9507ec8

distinguish based on dimension

68cf6fe

fix npe?

e61b4b5

remove log line

cb26ccc

debug logging on dimension success

a033451

more logging

9ac617e

fix last step calc

2ed656e

remove logging

b54c492

restructure for simplicity to understand

67e2250

missing completion time debugging

8b68f25

more debugging

818fd0b

stop caring about completion time

d658ead

move to package

4de5711

use testify

78490d6

rename package to not conflict with existing bitrise mores

ebf8b28

rename package

cad5ab5

well i guess i learned a little bout go

ab217be

benbitrise force-pushed the bb/dont_mark_retries_as_failures branch from 35e64a1 to ab217be Compare May 31, 2024 14:26

benbitrise added 2 commits May 31, 2024 09:34

bitrise syntax

1600938

bitrise syntax

dc56b20

tothszabi approved these changes Jun 3, 2024

tothszabi merged commit 4fdef16 into bitrise-steplib:master Jun 4, 2024
1 check passed
