Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added checks for checking test containers running or not to avoid test flakeness #2303

Merged
merged 5 commits into from
Feb 18, 2025

Conversation

sanyamsinghal
Copy link
Collaborator

Describe the changes in this pull request

Describe if there are any user-facing changes

How was this pull request tested?

Does your PR have changes that can cause upgrade issues?

Component Breaking changes?
MetaDB Yes/No
Name registry json Yes/No
Data File Descriptor Json Yes/No
Export Snapshot Status Json Yes/No
Import Data State Yes/No
Export Status Json Yes/No
Data .sql files of tables Yes/No
Export and import data queue Yes/No
Schema Dump Yes/No
AssessmentDB Yes/No
Sizing DB Yes/No
Migration Assessment Report Json Yes/No
Callhome Json Yes/No
YugabyteD Tables Yes/No
TargetDB Metadata Tables Yes/No

@sanyamsinghal sanyamsinghal self-assigned this Feb 6, 2025
@sanyamsinghal sanyamsinghal marked this pull request as ready for review February 7, 2025 10:28
@sanyamsinghal sanyamsinghal requested review from makalaaneesh and priyanshi-yb and removed request for makalaaneesh February 7, 2025 10:48
@sanyamsinghal sanyamsinghal force-pushed the sanyam/integ-tests-failure branch from 3644ead to fca5843 Compare February 7, 2025 16:23
}

// Store the DB connection for reuse
ms.db = db
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing the db field? Can we still keep it and get it from pingDatabase so that we don't need to open it again?
Also, I think pingDatabase with #retries in the Start() function might not be useful that much because if the db is connected in the starting there is still some chance of failure in further Functions/Queries. So, maybe we need a retry at that level?
what do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pingDatabase with #retries in the Start() function might not be useful

Yes thats why if you check ExecuteSqls() there i am creating a new connection/db everytime.

The failures we saw were like - unable to connect...
So i thought we create a new connection at that moment then it will helpful.

Previously we were creating and storing the connection pool sql.DB which might go bad after some usage.

Btw this PR changes i have rerun the tests numerous times, didn't get the errors we used to. but still not 100% sure if it can reoccur or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand, and that's my point. Since this pingDatabase is not required in Start(), we can remove it and instead have similar logic in the other functions of each container. For example, when we run queries on the database using a connection, we should have some retry logic so that if there is a failure after initial queries, we can retry, and it won't fail directly.
what do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added step to ping database in Start() just to ensure that the database container is not just created or up, it is also running properly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the retry logic - we can have that.
But i think for that we should do that based on certain error messages, not for every error. So will need to figure out the list of error then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, okay. For now, I think we can go ahead with this change and revisit the retry logic in the container functions if we start facing intermittent issues in that path.

// It retries for a few times with a delay before giving up.
func pingDatabase(driver string, connStr string) error {
var err error
maxRetries := 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we increase the retries to 5?

utils.ErrExit("mysql container is not started: nil")
}

db, err := sql.Open("mysql", ms.GetConnectionString())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you keep the DB in the Container struct, you can try it here first. If db Exec is working, then retry the connection.

Copy link
Contributor

@priyanshi-yb priyanshi-yb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}

// Store the DB connection for reuse
ms.db = db
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, okay. For now, I think we can go ahead with this change and revisit the retry logic in the container functions if we start facing intermittent issues in that path.

@sanyamsinghal sanyamsinghal merged commit 3dcd367 into main Feb 18, 2025
66 checks passed
@sanyamsinghal sanyamsinghal deleted the sanyam/integ-tests-failure branch February 18, 2025 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants