-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: Fix flaky IT test cases #8260
Conversation
@@ -72,6 +72,8 @@ public static void setUp() throws IOException { | |||
instances = new ArrayList<>(); | |||
InstancesSettings instanceSettings = InstancesSettings.newBuilder().build(); | |||
instancesClient = InstancesClient.create(instanceSettings); | |||
|
|||
Util.cleanUpComputeInstances(instancesClient, DEFAULT_PROJECT, DEFAULT_ZONE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a bit too aggressive to delete all compute instances? Why isn't the cleanup in tearDown()
sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tearDown() cleanup is sufficient for 99% of the case. There was a significant amount of compute instances that were in there (my guess from previous runs that didn't have the cleanup or when the IT jobs were cancelled halfway).
This project is only used for ITs so it would have 24 hours to report the results before the previous instances are deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried that someone runs the test on their own project and causes inadvertent deletion of instances. If we want to do this kind of cleanup, maybe we can have instance name prefix that is consistent and can accurately filter instances created by this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried that someone runs the test on their own project and causes inadvertent deletion of instances.
Actually, I was thinking that this would be fine for the ITs. I was thinking the ideal state for gcloud-devel would have 0 compute instances when there are no active ITs running. If Util.cleanUpComputeInstances
(or more filtered version of that function) matches any existing instances, then those should be deleted. 24 hours is 12x as long as the longest GraalVM job.
maybe we can have instance name prefix that is consistent and can accurately filter instances created by this test.
Yep! I was planning on creating another PR that would update the instances for each module to have a more unique name instead test-instance
or gapic-instance
. i.e. Compute would be test-compute-instance-UUID, Container would be test-container-instance-UUID or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok nvm, I just re-read the comment.
test on their own project
@@ -90,7 +91,13 @@ public static void beforeClass() throws Exception { | |||
|
|||
@AfterClass | |||
public static void afterClass() throws Exception { | |||
Thread.sleep(TimeUnit.MINUTES.toMillis(5)); | |||
Operation response = client.getOperation(PROJECT_ID, ZONE, operation.getName()); | |||
// Sleep for one minute until Cluster CREATE operation is complete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we waiting for cluster creation completion in afterClass()
? Shouldn't we wait for it right after creation in beforeClass()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to run operations of the client (get/ list) without the operation having to be complete. It was only when deleting the cluster, I got an error with Operation in progress
. I think it would make sense to move this to wait when the operation was run.
I'll change this to to do the busy wait in beforeClass().
@@ -66,6 +66,8 @@ public class ITNotebookServiceClientTest { | |||
public static void setUp() throws IOException, ExecutionException, InterruptedException { | |||
// Create Test Notebook Instance | |||
client = NotebookServiceClient.create(); | |||
Util.cleanUpNotebookInstances(client, PARENT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this delete the one created above before it's even used?
Also, why isn't cleanup in tearDown()
sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would only delete notebooks that have a creation time more than 24 hours earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a prefix for the notebook names or something to avoid accidentally deleting unrelated notebooks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it's similar to the comment above about compute and container. I'll add in more specific names for the IT instances.
…oud-java into update_container_IT
@meltsufin @suztomo Could you re-review this PR? |
for (Instance instance : listPagedResponse.iterateAll()) { | ||
if (isCreatedBeforeThresholdTime( | ||
ZonedDateTime.parse(instance.getCreationTimestamp()).toInstant()) | ||
&& instance.getName().startsWith(BaseTest.COMPUTE_PREFIX)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For both Util
classes, I think I would rather have the prefix passed in as an argument into this method, but I guess this is OK too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, this seems better. I'll create a new PR for this.
This PR should resolve a bunch of the flaky test cases in the monorepo.
Status:
Quota Metrics after VM cleanup (7 Day Graph):