Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRIVERS-828 comprehensive Atlas testing #98

Merged
merged 173 commits into from
Feb 18, 2021
Merged

Conversation

p-mongo
Copy link
Contributor

@p-mongo p-mongo commented Jan 5, 2021

@p-mongo p-mongo changed the title -828 validate failure handling DRIVERS-828 comprehensive Atlas testing Jan 6, 2021
Copy link
Contributor

@alcaeus alcaeus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit about debug code being left in. I'll leave the Python code review to someone that knows Python, but spec changes LGTM.

astrolabe/runner.py Outdated Show resolved Hide resolved
Copy link
Contributor

@prashantmital prashantmital left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had a chance to go through all of the changes yet. Publishing my initial comments so we can start iterating.

.evergreen/config.yml Show resolved Hide resolved
astrolabe/cli.py Outdated Show resolved Hide resolved
astrolabe/runner.py Outdated Show resolved Hide resolved
astrolabe/runner.py Outdated Show resolved Hide resolved
astrolabe/runner.py Outdated Show resolved Hide resolved
astrolabe/runner.py Outdated Show resolved Hide resolved
astrolabe/runner.py Outdated Show resolved Hide resolved
astrolabe/runner.py Outdated Show resolved Hide resolved
deadline = monotonic() + 90

while True:
mc = MongoClient(cluster_config['connectionStrings']['standard'], username='atlasuser', password='mypassword123')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need a new MongoClient? Why can we not use one of the clients attached to the context for this operation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two other clients in the codebase but neither of them appears to be attached to any contexts, which client did you have in mind exactly?

atlasclient/client.py Outdated Show resolved Hide resolved
astrolabe/runner.py Outdated Show resolved Hide resolved
atlasclient/client.py Outdated Show resolved Hide resolved
astrolabe/runner.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the RST files as I don't have enough context for the other components of this PR.

I have two questions on the documentation for events. If no changes are required (based on your response), you can consider me a LGTM.

docs/source/spec-workload-executor.rst Outdated Show resolved Hide resolved
docs/source/spec-workload-executor.rst Outdated Show resolved Hide resolved
Copy link
Contributor

@prashantmital prashantmital left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can disregard the Python style nits for this PR as that shouldn't block other drivers from implementing this spec. However, there seem to be several other questions that need to be addressed before this can be merged.

@@ -78,11 +83,13 @@ functions:
CLUSTER_NAME_SALT: ${build_id}
ATLAS_API_USERNAME: ${atlas_key}
ATLAS_API_PASSWORD: ${atlas_secret}
ATLAS_API_BASE_URL: ${atlas_url}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you are hardcoding this? We should prefer configuring this value in the corresponding Evergreen variable (via the UI) instead of committing it here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also noticied that we are currently using https://cloud-dev.mongodb.com/api/atlas - has the API base URL changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the private endpoints the last path segment had to be removed, since I didn't have admin access I set the override in the PR. I can edit the project but this would break existing builds, therefore I suggest making this change in project config when the PR is ready to be merged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. In that case can you revert this change? We can update the atlas_url evergreen variable before merging this.

@@ -30,6 +30,11 @@ functions:
working_dir: astrolabe-src
command: |
${PYTHON3_BINARY} -m pip install virtualenv
- command: subprocess.exec
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this to get debugging information for the DNS issue? If so, can we do this in a separate PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@p-mongo what about this?

astrolabe/cli.py Outdated Show resolved Hide resolved
astrolabe/cli.py Outdated Show resolved Hide resolved
astrolabe/cli.py Outdated Show resolved Hide resolved
self.wait_for_idle()

if op_name == 'sleep':
_time.sleep(op_spec)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: suggest casting op_spec to a float before passing it to sleep to support durations defined as string x.y.

_time.sleep(float(op_spec))


if op_name == 'restartVms':
rv = self.admin_client.nds.groups[self.project.id].clusters[self.cluster_name].reboot.post(api_version='private')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: remove newline.

self.cluster_url.processArgs.patch(**process_args)

# Step-4: wait until maintenance completes (cluster is IDLE).
self.wait_for_idle()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to sleep for a couple of seconds before calling waitForIdle otherwise there could be situations in which you end up seeing the IDLE server state prematurely because maintenance has not yet been kicked off by the automation agent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reported this for failover tests in https://jira.mongodb.org/browse/PRODTRIAGE-1232 (and the users are currently meant to add the sleeps in their scenarios to compensate), are you saying other actions are also affected?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. AFAIK none of the operations on clusters are applied synchronously. Operations are put in a task queue that is consumed by the automation agent. Consequently, this could cause strange failures even for other maintenance scenarios like cluster resizing. Adding a 2-3 second wait before waiting for idle should work.

self.__connection_string = connection_string
uri = re.sub(r'://',
'://%s:%s@' % (self.config.database_username, self.config.database_password),
cluster.srvAddress)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the workload executors cannot add URI options to the connection string, don't we need to add them to the connection string being generated here? This was previously being done (rather simplistically) using

connection_string += urlencode(uri_options)

but now we don't seem to be doing it at all. Can you elaborate on how workload executors are expected to respect URI options if they cannot augment the incoming connection string and if astrolabe doesn't augment the outgoing connection string?

Note that the workload executor:
#. MUST use the input connection string to `instantiate the
unified test runner <https://github.com/mongodb/specifications/blob/master/source/unified-test-format/unified-test-format.rst#id92>`_
of the driver being tested. Note that the workload executor:

* MUST NOT override any of the URI options specified in the incoming connection string.
* MUST NOT augment the incoming connection string with any additional URI options.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still true? Please see my question above about how we expect to honor URI options.

Copy link
Contributor

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment about RST formatting, but content LGTM otherwise.

docs/source/spec-workload-executor.rst Show resolved Hide resolved
- client:
id: &client0 client0
uriOptions:
retryWrites: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are the uriOptions actually going to be honored? Astrolabe doesn't seem to use this field and we prohibit the workload executor from using it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field is read by the unified test runner when it is instantiating clients.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, in that case we need to update the language in docs/source/spec-workload-executor.rst. Currently, that document states:

Behavioral Description
----------------------

After accepting the inputs, the workload executor:

#. MUST use the input connection string to `instantiate the
   unified test runner <https://github.com/mongodb/specifications/blob/master/source/unified-test-format/unified-test-format.rst#id92>`_
   of the driver being tested. Note that the workload executor:

   * MUST NOT override any of the URI options specified in the incoming connection string.
   * MUST NOT augment the incoming connection string with any additional URI options.

If we want the uriOptions to be honored, we need to update this language?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the language you quoted conflicts. It's saying that the workload executor must pass the connection string it receives without modification to the unified test runner. AFAIK, the workload executor isn't connecting to MongoDB directly, so the only client objects using this connection string will be those created within the unified test runner. IMO, this has no bearing on the unified test runner spec's instructions that a client entity's uriOptions should be respected.

Perhaps a clarification of this interaction would help avoid any misunderstanding.

@p-mongo p-mongo merged commit 6e3c4a4 into mongodb-labs:master Feb 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants