-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRIVERS-828 comprehensive Atlas testing #98
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit about debug code being left in. I'll leave the Python code review to someone that knows Python, but spec changes LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't had a chance to go through all of the changes yet. Publishing my initial comments so we can start iterating.
astrolabe/runner.py
Outdated
deadline = monotonic() + 90 | ||
|
||
while True: | ||
mc = MongoClient(cluster_config['connectionStrings']['standard'], username='atlasuser', password='mypassword123') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need a new MongoClient? Why can we not use one of the clients attached to the context for this operation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see two other clients in the codebase but neither of them appears to be attached to any contexts, which client did you have in mind exactly?
* master: PYTHON-2579 Fix Atlas planned maintenance test runner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed the RST files as I don't have enough context for the other components of this PR.
I have two questions on the documentation for events. If no changes are required (based on your response), you can consider me a LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can disregard the Python style nits for this PR as that shouldn't block other drivers from implementing this spec. However, there seem to be several other questions that need to be addressed before this can be merged.
@@ -78,11 +83,13 @@ functions: | |||
CLUSTER_NAME_SALT: ${build_id} | |||
ATLAS_API_USERNAME: ${atlas_key} | |||
ATLAS_API_PASSWORD: ${atlas_secret} | |||
ATLAS_API_BASE_URL: ${atlas_url} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you are hardcoding this? We should prefer configuring this value in the corresponding Evergreen variable (via the UI) instead of committing it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also noticied that we are currently using https://cloud-dev.mongodb.com/api/atlas
- has the API base URL changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the private endpoints the last path segment had to be removed, since I didn't have admin access I set the override in the PR. I can edit the project but this would break existing builds, therefore I suggest making this change in project config when the PR is ready to be merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. In that case can you revert this change? We can update the atlas_url
evergreen variable before merging this.
@@ -30,6 +30,11 @@ functions: | |||
working_dir: astrolabe-src | |||
command: | | |||
${PYTHON3_BINARY} -m pip install virtualenv | |||
- command: subprocess.exec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this to get debugging information for the DNS issue? If so, can we do this in a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@p-mongo what about this?
self.wait_for_idle() | ||
|
||
if op_name == 'sleep': | ||
_time.sleep(op_spec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: suggest casting op_spec
to a float before passing it to sleep
to support durations defined as string x.y
.
_time.sleep(float(op_spec))
|
||
if op_name == 'restartVms': | ||
rv = self.admin_client.nds.groups[self.project.id].clusters[self.cluster_name].reboot.post(api_version='private') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: remove newline.
self.cluster_url.processArgs.patch(**process_args) | ||
|
||
# Step-4: wait until maintenance completes (cluster is IDLE). | ||
self.wait_for_idle() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to sleep for a couple of seconds before calling waitForIdle
otherwise there could be situations in which you end up seeing the IDLE server state prematurely because maintenance has not yet been kicked off by the automation agent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reported this for failover tests in https://jira.mongodb.org/browse/PRODTRIAGE-1232 (and the users are currently meant to add the sleeps in their scenarios to compensate), are you saying other actions are also affected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. AFAIK none of the operations on clusters are applied synchronously. Operations are put in a task queue that is consumed by the automation agent. Consequently, this could cause strange failures even for other maintenance scenarios like cluster resizing. Adding a 2-3 second wait before waiting for idle should work.
self.__connection_string = connection_string | ||
uri = re.sub(r'://', | ||
'://%s:%s@' % (self.config.database_username, self.config.database_password), | ||
cluster.srvAddress) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the workload executors cannot add URI options to the connection string, don't we need to add them to the connection string being generated here? This was previously being done (rather simplistically) using
connection_string += urlencode(uri_options)
but now we don't seem to be doing it at all. Can you elaborate on how workload executors are expected to respect URI options if they cannot augment the incoming connection string and if astrolabe doesn't augment the outgoing connection string?
Note that the workload executor: | ||
#. MUST use the input connection string to `instantiate the | ||
unified test runner <https://github.com/mongodb/specifications/blob/master/source/unified-test-format/unified-test-format.rst#id92>`_ | ||
of the driver being tested. Note that the workload executor: | ||
|
||
* MUST NOT override any of the URI options specified in the incoming connection string. | ||
* MUST NOT augment the incoming connection string with any additional URI options. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still true? Please see my question above about how we expect to honor URI options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment about RST formatting, but content LGTM otherwise.
- client: | ||
id: &client0 client0 | ||
uriOptions: | ||
retryWrites: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are the uriOptions actually going to be honored? Astrolabe doesn't seem to use this field and we prohibit the workload executor from using it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field is read by the unified test runner when it is instantiating clients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, in that case we need to update the language in docs/source/spec-workload-executor.rst
. Currently, that document states:
Behavioral Description
----------------------
After accepting the inputs, the workload executor:
#. MUST use the input connection string to `instantiate the
unified test runner <https://github.com/mongodb/specifications/blob/master/source/unified-test-format/unified-test-format.rst#id92>`_
of the driver being tested. Note that the workload executor:
* MUST NOT override any of the URI options specified in the incoming connection string.
* MUST NOT augment the incoming connection string with any additional URI options.
If we want the uriOptions to be honored, we need to update this language?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the language you quoted conflicts. It's saying that the workload executor must pass the connection string it receives without modification to the unified test runner. AFAIK, the workload executor isn't connecting to MongoDB directly, so the only client objects using this connection string will be those created within the unified test runner. IMO, this has no bearing on the unified test runner spec's instructions that a client entity's uriOptions
should be respected.
Perhaps a clarification of this interaction would help avoid any misunderstanding.
https://jira.mongodb.com/browse/DRIVERS-828