Conversation
```python
def __create_user(self, pool_id: str, node_id: str, username: str, password: str = None, ssh_key: str = None) -> str:
    """
    Create a pool user
    :param pool: the pool to add the user to
```
Wrong format for the args in this docstring.
These are the old, deprecated methods; I'm going to leave the docstrings as is, since they will be removed soon.
All of the new user-facing functions have docstrings in the proper format.
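For reference, the "proper format" appears to be Google-style docstrings, matching the `base_operations.py` excerpt later in this thread. A sketch of what the conversion would look like; the parameter descriptions here are assumptions for illustration:

```python
def create_user(self, pool_id: str, node_id: str, username: str, password: str = None, ssh_key: str = None) -> str:
    """Create a pool user.

    Args:
        pool_id (:obj:`str`): the id of the pool to add the user to.
        node_id (:obj:`str`): the id of the node to add the user to.
        username (:obj:`str`): name of the user to create.
        password (:obj:`str`, optional): password for the user. Defaults to None.
        ssh_key (:obj:`str`, optional): public SSH key for the user. Defaults to None.
    """
```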
```python
from aztk import models
from aztk.utils import constants, helpers

output_file = constants.TASK_WORKING_DIR + \
```
`path.join`?
I'm actually thinking we shouldn't use `path.join`, since this is a Linux-style path (evaluated on the node, not the client) regardless of which client runs it. So putting the "/" in explicitly seems better.
How about we create a utility for that, then? I think this is the root of many errors from duplicated `/` characters.
Yeah, sure. Created a separate issue for that: #630
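For what it's worth, the standard library already covers the Linux-style case: `posixpath.join` always uses `/` separators, even when the client runs on Windows, and only inserts a separator when one is missing. A minimal sketch (the directory value is an assumption, for illustration only):

```python
import posixpath

TASK_WORKING_DIR = "/mnt/batch/tasks/workitems"  # assumed value, for illustration

# Joins with "/" regardless of the client OS, without duplicating slashes:
output_file = posixpath.join(TASK_WORKING_DIR, "output.log")
print(output_file)  # /mnt/batch/tasks/workitems/output.log
```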
.vscode/settings.json
Outdated
```diff
@@ -14,5 +14,6 @@
     "python.formatting.provider": "yapf",
     "python.venvPath": "${workspaceFolder}/.venv/",
     "python.pythonPath": "${workspaceFolder}/.venv/Scripts/python.exe",
-    "python.unitTest.pyTestEnabled": true
+    "python.unitTest.pyTestEnabled": true,
+    // "editor.formatOnSave": true,
```
?
I can delete this for now. It makes the editor autoformat on save, so we only commit formatted code. I didn't enable it here since there were already so many changes.
```python
from aztk.internal import cluster_data
from aztk.utils import ssh as ssh_lib

from .helpers import (create_user_on_cluster, create_user_on_node, delete_user_on_cluster, delete_user_on_node,
```
Separate these onto one line each?
This is what VSCode's "sort imports" spits out. Unfortunately, yapf doesn't do import formatting (or comment/docstring formatting).
I think we should align on some standard import formatter, but I'm not sure the VSCode Python extension's is the right tool.
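To make the difference concrete, here is the same style question on a stdlib import; a one-import-per-line formatter (e.g. isort with its `force_single_line` option, named here as one candidate, not a decision from this thread) would emit the second form:

```python
# Grouped form (what VSCode's "sort imports" emits):
from os.path import basename, dirname, join, split

# One-per-line form (what a force-single-line formatter would emit):
from os.path import basename
from os.path import dirname
from os.path import join
from os.path import split
```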
aztk/client/base/base_operations.py
Outdated
```python
        internal (:obj:`bool`, optional): if True, this will connect to the node using its internal IP.
            Only use this if running within the same VNET as the cluster. Defaults to False.

    Returns:
        ClusterConfiguration: Object representing the cluster's configuration
```
Shouldn't that also be a type?
What do you mean?
:obj:`ClusterConfiguration` -- i.e., the Returns type should get the same `:obj:` markup as the Args.
These actually weren't even being published, but I changed that, so now they are fixed and are being published.
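A sketch of the suggested fix, giving the Returns type the same `:obj:` markup as the Args (the surrounding method name and wording are assumptions, not the final docstring):

```python
def get_configuration(self, cluster_id: str):
    """Get the cluster's configuration.

    Args:
        internal (:obj:`bool`, optional): if True, this will connect to the node
            using its internal IP. Only use this if running within the same VNET
            as the cluster. Defaults to False.

    Returns:
        :obj:`ClusterConfiguration`: Object representing the cluster's configuration
    """
```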
```python
def generate_application_task(spark_client, container_id, application, remote=False):
    resource_files = []
```
Can you split this method into multiple smaller ones?
```python
software_metadata_key = "spark"

vm_image = models.VmImage(publisher='Canonical', offer='UbuntuServer', sku='16.04')
```
Should we save this as a constant?
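A minimal sketch of hoisting the image to a module-level constant, as suggested (the constant name is an assumption):

```python
from aztk import models

# Single place to bump the base image for all clusters:
DEFAULT_VM_IMAGE = models.VmImage(publisher='Canonical', offer='UbuntuServer', sku='16.04')
```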
```python
def get_cluster(spark_cluster_operations, cluster_id: str):
    try:
        pool, nodes = super(type(spark_cluster_operations), spark_cluster_operations).get(cluster_id)
```
Do you need to do that `super(type(...), ...)` thing?
Without it, this fell into infinite recursion, but maybe there is a better way.
`spark_cluster_operations.get()` would hit an infinite loop?
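A minimal standalone reproduction of the dispatch problem being discussed (class and method bodies are stand-ins, not the real aztk classes):

```python
class BaseOperations:
    def get(self, cluster_id):
        return "pool", "nodes"

class SparkClusterOperations(BaseOperations):
    def get(self, cluster_id):
        # Delegates to the module-level helper below.
        return get_cluster(self, cluster_id)

def get_cluster(spark_cluster_operations, cluster_id):
    # A plain spark_cluster_operations.get(cluster_id) would dispatch right
    # back to the override above -> infinite recursion. The two-argument
    # super() starts the attribute lookup *after* the instance's class in
    # the MRO, so it reaches BaseOperations.get instead.
    pool, nodes = super(type(spark_cluster_operations), spark_cluster_operations).get(cluster_id)
    return pool, nodes

print(SparkClusterOperations().get("my-cluster"))  # ('pool', 'nodes')
```

One caveat with `super(type(x), x)`: if someone subclasses `SparkClusterOperations` again and the helper receives that subclass, the lookup starts after the subclass in the MRO, hits this override once more, and recurses anyway. Calling `BaseOperations.get(spark_cluster_operations, cluster_id)` explicitly would avoid that, at the cost of hard-coding the base class.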
```diff
@@ -0,0 +1,14 @@
+import azure.batch.models.batch_error as batch_error
+
+import aztk.models  # TODO: get rid of this import and use aztk.spark.models
```
todo?
```python
def __app_cmd():
    docker_exec = CommandBuilder("sudo docker exec")
    docker_exec.add_argument("-i")
    docker_exec.add_option("-e", "AZ_BATCH_TASK_WORKING_DIR=$AZ_BATCH_TASK_WORKING_DIR")
```
I think you can just do `-e MY_ENV` and Docker will do the same.
Yeah -- this code has just been reshuffled, not updated. That is an easy refactor, though.
Actually, it looks like changing `-e env=$env` to `-e env` does not work here. Possibly that syntax only works for `docker run`, not `docker exec`. Not sure, but for now I'll just leave it, since it works as is.
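For readers outside the codebase, a hypothetical stand-in for `CommandBuilder` (not the real aztk implementation) showing roughly what the snippet above assembles:

```python
class CommandBuilder:
    """Toy stand-in: collects an executable plus flags into one command string."""

    def __init__(self, executable):
        self.parts = [executable]

    def add_argument(self, arg):
        self.parts.append(arg)

    def add_option(self, name, value):
        self.parts.extend([name, value])

    def to_str(self):
        return " ".join(self.parts)

docker_exec = CommandBuilder("sudo docker exec")
docker_exec.add_argument("-i")
docker_exec.add_option("-e", "AZ_BATCH_TASK_WORKING_DIR=$AZ_BATCH_TASK_WORKING_DIR")
print(docker_exec.to_str())
# sudo docker exec -i -e AZ_BATCH_TASK_WORKING_DIR=$AZ_BATCH_TASK_WORKING_DIR
```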
```diff
@@ -6,3 +6,4 @@ aztk.models package
    :members:
    :show-inheritance:
    :imported-members:
+   :undoc-members:
```
Doesn't this add too much noise?
```python
task = batch_client.task.get(cluster_id, application_name)

if task.state is batch_models.TaskState.active or task.state is batch_models.TaskState.preparing:
    # TODO: log
```
todo?
This is just reshuffled code, I believe. We don't currently have a logger in the SDK (we should add one, though), so this is outside the scope of this PR.
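When a logger does get added, the usual library pattern is a module-level logger with a `NullHandler`, so the SDK stays silent unless the host application configures logging. A minimal sketch (the logger name is an assumption):

```python
import logging

log = logging.getLogger("aztk")
log.addHandler(logging.NullHandler())  # library default: emit nothing unless configured

# The TODO above could then become something like:
# log.info("Task %s is in state %s", application_name, task.state)
```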
```diff
@@ -0,0 +1,9 @@
+def delete_user(self, pool_id: str, node_id: str, username: str) -> str:
+    """
```
It doesn't look like there are too many of those left. Could you change them all to the new format now?
This docstring is not published. I can update it, but it will only be discoverable in the source.
```python
node_data = NodeData(cluster_conf).add_core().done()
zip_resource_files = cluster_data.upload_node_data(node_data).to_resource_file()

start_task = spark_cluster_operations._generate_cluster_start_task(core_cluster_operations, zip_resource_files, cluster_conf.cluster_id,
```
Hmm, shouldn't you not be calling `spark_cluster_operations._generate_cluster_start_task`?
```python
return self.cluster.submit(id=cluster_id, application=application, remote=remote, wait=wait)

@deprecated("0.10.0")
def submit_all_applications(self, cluster_id: str, applications):  # NOT IMPLEMENTED
```
NOT IMPLEMENTED?
This is just a reminder to us that this function doesn't exist in the new API (i.e., I didn't just forget to port it; it's intentionally not there).
```python
def list_clusters(core_cluster_operations):
    try:
        software_metadata_key = "spark"
```
Isn't that a constant?
Yeah -- looks like there were a couple of places that needed to change to `models.Software.spark`.
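A rough sketch of the fix described, assuming `models.Software.spark` holds the `"spark"` key (its exact definition in aztk is not shown in this thread):

```python
from aztk import models

def list_clusters(core_cluster_operations):
    # Use the shared constant rather than the bare "spark" literal:
    software_metadata_key = models.Software.spark
    ...
```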
Fix #591
Fix #627