This repository has been archived by the owner on Feb 3, 2021. It is now read-only.

Feature: SDK refactor #622

Merged
merged 52 commits into from
Aug 3, 2018

Conversation

@jafreck (Member) commented Jul 2, 2018

Fix #591
Fix #627

  • Docstrings for ALL public-facing functions
  • Update sdk_example.py

def __create_user(self, pool_id: str, node_id: str, username: str, password: str = None, ssh_key: str = None) -> str:
"""
Create a pool user
:param pool: the pool to add the user to
Member

Wrong format of args

Member Author

these are the old deprecated methods, I'm going to leave the docstrings as is (since they will be removed soon).

All of the new user-facing functions have docstrings in the proper format.
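For reference, the format used by the new user-facing functions is the Google docstring style visible elsewhere in this PR (Args/Returns sections with Sphinx `:obj:` roles). A minimal sketch with a hypothetical signature:

```python
def create_user(cluster_id: str, username: str, ssh_key: str = None, password: str = None):
    """Create a user on every node in the cluster.

    Args:
        cluster_id (:obj:`str`): the id of the cluster to create the user on.
        username (:obj:`str`): name of the user to create.
        ssh_key (:obj:`str`, optional): public SSH key for the user. Defaults to None.
        password (:obj:`str`, optional): password for the user. Defaults to None.
    """
```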

from aztk import models
from aztk.utils import constants, helpers

output_file = constants.TASK_WORKING_DIR + \
Member

path.join?

Member Author

I'm actually thinking we shouldn't do path.join since this path is a Linux style path (evaluated on the node, not the client) regardless of what client runs it. So putting "/" explicitly seems better.

Member

How about we create a utility for that, then? I think this is the root of many errors from duplicated "/" characters.

Member Author

Yeah, sure. Created a separate issue for that: #630
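A sketch of what such a utility (the one tracked in #630) could look like, using the standard library's posixpath so joins always use "/" regardless of the client OS; the helper name is hypothetical:

```python
import posixpath

def node_path_join(*parts: str) -> str:
    """Join path segments for a path evaluated on the node (always
    Linux-style), regardless of the OS the client runs on."""
    return posixpath.join(*parts)
```

Because posixpath.join only inserts a separator when the preceding segment doesn't already end with one, this also avoids the duplicated-"/" errors mentioned above: node_path_join("wd/", "logs") yields "wd/logs", not "wd//logs".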

@@ -14,5 +14,6 @@
"python.formatting.provider": "yapf",
"python.venvPath": "${workspaceFolder}/.venv/",
"python.pythonPath": "${workspaceFolder}/.venv/Scripts/python.exe",
"python.unitTest.pyTestEnabled": true
"python.unitTest.pyTestEnabled": true,
// "editor.formatOnSave": true,
Member

?

Member Author

I can delete this for now. This will autoformat on save, so we only commit formatted code. I didn't enable it here since there were already so many changes.

from aztk.internal import cluster_data
from aztk.utils import ssh as ssh_lib

from .helpers import (create_user_on_cluster, create_user_on_node, delete_user_on_cluster, delete_user_on_node,
Member

Separate these onto one line each?

Member Author

This is what VSCode's "sort imports" spits out. Unfortunately, yapf doesn't do import formatting (or comment/docstring formatting).

I think we should align on some standard import formatter, but I'm not sure the VSCode Python extension's is the right tool.
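One candidate standard formatter is isort, which can be pinned to one import per line so the output doesn't depend on which editor produced it. A hypothetical .isort.cfg (assuming isort were adopted; nothing in this PR uses it yet):

```ini
[settings]
force_single_line = True
line_length = 120
```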

internal (:obj:`bool`, optional): if True, this will connect to the node using its internal IP.
Only use this if running within the same VNET as the cluster. Defaults to False.
Returns:
ClusterConfiguration: Object representing the cluster's configuration
Member

shouldn't that also be a type?

Member Author

what do you mean?

Member

:obj:`ClusterConfiguration`

Member Author

These actually weren't even being published, but I changed that, so now they are fixed and being published.



def generate_application_task(spark_client, container_id, application, remote=False):
resource_files = []
Member

Can you split this method into multiple smaller ones?


software_metadata_key = "spark"

vm_image = models.VmImage(publisher='Canonical', offer='UbuntuServer', sku='16.04')
Member

should we save this as a constant?


def get_cluster(spark_cluster_operations, cluster_id: str):
try:
pool, nodes = super(type(spark_cluster_operations), spark_cluster_operations).get(cluster_id)
Member

Do you need to do that super(type(...)) thing?

Member Author

Without it, it fell into infinite recursion, but maybe there is a better way.

Member

spark_cluster_operations.get() would cause an infinite loop?
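A minimal sketch (class and method names hypothetical, simplified from the code above) of why the plain call recurses: the subclass override delegates to this helper, so the helper has to dispatch to the base implementation explicitly:

```python
class CoreClusterOperations:
    def get(self, cluster_id):
        # Stand-in for the real base implementation (the Batch call).
        return ("pool", "nodes")

class SparkClusterOperations(CoreClusterOperations):
    def get(self, cluster_id):
        # The override delegates to the module-level helper below.
        return get_cluster(self, cluster_id)

def get_cluster(spark_cluster_operations, cluster_id):
    # Calling spark_cluster_operations.get(cluster_id) here would re-enter
    # the override above and recurse forever. super(type(obj), obj) skips
    # the override and dispatches to the base class instead.
    # (Caveat: type(...) would loop again if SparkClusterOperations were
    # itself subclassed; naming the class explicitly avoids that.)
    pool, nodes = super(type(spark_cluster_operations), spark_cluster_operations).get(cluster_id)
    return pool, nodes
```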

@@ -0,0 +1,14 @@
import azure.batch.models.batch_error as batch_error

import aztk.models # TODO: get rid of this import and use aztk.spark.models
Member

todo?

def __app_cmd():
docker_exec = CommandBuilder("sudo docker exec")
docker_exec.add_argument("-i")
docker_exec.add_option("-e", "AZ_BATCH_TASK_WORKING_DIR=$AZ_BATCH_TASK_WORKING_DIR")
Member

I think you can just do -e MY_ENV and Docker will do the same.

Member Author

Yeah -- this code has just been reshuffled, not updated. That is an easy refactor though.

Member Author

Actually, it looks like changing the -e env=$env to -e env does not work here. Possibly that syntax only works for docker run, not docker exec. Not sure, but for now I'll just leave it, since it works as is.

@@ -6,3 +6,4 @@ aztk.models package
:members:
:show-inheritance:
:imported-members:
:undoc-members:
Member

Doesn't this add too much noise?

task = batch_client.task.get(cluster_id, application_name)

if task.state is batch_models.TaskState.active or task.state is batch_models.TaskState.preparing:
# TODO: log
Member

todo?

Member Author

This is just reshuffled code, I believe. We don't currently have a logger in the SDK (we should add one, though), so this is outside the scope of this PR, I think.

@@ -0,0 +1,9 @@
def delete_user(self, pool_id: str, node_id: str, username: str) -> str:
"""
Member

It doesn't look like there are too many of those left. Could change them all to the new format now.

Member Author

This docstring is not published. I can update it, but it will only be discoverable in the source.

node_data = NodeData(cluster_conf).add_core().done()
zip_resource_files = cluster_data.upload_node_data(node_data).to_resource_file()

start_task = spark_cluster_operations._generate_cluster_start_task(core_cluster_operations, zip_resource_files, cluster_conf.cluster_id,
Member

Hmm, shouldn't you not be calling spark_cluster_operations._generate_cluster_start_task?

return self.cluster.submit(id=cluster_id, application=application, remote=remote, wait=wait)

@deprecated("0.10.0")
def submit_all_applications(self, cluster_id: str, applications): # NOT IMPLEMENTED
Member

NOT IMPLEMENTED?

Member Author

This is just a reminder to us that this function doesn't exist in the new API (i.e., I didn't just forget to do it; it's intentionally not there).


def list_clusters(core_cluster_operations):
try:
software_metadata_key = "spark"
Member

isn't that a constant?

Member Author

Yeah -- looks like there were a couple of places that needed to change to models.Software.spark.
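A sketch of what centralizing the key as a constant could look like; the exact shape of models.Software here is an assumption, not the actual aztk definition:

```python
class Software:
    """Namespace for software metadata values written to the pool."""
    spark = "spark"

# Call sites reference the constant instead of repeating the literal:
software_metadata_key = Software.spark
```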

@jafreck jafreck merged commit b18eb69 into Azure:master Aug 3, 2018