Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Expose multipart upload #63

Merged

Conversation

bisgaard-itis
Copy link
Collaborator

@bisgaard-itis bisgaard-itis commented Aug 18, 2023

What do these changes do?

  • Update openapi.json to the latest version from https://github.com/ITISFoundation/osparc-simcore master
  • Expose multipart upload and introduce e2e test which uploads a 10 GB file.
  • Introduce async generator for the pagination iterator and add this to e2e test

Related issue/s

How to test

@bisgaard-itis bisgaard-itis added the enhancement New feature or request label Aug 18, 2023
@bisgaard-itis bisgaard-itis added this to the Baklava milestone Aug 18, 2023
@bisgaard-itis bisgaard-itis self-assigned this Aug 18, 2023
Version(osparc.__version__) < Version("0.6.0"),
reason=f"osparc.__version__={osparc.__version__} is older than 0.6.0",
)
def test_upload_file(tmp_path: Path, cfg: osparc.Configuration) -> None:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pcrespov should I skip this file until we have the delete operation? As it is it uploads a 10GB file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes please.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

Copy link
Member

@pcrespov pcrespov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job!
Some style/minor comments

clients/python/test/e2e/test_files_api.py Outdated Show resolved Hide resolved
Version(osparc.__version__) < Version("0.6.0"),
reason=f"osparc.__version__={osparc.__version__} is older than 0.6.0",
)
def test_upload_file(tmp_path: Path, cfg: osparc.Configuration) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes please.

clients/python/test/e2e/test_files_api.py Outdated Show resolved Hide resolved
clients/python/test/e2e/test_files_api.py Show resolved Hide resolved
with osparc.ApiClient(cfg) as api_client:
files_api: osparc.FilesApi = osparc.FilesApi(api_client=api_client)
uploaded_file: osparc.File = files_api.upload_file(tmp_file)
downloaded_file = files_api.download_file(uploaded_file.id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THOUGHT: it would be nice to have an optional parameter in files_api.download_file to set where I want to download the file.

def download_file(self, file_id, *, destination_folder: Path | None = None) -> Path:
    ...

Would it be easy to do that in the client? This way the user can avoid copy/moving things around.

and the test would be

downloaded_file = files_api.download_file(uploaded_file.id, destination_folder = tmp_path)
assert downloaded_file.parent == temp_path

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will add this functionality to the client and add the assert to the test

clients/python/client/osparc/_utils.py Outdated Show resolved Hide resolved
clients/python/client/osparc/_utils.py Outdated Show resolved Hide resolved
clients/python/client/osparc/_http_client.py Outdated Show resolved Hide resolved
@@ -53,6 +38,9 @@ def test_get_jobs(configuration: osparc.Configuration):
tmp_iter = solvers_api.get_jobs(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MIONR: is solvers_api.get_jobs called like that? Listing jobs should be called solvers_api.list_jobs. All listings are by convention called that way.

MINOR: I find interesting the naming tmp. A generation is indeed something of one use but the idea is that the use does not even feel it, i.e. the user would write something like

for job in solvers_api.list_jobs(...):
    print(job)
    # do something with job

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I can call it list_jobs if you want. But for me "list" is a loaded term in Python. I would expect such a method to return me a list (i.e. it seems to me to violate the principle of lease surprise 😉).

for job in solvers_api.list_jobs(...):
    print(job)
    # do something with job

works now. However more than that one can also do len(solvers_api.list_jobs(...)) to get the number of elements the iterator can produce. I find that quite useful. That's why I used te tmp_...-thing in the test. To get an object I could call len on.

@@ -63,6 +51,17 @@ def test_get_jobs(configuration: osparc.Configuration):
for elm in tmp_iter:
assert isinstance(elm, osparc.Job)

async def return_iter_content(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get this but perhaps you can write a separate async test for clarity?
both tests should like the same except that in one case is asyng test_ and the other is not

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have deleted the async part of the iterator for now. Reason: since I use the links I get from the first call to the api directly, the iterator now needs to hold its own http client. Hence, to have a sync and an async version it makes sense to have two different types of iterators: One holding a sync client and one holding an async client. Until someone tells me we need the async one I will stick with the sync one.

@bisgaard-itis bisgaard-itis requested a review from pcrespov August 24, 2023 12:19
Copy link
Member

@pcrespov pcrespov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments

client_file: ClientFile = ClientFile(
filename=file.name, filesize=file.stat().st_size
)
client_upload_schema: ClientFileUploadSchema = self._super.get_upload_links(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MINOR: In reality client_upload_schema is NOT a schema but data. Note that pydantic class builds the schema but the instance is a data object. For that reason I try avoiding the word Schema in the classes. This should be modified in the server as well.

BTW, we try to follow a naming convention for request payloads and response bodies. We use the anem of the resource and then a verb at the end. For instance, ProjectGet would be the response body for a GET /project/{id} resource

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, concerning the schema thing I will address this among other small things I need to do on the API server: ITISFoundation/osparc-simcore#4641

f"destination_folder: {destination_folder} must be a directory"
)
downloaded_file: Path = Path(super().download_file(file_id))
if destination_folder is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not using https://docs.python.org/3/library/tempfile.html#tempfile.mkdtemp if `destination_folder is not given?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downloaded file is already located in the temporary directory by the automatically generated client. So that shouldn't be necessary.


tasks: list = []
async with AsyncHttpClient(
exc_req_typ="post", exc_url=links.abort_upload, exc_auth=self._auth
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you really like writing acronyms :-) ... do not be shy. You can write long names. what is exc here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, sorry. This should already have been fixed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done now

@pytest.mark.skipif(
Version(osparc.__version__) < Version("0.6.0"),
reason=f"osparc.__version__={osparc.__version__} is older than 0.6.0",
)
def test_get_jobs(configuration: osparc.Configuration):
"""Test the get_jobs method
def test_list_jobs(cfg: osparc.Configuration):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why cfg is better than configuration :-) ? these mathematicians ;-)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixture has to be used every where. So I am simply trying to save some screen-ink here.

solvers_api: osparc.SolversApi = osparc.SolversApi(api_client)
sleeper: osparc.Solver = solvers_api.get_solver_release(solver, version)

# initial iterator
init_iter = solvers_api.get_jobs(sleeper.id, sleeper.version, limit=3)
init_iter = solvers_api.list_jobs(sleeper.id, sleeper.version)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you did not want to prefix it as list_!
I would like the operation ID to be called list since in the API these are denoted List method https://cloud.google.com/apis/design/standard_methods#list but feel free to map it to e.g. solvers_api.iter_jobs or simply solvers.jobs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I like solvers_api.jobs. I will go for that one


final_iter = solvers_api.get_jobs(sleeper.id, sleeper.version, limit=3)
final_iter = solvers_api.list_jobs(sleeper.id, sleeper.version)
assert len(final_iter) > 0, "No jobs were available"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if final_iter is an iterator ... then len() should not work. Probably this return a list.?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final_iter is a PaginationGenerator. I decided to create that intermediary class instead of directly returning an iterator in order for it to have a __len__ method. However, PaginationGenerator has an __iter__ method, so

for elm in final_iter:
    ...

works

@bisgaard-itis bisgaard-itis merged commit ad0c87c into ITISFoundation:master Aug 25, 2023
@bisgaard-itis bisgaard-itis deleted the expose-multipart-upload branch August 25, 2023 06:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5Gb limit fix bug in API that limits to 2GB OverflowError in public API
2 participants