
Re-running a computational service that produces a >5GiB file fails when uploading #5102

Closed

sanderegg opened this issue Nov 28, 2023 · 4 comments
Labels: bug (buggy, it does not work as expected)


@sanderegg
Member

Is there an existing issue for this?

  • I have searched the existing issues

Which deploy/s?

No response

Current Behavior

Run a python-runner with the following Python file:

import os


def generate_large_file(file_path, size_gb):
    with open(file_path, "wb") as f:
        # Generate random data
        chunk_size = 1024 * 1024  # 1 MB chunks
        for _ in range(size_gb * 1024):
            data = os.urandom(chunk_size)
            f.write(data)


def create_file():
    # Example usage:
    desired_file_path = "/outputs/large_file.txt"
    desired_size_gb = 6  # Adjust this to your desired size in gigabytes

    generate_large_file(desired_file_path, desired_size_gb)

    print(f"Large file created: {desired_file_path}")


if __name__ == "__main__":
    create_file()

This creates a 6 GiB file in /outputs, which then gets uploaded to S3. Since the file exceeds the 5 GiB limit for a single S3 PUT, it has to be transferred as a multipart upload.

Run the same computation again, and it fails while uploading

Expected Behavior

Running the same computation should not fail during upload

Steps To Reproduce

  1. Use the Python code described above as input to an osparc-python-runner:1.2.0
  2. Run it once
  3. Run it a second time
     --> the upload fails (sometimes it goes through if you are lucky)

Anything else?

There are two problems here, or two options to fix this issue:

1. The frontend currently fires many calls to the backend to get information about the output file while the node is still being computed. This happens in `Study.js`, around line 400:

   ```js
   nodeUpdated: function(nodeUpdatedData) {
     const studyId = nodeUpdatedData["project_id"];
     if (studyId !== this.getUuid()) {
       return;
     }
     const nodeId = nodeUpdatedData["node_id"];
     const nodeData = nodeUpdatedData["data"];
     const workbench = this.getWorkbench();
     const node = workbench.getNode(nodeId);
     if (node && nodeData) {
       node.setOutputData(nodeData.outputs);
       // ...
     }
   },
   ```

   For some reason, calling _setOutputData_ triggers a call into the webserver and from there into storage to get the file information, which in turn looks the file up in S3, where it still exists from the previous run. The database is then lazily updated with that stale file entry, the ongoing multipart upload gets cancelled, and the upload fails (see the multipart sketch after this list).

   Why these calls are made is unclear to me. Removing them and only updating the outputs at the end of the computation should be enough.

2. Storage could delete the previous output file when the computation is started again; this would ensure the issue does not happen. Since versioning can be set up on the bucket, we could even restore the file in case the computation gets aborted (see the versioning sketch below).
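
To illustrate the failure mode described in option 1: once a multipart upload has been aborted, S3 rejects any further `upload_part` call for that upload ID with a `NoSuchUpload` error. The following is a minimal sketch with plain `boto3`; the bucket and key names are made up for the example, and the abort is triggered explicitly here rather than by the lazy database update, so this is not the actual storage-service code.

```python
import os

import boto3
from botocore.exceptions import ClientError

# Hypothetical bucket/key, just for illustration (not the real osparc storage layout).
BUCKET = "my-test-bucket"
KEY = "outputs/large_file.txt"

s3 = boto3.client("s3")

# Storage starts a multipart upload for the new >5GiB output file.
mpu = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)
upload_id = mpu["UploadId"]

# Simulate the mid-flight cancellation (in the bug, this happens when the stale
# file entry from the previous run gets pulled in and the upload is aborted).
s3.abort_multipart_upload(Bucket=BUCKET, Key=KEY, UploadId=upload_id)

# Uploading the next part now fails: the upload ID no longer exists.
try:
    s3.upload_part(
        Bucket=BUCKET,
        Key=KEY,
        PartNumber=1,
        UploadId=upload_id,
        Body=os.urandom(5 * 1024 * 1024),  # 5 MiB part
    )
except ClientError as err:
    print(err.response["Error"]["Code"])  # -> "NoSuchUpload"
```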
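
For option 2, a minimal sketch of the delete-and-restore idea, again with plain `boto3` and made-up bucket/key names (not the actual storage-service implementation): on a versioned bucket, deleting the previous output only adds a delete marker, and the file can be restored later by removing that marker. Versioning has to be enabled on the bucket beforehand (e.g. via `put_bucket_versioning`).

```python
import boto3

# Hypothetical names for the sketch; the real bucket/key layout lives in the storage service.
BUCKET = "my-test-bucket"
KEY = "outputs/large_file.txt"

s3 = boto3.client("s3")


def delete_previous_output() -> None:
    """Called when the computation is (re)started: hide the old output.

    On a versioned bucket this does not destroy data, it only inserts a
    delete marker on top of the previous version.
    """
    s3.delete_object(Bucket=BUCKET, Key=KEY)


def restore_previous_output() -> None:
    """Called if the computation gets aborted: bring the old output back.

    Removing the latest delete marker makes the previous version current again.
    """
    versions = s3.list_object_versions(Bucket=BUCKET, Prefix=KEY)
    for marker in versions.get("DeleteMarkers", []):
        if marker["Key"] == KEY and marker["IsLatest"]:
            s3.delete_object(Bucket=BUCKET, Key=KEY, VersionId=marker["VersionId"])
            break
```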
sanderegg added the bug label on Nov 28, 2023
@sanderegg
Member Author

@jsaq007 @ignapas maybe we could discuss that at some point.

@ignapas
Contributor

ignapas commented Nov 29, 2023

> @jsaq007 @ignapas maybe we could discuss that at some point.

anytime

@jsaq007
Contributor

jsaq007 commented Nov 30, 2023

Sure, let's talk @ignapas and @sanderegg

@sanderegg
Member Author

This was fixed.
