
Re-running a computational service that produces a >5GiB file fails when uploading #5102

Closed

sanderegg opened this issue Nov 28, 2023 · 4 comments
Labels: bug (buggy, it does not work as expected)


@sanderegg
Member

Is there an existing issue for this?

  • I have searched the existing issues

Which deploy/s?

No response

Current Behavior

Run a python-runner with the following Python file:

import os


def generate_large_file(file_path, size_gb):
    with open(file_path, "wb") as f:
        # Generate random data
        chunk_size = 1024 * 1024  # 1 MB chunks
        for _ in range(size_gb * 1024):
            data = os.urandom(chunk_size)
            f.write(data)


def create_file():
    # Example usage:
    desired_file_path = "/outputs/large_file.txt"
    desired_size_gb = 6  # Adjust this to your desired size in gigabytes

    generate_large_file(desired_file_path, desired_size_gb)

    print(f"Large file created: {desired_file_path}")


if __name__ == "__main__":
    create_file()

This creates a 6 GiB file in /outputs, which then gets uploaded to S3. Since the file exceeds the 5 GiB limit for a single S3 PUT, it has to be transferred as a multipart upload.

Run the same computation again, and it fails while uploading

Expected Behavior

Running the same computation should not fail during upload

Steps To Reproduce

  1. Use the Python code described above as input to an osparc-python-runner:1.2.0
  2. Run it once
  3. Run it a second time
     --> the upload fails (sometimes it goes through if you are lucky)

Anything else?

There are two problems here, or two options to fix this issue:

1. The frontend currently fires many calls to the backend to get information about the output file while the node is still being computed. This happens in `Study.js`, around line 400:

   ```js
   nodeUpdated: function(nodeUpdatedData) {
     const studyId = nodeUpdatedData["project_id"];
     if (studyId !== this.getUuid()) {
       return;
     }
     const nodeId = nodeUpdatedData["node_id"];
     const nodeData = nodeUpdatedData["data"];
     const workbench = this.getWorkbench();
     const node = workbench.getNode(nodeId);
     if (node && nodeData) {
       node.setOutputData(nodeData.outputs);
       // ...
     }
   },
   ```

   For some reason, calling _setOutputData_ triggers a call into the webserver and from there into storage to get the file information, which in turn looks the file up in S3, where it still exists from the previous run. The database is then lazily updated with that stale file entry, the ongoing multipart upload gets cancelled, and the upload fails (see the multipart sketch after this list).

   Why these calls are made is unclear to me. Removing them and only updating the outputs at the end of the computation should be enough.

2. Storage could delete the previous output file when the computation is started again; this would ensure the issue does not happen. Since versioning can be set up on the bucket, we could even restore the file in case the computation gets aborted (see the versioning sketch below).
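
To illustrate the failure mode described in option 1: once a multipart upload has been aborted, S3 rejects any further `upload_part` call for that upload ID with a `NoSuchUpload` error. The following is a minimal sketch with plain `boto3`; the bucket and key names are made up for the example, and the abort is triggered explicitly here rather than by the lazy database update, so this is not the actual storage-service code.

```python
import os

import boto3
from botocore.exceptions import ClientError

# Hypothetical bucket/key, just for illustration (not the real osparc storage layout).
BUCKET = "my-test-bucket"
KEY = "outputs/large_file.txt"

s3 = boto3.client("s3")

# Storage starts a multipart upload for the new >5GiB output file.
mpu = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)
upload_id = mpu["UploadId"]

# Simulate the mid-flight cancellation (in the bug, this happens when the stale
# file entry from the previous run gets pulled in and the upload is aborted).
s3.abort_multipart_upload(Bucket=BUCKET, Key=KEY, UploadId=upload_id)

# Uploading the next part now fails: the upload ID no longer exists.
try:
    s3.upload_part(
        Bucket=BUCKET,
        Key=KEY,
        PartNumber=1,
        UploadId=upload_id,
        Body=os.urandom(5 * 1024 * 1024),  # 5 MiB part
    )
except ClientError as err:
    print(err.response["Error"]["Code"])  # -> "NoSuchUpload"
```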
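
For option 2, a minimal sketch of the delete-and-restore idea, again with plain `boto3` and made-up bucket/key names (not the actual storage-service implementation): on a versioned bucket, deleting the previous output only adds a delete marker, and the file can be restored later by removing that marker. Versioning has to be enabled on the bucket beforehand (e.g. via `put_bucket_versioning`).

```python
import boto3

# Hypothetical names for the sketch; the real bucket/key layout lives in the storage service.
BUCKET = "my-test-bucket"
KEY = "outputs/large_file.txt"

s3 = boto3.client("s3")


def delete_previous_output() -> None:
    """Called when the computation is (re)started: hide the old output.

    On a versioned bucket this does not destroy data, it only inserts a
    delete marker on top of the previous version.
    """
    s3.delete_object(Bucket=BUCKET, Key=KEY)


def restore_previous_output() -> None:
    """Called if the computation gets aborted: bring the old output back.

    Removing the latest delete marker makes the previous version current again.
    """
    versions = s3.list_object_versions(Bucket=BUCKET, Prefix=KEY)
    for marker in versions.get("DeleteMarkers", []):
        if marker["Key"] == KEY and marker["IsLatest"]:
            s3.delete_object(Bucket=BUCKET, Key=KEY, VersionId=marker["VersionId"])
            break
```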
sanderegg added the bug label on Nov 28, 2023
@sanderegg
Member Author

@jsaq007 @ignapas maybe we could discuss that at some point.

@ignapas
Contributor

ignapas commented Nov 29, 2023

> @jsaq007 @ignapas maybe we could discuss that at some point.

anytime

@jsaq007
Contributor

jsaq007 commented Nov 30, 2023

Sure, let's talk @ignapas and @sanderegg

@sanderegg
Member Author

This was fixed.
