This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

feat!: Multiple Chunk Upload Support #486

Merged 19 commits into main from multi-upload-by-chunk-support on Dec 30, 2021

Conversation

@danielbankhead (Contributor) commented Dec 13, 2021

Adds support for multiple chunk uploads. This required a sizable refactor and dozens of new tests - potentially a breaking change.

@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/gcs-resumable-upload API. label Dec 13, 2021
src/index.ts (outdated review thread, resolved)
@ddelgrosso1 (Contributor)

Just a general comment, I took a look at the newly added code in index.ts and it makes sense to me. It looks like as of now all the existing unit and conformance tests still work. Have you verified that you do indeed see multiple HTTP requests when a chunk size is supplied, i.e. against test bench?

@danielbankhead (Contributor Author)

Working on verification now, along with additional tests. Thanks for the preview review!

@generated-files-bot

Warning: This pull request is touching the following templated files:

@danielbankhead danielbankhead marked this pull request as ready for review December 23, 2021 19:32
@danielbankhead danielbankhead requested review from a team as code owners December 23, 2021 19:32
@danielbankhead danielbankhead requested a review from bcoe December 23, 2021 19:33
@danielbankhead changed the title from "feat: Init multi chunk upload support" to "feat: Multiple Chunk Upload Support" Dec 23, 2021
src/index.ts (outdated):

- request: <T = any>(
-   opts: GaxiosOptions
- ) => Promise<GaxiosResponse<T>> | GaxiosPromise<T>;
+ request: (opts: GaxiosOptions) => Promise<GaxiosResponse> | GaxiosPromise;
Contributor

Was this change made because the type was defined as `any`?

Contributor Author

Yep

Contributor

Would this potentially cause problems for folks who've provided a shim for authClient, like this:

// Use the consumer client (an existing `oauth2Client` instance) to define
// storageOptions and create a GCS object.
const storageOptions = {
  projectId: 'my_project_id',
  authClient: {
    sign: () => Promise.reject('unsupported'),
    getCredentials: () => Promise.reject(),
    request: (opts, callback) => {
      return oauth2Client.request(opts, callback);
    },
    authorizeRequest: async (opts) => {
      opts = opts || {};
      const url = opts.url || opts.uri;
      const headers = await oauth2Client.getRequestHeaders(url);
      opts.headers = Object.assign(opts.headers || {}, headers);
      return opts;
    },
  },
};

Wondering if this changes the function signature for anyone who has attempted doing this in TypeScript?

Contributor Author

Good point - I reverted this change, using a generic <T> (without the `any` default).
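For reference, a minimal sketch of what the restored generic signature allows, using stand-in types (the real GaxiosOptions/GaxiosResponse live in gaxios; everything here is illustrative):

```typescript
// Stand-ins for the gaxios types referenced in the diff (assumed shapes).
interface GaxiosOptions {
  url?: string;
  method?: string;
}
interface GaxiosResponse<T = unknown> {
  data: T;
  status: number;
}

// The restored shape: the generic lives on the method, without an
// `= any` default, so plain-object shims still type-check.
interface AuthClientLike {
  request: <T>(opts: GaxiosOptions) => Promise<GaxiosResponse<T>>;
}

// A consumer-provided shim (like the one quoted above) still satisfies
// the interface, since T is inferred at each call site.
const shim: AuthClientLike = {
  request: async <T>(_opts: GaxiosOptions): Promise<GaxiosResponse<T>> => {
    return {data: undefined as unknown as T, status: 200};
  },
};
```

The key point is that keeping the generic on the method preserves JavaScript shims while still letting TypeScript callers name the response type.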

@ddelgrosso1 (Contributor)

Thanks for adding all of those tests.

src/index.ts (outdated review thread, resolved)
@tritone left a comment:

Couple questions, I'm not a node expert so I'll rely on @ddelgrosso1 to give a more comprehensive review!

Resolved review threads: README.md (outdated), src/index.ts (several, some outdated).
// be fine for the next request.
this.unshiftChunkBuffer(dataToPrependForResending);
this.numBytesWritten -= missingBytes;
this.lastChunkSent = Buffer.alloc(0);

Is the buffer re-allocated for each chunk?

Contributor Author


Okay-- just asking because I know in some languages repeated buffer allocs can be painful. Is there any way to create one buffer and reuse it for each chunk? Or is that not really how it works in node?

Contributor Author

Understandable - it doesn't really work that way in Node.js (at least for Buffer.alloc(0))
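For context on why this is cheap in Node.js: `Buffer.alloc(0)` allocates a zero-byte buffer, so the cost is effectively just dropping the reference to the previous chunk. A sketch of the pattern (hypothetical names, not the library's actual code):

```typescript
// The last chunk is retained only in case a retry must resend bytes;
// once acknowledged, resetting to a zero-length buffer lets the large
// chunk be garbage-collected.
let lastChunkSent: Buffer = Buffer.alloc(0);

function sendChunk(chunk: Buffer): void {
  // ...write `chunk` upstream...
  lastChunkSent = chunk; // keep a reference for a potential resend
}

function onChunkAcknowledged(): void {
  // A zero-byte allocation is effectively free; the point is releasing
  // the reference to the old chunk, not reusing its memory.
  lastChunkSent = Buffer.alloc(0);
}

sendChunk(Buffer.from('example chunk data'));
onChunkAcknowledged();
```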

@ddelgrosso1 (Contributor)

I think overall the refactor looks good. There are a few failing tests in CI.

@danielbankhead (Contributor Author)

@ddelgrosso1 thanks - just fixed some tests for the #isSuccessfulResponse refactor

@ddelgrosso1 (Contributor) left a comment:

Overall I don't have any major concerns, I think the code and the tests look good. I would like some other eyes on it before merging though.

@bcoe (Contributor) left a comment:

A few thoughts:

  1. I would be tempted to call this a breaking change, since you rewrite a lot of the original implementation along with adding the chunk upload support. I think a major bump will help protect you from regressions, because you can make the update to @google-cloud/storage explicit and test thoroughly.
  2. When you do add this to @google-cloud/storage, I would add a few integration tests -- I think you could just generate junk data, and assert that both types of uploading work effectively.
  3. I left a note about a potential refactor, this is non-blocking, but I'd be curious to see if it makes the logic read easier -- the intermingling of single/multi chunk behavior detracts from what to me looks like a great refactor of the library.

Overall, I really like your implementation, feels like a solid cleanup to the library, and you've done a great job of adding tests.

The only blocker for me: I'd like to advocate for the major bump to this library -- curious what others think.


*
* @returns if the request is ok to continue as-is
*/
private async ensureUploadingSameObject() {
Contributor

Is this edge case likely to happen? And what would have happened in the previous implementation of gcs-resumable-upload?

If this is an optimization, it might be nice to do this in follow up work just to simplify this PR.

Contributor Author

This preserves existing behavior - I don't think this is likely to happen (https://github.com/googleapis/gcs-resumable-upload/blob/main/src/index.ts#L577-L596)
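For readers following along, the safeguard being discussed can be sketched like this (simplified and with hypothetical names; the real method caches the bytes in a config store): the first 16 bytes of the current upload are compared against the bytes cached when the session was created, so resuming a session with a different object is caught early.

```typescript
// Returns true when the upload appears to be for the same object that
// originally created the resumable session (or when nothing is cached yet).
function isSameObject(
  cachedFirstChunk: Buffer | undefined,
  chunk: Buffer
): boolean {
  if (!cachedFirstChunk) {
    // Nothing cached yet: this is the first upload for the session.
    return true;
  }
  // Compare only the cached prefix (16 bytes in the diff above).
  return cachedFirstChunk.equals(chunk.subarray(0, 16));
}
```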

assert.equal(data.byteLength, CHUNK_SIZE);
});

it('should prepare a valid request if the remaining data is less than `chunkSize`', async () => {
Contributor

I appreciate how thorough these tests are 🎉

let cachedFirstChunk = this.get('firstChunk');
const firstChunk = chunk.slice(0, 16).valueOf();
// continue uploading next chunk
this.continueUploading();
Contributor

It looks like the logic of continueUploading hasn't changed much. Is the upload endpoint a constant URI, and it's just Content-Range that changes to support multiple chunk uploads?

Contributor Author

Correct - it's mainly Content-Range and supporting N requests per upload (that required a lot of decoupling throughout the class).
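A simplified illustration of that mechanic (a hypothetical helper, not the library's code): every chunk is PUT to the same resumable-session URI, and only the Content-Range header advances from request to request.

```typescript
// Compute the Content-Range header for one chunk of a resumable upload.
// Per the resumable-upload protocol, the total size is written as `*`
// until it is known (typically on the final chunk).
function contentRangeFor(
  offset: number,
  chunkLength: number,
  totalSize?: number
): string {
  const end = offset + chunkLength - 1;
  const total = totalSize === undefined ? '*' : String(totalSize);
  return `bytes ${offset}-${end}/${total}`;
}

// With a 256 KiB chunk size, successive requests to the fixed session
// URI would carry:
//   bytes 0-262143/*             (first chunk, total unknown)
//   bytes 262144-524287/*        (second chunk)
//   bytes 524288-786431/786432   (final chunk, total now known)
```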

resp.status === RESUMABLE_INCOMPLETE_STATUS_CODE &&
resp.headers.range;

if (shouldContinueWithNextMultiChunkRequest) {
Contributor

There are a few places like this, where we have almost completely different logic for chunk upload vs. regular upload, but share a method.

It makes me wonder if the implementation would read slightly cleaner if we had, export class Upload extends Pumpify in index.ts, as it is today, but then introduced an additional class, export class ChunkUpload extends Upload.

It might introduce a bit of repeated logic, but I'm curious if it would make the implementation even more elegant.
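A rough sketch of what that split might look like (hypothetical names and heavily simplified; the real Upload extends Pumpify and carries much more state):

```typescript
// Base class keeps the single-request flow; the subclass overrides only
// the response handling that differs for chunked uploads.
class Upload {
  handleResponse(status: number, rangeHeader?: string): string {
    return status >= 200 && status < 300 ? 'done' : 'error';
  }
}

class ChunkUpload extends Upload {
  handleResponse(status: number, rangeHeader?: string): string {
    // 308 with a Range header is the resumable protocol's
    // "resume incomplete" signal: send the next chunk.
    if (status === 308 && rangeHeader) {
      return 'next-chunk';
    }
    return super.handleResponse(status, rangeHeader);
  }
}
```

The trade-off is exactly the one raised above: some duplicated plumbing, in exchange for each class reading as one upload strategy instead of interleaved branches.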


+1, I think separating out the logic more between the two upload types would definitely make the code easier to read.

Contributor Author

Good point - I figured after the refactor to support multiple requests most of the logic would be the same with the exception of 1 parameter (upstreamIterator in startUploading) and 3 if statements (including this one).

I was hoping shouldContinueWithNextMultiChunkRequest would make it easier to understand what's happening, but that seems to have backfired.

Contributor

As I said, not blocking for me, but if you have the time to experiment with the refactor, I'd be curious what it looks like.

README.md (outdated review thread, resolved)
@ddelgrosso1 (Contributor)

> The only blocker for me: I'd like to advocate for the major bump to this library -- curious what others think.

I am on board with making this a major version bump. This library doesn't see much usage as is, so I don't think we will have to backport many fixes, which, as I learned, can be a painful exercise.

That said, if others object, I want to triple-check that the API surface wasn't changed as a result of this refactor.

@danielbankhead changed the title from "feat: Multiple Chunk Upload Support" to "feat!: Multiple Chunk Upload Support" Dec 30, 2021
@danielbankhead (Contributor Author)

Thanks for the feedback @bcoe, @ddelgrosso1, & @tritone - labeled this as breaking.

@danielbankhead danielbankhead merged commit dba1a39 into main Dec 30, 2021
@danielbankhead danielbankhead deleted the multi-upload-by-chunk-support branch December 30, 2021 15:46
@release-please release-please bot mentioned this pull request Dec 30, 2021