
Support for continuous integration data uploads #396

Merged — 16 commits merged into 1786-optimize-flushing on Dec 16, 2020

Conversation

@austinkelleher (Contributor) commented Dec 16, 2020:

Our previous integration flow had two primary phases:

Phase 1: Collect all of the data
Phase 2: Upload all of the data

These changes improve this drastically by interleaving the two phases. We now queue up graph object uploads in each step and ensure that all of the uploads have completed successfully before triggering dependent steps.
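A rough sketch of the interleaved flow (the names mirror the PR, but the bodies are illustrative stand-ins, not the actual SDK implementation):

```typescript
type GraphObjectData = { entities: string[]; relationships: string[] };

// Hypothetical in-memory uploader: enqueue starts the upload immediately
// and waitUntilUploadsComplete blocks until every upload settles.
function createInMemoryUploader(uploaded: GraphObjectData[]) {
  const pending: Promise<void>[] = [];
  return {
    async enqueue(data: GraphObjectData): Promise<void> {
      // Start the upload right away; collection continues while it runs.
      pending.push(Promise.resolve().then(() => { uploaded.push(data); }));
    },
    async waitUntilUploadsComplete(): Promise<void> {
      await Promise.all(pending);
    },
  };
}

async function executeStep(
  collect: () => GraphObjectData[],
  uploader: ReturnType<typeof createInMemoryUploader>,
): Promise<void> {
  for (const data of collect()) {
    await uploader.enqueue(data); // phases 1 and 2 interleaved per step
  }
  // Dependent steps are only triggered after every upload for this
  // step has settled.
  await uploader.waitUntilUploadsComplete();
}
```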

@ndowmon (Contributor) left a comment:

Another sweet change. Added some comments for your review.

try {
// Failing to upload all integration data should not be considered a
// fatal failure. We just want to mark this step as a partial success
// and move on with our lives!
Contributor:

❤️

@@ -284,6 +297,20 @@ export function executeStepDependencyGraph<
status = StepResultStatus.FAILURE;
}

await context.jobState.flush();
@ndowmon (Contributor) commented Dec 16, 2020:

Interesting. Looks like we now flush after each step. Devil's advocate 😈: do we want to change the implementation of the maps in localGraphDataStore, which track stepId and _key for entities? It seems like we should no longer need to track stepId now.

@austinkelleher (Contributor, Author):

This is confusing... I had slightly changed the behavior in the previous PR and ended up changing it back to the original in this PR. I should have just rolled back the change in the first PR. See here: https://github.com/JupiterOne/sdk/pull/395/files#diff-8912f1fec8d408545c592a8420e4d837119d01694950f32598718ba7abc57d7aL287

Comment on lines +40 to +61
async enqueue(graphObjectData) {
  if (queue.isPaused) {
    // This step already failed an upload. We do not want to enqueue more
    // for this step.
    return;
  }

  // OPTIMIZATION: We do not want to buffer a lot of graph objects
  // into memory inside of the queue. If the queue concurrency has been
  // reached, we wait for the queue to flush so that this step has the
  // opportunity to upload more data.
  if (
    queue.pending >= uploadConcurrency ||
    queue.size >= uploadConcurrency
  ) {
    if (onThrottleEnqueue) {
      // Mainly just used for testing that our custom throttling works.
      onThrottleEnqueue();
    }

    await queue.onIdle();
  }
Contributor:

Should the enqueue function be async (does it have to be)? I'm thinking from the caller's perspective:

    await graphObjectStore.addEntities(stepId, entities, async (entities) =>
      uploader?.enqueue({
        entities,
        relationships: [],
      }),
    );

Maybe the caller should only need to wait for entities to hit the graph object store, and not for these to be published to the persister. 🤷


'Error uploading collected data',
);
}
await uploadGraphObjectData(
Contributor:

I see that uploadCollectedData is still being called from synchronizeCollectedData (which is used in the j1-integration sync command). I haven't looked at the follow-on PRs yet so maybe you will do this, but I think it'd be unsafe to leave this functionality in the off chance that it duplicates uploads to the persister.

@austinkelleher (Contributor, Author):

I think we still want this. I don't personally use the sync command, but the distinction is: run collects and uploads, while sync only uploads. The sync command assumes that you already have data on disk that should now be uploaded.

Contributor:

It seems kind of weird to me that there is any interaction with the persister at all in the open-source SDK as that is J1 specific. Am I wrong with the goal of the SDK?

expect(flushedRelationshipsCollected).toEqual([r1, r2, r3]);
});

test('#flushEntitiesToDisk should call flush callback when buffer threshold reached', async () => {
Contributor:

Seems like copy-paste error? The buffer threshold has not been reached here.

@austinkelleher (Contributor, Author):

Thanks. Will fix.

});
},

async waitUntilUploadsComplete() {
Contributor:

I saw something about queue.isPaused in src/execution/uploader.ts, should that be set somewhere? Here? IDK.

@austinkelleher (Contributor, Author):

Good question, but no it should not be. We should only pause execution of the queue when we see an error. This function rethrows a collection of errors that we saw.
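The pause-on-error behavior described here could be sketched roughly as follows (an illustrative class, not the SDK's actual uploader.ts, which uses p-queue):

```typescript
// Illustrative sketch: the queue is paused ONLY when an upload fails, and
// waitUntilUploadsComplete rethrows the collection of errors we saw.
class SketchUploader {
  private errors: Error[] = [];
  private paused = false;
  private pending: Promise<void>[] = [];

  enqueue(upload: () => Promise<void>): void {
    if (this.paused) {
      // A previous upload for this step failed; drop further work.
      return;
    }
    this.pending.push(
      upload().catch((err: Error) => {
        this.errors.push(err);
        this.paused = true; // pause only when we see an error
      }),
    );
  }

  async waitUntilUploadsComplete(): Promise<void> {
    await Promise.all(this.pending);
    if (this.errors.length > 0) {
      // Rethrow the collected upload errors to the caller.
      throw new Error(
        'Error(s) uploading collected data: ' +
          this.errors.map((e) => e.message).join('; '),
      );
    }
  }
}
```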

),

async waitUntilUploadsComplete() {
await uploader?.waitUntilUploadsComplete();
Contributor:

I may not have read the code correctly, but it does seem that, since uploader.enqueue is async and calls await queue.onIdle(), this will always return right away...

@austinkelleher (Contributor, Author):

I don't think that would be the case. The enqueue method is, in fact, async, but waiting for the enqueued function to actually settle is not async. The only reason why the enqueue function is async is so that we can throttle how many functions we are pushing into our queue. Throttling the enqueue will prevent us from buffering too many entities and relationships into memory while the uploads haven't actually completed.

See: https://github.com/JupiterOne/sdk/pull/396/files#diff-38e59a3db7780e509c55171310fde11a5d109af248e2b859c4f0378ee65e049fR47
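A minimal model of that distinction (assumed p-queue-like behavior, not the SDK source): enqueue is awaited only for backpressure, while the upload itself settles later and is awaited by onIdle / waitUntilUploadsComplete.

```typescript
// Hypothetical throttled queue: enqueue() is async solely so it can wait
// when too many uploads are in flight; it never waits for its own upload.
function createThrottledQueue(uploadConcurrency: number) {
  const inFlight = new Set<Promise<void>>();

  return {
    async enqueue(upload: () => Promise<void>): Promise<void> {
      if (inFlight.size >= uploadConcurrency) {
        // Backpressure: wait for the current batch to drain before
        // buffering more graph objects into memory.
        await Promise.all(inFlight);
      }
      const p = upload().finally(() => {
        inFlight.delete(p);
      });
      inFlight.add(p);
      // Note: we return WITHOUT awaiting p — the upload settles later.
    },
    async onIdle(): Promise<void> {
      while (inFlight.size > 0) {
        await Promise.all(inFlight);
      }
    },
  };
}
```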

@@ -622,6 +630,156 @@ describe('executeStepDependencyGraph', () => {
expect(spyB).toHaveBeenCalledBefore(spyC);
});

test('should mark steps failed executionHandlers with status FAILURE a dependent steps with status PARTIAL_SUCCESS_DUE_TO_DEPENDENCY_FAILURE when step upload fails', async () => {
Contributor:

I think there are typos here? Hard to understand.

Suggested change
test('should mark steps failed executionHandlers with status FAILURE a dependent steps with status PARTIAL_SUCCESS_DUE_TO_DEPENDENCY_FAILURE when step upload fails', async () => {
test('should mark steps with failed executionHandlers with status FAILURE and dependent steps with status PARTIAL_SUCCESS_DUE_TO_DEPENDENCY_FAILURE when step upload fails', async () => {

Contributor:

I guess this also doesn't seem like a test that's relevant to this PR. The scope seems to be about dependencies here, although the functionality is about creating a failing uploader. Consider trimming this, breaking it apart, or making the test name clearer.

@austinkelleher (Contributor, Author):

Will fix the grammar. Thanks for the suggestion. The test is relevant to the overall changes though. The test validates that our uploads respect the existing behavior of our internal dependency graph.

Comment on lines +57 to +61
await sleep(100);
uploaded.push(d);
} else {
numQueued++;
await sleep(200);
Contributor:

What's the purpose of these sleep()s? It's not very clear to me and makes the test a bit more confusing.

@austinkelleher (Contributor, Author):

Good question. Will add a comment for clarity. The sleeps validate that the promises are indeed executed concurrently instead of in serial. We enqueue a bunch of uploads and, as long as we haven't reached our concurrency limit, the promises will all be run concurrently.
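A simplified illustration of that technique (assumed shape, not the actual test file): if the uploads ran serially, total wall time would be roughly n × sleep duration; run concurrently, it stays near a single sleep duration.

```typescript
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Kick off n sleeping "uploads" at once and measure total wall time.
// Concurrent execution keeps elapsed time close to `ms`, not `n * ms`.
async function timeConcurrentUploads(n: number, ms: number): Promise<number> {
  const start = Date.now();
  await Promise.all(Array.from({ length: n }, () => sleep(ms)));
  return Date.now() - start;
}
```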

n: number,
) {
const flushed = await createAndEnqueueUploads(uploader, n);
await uploader.waitUntilUploadsComplete();
Contributor:

I'm not convinced that all data has been uploaded at this point in the process. Can you add a test to prove that un-uploaded data can exist in the upload queue and that waitUntilUploadsComplete does indeed properly wait for uploads to complete?

Alternatively, explain it to me like I'm a 5 year old.

@austinkelleher (Contributor, Author):

Perhaps I'm not following, but the function is indeed supposed to wait until uploads are complete. It's used as a utility function throughout these tests, so that we can just assert that we've collected the correct data in the correct order.

Contributor:

From what I can tell, the queue will never be that big, but there is a small amount of time when the final concurrent requests need to drain from the queue. I believe waitUntilUploadsComplete will wait for those final requests to resolve.

@austinkelleher austinkelleher force-pushed the 1765-continuous-uploads branch from 9426a2a to b05beb8 Compare December 16, 2020 15:59
Only write prettified files to the file system on local collection
Share graph object creation test utils across tests and cleanup
@austinkelleher austinkelleher merged commit b2b44c3 into 1786-optimize-flushing Dec 16, 2020
@austinkelleher austinkelleher deleted the 1765-continuous-uploads branch December 16, 2020 18:28
onThrottleEnqueue();
}

await queue.onIdle();
@mknoedel (Contributor) commented Dec 17, 2020:

We might not want to wait for onIdle here, as I think it loses some concurrency benefits. I think what this means is we won't start on the next batch of uploads until every upload from the last group is finished. We should likely wait on onEmpty instead, and rather than checking that queue.size and queue.pending have both reached the limit, just check queue.size. This way the maximum number of concurrent calls are being made at all times.
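The distinction can be modeled with p-queue-like semantics. The MiniQueue class below is an illustrative stand-in (not p-queue itself): onEmpty resolves as soon as the backlog drains, while up to `concurrency` uploads may still be pending, whereas onIdle waits for everything to finish.

```typescript
// Minimal p-queue-like model to show the onEmpty vs onIdle difference.
class MiniQueue {
  size = 0;    // tasks queued, not yet started
  pending = 0; // tasks currently running
  private backlog: Array<() => Promise<void>> = [];
  private waiters: Array<{ idle: boolean; resolve: () => void }> = [];

  constructor(private concurrency: number) {}

  add(task: () => Promise<void>): void {
    this.backlog.push(task);
    this.size++;
    this.pump();
  }

  onEmpty(): Promise<void> {
    // Resolves once the backlog drains, even while workers are pending.
    return new Promise((resolve) => {
      this.waiters.push({ idle: false, resolve });
      this.notify();
    });
  }

  onIdle(): Promise<void> {
    // Resolves only when the backlog drains AND every worker finishes.
    return new Promise((resolve) => {
      this.waiters.push({ idle: true, resolve });
      this.notify();
    });
  }

  private pump(): void {
    while (this.pending < this.concurrency && this.backlog.length > 0) {
      const task = this.backlog.shift()!;
      this.size--;
      this.pending++;
      task().finally(() => {
        this.pending--;
        this.pump();
      });
    }
    this.notify();
  }

  private notify(): void {
    this.waiters = this.waiters.filter((w) => {
      const done = w.idle
        ? this.size === 0 && this.pending === 0
        : this.size === 0;
      if (done) w.resolve();
      return !done;
    });
  }
}
```

Waiting on onEmpty keeps the worker pool full: new tasks can be admitted as soon as there is backlog room, instead of stalling until the whole batch drains.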

} from '@jupiterone/integration-sdk-runtime';

import { loadConfig } from '../config';
import * as log from '../log';
import { createPersisterApiStepGraphObjectDataUploader } from '@jupiterone/integration-sdk-runtime/dist/src/execution/uploader';
Contributor:

Why does this have to be this way, digging into the dist/ directory?
