Provide a configurable batch size to reduce memory usage #36
fixes #25
Looks like there are a couple of reports in #25 of Stitch running out of memory while running this tap, causing it to fail with a `-9` exit code. This PR adds a configurable `batch_size` parameter which controls the number of records fetched per request for each stream. When the batch size is set to a number like 100 or 1,000, this tap uses considerably less memory than the default of 2,500 records per batch.

I tried really hard to make this work natively with the FuelSDK, but I encountered a couple of problems:
- The `getMoreResults` method in the FuelSDK ignores the `BatchSize` option provided on the cursor object.
- The `BatchSize` option ends up being passed as the `m_filter` argument instead of `m_options`.
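For illustration, the new parameter would sit alongside the tap's existing config keys, something like this (the keys other than `batch_size` are assumptions for the sketch, not part of this PR):

```json
{
  "client_id": "...",
  "client_secret": "...",
  "batch_size": 100
}
```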
To make this work, I instead pulled the `getMoreResults` implementation out of the FuelSDK module and into this tap so that I could change the definition of `ET_Continue`. In the `ET_Continue` constructor, I added a line of code that applies the configured batch size.

All told, this correctly sets the batch size for requests to the Salesforce Marketing Cloud / ExactTarget API. I verified this with a new log line that reports the number of rows synced in each batch.
As-is, this PR won't actually fix the memory issues we're seeing in Stitch. I preserved all of the old defaults, so this tap should work exactly the way it did prior to this PR. My hope is that Stitch can either expose the `batch_size` config in the UI, or set it to some lower value (like 1,000 or 100) as required on their servers.

Last, I did some quick profiling to figure out what kind of impact this change had on memory usage for the tap. The default batch size (2,500 rows) peaks around 1 GB of memory, which I assume is the limit that Stitch has in place for taps? The lower batch sizes use considerably less memory, but will require more API calls. This is non-scientific, but it helped confirm for me that changing the batch size reduces memory usage.
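For anyone who wants to reproduce the same kind of non-scientific check, here is a minimal sketch using the standard-library `tracemalloc` module. The row shape and sync loop are made up for illustration; only the batch sizes mirror the numbers discussed above.

```python
import tracemalloc

def peak_bytes(make_row, batch_size, total):
    """Measure the Python-heap peak while consuming `total` rows in
    batches of `batch_size` (a hypothetical stand-in for the tap's
    per-stream sync loop, not the actual tap code)."""
    tracemalloc.start()
    for offset in range(0, total, batch_size):
        batch = [make_row(i) for i in range(offset, min(offset + batch_size, total))]
        # ... a real tap would emit each record to stdout here ...
        del batch
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

row = lambda i: {"id": i, "payload": "x" * 100}  # fake record
big = peak_bytes(row, batch_size=2500, total=10_000)
small = peak_bytes(row, batch_size=100, total=10_000)
print(small < big)  # smaller batches peak lower
```

Because only one batch is alive at a time, peak memory scales with the batch size rather than with the total number of records.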
Please let me know if you have any questions - we're keen to get this tap running successfully in Stitch!