
Provide a configurable batch size to reduce memory usage #36

Merged · 2 commits into singer-io:master · Aug 21, 2019

Conversation

drewbanin (Contributor) commented:

fixes #25

Looks like there are a couple of reports in #25 of Stitch running out of memory while running this tap, causing it to fail with a -9 exit code. This PR adds a configurable batch_size parameter that controls the number of records fetched per request for each stream. When the batch size is set to a smaller value like 100 or 1,000, this tap uses considerably less memory than it does with the default of 2,500 records per batch.
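For context, a Singer tap reads its settings from a JSON config file, so the new knob is just one extra key. A minimal sketch (the credential field names here are illustrative, not the tap's exact config schema; only batch_size is the setting this PR adds):

```json
{
  "client_id": "<marketing-cloud-client-id>",
  "client_secret": "<marketing-cloud-client-secret>",
  "start_date": "2019-01-01T00:00:00Z",
  "batch_size": 500
}
```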

I tried really hard to make this work natively with the FuelSDK, but I encountered a couple of problems:

  1. The getMoreResults method in the FuelSDK ignores the BatchSize option provided on the cursor object.
  2. There are serious logic errors in the FuelSDK codebase, like one place that incorrectly checks the m_filter argument instead of m_options.
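The second problem is a classic wrong-variable guard. A simplified sketch of the bug class described above (the function and dict shapes are illustrative, not the FuelSDK source; only the m_filter/m_options names follow its convention):

```python
def build_request(m_filter, m_options):
    """Buggy version: options are silently dropped when no filter is set."""
    request = {}
    if m_filter is not None:
        request["Filter"] = m_filter
    # BUG: tests m_filter instead of m_options, so Options (e.g. BatchSize)
    # are only attached when a filter happens to be present.
    if m_filter is not None and m_options is not None:
        request["Options"] = m_options
    return request


def build_request_fixed(m_filter, m_options):
    """Fixed version: each argument is checked independently."""
    request = {}
    if m_filter is not None:
        request["Filter"] = m_filter
    if m_options is not None:  # fixed: test the options argument itself
        request["Options"] = m_options
    return request
```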

To make this work, I instead pulled the getMoreResults implementation out of the FuelSDK module and into this tap so that I could change the definition of ET_Continue. In the ET_Continue constructor, I added this line of code:

        ws_continueRequest.Options.BatchSize = batch_size

All told, this correctly sets the batch size for requests to the Salesforce Marketing Cloud / Exact Target API. I verified this with a new log line that reports the number of rows synced in each batch.
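The approach above can be sketched as follows. This is a minimal stand-in, not the PR's actual code: the ContinueRequest/Options classes imitate the shape of the FuelSDK's SOAP objects, and only the Options.BatchSize assignment mirrors the line quoted above:

```python
class Options:
    """Stand-in for the SOAP Options object on a retrieve request."""
    def __init__(self):
        self.BatchSize = None


class ContinueRequest:
    """Stand-in for the SOAP continue-request object."""
    def __init__(self):
        self.Options = Options()
        self.ContinueRequest = None  # request ID of the batch to resume


class ET_Continue:
    """Vendored getMoreResults-style continuation that honors batch_size."""
    def __init__(self, auth_stub, request_id, batch_size):
        ws_continueRequest = ContinueRequest()
        ws_continueRequest.ContinueRequest = request_id
        # The key change: propagate the configured batch size onto the
        # continue request so the API returns at most batch_size rows.
        ws_continueRequest.Options.BatchSize = batch_size
        self.request = ws_continueRequest
```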

As-is, this PR won't actually fix the memory issues we're seeing in Stitch. I preserved all of the old defaults, so this tap should work exactly as it did prior to this PR. My hope is that Stitch can either expose the batch_size config in the UI, or set it to some lower value (like 1,000 or 100) as required on their servers.

Lastly, I did some quick profiling to figure out what impact this change has on the tap's memory usage. The default batch size (2,500 rows) peaks at around 1 GB of memory, which I assume is the limit Stitch has in place for taps? The lower batch sizes use considerably less memory, but require more API calls. This is non-scientific, but it helped confirm for me that reducing the batch size reduces memory usage:

[Screenshot: memory-usage profile at different batch sizes, 2019-08-07]
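The profiling above was informal. One quick way to reproduce the comparison yourself is to wrap a batch-materialization step with the standard library's tracemalloc and compare peak allocations at different batch sizes (the record shape below is made up for illustration):

```python
import tracemalloc


def peak_memory_for_batch(batch_size):
    """Return peak bytes allocated while holding one batch of records."""
    tracemalloc.start()
    # Stand-in for one batch of API rows held in memory before being emitted.
    batch = [{"id": i, "payload": "x" * 200} for i in range(batch_size)]
    peak = tracemalloc.get_traced_memory()[1]  # (current, peak) -> peak
    tracemalloc.stop()
    del batch
    return peak


# Smaller batches should peak well below the 2,500-row default.
small = peak_memory_for_batch(100)
default = peak_memory_for_batch(2500)
```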

Please let me know if you have any questions - we're keen to get this tap running successfully in Stitch!

KAllan357 (Contributor) commented:

👍 thanks

KAllan357 merged commit 22a9b9e into singer-io:master on Aug 21, 2019.
Linked issue closed by this PR: Tap failed with code -9 (#25)