Memory leak in fetch #675

Closed
dsanghan opened this issue Nov 15, 2019 · 9 comments

@dsanghan

We're using exchangelib for a large number of Exchange accounts. Last month we updated our fork to the latest master and started seeing a massive spike in virtual memory consumption (several GB over a few hours). Profiling isolated it to fetch. We wrote a small test case to verify that this is indeed the case and profiled it with memory_profiler using the latest release of exchangelib. The leak grows in line with the number of emails fetched.

With 10 emails:

Line #    Mem usage    Increment   Line Contents
================================================
    20     44.6 MiB     44.6 MiB   @profile
    21                             def run_test():
    22     44.6 MiB      0.0 MiB       config = Configuration(service_endpoint='https://outlook.office365.com/ews/exchange.asmx', credentials=credentials)
    23     48.2 MiB      3.6 MiB       exchange = Account(primary_smtp_address=user, config=config, autodiscover=False, access_type=DELEGATE)
    24     48.2 MiB      0.0 MiB       ids = []
    25     48.4 MiB      0.2 MiB       for saved in exchange.inbox.all().order_by('-datetime_received').only('id', 'changekey')[:10]:
    26     48.4 MiB      0.0 MiB           ids.append((saved.id, saved.changekey))
    27     51.2 MiB      2.8 MiB       msgs = list(exchange.fetch(ids, only_fields=['mime_content']))
    28     50.9 MiB      0.0 MiB       del msgs, ids

With 20 emails:

Line #    Mem usage    Increment   Line Contents
================================================
    20     44.6 MiB     44.6 MiB   @profile
    21                             def run_test():
    22     44.6 MiB      0.0 MiB       config = Configuration(service_endpoint='https://outlook.office365.com/ews/exchange.asmx', credentials=credentials)
    23     49.0 MiB      4.3 MiB       exchange = Account(primary_smtp_address=user, config=config, autodiscover=False, access_type=DELEGATE)
    24     49.0 MiB      0.0 MiB       ids = []
    25     49.1 MiB      0.1 MiB       for saved in exchange.inbox.all().order_by('-datetime_received').only('id', 'changekey')[:20]:
    26     49.1 MiB      0.0 MiB           ids.append((saved.id, saved.changekey))
    27     69.5 MiB     20.4 MiB       msgs = list(exchange.fetch(ids, only_fields=['mime_content']))
    28     69.5 MiB      0.0 MiB       del msgs, ids

To be clear, the spike is in virtual memory and not committed memory, but once you hit something like ~20GB in virtual memory, the scripts crash. Any suggestions?
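
For completeness, this is roughly the test script behind the numbers above (the user and credentials are placeholders):

```python
# Roughly the test script behind the profiles above, run under memory_profiler
# (python -m memory_profiler repro.py). USER and PASSWORD are placeholders.
from memory_profiler import profile
from exchangelib import Account, Configuration, Credentials, DELEGATE

user = 'someone@example.com'                 # placeholder
credentials = Credentials(user, 'PASSWORD')  # placeholder

@profile
def run_test():
    config = Configuration(
        service_endpoint='https://outlook.office365.com/ews/exchange.asmx',
        credentials=credentials)
    exchange = Account(primary_smtp_address=user, config=config,
                       autodiscover=False, access_type=DELEGATE)
    ids = []
    for saved in exchange.inbox.all().order_by('-datetime_received').only('id', 'changekey')[:10]:
        ids.append((saved.id, saved.changekey))
    msgs = list(exchange.fetch(ids, only_fields=['mime_content']))
    del msgs, ids

if __name__ == '__main__':
    run_test()
```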

@dsanghan
Author

Fetching a single email with a 10 MB attachment makes the problem worse:

Line #    Mem usage    Increment   Line Contents
================================================
    21     44.6 MiB     44.6 MiB   @profile
    22                             def run_test():
    23     44.6 MiB      0.0 MiB       config = Configuration(service_endpoint='https://outlook.office365.com/ews/exchange.asmx', credentials=credentials)
    24     48.5 MiB      3.8 MiB       exchange = Account(primary_smtp_address=user, config=config, autodiscover=False, access_type=DELEGATE)
    25     48.5 MiB      0.0 MiB       ids = []
    26     48.6 MiB      0.1 MiB       for saved in exchange.inbox.all().order_by('-datetime_received').only('id', 'changekey')[:1]:
    27     48.6 MiB      0.0 MiB           ids.append((saved.id, saved.changekey))
    28    125.9 MiB     77.3 MiB       msgs = list(exchange.fetch(ids, only_fields=['mime_content']))
    29    125.9 MiB      0.0 MiB       del msgs, ids

@ecederstrand
Owner

ecederstrand commented Nov 15, 2019

Hmm. I don't have any concrete ideas, but of course this needs to be fixed.

It could be caused by some global variable in the exchangelib package ballooning, but I'm hard pressed to see which one it could be.

It would be nice to see how memory is distributed in the exchange object, but I don't know of a Python package that can do that visualization.
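
If it helps, pympler can at least report recursive sizes and a per-attribute breakdown, even if it doesn't draw a picture. A sketch, assuming the `exchange` Account object from the test script above:

```python
# Sketch: use pympler to report recursive sizes of the Account object instead
# of a visualization. Assumes `exchange` is the Account from the test script.
from pympler import asizeof

print(asizeof.asizeof(exchange))                    # total bytes reachable from the object
print(asizeof.asized(exchange, detail=1).format())  # one level of per-attribute breakdown
```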

@dsanghan
Author

Some further insight:

   604     49.9 MiB      0.0 MiB           ret = []
   605     49.9 MiB      0.0 MiB           for i in self._consume_item_service(service_cls=GetItem, items=ids, chunk_size=chunk_size, kwargs=dict(
   606     49.9 MiB      0.0 MiB                   additional_fields=additional_fields,
   607     99.9 MiB     40.7 MiB                   shape=ID_ONLY,
   608                                     )):
   609     90.6 MiB      0.0 MiB               if isinstance(i, Exception):
   610                                             ret.append(i)
   611                                         else:
   612     99.9 MiB      9.3 MiB                   item = validation_folder.item_model_from_tag(i.tag).from_xml(elem=i, account=self)
   613     99.9 MiB      0.0 MiB                   ret.append(item)
   614     99.9 MiB      0.0 MiB           return ret

inside _consume_item_service:

   314     90.6 MiB     40.7 MiB           for i in service_cls(account=self, chunk_size=chunk_size).call(**kwargs):
   315     90.6 MiB      0.0 MiB               ret.append(i)

inside get_item:

    23     49.9 MiB      0.0 MiB           return self._pool_requests(payload_func=self.get_payload, **dict(
    24     49.9 MiB      0.0 MiB               items=items,
    25     49.9 MiB      0.0 MiB               additional_fields=additional_fields,
    26     90.6 MiB     40.7 MiB               shape=shape,
    27                                     ))

inside the call to post_ratelimited:

for api_version in api_versions:
   131     50.5 MiB      0.0 MiB               log.debug('Trying API version %s for account %s', api_version, account)
   132     50.5 MiB      0.0 MiB               r, session = post_ratelimited(
   133     50.5 MiB      0.0 MiB                   protocol=self.protocol,
   134     50.5 MiB      0.0 MiB                   session=self.protocol.get_session(),
   135     50.5 MiB      0.0 MiB                   url=self.protocol.service_endpoint,
   136     50.5 MiB      0.0 MiB                   headers=extra_headers(account=account),
   137     50.5 MiB      0.0 MiB                   data=wrap(content=payload, version=api_version, account=account),
   138     50.5 MiB      0.0 MiB                   allow_redirects=False,
   139     81.1 MiB     30.6 MiB                   stream=self.streaming,
   140                                         )
   141     81.1 MiB      0.0 MiB               if self.streaming:
   142                                             # Let 'requests' decode raw data automatically
   143                                             r.raw.decode_content = True
   144                                         else:
   145                                             # If we're streaming, we want to wait to release the session until we have consumed the stream.
   146     81.1 MiB      0.0 MiB                   self.protocol.release_session(session)
   147     81.1 MiB      0.0 MiB               try:
   148     90.8 MiB      9.6 MiB                   header, body = self._get_soap_parts(response=r, **parse_opts)
   149                                         except ParseError as e:
   150                                             raise SOAPError('Bad SOAP response: %s' % e)

Maybe requests has a memory leak?

psf/requests#4601
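
The linked issue is about stream=True responses that are never fully consumed or closed. A generic illustration of that pattern, not exchangelib code (the endpoint and payload are stand-ins):

```python
# Generic illustration of the pattern in the linked requests issue: with
# stream=True, the response must be fully consumed or closed, otherwise the
# connection and its buffers stay alive. The URL and payload are stand-ins.
import requests

url = 'https://example.com/ews/exchange.asmx'   # stand-in endpoint
payload = b'<soap/>'                            # stand-in request body

r = requests.post(url, data=payload, stream=True)
try:
    for chunk in r.iter_content(chunk_size=65536):
        pass  # consume the stream (exchangelib parses it instead)
finally:
    r.close()  # or use `with requests.post(...) as r:` to get this for free
```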

@ecederstrand
Owner

That's a possible explanation. A lot is happening in post_ratelimited(), so it could also be an issue buried in there.

@ecederstrand
Owner

@dsanghan Did you get anywhere with this issue? I haven't had time to debug this yet.

@dsanghan
Author

@ecederstrand I tried replacing requests with urllib3 but didn't see any improvement in memory. For now, I've replaced fetching mime_content with fetching just the text content and disabled certain features. It's still leaking, but we're seeing smaller spikes and lower virtual memory consumption, and the micro-service is able to run for a few days before restarting from memory pressure.

Another thing I noticed is that ElementTree seems to use a lot of memory when parsing large SOAP responses, so that might be worth exploring.
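
For comparison, the usual way to keep ElementTree's footprint down on large documents is incremental parsing with element clearing. A generic sketch of the technique, not exchangelib's actual parsing code:

```python
# Generic sketch of incremental XML parsing with ElementTree: handle each
# element as it completes and clear it so the in-memory tree stays small.
# This is not exchangelib's parser, just an illustration of the technique.
import xml.etree.ElementTree as ET

def iter_elements(source, tag):
    for event, elem in ET.iterparse(source, events=('end',)):
        if elem.tag == tag:
            yield elem
            elem.clear()  # drop the element's children once it's been handled
```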

@dsanghan
Author

@ecederstrand Figured it out - it's the thread pool that's leaking. In a server environment running thousands of accounts, the pool creates sessions_per_account * threads_per_session = 16 new threads per account, leaking roughly 150 MB every few minutes as new accounts spool up. Removing the thread_pool from _pool_requests, along with all its references, solves the massive leak (~30 GB) and brings usage down to a few hundred MB in total.

The thousands of threads create their own contexts that don't show up in memory_profiler, heapy, or mprof, so it was fairly hard to debug.

A simple way to reproduce the issue is to use the example above; a single fetch ends up creating 20 threads. Otherwise, I didn't find any major leaks - almost all objects are properly reclaimed, apart from a few cached properties. It might be worth adding an option to not use thread pools.
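
A minimal way to see the effect outside exchangelib (the pool size of 16 mirrors the sessions_per_account * threads_per_session figure above; everything else is a stand-in):

```python
# Minimal stand-in for the effect: a ThreadPool created per account and never
# closed keeps all of its worker threads alive. Each live thread also reserves
# stack address space, which shows up as virtual rather than committed memory.
import threading
from multiprocessing.pool import ThreadPool

pools = []
for account in range(100):                    # stand-in for accounts spooling up
    pool = ThreadPool(16)
    pool.map(lambda x: x, range(16))          # trivial work
    pools.append(pool)                        # no pool.close()/pool.join()
    print(account, threading.active_count())  # thread count keeps climbing
```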

@ecederstrand
Owner

Great work, @dsanghan!

I'm reopening since this will affect a lot of users. We need to find a solution for this problem in exchangelib.

@ecederstrand ecederstrand reopened this Feb 13, 2020
@ecederstrand
Owner

ecederstrand commented Feb 13, 2020

The ThreadPool memory leak may actually have been fixed in Python 3.8.0: python/cpython@3766f18
