Mmap model shared between process , memory use is not stable #2883
Comments
As this is likely some behavior specific to your use, and not necessarily a bug in gensim, it'd be better to discuss on the discussion list, unless/until some bug/feature-request emerges. But, a few thoughts:

Another lead is just plain memory fragmentation from Python. The large numpy array will only live in RAM once (assuming nothing went wrong with […]). So even if you managed to fork processes after loading the object, so that you have just one […]

TL;DR: Make sure you're actually mmapping, so the big array is in RAM just once. Then check where the extra memory is creeping in from. You can use my smaps.py script, for example.
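The TL;DR above (verify the big array really is memory-mapped, so it lives in RAM just once) can be sanity-checked outside of gensim with plain numpy. This is an illustrative sketch of the mechanism, not gensim's loading code; file names are made up:

```python
import os
import tempfile
import numpy as np

# Write a largish float32 array to disk, then load it memory-mapped.
path = os.path.join(tempfile.mkdtemp(), "vectors.npy")
np.save(path, np.random.rand(1000, 100).astype(np.float32))

vectors = np.load(path, mmap_mode="r")

# If the load is really memory-mapped, numpy returns an np.memmap,
# whose pages are file-backed and can be shared read-only across forks.
print(type(vectors).__name__)
print(vectors.shape)
```

Gensim's `KeyedVectors.load(path, mmap='r')` uses the same `mmap_mode` machinery under the hood for the large vector arrays, which is why a plain `load` without `mmap='r'` duplicates the array in every worker.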
Hi, thanks for your answers. Yes, the KeyedVectors object is loaded before the fork. I tried to use gc.freeze() after the preload in gunicorn but it's not better.

```
[2020-09-10 09:08:28,409] INFO in serve: Starting serving predict
[2020-09-10 09:08:31,831] INFO in utils: loading Word2VecKeyedVectors object from /opt/ml/model/keyedvector
[2020-09-10 09:08:31,959] INFO in utils: setting ignored attribute vectors_norm to None
[2020-09-10 09:08:31,959] INFO in utils: loaded /opt/ml/model/keyedvector
[2020-09-10 09:08:31,959] INFO in keyedvectors: precomputing L2-norms of word weight vectors
[2020-09-10 09:08:32,058] INFO in predictor: Keyed Vectors loaded.
[2020-09-10 09:08:32,059] INFO in predictor: Loading Annoy Index...
[2020-09-10 09:08:32,059] INFO in patch_annoy: Custom Annoy loader : prefault=True
[2020-09-10 09:08:32,063] INFO in predictor: Annoy Index loaded.
[2020-09-10 09:08:32,105] INFO in predictor: Warm index
[2020-09-10 09:08:34 +0000] [46] [INFO] Starting gunicorn 20.0.4
[2020-09-10 09:08:34 +0000] [46] [INFO] Listening at: unix:/tmp/gunicorn.sock (46)
[2020-09-10 09:08:34 +0000] [46] [INFO] Using worker: sync
Objects frozen in perm gen: 165434
[2020-09-10 09:08:34 +0000] [56] [INFO] Booting worker with pid: 56
[2020-09-10 09:08:34 +0000] [57] [INFO] Booting worker with pid: 57
[2020-09-10 09:08:34 +0000] [58] [INFO] Booting worker with pid: 58
[2020-09-10 09:08:34 +0000] [59] [INFO] Booting worker with pid: 59
```
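For reference, the gc.freeze() trick mentioned here (moving everything created during preload into the GC's permanent generation, so the cycle collector never scans those objects and dirties their copy-on-write pages in forked workers) follows a standard pattern. This is a generic sketch, not the poster's actual gunicorn hook; `preload_app` is a hypothetical stand-in for loading the model in the master process:

```python
import gc

def preload_app():
    """Hypothetical stand-in for loading the model in the gunicorn master."""
    return {"weights": list(range(10000))}

model = preload_app()

# Disable automatic collection, collect once to clear garbage, then
# freeze: surviving objects move to the permanent generation and are
# never scanned again, so forked workers won't dirty their pages.
gc.disable()
gc.collect()
gc.freeze()
print("Objects frozen in perm gen:", gc.get_freeze_count())

# ... gunicorn would fork workers here; each worker re-enables the GC:
gc.enable()
```

Note that gc.freeze() only prevents the collector from touching those pages; reference-count updates from ordinary attribute access can still dirty shared pages, which is one reason preload-plus-fork sharing is never perfect in CPython.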
As noted in my response, you might get more-assured mmap-sharing by loading the KeyedVectors into each separate process. It's hard to understand your logging without more of your full code: what code is loading the vectors & index that's not wholly before the […]? The way the increase happens after loading, gradually over use, means it could easily be other code (including your unshown "monkey patch" to the […]).
@gojomo again, big thanks for your help. So I tried without the […]. Sorry for my wrong interpretations.
Problem description
I'm exposing a word2vec model with Gunicorn sync workers (and I run this gunicorn process inside one Docker container).

With 1 worker there is no problem. But with multiple workers (forked from the main gunicorn process, with preload activated):

mmap works well at startup; the memory use is almost the same as with only one process. But when I start to query the API (each call executes `indexer.model.wv.most_similar`) with a load-test program, the memory usage grows until it stabilizes. This is with 8 sync workers (memory is growing):

With only one worker the lines stay perfectly flat:
Steps/code/corpus to reproduce
Load a word2vec gensim model with mmap activated, fork N times, then query the model in each subprocess.
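These reproduction steps can be sketched without gensim, using a numpy memmap in place of the model's vector array (gensim's `KeyedVectors.load(..., mmap='r')` plays the role of `np.load(..., mmap_mode='r')` here, and the dot-product query stands in for `most_similar`; all file names are made up):

```python
import os
import tempfile
import numpy as np

# Build a unit-normalized "word vector" matrix and save it to disk.
path = os.path.join(tempfile.mkdtemp(), "wv.npy")
mat = np.random.standard_normal((500, 50)).astype(np.float32)
mat /= np.linalg.norm(mat, axis=1, keepdims=True)
np.save(path, mat)

# Load memory-mapped in the parent, then fork: the file-backed pages
# are shared read-only between parent and child.
vectors = np.load(path, mmap_mode="r")

pid = os.fork()
if pid == 0:
    # Child process: a crude cosine "most similar" query. Row 0 must
    # be most similar to itself, since all rows are unit-normalized.
    scores = vectors @ np.asarray(vectors[0])
    os._exit(0 if int(np.argmax(scores)) == 0 else 1)

_, status = os.waitpid(pid, 0)
print("child exit status:", os.waitstatus_to_exitcode(status))
```

Watching RSS of parent and child while the child queries (e.g. via /proc/PID/smaps) shows how much of the matrix is resident as shared file-backed pages versus private anonymous memory.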
The problem has already been raised on Stack Overflow, but no solution was proposed:
https://stackoverflow.com/questions/51616074/sharing-memory-for-gensims-keyedvectors-objects-between-docker-containers/51722607#51722607
What I tried:

This is the loading code of the model:
Also, the message I get when I load the previously normalized (`init_sims(replace=True)`) model is: `setting ignored attribute vectors_norm to None`
Versions