Move specific import, speedup importing time. Fix #1614. #1615

menshikh-iv · 2017-10-06T04:02:21Z

Before:

ivan@P50:~/release/gensim$ time python -c "import gensim"
Using TensorFlow backend.

real	0m1,008s
user	0m0,924s
sys	0m0,556s

After:

ivan@P50:~/release/gensim$ time python -c "import gensim"

real	0m0,299s
user	0m0,332s
sys	0m0,400s

gojomo · 2017-10-06T05:17:00Z

I don't think the original complaint was just a 0.75s delay, so doesn't seem like the numbers here show the real loading cost.

I believe the best fix would move the Keras-specific convenience method elsewhere entirely. (Also, while I believe the effective re-import every time get_embedding_layer() is called is 'mostly harmless', it doesn't seem ideal to me.)

menshikh-iv · 2017-10-06T05:21:27Z

@gojomo unfortunately, if we moved this method to another place - we will lose this function. For this reason, I think we shouldn't move this method to another place.
FYI - "real import" will be run only once, with the first call of get_embedding_layer(), next call will be used already imported keras (re-import will not happen).

gojomo · 2017-10-06T05:34:51Z

Why would moving it elsewhere lose the functionality?

In my opinion it'd be fine to even just define inline in the two files that use it (one test file, one notebook). It's a 1-liner – really just a way to remind people how syn0 can be supplied to the keras Embedding constructor.

menshikh-iv · 2017-10-06T05:41:27Z

@gojomo

Why would moving it elsewhere lose the functionality?

Because it becomes difficult to find it -> nobody will use this (for example, we move this to gensim.utls)

Or your propose removing this method, replace get_embeding_layer() everywhere to direct call (in tests and notebook), like

    layer = Embedding(
        input_dim=kv.syn0.shape[0], output_dim=kv.syn0.shape[1],
        weights=[kv.syn0], trainable=False
    )

gojomo · 2017-10-06T07:36:02Z

If it's in some specific keras_utils-like module, then exactly the Keras users who need it will find it – guided there by demo code like that in the notebook.

Or, it just gets inlined or defined where needed. Or is just a documentation FAQ answer somewhere, to the question "How do I convert my KeyedVectors word-vectors into a Keras Embedding?"

It just seems awkward to me for KeyedVectors to have this trivial 1-liner that's strictly a keras-ism. It'd be better, for both modularity and avoiding the import-of-something-that-might-not-be-there-tap-dance, for keras-specific knowledge of how to turn a raw array like syn0 into the keras type Embedding to be somewhere else.

(And even if it does stay here, its name should probably include keras – say, get_keras_embedding() – to reflect its specificity here, not returning a more generic or local idea of an 'embedding'. )

menshikh-iv · 2017-10-06T07:41:14Z

Distinct submodule for one function - bad idea (especially considering that I'm currently doing refactoring and simplifying gensim).
About the name - I completely agree, this is missing information that must figured in a method name.

gojomo · 2017-10-06T18:20:22Z

My thinking would be: if keras users are important enough to add in special supporting methods, of which this could just be the first, and which introduce a noticeable import-time overhead, then I'd see a distinct module as plausible, even if it starts with just a single important function. The mere act of starting that module might encourage more contributions along that theme – keras-specific helpers/wrappers/glue – by displaying that there's a welcome space for them.

On the other hand, if keras users are a smallish special-case, and this method isn't that important, then again I'd see a method like this better handled in docs/demos, or defined-where-needed, or inlined – rather than a single special-case added to KeyedVector. (That is, I see a similar aesthetic objection applying to a single bolt-on method as with a single-function module.)

It's no big deal either way; I'm just clarifying I personally find the extremes (dedicated place in APIs, or no place in APIs) more logical than a mushy middle course (just let functionality fall into an existing place).

Move Keras import to get_embedding_layer.

8968d85

menshikh-iv requested review from gojomo and piskvorky October 6, 2017 04:02

rename get_embedding_layer as get_keras_embedding

18ad834

menshikh-iv added breaks backward-compatibility Change breaks backward compatibility performance Issue related to performance (in HW meaning) labels Oct 6, 2017

menshikh-iv merged commit c220166 into develop Oct 6, 2017

menshikh-iv deleted the fix-import branch October 6, 2017 10:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move specific import, speedup importing time. Fix #1614. #1615

Move specific import, speedup importing time. Fix #1614. #1615

menshikh-iv commented Oct 6, 2017

gojomo commented Oct 6, 2017

menshikh-iv commented Oct 6, 2017 •

edited

Loading

gojomo commented Oct 6, 2017

menshikh-iv commented Oct 6, 2017

gojomo commented Oct 6, 2017

menshikh-iv commented Oct 6, 2017

gojomo commented Oct 6, 2017 •

edited

Loading

Move specific import, speedup importing time. Fix #1614. #1615

Move specific import, speedup importing time. Fix #1614. #1615

Conversation

menshikh-iv commented Oct 6, 2017

gojomo commented Oct 6, 2017

menshikh-iv commented Oct 6, 2017 • edited Loading

gojomo commented Oct 6, 2017

menshikh-iv commented Oct 6, 2017

gojomo commented Oct 6, 2017

menshikh-iv commented Oct 6, 2017

gojomo commented Oct 6, 2017 • edited Loading

menshikh-iv commented Oct 6, 2017 •

edited

Loading

gojomo commented Oct 6, 2017 •

edited

Loading