Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move specific import, speedup importing time. Fix #1614. #1615

Merged
merged 2 commits into from
Oct 6, 2017

Conversation

menshikh-iv
Copy link
Contributor

Before:

ivan@P50:~/release/gensim$ time python -c "import gensim"
Using TensorFlow backend.

real	0m1,008s
user	0m0,924s
sys	0m0,556s

After:

ivan@P50:~/release/gensim$ time python -c "import gensim"

real	0m0,299s
user	0m0,332s
sys	0m0,400s

@gojomo
Copy link
Collaborator

gojomo commented Oct 6, 2017

I don't think the original complaint was just a 0.75s delay, so doesn't seem like the numbers here show the real loading cost.

I believe the best fix would move the Keras-specific convenience method elsewhere entirely. (Also, while I believe the effective re-import every time get_embedding_layer() is called is 'mostly harmless', it doesn't seem ideal to me.)

@menshikh-iv
Copy link
Contributor Author

menshikh-iv commented Oct 6, 2017

@gojomo unfortunately, if we moved this method to another place - we will lose this function. For this reason, I think we shouldn't move this method to another place.
FYI - "real import" will be run only once, with the first call of get_embedding_layer(), next call will be used already imported keras (re-import will not happen).

@gojomo
Copy link
Collaborator

gojomo commented Oct 6, 2017

Why would moving it elsewhere lose the functionality?

In my opinion it'd be fine to even just define inline in the two files that use it (one test file, one notebook). It's a 1-liner – really just a way to remind people how syn0 can be supplied to the keras Embedding constructor.

@menshikh-iv
Copy link
Contributor Author

@gojomo

Why would moving it elsewhere lose the functionality?

Because it becomes difficult to find it -> nobody will use this (for example, we move this to gensim.utls)

Or your propose removing this method, replace get_embeding_layer() everywhere to direct call (in tests and notebook), like

    layer = Embedding(
        input_dim=kv.syn0.shape[0], output_dim=kv.syn0.shape[1],
        weights=[kv.syn0], trainable=False
    )

@gojomo
Copy link
Collaborator

gojomo commented Oct 6, 2017

If it's in some specific keras_utils-like module, then exactly the Keras users who need it will find it – guided there by demo code like that in the notebook.

Or, it just gets inlined or defined where needed. Or is just a documentation FAQ answer somewhere, to the question "How do I convert my KeyedVectors word-vectors into a Keras Embedding?"

It just seems awkward to me for KeyedVectors to have this trivial 1-liner that's strictly a keras-ism. It'd be better, for both modularity and avoiding the import-of-something-that-might-not-be-there-tap-dance, for keras-specific knowledge of how to turn a raw array like syn0 into the keras type Embedding to be somewhere else.

(And even if it does stay here, its name should probably include keras – say, get_keras_embedding() – to reflect its specificity here, not returning a more generic or local idea of an 'embedding'. )

@menshikh-iv
Copy link
Contributor Author

Distinct submodule for one function - bad idea (especially considering that I'm currently doing refactoring and simplifying gensim).
About the name - I completely agree, this is missing information that must figured in a method name.

@menshikh-iv menshikh-iv added breaks backward-compatibility Change breaks backward compatibility performance Issue related to performance (in HW meaning) labels Oct 6, 2017
@menshikh-iv menshikh-iv merged commit c220166 into develop Oct 6, 2017
@menshikh-iv menshikh-iv deleted the fix-import branch October 6, 2017 10:26
@gojomo
Copy link
Collaborator

gojomo commented Oct 6, 2017

My thinking would be: if keras users are important enough to add in special supporting methods, of which this could just be the first, and which introduce a noticeable import-time overhead, then I'd see a distinct module as plausible, even if it starts with just a single important function. The mere act of starting that module might encourage more contributions along that theme – keras-specific helpers/wrappers/glue – by displaying that there's a welcome space for them.

On the other hand, if keras users are a smallish special-case, and this method isn't that important, then again I'd see a method like this better handled in docs/demos, or defined-where-needed, or inlined – rather than a single special-case added to KeyedVector. (That is, I see a similar aesthetic objection applying to a single bolt-on method as with a single-function module.)

It's no big deal either way; I'm just clarifying I personally find the extremes (dedicated place in APIs, or no place in APIs) more logical than a mushy middle course (just let functionality fall into an existing place).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaks backward-compatibility Change breaks backward compatibility performance Issue related to performance (in HW meaning)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants