
A question on vectorizer using word2vec #5

Open

bdqnghi opened this issue Sep 27, 2017 · 1 comment


bdqnghi commented Sep 27, 2017

Hey, thanks for this awesome implementation. This is exactly what I'm looking for, since the details of the paper are not trivial to understand.

In the vectorizer part, you adopt the word2vec technique to train the embeddings for the AST, which is great. But I don't understand the intuition behind this; is there any reference?

In word2vec, the embedding matrix serves as a lookup table: the input is a one-hot encoded vector, and multiplying the one-hot input by the embedding matrix effectively just selects the matrix row corresponding to the "1" in the input.
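
Just to be concrete about what I mean by the one-hot lookup, here is a tiny numpy sketch (purely illustrative, not from your code):

```python
import numpy as np

# Illustrative only: the one-hot matmul and the direct row lookup are equivalent.
vocab_size, embed_dim = 5, 3
embedding = np.random.rand(vocab_size, embed_dim)  # embedding matrix

token_index = 2
one_hot = np.zeros(vocab_size)
one_hot[token_index] = 1.0

by_matmul = one_hot @ embedding      # multiply one-hot vector by the matrix...
by_lookup = embedding[token_index]   # ...same as indexing the corresponding row

assert np.allclose(by_matmul, by_lookup)
```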

But this case seems different: after learning the embeddings, you save them along with NODE_MAP (the dictionary that stores the index of each token in your implementation) into the pickle. How can we know that the index of a vector in the embedding table matches the index in NODE_MAP?
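
For reference, the usage I would expect after loading the pickle is roughly the following; the file name and pickle layout here are just my guesses, not taken from your repository:

```python
import pickle

# Hypothetical sketch of the lookup the question is about: the row index into
# the embedding table is assumed to come from NODE_MAP. File name and pickle
# layout are illustrative.
with open("vectorizer_output.pkl", "rb") as f:
    embeddings, NODE_MAP = pickle.load(f)

node_type = "FunctionDef"
vector = embeddings[NODE_MAP[node_type]]  # only valid if row i was trained for token i
```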

@lolongcovas

Yes, the original paper follows Building Program Vector Representations for Deep Learning to embed each AST node into a feature vector. The approach is quite similar to word2vec, except that in the case of an AST the contextual information is the node's children. The source code of that implementation can be found here. Looking at the code (it is a bit hard to understand...), it seems that for each AST they build a new neural network (NN) with the same parameters W and b (for example, the NNs for ASTs of depth 2 and depth 3 differ in their forward pass, but they share the same W and b).
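
If it helps, here is a simplified numpy sketch of that coding criterion (my own illustration, not the repository's code); it collapses the paper's position-dependent weight matrices into a single shared W for brevity:

```python
import numpy as np

def parent_reconstruction(children_vecs, W, b, weights=None):
    """Approximate a parent node's vector from its children's vectors.
    Simplified: one shared W instead of position-dependent weight matrices;
    `weights` would normally depend on the children's subtree sizes."""
    if weights is None:
        weights = np.full(len(children_vecs), 1.0 / len(children_vecs))
    combined = sum(w * (W @ c) for w, c in zip(weights, children_vecs))
    return np.tanh(combined + b)

embed_dim = 30
W = np.random.randn(embed_dim, embed_dim) * 0.1  # shared across all ASTs
b = np.zeros(embed_dim)

children = [np.random.randn(embed_dim) for _ in range(3)]
parent_hat = parent_reconstruction(children, W, b)
# Training would minimise the distance between parent_hat and the parent's own
# embedding (plus a negative-sampling / margin term), updating both the node
# embeddings and the shared W, b.
```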
