Hey, thanks for this awesome implementation; this is exactly what I'm looking for, since the details of the paper are not trivial to understand.
In the vectorizer part, you adopt the word2vec technique to train the embeddings for the AST nodes, which is great. But I don't understand the intuition behind this; is there any reference?
In word2vec, the embedding matrix serves as a lookup table and the input is a one-hot vector: multiplying the one-hot input by the embedding matrix effectively just selects the matrix row corresponding to the "1" in the input.
But this case does not seem to be the same: after learning the embeddings, you save them along with NODE_MAP (the dictionary that stores the index of each token in your implementation) into the pickle. How can we know that the index of a vector in the embedding table will match its index in NODE_MAP?
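For example, my mental model of the word2vec lookup is the following toy numpy sketch (not your actual code, just an illustration of the row-selection equivalence):

```python
import numpy as np

vocab_size, embed_dim = 5, 3
embedding = np.random.rand(vocab_size, embed_dim)  # one row per token

token_index = 2
one_hot = np.zeros(vocab_size)
one_hot[token_index] = 1.0

# Multiplying the one-hot vector by the embedding matrix just picks out row 2,
# which is why frameworks implement it as a plain row lookup instead.
assert np.allclose(one_hot @ embedding, embedding[token_index])
```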
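What I would expect (but cannot verify from the code, so this is only my assumption about the pickled artifacts) is that the rows of the saved embedding matrix are ordered by the indices in NODE_MAP, so a lookup would read like:

```python
import numpy as np

# Hypothetical contents of the pickle: NODE_MAP assigns each AST node type an
# integer index, and the embedding matrix has one row per index.
NODE_MAP = {'FunctionDef': 0, 'Assign': 1, 'Name': 2}
embeddings = np.random.rand(len(NODE_MAP), 30)

def vector_for(node_type):
    # This alignment only holds if the NODE_MAP saved next to the embeddings
    # is the same one that produced the indices during training.
    return embeddings[NODE_MAP[node_type]]

print(vector_for('Assign').shape)  # (30,)
```

Is that the convention here?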
Yes, the original paper uses the approach from Building Program Vector Representations for Deep Learning to embed each AST node into a feature vector. This approach is quite similar to word2vec, where the contextual information, in the case of an AST, is the node's children. The source code of that implementation can be found here. Looking at the code (it is a bit hard to understand...), it seems that for each AST they build a new neural network (NN) with the same parameters W and b (for example, the NNs for ASTs of 2 and 3 levels will differ in terms of the forward pass, but they share the same W and b).
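To make that concrete, here is a rough numpy sketch of how I read the shared-parameter idea (the names and the single-matrix simplification are mine, not the authors' code; the paper actually interpolates position-dependent weight matrices and weights children by leaf counts): every tree gets its own recursive forward pass, but the same W and b are reused across all trees.

```python
import numpy as np

np.random.seed(0)
dim = 30
# Shared parameters, reused for every subtree in every AST.
W = np.random.randn(dim, dim) * 0.01
b = np.zeros(dim)

def encode(node, embeddings):
    """Approximate a parent's vector from its children's vectors.

    `node` is (token_index, [children]); `embeddings` holds the leaf vectors.
    Simplified single-W version of the coding criterion for illustration only.
    """
    token, children = node
    if not children:
        return embeddings[token]
    child_vecs = [encode(c, embeddings) for c in children]
    combined = sum(child_vecs) / len(child_vecs)  # paper weights by leaf counts
    return np.tanh(W @ combined + b)

# Two ASTs of different depths: the network "shape" differs per tree,
# but the same W and b appear in both forward passes.
embeddings = np.random.randn(10, dim)
ast_shallow = (0, [(1, []), (2, [])])
ast_deep = (0, [(1, [(3, []), (4, [])]), (2, [])])
print(encode(ast_shallow, embeddings).shape, encode(ast_deep, embeddings).shape)
```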