Skip to content
This repository has been archived by the owner on Jul 1, 2024. It is now read-only.

Add Sparse embedding support #164

Merged
merged 7 commits into from
Sep 28, 2018
Merged

Add Sparse embedding support #164

merged 7 commits into from
Sep 28, 2018

Conversation

kalyc
Copy link

@kalyc kalyc commented Sep 5, 2018

Summary

Add minimal test for testing sparse embedding operator support

Related Issues

Missing Sparse operator support

PR Overview

  • [y] This PR requires new unit tests [y/n] (make sure tests are included)
  • [n] This PR requires to update the documentation [y/n] (make sure the docs are up-to-date)
  • [y] This PR is backwards compatible [y/n]
  • [n] This PR changes the current API [y/n]

k_S = K.embedding(test_sparse_data, test_sparse_weight, 4, 5)
k_D = K.embedding(test_dense_data, test_dense_weight, 4, 5)

assert k_S.shape == k_D.shape

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value verification?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, also see related issue about using contrib API - apache/mxnet#12465

# Use mxnet.sym.contrib.SparseEmbedding API - https://mxnet.apache.org/api/python/symbol/contrib.html
sym = mx.sym.contrib.SparseEmbedding(data, weight=weight, input_dim=input_dim, output_dim=output_dim,
deterministic=True)
sym = mx.sym.Embedding(data, weight=weight, input_dim=input_dim, output_dim=output_dim)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if sparse_grad, you overwrite the sym here.

@@ -104,6 +105,43 @@ def test_sparse_dot(self):
assert k_s.shape == k_d.shape
assert_allclose(k_s, k_d, atol=1e-05)

def _forward_pass(self, x):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: probably rename to get_value() or get_data() something like that?

Copy link

@roywei roywei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! nice to see this progress!
One concern here:
K.embedding() in mxnet_backend.py is only used in keras layers Embedding class. It's not used anywhere else. So to test its functionality, you need to test on Embedding class instead of this K.embedding single operator, and use the param sparse_grad in Embedding layer. The example usage from keras users can be as following:

from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model
input = Input(..., sparse=True, ...)
embedding = Embedding(..., ... ) (input)
x = Dense(...)(embedding)
predictions = Dense(...)(x)
model = Model(inputs=inputs, outputs=predictions)
model.compile(...)
model.fit(...)

see reference: https://keras.io/getting-started/functional-api-guide/

Also there are many example using Embedding layer class(imdb examples). Sparse embedding should produce same result as normal embedding.

@@ -1202,13 +1202,18 @@ def gather(reference, indices):


@keras_mxnet_symbol
def embedding(data, weight, input_dim, output_dim):
def embedding(data, weight, input_dim, output_dim, sparse_grad=False):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K.embedding() is called in keras.layers.Embedding class for mxnet backend. Please also update the function signature there, and test in a end to end example that use Embedding layer. (e.g. examples using imdb dataset)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are passing default value as False for sparse_grad in the API signature so making a change there is not necessary

outputs = executor.forward(is_train=K.learning_phase())
return outputs

def test_sparse_embedding(self):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test similar to tests/keras/layers/embedding_test.py? use layer_test to test Embedding layer class

@kalyc
Copy link
Author

kalyc commented Sep 7, 2018

As per this issue - we will need to wait for MXNet v1.3 to release to be able to use the new API signature of mx.sym.embedding for sparse. Will update this PR when the new PIP package for MXNet is available.

@sandeep-krishnamurthy
Copy link

@kalyc - Can we move ahead with this, as we discussed to use mxnet --preview package?

@kalyc
Copy link
Author

kalyc commented Sep 27, 2018

Updated PR and tested with end-to-end imdb_lstm model with sparse_grad set to True
Removed embedding unit test as there is no data binded to the embedding symbol to test with.

Model -

from __future__ import print_function

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.datasets import imdb
from keras import backend as K

max_features = 20000
maxlen = 80  # cut texts after this number of words (among top max_features most common words)
batch_size = 32

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('Build model...')
model = Sequential()

print(K.backend())
# MXNet backend does not support dropout in LSTM and cannot automatically infer shape
if K.backend() == 'mxnet':
    # specifying input_length and removed dropout params
    model.add(Embedding(max_features, 128, input_length=maxlen, sparse_grad=True))
    model.add(LSTM(128, unroll=True))
else:
    model.add(Embedding(max_features, 128))
    model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=1,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

Result -

Using MXNet backend
Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
x_train shape: (25000, 80)
x_test shape: (25000, 80)
Build model...
mxnet
Train...
Train on 25000 samples, validate on 25000 samples
Epoch 1/1
/anaconda2/envs/mxnet/lib/python3.4/site-packages/mxnet/module/bucketing_module.py:408: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (1.0 vs. 0.03125). Is this intended?
  force_init=force_init)
[14:57:19] src/operator/nn/../../common/utils.h:450: Optimizer with lazy_update = True detected. Be aware that lazy update with row_sparse gradient is different from standard update, and may lead to different empirical results. See https://mxnet.incubator.apache.org/api/python/optimization/optimization.html for more details.
25000/25000 [==============================] - 242s 10ms/step - loss: 0.4519 - acc: 0.7784 - val_loss: 0.3670 - val_acc: 0.8384
25000/25000 [==============================] - 60s 2ms/step
Test score: 0.36697145671844483
Test accuracy: 0.83836

Copy link

@roywei roywei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution, additional few comments

@@ -140,7 +141,10 @@ def call(self, inputs):
# K.gather is not working with Embedding layer using MXNet backend
# Refer to this issue: https://github.com/awslabs/keras-apache-mxnet/issues/63
if K.backend() == "mxnet":
out = K.embedding(inputs, self.embeddings, self.input_dim, self.output_dim)
if self.sparse_grad:
out = K.embedding(inputs, self.embeddings, self.input_dim, self.output_dim, sparse_grad=True)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it alway True? sparse_grad=self. sparse_grad ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the condition will pass only when self.sparse_grad=True, updated the function call for more clarity

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we simplify to not use if else:
out = K.embedding(inputs, self.embeddings, self.input_dim, self.output_dim, sparse_grad=self.sparse_grad)

@@ -78,14 +78,14 @@ def __init__(self, input_dim, output_dim,
embeddings_constraint=None,
mask_zero=False,
input_length=None,
sparse_grad=False,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this in doc string explaining the usage and note it's only for mxnet backend

@@ -160,6 +160,5 @@ def test_sparse_concat_axis_non_zero(self):
assert k_s_d.shape == k_d.shape
assert_allclose(k_s_d, k_d, atol=1e-05)


Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add sparse test here, add a case where sparse_grad is true: https://github.com/awslabs/keras-apache-mxnet/blob/master/tests/keras/layers/embeddings_test.py

@kalyc kalyc changed the title Add Sparse embedding operator test Add Sparse embedding support Sep 28, 2018
Copy link

@roywei roywei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

@roywei roywei merged commit e7d3849 into awslabs:dev Sep 28, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants