Add Sparse embedding support #164

kalyc · 2018-09-05T00:26:51Z

Summary

Add minimal test for testing sparse embedding operator support

Related Issues

Missing Sparse operator support

PR Overview

[y] This PR requires new unit tests [y/n] (make sure tests are included)
[n] This PR requires to update the documentation [y/n] (make sure the docs are up-to-date)
[y] This PR is backwards compatible [y/n]
[n] This PR changes the current API [y/n]

sandeep-krishnamurthy · 2018-09-05T00:39:04Z

tests/keras/backend/mxnet_sparse_test.py

+        k_S = K.embedding(test_sparse_data, test_sparse_weight, 4, 5)
+        k_D = K.embedding(test_dense_data, test_dense_weight, 4, 5)
+
+        assert k_S.shape == k_D.shape


Could you also verify the values, not just the shapes?

Added, also see related issue about using contrib API - apache/mxnet#12465

sandeep-krishnamurthy · 2018-09-05T19:22:53Z

keras/backend/mxnet_backend.py

+        # Use mxnet.sym.contrib.SparseEmbedding API - https://mxnet.apache.org/api/python/symbol/contrib.html
+        sym = mx.sym.contrib.SparseEmbedding(data, weight=weight, input_dim=input_dim, output_dim=output_dim,
+                                             deterministic=True)
+    sym = mx.sym.Embedding(data, weight=weight, input_dim=input_dim, output_dim=output_dim)


If sparse_grad is True, the sym from the SparseEmbedding call is overwritten here by the unconditional mx.sym.Embedding call.
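One way to avoid the overwrite is to put the two calls in mutually exclusive branches. The sketch below shows only the control flow: `build_sparse` and `build_dense` are hypothetical stand-ins for the `mx.sym.contrib.SparseEmbedding` and `mx.sym.Embedding` calls in the diff, so nothing here depends on MXNet or claims to be the code that was merged.

```python
def build_sparse(data, weight, input_dim, output_dim):
    # Hypothetical stand-in for the mx.sym.contrib.SparseEmbedding(...) call.
    return ("sparse_embedding", input_dim, output_dim)


def build_dense(data, weight, input_dim, output_dim):
    # Hypothetical stand-in for the mx.sym.Embedding(...) call.
    return ("dense_embedding", input_dim, output_dim)


def embedding(data, weight, input_dim, output_dim, sparse_grad=False):
    # Exactly one branch assigns sym, so the sparse result is never overwritten.
    if sparse_grad:
        sym = build_sparse(data, weight, input_dim, output_dim)
    else:
        sym = build_dense(data, weight, input_dim, output_dim)
    return sym
```

With this shape, `embedding(..., sparse_grad=True)` returns the sparse symbol and the default call returns the dense one.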

sandeep-krishnamurthy · 2018-09-05T19:24:48Z

tests/keras/backend/mxnet_sparse_test.py

@@ -104,6 +105,43 @@ def test_sparse_dot(self):
        assert k_s.shape == k_d.shape
        assert_allclose(k_s, k_d, atol=1e-05)

+    def _forward_pass(self, x):


nit: probably rename to get_value() or get_data() something like that?

roywei

Thank you for your contribution! Nice to see this progress!
One concern here:
K.embedding() in mxnet_backend.py is used only by the Embedding class in keras layers and nowhere else. To test its functionality, you need to test the Embedding class rather than the single K.embedding operator, and use the sparse_grad param of the Embedding layer. Example usage from a Keras user's point of view could look like the following:

from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model
inputs = Input(..., sparse=True, ...)
embedding = Embedding(..., ...)(inputs)
x = Dense(...)(embedding)
predictions = Dense(...)(x)
model = Model(inputs=inputs, outputs=predictions)
model.compile(...)
model.fit(...)

See reference: https://keras.io/getting-started/functional-api-guide/

There are also many examples that use the Embedding layer class (e.g. the imdb examples). Sparse embedding should produce the same result as normal embedding.
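To make the "same result" expectation concrete, here is a minimal pure-Python sketch, not MXNet's implementation. It assumes plain SGD with no weight decay or momentum, and every function name is hypothetical. The dense gradient has one row per vocabulary entry, while the row-sparse gradient stores only the rows that were actually looked up; applying either one gives the same updated weights.

```python
def dense_grad(num_rows, dim, indices, out_grads):
    """Full gradient matrix: rows that were never looked up stay zero."""
    grad = [[0.0] * dim for _ in range(num_rows)]
    for i, g in zip(indices, out_grads):
        for j in range(dim):
            grad[i][j] += g[j]
    return grad


def row_sparse_grad(dim, indices, out_grads):
    """Row-sparse gradient: a dict holding only the rows that were looked up."""
    grad = {}
    for i, g in zip(indices, out_grads):
        row = grad.setdefault(i, [0.0] * dim)
        for j in range(dim):
            row[j] += g[j]
    return grad


def sgd_dense(weight, grad, lr):
    """Plain SGD step over every row."""
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weight, grad)]


def sgd_row_sparse(weight, grad, lr):
    """Plain SGD step that touches only the rows present in the gradient."""
    updated = [list(row) for row in weight]
    for i, g_row in grad.items():
        updated[i] = [w - lr * g for w, g in zip(updated[i], g_row)]
    return updated


weight = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
indices = [2, 0, 2]                       # index 2 is looked up twice
out_grads = [[1.0, 1.0], [0.5, -0.5], [2.0, 0.0]]

dense_result = sgd_dense(weight, dense_grad(4, 2, indices, out_grads), lr=0.1)
sparse_result = sgd_row_sparse(weight, row_sparse_grad(2, indices, out_grads), lr=0.1)
```

Here `dense_result` and `sparse_result` are equal, and the row-sparse gradient holds only rows 0 and 2. Optimizers that keep per-row state or apply weight decay may not preserve this equivalence.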

roywei · 2018-09-05T19:39:32Z

keras/backend/mxnet_backend.py

@@ -1202,13 +1202,18 @@ def gather(reference, indices):


 @keras_mxnet_symbol
-def embedding(data, weight, input_dim, output_dim):
+def embedding(data, weight, input_dim, output_dim, sparse_grad=False):


K.embedding() is called in the keras.layers.Embedding class for the mxnet backend. Please also update the function signature there, and test with an end-to-end example that uses the Embedding layer (e.g. the examples using the imdb dataset).

We pass False as the default value for sparse_grad in the API signature, so a change there is not necessary.

roywei · 2018-09-05T19:41:56Z

tests/keras/backend/mxnet_sparse_test.py

+        outputs = executor.forward(is_train=K.learning_phase())
+        return outputs
+
+    def test_sparse_embedding(self):


Could you add a test similar to tests/keras/layers/embeddings_test.py? Use layer_test to test the Embedding layer class.

kalyc · 2018-09-07T21:25:03Z

As per this issue, we will need to wait for the MXNet v1.3 release to be able to use the new API signature of mx.sym.Embedding for sparse. I will update this PR when the new pip package for MXNet is available.
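A dependency like this is sometimes guarded with a version check. The following is a minimal sketch under stated assumptions: the 1.3 threshold is taken from the comment above, the helper names are hypothetical, and in real use the version string would come from `mxnet.__version__` rather than a literal.

```python
import re


def parse_version(version):
    """Return the leading numeric part of a version string as a 3-tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (version,))
    # A missing patch component (e.g. "1.3") is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def supports_sparse_grad(version, minimum=(1, 3, 0)):
    """Hypothetical check: is this MXNet version at least the assumed minimum?"""
    return parse_version(version) >= minimum
```

Any suffix after the numeric part (such as a pre-release tag) is ignored by this parser, so a 1.3.0 pre-release build would pass the check.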

sandeep-krishnamurthy · 2018-09-19T18:18:12Z

@kalyc - Can we move ahead with this using the mxnet --preview package, as we discussed?

kalyc · 2018-09-27T22:24:33Z

Updated the PR and tested with an end-to-end imdb_lstm model with sparse_grad set to True.
Removed the embedding unit test, as there is no data bound to the embedding symbol to test with.

Model -

from __future__ import print_function

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.datasets import imdb
from keras import backend as K

max_features = 20000
maxlen = 80  # cut texts after this number of words (among top max_features most common words)
batch_size = 32

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('Build model...')
model = Sequential()

print(K.backend())
# MXNet backend does not support dropout in LSTM and cannot automatically infer shape
if K.backend() == 'mxnet':
    # specifying input_length and removed dropout params
    model.add(Embedding(max_features, 128, input_length=maxlen, sparse_grad=True))
    model.add(LSTM(128, unroll=True))
else:
    model.add(Embedding(max_features, 128))
    model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=1,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

Result -

Using MXNet backend
Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
x_train shape: (25000, 80)
x_test shape: (25000, 80)
Build model...
mxnet
Train...
Train on 25000 samples, validate on 25000 samples
Epoch 1/1
/anaconda2/envs/mxnet/lib/python3.4/site-packages/mxnet/module/bucketing_module.py:408: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (1.0 vs. 0.03125). Is this intended?
  force_init=force_init)
[14:57:19] src/operator/nn/../../common/utils.h:450: Optimizer with lazy_update = True detected. Be aware that lazy update with row_sparse gradient is different from standard update, and may lead to different empirical results. See https://mxnet.incubator.apache.org/api/python/optimization/optimization.html for more details.
25000/25000 [==============================] - 242s 10ms/step - loss: 0.4519 - acc: 0.7784 - val_loss: 0.3670 - val_acc: 0.8384
25000/25000 [==============================] - 60s 2ms/step
Test score: 0.36697145671844483
Test accuracy: 0.83836
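The lazy-update warning in the log above says that a row-sparse gradient can change optimizer behavior. A simplified illustration follows, assuming SGD with weight decay rather than the Adam optimizer used in the run; the function names are hypothetical and this is not MXNet's code. A standard update decays every row, while a lazy update touches only the rows present in the gradient.

```python
def standard_update(weight, grad_rows, lr, wd):
    """Update every row; rows missing from grad_rows have zero gradient but still decay."""
    dim = len(weight[0])
    updated = []
    for i, row in enumerate(weight):
        g = grad_rows.get(i, [0.0] * dim)
        updated.append([w - lr * (gj + wd * w) for w, gj in zip(row, g)])
    return updated


def lazy_update(weight, grad_rows, lr, wd):
    """Update only the rows that appear in the row-sparse gradient."""
    updated = [list(row) for row in weight]
    for i, g in grad_rows.items():
        updated[i] = [w - lr * (gj + wd * w) for w, gj in zip(weight[i], g)]
    return updated


weight = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
grad_rows = {1: [0.5, 0.5]}  # only row 1 was looked up in this batch

standard = standard_update(weight, grad_rows, lr=0.1, wd=0.01)
lazy = lazy_update(weight, grad_rows, lr=0.1, wd=0.01)
```

Row 1 ends up identical in both results, but rows 0 and 2 shrink slightly under the standard update and stay untouched under the lazy one, which is one reason empirical results can differ.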

roywei

Thanks for the contribution. A few additional comments:

roywei · 2018-09-28T17:58:53Z

keras/layers/embeddings.py

@@ -140,7 +141,10 @@ def call(self, inputs):
        # K.gather is not working with Embedding layer using MXNet backend
        # Refer to this issue: https://github.com/awslabs/keras-apache-mxnet/issues/63
        if K.backend() == "mxnet":
-            out = K.embedding(inputs, self.embeddings, self.input_dim, self.output_dim)
+            if self.sparse_grad:
+                out = K.embedding(inputs, self.embeddings, self.input_dim, self.output_dim, sparse_grad=True)


Why is it always True? Should it be sparse_grad=self.sparse_grad?

The condition passes only when self.sparse_grad is True. Updated the function call for more clarity.

Can we simplify this to avoid the if/else?
out = K.embedding(inputs, self.embeddings, self.input_dim, self.output_dim, sparse_grad=self.sparse_grad)

roywei · 2018-09-28T18:07:11Z

keras/layers/embeddings.py

@@ -78,14 +78,14 @@ def __init__(self, input_dim, output_dim,
                 embeddings_constraint=None,
                 mask_zero=False,
                 input_length=None,
+                 sparse_grad=False,


Add this to the docstring, explaining its usage, and note that it applies only to the mxnet backend.
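As an illustration of what such a docstring entry might look like, here is a sketch on a hypothetical stand-in class. The wording is an assumption, not the text that was merged, and `EmbeddingSketch` is not the real Keras layer.

```python
class EmbeddingSketch(object):
    """Hypothetical stand-in for the Embedding layer, showing docstring wording only.

    # Arguments
        input_dim: int > 0. Size of the vocabulary.
        output_dim: int >= 0. Dimension of the dense embedding.
        sparse_grad: Boolean, defaults to False. If True, the gradient of the
            embedding weights is computed as a row-sparse array.
            Used only by the MXNet backend; it has no effect elsewhere.
    """

    def __init__(self, input_dim, output_dim, sparse_grad=False):
        self.input_dim = input_dim
        self.output_dim = output_dim
        # Stored so that call() can forward it to the backend embedding function.
        self.sparse_grad = sparse_grad
```

Keeping the flag as a keyword argument with a False default leaves existing callers unaffected.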

roywei · 2018-09-28T18:08:08Z

tests/keras/backend/mxnet_sparse_test.py

@@ -160,6 +160,5 @@ def test_sparse_concat_axis_non_zero(self):
        assert k_s_d.shape == k_d.shape
        assert_allclose(k_s_d, k_d, atol=1e-05)

-


Please add a sparse test here, with a case where sparse_grad is True: https://github.com/awslabs/keras-apache-mxnet/blob/master/tests/keras/layers/embeddings_test.py

roywei

LGTM! Thanks!

kalyc requested review from sandeep-krishnamurthy and roywei September 5, 2018 00:26

sandeep-krishnamurthy reviewed Sep 5, 2018

roywei reviewed Sep 5, 2018

kalyc force-pushed the sparse-embedding branch from 32423b9 to ca183dc Compare September 27, 2018 22:20

roywei reviewed Sep 28, 2018

kalyc added 6 commits September 28, 2018 13:26

0752908 Update tests
e3cb730 Add check for values in sparse embedding test
f76b9f8 Update embedding API support in the layers/embeddings class
c452537 Add support in Embedding layer class for sparse data
8742eec Remove unused import and variable
f1b5d23 Add layer_test for sparse embedding

kalyc force-pushed the sparse-embedding branch from 47887bf to f1b5d23 Compare September 28, 2018 20:48

120acd7 Add file encoding

kalyc changed the title from "Add Sparse embedding operator test" to "Add Sparse embedding support" Sep 28, 2018

roywei approved these changes Sep 28, 2018

roywei merged commit e7d3849 into awslabs:dev Sep 28, 2018
