Conversation
python/mxnet/kvstore.py
Outdated
@@ -349,6 +349,77 @@ def row_sparse_pull(self, key, out=None, priority=0, row_ids=None):
        check_call(_LIB.MXKVStorePullRowSparse(
            self.handle, mx_uint(len(ckeys)), ckeys, cvals, crow_ids, ctypes.c_int(priority)))

    def set_gradient_compression(self, compression_params=(('compression', '2bit'),)):
I don't think there should be a default value at all.
Rename the key compression to type.
src/kvstore/comm.h
Outdated
 protected:
  Context pinned_ctx_;

  std::shared_ptr<GradientCompression> gc_;
  bool gc_set_ = false;
Not necessary. gc_ defaults to nullptr
src/kvstore/gradient_compression.h
Outdated
namespace mxnet {
namespace kvstore {

enum CompressionType {
Use scoped enum.
enum class CompressionType{
kNone,
kTwoBit
};
src/kvstore/kvstore_dist_server.h
Outdated
@@ -41,8 +41,10 @@ namespace kvstore {

static const int kRowSparsePushPull = 1;
Use enum for this
python/mxnet/kvstore.py
Outdated
elif compression_params['compression'] not in ['none', '2bit']:
    raise ValueError('Unsupported type of compression')

if compression_params['compression'] == '2bit':
This parsing should be done in the backend with dmlc::Parameter. The frontend should pass strings of key-value pairs.
include/mxnet/c_api.h
Outdated
 */
MXNET_DLL int MXKVStoreSetGradientCompression(KVStoreHandle handle,
                                              const char *compression,
                                              const float threshold);
The API should be
MXKVStoreSetGradientCompression(KVStoreHandle handle, mx_uint num_params, const char **keys, const char **vals)
The values should be parsed in the backend with dmlc::Parameter.
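For illustration only (not the PR's final code), the frontend side of this proposal could look roughly like the following sketch, which flattens the dict into parallel string arrays using the usual ctypes helpers from mxnet.base; all values are passed as strings and parsed in the backend:

import ctypes
from mxnet.base import _LIB, check_call, c_str, mx_uint

def set_gradient_compression(kv, compression_params):
    # flatten the dict into parallel char** arrays of keys and values;
    # parsing of the values happens in the backend (dmlc::Parameter)
    n = len(compression_params)
    keys = (ctypes.c_char_p * n)(*[c_str(k) for k in compression_params.keys()])
    vals = (ctypes.c_char_p * n)(*[c_str(str(v)) for v in compression_params.values()])
    check_call(_LIB.MXKVStoreSetGradientCompression(
        kv.handle, mx_uint(n), keys, vals))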
Signed-off-by: Rahul <[email protected]>
@piiswrong Updated to use scoped enums and DMLC params. Wanted to add that tests are all in nightly because this affects distributed kvstore.
Signed-off-by: Rahul <[email protected]>
 * Used if SetGradientCompression sets the type.
 * Currently there is no support for un-setting gradient compression
 */
std::shared_ptr<kvstore::GradientCompression> gradient_compression_;
"no support for un-setting gradient compression"? What happens if a user tries to unset it?
If the user calls kvstore.set_gradient_compression({'type':'none'}) after setting it to 2bit, it throws an error because none can't be a type. If the user sets 2bit again with a different threshold, the new threshold will be used from then on, but there might be a transition period when gradients quantized with the old threshold are dequantized with the new threshold, because of the delay in synchronization.
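For illustration only, assuming the key names settled on in this thread ('type', 'threshold') and an illustrative threshold value, the behavior described above would look roughly like:

import mxnet as mx

kv = mx.kv.create('dist_sync')
kv.set_gradient_compression({'type': '2bit', 'threshold': 0.5})
# setting 2bit again with a different threshold: the new threshold is used from
# then on, with a possible transition period due to the synchronization delay
kv.set_gradient_compression({'type': '2bit', 'threshold': 1.0})
# trying to unset compression raises an error, since 'none' is not an accepted type
kv.set_gradient_compression({'type': 'none'})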
Signed-off-by: Rahul <[email protected]>
Signed-off-by: Rahul <[email protected]>
Signed-off-by: Rahul <[email protected]>
frontend was sending command with id=stopServer in old enum Signed-off-by: Rahul <[email protected]>
Does this look ready to be merged now? I've updated the results section with more details. I'll hopefully be updating gradient compression with more optimizations before the v1.0 release. But it would be better if we merge this, so the next PRs aren't this large. There's no known bug right now.
* update two bit compression * Update trainer.py * Update test_operator.py * update two bit compression * update two bit compression * update two bit compression * update * update * update two bit compression * update two bit compression * update two bit compression * update two bit compression * update two bit compression * update two bit compression * update two bit compression * update two bit compression * update two bit compression * update two bit compression * update two bit compression * update two bit compression * Update comm.h * add original size in comrpessed array * update comm.h * update distributed training * update distributed training * Update ndarray_function.cu * Update kvstore_dist.h * Update kvstore_dist.h * update * update * update * fix bug * fix * add GC test * fix bug in push * fix push and pull * fix * fix * uncompiled * kvstore dist changes. added cpp_package. changed strtof function calls * fix usage of keys in dict * fix push and pull * fix * fix_test * fix_test * fix_test * add print statements * more print statements and move send command to server * set compress handling * kvstore dist changes * working kvstore push and pull. not sure if I commited that. from this commit removing mutable variable changes for residual array gives working push and pull * cleanup test * debug prints * working kvstore dist. includes mutation of inputs and setting threshold array dtype properly * fix operator * kvstore dist changes * fix compress kvstore issues. non compress is broken * fix sparse push issue * fix read lock issue * optimizer is the only issue now? * fix all issues with gc dist * fix read lock issue * pushing sharded data works * works most times. sometimes val instead of 0 has parts of 1 or 1.5... * fix read lock issue * prev commit fixed seg fault issue on pull without push in a server * add waittowrite to fix pull before push problems * refactor quantizing for sharded data * redo break up of data across servers,clearer split * refactor to use param for thresholds. also cleans up code * Added many checks for 0 * cmake changes * formatting issues for easier merge * fix rate * fix compilation errors after merge * fix compile error and ndarray thresholds in dequantize * fix compile error and ndarray thresholds in dequantize * fix compile error * fix compile error, and add comments * update operator comments * comment checks * comment checks * compile error * working on local kvstore compress test * fix module api compressparams, and change quantize tblob to inside engine * 2bit arg wrong kvstore * remove log * fix gpu dequantize and tests * fix seg fault in quantize and test indent * tests print more info order of params corrected * assert almost equal * more debug stuff correct profiler message * intermediate test rewrite * small change in pushing op to engineh * fix concurrency of quantization * wait on kernel * updated tests and removed prints * comment unnecessary stuff * fix test * remove print * Update dist_sync_kvstore.py fix random dist sync test * remove slow kernel launch init * cleanup * undo changes in submodule * submodule reset * remove files * undo changes unrelated to project * undo changes unrelated to project * Comments and cleanup. 
Remaining are src/kvstore, src/operator and tests * more cleanup and comments * comments for tests * lint changes and comments * speed up operator test by reducing asnumpy() calls * random data for test_kvstore_local * fix variable confusion error in test * fix randomized data test for local kvstore * add nrepeat for test_kvstore * change keys after merge from master introduced same keys * correct test which fails because grad changes * change to bit ops * change to bit ops * use bit array and revert sign changes * correct bits setting to 10 as 2 * remove switch in dequantize * image classification example changes and remove cpp-api * merge all quantize, and new type in dist server * fix ndarray dequantize * debug stuff * fix bug * trying merge dequntize * Frmework and validation tests for operator validation and performance-testing in C++ Normally used for gtest tests. * Remove obsolete file * Fix compile error for non-CUDA build * tweaks in quantize * Allow for no backward pass * Remove unused var * making quantize all compatible as operators * separate mshadow and loop operators * working profiler, dequantize mshadow is slow * fix mshadow dequantize * fix quantize call by kvdist * making quantize all compatible as operators * add profile to measure.py * minor profiler changes * timing print in cpp operator * time quantize * saving data feature added * cleanup test * small updates * cleanup * minor fix * passing additional environment variables through launch.py * update local test * update dmlc with pass-env * fix launch pass env issue * update with pass-env changes * fix operator increment of block, remove unncessary commented code * fix operator increment of block, remove unncessary commented code * fix operator increment of block, remove unncessary commented code * fix operator increment of block, remove unncessary commented code * bring back quantize Signed-off-by: Rahul <[email protected]> * fix test * fix bug with increment of char pointer * fix bug with increment of char pointer * debug module * update test * comment all debug statements * change init to normal for now * remove debug changes * reorg to create gc class, add delayed start to gc, untested: causing segfault * redo header files * remove ps * remove unused header * fix compile issues * remove multiple delete of gc * add expected to local kvstore test * fix operator compile issues * fix operator compile issues * fix operator compile and link issues * remove gc.cpp * add split function * move setting of active gc * move all to gc.cpp, compile works for cpu * WIP gpu compile * compiles and links on both cpu and gpu * move prototypes to header * add split function * undo changes from master * remove cpp perf quantize * undo more changes * add inactive function so that multiple kvstore dist inits have no compression fix tests * undo some formatting changes * make sharding same when inactive and active * remove counts and get_active_type * remove print * add train caltech * increase size of mlp * update to alexa mlp * pass-env changes * add bucketing module compression * attempts for alexnet training * prepare for merge * fix lint issues * fix lint issues * remove caltech * address some comments: shared_ptr, documentation, indentaion, new functions, check_eq * move header * include header corrected * include header corrected * indents, documentation and test update * lint * pylint * rename class, fix local kvstore test, remove confusing active method * fix importing of compute expected in test_kvstore * fix bug in device 
kvstore * remove active comment in pull * docstring * use dmlc params, enums, Signed-off-by: Rahul <[email protected]> * doc updates Signed-off-by: Rahul <[email protected]> * lint Signed-off-by: Rahul <[email protected]> * typo Signed-off-by: Rahul <[email protected]> * rename field to type Signed-off-by: Rahul <[email protected]> * fix distributed kvstore stopping issue. frontend was sending command with id=stopServer in old enum Signed-off-by: Rahul <[email protected]> * Trigger CI * trigger CI
This reverts commit a499f89.
Description
Implements 2bit gradient compression by quantizing each value in the gradient array to 2 bits using a user-specified threshold. Shows about 2x speedup on large models with components like fully connected layers and LSTM layers.
@eric-haibin-lin @cjolivier01 @anirudh2290 @reminisce
Important files to review
GC : gc-inl.h, gc.cc
KVStore local: comm.h
KVStore dist : kvstore_dist.h, kvstore_dist_server.h
Documentation about gradient compression: kvstore.py
Checklist
Essentials
make lint
Changes
Comments
Problem
When training large-scale deep learning models, especially with distributed training, communication becomes a bottleneck for networks whose computation cost is low compared to their communication cost.
Approach
We can compress the gradients by considering only those elements that exceed a threshold. Only these elements are encoded and sent. The elements of the gradient that are near zero can safely be delayed by aggregating them in a residual array. When the residual, updated with the gradients of subsequent iterations, exceeds the threshold, those values are sent. Effectively, these values are updated at a lower frequency.
On the receiver's end, we decompress the data and use the decompressed gradients.
Specifically, this PR implements 2bit quantization.
Two bit quantization
Any positive value greater than or equal to the threshold is set to one value (say 11), any negative value whose absolute value is greater than or equal to the threshold is set to a second value (say 10), and all others are set to a third value (say 00). We need three values to represent data in this fashion and hence two bits; we understand this leads to one bit pattern going to waste, but that's an optimization to be done later. The quantization error is accumulated as a residual and carried over to the next iteration, where it is added to the gradient before quantizing (a small sketch of this logic is given after the format description below).
An example is shown below with thresholds of -2.0 and 2.0.
This format reduces the communicated gradient to 1/16th of its original size.
Quantization at work
Format of compressed gradient
Each element of the compressed array represents up to 16 elements of the original array (sixteen 2-bit codes packed into one 32-bit value). For the example above, we get an element whose binary representation is
00 11 00 10 11 00 00 10 0000000000000000
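For illustration only (not the PR's kernel code), here is a minimal NumPy sketch of the quantize/dequantize logic with residual accumulation, using a threshold of 2.0. The packing of sixteen 2-bit codes into one 32-bit element is omitted, and quantize_2bit/dequantize_2bit are hypothetical names:

import numpy as np

def quantize_2bit(grad, residual, threshold=2.0):
    # map each element to 2 bits: 0b11 for >= threshold, 0b10 for <= -threshold,
    # 0b00 otherwise; the quantization error stays in `residual`
    residual += grad
    codes = np.zeros(grad.shape, dtype=np.uint8)
    codes[residual >= threshold] = 0b11
    codes[residual <= -threshold] = 0b10
    # subtract what is actually communicated from the residual
    residual[codes == 0b11] -= threshold
    residual[codes == 0b10] += threshold
    return codes

def dequantize_2bit(codes, threshold=2.0):
    # reconstruct +threshold, -threshold or 0 from the 2-bit codes
    out = np.zeros(codes.shape, dtype=np.float32)
    out[codes == 0b11] = threshold
    out[codes == 0b10] = -threshold
    return out

grad = np.array([1.0, 3.5, -4.0, 0.5], dtype=np.float32)
residual = np.zeros_like(grad)
codes = quantize_2bit(grad, residual)
print(codes)                   # [0 3 2 0]
print(dequantize_2bit(codes))  # [ 0.  2. -2.  0.]
print(residual)                # [ 1.   1.5 -2.   0.5]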
Local kvstore
When using the local kvstore, gradient compression only happens when device communication is used. When gradients are pushed, quantization and dequantization happen before they are summed up (reduced).
Example: say we have 4 GPUs, and the gradients are being summed up on GPU0. Each device quantizes its gradients and sends the quantized gradient to GPU0, which dequantizes this data before merging it with the values from the other GPUs. Note that there is strictly no need to quantize the gradients from GPU0 itself, but it is still done so that there is no bias for the samples that were processed by GPU0.
Dist kvstore
When the set_gradient_compression method of kvstore is called, each worker sets these compression params and one worker sends the params to all servers. From then on, each value is quantized before it is pushed to the server. The server dequantizes the data and stores it as an array of the original size. When values are pulled from the server, it returns an array of the original size. The same happens when each server handles a shard of the data.
Usage
The reason I used a dictionary compression_params for the arguments was to ensure uniformity when we extend this to other quantization techniques, since each technique might take a different number and type of parameters.
KVstore
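For illustration, a hedged sketch of the KVStore usage (key names per the discussion above; the threshold value 0.5 is only illustrative):

import mxnet as mx

# 'dist_sync' requires a distributed launch; the local 'device' kvstore
# also supports compression when device communication is used
kv = mx.kv.create('dist_sync')
kv.set_gradient_compression({'type': '2bit', 'threshold': 0.5})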
Module
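For Module, a hedged sketch assuming the constructor accepts a compression_params argument that is forwarded to the kvstore it creates:

import mxnet as mx

data = mx.sym.Variable('data')
fc = mx.sym.FullyConnected(data, num_hidden=10)
net = mx.sym.SoftmaxOutput(fc, name='softmax')

mod = mx.mod.Module(net, context=[mx.gpu(i) for i in range(4)],
                    compression_params={'type': '2bit', 'threshold': 0.5})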
Gluon Trainer
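For the Gluon Trainer, a hedged sketch, again assuming a compression_params argument that is forwarded to the kvstore:

import mxnet as mx
from mxnet import gluon

net = gluon.nn.Dense(10)
net.initialize(mx.init.Xavier(), ctx=[mx.gpu(0), mx.gpu(1)])
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1},
                        kvstore='dist_sync',
                        compression_params={'type': '2bit', 'threshold': 0.5})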
Results
Summary
Shows about 2x speedup for distributed training when models are large and have fully connected components. For local training, the speedup is about 1.2x when there is no P2P communication.
For smaller models, the overhead of launching OMP threads costs a bit; to get around it (if training using GPUs), setting OMP_NUM_THREADS=1 is needed when using gradient compression.
Shows speedup when communication is expensive. The above speedup was seen on g2.8x machines which have lower network bandwidth than p2.16x machines. p2.16x didn't see as much speedup.
Network types
On models for ImageNet input (input dim: 3,299,299), on a 15-node g2.8xlarge cluster, using all 4 GPUs on each node:
LSTM on PennTreeBank with 200dim 2 layers
MNIST on MLP
CIFAR with resnet
Accuracy starts off slower, but the network converges to a similar accuracy.
Accuracies at a few epochs:
Epoch 101: 2bit 0.80645, none 0.83572, difference 0.029
Epoch 153: 2bit 0.841, none 0.851, difference 0.0108
CIFAR resnet with pretraining
Pre-training without gradient compression for some time (2 epochs) leads to better convergence.
We see that in this case, training starts off much closer and reaches similar accuracies earlier. In general, the curves are much closer. Looking at epoch 33: earlier, without pretraining, 2bit compression had an accuracy degradation of 0.154 compared to the case without gradient compression. Now, when both models start from a pretrained network that didn't use gradient compression, the degradation is only 0.04.
Reference (although the compressed representation is different): http://nikkostrom.com/publications/interspeech2015/strom_interspeech2015.pdf