Summarization #9
base: summarization
Conversation
Mainly about data preprocessing.
Add data2idx.
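For reference, a minimal sketch of what a data2idx-style preprocessing helper might look like (the names `build_vocab`, `data2idx`, and the special tokens below are placeholders, not taken from this PR): map whitespace-tokenized article/abstract text to vocabulary indices.

```python
from collections import Counter

UNK, PAD = '<unk>', '<pad>'

def build_vocab(texts, max_size=50000):
    """Count whitespace tokens and keep the most frequent ones."""
    counts = Counter(tok for text in texts for tok in text.split())
    vocab = {PAD: 0, UNK: 1}
    for tok, _ in counts.most_common(max_size):
        vocab.setdefault(tok, len(vocab))
    return vocab

def data2idx(text, vocab):
    """Convert one tokenized string to a list of vocabulary indices."""
    return [vocab.get(tok, vocab[UNK]) for tok in text.split()]

vocab = build_vocab(["the cat sat", "the dog ran"])
print(data2idx("the cat ran away", vocab))  # unseen token 'away' maps to <unk>
```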
# NotImplemented
delete the unused code
return article, abstract
def write2file(url_file, out_file):
Delete the processed datasets from the PR; eventually they will go to S3.
class Seq2SeqEncoder(Block):
    pass
class SUMEncoder(Seq2SeqEncoder):
This class is the seq2seq+attention, right?
Already deleted the unused code and added loss.py as well as decode.py.
Removed datafile and seq2seq debug
change data transformer
fixed context vector shape and beamsearch
Update seq2seq + attention
__all__ = ['Seq2SeqEncoder', 'Seq2SeqDecoder', 'SUMEncoder', 'SUMDecoder', 'get_summ_encoder_decoder']
class Seq2SeqEncoder(Block):
Remove Seq2SeqEncoder
prefix=None, params=None):
super(SUMEncoder, self).__init__(prefix=prefix, params=params)
self.hidden_size = hidden_size
with self.name_scope():
""" | ||
_, length, _ = inputs.shape | ||
|
||
outputs, new_state = self.rnn_cells[0].unroll( |
layout = 'TNC'
Instead of unroll, use the rnn forward call; the output and output states will be generated accordingly.
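A minimal sketch of this suggestion, assuming a single-layer LSTM and placeholder dimensions: `rnn.LSTM` with `layout='TNC'` consumes the whole sequence in one forward call and returns both the outputs and the final states, so no explicit unroll is needed.

```python
import mxnet as mx
from mxnet.gluon import rnn

seq_len, batch_size, embed_dim, hidden_size = 35, 4, 128, 256

lstm = rnn.LSTM(hidden_size, num_layers=1, layout='TNC')
lstm.initialize()

inputs = mx.nd.random.uniform(shape=(seq_len, batch_size, embed_dim))  # TNC layout
begin_state = lstm.begin_state(batch_size=batch_size)
outputs, new_state = lstm(inputs, begin_state)
print(outputs.shape)  # (seq_len, batch_size, hidden_size)
```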
return [outputs, new_state]
class Attention(HybridBlock):
Ignore convolution in the encoder for attention for now
class Seq2SeqDecoder(Block):
Remove Seq2SeqDecoder
def forward(self, step_input, states):
    raise NotImplementedError
class SUMDecoder(Seq2SeqDecoder):
Change the name
Decode function: use attention over the encoder outputs to compute the step output.
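A rough sketch of such a decoding step, assuming gluonnlp's `MLPAttentionCell` and placeholder shapes (not the PR's exact code): the current decoder state queries the encoder outputs, and the resulting context vector is concatenated with the state before the output projection.

```python
import mxnet as mx
from gluonnlp.model.attention_cell import MLPAttentionCell

batch_size, src_len, hidden_size = 4, 35, 256
enc_outputs = mx.nd.random.uniform(shape=(batch_size, src_len, hidden_size))
dec_state = mx.nd.random.uniform(shape=(batch_size, 1, hidden_size))  # query for this step

attention_cell = MLPAttentionCell(units=2 * hidden_size, normalized=False)
attention_cell.initialize()

# context_vec: (batch, 1, hidden), att_weights: (batch, 1, src_len)
context_vec, att_weights = attention_cell(dec_state, enc_outputs, enc_outputs)
step_input = mx.nd.concat(dec_state, context_vec, dim=2)  # fed to the RNN / output layer
```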
print("vocab_lenght: ", len(self.vocab)) | ||
self.attention_cell = MLPAttentionCell(units=2*self._hidden_size, normalized=False, prefix= 'attention_') | ||
with self.name_scope(): |
Change to an LSTM layer similar to the encoder (see the sketch after this hunk).
)
)

with self.name_scope():
Merge the name_scope blocks together.
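A minimal sketch combining the two suggestions above (an `rnn.LSTM` layer mirroring the encoder, and a single `name_scope` for all child blocks); the class name and hyper-parameters are placeholders, not the PR's actual SUMDecoder:

```python
from mxnet.gluon import Block, nn, rnn
from gluonnlp.model.attention_cell import MLPAttentionCell

class SUMDecoderSketch(Block):
    """Illustrative skeleton only."""
    def __init__(self, vocab_size, embed_size, hidden_size, prefix=None, params=None):
        super(SUMDecoderSketch, self).__init__(prefix=prefix, params=params)
        self._hidden_size = hidden_size
        with self.name_scope():  # one scope for every child block
            self.embedding = nn.Embedding(vocab_size, embed_size)
            self.rnn = rnn.LSTM(hidden_size, num_layers=1, layout='NTC')
            self.attention_cell = MLPAttentionCell(units=2 * hidden_size,
                                                   normalized=False, prefix='attention_')
```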
@@ -0,0 +1,43 @@
import numpy as np
Remove sequence loss
We need to improve the code
loss_function = SoftmaxCELoss()
loss_function.initialize(init=mx.init.Uniform(0.02), ctx=ctx)
loss_function.hybridize()
remove hybridize
loss_function.hybridize()
# print "#56"
model.initialize(init=mx.init.Uniform(0.02), ctx=ctx)
model.hybridize()
remove hybridize
model.save_params(save_path)
# raise Exception("Save Model!")

# ## TODO: evaluation and rouge
add evaluation
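As a starting point, a toy ROUGE-1 recall implementation (a real evaluation would use a proper ROUGE package with ROUGE-2/ROUGE-L, stemming, and multiple references):

```python
from collections import Counter

def rouge_1_recall(reference_tokens, summary_tokens):
    """Fraction of reference unigrams that also appear in the generated summary."""
    ref_counts = Counter(reference_tokens)
    sum_counts = Counter(summary_tokens)
    overlap = sum(min(c, sum_counts[tok]) for tok, c in ref_counts.items())
    return overlap / max(1, sum(ref_counts.values()))

# 3 of the 4 reference unigrams are covered -> 0.75
print(rouge_1_recall('the cat sat down'.split(), 'the cat sat'.split()))
```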
model = SummarizationModel(vocab=my_vocab, encoder=encoder, decoder=decoder, hidden_dim=args.hidden_dim, embed_size=args.embedding_dim, prefix='summary_')
loss_function = SoftmaxCELoss()
change to SoftmaxCrossEntropy
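Presumably this refers to Gluon's built-in `SoftmaxCrossEntropyLoss`; a minimal sketch with placeholder shapes:

```python
import mxnet as mx
from mxnet.gluon.loss import SoftmaxCrossEntropyLoss

batch_size, tgt_len, vocab_size = 4, 10, 50000
loss_function = SoftmaxCrossEntropyLoss(axis=-1, sparse_label=True)

logits = mx.nd.random.uniform(shape=(batch_size, tgt_len, vocab_size))
labels = mx.nd.random.randint(0, vocab_size, shape=(batch_size, tgt_len)).astype('float32')
loss = loss_function(logits, labels)  # shape (batch_size,), averaged over the time axis
```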
Remove bucketing; use our own dataloader.
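A sketch of what the non-bucketing pipeline could look like, assuming the dataset yields (article_ids, abstract_ids) pairs of unequal length and using gluonnlp's batchify helpers (batch size and pad values are placeholders):

```python
import gluonnlp as nlp
from mxnet import gluon

# toy dataset of (article_ids, abstract_ids) pairs
dataset = gluon.data.SimpleDataset([([1, 2, 3, 4], [5, 6]),
                                    ([7, 8], [9, 10, 11])])

batchify_fn = nlp.data.batchify.Tuple(
    nlp.data.batchify.Pad(pad_val=0),   # pad article ids within the batch
    nlp.data.batchify.Pad(pad_val=0))   # pad abstract ids within the batch

data_loader = gluon.data.DataLoader(dataset, batch_size=2, shuffle=True,
                                    batchify_fn=batchify_fn)
for articles, abstracts in data_loader:
    print(articles.shape, abstracts.shape)  # (2, 4) and (2, 3) after padding
```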
@@ -0,0 +1,343 @@
import mxnet as mx
Add test cases for encoder and decoder following https://github.com/dmlc/gluon-nlp/blob/master/tests/unittest/test_convolutional_encoder.py
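A sketch of what such a test could look like, in the style of the linked gluon-nlp tests; the SUMEncoder constructor signature and import path are assumptions, not taken from this PR:

```python
import mxnet as mx
import pytest

# from <this PR's module> import SUMEncoder  # import path depends on the final layout

@pytest.mark.parametrize('batch_size,seq_len', [(1, 10), (4, 35)])
def test_sum_encoder_output_shape(batch_size, seq_len):
    hidden_size, embed_dim = 256, 128
    encoder = SUMEncoder(hidden_size=hidden_size)   # assumed constructor arguments
    encoder.initialize()
    inputs = mx.nd.random.uniform(shape=(batch_size, seq_len, embed_dim))
    outputs, state = encoder(inputs)                # encoder returns [outputs, new_state]
    assert outputs.shape == (batch_size, seq_len, hidden_size)
```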
update
* Loader (#9)
  * use DatasetLoader
  * fix lint
  * fix bug
  * fix lint
  * fix bug
  * fix bug
  * fix lint
  * fix argument
  * skip test
  * Update test_scripts.py
  * fix bug
  * fix a bug
  * move glob to utils
  * remove amp monkey patch
  * remove .DS_Store from repo
  * fix glue test filename
  * remove root option in the Glue Task interface
  * lint fix
  * fix lint