This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Add Intro for batch + upload squad training command #1305

Merged 5 commits on Aug 22, 2020

zheyuye
Member

@zheyuye zheyuye commented Aug 20, 2020

Description

  • Upload the commands for all currently available pre-trained models to ./scripts/question_answering/commands/, and upload their corresponding log and result files to S3.
  • Add a detailed description of how to submit a batch job.
  • Upload the results of uncased_bert_large.
  • Verify the results of google_uncased_mobilebert (might need a separate issue).

Comments

Please don't merge this PR before #1302.

cc @dmlc/gluon-nlp-team


codecov bot commented Aug 20, 2020

Codecov Report

Merging #1305 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #1305   +/-   ##
=======================================
  Coverage   84.45%   84.45%           
=======================================
  Files          42       42           
  Lines        6422     6422           
=======================================
  Hits         5424     5424           
  Misses        998      998           
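
The figures in the report above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the helper name `coverage_percent` is hypothetical, and truncating to two decimals is an assumption made so the result matches the reported 84.45% (the exact ratio is about 84.4597%); how Codecov actually rounds is not stated in this report.

```python
from decimal import Decimal, ROUND_DOWN


def coverage_percent(hits: int, lines: int) -> Decimal:
    """Return hits/lines as a percentage, truncated to two decimal places.

    Truncation (ROUND_DOWN) is an assumption chosen to match the report,
    not a documented Codecov behaviour.
    """
    ratio = Decimal(hits) * 100 / Decimal(lines)
    return ratio.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


# Values copied from the coverage table.
hits, misses = 5424, 998
lines = hits + misses  # total tracked lines = hits + misses

print(lines, coverage_percent(hits, lines))
```

Because hits, misses, and lines are identical on both sides of the diff, the coverage is unchanged, which is consistent with the "will not change coverage" summary.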

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d8b68c6...925eb73.

@zheyuye zheyuye requested a review from a team as a code owner August 21, 2020 03:26
commit 7525618
Author: ZheyuYe <[email protected]>
Date:   Fri Aug 21 11:25:38 2020 +0800

    Squashed commit of the following:

    commit d8b68c6
    Author: Xingjian Shi <[email protected]>
    Date:   Thu Aug 20 08:47:56 2020 -0700

        [Numpy] Fix AWS Batch + Add Docker Support (dmlc#1302)

        * Update submit-job.py

        Add LICENSE + Examples for batch

        Update docker image

        update

        Update README.md

        Update README.md

        Update ubuntu18.04-devel.Dockerfile

        Update ubuntu18.04-devel.Dockerfile

        Update ubuntu18.04-devel.Dockerfile

        update

        Update ubuntu18.04-devel-gpu.Dockerfile

        fix

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update submit-job.py

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update ubuntu18.04-devel-gpu.Dockerfile

        update

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update ubuntu18.04-devel-gpu.Dockerfile

        update

        update

        Update submit-job.py

        Update submit-job.py

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update ubuntu18.04-devel-gpu.Dockerfile

        try to fix

        fix batch

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update submit-job.py

        Update ubuntu18.04-devel-gpu.Dockerfile

        simplify bert test

        add files

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update ubuntu18.04-devel-gpu.Dockerfile

        Update ubuntu18.04-devel-gpu.Dockerfile

        fix

        Update ubuntu18.04-devel-gpu.Dockerfile

        * Update ubuntu18.04-devel-gpu.Dockerfile

        * try to add back mxnet support

        * Update ubuntu18.04-devel-gpu.Dockerfile

        * Update ubuntu18.04-devel-gpu.Dockerfile

        * update

        * Update ubuntu18.04-devel-gpu.Dockerfile

        * Update ubuntu18.04-devel-gpu.Dockerfile

        * Update ubuntu18.04-devel-gpu.Dockerfile

        * fix issues

        * update

    commit 6ae558e
    Author: ht <[email protected]>
    Date:   Thu Aug 20 23:47:30 2020 +0800

        [FEATURE]Horovod support for training transformer (PART 2) (dmlc#1301)

        * set default shuffle=True for boundedbudgetsampler

        * fix

        * fix log condition

        * use horovod to train transformer

        * fix

        * add mirror wmt dataset

        * fix

        * rename wmt.txt to wmt.json and remove part of urls

        * fix

        * tuning params

        * use get_repo_url()

        * update average checkpoint cli

        * paste result of transformer large

        * fix

        * fix logging in train_transformer

        * fix

        * fix

        * fix

        * add transformer base config

        * fix

        * change to wmt14/full

        * print more sacrebleu info

        * fix

        * add test for num_parts and update behavior of boundedbudgetsampler with even_size

        * fix

        * fix

        * fix

        * fix logging when using horovod

        * update doc of train transformer

        * add test case for fail downloading

        * add a ShardedIterator

        * fix

        * fix

        * fix

        * change mpirun to horovodrun

        * make the horovod command complete

        * use print(sampler) to cover the codes of __repr__ func

        * empty commit

        * add test case test_sharded_iterator_even_size

        Co-authored-by: Hu <[email protected]>

commit 1403c6e
Author: ZheyuYe <[email protected]>
Date:   Fri Aug 21 11:15:44 2020 +0800

    update uncased_bert_large

commit 733a4b6
Author: ZheyuYe <[email protected]>
Date:   Thu Aug 20 20:16:39 2020 +0800

    adjust uncased_bert_large

commit 770f079
Author: ZheyuYe <[email protected]>
Date:   Thu Aug 20 15:10:57 2020 +0800

    Revert "merge xingjian's"

    This reverts commit ea1f1aa.

commit fe74dda
Author: ZheyuYe <[email protected]>
Date:   Thu Aug 20 14:07:36 2020 +0800

    update electra small

commit 8972343
Author: ZheyuYe <[email protected]>
Date:   Thu Aug 20 14:00:57 2020 +0800

    add command to readme

commit 8fcde49
Author: ZheyuYe <[email protected]>
Date:   Thu Aug 20 12:30:47 2020 +0800

    revise

commit 7a625c4
Author: ZheyuYe <[email protected]>
Date:   Thu Aug 20 12:21:58 2020 +0800

    update readme

commit 071c6dd
Author: ZheyuYe <[email protected]>
Date:   Wed Aug 19 17:14:53 2020 +0800

    update bert squad command

commit ea1f1aa
Author: ZheyuYe <[email protected]>
Date:   Tue Aug 18 18:07:01 2020 +0800

    merge xingjian's

commit 859ab4d
Author: ZheyuYe <[email protected]>
Date:   Tue Aug 18 17:47:01 2020 +0800

    dummy example

commit 633e683
Author: ZheyuYe <[email protected]>
Date:   Tue Aug 18 17:36:31 2020 +0800

    list_backbone_names

commit b4aac59
Author: ZheyuYe <[email protected]>
Date:   Tue Aug 18 17:32:51 2020 +0800

    update readme

commit 54301d9
Author: ZheyuYe <[email protected]>
Date:   Tue Aug 18 13:59:06 2020 +0800

    revise batch squad

commit e019e27
Author: ZheyuYe <[email protected]>
Date:   Tue Aug 18 13:58:49 2020 +0800

    bash convert

commit e01eda0
Author: ZheyuYe <[email protected]>
Date:   Tue Aug 18 11:10:51 2020 +0800

    update roberta

commit 1730ff7
Author: ZheyuYe <[email protected]>
Date:   Tue Aug 18 10:15:27 2020 +0800

    revise submit

commit de0b4c9
Author: ZheyuYe <[email protected]>
Date:   Mon Aug 17 16:07:58 2020 +0800

    upload batch files

commit 175de01
Author: ZheyuYe <[email protected]>
Date:   Mon Aug 17 16:05:02 2020 +0800

    fix

commit 0460ed3
Author: ZheyuYe <[email protected]>
Date:   Mon Aug 17 15:48:52 2020 +0800

    upload commands
@szha szha requested a review from sxjscience August 21, 2020 04:10
@sxjscience sxjscience merged commit 99b35d8 into dmlc:master Aug 22, 2020
@zheyuye zheyuye deleted the batch branch September 4, 2020 07:30