This repository has been archived by the owner on Jan 15, 2024. It is now read-only.
GPT2 HybridBlock #1165
Labels
bug
Something isn't working
Comments
If I unset --use-top-k, the gluonnlp result makes sense.
In addition,
why does the generated_token index start from 1? This deletes the first output token from the result. In this case, 'pretty' is deleted, and the result becomes:
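For readers unfamiliar with the option discussed above, here is a minimal sketch of what top-k sampling generally does: keep only the k highest-scoring candidates, renormalize, and sample among them. This is not the code from sequence_sampling.py; the function name `top_k_sample`, the logits, and the seed are all hypothetical and use only the Python standard library.

```python
import math
import random

def top_k_sample(logits, k, rng):
    """Sample one index from the k highest-scoring logits (hypothetical helper)."""
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only; subtract the max for numerical stability.
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

# Hypothetical scores for a 5-token vocabulary.
logits = [2.0, 0.5, -1.0, 3.0, 0.0]

rng = random.Random(0)
samples = [top_k_sample(logits, k=2, rng=rng) for _ in range(20)]

# Re-seeding reproduces the same draws, which is what makes two runs comparable.
rng_again = random.Random(0)
samples_again = [top_k_sample(logits, k=2, rng=rng_again) for _ in range(20)]
```

With k=2 only indices 0 and 3 can ever be drawn, so any divergence in the underlying scores between two implementations can change which tokens survive the cut.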
Please test with ebfc920 and the previous commit to check if that PR introduced a regression in
As far as I know, the big difference between the previous version and the current version is that the current one uses a fused GELU. Perhaps this operator makes the results look somewhat different. At the application level, however, the results are not expected to differ much, only to be computed faster.
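As an illustration of how an activation variant can shift outputs slightly, the sketch below compares the two common GELU formulations: the exact erf form and the tanh approximation. Which formulation the fused operator uses is not stated in this thread, so this is an assumption-laden illustration and not a diagnosis; the sample inputs are arbitrary.

```python
import math

def gelu_erf(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    """Tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

# Arbitrary sample inputs, chosen only for illustration.
xs = [-3.0, -1.0, -0.5, 0.5, 1.0, 3.0]

# Largest absolute gap between the two formulations on these inputs.
max_gap = max(abs(gelu_erf(x) - gelu_tanh(x)) for x in xs)
print(f"max |gelu_erf - gelu_tanh| = {max_gap:.2e}")
```

The gap per activation is small, but it is nonzero, and small differences in every layer can be enough to change which token a sampler picks.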
I also saw a difference with the RoBERTa model when comparing against the fairseq releases. This might be a related issue: #1183
Description
It seems that the hybridized GPT-2 in v0.9.0 generates different results from previous versions (where it was not a HybridBlock).
I compared the result between the sequence_sampling.py script in v0.9.0
(https://github.com/dmlc/gluon-nlp/blob/v0.9.0/scripts/text_generation/sequence_sampling.py)
and https://github.com/sxjscience/gluonnlp-gpt2/blob/master/sampling_demo.py from @sxjscience (which I originally used)
Error Message
None
To Reproduce
In https://github.com/dmlc/gluon-nlp/blob/v0.9.0/scripts/text_generation/sequence_sampling.py
I used the following parameters:
In https://github.com/sxjscience/gluonnlp-gpt2/blob/master/sampling_demo.py
I used the following parameters:
Both scripts used the model, vocab, and BPE files downloaded from the gluonnlp model library.
(gpt2_117m_openai_webtext-26416f2e.params, openai_webtext-f917dc78.vocab, openai_webtext_bpe_ranks-396d4d8e.json)
In addition, to reproduce the result, I set the random seed by
in both scripts.
In gluonnlp v0.9.0, I got the following results:
In the @sxjscience script, I got the following results:
I believe the result from the @sxjscience script makes more sense.
I simply printed out the state result from these two lines respectively:
gluon-nlp/scripts/text_generation/sequence_sampling.py
Line 172 in ef57ca0
https://github.com/sxjscience/gluonnlp-gpt2/blob/5393325d10b449cf7b263336e5dc2a5b647ad346/sampling_demo.py#L134
These are the initial model states after feeding in the input words ('I think this work' in this case), and no random factors are involved in generating the states. However, I get slightly different outputs from the two scripts.
In gluonnlp v0.9.0, I got the following results:
In the @sxjscience script, I got the following results:
Note that the first two items in the initial state list are identical between the two results, while the last two differ.
What could cause this?
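A comparison like the one described above can be made systematic by measuring the largest element-wise difference for each item in the state list. The sketch below uses only the Python standard library on nested lists; the helper names, the tolerance, and the sample values are hypothetical and are not the states printed by either script.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two nested lists of floats."""
    if isinstance(a, (list, tuple)):
        if len(a) != len(b):
            raise ValueError("shape mismatch")
        return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)
    return abs(a - b)

def compare_states(states_a, states_b, atol=1e-5):
    """Return (index, max difference, within tolerance) for each state item."""
    report = []
    for i, (sa, sb) in enumerate(zip(states_a, states_b)):
        diff = max_abs_diff(sa, sb)
        report.append((i, diff, diff <= atol))
    return report

# Hypothetical stand-ins for the four state items from each script.
states_old = [[[0.1, 0.2]], [[0.3, 0.4]], [[0.5, 0.6]], [[0.7, 0.8]]]
states_new = [[[0.1, 0.2]], [[0.3, 0.4]], [[0.5, 0.61]], [[0.7, 0.79]]]

report = compare_states(states_old, states_new)
for index, diff, ok in report:
    print(f"state[{index}]: max abs diff = {diff:.3g}, match = {ok}")
```

With real MXNet states, each NDArray could first be converted with `.asnumpy()` and compared with `numpy.allclose`. Knowing which item diverges first, and by how much, helps narrow down the layer where the two implementations part ways.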
Environment
----------pip list----------
certifi 2019.11.28
chardet 3.0.4
Cython 0.29.15
gluonnlp 0.9.0
graphviz 0.8.4
idna 2.8
mxnet 1.6.0b20200128
numpy 1.18.1
packaging 20.1
pip 19.0.3
pyparsing 2.4.6
regex 2020.2.18
requests 2.22.0
setuptools 40.8.0
six 1.14.0
urllib3 1.25.8
----------Python Info----------
Version : 3.7.3
Compiler : GCC 7.3.0
Build : ('default', 'Mar 27 2019 22:11:17')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 19.0.3
Directory : /extend_disk1/gluonnlp/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version : 1.6.0
Directory : /extend_disk1/gluonnlp/lib/python3.7/site-packages/mxnet
Num GPUs : 0
Commit Hash : a15e1b900f3f6e1fbdce62176ab3d8c806a1c2bf
----------System Info----------
Platform : Linux-4.15.0-65-generic-x86_64-with-debian-buster-sid
system : Linux
node : test3
release : 4.15.0-65-generic
version : #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6266C CPU @ 3.00GHz
Stepping: 7
CPU MHz: 3000.000
BogoMIPS: 6000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 30976K
NUMA node0 CPU(s): 0-31
NUMA node1 CPU(s): 32-63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512_vnni md_clear flush_l1d arch_capabilities