Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ocaml5-issue] Out_channel Lin test takes very long on macOS #321

Closed
jmid opened this issue Mar 28, 2023 · 6 comments
Closed

[ocaml5-issue] Out_channel Lin test takes very long on macOS #321

jmid opened this issue Mar 28, 2023 · 6 comments
Labels
ocaml5-issue A potential issue in the OCaml5 compiler/runtime

Comments

@jmid
Copy link
Collaborator

jmid commented Mar 28, 2023

As part of #318 I spotted an Out_channel test taking very long on macOS with trunk to the point of causing a timeout:
https://github.com/ocaml-multicore/multicoretests/actions/runs/4540532958/jobs/8001562877?pr=318

random seed: 419543217
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s Lin DSL In_channel test with Domain
[ ]    0    0    0    0 / 1000     0.0s Lin DSL In_channel test with Domain (generating)
[✓]   54    0    1   53 / 1000     6.2s Lin DSL In_channel test with Domain

[ ]    0    0    0    0 / 1000     0.0s Lin DSL Out_channel test with Domain
[ ]  215    0    0  215 / 1000    92.8s Lin DSL Out_channel test with Domain
[ ]  302    0    0  302 / 1000   154.7s Lin DSL Out_channel test with Domain
[ ]  384    0    0  384 / 1000   215.6s Lin DSL Out_channel test with Domain
[ ]  407    0    0  407 / 1000   349.9s Lin DSL Out_channel test with Domain
[ ]  417    0    0  417 / 1000   741.2s Lin DSL Out_channel test with Domain
[ ]  487    0    0  487 / 1000  3128.8s Lin DSL Out_channel test with Domain
[ ]  567    0    0  567 / 1000  9321.4s Lin DSL Out_channel test with Domain
[ ]  585    0    0  585 / 1000  9552.4s Lin DSL Out_channel test with Domain
Error: The operation was canceled.

That's 2h40m - to test 585 random triples (modulo some missing output flushing) without any shrinking going on.
Of those, 80 inputs took ~100m (~6000s) from input 487 to 567. This looks like a runtime issue, doesn't it?

@jmid
Copy link
Collaborator Author

jmid commented Mar 28, 2023

I had previously observed a problem shrinking Out_channel tests.
That this happens without any shrinking, indicates a runtime issue instead 🤔

@jmid
Copy link
Collaborator Author

jmid commented Mar 28, 2023

Also spotted on macOS 5.0.0
https://github.com/ocaml-multicore/multicoretests/actions/runs/4542931038/jobs/8006976703?pr=316

random seed: 423960020
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s Lin DSL In_channel test with Domain
[ ]    0    0    0    0 / 1000     0.0s Lin DSL In_channel test with Domain (generating)
[✓]   15    0    1   14 / 1000    13.8s Lin DSL In_channel test with Domain

[ ]    0    0    0    0 / 1000     0.0s Lin DSL Out_channel test with Domain
[ ]   61    0    0   61 / 1000   227.6s Lin DSL Out_channel test with Domain
[ ]   90    0    0   90 / 1000  9110.8s Lin DSL Out_channel test with Domain
[ ]  141    0    0  141 / 1000  9190.7s Lin DSL Out_channel test with Domain
[ ]  203    0    0  203 / 1000  9427.5s Lin DSL Out_channel test with Domain
[✓]  226    0    1  225 / 1000  9440.3s Lin DSL Out_channel test with Domain

This has a jump of nearly 9000s on testing 29 inputs (from 61 to 90)

@jmid jmid added the ocaml5-issue A potential issue in the OCaml5 compiler/runtime label Mar 29, 2023
@jmid
Copy link
Collaborator Author

jmid commented Apr 13, 2023

I tried giving ocaml/ocaml#11903 a spin to see whether it would solve this issue.
Apparently not: https://github.com/ocaml-multicore/multicoretests/actions/runs/4688674935/jobs/8309467768

random seed: 478012744
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s Lin DSL In_channel test with Domain
[ ]    0    0    0    0 / 1000     0.0s Lin DSL In_channel test with Domain (generating)
[✓]  131    0    1  130 / 1000    17.8s Lin DSL In_channel test with Domain

[ ]    0    0    0    0 / 1000     0.0s Lin DSL Out_channel test with Domain
[ ]   23    0    0   23 / 1000   466.5s Lin DSL Out_channel test with Domain
[ ]  135    0    0  135 / 1000   586.1s Lin DSL Out_channel test with Domain
[ ]  199    0    0  199 / 1000   646.2s Lin DSL Out_channel test with Domain
[ ]  348    0    0  348 / 1000   715.3s Lin DSL Out_channel test with Domain
[ ]  349    0    0  349 / 1000  2769.2s Lin DSL Out_channel test with Domain
[ ]  491    0    0  491 / 1000  2830.5s Lin DSL Out_channel test with Domain
[ ]  655    0    0  655 / 1000  2890.5s Lin DSL Out_channel test with Domain
[ ]  675    0    0  675 / 1000  3105.7s Lin DSL Out_channel test with Domain
[ ]  695    0    0  695 / 1000  3240.3s Lin DSL Out_channel test with Domain
[ ]  823    0    0  823 / 1000  3302.3s Lin DSL Out_channel test with Domain
[ ]  862    0    0  862 / 1000  3532.9s Lin DSL Out_channel test with Domain
[ ]  980    0    0  980 / 1000  3592.9s Lin DSL Out_channel test with Domain
[✗] 1000    0    0 1000 / 1000  3601.6s Lin DSL Out_channel test with Domain

Notice the jump between input 348 and 349, taking 2000 seconds...

@shym
Copy link
Collaborator

shym commented Apr 20, 2023

There is definitely something wrong in there. Trying to look into a specific case of this Out_channel test entering seemingly a livelock in the debug runtime, I finally ran into:

$ _build/default/src/io/lin_tests_dsl_domain.exe -v -s 160687358
### OCaml runtime: debug mode ###
random seed: 160687358
generated error fail pass / total     time test name
[✓]    3    0    1    2 / 1000    16.2s Lin DSL In_channel test with Domain
[ ]    8    0    0    8 / 1000    18.7s Lin DSL Out_channel test with Domain (shrinking:   26.0008)
[02] file runtime/domain.c; line 310 ### Assertion failed: s->running

(on Linux, OCaml 5.0.0; the same seed seems to get stuck (one way or another) regularly also on trunk)

@jmid
Copy link
Collaborator Author

jmid commented Jun 5, 2023

Just observed a similar behaviour on Windows bytecode trunk:
https://github.com/ocaml-multicore/multicoretests/actions/runs/5174368273/jobs/9320602093

random seed: 324902162
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s Lin DSL In_channel test with Domain
[ ]    0    0    0    0 / 1000     0.0s Lin DSL In_channel test with Domain (generating)
[✓]    6    0    1    5 / 1000    22.6s Lin DSL In_channel test with Domain

[ ]    0    0    0    0 / 1000     0.0s Lin DSL Out_channel test with Domain
[ ]   16    0    0   16 / 1000    43.6s Lin DSL Out_channel test with Domain
[ ]   47    0    0   47 / 1000   107.5s Lin DSL Out_channel test with Domain
[ ]   65    0    0   65 / 1000   169.3s Lin DSL Out_channel test with Domain
[ ]  104    0    0  104 / 1000   231.8s Lin DSL Out_channel test with Domain
[ ]  129    0    0  129 / 1000   291.9s Lin DSL Out_channel test with Domain
[ ]  166    0    0  166 / 1000   353.2s Lin DSL Out_channel test with Domain
[ ]  211    0    0  211 / 1000   418.4s Lin DSL Out_channel test with Domain
[ ]  238    0    0  238 / 1000   519.7s Lin DSL Out_channel test with Domain
[ ]  298    0    0  298 / 1000   579.8s Lin DSL Out_channel test with Domain
[ ]  367    0    0  367 / 1000   641.3s Lin DSL Out_channel test with Domain
[ ]  427    0    0  427 / 1000   701.4s Lin DSL Out_channel test with Domain
[ ]  509    0    0  509 / 1000   762.3s Lin DSL Out_channel test with Domain
[ ]  585    0    0  585 / 1000   822.7s Lin DSL Out_channel test with Domain
[ ]  619    0    0  619 / 1000   912.1s Lin DSL Out_channel test with Domain
[ ]  670    0    0  670 / 1000  1000.2s Lin DSL Out_channel test with Domain
[ ]  709    0    0  709 / 1000  1475.1s Lin DSL Out_channel test with Domain (shrinking:    0)
[ ]  709    0    0  709 / 1000  3952.3s Lin DSL Out_channel test with Domain (shrinking:    1)
[ ]  709    0    0  709 / 1000  4320.7s Lin DSL Out_channel test with Domain (shrinking:    1.0002)
[ ]  709    0    0  709 / 1000  8293.3s Lin DSL Out_channel test with Domain (shrinking:    1.0003)
^CFatal error: exception User interruption
[ ]  709    0    0  709 / 1000  8503.8s Lin DSL Out_channel test with Domain (shrinking:    2)Terminate batch job (Y/N)? 
Error: The operation was canceled.

The next to last shrinking step (from 1.0002 to 1.0003) spends 3972s on 1 successful shrink step (there may be several unsuccessful ones though) - that's 66 minutes - over an hour!

@jmid
Copy link
Collaborator Author

jmid commented Jan 23, 2024

Closing by replacing the Lin test with an STM one in #431

@jmid jmid closed this as completed Jan 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ocaml5-issue A potential issue in the OCaml5 compiler/runtime
Projects
None yet
Development

No branches or pull requests

2 participants