
GitHub: enforce-clang-format is sporadically hanging #257

Closed
StephanTLavavej opened this issue Nov 5, 2019 · 16 comments
Labels: bug (Something isn't working), infrastructure (Related to repository automation), not reproducible (We can't reproduce the described behavior)

Comments

@StephanTLavavej (Member) commented Nov 5, 2019

For #20191105.2, I observed BION-TR-GH-4 hanging on the x64 clang-format task. After rerunning, #20191105.3 scheduled x64/clang-format on BION-TR-GH-2, which completed quickly.

For #20191104.6, I observed a hang, but it appears that Azure Pipelines deleted the build after it was cancelled, so I can no longer see which agent was responsible for running clang-format.

@StephanTLavavej added the bug and infrastructure labels on Nov 5, 2019
@StephanTLavavej (Member Author)

@BillyONeal, your GH-n machine names autolink to issues when mentioned.

@BillyONeal (Member)

BahahahHahah

@StephanTLavavej added the help wanted label on Nov 5, 2019
@StephanTLavavej (Member Author)

I suspect that parallelize may be involved. Charlie asked whether it's necessary in #255 (comment). A possible improvement would be to remove parallelize and split the support tools, clang-format, and validation into a separate job so they don't block the main build.

@BillyONeal (Member)

Splitting it into a separate job won't help unless we have more than 4 build machines. With 4 machines, moving it out will increase build time because it requires an extra repository download.

@StephanTLavavej (Member Author)

Copying your comment from #255 (comment) to ensure it's not lost:

It is not fast; without parallelization, it is the most expensive part of the build. Considering that the parallelism tech is also going to be used in the test harness, if it's broken we need to fix it.

@BillyONeal (Member)

I think this problem might just be due to the hardware it's running on, or the dogfood Windows build it's running at this time.

@StephanTLavavej (Member Author)

Can we use production Windows for production infrastructure? (I realize that this will be moot very soon after we switch to using VMs.)

@BillyONeal (Member)

When the machine I had running production Windows died, this is the machine I had to use :)

@StephanTLavavej (Member Author)

#20191107.13 triggered the 5-minute timeout, but it looks like it was making progress:

```
464 scheduled; 400 completed; 8 running
464 scheduled; 401 completed; 8 running
464 scheduled; 402 completed; 8 running
464 scheduled; 403 completed; 8 runningTerminate batch job (Y/N)?
If your build fails here, you need to format the following files with:
clang-format version 9.0.0 (tags/RELEASE_900/final)
clang-format will produce the following diff:
##[error]The task has timed out.
Finishing: Enforce clang-format
```

I don't know if it was making rapid progress and then started hanging, or if it was making very slow progress and then encountered the timeout. Should parallelize print timestamps?
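
For illustration, here is a minimal sketch of what timestamped progress lines might look like, assuming a simple reporting helper; it is not the actual parallelize source:

```cpp
// Hypothetical sketch (not the real parallelize code): prefix each
// progress line with a wall-clock timestamp.
#include <cstdio>
#include <ctime>

void print_progress(int scheduled, int completed, int running) {
    const std::time_t now = std::time(nullptr);
    std::tm local{};
    localtime_s(&local, &now); // MSVC-style localtime_s(tm*, const time_t*)
    char stamp[16];
    std::strftime(stamp, sizeof(stamp), "%H:%M:%S", &local);
    std::printf("[%s] %d scheduled; %d completed; %d running\n",
        stamp, scheduled, completed, running);
}

int main() {
    print_progress(464, 403, 8); // e.g. "[14:02:11] 464 scheduled; 403 completed; 8 running"
}
```

Timestamps would distinguish rapid progress that suddenly stops from uniformly slow progress that merely runs into the timeout.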

@robert-andrzejuk commented Nov 8, 2019

Don't know if it's relevant for this case, as this is related to MSVC:
I have the latest LLVM clang-format configured to be used by MSVC for all formatting scenarios.
Sometimes formatting hangs when it tries to format a file that is in the process of being saved (in effect, a race condition). After a timeout I have to repeat the process and format again.
Unfortunately, I don't have a reproducible example :-(
This could be an issue with clang-format not handling access to files correctly.
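
Purely as illustration of the kind of workaround such a race would need, here is a hypothetical Win32 sketch that retries an open while the file is momentarily locked by an in-progress save (this is not clang-format's actual code):

```cpp
// Hypothetical workaround sketch: retry opening a file that an editor
// is still writing, backing off on sharing violations only.
#include <windows.h>

HANDLE open_with_retry(const wchar_t* path, int attempts) {
    for (int i = 0; i < attempts; ++i) {
        const HANDLE h = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ,
            nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (h != INVALID_HANDLE_VALUE) {
            return h; // opened successfully
        }
        if (GetLastError() != ERROR_SHARING_VIOLATION) {
            break; // a different failure; retrying won't help
        }
        Sleep(100); // back off while the save completes
    }
    return INVALID_HANDLE_VALUE;
}
```

Failing fast on anything other than ERROR_SHARING_VIOLATION keeps the retry loop from masking genuine errors.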

@BillyONeal (Member)

Interesting. Maybe I could add timeout and retry to parallelize.exe....
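
A minimal sketch of that idea, assuming parallelize holds a Win32 process handle per task (the real tool's internals may differ):

```cpp
// Hypothetical sketch: wait for a task's process with a timeout; on
// timeout, kill it and report failure so the caller can relaunch it.
#include <windows.h>

bool wait_or_kill(HANDLE process, DWORD timeout_ms) {
    if (WaitForSingleObject(process, timeout_ms) == WAIT_OBJECT_0) {
        return true; // task finished within the timeout
    }
    TerminateProcess(process, 1);           // assume it's hung; kill it
    WaitForSingleObject(process, INFINITE); // wait for teardown to finish
    return false; // caller can retry by launching the task again
}
```

Killing and relaunching a stuck clang-format invocation would turn a whole-job hang into, at worst, a retried task.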

@BillyONeal (Member)

Hmmm this has not happened in the last week or so :/

@StephanTLavavej added the not reproducible label and removed the help wanted label on Nov 12, 2019
@StephanTLavavej (Member Author)

Let’s keep an eye out for timeout failures. Otherwise, let’s close this as No Repro.

@BillyONeal (Member)

Spooky action at a distance :)

@StephanTLavavej removed the not reproducible label on Nov 13, 2019
@StephanTLavavej (Member Author) commented Nov 13, 2019

This happened again with #20191113.1 on BION-TR-GH-1. Billy found that parallelize.exe didn't launch, but my aggressive 5-minute timeout prevented him from getting a memory dump. We're going to increase the timeout to 60 minutes and investigate the next hang.

@StephanTLavavej added the not reproducible label on Dec 18, 2019
@StephanTLavavej (Member Author)

We haven't seen this since increasing the timeout and migrating our infrastructure to VMs.
