-
-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run plugins' test suites with server in the same process #1628
Conversation
Tests on Windows seemd got stuck again: Isn't it enough to set |
You also need to build with |
Mmm, haskell-language-server/plugins/hls-eval-plugin/hls-eval-plugin.cabal Lines 95 to 100 in 1f12a61
|
d29c7dd
to
405635c
Compare
So far, tests of plugins on linux and macOS look great, but on Windows, they seem to hang100%. I don't have Windows machine with development environment setup, no idea how to debug this. |
I'll try to take a look in my local win 10 |
Unfortunately tests pass in my local windows 10, after a first try with some |
Ugh, it's still getting stuck... BTW definition and hover tests of ghcide is fairly unstable, and it even fails on my machine constantly: #1626 (comment) |
But this is the same problem that CI shows, isn't it? Tests are not stuck, they are failing in Windows GHC 8.10.4. "resource vanished - broken pipe" means that HLS stopped running. Either it was killed, it crashed, or it decided to exit on its own. Why is HLS crashing in Windows 8.10.4 all of a sudden? Has anything changed? |
Yeah, i thought they stuck without no response until a ci timeout, without reviewing the logs,thanks for pointing it out.
It is failing since some time ago, although it seems in this pr is worse. I have been reviewing the gh actions and dont see a pattern. There are recent builds with succesful test run at first like https://github.com/haskell/haskell-language-server/runs/2197782538?check_suite_focus=true |
Sometimes a dying server (after the test session finished successfully) throws this error:
Does it mean the server received two exit messages? |
This PR didn't make any changes to the ghcide test suite, so I don't see the connection. Unfortunately master has the getting-stuck problem which means we cannot see if the Windows issue is specific to this PR or not |
Yes, I think so. Perhaps this is a new feature of lsp-test. Great find! Please fix the crash by replacing the |
One difference between local machine and Windows CI I noticed (by checking lsp messages log one by one) was that in CI, there were no hiedb related messages sent from the server, such as |
But according to the lsp messages log, the client sends exactly one exit message 🤔 Anyway, I don't see the reason that we need |
You might want to investigate/patch lsp-test. I believe the resource vanished error comes from this line: https://github.com/haskell/lsp/blob/6dd6ad630ddb4a08a2bbcd6f71588f0864a555b6/lsp-test/src/Language/LSP/Test/Session.hs#L440 It might be good to add a catch there, or at least print out more information for debugging (like the message we are trying to send). |
hls-test-utils/src/Test/Hls.hs
Outdated
timeout 3 (wait server) >>= \case | ||
Just () -> pure () | ||
Nothing -> putStrLn "Server does not exit on time, canceling the async task..." >> cancel server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests on Windows passed with this. We should investigate why the server didn't exit properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just looking at the exit code in ghcide and I think it might be time to update it. I'm not sure that the waitAnyCancel
construction is needed any more, and it could be hiding exceptions coming from the lsp
library:
haskell-language-server/ghcide/src/Development/IDE/LSP/LanguageServer.hs
Lines 114 to 120 in a4dee2d
void $ waitAnyCancel =<< traverse async | |
[ void $ LSP.runServerWithHandles | |
inH | |
outH | |
serverDefinition | |
, void $ waitBarrier clientMsgBarrier | |
] |
EDIT: according to the docs, exceptions will be re-thrown to the parent thread, so maybe it's harmless
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another problem is that defaultMain
does not call setupLogger
so we might be missing lsp errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-- These barriers are signaled when the threads reading from these chans exit. | |
-- This should not happen but if it does, we will make sure that the whole server | |
-- dies and can be restarted instead of losing threads silently. | |
clientMsgBarrier <- newBarrier |
These comments match the old code:
haskell-language-server/ghcide/src/Development/IDE/LSP/Server.hs
Lines 73 to 99 in fa671cb
void $ waitAnyCancel =<< traverse async | |
[ void $ LSP.runWithHandles | |
stdin | |
newStdout | |
( const $ Right () | |
, handleInit (signalBarrier clientMsgBarrier ()) clientMsgChan | |
) | |
(handlers clientMsgChan) | |
options | |
Nothing | |
, void $ waitBarrier clientMsgBarrier | |
] | |
where | |
handleInit :: IO () -> TChan LSP.FromClientMessage -> LSP.LspFuncs () -> IO (Maybe LSP.ResponseError) | |
handleInit exitClientMsg clientMsgChan lspFuncs@LSP.LspFuncs{..} = do | |
Handlers{..} <- getHandlers lspFuncs | |
let requestHandler' (req, reqId) = requestHandler | |
(\res -> ResponseMessage "2.0" (responseId reqId) (Just res) Nothing) | |
(\err -> ResponseMessage "2.0" (responseId reqId) Nothing (Just $ ResponseError err "" Nothing)) | |
req | |
_ <- flip forkFinally (const exitClientMsg) $ forever $ do | |
msg <- atomically $ readTChan clientMsgChan | |
case convClientMsg msg of | |
Nothing -> Logger.logSeriousError loggerH $ "Unknown client msg: " <> T.pack (show msg) | |
Just (Left notif) -> notificationHandler notif | |
Just (Right req) -> sendFunc =<< requestHandler' req | |
pure Nothing |
but now this barrier will be signalled when the server receives exit message instead.
I think we could try removing the waitAnyCancel
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, now we signal the barrier from two places - the exit handler and also if the initialisation fails, which probably explains the error that you were seeing earlier.
I'm not convinced that removing it will actually fix anything. runServerWithHandles
doesn't actually exit until the input stream is closed, and I don't know whether lsp-test
closes the stream until the server exits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that lsp-test
won't close the handle for us (in our case mServerProc
is Nothing
):
haskell-language-server/lsp-test/src/Language/LSP/Test/Session.hs
Lines 282 to 294 in 7a2ff3a
let cleanup | |
| Just sp <- mServerProc = do | |
-- Give the server some time to exit cleanly | |
-- It makes the server hangs in windows so we have to avoid it | |
#ifndef mingw32_HOST_OS | |
timeout msgTimeoutMs (waitForProcess sp) | |
#endif | |
cleanupProcess (Just serverIn, Just serverOut, Nothing, sp) | |
| otherwise = pure () | |
finally (timeout msgTimeoutMs (runSession' exitServer)) | |
-- Make sure to kill the listener first, before closing | |
-- handles etc via cleanupProcess | |
(killThread tid >> cleanup) |
But I didn't find any manual close in the test of lsp-test
. Maybe @wz1000 could help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure the handle closes when the server exits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The handle closes when the server exits, sure, but as I said runServerWithHandles
doesn't exit until the stream ends, so it won't exit.
I observed some snippets from debug log, like this:
|
That is not necessarily an issue, helper threads like the reactor thread or the hiedb thread would fail like that when the main thread exits the lsp loop and no longer keeps a reference to the shared state |
The server sends "Finishing build session..." using haskell-language-server/ghcide/src/Development/IDE/Core/Shake.hs Lines 726 to 735 in 252c500
haskell-language-server/ghcide/src/Development/IDE/Core/Shake.hs Lines 673 to 677 in 252c500
And it seems that the server still reports progress after shutting down the shake. Should we call haskell-language-server/ghcide/src/Development/IDE/Core/Shake.hs Lines 618 to 624 in 252c500
Also, is it possible to disable lsp capabilities when we want server die? A piece of log from Windows CI:
|
This output is caused by So it's probably not a big deal |
The barrier did be signaled two times. Is this exit what we expected?
Full test log:
|
I thought you replaced this barrier with an MVar at some point - that change makes sense to me, since we call it from two places and therefore it shouldn't be a barrier. The "thread blocked indefinitely in an MVar operation" error is ugly - it means that the main loop has finished and no longer holds a reference to the shared state, but there are background threads still waiting on MVars. It would be good to identify which threads and clean them up properly, or just make sure we use |
@@ -107,22 +107,6 @@ library | |||
TypeOperators, | |||
ViewPatterns | |||
|
|||
|
|||
executable test-server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an excellent change. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, thanks!
Branch protection feels unhappy with the skipped nix jobs :( |
Umm, i have to change it to skip steps instead the entire job, like #1656 |
silenceStderr :: IO a -> IO a | ||
silenceStderr action = withTempFile $ \temp -> | ||
bracket (openFile temp ReadWriteMode) hClose $ \h -> do | ||
old <- hDuplicate stderr | ||
buf <- hGetBuffering stderr | ||
h `hDuplicateTo'` stderr | ||
action `finally` do | ||
old `hDuplicateTo'` stderr | ||
hSetBuffering stderr buf | ||
hClose old |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that we can no longer use LSP_TEST_LOG_STDERR=1
to get more info about a failing test, since silencing stderr is now hard coded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//cc @berberman
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I attempted this for all the ghcide testsuite, I installed a logger that sends output to the LSP channel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably check for the environment variable manually and pretend that it was lsp-test that did it, to minimize disturbance of people's workflows.
(Here's lsp-test's code for it: https://github.com/haskell/lsp/blob/e707cbf5ca7077f70884ae0d2a8d016aa30ced5a/lsp-test/src/Language/LSP/Test.hs#L267-L275)
That way the current uses, like in ci doesn't have to change. Regardless of what we do, it should probably be documented together with the LSP_TEST_LOG_MESSAGES
env variable in the contributing section, which it is currently not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, pr's welcome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that we can no longer use
LSP_TEST_LOG_STDERR=1
to get more info about a failing test, since silencing stderr is now hard coded?
Yes, output from lsp server is completely removed in test suites which use this function since that change, because there were too many logs were printed not through the logger, but were directly written to stderr, messing up testing status. So setting environment variable LSP_TEST_LOG_MESSAGES actually won't work. However, this function is a temporary workaround, and we should cleanup such code in server.
Following #1617
Hope this would make tests more stable