-
-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race condition that caused LaunchAsync to never resolve for chrome #2214
Conversation
From my tests, FirefoxTargetManager doesn't seem affected by the same race condition. |
I'm taking a few days off. I'll take a look at this on Monday. Thanks! |
I spent some hours trying to understand why some unit tests were failing with my previous commit. It looks like the await inside of the I think this is caused by the I tried making In my opinion, the code in |
I'm looking at your findings in #2202.
The problem here is that I think it's not the only place where we send a message inside an event. Thank you for your time! |
I know it's tricky, I tried changing the code to be synchronous and hit a few deadlocks. But today I was able to change |
Ok, I compared the code with the upstream puppeteer, and I believe the upstream code does not await on the The call to I also have a clearer understanding of why this might not be an issue in upstream. There are some difference in semantics between async in js and .net. One key difference is that js is single-threaded, so it's not possible in js to have a thread switch between the execution of As I'm feeling more confident, and seeing that it improves the test results, I've included the changes for not awaiting |
I'm not being able to get a full green in AppVeyor :( |
I know. I'm trying to reproduce the failures in my local machine, but so far haven't been able to. |
I've been running test cases over and over to detect race conditions / deadlocks. I'll document what I find here. Note that I've added logging in multiple places, and line numbers might not match exactly what's committed. Today I found the following exception led to a deadlock when executing PuppeteerSharp.Tests.LauncherTests.PuppeteerConnectTests.ShouldBeAbleToCloseRemoteBrowser:
This is what seems to have happened.
Stack of the deadlocked thread (TaskQueue appearing twice):
My thoughts on this:
|
Another race condition.
Thoughts:
|
The race-condition of attaching event handler(s) before populating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Sure! |
Great finding 🚀 Consider extracting this a separate PR. |
Only process received messages after the discovered targets have been stored.
…sages unchanged. This fixes some of the unit tests, which were failing due to changes in the order of execution of initialization messages.
…sing return when ignoring a target.
…_logger field is initialized. * Fixed a deadlock that could happen if the connection is closed on the thread that is processing received messages. TaskQueue could not be disposed on the same thread that held the semaphore. * Fixed a race condition if targets are created/changed concurrently before the TargetHandler is registered as an event handler.
… thread A could invoke `TaskQueue.Dispose()` and set `_isDisposed = 1`, which would then allow thread B to finish work setting `_held = false` but without releasing the semaphore, and then thread A would attempt `_semaphore.Wait()` entering a deadlock.
…t having discovered all targets. This was causing unit tests such as PuppeteerConnectTests.ShouldSupportTargetFilter to fail because the test executed faster than the target discovery.
791f324
to
84c3dd5
Compare
@jnyrup I'll wait for your approval here. AppVeyor seems to go in the green direction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look well-argued to me.
I just have some questions to ensure your fixes don't regress anything else.
public ConcurrentDictionary<string, Target> GetAvailableTargets() => _attachedTargetsByTargetId; | ||
|
||
public async Task InitializeAsync() | ||
{ | ||
_ = _connection.SendAsync("Target.setDiscoverTargets", new TargetSetDiscoverTargetsRequest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a benefit of moving the call to Target.setDiscoverTargets
in InitializeAsync
instead of having it in the constructor as it is in puppeteer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't fully understand this, so I'll have to run more tests to get a better grip at it. But the test PuppeteerConnectTests.ShouldSupportTargetFilter was failing randomly. This test connects to a browser that already has 3 tabs open.
What I found was that the messages of the targets were being processed concurrently, even before the event handlers in AttachAsync
were added. Again, ultimately this is due to the multi-threaded .NET vs single-threaded JS difference, as in JS the constructor and the event handlers would be added without any concurrent code.
I will get back to you with a more thorough explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the addition of _targetDiscoveryCompletionSource
to ensure it has completed before anything else tries to consume _targetsIdsForInit
which StoreExistingTargetsForInit
populates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another idea, if we move Target.setDiscoverTargets
to InitializeAsync
anyways, can we use modern async/await syntax and avoid _targetDiscoveryCompletionSource
or am I missing some concurrency again?
public async Task InitializeAsync()
{
var setDiscoverTargetsTask = _connection.SendAsync("Target.setDiscoverTargets", new TargetSetDiscoverTargetsRequest
{
Discover = true,
Filter = new[]
{
new TargetSetDiscoverTargetsRequest.DiscoverFilter()
{
Type = "tab",
Exclude = true,
},
new TargetSetDiscoverTargetsRequest.DiscoverFilter(),
},
});
await _connection.SendAsync("Target.setAutoAttach", new TargetSetAutoAttachRequest()
{
WaitForDebuggerOnStart = true,
Flatten = true,
AutoAttach = true,
}).ConfigureAwait(false);
try
{
await setDiscoverTargetsTask.ConfigureAwait(false);
StoreExistingTargetsForInit();
}
catch (Exception ex)
{
_logger.LogError(ex, "Target.setDiscoverTargets failed");
}
FinishInitializationIfReady();
await _initializeCompletionSource.Task.ConfigureAwait(false);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jnyrup I have reverted these changes and couldn't reproduce any failure in 100 executions of PuppeteerConnectTests.ShouldSupportTargetFilter. Let's see how it behaves in AppVeyor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out awaiting for _targetDiscoveryCompletionSource
is needed before calling FinishInitializationIfReady()
, otherwise FinishInitializationIfReady
could execute before any targets have been added to the _targetsIdsForInit
collection and the initialization would be considered completed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with how it's done now 👍
... but if we moved _connection.SendAsync("Target.setDiscoverTargets"
to InitializeAsync
again and awaited it, like in my example above, I'm failing to see how we can reach FinishInitializationIfReady
before setDiscoverTargetsTask
has completed, which should remove the need for _targetDiscoveryCompletionSource
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jnyrup had a closer look at your code above.
The change seems fine, and it simplifies a few things - for example it uses a try/catch which is more natural than checking t.IsFaulted
. It's also closer to how FirefoxTargetManager works.
However, the _targetDiscoveryCompletionSource
is still needed, to signal other threads that may be processing a message and executing OnAttachedToTarget
.
It also has the downside of diverging from upstream.
I can commit these changes if you agree it's a better solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, it's those events messing with my brain again...
I now (again) see why _targetDiscoveryCompletionSource
still is necessary.
I'll let kblok decide whether to call _connection.SendAsync("Target.setDiscoverTargets"
in the constructor of InitializeAsync
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that in this PR, _targetDiscoveryCompletionSource
is more important than the location of Target.setDiscoverTargets
. I would leave Target.setDiscoverTargets
in the constructor as we have in upstream.
We are making some important changes (good changes). But it would be nice to see if we do need to move it to InitializeAsync
in isolation. If we see that we need it, we can move it in a new PR.
This PR is looking great. I would merge it after the new WithTimeout
s. Thank you @leonardo-fernandes, @jnyrup for the hard work here.
…hat new targets are added to `_attachedTargetsByTargetId` inside of the semaphore. Also fixes `Page.CloseAsync()` which was returning before `Target.CloseTask` resolved. This affected BrowserContextTests.ShouldFireTargetEvents on which it was possible for the test to finish before the `TargetDestroy` event.
It was possible for the InitializeAsync to finish without all targets being initialized, and consequently the test would read an empty list of targets. The _targetDiscoveryCompletionSource should be awaited before logic that depends on _targetsIdsForInit inside of message processing, to make sure this collection was already initialized during the browser launch.
…stingPage. Disposing the `browser1` was closing the page, which then caused the `Page.CloseAsync()` in `PuppeteerPageBaseTest` to fail. The test code now matches upstream puppeteer.
Anyone has any idea why FrameWaitForFunctionTests.ShouldPollOnMutation is failing in ubuntu? I can't find any explanation for that failure, unless something is really broken with I tried reproducing it on an ubuntu VM and was not able to get it to fail. EDIT: what were the chances? after adding some logging, this test failed in my local machine for the first time. What is happening is that the first When the test fails, it is because the first This looks like it's mostly an issue with the test itself, and not related to the changes I've done so far. So I'll think on how it can be fixed, and create a separate PR if I find a good solution that doesn't require diverging from upstream so much. |
await Client.Connection.SendAsync("Target.closeTarget", new TargetCloseTargetRequest | ||
{ | ||
TargetId = Target.TargetId, | ||
}).ContinueWith(task => ((Target)Target).CloseTask, TaskScheduler.Default); | ||
} | ||
}).ConfigureAwait(false); | ||
|
||
_logger.LogWarning("Protocol error: Connection closed. Most likely the page has been closed."); | ||
return Task.CompletedTask; | ||
await ((Target)Target).CloseTask.ConfigureAwait(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm noticing that we now only execute ((Target)Target).CloseTask
if Client.Connection.SendAsync("Target.closeTarget"
completes successfully.
Should we still execute ((Target)Target).CloseTask
if Client.Connection.SendAsync("Target.closeTarget"
throws an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct, I based it on upstream.
If Target.closeTarget
fails, it's very likely that TargetManager_TargetGone
will not be invoked, and the await of CloseTask
would block the code indefinitely.
public ConcurrentDictionary<string, Target> GetAvailableTargets() => _attachedTargetsByTargetId; | ||
|
||
public async Task InitializeAsync() | ||
{ | ||
_ = _connection.SendAsync("Target.setDiscoverTargets", new TargetSetDiscoverTargetsRequest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with how it's done now 👍
... but if we moved _connection.SendAsync("Target.setDiscoverTargets"
to InitializeAsync
again and awaited it, like in my example above, I'm failing to see how we can reach FinishInitializationIfReady
before setDiscoverTargetsTask
has completed, which should remove the need for _targetDiscoveryCompletionSource
.
{ | ||
_discoveredTargetsByTargetId.TryRemove(e.TargetId, out var targetInfo); | ||
await _targetDiscoveryCompletionSource.Task.ConfigureAwait(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although we are trying to enforce this. What do you think about using WithTimeout
, and throwing an exception? If we fail to complete this task somehow, the code will get stuck here.
The same for the other places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that _targetDiscoveryCompletionSource
will be set if and only if the message Target.setDiscoverTargets
completes. I can't think of any scenario where it wouldn't complete.
There's no code that uses a timeout value in ChromeTargetManager. Where should I derive the timeout value from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use Puppeteer.DefaultTimeout
. It's quite a long timeout. We want to prevent the code from hanging.
@leonardo-fernandes Since you seem to excel at this concurrency stuff, you might also want to give |
…nSource`, as a defensive measure against deadlocks.
await _targetDiscoveryCompletionSource.Task.ConfigureAwait(false); | ||
await EnsureTargetsIdsForInit().ConfigureAwait(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is way better than looking for the variable every time. Thanks! Let's wait for the green!
@leonardo-fernandes @jnyrup I consider this PR approved. Let's create a follow-up PR if you find more things to improve. Side Note: You can't image, well, I bet you @jnyrup can imagine what it means for a community project to have more developers looking at this code. Thank you! |
hardkoded#2214) * Fix hardkoded#2202 Only process received messages after the discovered targets have been stored. * Change the await to only apply to attachedToTarget, leaving other messages unchanged. This fixes some of the unit tests, which were failing due to changes in the order of execution of initialization messages. * Remove the await for OnAttachedToTarget call, and also included a missing return when ignoring a target. * * Fixed a race condition if a message is received before the Browser._logger field is initialized. * Fixed a deadlock that could happen if the connection is closed on the thread that is processing received messages. TaskQueue could not be disposed on the same thread that held the semaphore. * Fixed a race condition if targets are created/changed concurrently before the TargetHandler is registered as an event handler. * Previous commit introduced a new race condition. It was possible that thread A could invoke `TaskQueue.Dispose()` and set `_isDisposed = 1`, which would then allow thread B to finish work setting `_held = false` but without releasing the semaphore, and then thread A would attempt `_semaphore.Wait()` entering a deadlock. * It was possible for the TargetManager initialization to finish without having discovered all targets. This was causing unit tests such as PuppeteerConnectTests.ShouldSupportTargetFilter to fail because the test executed faster than the target discovery. * PR review * Rolling back Target.setDiscoverTargets to be sent from the constructor * Handle exceptions in OnAttachedToTarget * OnAttachedToTarget should be executed synchronously if possible, so that new targets are added to `_attachedTargetsByTargetId` inside of the semaphore. Also fixes `Page.CloseAsync()` which was returning before `Target.CloseTask` resolved. This affected BrowserContextTests.ShouldFireTargetEvents on which it was possible for the test to finish before the `TargetDestroy` event. * Fix PuppeteerConnectTests.ShouldSupportTargetFilter. It was possible for the InitializeAsync to finish without all targets being initialized, and consequently the test would read an empty list of targets. The _targetDiscoveryCompletionSource should be awaited before logic that depends on _targetsIdsForInit inside of message processing, to make sure this collection was already initialized during the browser launch. * Fix OOPIFTests.ShouldDetectExistingOopifsWhenPuppeteerConnectsToAnExistingPage. Disposing the `browser1` was closing the page, which then caused the `Page.CloseAsync()` in `PuppeteerPageBaseTest` to fail. The test code now matches upstream puppeteer. * Revert unintentional line ending changes. * Use the launcher timeout when awaiting for `_targetDiscoveryCompletionSource`, as a defensive measure against deadlocks.
Wow @leonardo-fernandes et al. - thanks for all of the effort here! Any idea when a 10.x release might land with this fix? I'm excited about not having to manually kill hung tasks! |
@danports I'm shipping this tomorrow! |
Fixes #2202
Only process received messages after the discovered targets have been stored.