Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HTTP/3] Stress hack for msquic dropping connections #84793

Merged
merged 13 commits into from
Apr 28, 2023

Conversation

ManickaP
Copy link
Member

Implement the same hack as for functional tests to prevent msquic from dropping connections, see #83687.

Fixes #77126

@ghost
Copy link

ghost commented Apr 13, 2023

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Implement the same hack as for functional tests to prevent msquic from dropping connections, see #83687.

Fixes #77126

Author: ManickaP
Assignees: -
Labels:

area-System.Net.Http

Milestone: -

@ManickaP
Copy link
Member Author

/azp run runtime-libraries stress-http

@ManickaP ManickaP requested a review from antonfirsov April 13, 2023 17:19
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Comment on lines 13 to 17
<Compile Include="..\..\..\..\Common\src\System\Net\Logging\NetEventSource.Common.cs" Link="Common\System\Net\Logging\NetEventSource.Common.cs" />
<Compile Include="..\..\..\..\System.Net.Quic\src\System\Net\Quic\Internal\MsQuicApi.cs" Link="ProductionCode\System\Net\Quic\Internal\MsQuicApi.cs" />
<Compile Include="..\..\..\..\System.Net.Quic\src\System\Net\Quic\Internal\MsQuicSafeHandle.cs" Link="ProductionCode\System\Net\Quic\Internal\MsQuicSafeHandle.cs" />
<Compile Include="..\..\..\..\System.Net.Quic\src\System\Net\Quic\Interop\*.cs" Link="ProductionCode\System\Net\Quic\Interop\*.cs" />
<Compile Include="..\..\..\..\System.Net.Quic\src\System\Net\Quic\NetEventSource.Quic.cs" Link="ProductionCode\System\Net\Quic\NetEventSource.Quic.cs" />
Copy link
Member

@antonfirsov antonfirsov Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A refactor of these internals may break stress builds without us noticing, but I guess there is no good way to deal with this. We should remember that HttpStress took a dependency.

Copy link
Member

@antonfirsov antonfirsov Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: these files are missing from the docker context on Windows, which is why the Windows build failed. Copying them would make this solution even more fragile. Can we implement the worker delay tuning hack with reflection instead and share that implementation between System.Net.Quic.Functional.Tests and the stress tests to make sure we notice if it's broken? That would also remove the need of including anything but QuicDefaults.cs in System.Net.Quic.Functional.Tests.csproj (apart from the utility containing the hack).

Copy link
Member

@antonfirsov antonfirsov Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no, we would need #75347 to invoke function pointers with reflection safely. I guess, the best would be to copy & include msquic_generated.cs, then do a combination of reflection + code that only depends on the generated types.

I can take over if you want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reflection was one of the options we discussed with @CarnaViire when implementing this in functional tests. I was able to workaround it in functional tests with source code inclusion, which is normal for corresponding tests. However, here it proves problematic.
I could re-work this to create an internal function on MsQuicApi which we would invoke via reflection from both functional tests as well as stress.

@ManickaP
Copy link
Member Author

/azp run runtime-libraries stress-http

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Comment on lines 232 to 234
// Do not change the name and signature without looking for textual occurrences!
// This method is invoked via reflection from QUIC functional and HTTP stress tests.
private static (bool, string) SetUpForTests()
Copy link
Member

@antonfirsov antonfirsov Apr 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely the easiest option to solve this problem, but I thought including test-only methods in production assemblies is no-go. Do we have any existing precedent in the BCL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't thought about that, but I found this for example:

// For testing.
internal HPackDecoder(int maxDynamicTableSize, int maxHeadersLength, DynamicTable dynamicTable)

I can always go back to code sharing though and try to minimize the number of shared files, either with some some reflection, strategically placed #if and/or code stubs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HPack ctr is not "dead code" otherwise, because it's also being used by the public constructor above. For me having a private, test-only method is a good tradeoff, considering the maintenance burden that comes with other options, but we should stick to BCL guidance. @stephentoub is it ok to expose such methods?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combo of shared code + reflection is there. @antonfirsov could you look into copying the appropriate files in the docker container please?

@ManickaP ManickaP force-pushed the mapichov/h3-stress-msquic-hack branch from f85d1af to 8a080e3 Compare April 20, 2023 10:08
@ManickaP ManickaP force-pushed the mapichov/h3-stress-msquic-hack branch from 8a080e3 to fa5820e Compare April 20, 2023 13:06
@@ -1,5 +1,5 @@
# Builds and copies library artifacts into target dotnet sdk image
ARG BUILD_BASE_IMAGE=mcr.microsoft.com/dotnet-buildtools/prereqs:centos-7-f39df28-20191023143754
ARG BUILD_BASE_IMAGE=mcr.microsoft.com/dotnet-buildtools/prereqs:centos-7
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8.0 will not support Centos 7 AFAIK (hurray!) we should update this at some point

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stream9??? Or should we swap for different distro?

Copy link
Member

@antonfirsov antonfirsov Apr 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we are running stress tests on Centos and not ubuntu, unlike enterprise tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the centos-7 images have build tools and we use them for product. I don't know if that is needed.
I think we should pick whatever is easiest for us to maintain.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm switching to centos-stream8 in #85342, which seems to work.

@@ -44,14 +44,20 @@ public abstract class QuicTestBase : IDisposable

static unsafe QuicTestBase()
{
Console.WriteLine($"MsQuic {(IsSupported ? "supported" : "not supported")} and using '{MsQuicApi.MsQuicLibraryVersion}'.");
// If any of the reflection bellow breaks due to changes in "System.Net.Quic.MsQuicApi", also check and fix HttpStress project as it uses the same hack.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this should go somewhere to Common/test/.... so we can avoid duplication.
It may be worth of adding explanation as this is unusual craft.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I'll look into it.

Copy link
Member

@antonfirsov antonfirsov Apr 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep in mind that whatever utility helper we add, its' file has to be also copied to the container, so avoiding duplication has a higher cost in complexity here than normally. At the moment I'm undecided if it's worth it, would prefer to do the docker copying work mentioned in #84793 (comment) first, planning to jump to it tomorrow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm holding back. Let me know when/if I should look into it.

@@ -158,6 +162,9 @@ private static async Task<ExitCode> Run(Configuration config)

string GetAssemblyInfo(Assembly assembly) => $"{assembly.Location}, modified {new FileInfo(assembly.Location).LastWriteTime}";

Type msQuicApiType = typeof(QuicConnection).Assembly.GetType("System.Net.Quic.MsQuicApi");
string msQuicLibraryVersion = (string)msQuicApiType.GetProperty("MsQuicLibraryVersion", BindingFlags.NonPublic | BindingFlags.Static).GetGetMethod(true).Invoke(null, Array.Empty<object?>());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ManickaP unfortunately, this line fails with NullReferenceException in release builds. I assume the MsQuicLibraryVersion is being trimmed away, since it's unused in the production code.

We could workaround this by moving the version extraction to a shared test-utility (together with the MaxWorkerQueue hack helper), but IMHO it's not worth it unless we think that it's super-important to know the msquic version for stress test troubleshooting.

For now I'm just removing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible that there is image without MsQuic?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? If you are suspecting lack of msquic as the root cause of this error - it's not the case. Removing code messing with MsQuicLibraryVersion makes things work (= HTTP/3 stress runs).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible that there is image without MsQuic?

In that case you'd get "unknown" as the MsQuic version, the code expects that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be to tell the trimmer to keep it there via DynamicDependency. And this is just one string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try to clarify whether test-only usage of DynamicDependency is ok, without that, I prefer to defensively assume it's not.

@@ -143,7 +143,7 @@ static MsQuicApi()

if (NetEventSource.Log.IsEnabled())
{
NetEventSource.Info(null, $"Loaded MsQuic library version '{version}', commit '{gitHash}'.");
NetEventSource.Info(null, $"Loaded MsQuic library '{MsQuicLibraryVersion}'.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant 😆

@azure-pipelines

This comment was marked as resolved.

@ManickaP

This comment was marked as resolved.

@azure-pipelines

This comment was marked as resolved.

@ManickaP
Copy link
Member Author

If this passes, I think it's done. Thanks for the help @antonfirsov !

@ManickaP
Copy link
Member Author

I'm seeing this build error in Linux runs:

/app/Directory.Build.targets(2,3): error MSB4019: The imported project "/live-runtime-artifacts/targetingpacks.targets" was not found. Confirm that the expression in the Import declaration "/live-runtime-artifacts/targetingpacks.targets" is correct, and that the file exists on disk. [/app/HttpStress.csproj]

Anything familiar Anton?

@antonfirsov
Copy link
Member

Anything familiar Anton?

This failed in multiple runs on this PR, and not on main, need to check if my changes broke it.

@azure-pipelines

This comment was marked as resolved.

@antonfirsov
Copy link
Member

/azp run runtime-libraries stress-http

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good now. Fixed the linux build with 949d82b. The failure in the last run is #76183.

@ManickaP ManickaP merged commit 2deebf0 into dotnet:main Apr 28, 2023
@ManickaP ManickaP deleted the mapichov/h3-stress-msquic-hack branch April 28, 2023 08:37
@karelz karelz added this to the 8.0.0 milestone May 27, 2023
ManickaP added a commit to ManickaP/runtime that referenced this pull request Jun 22, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jun 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[HttpStress] massive timeouts around HTTP/3 stress test startup
5 participants