Broken System.Net.Http 4.1.1-4.3.0 post-mortem #20777
Comments
So it has been a month (again) and I still didn't do it. Sigh. I am sorry! There are just so many important things to do. |
While I understand the competing priorities, I am concerned that you'll "lose the trail" in figuring out what went wrong. People will move on and forget details. Do you know yet who the key people involved in the efforts that created the situation were, or is this still in the pre-investigation stage? |
It is not about people (I won't point fingers, it is not productive). It is about decisions and motivations. The first decisions that led to the unfortunate state happened before June 2016, the last decisions (affecting our reaction time) around October 2016. So waiting a couple more weeks won't really make a difference in tracking it down. |
That wasn't what I was implying. I meant do you have the right people to talk to or do you have to go fishing and risk ending up in a situation where you can't get the answers because you've waited too long. My concern is "a couple of more weeks" will turn into a "few more months". It took a long time to fix the issue and there appears to be even less motivation to figure out what went wrong. |
I have the right people. I talked to them (few months ago). I have to sit down, write it down (in a way to avoid finger pointing, just stating the facts), get it reviewed. Fill in additional gaps in the story I might have missed ... solid day or two of work. On technical side we already took steps to avoid the problems in future (esp. 2.0):
|
1 and 2 are the most important - having recognised that corefx packages are used on netfx and need to be tested on netfx will go a long way to preventing this class of problem. When the answer to "how did this make it to production, did anybody test this" is "no" you're in constant danger! The approach in dotnet/corefx#18300 is interesting, but i hope it doesn't become a license to require more and more binding redirects. There are still environments where you simply cannot specify them, or at least where no machinery exists to generate them. Consider msbuild tasks, non-exe unit tests, plugins loaded with Assembly.LoadFrom... |
I hope you get a task force standing by to pounce on such issues with immediate response time if they keep occurring. Right now I just hit Update in NuGet and suddenly I start getting "Type System.X.Y.Z does not match constraint ABC" style exceptions that I suspect are exactly because of some .NET Core cancer I do not even care about ruining my .NET Framework app. |
@sandersaares .NET Core "cancer" won't affect .NET Framework apps. Whatever we shipped (and broke) on .NET Framework as NuGet packages was because we wanted to deliver additional value to .NET Framework developers. |
It's good to see the lessons learned from this package dependency complexity, and the focus to fix this with .NET Standard 2.0. However, what about libraries that target versions before .NET 4.6.1 (.NET Standard 2 compatible baseline), and may not be able to switch to the .NET Standard 2 target? This is what originally caused this issue; where some libraries target the framework version, and other libraries then target newer Out-of-Band (OOB) NuGet variants of the same libraries. The combination of the two in an app requires binding redirects, where the consumer (developer) has to make the trade-off between either the lower or upper version. Isn't there a limbo for libraries that use any OOB NuGet that isn't targeting .NET Standard 2 yet? And those libraries cause the same binding redirect issues in the apps with framework targets? So:
Correct? |
The focus is to fix it in all NuGet packages (not scoped to .NET Standard 2.0).
That is correct and it will never go away, until we either stop shipping those OOB entirely (my original plan in #20502 and #20074 - which we are most likely going to back out), or until we change .NET Framework loader to follow .NET Core loader policy to upgrade version to latest available without any bindingRedirects (which is being considered, but also very tricky to do - the code (Fusion), is extremely challenging and it is easy to break other things with any change to it, or make it function only in some scenarios). If you create .NET 4.5 or 4.6.1 app, and your dependencies target 2 different versions of the same package, you will have to make a choice - to upgrade to latest or downgrade to lowest. Via bindingRedirects. dotnet/corefx#18300 will help, but as @gulbanana points out, it doesn't solve all scenarios. We do not plan to ship more (replacement) NuGet packages except the 2 we ship today. If you create .NET Core 2.0 app, then you don't have to ensure anything - things will just work out for you. Of course unless you start referencing (indirectly) packages from higher (2.1+) .NET Core versions, while trying to force them to run on .NET Core 2.0, then you might need some bindingRedirects (or whatever the alternative in Core is). |
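For readers unfamiliar with the mechanism, the "upgrade to latest or downgrade to lowest" choice described above is expressed in the app's app.config or web.config. A minimal sketch, assuming a hypothetical assembly Example.Library that two dependencies reference at assembly versions 1.0.0.0 and 2.0.0.0; the name, public key token, and version numbers are placeholders and are not taken from this issue:

```xml
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <!-- Hypothetical identity: replace name and publicKeyToken with the real assembly's values -->
        <assemblyIdentity name="Example.Library" publicKeyToken="0123456789abcdef" culture="neutral" />
        <!-- "Upgrade to latest": any request for 0.0.0.0 through 2.0.0.0 loads 2.0.0.0 -->
        <bindingRedirect oldVersion="0.0.0.0-2.0.0.0" newVersion="2.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
```

Downgrading instead means pointing newVersion at the lower version while the oldVersion range still covers both, which is only safe if nothing relies on APIs present only in the higher version. Note that these are assembly versions, not NuGet package versions.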
Update: Ignore this comment (dumb oversight) Were the issues resolved? I'm still experiencing very weird issues with HttpClient, like doing PostAsync and it does a "GET" call... I'm targeting net core 1.1 the library that wraps HttpClient is NET standard 1.3 |
Yes, all known problems were resolved. If you see any new issues, please file a new bug with description what happens when. Thank you! |
@karelz Guess what I would like to know ;-) Hint: It's been 4 weeks since last update. |
@jahmai know what? |
I'm a little confused here, please be patient with me if I'm posting in the wrong location. I created a new netstandard2.0 project using bash on a mac:
Then I added the
In the default
using System;
using System.Net.Http;

namespace Test
{
    public class Class1
    {
        HttpClient client = new HttpClient();
        public Class1() {}
    }
}
Now when I
The beginning of my project.assets.json looks like:
{
"version": 2,
"targets": {
".NETStandard,Version=v2.0": {
"IdentityModel/2.8.1": {
"type": "package",
"dependencies": {
"NETStandard.Library": "1.6.1",
"Newtonsoft.Json": "9.0.1",
"System.Net.Http": "4.3.2",
"System.Security.Claims": "4.3.0",
"System.Security.Cryptography.Algorithms": "4.3.0",
"System.Security.Cryptography.X509Certificates": "4.3.0",
"System.ValueTuple": "4.3.1"
},
"compile": {
"lib/netstandard1.4/IdentityModel.dll": {}
},
"runtime": {
"lib/netstandard1.4/IdentityModel.dll": {}
}
},
IdentityModel package is dependent on System.Net.Http >= 4.3.2, I'm not sure where version 4.1.1.1 is coming from. |
sadly the assembly versions do NOT correspond to package versions |
@daveclarke fyi this is a bug in the netstandard2.0 conflict resolution that will be fixed for preview2: dotnet/standard#372, dotnet/sdk#1313 |
So is this issue considered resolved? Back in April it was a "couple more weeks". I was concerned it would turn into a "couple more months". Well. It's July. |
It is still on my personal backlog. Pretty high up now. As I said earlier:
|
Let's hope another item doesn't take its place again.
On Sun, 2 Jul 2017 at 10:23, Karel Zikmund wrote:
It is still on my personal backlog. Pretty high up now. As I said earlier:
Should happen before 2.0 ships. After the Ask mode & bug driving madness (it takes a LOT of my time).
|
Happy 1-year anniversary of #18280! |
This just broke a project I'm working on, again, and took quite a lot of searching to find, again. Where's the postmortem? The issue is also apparently still not fixed, I've upgraded a project to the latest (4.3.2), and still have to manually adjust the bindingredirect from 4.1.1.1 to 4.0.0.0 |
In our case we've seen it because of two problems:
To solve these problems, in our host project (e.g. console app, windows service, web site) we include explicitly the following NuGet packages (even though they are used in child projects, and there is no direct use of them in the host project), to make sure the binding redirects are in place so that the correct versions are loaded on startup:
That fixes the problem for us. Hope it helps. |
@bgever Unfortunately, we're already using Newtonsoft.Json 10.0.3, and have no dependency on Primitives (well, we do, on 4.3.0, but not at the point where System.Net.Http fails to load), so this specific case isn't valid for us. That might help me debug, at least. If I can track down exactly what's not loading, I'll update here. Thanks, regardless. To save anyone else the hours of trouble should they find this... Had to do two new things to get it working:
|
@karelz Status update please? |
And 2.0 has shipped.
On Tue, 12 Sep 2017 at 03:37, jahmai wrote:
@karelz Status update please?
|
Thanks to @Kesmy for indirectly posting a fix that actually works!
<dependentAssembly>
  <assemblyIdentity name="System.Net.Http" publicKeyToken="b03f5f7f11d50a3a" culture="neutral" />
  <bindingRedirect oldVersion="0.0.0.0-4.1.1.2" newVersion="4.0.0.0" />
</dependentAssembly>
Edit: Create a .NET Framework 4.7 class library project (in my case it is my test project) |
You need a Meta-Post-Mortem now :( |
I'm giving @karelz the benefit of the doubt and assuming he is on vacation, because the alternative would be extremely disappointing. |
Just lost 2 full days of my team playing around with packages, updating NuGet for the whole project, going back again to the old version in hopes that we could find a stable solution that doesn't involve custom VS setups or weird assembly redirects. I insisted to the team that we trust Microsoft and the VS team, and that it must be something we were doing wrong. Very angry, sad and disappointed about this issue that, from what I understand, has been around for over a year |
Hi @karelz, For clarity, here are the topics I'd like addressed by the post-mortem:
[1] Even though the problem was resolved some time ago, there are still customers experiencing the symptoms because there isn't enough clear guidance on how to resolve it.
[2] There were obviously some big decisions made about delivering .NET Framework package updates through NuGet. Was the question ever asked: "What happens if we ship something from NuGet that is inherently incompatible with the OOB assembly?"
[3] Hindsight shows us that this is actually a pretty easy problem to reproduce. What kind of testing was done before shipping?
[4] There was a huge lead time between the issue getting reported and being recognized as significant. What is the process for triaging issues that come through GitHub?
[5] Once the gravity of the problem was recognized, it's obvious that immediate and decisive action should have been taken, such as gathering those with the authority to approve resources and gathering the best engineers to design and implement a solution. What is the process for escalating an issue in a .NET deliverable to get more resources and attention? |
Thanks @jahmai for the pings. I have started putting the draft from my head into written form on Friday. I expect to review it and publish it by end of week. Your questions [2]-[5] are aligned with the post-mortem plan/scope as outlined in top post. It is the "looking back at what happened and why, and how to make it better in future". Your question [1] is already addressed by using the fixed version of System.Net.Http (4.3.1+) NuGet package. |
First, I want to apologize for the additional delay of the post-mortem after 2.0 shipped, very sorry for that, I am to blame for the delay.
A little bit of history first
At the time we introduced System.Net.Http 4.1.0 NuGet package in 2016/6, we wanted to deliver additional value to .NET Framework developers. The new implementation of HttpClient on top of WinHTTP we had in .NET Core/CoreFX was in many ways better and superior to the "older one" in .NET Framework (based on HTTP stack). There were perf gains and feature gains (http2 support) in the "new one". We wanted to deliver the value also to .NET Framework developers, without the need for them to wait for a new .NET Framework update (incl. the delay of the .NET Framework update being deployed to end customers at larger scale). We call it out-of-band (OOB) package, and we believed back then that we can make it work and deliver value to customers (developers) directly and much faster than in the cycle of .NET Framework updates being deployed to the customers of our customers. The problem with System.Net.Http 4.1.0-4.3.0 OOB package was that when it is used together with System.Net.Http.WebRequest (which is part of .NET Framework and used indirectly by ASP.NET and many popular Azure APIs), the user will get an exception at the runtime:
So how did the issue slip through into release?
Here are key contributing factors:
We didn't realize the dependency danger during development of System.Net.Http OOB package. Our test coverage was focused on component testing of System.Net.Http on .NET Framework (see [2]). Those tests had no chance to uncover the problem we introduced (see [3]).
How to prevent such situation in future?
There were several results of this issue:
Why did it take 6 months to fix?
Here's brief recap of the issue's history: (with +?m as time from the issue creation in months)
So where did most of the time go?
Overall 6 months breakdown:
Note: The post-mortem above is not trying to hide or marginalize any impact, shift blames, or point fingers. It's an honest attempt to reiterate what happened when and why, with focus on future -- how to improve process and engineering practices to streamline similar events in future. |
Thanks for the detailed writeup! I ran into the issue in its early days and would like to point out one thing that particularly frustrated me: it was not possible to understand what was happening - how exactly installing a NuGet package update can break something seemingly unrelated. Even after having read the threads on the topic, I still only have a vague understanding of the technical details. This caused frustration because it was not even possible to effectively experiment with it, beyond simply stating that a problem exists and hoping someone from Microsoft took an interest. Similar issues where "the post-net46 universe" does something illogical without presenting any obvious investigative threads to even start unraveling exist even today (NuGet/Home#5812 for one example out of several that still impact me). In such a case, I would expect Microsoft to respond promptly to issues and, if not provide a resolution, at least maintain a steady dialogue to attempt to diagnose it (I am fine with being asked to experiment but I am not fine with a month of silence in what I consider a blocking issue). It feels like receiving a bouncy ball promised by Microsoft documentation to bounce most excellently and then finding it is in fact a puddle of water that does not bounce at all and that you will be returning to the merchant as part of some Monty Python sketch. Especially for a new product like .NET Core, with new tooling and with complex interactions, I expect greater engagement from Microsoft in helping people diagnose and report such issues. Right now there are still many broken things with the tooling when trying to use .NET Core/Standard, which get barely any comments on GitHub or the VS issues portal, that it takes some real motivation to even report them as I do not feel that there is a meaningful dialogue when I do so. 
The linked issue above indicates a problem with installing a very common library, a problem that only occurs when .NET Standard libraries are in the mix, and a problem that flat out prevents one from using that library. For the package manager component of VS to not be able to install seemingly perfectly valid packages and for it to give a clearly invalid error message to explain the failure boggles the mind as much as does the fact that this issue is assigned to "Backlog" without obvious signs at investigation and will probably not even be fixed. I continue to avoid .NET Core and .NET Standard due to the number of such issues I encounter and due to a very underwhelming response when I do report them. This is despite the fact that both .NET Core and .NET Standard would (if I could use them) satisfy a lot of my business needs. Now I realize that most of this is not really your area, @karelz - issues these days are generally with bad tooling - but it is hard to see different parts of the VS/.NET ecosystem as separate from the emotional standpoint. It is all just a big pile of stuff I need and pay a lot of MSDN subscription money for in order to do my job. |
This cannot be emphasized enough. |
(Side note: there needs to be some consistency in terminology. Some issues use OOB as out-of-box, and some use it to mean out-of-band, which in the context of these issues mean completely the opposite packages). First of all, thank you @karelz for finally taking the time to write that out, I do not know why you are in the unenviable position to take that responsibility, but you have fielded the varying levels of customer feedback (constructive through outraged) with a level of grace and professionalism and I thank you for that. However, I believe that given the timeline highlighted, this breakdown could have been done 6 months ago, and I have my doubts that people monitoring this thread will be nearly as satisfied with the response now that so much time has passed (I know I am not). The priority between shipping the new-hot-thing vs addressing grave customer concerns raised since shipping the broken last-hot-thing could perhaps be considered part of "the problem" at Microsoft right now. I also think the timeline should account for #17770 and #17786 (reported early July) as the writing was already on the wall that there were fundamental incompatibilities between out-of-band and out-of-box versions of the assembly, so even at that time, someone on the team should have been prepared to test the combination of the two prior to shipping subsequent versions of the package. There was obviously a communication breakdown with customers which needs improvement, but I want to highlight that good communication needs to be accompanied with appropriate action. All the status updates in the world are meaningless unless they are reporting on, or setting the expectations for, acceptable progress. As a founder of a company that has been using the latest (released/final) Microsoft products for the past 6 years I have to say the last 18-24 months stands out as being an extremely tumultuous and painful time to be a developer on Microsoft platforms. 
I don't think it's appropriate to list out all of the separate incidents of pain my team experienced over that period (I don't want this issue to be the flash-point of a dozen Microsoft failures) but what I do want to communicate is that this issue was a significant blow to my and my team's confidence in Microsoft's ability to deliver quality, and that our attitude regarding big platform changes has shifted from excitement to skepticism and anxiety. I will say to Microsoft's credit, that the transition from netstandard1.x to netstandard2.0 for us was fairly trivial and incident free, which is exactly how it should be, but there is still a lot of work to do to earn back that trust for our team. |
Thanks for the detailed post-mortem, it definitely helps in understanding the process and the learning. A point that I'd like to address: You seem to have improved the testing of OOB packages in .NET Framework scenarios, which is good. However, the other way around - the impact of .NET Framework updates when consuming OOB libraries - seems to be missing a few test scenarios. For example in 4.7.1, the following issues surfaced:
@karelz I'm just wondering if something could be done to prevent or reduce such issues in future updates. .NET Framework used to be a highly compatible upgrade, but the introduction of OOB packages seems to have complicated this a lot. I fully understand that it is very hard to test these scenarios as they are quite specific to the combination of "fx version built for"/"OOB package version"/"platform run on"/"tools used to build". |
In effect, System.ValueTuple is a new OOB package. It was built as a package first, then incorporated into the BCL.. which is also the plan of record for many other apis. Will this keep happening? It's not really reassuring to hear that there are now OOB test runs but they The postmortem is appreciated, but frankly it indicates that these process issues are not solved. We've gone from nuget updates breaking desktop apps to framework auto-updates breaking desktop apps, which if anything is worse. |
Perhaps people who are clearly knowledgeable such as @onovotny, who raised the issue, should have the privileges to tag particular issues as serious and urgent. Just reading the first sentence of the issue tells me that it was a significant one, but I understand that the triage needs to consider who is raising and whether they have enough reputation. Otherwise they might just be somebody inexperienced who is doing something silly. My experience with the current VS tooling and all its issues is similar to others and it does 'waste hours' for me. The number of open issues against NuGet for example is crazy! |
Just need to throw out there that "We didn't test things" is not a great reason to force us back to the inbox assemblies. Now we appear to be (and correct me if I'm wrong) stuck in this limbo where installing a package with a dependency on one of these packages is a crapshoot. Maybe the library depends on bugfixes in the package, maybe it works with the inbox, we won't know until runtime when everything fails spectacularly on an edge case and we're up at 0200 after production goes down in flames. |
Followup. My issue was that I did not even realise I was on older tooling on some older ASP.NET projects. Fix: |
The System.Net.Http package on NuGet was put there as an OOB update where Microsoft also tried to add extra functionality for NET Framework developers. This experiment didn't work, but Microsoft didn't want to pull the package from NuGet as people would get strange errors. Microsoft has written a post-mortem on GitHub, ref https://github.com/dotnet/corefx/issues/17522#issuecomment-338418610 NOTE: I also had to remove NETStandard.Library from NET Framework as that depends on System.Net.Http from NuGet.
Issue #18280 caused a lot of problems for a long time. The overall road towards a solution (the fix in the new NuGet package 4.3.1) was less than ideal.
Let's track the post-mortem here (as initiated in https://github.com/dotnet/corefx/issues/11100#issuecomment-281827797):
High-level plan to cover:
See the writeup in https://github.com/dotnet/corefx/issues/17522#issuecomment-338418610