Improve test stability issues caused by resources failing to start due to "address in use" errors #6678
Comments
Is there something to be done about the "address in use" problem in general (or, more specifically, about Aspire being bad at ending its child processes)? I've had to resort to creating a script that first executes |
The challenge is always that it's inherently racy if you want to know what the port is before you assign it to the process. Also, when a start does fail, it's not clear whether the cause is some other random process using the port, a previous instance of something launched by Aspire that wasn't shut down properly, or even a random collision in situations like parallel startup. I think adding some kind of support for startup retries could help here. I might attempt to add some startup-failure detection and retry logic to the testing infrastructure in dotnet/aspire-samples at the app host level, e.g. when a resource's state changes to FailedToStart, scrape its logs for "port/address already in use" messages and, in those cases, issue a "Start" command.
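To make the retry idea above concrete, here is a minimal, language-neutral sketch in Python rather than against the real Aspire APIs. Everything in it is an assumption for illustration: the `start` callable, the `(state, log_lines)` shape it returns, and the log patterns are hypothetical, and only the `FailedToStart` state name comes from the comment above.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Illustrative patterns only; real resources may word this error differently.
ADDRESS_IN_USE = re.compile(r"(address|port).*already in use|EADDRINUSE", re.IGNORECASE)


def is_address_in_use(log_lines: Iterable[str]) -> bool:
    """Return True if any log line looks like a port/address collision."""
    return any(ADDRESS_IN_USE.search(line) for line in log_lines)


def start_with_retry(
    start: Callable[[], Tuple[str, List[str]]],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Call the hypothetical start() until it succeeds or retries run out.

    start() is assumed to return (state, log_lines). A retry is issued only
    when the state is "FailedToStart" AND the logs point at a port collision;
    any other failure is returned immediately so real bugs are not masked.
    """
    state = "NotStarted"
    for attempt in range(1, max_attempts + 1):
        state, logs = start()
        if state != "FailedToStart":
            return state, attempt
        if not is_address_in_use(logs):
            return state, attempt
    return state, max_attempts


# Usage: a fake resource that collides twice and then starts.
outcomes = iter([
    ("FailedToStart", ["Failed to bind to address http://127.0.0.1:5000: address already in use."]),
    ("FailedToStart", ["listen EADDRINUSE: address already in use :::5000"]),
    ("Running", ["Now listening on: http://127.0.0.1:5000"]),
])
final_state, attempts = start_with_retry(lambda: next(outcomes))
```

Restricting retries to failures whose logs match the collision pattern is a deliberate choice in this sketch: blanket retries would also hide genuine startup bugs.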
This was improved in between 9.0.0-rc.1 and 9.0.0, so I'd love to hear if you're seeing any improvement after updating to the latest version.
For sure, this is something we're thinking of improving, but likely more in the mid-to-long term than in the short term.
This is not just a "test stability issue." Aspire 9 seems to have lots of problems stopping projects. In Aspire 8, I routinely started and stopped solutions without issue. In Aspire 9 on OSX with Podman Desktop and the Rider Aspire plugin, about half the time there are stuck ports preventing Aspire from starting.
@karolz-ms @danegsta FYI. @yoDon, can you try seeing what it's like without using the Rider plug-in, i.e. starting/stopping from the cmd line using |
@DamianEdwards I'm also mentioning a non-test-related issue that I reported around this, in case it's helpful to connect the two: #6704
Fixed by #7098
Found in the dotnet/aspire-samples tests (e.g. this CI failure).
Sometimes when running app host integration tests using DistributedApplicationTestingBuilder, resources can fail to start because the port that was randomly assigned by DCP is already in use by the time the resource goes to start. This results in a test that fails but will most likely pass on re-run. We should look at how we can improve this so that tests are more reliable, e.g. by doing automatic retries when resources fail to start.
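The failure mode described here can be reproduced in isolation. The following Python sketch is not how DCP allocates ports; it only illustrates, under that stated assumption, why choosing a free port first and binding to it later is racy. The helper names `pick_free_port` and `try_listen` are hypothetical.

```python
import errno
import socket
from typing import Optional


def pick_free_port() -> int:
    """Ask the OS for a currently free port, then release it again."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind(("127.0.0.1", 0))  # port 0 lets the OS choose
        return probe.getsockname()[1]
    # From this point on, nothing reserves the port for the caller.


def try_listen(port: int) -> Optional[socket.socket]:
    """Bind and listen on the port; return None if it is already in use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen()
        return sock
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None
        raise


port = pick_free_port()
squatter = try_listen(port)  # some other process grabs the port first
late = try_listen(port)      # the intended owner arrives too late
collided = late is None      # the "address in use" case from this issue
if squatter is not None:
    squatter.close()
```

Between `pick_free_port` returning and the intended owner binding, any other process can take the port, which is the window a retry on startup failure is meant to cover.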