
Execute the MongoDB Connection Health Check on Startup #45482

Closed
Inithron opened this issue Jan 9, 2025 · 11 comments · Fixed by #45523
Labels
area/health kind/bug Something isn't working
Milestone

Comments

@Inithron

Inithron commented Jan 9, 2025

Description

It would be great if the MongoDB Connection Health Check could be improved.

Current behavior

After startup, the following health check is displayed:

{
    "status": "UP",
    "checks": [
        {
            "name": "MongoDB connection health check",
            "status": "UP"
        }
    ]
}

But this status is misleading / wrong. Even if no database is available, the status is UP. Only when the application tries to store the first message in the DB does the status go to DOWN:

{
    "status": "DOWN",
    "checks": [
        {
            "name": "MongoDB connection health check",
            "status": "DOWN",
            "data": {
                "<default>": "KO, reason: Timed out while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster state is {type=UNKNOWN, servers=[{address=localhost:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.ConnectException: Connection refused: no further information}}]",
                "<default-reactive>": "KO, reason: null"
            }
        }
    ]
}

Improvement

It would be great if the connection to the DB were checked on startup, not only when the application tries to store the first message in the DB.

Benefit

With this changed approach, the pod in Kubernetes would never become ready if the connection string is wrong or the DB is down.
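For illustration, a minimal sketch of such a startup check. This is hypothetical, not Quarkus's actual implementation: a raw TCP connect stands in for the MongoDB `ping` command a real check would run via `MongoClient`, and in Quarkus the check would live in an `@Observes StartupEvent` method.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical startup-time reachability probe. A real Quarkus check would
// run the MongoDB "ping" command through MongoClient inside a method observing
// StartupEvent; a plain TCP connect stands in for it here.
public class MongoStartupProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Fail fast at startup instead of reporting UP until the first write.
        if (!isReachable("localhost", 27017, 1000)) {
            System.err.println("MongoDB is not reachable; application should not report UP.");
            // In a real application: throw here to abort startup so the pod never goes green.
        }
    }
}
```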

Implementation ideas

No response


quarkus-bot bot commented Jan 9, 2025

/cc @geoand (kubernetes), @iocanel (kubernetes), @jmartisk (health), @loicmathieu (mongodb), @xstefank (health)

@geoand
Contributor

geoand commented Jan 9, 2025

Makes sense IMO

@geoand
Contributor

geoand commented Jan 10, 2025

Looking at the code, it should work as expected.

What version of Quarkus are you using?

@geoand geoand added the triage/needs-feedback We are waiting for feedback. label Jan 10, 2025
@Inithron
Author

Inithron commented Jan 12, 2025

I am using 3.17.6.
Here is a small reproducer with a unit test showing the mentioned behavior:

reproducer-for-45482.zip

@geoand geoand removed the triage/needs-feedback We are waiting for feedback. label Jan 13, 2025
@geoand
Contributor

geoand commented Jan 13, 2025

Thanks!

@geoand
Contributor

geoand commented Jan 13, 2025

#45523 fixes the issue

@geoand geoand added kind/bug Something isn't working and removed kind/enhancement New feature or request area/smallrye area/mongodb area/kubernetes labels Jan 13, 2025
geoand added a commit that referenced this issue Jan 13, 2025
@quarkus-bot quarkus-bot bot added this to the 3.18 - main milestone Jan 13, 2025
@gsmet gsmet modified the milestones: 3.18 - main, 3.17.7 Jan 14, 2025
@Inithron
Author

Hi @geoand,
I tested the fix with 3.17.8, but it still does not seem to be correct. When I execute q/health, it now shows the correct status ("DOWN" if no DB is available). But each call to q/health now takes 30 seconds (the default value for mongodb.server-selection-timeout). However, the default timeout in Kubernetes for the readiness and liveness checks is 1 second, see timeoutSeconds. So from my point of view, q/health should not block the caller. Maybe a background task is necessary that checks the health of the DB regularly and caches the status; when q/health is called, the cached value can be returned. What do you think about this?
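The background-task idea suggested above could look roughly like this. A minimal framework-free sketch (in Quarkus this would be a `@Scheduled` method updating a field that a `@Readiness` check reads; the class and names here are hypothetical):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical cached health check: a background task probes the DB on a fixed
// schedule, so the health endpoint returns the cached result instantly instead
// of blocking for the 30 s server-selection timeout.
public class CachedDbHealth {
    private final AtomicBoolean up = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** probe is any (possibly slow) check, e.g. a MongoDB ping command. */
    public CachedDbHealth(BooleanSupplier probe, long periodSeconds) {
        scheduler.scheduleAtFixedRate(
                () -> up.set(probe.getAsBoolean()), 0, periodSeconds, TimeUnit.SECONDS);
    }

    /** Called by the health endpoint: never blocks, just reads the cache. */
    public boolean isUp() {
        return up.get();
    }

    public void shutdown() {
        scheduler.shutdownNow();
    }
}
```

The trade-off is staleness: the readiness endpoint may report an outdated status for up to one probe period, which is usually acceptable for Kubernetes probes.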

@geoand
Contributor

geoand commented Jan 27, 2025

I would need a sample application that behaves as you describe to be able to figure out what's going on

@Inithron
Author

I updated the reproducer to version 3.17.8 and disabled #quarkus.mongodb.server-selection-timeout=1 so that the default timeout is used:
reproducer-for-45482_2.zip

  1. After starting the application, you can see in the logs that nothing is happening (no logs about MongoDB).
  2. When you now open http://localhost:8080/q/health/ready, it takes roughly 30 seconds until the page is loaded.
  3. After the call to http://localhost:8080/q/health/ready you can see logs from MongoDB for the first time (INFO [org.mon.dri.client] (vert.x-worker-thread-1) MongoClient with metadata [...] and the error: ERROR [io.qua.mut.run.MutinyInfrastructure] (executor-thread-1) Mutiny had to drop the following exception: com.mongodb.MongoTimeoutException [...]).
  4. Every refresh in the browser or new call to http://localhost:8080/q/health/ready again takes 30 seconds.
  5. http://localhost:8080/q/health/live is loaded without noticeable delay. I would expect the same for http://localhost:8080/q/health/ready, even if no DB is available.
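As a stop-gap, re-enabling the property that the reproducer comments out bounds how long the readiness call can block (the value format follows the reproducer; a short duration here trades slower-network tolerance for a responsive endpoint):

```properties
# application.properties — cap the MongoDB server-selection wait
# (default 30 s; 1 roughly matches Kubernetes' default timeoutSeconds)
quarkus.mongodb.server-selection-timeout=1
```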

@geoand
Contributor

geoand commented Jan 28, 2025

@xstefank please take a look at ^

@xstefank
Member

Hi @Inithron, I understand what you are after, but the problem described in this issue really is fixed by @geoand's PR. Basically, the Mongo health check is designed to wait for the timeout to run out. Personally, I don't see anything wrong with your idea; I have moved it to a new issue, since it is really a new feature: #45924. If no one objects, I can implement it.

@jmartisk jmartisk modified the milestones: 3.17.7, 3.15.4 Feb 20, 2025