Frequent 504s and Poor Uptime on Docker Compose deployments #821
Comments
Services go down and come back up within a few minutes, all throughout the day; this has tanked uptime to 30%. Forgejo Docker Compose:
Ghost Docker compose
Umami:
These are the docker composes for affected services |
I believe it's an issue with Traefik. I can still access the port-forwarded services: for example, Umami is forwarded to port 4999 on the host and served at stats.towu.dev via Traefik. When stats.towu.dev is down, I can still reach host:4999 and see Umami, so I'm pretty confident it's a proxy issue. Something peculiar: all the affected compose services (Ghost, Umami, and Forgejo) go down at the same time, while other compose projects, like Immich, don't go down at all. Immich is a photo-management app that includes a website as part of its Docker Compose file, like the other services. Immich (no downtime) Docker Compose:
|
Immich has downtime as well. Related documentation: https://docs.dokploy.com/docs/core/troubleshooting#docker-compose-domain-not-working

  version: '3'
  services:
    umami:
      image: ghcr.io/umami-software/umami:postgresql-latest
      ...
      expose:
        - 3000
      ports:
-       - 4999:3000
+       - 3000
      networks:
        - default
    db:
      image: postgres:15-alpine
      ...
      networks:
        - default
  networks:
    default:
      driver: bridge

I'm trying this just to check. I need the ports forwarded, as I can't upload large files through the Cloudflare-proxied domain for Immich, for example. |
I think I know what the error could be. There is currently a very rare bug related to Docker Compose: if the same service name is used in several places, the information can somehow get mixed up. I have not found a solution to this problem yet. My suggestion would be to change the name of the service from

services:
  db:
    .....

to something like this:

services:
  ghost-db:
    ..... |
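One detail the rename implies: anything that reaches the database by its service name has to be updated too. The following is a minimal sketch for the Umami stack, assuming the connection string is passed through `DATABASE_URL`; the `umami-db` name and all credentials are placeholders, not values taken from this thread.

```yaml
services:
  umami:
    image: ghcr.io/umami-software/umami:postgresql-latest
    environment:
      # The hostname must match the renamed database service below.
      # User, password, and database name are placeholders.
      DATABASE_URL: postgresql://umami:umami@umami-db:5432/umami
    depends_on:
      - umami-db

  # Previously "db"; prefixed so that no other compose project on the
  # same host uses the same service name.
  umami-db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: umami
      POSTGRES_USER: umami
      POSTGRES_PASSWORD: umami
```

If the application still points at `db`, it may fail to resolve the database or, on a shared network, resolve another project's database instead.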
I've updated my services to use prefixed names; I guess that's what randomize compose is for. Is there anything I can do to provide more insight? I can share Traefik logs if you let me know how to get them (would docker logs be enough?). Likely related: umami-software/umami#3080 (reply in thread). I believe another service was attempting to access Umami's database, leading to that error. |
Oh, is it because all the containers are part of the |
@Siumauricio I updated the services to have unique names and rebuilt the project
|
Does the problem still persist? |
Experiencing very similar issues. I'm also using the cloud-hosted version of Dokploy instead of self-hosted, because I thought self-hosting might have been the cause. After doing some digging, it's definitely the reverse proxy. |
My server was down 5 minutes ago. I am not monitoring, but I assume this is still an issue. |
@Siumauricio This issue is causing me a lot of trouble, is there anything I can do to help? |
Yes, I definitely think it is a bug in Docker at the network level. We must find a solution, because currently we cannot have 2 instances of the same template: sometimes the information gets mixed up between them, which is very strange behavior. I will investigate in more detail how to solve this; the idea would be to isolate each Docker Compose in a separate network. |
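As a rough illustration of what per-compose isolation could look like (a sketch under assumptions, not the fix that was implemented): each project gets its own internal network for app-to-database traffic, and only the web container joins the shared proxy network. The `dokploy-network` name is assumed from the Dokploy troubleshooting page linked earlier; `ghost-internal`, the service names, and the image tags are hypothetical.

```yaml
services:
  ghost:
    image: ghost:5-alpine        # placeholder tag
    networks:
      - ghost-internal           # private link to the database
      - dokploy-network          # shared network the proxy is attached to
    labels:
      # Assumption: when a container sits on several networks, this Traefik
      # label tells the Docker provider which network's IP to route to.
      - traefik.docker.network=dokploy-network

  ghost-db:
    image: mysql:8.0             # placeholder tag
    networks:
      - ghost-internal           # not reachable from other projects

networks:
  ghost-internal:
    driver: bridge
  dokploy-network:
    external: true
```

With this layout, a second project that also defines a database service is on a different internal network, so name lookups for the database stay inside each project.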
@Siumauricio I tried the fix in #1004 (randomize compose names) and the uptime hasn't improved at all. This issue is urgent and affecting my users. Broken networking is a dealbreaker, is there anything else I can try? Last ditch effort would be disabling Traefik and using a reverse proxy on host networking, or moving to another platform - which is a huge effort. Are there any blockers for this issue? Any logs or information you need? Anything? |
I'm having similar problems; randomizing compose names didn't fix it for me either. I also suspect it has something to do with the same internal port being published by similar services/containers on the same
Feel free to ping me as well if I can provide any logs or information, or test something helpful for this issue.
I have 2 different Dokploy projects on one server, each containing 2 Docker Compose services. For example, one Docker Compose is `nextcloud + mariadb + redis`. I get this problem even though the Nextcloud webserver images have different Docker image tags/versions in the two projects. Whenever I deploy the service from the second project, the container of the first project is no longer reachable, and Traefik shows the error page "404 page not found". I also defined a custom-named network for each service (Docker Compose), so the database and webserver in a single Docker Compose can communicate:
There are also many other services running on my single server which are working fine and do not seem to be affected by the aforementioned problematic deployments. |
FYI: Our current workaround is to set the port of the application/webserver itself inside the container to something different for each service. So the similar webservers, which would normally all listen on port 80, now listen on 81, 82, 83, ... |
Something else I noticed: whenever the services are unreachable, I'm unable to view logs from the Dokploy dashboard; the log view is just empty. The logs load when the service is available via the domain, which is weird, because the service is reachable through port mappings regardless. |
To Reproduce
This isn't an issue with UptimeKuma, because there are long periods of inactivity on my statistics as well.
Uptime stats with large blocks of empty:
Before moving composes to Dokploy
During a "downtime",
Current vs. Expected behavior
Provide environment information
Which area(s) are affected? (Select all that apply)
Docker Compose
Are you deploying the applications where Dokploy is installed or on a remote server?
Same server where Dokploy is installed
Additional context
This doesn't happen when deploying on the host system without Dokploy, circumventing Traefik.
Will you send a PR to fix it?
Maybe, need help