Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backup restore loop: "Could not get a connection to the database:" #200

Open
miracle2k opened this issue Aug 1, 2017 · 0 comments
Open

Comments

@miracle2k
Copy link

miracle2k commented Aug 1, 2017

This is the same behaviour as in: #123 The database container never comes up completely to pass a health check. It tries to restore from S3, and then fails.

The database image is quay.io/deis/postgres:v2.5.4 on AWS (I previously had the same problem on Azure).

.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.wal_e.blobstore.s3.s3_util WARNING  MSG: WALE_S3_ENDPOINT defined, ignoring AWS_REGION
        HINT: AWS_REGION is only intended for use with AWS S3, and not interface-compatible use cases supported by WALE_S3_ENDPOINT
        STRUCTURED: time=2017-08-01T17:23:59.161533-00 pid=2454
FATAL:  the database system is starting up
.wal_e.operator.backup INFO     MSG: promoted prefetched wal segment
        STRUCTURED: time=2017-08-01T17:24:00.068839-00 pid=2454 action=wal-fetch key=s3://k3-database/wal_005/0000000A0000001C000000DF.lzo prefix= seg=0000000A0000001C000000DF
FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
.FATAL:  the database system is starting up
. stopped waiting
server is still starting up
Performing an initial backup...
LOG:  restored log file "0000000A0000001C000000DF" from archive
wal_e.main   INFO     MSG: starting WAL-E
        DETAIL: The subcommand is "backup-push".
        STRUCTURED: time=2017-08-01T17:25:02.573642-00 pid=2534
wal_e.blobstore.s3.s3_util WARNING  MSG: WALE_S3_ENDPOINT defined, ignoring AWS_REGION
        HINT: AWS_REGION is only intended for use with AWS S3, and not interface-compatible use cases supported by WALE_S3_ENDPOINT
        STRUCTURED: time=2017-08-01T17:25:14.049587-00 pid=2534
wal_e.blobstore.s3.s3_util WARNING  MSG: WALE_S3_ENDPOINT defined, ignoring AWS_REGION
        HINT: AWS_REGION is only intended for use with AWS S3, and not interface-compatible use cases supported by WALE_S3_ENDPOINT
        STRUCTURED: time=2017-08-01T17:25:19.008224-00 pid=2536
FATAL:  the database system is starting up
psql: FATAL:  the database system is starting up
wal_e.main   ERROR    MSG: could not run one or more external programs WAL-E depends upon
        DETAIL: Could not get a connection to the database: note that superuser access
        is required
        Could not run the following programs, are they installed? psql
        STRUCTURED: time=2017-08-01T17:25:20.904420-00 pid=2534
wal_e.operator.backup INFO     MSG: promoted prefetched wal segment
        STRUCTURED: time=2017-08-01T17:25:22.171869-00 pid=2536 action=wal-fetch key=s3://k3-database/wal_005/0000000A0000001C000000E0.lzo prefix= seg=0000000A0000001C000000E0
LOG:  restored log file "0000000A0000001C000000E0" from archive

Seems to me based on those last lines is that it's still downloading backup files, and there is some sort of timeout inside postgres triggering. This happens after about half an hour.

It also doesn't seem right that a restore takes so long, correct?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant