goroutines and tcp errors #723

Closed
Virtual-Machine opened this issue Mar 15, 2018 · 4 comments · Fixed by #1013
Comments

@Virtual-Machine

Full disclosure: I may have something misconfigured with Postgres.

max_connections on db = 1000

My understanding is that the db object returned by sql.Open() is a connection pool, and that it can be safely used from multiple goroutines for concurrent access.

However, the minimal example below does one of three things on my machine.

  1. Very rarely it runs to completion as expected.
  2. Most often it fails with : "read tcp [::1]:port->[::1]:5432: read: connection reset by peer"
  3. Rarely it fails with "write tcp [::1]:port->[::1]:5432: write: broken pipe"

Lowering the goroutines variable increases reliability; raising it makes failure all but guaranteed.
The number of records printed to the terminal before an error varies between runs of the same binary.

package main

import (
	"database/sql"
	"fmt"
	"os"

	_ "github.com/lib/pq"
)

func main() {
	// Connect to db
	connStr := "postgres://@localhost/test?sslmode=disable"
	db, err := sql.Open("postgres", connStr)
	if err != nil {
		panic(err.Error())
	}
	defer db.Close()

	err = db.Ping()
	if err != nil {
		panic(err.Error())
	}

	goroutines := 300

	// Channel to synchronize goroutines
	channel := make(chan int, goroutines)

	// Spawn goroutines to select records and iterate over them
	for i := 0; i < goroutines; i++ {
		go func(j int) {
			rows, err := db.Query("select name, age from contacts where age = $1", j)
			if err != nil {
				fmt.Println(err)
				os.Exit(-1)
			}
			var name string
			var age int
			for rows.Next() {
				err := rows.Scan(&name, &age)
				if err != nil {
					fmt.Println(err)
					os.Exit(-1)
				}
			}
			fmt.Println(name, age)
			err = rows.Err()
			if err != nil {
				fmt.Println(err)
				os.Exit(-1)
			}
			rows.Close()
			channel <- 0
		}(i)
	}

	// Wait for goroutines to finish
	for i := 0; i < goroutines; i++ {
		<-channel
	}
}
@cbandy
Contributor

cbandy commented Mar 20, 2018

When it fails, what is in the PostgreSQL (server) logs?

@Virtual-Machine
Author

Let me know if you want me to enable more detailed logging or another logging option. This is what I see in the log around the time of a pipe breaking:

2018-03-20 14:02:18.853 EDT [16753] LOCATION:  shmem_exit, ipc.c:259
2018-03-20 14:02:18.853 EDT [16753] DEBUG:  00000: proc_exit(-1): 0 callbacks to make
2018-03-20 14:02:18.853 EDT [16753] LOCATION:  proc_exit_prepare, ipc.c:188
2018-03-20 14:02:18.853 EDT [16504] DEBUG:  00000: forked new backend, pid=16756 socket=12
2018-03-20 14:02:18.853 EDT [16504] LOCATION:  BackendStartup, postmaster.c:4099
2018-03-20 14:02:18.853 EDT [16754] DEBUG:  00000: CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0
2018-03-20 14:02:18.853 EDT [16754] LOCATION:  ShowTransactionStateRec, xact.c:5020
2018-03-20 14:02:18.854 EDT [16754] LOG:  08006: could not send data to client: Broken pipe
2018-03-20 14:02:18.854 EDT [16754] LOCATION:  internal_flush, pqcomm.c:1464
2018-03-20 14:02:18.854 EDT [16754] FATAL:  08006: connection to client lost

@dharmjit

Hi, I am also facing the same issue: random broken pipe errors. Is there any workaround for this?

@gnuletik

@dharmjit See #871 (comment) for workarounds.

roylee17 added a commit to roylee17/sqlx that referenced this issue Mar 21, 2021
    I'm seeing "broken pipe" errors when working with CRDB using sqlx.
    The issue seemed to be that the TCP connections were disconnected while
    the conns in the db driver (pq) were still held as stale connections.

    It happens more often when the DB is behind a proxy.
    In our case, the pods were proxied by the envoy sidecar.

    There were other instances in the community reporting similar issues,
    which took different workarounds: sending periodic dummy queries from
    the app to mimic keepalives, lengthening the proxy idle timeout, or
    shortening the lifetime of db connections.

    This has been reported and fixed by the lib/pq upstream in v1.9+

      lib/pq#1013

      lib/pq#723
      lib/pq#897
      lib/pq#870

      grafana/grafana#29957
bonzofenix pushed a commit to cloudfoundry/app-autoscaler that referenced this issue Sep 24, 2021
To consume fix for lib/pq#723
as it has the same symptom as issues we have seen on Azure DB for PostgreSQL.