Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error driver: bad connection #213

Closed
y761350477 opened this issue Sep 22, 2019 · 23 comments
Closed

Error driver: bad connection #213

y761350477 opened this issue Sep 22, 2019 · 23 comments
Assignees

Comments

@y761350477
Copy link

  1. I'll be prompted when the table I'm querying doesn't exist: Table xxx doesn't exist.
  2. When the table I'm requesting again exists, I prompt driver: Bad connection problem
    Use in a go web environment
@pablote
Copy link

pablote commented Sep 23, 2019

I’m getting the same problem. In general, whenever a query fails, for whatever reason, that seems to leave the connection in a bad state, and the next one will always fail with “bad connection”.

After that, everything starts working ok.

@kshvakov kshvakov self-assigned this Sep 23, 2019
@johnatannvmd
Copy link

johnatannvmd commented Dec 11, 2019

Got this error on "columnar" type of insertion if I try to use the same connection with OpenDirect. Something like that(not real code, just to show the idea):

c := clickhouse.OpenDirect("tcp://127.0.0.1:9000?debug=true&database=databse&compress=1")
for i := 0; i < 2; i++ {
    c.Begin()
    c.Prepare(stmt)
    block, _ := c.Block()
    block.NumRows(100)
    block.Reverse()
    for _, row := range rows {
        block.Writes...
    }
    c.WriteBlock(block)
    c.Rollback()
}

On the second iteration I got error on c.Begin() line. If I "catch" the error and reopen connection with opendirect function, everything is continue to work.

2019.12.11 20:55:29.409365 [ 49 ] {460e6c16-c4f6-4296-8ab8-98395f3997c5} <Error> executeQuery: Code: 32, e.displayText() = DB::Exception: Attempt to read after eof (version 19.17.4.11 (official build)) (from 192.168.38.105:59844) (in query: INSERT INTO samples (`date`,`time`,`device`,`src`,`dst`,`srcASN`,`dstASN`,`proto`) VALUES ), Stack trace:

0. 0x55fa1af89f60 StackTrace::StackTrace() /usr/bin/clickhouse
1. 0x55fa1af89d35 DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) /usr/bin/clickhouse
2. 0x55fa1aad043f DB::throwReadAfterEOF() /usr/bin/clickhouse
3. 0x55fa1e1eab1b ? /usr/bin/clickhouse
4. 0x55fa1e1ed6a1 DB::NativeBlockInputStream::readImpl() /usr/bin/clickhouse
5. 0x55fa1e1e001a DB::IBlockInputStream::read() /usr/bin/clickhouse
6. 0x55fa1b010590 DB::TCPHandler::receiveData(bool) /usr/bin/clickhouse
7. 0x55fa1b011029 DB::TCPHandler::receivePacket() /usr/bin/clickhouse
8. 0x55fa1b01120e DB::TCPHandler::readDataNext(unsigned long const&, int const&) /usr/bin/clickhouse
9. 0x55fa1b01179e DB::TCPHandler::readData(DB::Settings const&) /usr/bin/clickhouse
10. 0x55fa1b011a11 DB::TCPHandler::processInsertQuery(DB::Settings const&) /usr/bin/clickhouse
11. 0x55fa1b013045 DB::TCPHandler::runImpl() /usr/bin/clickhouse
12. 0x55fa1b0132cb DB::TCPHandler::run() /usr/bin/clickhouse
13. 0x55fa1ef0f660 Poco::Net::TCPServerConnection::start() /usr/bin/clickhouse
14. 0x55fa1ef0fd7d Poco::Net::TCPServerDispatcher::run() /usr/bin/clickhouse
15. 0x55fa205e5ef1 Poco::PooledThread::run() /usr/bin/clickhouse
16. 0x55fa205e3c9c Poco::ThreadImpl::runnableEntry(void*) /usr/bin/clickhouse
17. 0x55fa20d59e00 ? /usr/bin/clickhouse
18. 0x7f69ede74e65 start_thread /usr/lib64/libpthread-2.17.so
19. 0x7f69ed79188d __clone /usr/lib64/libc-2.17.so

Long story, but on 19.14.6.12-2 same application works perfectly.

@johnatannvmd
Copy link

johnatannvmd commented Dec 11, 2019

Test case for clickhouse_columnar_test.go

func Test_ColumnarInsertTwicely(t *testing.T) {
	const (
		ddl = `
			CREATE TABLE clickhouse_test_columnar_insert (
				uint64 UInt64,
				string  String,
				date    Date,
				datetime   DateTime
			) Engine=Memory
		`
		dml = `
			INSERT INTO clickhouse_test_columnar_insert (
				uint64,
				string,
				date,
				datetime
			) VALUES (
				?,
				?,
				?,
				?
			)
		`
	)
	if connect, err := clickhouse.OpenDirect("tcp://127.0.0.1:9000?debug=true"); assert.NoError(t, err) {
		{
			connect.Begin()
			stmt, _ := connect.Prepare("DROP TABLE IF EXISTS clickhouse_test_columnar_insert")
			stmt.Exec([]driver.Value{})
			connect.Commit()
		}
		{
			if _, err := connect.Begin(); assert.NoError(t, err) {
				if stmt, err := connect.Prepare(ddl); assert.NoError(t, err) {
					if _, err := stmt.Exec([]driver.Value{}); assert.NoError(t, err) {
						assert.NoError(t, connect.Commit())
					}
				}
			}
		}
		{
			if _, err := connect.Begin(); assert.NoError(t, err) {
				if _, err := connect.Prepare(dml); assert.NoError(t, err) {
					block, err := connect.Block()
					assert.NoError(t, err)
					block.Reserve()
					block.NumRows = 100

					for i := 0; i < 100; i++ {
						block.WriteUInt64(0, uint64(i))
						block.WriteString(1, "string")
						block.WriteDate(2, time.Now())
						block.WriteDateTime(3, time.Now())
						if !assert.NoError(t, err) {
							return
						}
					}
					assert.NoError(t, connect.WriteBlock(block))
					assert.NoError(t, connect.Rollback())
				}
			}
		}
		{
			if _, err := connect.Begin(); assert.NoError(t, err) {
				if _, err := connect.Prepare(dml); assert.NoError(t, err) {
					block, err := connect.Block()
					assert.NoError(t, err)
					block.Reserve()
					block.NumRows = 100

					for i := 0; i < 100; i++ {
						block.WriteUInt64(0, uint64(i))
						block.WriteString(1, "string")
						block.WriteDate(2, time.Now())
						block.WriteDateTime(3, time.Now())
						if !assert.NoError(t, err) {
							return
						}
					}
					assert.NoError(t, connect.WriteBlock(block))
					assert.NoError(t, connect.Commit())
				}
			}
		}
	}
}

With same error in CH:

2019.12.11 21:27:28.230137 [ 80 ] {4eb8cb2c-9e63-45db-9b19-b6b0d1b4afac} <Error> executeQuery: Code: 32, e.displayText() = DB::Exception: Attempt to read after eof (version 19.17.2.4 (official build)) (from 172.18.0.1:58282) (in query: SELECT 1, sleep(2)), Stack trace:

0. 0x557ae5b8bc40 StackTrace::StackTrace() /usr/bin/clickhouse
1. 0x557ae5b8ba15 DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) /usr/bin/clickhouse
2. 0x557ae56d2dbf DB::throwReadAfterEOF() /usr/bin/clickhouse
3. 0x557ae5c15cfb DB::readVarUInt(unsigned long&, DB::ReadBuffer&) /usr/bin/clickhouse
4. 0x557ae5c0f0ce DB::TCPHandler::isQueryCancelled() /usr/bin/clickhouse
5. 0x557ae5c114d7 DB::TCPHandler::processOrdinaryQuery() /usr/bin/clickhouse
6. 0x557ae5c14a8e DB::TCPHandler::runImpl() /usr/bin/clickhouse
7. 0x557ae5c14c7b DB::TCPHandler::run() /usr/bin/clickhouse
8. 0x557ae9b0d790 Poco::Net::TCPServerConnection::start() /usr/bin/clickhouse
9. 0x557ae9b0dead Poco::Net::TCPServerDispatcher::run() /usr/bin/clickhouse
10. 0x557aeb1e4021 Poco::PooledThread::run() /usr/bin/clickhouse
11. 0x557aeb1e1dcc Poco::ThreadImpl::runnableEntry(void*) /usr/bin/clickhouse
12. 0x557aeb957f40 ? /usr/bin/clickhouse
13. 0x7f3c6e7df6db start_thread /lib/x86_64-linux-gnu/libpthread-2.27.so
14. 0x7f3c6e0fc88f __clone /lib/x86_64-linux-gnu/libc-2.27.so

@johnatannvmd
Copy link

Holly molly, it's here 204d131#diff-17703cbd24e279392c28017efd531a45R163

Cleanup blocks and buffers are okay, but why close connection? And why I'm able to send data after connection is closed?

@kshvakov
Copy link
Collaborator

@johnatannvmd ClickHouse doesn't support the real transactions, begin/commit/rollback - is a transaction emulation. The connection is closed to prevent ClickHouse corruption (after error driver can send incorrect blocks).
It's normal, but connection must be removed from the connection pool

@qJkee
Copy link

qJkee commented Dec 27, 2019

Same here.
Getting strange error, which occurs on some writes with driver: bad connection.
@kshvakov according to your answer above, the problem is that sql package in golang storing this connection in the pool after usage, instead of closing it(but its a pool =)).
So, should I force to close the connection after usage manually, and open a new connection for each request?

@cameronbraid
Copy link

I'm having similar issues

2020/02/10 03:02:55 Unable to prepare insert statement : driver: bad connection

Is there a known workaround ?

@qJkee
Copy link

qJkee commented Feb 12, 2020

@cameronbraid
use sql.DB.Acquire() and Release() methods to work with connection directly.
Acquire will open a new connection if there are no live connections(or with bad connection) available

@pablote
Copy link

pablote commented Feb 17, 2020

@qJkee I think that working with the driver directly is supposed to work like that already. Requesting a connection makes the code more verbose, more error handling, and only really needed if you need to do something with the connection object itself.

@sudhakar
Copy link

sudhakar commented Jun 1, 2020

FWIW if anybody is working on a fix for driver.ErrBadConn, here's how mysql guys fixed this issue

@lelvisl
Copy link

lelvisl commented Jul 6, 2020

any updates?

@goku321
Copy link

goku321 commented Jul 9, 2020

@kshvakov would like to work on this issue.

@kshvakov
Copy link
Collaborator

kshvakov commented Jul 9, 2020

@goku321 , It's Open Source. Of course you can do it!)

@goku321
Copy link

goku321 commented Aug 31, 2020

Not able to reproduce it locally. Any help would be appreciated. I'm using precompiled binary for macos.

Here's my code:

package main

import (
	"context"
	"database/sql"
	"log"
	"time"

	_ "github.com/ClickHouse/clickhouse-go"
)

var connStr string = "tcp://localhost:9000"

func write(ch *sql.DB) error {
	tx, err := ch.BeginTx(context.Background(), nil)
	if err != nil {
		return err
	}

	stmt, err := tx.Prepare(`
		INSERT INTO not_exists (
			ts,
			siteID,
			user.id,
			user.accountID,
			segmentID
		)
		VALUES(?, ?, ?, ?)
	`)
	if err != nil {
		tx.Rollback()
		return err
	}
	defer stmt.Close()

	for i := 0; i < 3; i++ {
		_, err = stmt.Exec(
			time.Now(),
			1,
			[]string{"a"},
			[]string{"a"},
			1,
		)

		if err != nil {
			stmt.Close()
			tx.Rollback()
			return err
		}
	}

	err = stmt.Close()
	if err != nil {
		return err
	}

	return tx.Commit()
}

func main() {
	db, err := sql.Open("clickhouse", connStr)
	if err != nil {
		log.Fatal(err)
	}

	db.SetMaxOpenConns(1)
	db.SetMaxIdleConns(0)
	db.SetConnMaxLifetime(5 * time.Second)

	// will fail.
	_, err = db.Query("select * from not_exists")
	if err != nil {
		log.Println(err)
	}

	// will fail.
	err = write(db)
	if err != nil {
		log.Println(err)
	}

	// users table exists.
	rows, err := db.Query("select * from users")
	if err != nil {
		log.Fatal(err)
	}
	rows.Close()
}

@sudhakar
Copy link

@goku321 This usually happens when server side sockets gets closed abruptly & client tries to re-use the stale connection. Setting keep_alive_timeout / max_session_timeout to a lower value in /etc/clickhouse-server/config.xml might help in reproducing the issue (or try killing the server when connection is active)

@achettyiitr
Copy link

Hi guys,
We are still getting this issue and now at a broader level.

saydamir pushed a commit to saydamir/clickhouse-go that referenced this issue Oct 8, 2021
This commit fixes for ClickHouse#213.

If the connection closes from the server side for some reason, the database driver returns driver.ErrBadConn to database/sql.

Usually, database/sql retries a request, assuming that the error occurs in a function that could be called first after retrieving a connection from the pool.
But beginTx in clickhouse-go doesn't perform any network interaction and driver.ErrBadConn is returned later in the transaction. database/sql doesn't retry it because assumes that connection is alive - beginTx didn't return the error.

 This commit adds a method to check the connection liveness and performs that check in beginTx function.

 The check is taken from go-sql-driver/mysql#934.

There is no way to check the liveness of a secure connection, as long as there is no access to raw TCP net.Conn in clickhouse-go.
saydamir pushed a commit to saydamir/clickhouse-go that referenced this issue Oct 8, 2021
This commit fixes for ClickHouse#213.

If the connection closes from the server side for some reason, the database driver returns driver.ErrBadConn to database/sql.

Usually, database/sql retries a request, assuming that the error occurs in a function that could be called first after retrieving a connection from the pool.
But beginTx in clickhouse-go doesn't perform any network interaction and driver.ErrBadConn is returned later in the transaction. database/sql doesn't retry it because assumes that connection is alive - beginTx didn't return the error.

 This commit adds a method to check the connection liveness and performs that check in beginTx function.

 The check is taken from go-sql-driver/mysql#934.

There is no way to check the liveness of a secure connection, as long as there is no access to raw TCP net.Conn in clickhouse-go.
abraithwaite pushed a commit to segmentio/clickhouse-go that referenced this issue Oct 23, 2021
This commit fixes for ClickHouse#213.

If the connection closes from the server side for some reason, the database driver returns driver.ErrBadConn to database/sql.

Usually, database/sql retries a request, assuming that the error occurs in a function that could be called first after retrieving a connection from the pool.
But beginTx in clickhouse-go doesn't perform any network interaction and driver.ErrBadConn is returned later in the transaction. database/sql doesn't retry it because assumes that connection is alive - beginTx didn't return the error.

 This commit adds a method to check the connection liveness and performs that check in beginTx function.

 The check is taken from go-sql-driver/mysql#934.

There is no way to check the liveness of a secure connection, as long as there is no access to raw TCP net.Conn in clickhouse-go.
@co11ter
Copy link

co11ter commented Nov 1, 2021

The last patch didn't fix the problem completely: https://github.com/ClickHouse/clickhouse-go/pull/421/files#diff-bf47e8d1e5bf079269e36fce5f142b421ec333fcdc9b0dc8117c63cf175f8984R129

When we begin transaction, the database/sql doesn't check connection, as sql.Tx has no access to connection pool and use only one.

Look at tests please:

func TestClickBadConnectionTxBefore(t *testing.T) {
	db := openDB(t)

	time.Sleep(1100 * time.Millisecond)

	// Fixed by actual master of clickhouse driver.
	tx, err := db.BeginTx(context.Background(), &sql.TxOptions{})
	assert.NoError(t, err, "start tx")

	_, err = tx.ExecContext(context.Background(), "SELECT * FROM system.time_zones")
	assert.NoError(t, err, "select")

	err = tx.Commit()
	assert.NoError(t, err, "commit")
}

func TestClickBadConnectionTxAfter(t *testing.T) {
	db := openDB(t)

	tx, err := db.BeginTx(context.Background(), &sql.TxOptions{})
	assert.NoError(t, err, "start tx")

	time.Sleep(1100 * time.Millisecond)

	// Connection of tx become stale.
	_, err = tx.ExecContext(context.Background(), "SELECT * FROM system.time_zones")
	assert.NoError(t, err, "select")

	err = tx.Commit()
	assert.NoError(t, err, "commit")
}

func TestClickBadConnectionNoTx(t *testing.T) {
	db := openDB(t)

	time.Sleep(1100 * time.Millisecond)

	_, err := db.ExecContext(context.Background(), "SELECT * FROM system.time_zones")
	assert.NoError(t, err, "select")
}

func openDB(t *testing.T) *sql.DB {
	db, err := sql.Open("clickhouse", testutils.GetClickHouseConnectionString(t))
	assert.NoError(t, err, "get db")

	_, err = db.ExecContext(context.Background(), "set idle_connection_timeout=1")
	assert.NoError(t, err, "set idle conn")

	_, err = db.ExecContext(context.Background(), "set tcp_keep_alive_timeout=0")
	assert.NoError(t, err, "set tcp keep alive")

	return db
}

@saydamir
Copy link

@co11ter connection pooling and retries are the responsibility of database/sql.
database/sql expects the connection to be checked during the transaction opening. If the connection is closed, several attempts will be performed to call the BeginTX driver function with other connections.
database/sql does not try to fix the situation if the connection is closed after the start of the transaction(it should be pretty challanging).

So I'm not sure if it's right to try to handle such situation inside the driver.
And whether it is possible at all, given that the driver does not have access to the connection pool.

@co11ter
Copy link

co11ter commented Nov 17, 2021

I guess solving it on database/sql side is not good idea, since connection pooling in this case allows to send queries of transaction to various connections. It can be inconsistent within some database.

Thus it turns out ensuring liveness of connection within transaction is responsibility of driver.

@saydamir
Copy link

@co11ter I agree that such a problem can't be solved in database/sql.

But what behavior is expected here?
At the moment driver sees that connection is invalid and return driver.ErrBadConn to database/sql.

@co11ter
Copy link

co11ter commented Nov 17, 2021

I suggest to reopen tcp connection if old one is invalid and clickhouse.connect obtained for transaction.

@saydamir
Copy link

It seems to me more like a new feature than a bug fix 👍

kshvakov added a commit that referenced this issue Jan 20, 2022
@kshvakov
Copy link
Collaborator

Added in V2 75ad13d along with ResetSession

farbodsalimi pushed a commit to segmentio/clickhouse-go that referenced this issue May 18, 2022
This commit fixes for ClickHouse#213.

If the connection closes from the server side for some reason, the database driver returns driver.ErrBadConn to database/sql.

Usually, database/sql retries a request, assuming that the error occurs in a function that could be called first after retrieving a connection from the pool.
But beginTx in clickhouse-go doesn't perform any network interaction and driver.ErrBadConn is returned later in the transaction. database/sql doesn't retry it because assumes that connection is alive - beginTx didn't return the error.

 This commit adds a method to check the connection liveness and performs that check in beginTx function.

 The check is taken from go-sql-driver/mysql#934.

There is no way to check the liveness of a secure connection, as long as there is no access to raw TCP net.Conn in clickhouse-go.
mshustov pushed a commit that referenced this issue Jan 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests