You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
My team bumped into failed to begin subsegment panics when using SQLContext together with setting limits on connection numbers and durations.
With aws-xray-sdk-go v1.8.0 the problem still occurs, but logs instead of panics (which is great!). But I wanted to check if we are misusing the SDK in anyway, and give you a heads up if there is some kind of race condition here.
Executing queries with instrumented contexts occasionally caused panics like:
"LOG panic: failed to begin subsegment named 'xyz@localhost': segment cannot be found."
From what I can tell, there is a small chance that database/sql will try to open a fresh conn using context.Background(), instead of using the caller context, or re-using an existing connection.
Reprod
I managed to come up with this slightly contrived program (tons of connections and very low lifetimes) that reliably triggers the failure on my machine:
package main
import (
"context"
"database/sql"
"log"
"net"
"time"
"github.com/aws/aws-xray-sdk-go/xray"
// "sqlite" database driver.
_ "modernc.org/sqlite"
)
func initDB() (*sql.DB, error) {
db, err := xray.SQLContext("sqlite", "tmp.sqlite")
if err != nil {
return nil, err
}
db.SetMaxOpenConns(10)
db.SetConnMaxLifetime(time.Millisecond)
return db, nil
}
func oneRequest(db *sql.DB) error {
var err error
ctx, seg := xray.BeginSegment(context.Background(), "main")
defer seg.Close(err)
_, err = db.ExecContext(ctx, "select sqlite_version();")
if err != nil {
return err
}
return nil
}
func xraySink() {
pc, err := net.ListenPacket("udp", ":2000")
if err != nil {
log.Fatalf("listen: %s", err)
}
defer pc.Close()
buf := make([]byte, 1024)
for {
pc.ReadFrom(buf)
}
}
func main() {
// Give X-ray something to log to.
go xraySink()
db, err := initDB()
if err != nil {
log.Fatalf("open DB: %s", err)
}
start := make(chan struct{})
for i := 0; i < 10_000; i++ {
go func(i int) {
<-start
err := oneRequest(db)
if err != nil {
log.Fatalf("oneRequest failed: %s", err)
}
}(i)
}
close(start)
time.Sleep(time.Second)
}
The stacktrace at the failed subsegment is:
0 0x00000001029a3e90 in github.com/aws/aws-xray-sdk-go/xray.BeginSubsegment
at /Users/viktor/go/pkg/mod/github.com/aws/[email protected]/xray/segment.go:283
1 0x000000010299db84 in github.com/aws/aws-xray-sdk-go/xray.Capture
at /Users/viktor/go/pkg/mod/github.com/aws/[email protected]/xray/capture.go:19
2 0x00000001029ae85c in github.com/aws/aws-xray-sdk-go/xray.(*driverConnector).Connect
at /Users/viktor/go/pkg/mod/github.com/aws/[email protected]/xray/sql_go110.go:59
3 0x0000000102267f58 in database/sql.(*DB).openNewConnection
at /Users/viktor/local/go/src/database/sql/sql.go:1238
4 0x0000000102267ec0 in database/sql.(*DB).connectionOpener
at /Users/viktor/local/go/src/database/sql/sql.go:1228
5 0x0000000102265a00 in database/sql.OpenDB.func1
at /Users/viktor/local/go/src/database/sql/sql.go:792
6 0x00000001021a47e4 in runtime.goexit
at /Users/viktor/local/go/src/runtime/asm_arm64.s:1172
So its the connectionOpener goroutine that gets a request for another connection, and tries to do it with context.Background().
The text was updated successfully, but these errors were encountered:
vbqz
changed the title
Using xray.SQLContext with SetMaxOpenConns/SetConnMaxLifetime can initialize connections without a segment.
Using xray.SQLContext with SetMaxOpenConns/SetConnMaxLifetime sometimes "failed to begin subsegment".
Jan 11, 2023
Hi,
My team bumped into
failed to begin subsegment
panics when usingSQLContext
together with setting limits on connection numbers and durations.With
aws-xray-sdk-go v1.8.0
the problem still occurs, but logs instead of panics (which is great!). But I wanted to check if we are misusing the SDK in anyway, and give you a heads up if there is some kind of race condition here.Issue
When using a database connection created like:
Executing queries with instrumented contexts occasionally caused panics like:
From what I can tell, there is a small chance that
database/sql
will try to open a fresh conn usingcontext.Background()
, instead of using the caller context, or re-using an existing connection.Reprod
I managed to come up with this slightly contrived program (tons of connections and very low lifetimes) that reliably triggers the failure on my machine:
The stacktrace at the failed subsegment is:
So its the
connectionOpener
goroutine that gets a request for another connection, and tries to do it withcontext.Background()
.The text was updated successfully, but these errors were encountered: