What version of Badger are you using?
v4.1.0 (tested on latest main too)
What version of Go are you using?
go version go1.20.3 darwin/arm64
Have you tried reproducing the issue with the latest release?
Yes
What is the hardware spec (RAM, CPU, OS)?
MacBook Pro M2
What steps will reproduce the bug?
When 2 concurrent transactions read a non-existent key, and then write to that key, the behavior I currently observe is that one of the transactions will conflict (this, I believe, is expected). For example:
package main

import (
	"log"
	"os"
	"time"

	"github.com/dgraph-io/badger/v4"
	"golang.org/x/sync/errgroup"
)

func main() {
	if err := run(); err != nil {
		log.Fatal(err)
	}
}

func run() error {
	dir, err := os.MkdirTemp(os.TempDir(), "badger")
	if err != nil {
		return err
	}
	defer func() {
		log.Printf("cleaning up %s", dir)
		_ = os.RemoveAll(dir)
	}()
	db, err := badger.Open(badger.DefaultOptions(dir).WithLoggingLevel(badger.ERROR))
	if err != nil {
		return err
	}
	defer func() {
		_ = db.Close()
	}()
	key := []byte("key-1")
	eg := errgroup.Group{}
	eg.Go(func() error {
		return db.Update(func(txn *badger.Txn) error {
			<-time.After(time.Second) // pause long enough to ensure the other goroutine is running
			_, _ = txn.Get(key)
			return txn.Set(key, []byte("value-1"))
		})
	})
	eg.Go(func() error {
		return db.Update(func(txn *badger.Txn) error {
			<-time.After(time.Second) // pause long enough to ensure the other goroutine is running
			_, _ = txn.Get(key)
			return txn.Set(key, []byte("value-2"))
		})
	})
	return eg.Wait()
}
...yields...
$ go run .
2023/06/04 19:26:09 cleaning up /var/folders/fd/my_tbdw53yj8rn0gb_m7dlm40000gn/T/badger3191223375
2023/06/04 19:26:09 Transaction Conflict. Please retry
exit status 1
Another way to observe this behavior is to use 2 manually managed transactions and "inline" the executed steps as it were, i.e. execute one after the other (with the 2 transactions active at the same time). This is generally an easier way to trigger this behavior. For example:
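A minimal sketch of that manual variant (my reconstruction, not the original code; it assumes the same db handle and imports as the example above, plus fmt): both transactions are opened with db.NewTransaction(true), their reads and writes are interleaved by hand, and the second Commit is expected to return badger.ErrConflict.

func runManual(db *badger.DB) error {
	key := []byte("key-1")

	// Open both transactions before either one commits, so they share the same snapshot.
	tx1 := db.NewTransaction(true)
	defer tx1.Discard()
	tx2 := db.NewTransaction(true)
	defer tx2.Discard()

	// Both read the (non-existent) key, then both write to it.
	_, _ = tx1.Get(key)
	_, _ = tx2.Get(key)
	if err := tx1.Set(key, []byte("value-1")); err != nil {
		return err
	}
	if err := tx2.Set(key, []byte("value-2")); err != nil {
		return err
	}

	// tx1 commits first; tx2 read a key that tx1 has since written, so its
	// commit should fail with badger.ErrConflict.
	if err := tx1.Commit(); err != nil {
		return fmt.Errorf("tx1 failed: %w", err)
	}
	if err := tx2.Commit(); err != nil {
		return fmt.Errorf("tx2 failed: %w", err)
	}
	return nil
}

The output from the author's run of this flow was: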
$ go run .
<snip>
2023/06/04 19:27:41 cleaning up /var/folders/fd/my_tbdw53yj8rn0gb_m7dlm40000gn/T/badger3558899794
2023/06/04 19:27:41 tx2 failed: Transaction Conflict. Please retry
exit status 1
Now, I have a couple of transactions that are slightly more convoluted than the above, but they are expected to conflict with each other for the same reasons. These transactions are executed from different goroutines, so conflicts can happen in normal operation. To simplify things, I've done the same as above: I've taken what would be executed by 2 goroutines concurrently and "inlined" the steps to highlight a scenario where these 2 transactions could conflict with each other (this is the doWork function below). This "inlined" version conflicts as expected. However, if I run those steps across many goroutines at the same time (each operating on completely different keys), 1 of the executions fails to conflict, unexpectedly.
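The doWork referred to above is not reproduced verbatim here; the sketch below is my reconstruction of the shape described (the 1000-goroutine count, the per-goroutine key names, the small random delays, and the log messages are assumptions inferred from the output that follows). Each goroutine runs the same inlined two-transaction sequence on its own key and is expected to end in badger.ErrConflict; any other result is reported as unexpected. It assumes the imports from the first example plus errors, fmt, math/rand, and sync/atomic.

func doWork(db *badger.DB, key []byte) error {
	tx1 := db.NewTransaction(true)
	defer tx1.Discard()
	tx2 := db.NewTransaction(true)
	defer tx2.Discard()

	// Both transactions read the key before either writes it.
	_, _ = tx1.Get(key)
	_, _ = tx2.Get(key)

	// Small random delay; per the note below, this seems to help trigger the issue.
	time.Sleep(time.Duration(rand.Intn(5)) * time.Millisecond)

	if err := tx1.Set(key, []byte("value-1")); err != nil {
		return err
	}
	if err := tx2.Set(key, []byte("value-2")); err != nil {
		return err
	}
	if err := tx1.Commit(); err != nil {
		return err
	}
	return tx2.Commit() // expected: badger.ErrConflict
}

func runMany(db *badger.DB) error {
	const n = 1000
	var conflicts atomic.Int64
	eg := errgroup.Group{}
	for i := 0; i < n; i++ {
		i := i
		eg.Go(func() error {
			// Each goroutine operates on a completely different key.
			err := doWork(db, []byte(fmt.Sprintf("key-%d", i)))
			if errors.Is(err, badger.ErrConflict) {
				conflicts.Add(1)
				return nil
			}
			return fmt.Errorf("unexpected result: err = %v (i = %d)", err, i)
		})
	}
	if err := eg.Wait(); err != nil {
		log.Printf("failed with %d conflicts", conflicts.Load())
		return err
	}
	log.Printf("completed as expected with %d conflicts", conflicts.Load())
	return nil
}

Repeated runs looked like this: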
$ while go run .; do echo "---"; done
2023/06/04 19:58:58 completed as expected with 1000 conflicts
2023/06/04 19:58:58 cleaning up /var/folders/fd/my_tbdw53yj8rn0gb_m7dlm40000gn/T/badger944970449
---
2023/06/04 19:58:58 completed as expected with 1000 conflicts
2023/06/04 19:58:58 cleaning up /var/folders/fd/my_tbdw53yj8rn0gb_m7dlm40000gn/T/badger358100745
---
2023/06/04 19:58:59 completed as expected with 1000 conflicts
2023/06/04 19:58:59 cleaning up /var/folders/fd/my_tbdw53yj8rn0gb_m7dlm40000gn/T/badger373541670
---
2023/06/04 19:59:00 completed as expected with 1000 conflicts
2023/06/04 19:59:00 cleaning up /var/folders/fd/my_tbdw53yj8rn0gb_m7dlm40000gn/T/badger55216248
---
2023/06/04 19:59:00 failed with 999 conflicts
2023/06/04 19:59:00 cleaning up /var/folders/fd/my_tbdw53yj8rn0gb_m7dlm40000gn/T/badger2814289974
2023/06/04 19:59:00 unexpected result: err = <nil> (i = 101)
exit status 1
So essentially 1 execution of the doWork() function has failed to conflict, whilst 999 (and 4k prior) executions did conflict as expected. Note that the inclusion of the random delays seems to help trigger the conditions for this, though it is unclear to me why.
Expected behavior and actual result.
It's unclear to me if my expectation for things to conflict is reasonable, but it does strike me as odd that the behaviour can apparently differ. Failure to conflict can result in loss of writes.
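To make the loss-of-writes point concrete: callers of Update typically treat a reported conflict as a signal to re-read and re-apply, along the lines of the hedged sketch below (updateWithRetry is my name for illustration, not a badger API). When a genuine conflict goes unreported, this protection never kicks in and the earlier write is silently overwritten.

// updateWithRetry re-runs fn whenever badger reports a conflict, so the losing
// transaction re-reads the current value before writing again.
func updateWithRetry(db *badger.DB, fn func(txn *badger.Txn) error) error {
	for {
		err := db.Update(fn)
		if !errors.Is(err, badger.ErrConflict) {
			return err
		}
	}
}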
Additional information
The same behavior is observed when using in-memory mode (see the snippet below).
The behavior is still observed even with a much reduced number of goroutines running (e.g. 10).
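For reference, the in-memory run mentioned above presumably only changes how the DB is opened (a sketch; everything else in the examples stays the same):

// In-memory mode: no directory on disk, all data is kept in RAM.
db, err := badger.Open(badger.DefaultOptions("").WithInMemory(true))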
Thanks for filing a detailed bug, I am looking into it. It seems like when txn timestamps (readTs) are zero, the transactions don't conflict. I am trying to figure out why that is the case.
This seems like a bug to me: the read watermark used by badger does not handle the case when ts=0 well. I am looking for a solution for it. Thanks again for filing a reproducible bug.
Fixes #1962
We have assumed that the index won't be zero for a WaterMark, but in badger's unmanaged mode we start transactions with readTs = 0. This affects oracle.readMark, which can have values starting at 0.
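As a standalone illustration of the class of problem described (this is a toy, not badger's actual y.WaterMark or oracle code, and all names here are mine): if a watermark treats index 0 as "never used", a reader that begins at 0 is simply not recorded, so the structure can report that everything is done while that reader is still active, and state kept around only for active readers can be discarded from under it.

// toyMark is a simplified stand-in for a read watermark: readers call begin(ts)
// when they start and done(ts) when they finish; doneUntil() reports the
// timestamp up to which no reader is still pending.
type toyMark struct {
	pending map[uint64]int // ts -> readers still active at that ts
	maxSeen uint64
}

func newToyMark() *toyMark { return &toyMark{pending: map[uint64]int{}} }

func (m *toyMark) begin(ts uint64) {
	if ts == 0 {
		// The flawed assumption under illustration: "indices are never zero",
		// so a reader at ts = 0 is never registered.
		return
	}
	m.pending[ts]++
	if ts > m.maxSeen {
		m.maxSeen = ts
	}
}

func (m *toyMark) done(ts uint64) {
	if ts == 0 {
		return
	}
	m.pending[ts]--
}

func (m *toyMark) doneUntil() uint64 {
	// Everything strictly below the smallest still-pending ts counts as done.
	for ts := uint64(1); ts <= m.maxSeen; ts++ {
		if m.pending[ts] > 0 {
			return ts - 1
		}
	}
	return m.maxSeen
}

Since unmanaged-mode badger starts transactions with readTs = 0 (per the description above), an assumption like the one in begin() would leave those transactions invisible to the read mark, which would be consistent with the missed conflict appearing only occasionally.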