Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix data race in unit tests #4349

Closed
michaeldiamant opened this issue Aug 3, 2022 · 1 comment
Closed

Fix data race in unit tests #4349

michaeldiamant opened this issue Aug 3, 2022 · 1 comment
Assignees

Comments

@michaeldiamant
Copy link
Contributor

Problem

From https://app.circleci.com/pipelines/github/algorand/go-algorand/8335/workflows/26741401-daf2-4ab0-be50-9cd00b6ae547/jobs/150292/parallel-runs/3:

=== Failed
=== FAIL: ledger/internal TestMaxInnerTxForSingleAppCall/cv=v32 (1.26s)
    testing.go:1152: race detected during execution of test
    --- FAIL: TestMaxInnerTxForSingleAppCall/cv=v32 (1.26s)

=== FAIL: ledger/internal TestMaxInnerTxForSingleAppCall (3.80s)
    testing.go:1152: race detected during execution of test

=== FAIL: ledger/internal TestEvalAppState/cv=v29 (0.30s)
    testing.go:1152: race detected during execution of test
    --- FAIL: TestEvalAppState/cv=v29 (0.30s)

=== FAIL: ledger/internal TestEvalAppState (3.65s)
    testing.go:1152: race detected during execution of test

=== FAIL: ledger/internal TestDuplicates/cv=v18 (0.29s)
==================
WARNING: DATA RACE
Write at 0x00c0350a9b3f by goroutine 66:
  github.com/algorand/go-algorand/ledger/internal_test.TestMaxInnerTxForSingleAppCall.func1()
      /opt/cibuild/project/ledger/internal/apptxn_test.go:1818 +0xdc4
  github.com/algorand/go-algorand/ledger/internal_test.testConsensusRange.func1()
      /opt/cibuild/project/ledger/internal/eval_blackbox_test.go:946 +0x55
  testing.tRunner()
      /opt/cibuild/.gimme/versions/go1.17.9.darwin.amd64/src/testing/testing.go:1259 +0x22f
  testing.(*T).Run·dwrap·21()
      /opt/cibuild/.gimme/versions/go1.17.9.darwin.amd64/src/testing/testing.go:1306 +0x47

Previous read at 0x00c0350a9b3f by goroutine 46:
  runtime.slicecopy()
      /opt/cibuild/.gimme/versions/go1.17.9.darwin.amd64/src/runtime/slice.go:284 +0x0
  github.com/algorand/msgp/msgp.AppendBytes()
      /opt/cibuild/go/pkg/mod/github.com/algorand/[email protected]/msgp/write_bytes.go:178 +0x745
  github.com/algorand/go-algorand/data/transactions.(*Transaction).MarshalMsg()
      /opt/cibuild/project/data/transactions/msgp_gen.go:4417 +0x50c7
  github.com/algorand/go-algorand/data/transactions.(*SignedTxnInBlock).MarshalMsg()
      /opt/cibuild/project/data/transactions/msgp_gen.go:3531 +0x1972
  github.com/algorand/go-algorand/data/transactions.Payset.MarshalMsg()
      /opt/cibuild/project/data/transactions/msgp_gen.go:3127 +0x1a5
  github.com/algorand/go-algorand/data/bookkeeping.(*Block).MarshalMsg()
      /opt/cibuild/project/data/bookkeeping/msgp_gen.go:346 +0x2a6e
  github.com/algorand/go-algorand/protocol.EncodeMsgp()
      /opt/cibuild/project/protocol/codec.go:121 +0x197
  github.com/algorand/go-algorand/protocol.Encode()
      /opt/cibuild/project/protocol/codec.go:127 +0x198
  github.com/algorand/go-algorand/ledger.blockPut()
      /opt/cibuild/project/ledger/blockdb.go:171 +0x494
  github.com/algorand/go-algorand/ledger.(*blockQueue).syncer.func1()
      /opt/cibuild/project/ledger/blockqueue.go:114 +0x1c4
  github.com/algorand/go-algorand/util/db.(*Accessor).atomic.func1()
      /opt/cibuild/project/util/db/dbutil.go:234 +0xbd
  github.com/algorand/go-algorand/util/db.(*Accessor).atomic()
      /opt/cibuild/project/util/db/dbutil.go:286 +0xec4
  github.com/algorand/go-algorand/util/db.(*Accessor).Atomic()
      /opt/cibuild/project/util/db/dbutil.go:214 +0x245
  github.com/algorand/go-algorand/ledger.(*blockQueue).syncer()
      /opt/cibuild/project/ledger/blockqueue.go:112 +0x185
  github.com/algorand/go-algorand/ledger.bqInit·dwrap·41()
      /opt/cibuild/project/ledger/blockqueue.go:72 +0x39

Goroutine 66 (running) created at:
  testing.(*T).Run()
      /opt/cibuild/.gimme/versions/go1.17.9.darwin.amd64/src/testing/testing.go:1306 +0x726
  github.com/algorand/go-algorand/ledger/internal_test.testConsensusRange()
      /opt/cibuild/project/ledger/internal/eval_blackbox_test.go:946 +0xaa
  github.com/algorand/go-algorand/ledger/internal_test.TestMaxInnerTxForSingleAppCall()
      /opt/cibuild/project/ledger/internal/apptxn_test.go:1734 +0x1bb
  testing.tRunner()
      /opt/cibuild/.gimme/versions/go1.17.9.darwin.amd64/src/testing/testing.go:1259 +0x22f
  testing.(*T).Run·dwrap·21()
      /opt/cibuild/.gimme/versions/go1.17.9.darwin.amd64/src/testing/testing.go:1306 +0x47

Goroutine 46 (running) created at:
  github.com/algorand/go-algorand/ledger.bqInit()
      /opt/cibuild/project/ledger/blockqueue.go:72 +0x384
  github.com/algorand/go-algorand/ledger.(*Ledger).reloadLedger()
      /opt/cibuild/project/ledger/ledger.go:189 +0x17b
  github.com/algorand/go-algorand/ledger.OpenLedger()
      /opt/cibuild/project/ledger/ledger.go:163 +0x13b4
  github.com/algorand/go-algorand/ledger/internal_test.newTestLedgerFull()
      /opt/cibuild/project/ledger/internal/apptxn_test.go:404 +0x570
  github.com/algorand/go-algorand/ledger/internal_test.newTestLedgerWithConsensusVersion()
      /opt/cibuild/project/ledger/internal/apptxn_test.go:393 +0x151
  github.com/algorand/go-algorand/ledger/internal_test.NewDoubleLedger()
      /opt/cibuild/project/ledger/internal/double_test.go:59 +0x96
  github.com/algorand/go-algorand/ledger/internal_test.TestMaxInnerTxForSingleAppCall.func1()
      /opt/cibuild/project/ledger/internal/apptxn_test.go:1735 +0x144
  github.com/algorand/go-algorand/ledger/internal_test.testConsensusRange.func1()
      /opt/cibuild/project/ledger/internal/eval_blackbox_test.go:946 +0x55
  testing.tRunner()
      /opt/cibuild/.gimme/versions/go1.17.9.darwin.amd64/src/testing/testing.go:1259 +0x22f
  testing.(*T).Run·dwrap·21()
      /opt/cibuild/.gimme/versions/go1.17.9.darwin.amd64/src/testing/testing.go:1306 +0x47
==================
    testing.go:1152: race detected during execution of test
    --- FAIL: TestDuplicates/cv=v18 (0.29s)

=== FAIL: ledger/internal TestDuplicates (6.49s)
    testing.go:1152: race detected during execution of test

Solution

If no root cause can be identified, removing t.Parallel() will fix the build.

Dependencies

Urgency

@michaeldiamant
Copy link
Contributor Author

From my inspection, I suspect the data race stems from TestMaxInnerTxForSingleAppCall in these lines:

dl.fullBlock(callTxGroup[0])
// Can't do 257 txns
callTxGroup[0].ApplicationArgs[0][1] = 1

Believed root issue
Invoking dl.fullBlock adds blocks to a queue processed in a different channel. Since dl.fullBlock does not wait for async processing to complete, it creates a data race when its argument (callTxGroup) is later modified by the test (ln 1818).

Proposed resolution
In eval_blackbox_test.go, modify endBlock to await a signal that block processing is complete (e.g. committedUpTo).

Analysis

  • Analysis performed on 86da537.
  • dl.fullBlock(callTxGroup[0]) eventually invokes AddValidatedBlock. AddValidatedBlock appends to blockQueue queue via
    err := l.blockQ.putBlock(blk, cert)
  • blockQueue queue read happens in a different channel:
    err0 := blockPut(tx, e.block, e.cert)
    • The enclosing method, syncer is referenced by the data race stacktrace.
    • The syncer channel writes the block's transactions into the database. The write explains why msgpack Transaction serialization appears in the data race stacktrace.
  • The test does not await block write completion. The next test step modifies a transaction (callTxGroup[0].ApplicationArgs[0][1] = 1) that might not yet be written in syncer.
  • Alternative - Rather than awaiting block write completion, a unique callTxGroup can be used for each block operation. The approach addresses the issue, but feels counter to prevailing Go language conventions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants