Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a BeginTransaction request. #2799

Merged
merged 2 commits into from
Oct 24, 2015
Merged

Conversation

spencerkimball
Copy link
Member

BeginTransactionRequest is automatically inserted by the KV client
immediately before the first transactional write. On execution,
BeginTransaction creates the transaction record.

If a heartbeat arrives for a txn that has no transaction record, it's
ignored. If a push txn arrives for a txn that has no transaction record,
the txn is considered aborted.

This solves the problem of errant heartbeats or pushes recreating a GC'd
transaction record, addressing #2062.

@spencerkimball
Copy link
Member Author

This PR isn't finished yet...there are two failing unittests (and looks like there are merge conflicts...again).

@bdarnell : the TestStoreRangeDownReplicate test is broken in
client_raft_test.go. This addition of requiring the txn to have
a BeginTransaction which writes the record before EndTransaction
can work is surfacing the bug. On occasion, when up-replicating
from 4 to 5 replicas, the most recently added replica isn't sent
the BeginTransaction request. It does however receive the
EndTransaction request, which causes it to fail. There's a bunch
of logging I added which indicates to me that state.addNode is in
fact being called correctly. I'm not sure why the
BeginTransaction request isn't sent through as expected. Perhaps
you'll have an easier time figuring out what's going wrong there.

@petermattis , @vivekmenezes : the sql LogicTest is failing with this
change and I haven't even started looking into why that's
happening. If it has something to do with the addition of a
BeginTransaction, it occurs to me that the SQL stuff may be
hiding errors.

@spencerkimball spencerkimball force-pushed the spencerkimball/begin-txn branch from eb64586 to 50d25e1 Compare October 8, 2015 07:32
@petermattis
Copy link
Collaborator

The SQL logic test failures that I looked at were all the same root problem. We were modifying the schema (e.g. adding a unique index) and found an error that was unrelated to the conditional puts we performed. For example, testdata/alter_table:63 fails because the error returned is associated with the key /3/1/1001/2, but that is a key in the descriptor table and in the batch we wrote to that key using Put, not CPut. The error returned is:

unexpected value: bytes:"\013\002" checksum:168851347 timestamp:<wall_time:1444333838303045793 logical:0 > tag:BYTES

That's a conditional put failure, but as I said we wrote to the key using Put. This makes me think that this change somehow screwed up the mapping of errors to results.

@spencerkimball
Copy link
Member Author

@petermattis Ah, good call. IIRC, @tschottdorf introduced an index into the error for batches. That would get screwed up because we insert the begin transaction request and then remove it. I'll look into fixing that. The other problem with raft remains however.

// the first transactional write. Also enforce that no transactional
// writes occur before a begin transaction.
var haveBeginTxn bool
for _, req := range ba.Requests {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take a look at ba.GetArg(proto.BeginTransactionRequest).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to go through them all in order to check for error conditions.

@tbg
Copy link
Member

tbg commented Oct 9, 2015

Is the client code really the right place to insert this call? It seems like we're shoving more complexity into the client though we could do that in the coordinator transparently.
That we have this EndTransaction business in the client already may seem like that's where its counterpart should live, but really EndTransaction is there because only the client knows when we're done. But the coordinator knows perfectly well when we start the write portion of a transaction (pretty much at the beginning of TxnCoordSender.Send). What do you think?

I assumed that's what you were doing before I saw the client changes (it's not like you didn't explain it, but alas), so some of my comments will refer to this erroneously.

@tbg
Copy link
Member

tbg commented Oct 9, 2015

@petermattis Ah, good call. IIRC, @tschottdorf introduced an index into the error for batches. That would get screwed up because we insert the begin transaction request and then remove it. I'll look into fixing that. The other problem with raft remains however.

Yes, that'll be it. Just go for a hotfix for now; I'm filing an issue. We should move the error index out of the error.

@bdarnell
Copy link
Contributor

bdarnell commented Oct 9, 2015

On occasion, when up-replicating
from 4 to 5 replicas, the most recently added replica isn't sent
the BeginTransaction request. It does however receive the
EndTransaction request, which causes it to fail.

I can't reproduce this - do you have a log? I can, however, reproduce a timing-dependent failure in the down-replicate phase of this test. The transaction record is created successfully, but it's not there by the time EndTransaction is called. This looks like a race between the RangeGCQueue and the operations on this range. This test is artificially accelerating the clock to expire leases and running GC scans, which are the normal means by which we avoid this race in the real world (although we should probably also have something more iron-clad).

@spencerkimball
Copy link
Member Author

@tschottdorf I'm going to leave this stuff in the client. There's parity there with the end transaction stuff and I'm by no means convinced that the coordinator will always have enough information (with future directions) to determine the beginning of a txn.

@spencerkimball spencerkimball force-pushed the spencerkimball/begin-txn branch 2 times, most recently from 432c77b to ca6ef78 Compare October 24, 2015 17:09
BeginTransactionRequest is automatically inserted by the KV client
immediately before the first transactional write. On execution,
BeginTransaction creates the transaction record.

If a heartbeat arrives for a txn that has no transaction record, it's
ignored. If a push txn arrives for a txn that has no transaction record,
the txn is considered aborted.

This solves the problem of errant heartbeats or pushes recreating a GC'd
transaction record, addressing #2062.
@spencerkimball spencerkimball force-pushed the spencerkimball/begin-txn branch from ca6ef78 to ca96b85 Compare October 24, 2015 20:01
spencerkimball added a commit that referenced this pull request Oct 24, 2015
@spencerkimball spencerkimball merged commit 47fb047 into master Oct 24, 2015
@spencerkimball spencerkimball deleted the spencerkimball/begin-txn branch October 24, 2015 21:37
if !haveTxnWrite {
haveTxnWrite = roachpb.IsTransactionWrite(args)
if roachpb.IsTransactionWrite(args) && firstWriteIndex == -1 {
firstWriteKey = args.Header().Key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this end up being the last write key?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err, disregard

@tbg
Copy link
Member

tbg commented Oct 26, 2015

I understand that you wanted to get this out of your queue, but please leave future PRs open for discussion at least up to the first LGTM.

br.Responses = append(br.Responses[:firstWriteIndex], br.Responses[firstWriteIndex+1:]...)
}
// Handle case where inserted begin txn confused an indexed error.
if pErr != nil && pErr.Detail != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just if iErr, ok := pErr.GoError().(roachpb.IndexedError) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

@spencerkimball
Copy link
Member Author

My mistake. Thought this was already LGTM.

@spencerkimball
Copy link
Member Author

@tschottdorf Will send another PR with fixes.

tbg added a commit to tbg/cockroach that referenced this pull request Nov 10, 2015
@tbg tbg mentioned this pull request Nov 10, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants