Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The driver is not thread-safe #15

Closed
bchavez opened this issue Nov 21, 2015 · 2 comments
Closed

The driver is not thread-safe #15

bchavez opened this issue Nov 21, 2015 · 2 comments

Comments

@bchavez
Copy link
Owner

bchavez commented Nov 21, 2015

Just dawned on me using the driver while writing a server-side application. The ReadResponse isn't thread-safe. We possibly need a dedicated thread ReadLoop for reading responses from the network-stream, then dispatch / signal / handing off the response to awaiting threads identified by their query token.

bchavez added a commit that referenced this issue Nov 22, 2015
Need to fix tests for AggregateException.
@bchavez
Copy link
Owner Author

bchavez commented Nov 23, 2015

The driver connection is now thread-safe and can now be used in multi-threaded server processing environments like ASP.NET or Message Queue systems. C# asynchronicity async/await keywords are now supported in run() and run helpers.

Internal Design Notes

Previously, the C# and Java driver followed the same ReadResponse semantic calling pattern. WriteQuery followed by a ReadResponse. The cursor implementation is a bit more complicated, ReadResponse pass through ConnectionInstance.ReadResponse. When a cursor sequence is detected, it is added to the cursorCache; an instance of Dictionary<token,Cursor> where token is the query token of type long.

  • cursorCache was changed to be a ConcurrentDictionary<token,Cursor> for thread-safety. So, potentially, many threads can add themselves to the cursorCache.

Next, starting with SocketWrapper, WriteQuery was changed prevent multiple threads writing to the network stream at the same time.

lock( writeLock )
{   // Everyone can write their query as fast as they can; block if needed.
    try
    {
        this.bw.Write(token);
        var jsonBytes = Encoding.UTF8.GetBytes(json);
        this.bw.Write(jsonBytes.Length);
        this.bw.Write(jsonBytes);
    }
    catch
    {
        Log.Trace($"Write Query failed for Token {token}.");
    }
}

Without the writeLock, multiple threads could write to the network stream at different times while writing a query to the network stream. writeLock is an object that's created on the CLR heap. The CLR run-time stores objects in the heap in the following raw memory format: [syncblock][handle][object]. The syncblock is part of an object header used by synchronization primitives and the CLR to determine whether or not an object is locked by a thread. The syncblock has two formats, thinlock and indexed. Thinlocks are super fast (locking info is already preset in the [syncblock] object header); whereas indexed require an extra lookup into the CLR syncblock table to determine locking information for an object. There is no way to explicitly define what type of "lock" gets placed with an object that goes on the CLR heap because it is entirely managed by CLR run-time. After experimenting I found the CLR is indeed giving us a thinlock:

Setting a breakpoint inside the lock:

lock( writeLock )
{   // Everyone can write their query as fast as they can; block if needed.
    try
    {
        Debugger.Break();
        this.bw.Write(token);
        var jsonBytes = Encoding.UTF8.GetBytes(json);
        this.bw.Write(jsonBytes.Length);
        this.bw.Write(jsonBytes);
    }

Run a query, break, then loading up the windows debugger on the process ID. Dump the heap for SocketWrapper objects:

ntsd -p 1644

0:012> .loadby sos mscorwks
0:012> .load sosex
0:019> !DumpHeap -type RethinkDb.Driver.Net.SocketWrapper
 Address       MT     Size
02a0d21c 0706d744       52
02a12430 0706f61c       16

Statistics:
      MT    Count    TotalSize Class Name
0706f61c        1           16 RethinkDb.Driver.Net.SocketWrapper+<>c__DisplayClass11_0
0706d744        1           52 RethinkDb.Driver.Net.SocketWrapper
Total 2 objects

The object we're intrested in is 52 bytes at address 02a0d21c:

0:019> !do 02a0d21c
Name:        RethinkDb.Driver.Net.SocketWrapper
MethodTable: 0706d744
EEClass:     06f17184
Size:        52(0x34) bytes
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
0706e664  4000035        4 ...Sockets.TcpClient  0 instance 02a0d740 socketChannel
04e57e8c  4000036       28      System.TimeSpan  1 instance 02a0d244 timeout
04e51d7c  4000037        8        System.String  0 instance 02a0c0a0 hostname
04e53cc4  4000038       24         System.Int32  1 instance    28015 port
0706da24  4000039        c ...ets.NetworkStream  0 instance 02a0f124 ns
04e4c41c  400003a       10 ...m.IO.BinaryWriter  0 instance 02a0f14c bw
04e65770  400003b       14 ...m.IO.BinaryReader  0 instance 02a0f1ec br
04e4966c  400003c       18 ...lationTokenSource  0 instance 02a12408 pump
0706df74  400003d       1c ...ethinkDb.Driver]]  0 instance 02a0d250 awaiters
04e5211c  400003e       20        System.Object  0 instance 02a0d734 writeLock

The SocketWrapper has reference to the writeLock object and is on the heap at address 02a0d734. So, let's look at the hidden object header by going back 0x4 bytes from the object pointer starting address:

0:019> dd 02a0d734-0x4 l1
02a0d730  0000000a

Indeed, we see that the object header is a _thin-lock_. AWESOME. Thin-locks have the upper byte of the object header as 0x00 and an indexed lock would have the upper byte 0x08... the object header has the value of 0xA or 10 dec which is the managed thread ID that is holding the current lock.

0:019> !threads
ThreadCount:      12
UnstartedThread:  0
BackgroundThread: 10
PendingThread:    0
DeadThread:       0
Hosted Runtime:   no

       ID OSID ThreadOBJ
   0    1 2660 0068dce8
   2    2 2c80 0069b3d0
   7    3 2534 06521b20
   8    4 25a4 06523d78
  10    5 270c 065247a8
  11    6 2db8 06528de8
  12    7  728 06566a30
  13    8 155c 0656f750
  14    9  8a0 065d1540
  15   10 2eec 065e9020
  17   11 20b8 065fae68
  18   12  5ec 065fb688

So, the managed thread ID 10 is actually thread 15 inside the operating system. Switch our thread context and double check we're inside the lock:

0:010> ~15s

0:015> !ClrStack
OS Thread Id: 0x2eec (15)
Child SP       IP Call Site
083fde14 0546400f System.Diagnostics.Debugger.Break()
083fde3c 082fa41b RethinkDb.Driver.Net.SocketWrapper.WriteQuery

Indeed, we are the thread with the writeLock. Awesome.

Sanity check by checking the syncblock table:

0:015> !syncblk
Index         SyncBlock MonitorHeld Recursion
-----------------------------
Total           18
CCW             0
RCW             0
ComClassFactory 0
Free            0

Nothing is indexed in the syncblock table, so we're going as fast as we can with this write-lock. 👌 While it might not always be the case that we are using _thin-lock_s because locks are handled by the CLR, we have some good empirical evidence that we're in fact using a _thin-lock_s for this workload.


So, now reading responses
_See addendum design note below for current implementation_. The following block quote was from previous proposal idea:

... just before a thread writes their query to the network, the query token is assigned an awaiter. A dedicated thread from the thread pool is used to pull messages off the network. The dedicated thread loops inside SocketWrapper.ResponsePump reads any response, then looks up the awaiter by the response token ID. Then pushes the result to the awaiting thread, and everything continues as normally.

I would have preferred to have both of these interactions happen in one SendQuery (send & wait on a returned awaiter), but it would have had too much of an impact in high-level construct objects like Cursor that does things like MaybeSendContinue without really awaiting on a returned value from the call. So the current implementation still keeps same semantic calling patterns as the Java driver, but also requires us to make sure that we come back and grab our task AwaitResponseAsync (replacing ReadResponse) after we have written the query. In other words, do not write internal driver code to write a query to the network and not come back for the associated awaiting task. If you need to WriteQuery which allows "fire and forget" types of writes that don't expect any token associated response pass in WriteQuery(assignAwaiter:false).

🚀

@bchavez
Copy link
Owner Author

bchavez commented Nov 24, 2015

Addendum to this design note:

I decided to implement SendQuery in SocketWrapper and avoid the Write then ReadResponse semantic calling pattern. SendQuery returns a future awaiter Response value. Previously, the threading architecture was getting too complicated orchestrating threads to follow a two step process (write then come back later to await/read response). For example, the two step process also included a very obscure GC memory leak in a Cursor because Cursor sent MaybeSendContinue and Cursor STOP, both, which did not process their awaiters. The leak would occur because it was the awaiter responsibility to remove themselves from the awaiter's dictionary. The main ResponsePump thread would set the awaiter's Response result but not remove the awaiter from the dictionary of awaiters. Since the zombie awaiter (from a Cursor MaybeSendContinue or STOP) did not finish processing the awaited response, the zombie awaiter would be sitting in the awaiters dictionary forever.

Now the responsibility of removal of an awaiter from the awatier's dictionary is on the ResponsePump thread. Soon as a response is read from the network, the associated awaiter is removed off the awaiter dictionary (avoiding the memory leak). Additionally, SendQuery always returns an awaiter (Response future). Callers to SendQuery can specify awaitResponse=false for truly fire-and-forget queries. If the caller thread has awaitResponse=false indicating they are not interested in a response, they still get an instantly completed awaiter of Response null. If you don't want the awaiter, fine, disregard it; otherwise, the response you're looking for pushed inside the returned awaiter at some future moment in time. This added some additional responsibility to a Cursor. Cursors now need to keep track of the current CONTINUE awaiter in progress to read the CONTINUEed response. It's not too bad. Note: The outstandingRequests == 0 check in MaybeSendContinue guards us from sending too many continues (and consequently losing track of subsequent CONTINUEed awaiters).

Overall, SendQuery in SocketWrapper has much better calling semantics. Makes everything less complicated and simplified a lot of code making it easier to understand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant