Integrate new memory management interfaces into client and server #610
Conversation
Looks good, but there are some safety issues that need to be addressed.
production/db/core/src/db_client.cpp (Outdated)
@@ -532,50 +528,103 @@ void client_t::commit_transaction()
    // (https://gaiaplatform.atlassian.net/browse/GAIAPLAT-292).
    if (event == session_event_t::DECIDE_TXN_ABORT)
    {
        rollback_chunk_manager_allocations();
I'm unclear on the semantics of rollback_chunk_manager_allocations(). In the current architecture, we GC aborted txn logs and free their redo versions in regular post-commit maintenance tasks, but only after the aborted txn falls behind the watermark. (This is necessary to ensure that any undecided txns in the conflict window of a committing txn cannot have their txn logs freed while they are being used for validation.) So the client can't overwrite the redo versions from the aborted txn before the GC tasks that expect to free the txn's redo versions and log have been able to run. We can certainly discuss immediately reusing space from aborted txns in the future, but those are the present constraints (and if aborts are rare enough, they may be acceptable, since if aborts are not rare, we have much bigger problems to address than wasting allocated space). Rollbacks don't have the same issue, since those txns haven't been submitted to the server, so no GC tasks will attempt to free their redo versions or logs.
So I think the only safe solution for now is for the client to do nothing on abort. You could add a REVIEW comment re: reusing the aborted txn's allocated space as a possible future optimization.
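A minimal sketch of what "do nothing on abort" could look like in the DECIDE_TXN_ABORT branch shown above (the surrounding code is from the diff; the comment text is only illustrative, not the actual change):

if (event == session_event_t::DECIDE_TXN_ABORT)
{
    // REVIEW: Do not roll back chunk manager allocations here. The aborted
    // txn's redo versions and txn log are still freed by the server's GC
    // tasks once the aborted txn falls behind the watermark, so the client
    // must not reuse that space yet. Immediately reclaiming an aborted txn's
    // allocations is a possible future optimization.
    // rollback_chunk_manager_allocations();
}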
I'm not sure what the conflict would be. Currently, we only GC the undo versions of committed transactions. For aborted transactions, I'm not sure why there would be any impact from reusing non-committed memory: validation should not need to check the content of the memory; it only needs to compare locator values to detect conflicts. The release of the memory should be orthogonal to the freeing of the txn logs.
Think about it and let me know. If you still think there is a problem, I can comment out these calls for now with REVIEW comments.
After reviewing deallocate_txn_log, I agree that more changes are needed. I'll comment out the current calls to rollback and add more comments on what should happen.
> The release of the memory should be orthogonal to the freeing of the txn logs.
It's not orthogonal, because both the logs and the memory (i.e., undo versions, for committed txns) become obsolete exactly when any versions superseded by the associated txn are not visible to any present or future txns. Specifically, when the associated txn's commit_ts falls behind the watermark, we know that its redo versions can be safely applied to the shared view, that its redo log does not need to be applied to any new snapshots, and that its undo versions are not visible to any present or future txns. (Later, we also might need to apply the undo log to new snapshots if we allow the shared view to be updated ahead of the watermark, so this applies to the undo log as well.)
You're right, though, that we need only the txn logs for conflict detection, not the undo or redo versions. It's simply cleaner for now to treat aborted txns just like committed txns for GC, except that we free the redo versions rather than the undo versions. I currently don't think we should prioritize optimizing aborts, since as I said, if aborts are common then we have much bigger problems to worry about (i.e., contention management in general). So if aborted txns end up holding onto some physical and virtual memory a bit longer than strictly necessary, that's OK with me if it simplifies implementation.
Note that the above applies only to aborts, not to rollbacks (where we can safely rewind the allocation pointer). I also assume that txns can only abort (i.e., fail after submission) because of update conflicts.
I don't completely follow your argument.
Uncommitted allocations should only be relevant to the current transaction as far as their contents go. No other transaction should care about uncommitted data. So, de-allocating the memory when the transaction is known to have been aborted or rolled back should be safe - nobody should be accessing its content.
I was OK with commenting out the code because, given the current implementation of deallocate_txn_log, the memory would get deallocated twice, so the slot_bitmap could end up corrupted (the client has already started new allocations, but then deallocate_txn_log marks some as freed). But note that with commit still happening, previously rolled-back allocations may appear as committed due to later commits. However, none of this will matter until we actually start using the metadata for compaction. So, at least, these changes have no functional impact, just suboptimal use of memory.
Now we seem to be overloading the word "committed", so it could refer to either a txn that was successfully validated on the server, or some allocations in a chunk that are safely behind the "committed" pointer. I'm not sure which you are referring to in the above (and I think we need to be careful about this overloading).
I'm not sure it makes much of a difference. The original implementation (before I commented out the rollbacks) ensured that chunk allocations could only end up behind the committed pointer as a result of successful server validation. Any server decision other than a successful validation would roll back the allocations (this is no longer the case now that I commented those out). So within this framework, an allocation couldn't be behind the committed pointer unless it was actually committed.
I've actually decided to comment out the calls to commit as well, not just those to rollback. There is no functional benefit right now from making those calls, and the state they describe is a lie if there are intermediate rollbacks or aborts. We can uncomment or change them later when we do the GC work.
Sounds good. I'll update the shared DB design doc next week with more details on the allocation, GC, and compaction algorithms, which should help clarify the metadata requirements.
@@ -138,3 +138,24 @@ uint8_t* base_memory_manager_t::get_address(address_offset_t memory_offset) const

    return memory_address;
}

address_offset_t base_memory_manager_t::get_chunk_address_offset(address_offset_t memory_offset) const
Later, when we convert to gaia_offset_t, these conversions can be made simpler/faster (just take the high/low 16 bits; no mod or division required).
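A rough sketch of the contrast (hypothetical: c_chunk_size_bytes and the exact 16/16 split of gaia_offset_t are assumptions based on this comment, not the actual definitions):

// Today: byte offsets require division/modulo to recover the chunk base.
address_offset_t get_chunk_address_offset(address_offset_t memory_offset)
{
    return memory_offset - (memory_offset % c_chunk_size_bytes);
}

// With a 32-bit gaia_offset_t laid out as (chunk index : slot index),
// the same information is just the high and low 16 bits.
uint16_t get_chunk_index(uint32_t gaia_offset) { return gaia_offset >> 16; }
uint16_t get_slot_index(uint32_t gaia_offset) { return gaia_offset & 0xFFFF; }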
I think that's an unnecessary optimization that can be postponed until it's really needed.
Agreed.
Thinking a bit more about all this, it doesn't add up:
To save 4B by using 32-bit offsets instead of 64-bit ones, we've had to round up allocation sizes to 64B multiples, which will give us about 32B wasted per allocation on average. So we're wasting 32B to be able to save 4B! How does that work?
Multiple reasons. First, note that we currently have a 16-byte object header, and that will only grow (I can easily see it expanding to at least 24 bytes). So you're already "wasting" 16 bytes/allocation. Furthermore, I don't think any real-world object payload (i.e., references + flatbuffer) would fit in less than 16 bytes. So it already doesn't make sense to allow a minimum total object size below 32B. Forcing the alignment unit to be the same as the minimum allocation unit means we can track occupancy in the same units as object offsets, which simplifies our metadata considerably (e.g., you can just take the difference between the starting offset of an object and the next starting offset, in alignment units, to find the allocated size of that object in minimum allocation units).
A few more issues: if we used 8-byte alignment, the bitmaps tracking the occupancy of each slot in the chunk would have 8x higher overhead (64KB, which doesn't fit in L1d cache). We would lose guaranteed cache-line alignment for small objects, so would incur twice as many cache misses on average when dereferencing offsets. We would risk false sharing for small objects if we ever embedded write locks in them. But the biggest win from 32-bit offsets as I see it is that it halves the memory requirements of client-side snapshots (and server-side shared view copies, if we maintain multiple copies as I'm considering doing). Even for read-only transactions, each new txn's COW locator segment mapping requires physical memory proportional to the size of the union of the combined write logs of all unapplied committed txns and the txn's own write log. (This may become less of an issue if we "roll back" a snapshot via undo logs in a userspace page fault handler.)
It's a good point that such a large allocation granularity gives up some of the memory savings of an arena allocator design (I think coarse allocation granularity is more beneficial for a size-class allocator). But I think it's worth it for us. Feel free to bring this up at the next DB meeting.
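To make the bitmap-overhead and sizing arithmetic concrete, here is a back-of-the-envelope sketch (the 4 MiB chunk size is inferred from 64B slots and a 16-bit slot index, so treat it as an assumption):

#include <cstddef>
#include <cstdint>

// 64B slots with a 16-bit slot index => 64Ki slots per chunk => 4MiB chunks.
constexpr size_t c_slot_size_bytes = 64;
constexpr size_t c_slots_per_chunk = size_t{1} << 16;                        // 65536
constexpr size_t c_chunk_size_bytes = c_slot_size_bytes * c_slots_per_chunk; // 4MiB

// Occupancy bitmap: 1 bit per slot.
constexpr size_t c_bitmap_bytes_64b_slots = c_slots_per_chunk / 8;       // 8KB, fits in L1d
constexpr size_t c_bitmap_bytes_8b_slots = (c_chunk_size_bytes / 8) / 8; // 64KB, 8x larger

// Allocated size of an object, in minimum allocation units, is just the
// distance from its starting slot to the next object's starting slot.
inline size_t allocation_size_in_slots(uint16_t slot_offset, uint16_t next_slot_offset)
{
    return next_slot_offset - slot_offset;
}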
Maybe a better punchline: the space wasted by a coarse allocation granularity is wasted only once (since the data segment pages are shared by all clients). But the space saved by smaller offset sizes is multiplied by the number of currently active txns (assuming they all have about the same number of COW pages in their snapshots).
Another thing I didn't mention is that 32-bit offsets and 32-bit locators also cut txn log size approximately in half, which matters when we're forced to keep a lot of txn logs around due to long-running txns (although admittedly that matters less than having to keep all the obsolete versions around).
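A purely hypothetical illustration of the txn-log point (the actual log record layout is not shown in this PR; assume a record is roughly a locator plus the old and new versions' offsets):

#include <cstdint>

// With 64-bit locators and offsets: 24 bytes per log record.
struct log_record_64_t
{
    uint64_t locator;
    uint64_t old_offset;
    uint64_t new_offset;
};

// With 32-bit locators and offsets: 12 bytes per record, roughly halving
// the txn log footprint that must be retained for long-running txns.
struct log_record_32_t
{
    uint32_t locator;
    uint32_t old_offset;
    uint32_t new_offset;
};

static_assert(sizeof(log_record_64_t) == 24, "no padding expected");
static_assert(sizeof(log_record_32_t) == 12, "no padding expected");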
All these considerations become moot if we end up having to manage more than 256GB of memory. And I don't think that's a very distant possibility. We can discuss this further on the 12th.
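The 256GB figure presumably follows from the addressable range of 32-bit offsets at 64-byte granularity (an inference from the earlier discussion, not stated explicitly here):

#include <cstdint>

// 2^32 offsets x 64 bytes per offset unit = 2^38 bytes = 256 GiB addressable.
constexpr uint64_t c_offset_unit_bytes = 64;
constexpr uint64_t c_max_addressable_bytes = (uint64_t{1} << 32) * c_offset_unit_bytes;
static_assert(c_max_addressable_bytes == (uint64_t{1} << 38), "256 GiB");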
All of them? Not the fact that only a minimum object size >= 32B makes sense, or the fact that using the same units for alignment and allocation simplifies allocation metadata, or the fact that cache-line alignment of small objects could potentially improve random read performance. The only thing that would change is that 32-bit offsets would no longer be possible, so space considerations would be less important.
Well, we're discussing the benefit of using 32-bit offsets, so the only relevant considerations are those that demonstrate an advantage. Once we can no longer use 32-bit offsets, all of those considerations become moot.
Not quite. My point was that 64-byte alignment and allocation granularity has some benefits independent of smaller offsets. Those considerations will not become moot if our address space can no longer fit into 32 bits, even if the absence of space savings from smaller offsets makes 64-byte allocation granularity less compelling overall.
I believe all blocking issues have been addressed.
Latest changes look good, thanks.
This change enables the use of new memory allocation interfaces in the client and server.
Highlights:
The use of __func__ in arguments to retail_assert has been removed, bitmaps have been changed from uint64 arrays to arrays of std::atomic<uint64_t>, and two bitmap operations have been re-implemented using builtin functions.
Known issues:
Out-of-scope:
There are deallocate() and deallocate_chunk() methods, but little support for compaction. Any further support for these operations will be added in future changes. The focus here is just to update the memory allocation to use the new interfaces.