Integrate new memory management interfaces into client and server #610
Conversation
Looks good, but there are some safety issues that need to be addressed.
production/db/core/src/db_client.cpp (Outdated)
@@ -532,50 +528,103 @@ void client_t::commit_transaction()
    // (https://gaiaplatform.atlassian.net/browse/GAIAPLAT-292).
    if (event == session_event_t::DECIDE_TXN_ABORT)
    {
        rollback_chunk_manager_allocations();
I'm unclear on the semantics of rollback_chunk_manager_allocations(). In the current architecture, we GC aborted txn logs and free their redo versions in regular post-commit maintenance tasks, but only after the aborted txn falls behind the watermark. (This is necessary to ensure that any undecided txns in the conflict window of a committing txn cannot have their txn logs freed while they are being used for validation.) So the client can't overwrite the redo versions from the aborted txn before the GC tasks that expect to free the txn's redo versions and log have been able to run. We can certainly discuss immediately reusing space from aborted txns in the future, but those are the present constraints (and if aborts are rare enough, they may be acceptable, since if aborts are not rare, we have much bigger problems to address than wasting allocated space). Rollbacks don't have the same issue, since those txns haven't been submitted to the server, so no GC tasks will attempt to free their redo versions or logs.
So I think the only safe solution for now is for the client to do nothing on abort. You could add a REVIEW comment re: reusing the aborted txn's allocated space as a possible future optimization.
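A minimal sketch of what "do nothing on abort" could look like in the DECIDE_TXN_ABORT branch shown above (the surrounding code is from the diff; the comment text is only illustrative, not the actual change):

if (event == session_event_t::DECIDE_TXN_ABORT)
{
    // REVIEW: Do not roll back chunk manager allocations here. The aborted
    // txn's redo versions and txn log are still freed by the server's GC
    // tasks once the aborted txn falls behind the watermark, so the client
    // must not reuse that space yet. Immediately reclaiming an aborted txn's
    // allocations is a possible future optimization.
    // rollback_chunk_manager_allocations();
}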
I'm not sure what the conflict would be. Currently, we only GC the undo versions of committed transactions. For aborted transactions, I'm not sure why there would be any impact from reusing non-committed memory: validation should not need to check the content of the memory; it only needs to compare locator values to detect conflicts. The release of the memory should be orthogonal to the freeing of the txn logs.
Think about it and let me know. If you still think there is a problem, I can comment out these calls for now with REVIEW comments.
After reviewing deallocate_txn_log, I agree that more changes are needed. I'll comment out the current calls to rollback and add more comments on what should happen.
> The release of the memory should be orthogonal to the freeing of the txn logs.
It's not orthogonal, because both the logs and the memory (i.e., undo versions, for committed txns) become obsolete exactly when any versions superseded by the associated txn are not visible to any present or future txns. Specifically, when the associated txn's commit_ts falls behind the watermark, we know that its redo versions can be safely applied to the shared view, that its redo log does not need to be applied to any new snapshots, and that its undo versions are not visible to any present or future txns. (Later, we also might need to apply the undo log to new snapshots if we allow the shared view to be updated ahead of the watermark, so this applies to the undo log as well.)
You're right, though, that we need only the txn logs for conflict detection, not the undo or redo versions. It's simply cleaner for now to treat aborted txns just like committed txns for GC, except that we free the redo versions rather than the undo versions. I currently don't think we should prioritize optimizing aborts, since as I said, if aborts are common then we have much bigger problems to worry about (i.e., contention management in general). So if aborted txns end up holding onto some physical and virtual memory a bit longer than strictly necessary, that's OK with me if it simplifies implementation.
Note that the above applies only to aborts, not to rollbacks (where we can safely rewind the allocation pointer). I also assume that txns can only abort (i.e., fail after submission) because of update conflicts.
I don't completely follow your argument.
Uncommitted allocations should only be relevant to the current transaction as far as their contents go. No other transaction should care about uncommitted data. So, de-allocating the memory when the transaction is known to have been aborted or rolled back should be safe - nobody should be accessing its content.
I was OK with commenting out the code because, given the current implementation of deallocate_txn_log, the memory would get deallocated twice, so the slot_bitmap could end up corrupted (the client has already started new allocations, but then deallocate_txn_log marks some as freed). But note that with commit still happening, previously rolled-back allocations may appear as committed due to later commits. However, none of this will matter until we actually start using the metadata for compaction. So, at least, these changes have no functional impact, just suboptimal use of memory.
Now we seem to be overloading the word "committed", so it could refer to either a txn that was successfully validated on the server, or some allocations in a chunk that are safely behind the "committed" pointer. I'm not sure which you are referring to in the above (and I think we need to be careful about this overloading).
I'm not sure it makes much of a difference. The original implementation (before I commented out the rollbacks) ensured that chunk allocations could only end up behind the committed pointer as a result of successful server validation. Any server decision other than a successful validation would roll back the allocations (this is no longer the case now that I commented those out). So within this framework, an allocation couldn't be behind the committed pointer unless it was actually committed.
I've actually decided to comment out the calls to commit as well, not just those to rollback. There is no functional benefit right now from making those calls, and the state they describe is a lie if there are intermediate rollbacks or aborts. We can uncomment or change them later when we do the GC work.
Sounds good. I'll update the shared DB design doc next week with more details on the allocation, GC, and compaction algorithms, which should help clarify the metadata requirements.
@@ -138,3 +138,24 @@ uint8_t* base_memory_manager_t::get_address(address_offset_t memory_offset) const

    return memory_address;
}

address_offset_t base_memory_manager_t::get_chunk_address_offset(address_offset_t memory_offset) const
Later, when we convert to gaia_offset_t, these conversions can be made simpler/faster (just take the high/low 16 bits; no mod or division required).
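A rough sketch of the contrast (hypothetical: c_chunk_size_bytes and the exact 16/16 split of gaia_offset_t are assumptions based on this comment, not the actual definitions):

// Today: byte offsets require division/modulo to recover the chunk base.
address_offset_t get_chunk_address_offset(address_offset_t memory_offset)
{
    return memory_offset - (memory_offset % c_chunk_size_bytes);
}

// With a 32-bit gaia_offset_t laid out as (chunk index : slot index),
// the same information is just the high and low 16 bits.
uint16_t get_chunk_index(uint32_t gaia_offset) { return gaia_offset >> 16; }
uint16_t get_slot_index(uint32_t gaia_offset) { return gaia_offset & 0xFFFF; }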
I think that's an unnecessary optimization that can be postponed until it's really needed.
Agreed.
Thinking a bit more about all this, it doesn't add up:
To save 4B by using 32-bit offsets instead of 64-bit ones, we've had to round up allocation sizes to 64B multiples, which will give us about 32B wasted per allocation on average. So we're wasting 32B to be able to save 4B! How does that work?
Multiple reasons. First, note that we currently have a 16-byte object header, and that will only grow (I can easily see it expanding to at least 24 bytes). So you're already "wasting" 16 bytes/allocation. Furthermore, I don't think any real-world object payload (i.e., references + flatbuffer) would fit in less than 16 bytes. So it already doesn't make sense to allow a minimum total object size below 32B. Forcing the alignment unit to be the same as the minimum allocation unit means we can track occupancy in the same units as object offsets, which simplifies our metadata considerably (e.g., you can just take the difference between the starting offset of an object and the next starting offset, in alignment units, to find the allocated size of that object in minimum allocation units).
A few more issues: if we used 8-byte alignment, the bitmaps tracking the occupancy of each slot in the chunk would have 8x higher overhead (64KB, which doesn't fit in L1d cache). We would lose guaranteed cache-line alignment for small objects, so would incur twice as many cache misses on average when dereferencing offsets. We would risk false sharing for small objects if we ever embedded write locks in them. But the biggest win from 32-bit offsets as I see it is that it halves the memory requirements of client-side snapshots (and server-side shared view copies, if we maintain multiple copies as I'm considering doing). Even for read-only transactions, each new txn's COW locator segment mapping requires physical memory proportional to the size of the union of the combined write logs of all unapplied committed txns and the txn's own write log. (This may become less of an issue if we "roll back" a snapshot via undo logs in a userspace page fault handler.)
It's a good point that such a large allocation granularity gives up some of the memory savings of an arena allocator design (I think coarse allocation granularity is more beneficial for a size-class allocator). But I think it's worth it for us. Feel free to bring this up at the next DB meeting.
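To make the bitmap-overhead and sizing arithmetic concrete, here is a back-of-the-envelope sketch (the 4 MiB chunk size is inferred from 64B slots and a 16-bit slot index, so treat it as an assumption):

#include <cstddef>
#include <cstdint>

// 64B slots with a 16-bit slot index => 64Ki slots per chunk => 4MiB chunks.
constexpr size_t c_slot_size_bytes = 64;
constexpr size_t c_slots_per_chunk = size_t{1} << 16;                        // 65536
constexpr size_t c_chunk_size_bytes = c_slot_size_bytes * c_slots_per_chunk; // 4MiB

// Occupancy bitmap: 1 bit per slot.
constexpr size_t c_bitmap_bytes_64b_slots = c_slots_per_chunk / 8;       // 8KB, fits in L1d
constexpr size_t c_bitmap_bytes_8b_slots = (c_chunk_size_bytes / 8) / 8; // 64KB, 8x larger

// Allocated size of an object, in minimum allocation units, is just the
// distance from its starting slot to the next object's starting slot.
inline size_t allocation_size_in_slots(uint16_t slot_offset, uint16_t next_slot_offset)
{
    return next_slot_offset - slot_offset;
}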
Maybe a better punchline: the space wasted by a coarse allocation granularity is wasted only once (since the data segment pages are shared by all clients). But the space saved by smaller offset sizes is multiplied by the number of currently active txns (assuming they all have about the same number of COW pages in their snapshots).
Another thing I didn't mention is that 32-bit offsets and 32-bit locators also cut txn log size approximately in half, which matters when we're forced to keep a lot of txn logs around due to long-running txns (although admittedly that matters less than having to keep all the obsolete versions around).
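A purely hypothetical illustration of the txn-log point (the actual log record layout is not shown in this PR; assume a record is roughly a locator plus the old and new versions' offsets):

#include <cstdint>

// With 64-bit locators and offsets: 24 bytes per log record.
struct log_record_64_t
{
    uint64_t locator;
    uint64_t old_offset;
    uint64_t new_offset;
};

// With 32-bit locators and offsets: 12 bytes per record, roughly halving
// the txn log footprint that must be retained for long-running txns.
struct log_record_32_t
{
    uint32_t locator;
    uint32_t old_offset;
    uint32_t new_offset;
};

static_assert(sizeof(log_record_64_t) == 24, "no padding expected");
static_assert(sizeof(log_record_32_t) == 12, "no padding expected");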
All these considerations become moot if we end up having to manage more than 256GB of memory. And I don't think that's a very distant possibility. We can discuss this further on the 12th.
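The 256GB figure presumably follows from the addressable range of 32-bit offsets at 64-byte granularity (an inference from the earlier discussion, not stated explicitly here):

#include <cstdint>

// 2^32 offsets x 64 bytes per offset unit = 2^38 bytes = 256 GiB addressable.
constexpr uint64_t c_offset_unit_bytes = 64;
constexpr uint64_t c_max_addressable_bytes = (uint64_t{1} << 32) * c_offset_unit_bytes;
static_assert(c_max_addressable_bytes == (uint64_t{1} << 38), "256 GiB");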
All of them? Not the fact that only a minimum object size >= 32B makes sense, or the fact that using the same units for alignment and allocation simplifies allocation metadata, or the fact that cache-line alignment of small objects could potentially improve random read performance. The only thing that would change is that 32-bit offsets would no longer be possible, so space considerations would be less important.
Well, we're discussing the benefit of using 32-bit offsets, so the only relevant considerations are those that demonstrate an advantage. Once we can no longer use 32-bit offsets, all of those considerations become moot.
Not quite. My point was that 64-byte alignment and allocation granularity has some benefits independent of smaller offsets. Those considerations will not become moot if our address space can no longer fit into 32 bits, even if the absence of space savings from smaller offsets makes 64-byte allocation granularity less compelling overall.
I believe all blocking issues have been addressed.
Latest changes look good, thanks.
This change enables the use of new memory allocation interfaces in the client and server.
Highlights:
The use of __func__ in arguments to retail_assert has been removed, bitmaps have been changed from uint64 arrays to arrays of std::atomic<uint64_t>, and two bitmap operations have been re-implemented using builtin functions.
Known issues:
Out-of-scope:
There are deallocate() and deallocate_chunk() methods, but little support for compaction. Any further support for these operations will be added in future changes. The focus here is just to update the memory allocation to use the new interfaces.