
[GAIAPLAT-1752] INDEX: eliminate memory allocation when garbage collecting offsets #1124

Merged

yiwen-wong merged 10 commits into master on Dec 20, 2021

Conversation

@yiwen-wong (Author):

Removed unordered_set and replaced it with fixed-size arrays/buffers for the garbage collection of offsets for indices.

Also opportunistically switched the catalog integrity check against dropped types to std::vector. Note: memory allocation has not been removed from that path.

Commit: …cting offsets from index

yeet unordered_set outta index_builder
@senderista (Contributor):

This has reminded me that I need to finally get rid of the unordered_set in validate_txn() :)

@senderista (Contributor) left a comment:

I think you should just be able to use std::vector instead of std::array everywhere, for simpler code and no hardcoded array sizes.

In production/db/core/src/index_builder.cpp:
{
const auto& log_record = records.log_records[i];
gaia_offset_t offset = deallocate_new_offsets ? log_record.new_offset : log_record.old_offset;
std::array<gaia_offset_t, c_offset_buffer_size> collected_offsets;
Contributor:

I think this code would be simpler if you used std::vector instead of std::array, and instead of maintaining parallel arrays of offsets and their types, you just used a single vector of (offset, type) pairs. That would allow you to just use push_back() to add elements instead of maintaining a "last index" variable, and you wouldn't need to keep the two arrays' current insert positions in sync. You also wouldn't need to hardcode the array size.
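
A minimal sketch of this suggestion (the gaia_offset_t/gaia_type_t aliases and the collect helper are assumptions for illustration, not from the PR):

#include <cstdint>
#include <utility>
#include <vector>

using gaia_offset_t = uint32_t; // assumed width, for illustration only
using gaia_type_t = uint32_t;   // assumed width, for illustration only

void collect(
    std::vector<std::pair<gaia_offset_t, gaia_type_t>>& collected_offsets,
    gaia_offset_t offset, gaia_type_t type)
{
    // push_back keeps offset and type in sync automatically: no parallel
    // arrays, no "last index" counter, no hardcoded capacity.
    collected_offsets.push_back({offset, type});
}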

@yiwen-wong (Author):

If the goal is to remove any heap allocations whatsoever, the size needs to be hard-coded and the number of elements tracked separately. At that point the code is going to be similarly complicated.

@senderista (Contributor), Dec 9, 2021:

I don't think "removing any heap allocations whatsoever" is a reasonable goal; this is not an embedded or hard real-time system. We should just try to minimize dynamic allocations in hot paths. If you don't call shrink_to_fit(), std::vector never has to reallocate as long as you don't exceed its reserved capacity, and you can call reserve() or the sized constructor to allocate the maximum "reasonable" size up front and let the vector automatically handle exceptional cases that overflow that size. This will normally perform only one dynamic allocation per instance, so it is nearly as efficient as stack-allocating with std::array, except that you don't need special code to handle overflow.
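
A sketch of that pattern (the type aliases are assumptions; the 32-element bound reuses the PR's buffer size as the reserve hint):

#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using gaia_offset_t = uint32_t; // assumed width, for illustration only
using gaia_type_t = uint32_t;   // assumed width, for illustration only

constexpr size_t c_expected_max_offsets = 32;

void gc_offsets_sketch()
{
    std::vector<std::pair<gaia_offset_t, gaia_type_t>> collected_offsets;
    // One dynamic allocation up front; push_back never reallocates unless
    // the reserved capacity is exceeded, and overflow beyond it is handled
    // by the vector itself rather than by special-case code.
    collected_offsets.reserve(c_expected_max_offsets);
    // ... push_back offsets as they are collected ...
}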

Contributor:

Another possibility: in C++17, you can use a stack-allocated or static thread_local fixed-size buffer with std::vector via std::pmr::monotonic_buffer_resource. Example from https://www.cppstories.com/2020/06/pmr-hacking.html/:

#include <algorithm>         // std::fill_n
#include <iostream>
#include <iterator>          // std::begin, std::size, std::data
#include <memory_resource>   // pmr core types
#include <vector>            // pmr::vector

int main() {
    char buffer[64] = {}; // a small buffer on the stack
    std::fill_n(std::begin(buffer), std::size(buffer) - 1, '_');
    std::cout << buffer << '\n';

    std::pmr::monotonic_buffer_resource pool{std::data(buffer), std::size(buffer)};

    std::pmr::vector<char> vec{ &pool };
    for (char ch = 'a'; ch <= 'z'; ++ch)
        vec.push_back(ch);

    std::cout << buffer << '\n';
}

(Note that I'm not suggesting this is actually warranted in this case.)

In production/db/core/src/index_builder.cpp:
gaia_type_t indexed_type = it.second->table_type();

// This index does not contain any of the deleted offsets.
if (std::find(offset_types.begin(), offset_types.end(), indexed_type) == offset_types.end())
Contributor:

If you used std::find_if, I think you could use (offset, type) pairs and the code could be just as concise.
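
A sketch of the std::find_if version (the pair layout, type aliases, and function name are assumptions):

#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

using gaia_offset_t = uint32_t; // assumed width, for illustration only
using gaia_type_t = uint32_t;   // assumed width, for illustration only

bool contains_type(
    const std::vector<std::pair<gaia_offset_t, gaia_type_t>>& offsets,
    gaia_type_t indexed_type)
{
    // Still a one-liner: search the pairs for an entry whose type matches.
    return std::find_if(
               offsets.begin(), offsets.end(),
               [indexed_type](const auto& entry) { return entry.second == indexed_type; })
        != offsets.end();
}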

Contributor:

Just a note that remove_entries_with_offsets could be updated to also work with an array of pairs, so that only one type of array is needed.

@@ -48,6 +49,8 @@ class index_writer_guard_t
T_structure& m_data;
};

constexpr size_t c_offset_buffer_size = 32;
Contributor:

There shouldn't be any need to hardcode the array size if you just use std::vector.

@@ -42,7 +42,7 @@ void index_t<T_structure, T_iterator>::remove_index_entry_with_offsets(const std

 for (auto it = m_data.begin(); it != m_data.end();)
 {
-    if (offsets.find(it->second.offset) != offsets.end())
+    if (std::find(offsets.begin(), offsets.end(), it->second.offset) != offsets.end())
Contributor:

See above, I think you could use std::find_if with a vector of (offset, type) pairs.

@simone-gaia (Contributor) left a comment:

LGTM (don't have anything to add to the given comments).

const auto& log_record = records.log_records[i];
gaia_offset_t offset = deallocate_new_offsets ? log_record.new_offset : log_record.old_offset;
std::array<gaia_offset_t, c_offset_buffer_size> collected_offsets;
std::array<gaia_type_t, c_offset_buffer_size> offset_types;
Contributor:

Do you really need two arrays or can you just use an array of pairs to hold two values for each entry?

@yiwen-wong (Author):

An array of pairs is OK, but the memory layout is better with two separate arrays.

@senderista (Contributor), Dec 11, 2021:

Well, "array of structs" can be cheaper or more expensive WRT cache efficiency than "struct of arrays". It all depends on the access patterns (e.g. OLTP row-oriented tuple storage vs. OLAP column-oriented tuple storage).

switch (it.second->type())
{
case catalog::index_type_t::range:
remove_entries_with_offsets<range_index_t>(it.second.get(), collected_offsets, records.begin_ts);
@LaurentiuCristofor (Contributor), Dec 9, 2021:

I'm a bit confused - doesn't collected_offsets contain offsets of different types? Why would we pass the full array to an index of a specific type?

Perhaps a better approach is to build a map of types to offset arrays/vectors so that we can only pass relevant offsets in these calls. Building that map can be done with trivial changes to the code that fills the 2 arrays.

@senderista (Contributor), Dec 9, 2021:

Alternatively, if the offset array is small enough it would be reasonable to just scan the whole array (assuming it contains offset/type pairs) and filter out offsets with inapplicable types. This would be really concise with C++20 ranges but still wouldn't be much code in a for loop.
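
A sketch of the scan-and-filter loop, pre-C++20 ranges (names and types are assumptions):

#include <cstdint>
#include <utility>
#include <vector>

using gaia_offset_t = uint32_t; // assumed width, for illustration only
using gaia_type_t = uint32_t;   // assumed width, for illustration only

void remove_matching_offsets(
    const std::vector<std::pair<gaia_offset_t, gaia_type_t>>& collected_offsets,
    gaia_type_t indexed_type)
{
    for (const auto& [offset, type] : collected_offsets)
    {
        if (type != indexed_type)
        {
            continue; // filter out offsets with inapplicable types
        }
        // ... remove `offset` from this index ...
        (void)offset; // placeholder for the actual removal
    }
}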

@yiwen-wong (Author):

The offset array is small, limited to 32 records. Passing the type does not provide any benefit over checking directly against the offset: if the type matches, we still need to do another comparison; if it is wrong, we have already done one comparison. Comparing directly against the offset is one comparison regardless.

Contributor:

This seems like a fragile assumption. If you rely on the size being small, then that should be statically asserted somewhere, otherwise someone can easily edit the constant value and bump it to a much larger value than you expect to have to deal with in this code.

@yiwen-wong (Author):

32 is a somewhat arbitrary size set by me. I have not done any benchmarking and picked this value because it seems reasonable. I think it's fine if someone bumps it up (in fact my guess is the performance at 128 or 256 elements is acceptable). The important thing is that the array is bounded at some size.

The dangerous case for GC still remains: 'hot' indexes with very many entries, which I have not addressed here. That case would require a different strategy altogether.

Contributor:

We should add more comments to the constant declaration and, if possible, some static_asserts to document our expectations on the size of these arrays. Their length is declared in one place and the reliance on that length being small happens in a different place. It's very easy for someone to update the constant without being aware of the reliance on a small value.
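
A sketch of documenting that expectation at the declaration site (the bound of 256 is an assumed "comfort zone", not from the PR):

#include <cstddef>

// Offsets collected for GC are stored in fixed-size buffers and consumed by
// linear scans; this size must stay small enough for those scans to be cheap.
constexpr size_t c_offset_buffer_size = 32;
static_assert(
    c_offset_buffer_size <= 256,
    "GC offset buffers are searched linearly; keep this buffer small.");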

Contributor:

The fact that the offset array is small is rather irrelevant given that the number of arrays you can batch here is not really bounded.

Furthermore, any comparison that you avoid once when you build your array will add overhead to the find operation executed for all indexes of a different type. The penalty of including offsets of wrong type gets multiplied by the number of indexes of different type.

I.e.: Let's say you collect 2 values each for 8 different types. You avoided 16 type comparisons during this collection. But now for each index, you'll perform a linear search through a 16 element array instead of performing a search through a 2 element array (or no search at all for indexes of different types). Your performance is 8 times worse in this example!

This is why I'd rather see an implementation done with no regard to memory allocation, and then look at whether that allocation can be optimized away. As it stands, you're probably doing more work in linear array searches than you would allocating memory for a few vectors. This is premature optimization.

Contributor:

I agree that the important worst-case bound here is not on the size of a single array, but on the work done by a thread in a single GC pass. So far I've avoided bounding per-thread GC work in favor of approximating fairness by GC handoff on contention. But if there's no opportunity for contention detection and backoff in this case, we may need to design fairness explicitly into the protocol, which implies batching work across GC threads. I think I need to understand the possible worst-case scenarios better here before commenting further.

I'm not terribly worried about optimizing comparisons, given that basically any computation done in an already-resident cache line is free. Indirections from a mapping structure could easily swamp the overhead of unnecessary comparisons. But of course we don't want to scan thousands or millions of entries for a single entry of a given type. We could possibly optimize this without introducing expensive mappings, e.g. by sorting the (offset, type) pairs by type and binary searching for the first occurrence of a type. Seems like maybe this would be better discussed in a meeting, though.

Contributor:

So to clarify, I'm not worried at all about linear searches of an array that fits into 1 or 2 cache lines. Those comparisons are effectively free (once you've taken the cache miss). Performance will definitely not be "8 times worse" for a 16-element array than a 2-element array. But I do think we could safely use much larger arrays if we sorted (offset, type) pairs by type and binary-searched for the first occurrence of a type.
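
A sketch of that scheme (names and types are assumptions; in real code the sort would happen once, before iterating over the indexes, rather than per call):

#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

using gaia_offset_t = uint32_t; // assumed width, for illustration only
using gaia_type_t = uint32_t;   // assumed width, for illustration only
using offset_entry_t = std::pair<gaia_offset_t, gaia_type_t>;

// Orders entries by type only, so equal_range finds the contiguous run of
// entries belonging to one type.
inline bool by_type(const offset_entry_t& a, const offset_entry_t& b)
{
    return a.second < b.second;
}

void for_each_offset_of_type(
    std::vector<offset_entry_t>& entries, gaia_type_t indexed_type)
{
    std::sort(entries.begin(), entries.end(), by_type);
    // Binary-search for the first and last occurrence of this type.
    auto [first, last] = std::equal_range(
        entries.begin(), entries.end(), offset_entry_t{0, indexed_type}, by_type);
    for (auto it = first; it != last; ++it)
    {
        // ... remove it->first (the offset) from this index ...
    }
}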

@LaurentiuCristofor (Contributor) left a comment:

I left several comments and suggestions, some of which apply to the original approach too - my apologies for not making those comments when the code was originally introduced.

@yiwen-wong (Author):

#1151

Needed to do a lot of rewriting to eliminate whole index scans.

@senderista (Contributor):

Has this PR been superseded by #1151?

@yiwen-wong (Author):

> Has this PR been superseded by #1151?

#1151 is on top of this PR, but basically yes; I need to merge that one onto this one.

@senderista (Contributor) left a comment:

OK, I still think the data structure design should be revisited, but we can do that after this is merged, given that another PR depends on it.

Commit: …cting offsets (#1151)

* [GAIAPLAT-1810] Do not scan entire index structure when garbage-collecting offsets

* Addressed one comment.

Co-authored-by: yiwen-wong <[email protected]>
@yiwen-wong merged commit f313924 into master on Dec 20, 2021
@yiwen-wong deleted the yiwen_remove_unordered_set branch on Dec 20, 2021 at 23:17