Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize join probe stage if one hash table key has many entries #8295

Closed
windtalker opened this issue Nov 2, 2023 · 0 comments · Fixed by #8320
Closed

Optimize join probe stage if one hash table key has many entries #8295

windtalker opened this issue Nov 2, 2023 · 0 comments · Fixed by #8320
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

@windtalker
Copy link
Contributor

Enhancement

In TiFlash, the entry of a join hash table is <key, RowRefList>, where RowRefList is a list of the rows that has the same key, and each element if the RowRefList is a pair of <block_index, row_index>. During hash join probe stage, if a probe row has an existing key in the hash table, it will combine all the matched rows in the RowRefList using AddFound, inside AddFound, it just iterator over the RowRefList and insert the data row by row

for (auto current = &static_cast<const typename Map::mapped_type::Base_t &>(it->getMapped());
current != nullptr;
current = current->next)
{
for (size_t j = 0; j < num_columns_to_add; ++j)
added_columns[j]->insertFrom(
*current->block->getByPosition(right_indexes[j]).column.get(),
current->row_num);
}

If the list is very long (~10000), it will take a lot if time in AddFound. We need to find a way to optimize this implementation.

A straightforward optimization is to cache AddFound result of such key, so next time when a probe row which has the same key comes, we can just use insertRange to batch insert the data instead of iterator over RowRefList and insert the data row by row

@windtalker windtalker added the type/enhancement The issue or PR belongs to an enhancement. label Nov 2, 2023
ti-chi-bot bot pushed a commit that referenced this issue Nov 22, 2023
windtalker added a commit to windtalker/tiflash that referenced this issue Nov 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant