Make LINEAR HASH use a HASH partitioned table, instead of a non-partitioned table #38450

Closed
mjonss opened this issue Oct 13, 2022 · 1 comment · Fixed by #38451
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

@mjonss
Contributor

mjonss commented Oct 13, 2022

Enhancement

Currently, a table created with PARTITION BY LINEAR HASH issues a warning, ignores the whole PARTITION BY clause, and is created as a non-partitioned table.

It would be better to ignore only the LINEAR keyword and create a HASH partitioned table, i.e. the same table as if LINEAR had not been given. A warning should still be issued saying that the LINEAR keyword was ignored.

mjonss added the type/enhancement label Oct 13, 2022
@mjonss
Contributor Author

mjonss commented Oct 14, 2022

Here is a description of LINEAR HASH vs HASH, explaining why we do not support LINEAR HASH and instead fall back to plain HASH partitioning:

With a power-of-2 number of partitions (2, 4, 8, 16, ...), LINEAR HASH gives exactly the same result as HASH.
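The claim can be checked with a small Go sketch. `linearHashBitmask` follows the bitmask formulation of LINEAR HASH described in the MySQL reference manual (V is the smallest power of 2 >= num, N = value & (V-1), and V is halved while N >= num); plain HASH is modelled as value MOD num. The helper names `linearHashBitmask`, `hashMod` and `sameForAll` are hypothetical, and only non-negative values are considered.

```go
package main

import "fmt"

// linearHashBitmask: LINEAR HASH in the bitmask form (assumed to match the
// MySQL manual's description). Assumes v >= 0 and num >= 1.
func linearHashBitmask(v, num int64) int64 {
	V := int64(1)
	for V < num { // smallest power of 2 >= num
		V <<= 1
	}
	n := v & (V - 1)
	for n >= num { // fold into the lower half until in range
		V >>= 1
		n &= V - 1
	}
	return n
}

// hashMod: plain HASH partitioning, the remainder modulo the partition count.
func hashMod(v, num int64) int64 {
	return v % num
}

// sameForAll reports whether LINEAR HASH and HASH agree for every v in [0, limit).
func sameForAll(num, limit int64) bool {
	for v := int64(0); v < limit; v++ {
		if linearHashBitmask(v, num) != hashMod(v, num) {
			return false
		}
	}
	return true
}

func main() {
	for _, num := range []int64{2, 4, 6, 8, 16} {
		fmt.Printf("%2d partitions: identical to HASH = %v\n", num, sameForAll(num, 1000))
	}
}
```

For 2, 4, 8 and 16 partitions the two schemes agree on every value tested, while for 6 partitions they do not.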

The difference appears when the number of partitions is not a power of 2. Then the partition number is calculated from the integer column as:

Pow2 :=  lowest power of 2 which is >= number of partitions
// So if number of partitions is 6, then Pow2 will be 8
Part_number := int_col MODULUS Pow2
if Part_number >= number of partitions
  Part_number := int_col MODULUS (Pow2 / 2)
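A direct Go translation of the pseudocode, as a sketch: the function name `linearHashPartition` is hypothetical and it assumes a non-negative integer column value. Running it for the values 0 to 31 with 6 partitions lists the same values per partition as the example table that follows.

```go
package main

import (
	"fmt"
	"strings"
)

// linearHashPartition translates the pseudocode above.
// Assumes intCol >= 0 and numPartitions >= 1.
func linearHashPartition(intCol, numPartitions int64) int64 {
	// pow2: lowest power of 2 that is >= numPartitions (6 -> 8).
	pow2 := int64(1)
	for pow2 < numPartitions {
		pow2 *= 2
	}
	part := intCol % pow2
	if part >= numPartitions {
		// Only reachable when numPartitions is not a power of 2, so
		// pow2/2 < numPartitions and a single fold is always in range.
		part = intCol % (pow2 / 2)
	}
	return part
}

func main() {
	const numPartitions = 6
	buckets := make([][]string, numPartitions)
	for v := int64(0); v < 32; v++ {
		p := linearHashPartition(v, numPartitions)
		buckets[p] = append(buckets[p], fmt.Sprint(v))
	}
	for p, vals := range buckets {
		fmt.Printf("%d  %s\n", p, strings.Join(vals, ","))
	}
}
```

A single `if` is enough (no loop) because Pow2 / 2 is always smaller than the number of partitions whenever the fold is needed.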

For example, with 6 partitions (Pow2 = 8) the values 0 to 31 are distributed like this:

| partition number | values |
|---|---|
| 0 | 0,8,16,24 |
| 1 | 1,9,17,25 |
| 2 | 2,6,10,14,18,22,26,30 |
| 3 | 3,7,11,15,19,23,27,31 |
| 4 | 4,12,20,28 |
| 5 | 5,13,21,29 |

This makes the distribution unbalanced: all values where (int_col MODULUS Pow2) >= number of partitions are "folded" into partitions 2 and 3, making those partitions twice the size of the others.
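The imbalance can be quantified with a short Go sketch that counts how many of the values 0 to 47 land in each of 6 partitions under LINEAR HASH and under plain HASH (modelled as value MOD number of partitions). The helpers `linearHash`, `hash` and `rowsPerPartition` are hypothetical names, and evenly spread non-negative values are assumed.

```go
package main

import "fmt"

// linearHash: LINEAR HASH partition number, following the pseudocode.
// Assumes v >= 0 and num >= 1.
func linearHash(v, num int64) int64 {
	pow2 := int64(1)
	for pow2 < num {
		pow2 *= 2
	}
	if p := v % pow2; p < num {
		return p
	}
	return v % (pow2 / 2)
}

// hash: plain HASH partitioning (remainder modulo the partition count).
func hash(v, num int64) int64 { return v % num }

// rowsPerPartition counts how many of the values 0..n-1 land in each partition.
func rowsPerPartition(f func(v, num int64) int64, num, n int64) []int64 {
	counts := make([]int64, num)
	for v := int64(0); v < n; v++ {
		counts[f(v, num)]++
	}
	return counts
}

func main() {
	fmt.Println("LINEAR HASH:", rowsPerPartition(linearHash, 6, 48)) // LINEAR HASH: [6 6 12 12 6 6]
	fmt.Println("HASH:", rowsPerPartition(hash, 6, 48))              // HASH: [8 8 8 8 8 8]
}
```

Partitions 2 and 3 receive twice as many rows as the others under LINEAR HASH, while HASH spreads the same values evenly.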

The only benefit of LINEAR HASH over HASH comes when changing the number of partitions: the full table data does not need to be copied to new partitions, only a few partitions need to be either split or merged.

For example, going from 6 to 7 partitions in the example above only splits partition 2 into a new partition 2 and the new partition 6:

| partition number | values |
|---|---|
| 0, 1 | same |
| 2 | 2,10,18,26 |
| 3, 4, 5 | same |
| 6 | 6,14,22,30 |

Similarly, going from 6 to 5 partitions only merges partitions 5 and 1 into a new partition 1:

| partition number | values |
|---|---|
| 0 | same |
| 1 | 1,5,9,13,17,21,25,29 |
| 2, 3, 4 | same |
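To compare the resize cost of the two schemes, the Go sketch below counts how many of the values 0 to 31 change partition when going from 6 to 7 and from 6 to 5 partitions. Counting changed partition numbers is only a rough proxy for data movement and ignores how a real storage engine reorganizes partitions; plain HASH is modelled as value MOD number of partitions, and `linearHash`, `hash` and `movedRows` are hypothetical names.

```go
package main

import "fmt"

// linearHash: LINEAR HASH partition number, following the pseudocode.
// Assumes v >= 0 and num >= 1.
func linearHash(v, num int64) int64 {
	pow2 := int64(1)
	for pow2 < num {
		pow2 *= 2
	}
	if p := v % pow2; p < num {
		return p
	}
	return v % (pow2 / 2)
}

// hash: plain HASH partitioning (remainder modulo the partition count).
func hash(v, num int64) int64 { return v % num }

// movedRows counts values in 0..n-1 whose partition number changes when the
// partition count goes from oldNum to newNum.
func movedRows(f func(v, num int64) int64, oldNum, newNum, n int64) int64 {
	var moved int64
	for v := int64(0); v < n; v++ {
		if f(v, oldNum) != f(v, newNum) {
			moved++
		}
	}
	return moved
}

func main() {
	for _, newNum := range []int64{7, 5} {
		fmt.Printf("6 -> %d partitions, rows moved out of 32: LINEAR HASH %d, HASH %d\n",
			newNum, movedRows(linearHash, 6, newNum, 32), movedRows(hash, 6, newNum, 32))
	}
	// 6 -> 7 partitions, rows moved out of 32: LINEAR HASH 4, HASH 26
	// 6 -> 5 partitions, rows moved out of 32: LINEAR HASH 4, HASH 25
}
```

With LINEAR HASH only the values 6,14,22,30 (split) or 5,13,21,29 (merge) move, matching the tables above, whereas with HASH most values end up in a different partition.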
