Core: Optimize S3 layout of Datafiles by expanding first character set of the hash #7128
Conversation
force-pushed from 89a7d66 to aef9631
HASH_FUNC.hashBytes(fileName.getBytes(StandardCharsets.UTF_8)).asBytes();

StringBuilder hash = new StringBuilder();
for (int i = 0; i < 8; ++i) {
can we make 8 a static variable?
is there any benefit in making it a config?
I guess from what Daniel replied below the answer is no and 8 is more than enough.
+1, not required at the moment, though Delta has this configurable with a property called randomPrefixLength defaulting to 2 (code pointer). If required, we can take this as a follow-up PR.
I'm not sure this is really necessary. While the distribution of the first character is relatively narrow, the reference you point to even states:
We currently have 8 characters in the hash, so without confirmation from the S3 team it seems like we're already following the recommendation.
@danielcweeks this comes out of a test we did with one of our biggest customers, where the current object storage layout still resulted in throttling, but by extending it we were able to eliminate throttling. There are 2 issues with the current layout; here is some data we collected internally:
This indicates that ~12.5% of all file paths will start with “0”, which for a hex-encoded string should have been 6.25%, since there are 16 characters in the set.
The issue here is that the default algorithm generates values up to Integer.MAX_VALUE, so the first character of the zero-padded hex string is limited to [0-7]. With that being said, maybe we do not need the full character range to solve it. An alternative approach is to use something like a reversed hex string, which we have also tested to work. But the proposed approach will maximize the entropy in all cases without the need to worry about S3 internals. I will ask the S3 team to take a look and see whether the proposed approach is necessary.
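The "reversed hex" alternative mentioned above can be sketched in a few lines. This is an illustration of the idea, not the PR's actual implementation: reversing the zero-padded hex string moves the high-entropy low-order digits to the front, so the leading character is uniform over all 16 hex digits even when the raw value never exceeds 0x7fffffff.

```java
public class ReversedHex {

  // Illustrative sketch (not Iceberg's code): reverse the "%08x" string so
  // the low-order digits, which vary fastest, become the S3 key prefix.
  static String reversedHex(int hash) {
    return new StringBuilder(String.format("%08x", hash)).reverse().toString();
  }

  public static void main(String[] args) {
    System.out.println(reversedHex(0x0000abcd)); // prints "dcba0000"
  }
}
```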
force-pushed from aef9631 to ff3a9f7
@@ -143,11 +143,11 @@ public String newDataLocation(PartitionSpec spec, StructLike partitionData, Stri

@Override
public String newDataLocation(String filename) {
-  int hash = HASH_FUNC.apply(filename);
+  String hash = computeHash(filename);
Is it possible to support the hash type as a configuration? This change might break things for anyone who has a custom list operation on the object store based on the hex prefix (%08x).
I don't think that's a good idea. If someone has a custom list operation, then they're relying on behaviors of the implementation and take the risk that something like this will break their implementation.
what do you mean by custom list operation? Listing partitions based on the path after the random prefix?
Hi @jackye1995, if an object storage path has millions of files, S3 listing usually takes a very long time since it has to go through all the keys recursively. This behavior can be improved by listing files with hash prefixes in parallel, like /data/00, /data/01, /data/0a, etc.
+1 for what Daniel says.
if an object storage path has millions of files, S3 listing usually takes a very long time since it has to go through all the keys recursively.
Do you mean during orphan file removal? There could be better ways to make that work without listing if it's considered slow, like running a join against the inventory list.
force-pushed from ff3a9f7 to e70c81a
@singhpk234, I opened a PR against your branch to fix a few remaining issues, like unnecessary object allocation, and to improve the number of bits we get out of the hash. I think this is ready once that's merged: singhpk234#57
ACK, many thanks @rdblue, cherry-picked your commit.
Thanks, @singhpk234! Merged.
Hey @singhpk234, I think this was supposed to be 6 bytes. To avoid the
@danielcweeks, since we added
Didn't realize that was an option. Thanks @singhpk234
@singhpk234 thanks for the change, just want to confirm. Looks like the entropy prefix moved
nvm, just realized most of the time the trailing
About the change
Presently we use "%08x" to format the hash, which produces an 8-digit hex number padded with leading zeros. This effectively means the distribution will be skewed, and since we are relying on hex numbers, our character set is in any case limited to [0-9][a-f].
This change uses a wider character set while making sure the distribution of the first character stays as uniform as possible.
Sample code for distribution:
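The sample code itself is not included above; a minimal sketch that reproduces this kind of count might look like the following. It uses String.hashCode() as a stand-in for the actual hash function, and masks the value non-negative, which is what restricts the leading "%08x" digit to [0-7] (both are assumptions for illustration).

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

public class FirstCharDistribution {

  // Count the first character of the zero-padded hex hash over random UUID
  // strings; String.hashCode() is a stand-in for the real hash function.
  static Map<Character, Integer> distribution(int samples) {
    Map<Character, Integer> counts = new TreeMap<>();
    for (int i = 0; i < samples; i++) {
      // masking with Integer.MAX_VALUE keeps the value non-negative, so the
      // formatted string never exceeds "7fffffff" and the first digit is 0-7
      int hash = UUID.randomUUID().toString().hashCode() & Integer.MAX_VALUE;
      counts.merge(String.format("%08x", hash).charAt(0), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    distribution(100_000).forEach((c, n) -> System.out.println("hm[" + c + "] = " + n));
  }
}
```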
Distribution of first character before (10M UUID strings):
the first character is restricted to only [0-7]
hm[0] = 125099
hm[1] = 124953
hm[2] = 125440
hm[3] = 124705
hm[4] = 124777
hm[5] = 125103
hm[6] = 124908
hm[7] = 125015
Distribution of first character after this change (10M UUID strings):
hm[0] = 15715
hm[1] = 15524
hm[2] = 15861
hm[3] = 15680
hm[4] = 15411
hm[5] = 15638
hm[6] = 19410
hm[7] = 19298
hm[8] = 19472
hm[9] = 19399
hm[A] = 15661
hm[B] = 15633
hm[C] = 15414
hm[D] = 15675
hm[E] = 15711
hm[F] = 15569
hm[G] = 15767
hm[H] = 15643
hm[I] = 15616
hm[J] = 15508
hm[K] = 15636
hm[L] = 15726
hm[M] = 15701
hm[N] = 15658
hm[O] = 15525
hm[P] = 15646
hm[Q] = 15686
hm[R] = 15666
hm[S] = 15675
hm[T] = 15521
hm[U] = 15569
hm[V] = 15613
hm[W] = 15398
hm[X] = 15797
hm[Y] = 15855
hm[Z] = 15620
hm[a] = 19318
hm[b] = 19460
hm[c] = 19579
hm[d] = 19673
hm[e] = 15790
hm[f] = 15687
hm[g] = 15622
hm[h] = 15833
hm[i] = 15693
hm[j] = 15547
hm[k] = 15725
hm[l] = 15521
hm[m] = 15911
hm[n] = 15468
hm[o] = 15579
hm[p] = 15753
hm[q] = 15594
hm[r] = 15723
hm[s] = 15628
hm[t] = 15433
hm[u] = 15645
hm[v] = 15544
hm[w] = 15761
hm[x] = 15524
hm[y] = 15565
hm[z] = 15527
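One plausible way to map hash bytes onto the wider [0-9][A-Z][a-z] alphabet is a simple modulo mapping; this is an assumption for illustration, and the PR's actual encoding may differ. Because 256 is not a multiple of 62, a handful of characters end up slightly over-represented, and a similarly mild skew is visible in the table above.

```java
public class WideAlphabetEncoder {

  // 62-character alphabet [0-9][A-Z][a-z]; illustrative, not Iceberg's exact mapping
  private static final String ALPHABET =
      "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

  static String encode(byte[] hashBytes, int length) {
    StringBuilder sb = new StringBuilder(length);
    for (int i = 0; i < length; i++) {
      // map each byte (0-255) onto the alphabet; 256 % 62 != 0, so a few
      // characters are chosen slightly more often than the rest
      sb.append(ALPHABET.charAt((hashBytes[i] & 0xFF) % ALPHABET.length()));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(encode(new byte[] {0, 10, 36, (byte) 255}, 4)); // prints "0Aa7"
  }
}
```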
More resources: