File sorting #1

Open
ChacheGS opened this issue Oct 24, 2018 · 4 comments

@ChacheGS

Hi,
I just found this library while searching for a way to generate time-based, sortable UUIDs that don't give away sensitive information. I'm using KSUIDs to generate file names and provide file rotation.

So I have a bunch of files, but it looks like new ones are being sorted in the middle of the rest, and if I understood the blog post correctly,

> KSUID provides two fixed-length encodings: a 20-byte binary encoding and a 27-character base62 encoding. The lexicographic ordering property is provided by encoding the timestamp using big endian byte ordering. The base62 encoding is tailored to map to the lexicographic ordering of characters in terms of their ASCII order.

sorting the files either by name or by reverse creation date should yield the same result, but it's not the case:

$ ls -lrt
total 1100
-rw-rw-r-- 1 carlos carlos 74057 Oct 24 14:05 CF4SGtALPcpyPrVRmsmHu9A0JTG.json
-rw-rw-r-- 1 carlos carlos 80944 Oct 24 14:05 CF4SGpuMmJ9AlS9C9CEMUzJE79C.json
-rw-rw-r-- 1 carlos carlos 75810 Oct 24 14:05 CF4SGj9ACnJufyCBmtuQbLbeX1l.json
-rw-rw-r-- 1 carlos carlos 74125 Oct 24 14:05 CF4SGtNrdxOz6tv8QnY9Bf53LH5.json
-rw-rw-r-- 1 carlos carlos 80365 Oct 24 14:06 CF4SSrp3BXqfbWdPfjBL21yot5w.json
-rw-rw-r-- 1 carlos carlos 75167 Oct 24 14:06 CF4SSjwkE7wdlLndVrfZoDFftFw.json
-rw-rw-r-- 1 carlos carlos 78930 Oct 24 14:06 CF4SSgjzxAvnE6ae5UMan9BoA9C.json
-rw-rw-r-- 1 carlos carlos 75301 Oct 24 14:06 CF4SSsvONy3jQrPk6kYVWSLpeHY.json
-rw-rw-r-- 1 carlos carlos 74426 Oct 24 14:06 CF4SSvtdNJefKEbFyY9Cm6d5xkP.json
-rw-rw-r-- 1 carlos carlos 79777 Oct 24 14:07 CF4Secj1xufzRjuiKptyujkH1yY.json
-rw-rw-r-- 1 carlos carlos 77760 Oct 24 14:07 CF4SeYh7VeujDA1dK6ZZjV9Bc3U.json
-rw-rw-r-- 1 carlos carlos 74632 Oct 24 14:07 CF4SeVj2o9BF9CIRVAdkBDWzEK7.json
-rw-rw-r-- 1 carlos carlos 76893 Oct 24 14:07 CF4SefkthoDogD4VRMRhJk2LFxs.json
-rw-rw-r-- 1 carlos carlos 75185 Oct 24 14:07 CF4SeaNMrr7t9BoZNU5rGDXFoO5.json
-rw-rw-r-- 1 carlos carlos 19567 Oct 24 14:07 latest.json
$ ls -l
total 1100
-rw-rw-r-- 1 carlos carlos 75185 Oct 24 14:07 CF4SeaNMrr7t9BoZNU5rGDXFoO5.json
-rw-rw-r-- 1 carlos carlos 79777 Oct 24 14:07 CF4Secj1xufzRjuiKptyujkH1yY.json
-rw-rw-r-- 1 carlos carlos 76893 Oct 24 14:07 CF4SefkthoDogD4VRMRhJk2LFxs.json
-rw-rw-r-- 1 carlos carlos 74632 Oct 24 14:07 CF4SeVj2o9BF9CIRVAdkBDWzEK7.json
-rw-rw-r-- 1 carlos carlos 77760 Oct 24 14:07 CF4SeYh7VeujDA1dK6ZZjV9Bc3U.json
-rw-rw-r-- 1 carlos carlos 75810 Oct 24 14:05 CF4SGj9ACnJufyCBmtuQbLbeX1l.json
-rw-rw-r-- 1 carlos carlos 80944 Oct 24 14:05 CF4SGpuMmJ9AlS9C9CEMUzJE79C.json
-rw-rw-r-- 1 carlos carlos 74057 Oct 24 14:05 CF4SGtALPcpyPrVRmsmHu9A0JTG.json
-rw-rw-r-- 1 carlos carlos 74125 Oct 24 14:05 CF4SGtNrdxOz6tv8QnY9Bf53LH5.json
-rw-rw-r-- 1 carlos carlos 78930 Oct 24 14:06 CF4SSgjzxAvnE6ae5UMan9BoA9C.json
-rw-rw-r-- 1 carlos carlos 75167 Oct 24 14:06 CF4SSjwkE7wdlLndVrfZoDFftFw.json
-rw-rw-r-- 1 carlos carlos 80365 Oct 24 14:06 CF4SSrp3BXqfbWdPfjBL21yot5w.json
-rw-rw-r-- 1 carlos carlos 75301 Oct 24 14:06 CF4SSsvONy3jQrPk6kYVWSLpeHY.json
-rw-rw-r-- 1 carlos carlos 74426 Oct 24 14:06 CF4SSvtdNJefKEbFyY9Cm6d5xkP.json
-rw-rw-r-- 1 carlos carlos 19567 Oct 24 14:07 latest.json
$ diff <(ls -rt) <(ls)  # filenames only
1,2c1,5
< CF4SGtALPcpyPrVRmsmHu9A0JTG.json
< CF4SGpuMmJ9AlS9C9CEMUzJE79C.json
---
> CF4SeaNMrr7t9BoZNU5rGDXFoO5.json
> CF4Secj1xufzRjuiKptyujkH1yY.json
> CF4SefkthoDogD4VRMRhJk2LFxs.json
> CF4SeVj2o9BF9CIRVAdkBDWzEK7.json
> CF4SeYh7VeujDA1dK6ZZjV9Bc3U.json
3a7,8
> CF4SGpuMmJ9AlS9C9CEMUzJE79C.json
> CF4SGtALPcpyPrVRmsmHu9A0JTG.json
5,6d9
< CF4SSrp3BXqfbWdPfjBL21yot5w.json
< CF4SSjwkE7wdlLndVrfZoDFftFw.json
7a11,12
> CF4SSjwkE7wdlLndVrfZoDFftFw.json
> CF4SSrp3BXqfbWdPfjBL21yot5w.json
10,14d14
< CF4Secj1xufzRjuiKptyujkH1yY.json
< CF4SeYh7VeujDA1dK6ZZjV9Bc3U.json
< CF4SeVj2o9BF9CIRVAdkBDWzEK7.json
< CF4SefkthoDogD4VRMRhJk2LFxs.json
< CF4SeaNMrr7t9BoZNU5rGDXFoO5.json
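
A possible explanation, which nothing in this thread confirms: GNU `ls` sorts names using the current locale's collation rules, and many UTF-8 locales compare letters largely without regard to case, whereas the base62 ordering described in the blog post relies on ASCII byte order (digits, then uppercase, then lowercase). The Python sketch below takes the file names from the listing above and compares a plain byte-order sort with a case-folded sort. Using `str.casefold` as a stand-in for locale collation is an assumption made for illustration, and `prefix_groups` is a hypothetical helper, not part of any library.

```python
# File names copied from the `ls -lrt` listing above (latest.json left out).
names = [
    "CF4SGtALPcpyPrVRmsmHu9A0JTG.json", "CF4SGpuMmJ9AlS9C9CEMUzJE79C.json",
    "CF4SGj9ACnJufyCBmtuQbLbeX1l.json", "CF4SGtNrdxOz6tv8QnY9Bf53LH5.json",
    "CF4SSrp3BXqfbWdPfjBL21yot5w.json", "CF4SSjwkE7wdlLndVrfZoDFftFw.json",
    "CF4SSgjzxAvnE6ae5UMan9BoA9C.json", "CF4SSsvONy3jQrPk6kYVWSLpeHY.json",
    "CF4SSvtdNJefKEbFyY9Cm6d5xkP.json", "CF4Secj1xufzRjuiKptyujkH1yY.json",
    "CF4SeYh7VeujDA1dK6ZZjV9Bc3U.json", "CF4SeVj2o9BF9CIRVAdkBDWzEK7.json",
    "CF4SefkthoDogD4VRMRhJk2LFxs.json", "CF4SeaNMrr7t9BoZNU5rGDXFoO5.json",
]


def prefix_groups(ordered):
    """Collapse a listing to its sequence of distinct 5-character prefixes.

    In the listing above, CF4SG / CF4SS / CF4Se correspond to the files
    created at 14:05 / 14:06 / 14:07.
    """
    groups = []
    for name in ordered:
        if not groups or groups[-1] != name[:5]:
            groups.append(name[:5])
    return groups


# Code-point order, which is ASCII byte order for these names.
by_bytes = sorted(names)
# Case-folded order: a rough approximation of locale-aware collation (assumption).
by_folded = sorted(names, key=str.casefold)

print(prefix_groups(by_bytes))   # ['CF4SG', 'CF4SS', 'CF4Se']
print(prefix_groups(by_folded))  # ['CF4Se', 'CF4SG', 'CF4SS']
```

The byte-order sort keeps the 14:05, 14:06 and 14:07 groups in creation order, while the case-folded sort puts the 14:07 group first, which matches the `ls -l` output above. If this is indeed the cause, running `LC_ALL=C ls` should list the files in byte order.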
@akhawaja
Owner

Hi Carlos. The KSUID algorithm only orders lexically by the timestamp component, which has 1-second resolution. This means that if you generate at most one ID per second, the entire body of IDs will always be lexically sortable. With a higher time resolution, you could generate more IDs per second and still retain lexical order among them. What would you think of making that a configuration option? Perhaps you can also elaborate on your use case here.

Cheers!

@ChacheGS
Author

ChacheGS commented Oct 26, 2018

Hi akhawaja. Thank you for your answer. I've read the blog post again, and it's true that the timestamp has 1-second resolution. But it also says that higher resolution can be had in exchange for fewer random bits.

> The timestamp provides 1-second resolution, which we found to be acceptable for a broad range of use cases. If a higher resolution timestamp is desired, payload bits can be traded for more timestamp bits. While high-resolution timestamp support is not included in our implementation, it is backwards compatible. Any implementation which uses 32-bit timestamps can safely work with KSUIDs that use higher resolution timestamps.

Is that what you mean by "created as a configuration"?
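
To make the question concrete, here is a minimal Python sketch of what trading payload bits for timestamp bits could look like. It assumes the usual KSUID layout of a 4-byte big-endian timestamp followed by a 16-byte payload, with the epoch offset of 1400000000 used by Segment's reference implementation. The names `make_id`, `seconds_of` and `FRACTION_BYTES` are hypothetical and are not this library's API.

```python
import os
import struct

KSUID_EPOCH = 1_400_000_000  # epoch offset of Segment's reference implementation
FRACTION_BYTES = 1           # hypothetical: payload bytes given up for sub-second time


def make_id(unix_time: float, fraction_bytes: int = FRACTION_BYTES) -> bytes:
    """Return 20 bytes: 4-byte seconds, sub-second fraction, random payload."""
    seconds = int(unix_time)
    # Scale the fractional part of the second to fit in fraction_bytes bytes.
    scale = 1 << (8 * fraction_bytes)
    fraction = min(int((unix_time - seconds) * scale), scale - 1)
    head = struct.pack(">I", seconds - KSUID_EPOCH)   # big-endian, as in KSUID
    sub = fraction.to_bytes(fraction_bytes, "big")    # big-endian keeps byte order == time order
    return head + sub + os.urandom(16 - fraction_bytes)


def seconds_of(raw: bytes) -> int:
    """Read the ID the way a 32-bit-timestamp implementation would."""
    return struct.unpack(">I", raw[:4])[0] + KSUID_EPOCH


# Two IDs created within the same second now compare in creation order.
earlier = make_id(1_540_000_000.10)
later = make_id(1_540_000_000.60)
print(earlier < later)
print(len(earlier), seconds_of(earlier))
```

With one fraction byte the resolution is 1/256 of a second and 15 random bytes remain. A reader that only understands 32-bit timestamps still recovers the whole-second timestamp from the first four bytes and treats the rest as payload, which is the backwards compatibility the blog post describes.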

> Perhaps you can also elaborate your use case here.

Sure. I'm capturing the messages that a 3rd-party system sends to my application. The goal is to be able to replay what was received. As an easy way to keep the inputs sorted, I thought about giving the files a UUID name. Each file is created with a fixed number of messages. If the sender sends 5x that number in one burst, I'll have 5 new files created at nearly the same moment. I don't need those files to be sorted relative to each other. However, in the ls outputs I sent earlier you can see that they are not sorted even by the second. Check the output of ls -l: the files created Oct 24 14:07 are listed before those created Oct 24 14:05 and Oct 24 14:06.

Thanks!

@akhawaja
Owner

> In the ls outputs I sent earlier you can see that they are not sorted even by the second. Check the output of ls -l: the files created Oct 24 14:07 are listed before those created Oct 24 14:05 and Oct 24 14:06.

Curious. I'll have to make some time to dig in a little deeper and understand why that is happening. Thank you for sharing details about your use case.

> Is that what you mean by "created as a configuration"?

Yes, that is what I meant. I have not looked at this work in quite a while. I'll report back here when I am able to sit down and work on this.

@ChacheGS
Author

I'll try to find some time to do some digging of my own. I'll open a PR if I find something. Thanks for your quick answers.
