Queryable Sub-Trees #22

vassudanagunta · 2018-10-10T00:33:03Z

I see that the implementation is recursive, but this isn't exposed in the public API.

For example:

Get(k []byte) is defined on *Node, but the only Node that is publicly accessible is the root node. It would be powerful to be able to make queries relative to any Node (e.g. Get("xyz") on the "abc" Node finding "abcxyz").
The docs state "Iterator is used to return an iterator at the given node to walk the tree", but the only possible "given node" is the root node.

I might be able to figure it out reverse engineering the code, but I'm hoping the maintainers know whether this is something that is (a) already supported; just need to provide public access to non-root nodes, (b) not supported but could be with some reasonalble changes or (c) technically not possible given the current implementation.

This is a great library. I'm trying to do some great open source code with it.

The text was updated successfully, but these errors were encountered:

vassudanagunta · 2019-05-23T01:12:07Z

@banks, since you just worked on the code, and @i0rek and @kyhavlov since you just reviewed said code, might any of you have an idea?

banks · 2019-05-23T10:42:15Z

Hey @vassudanagunta,

Yeah the API could certainly be more flexible in general - it's really the minimum needed for specific use case (go-memdb).

That said, a prefix based lookup can in most cases give you essentially the same thing - it doesn't expose a subtree's actual node directly as it's wrapped in the iterator but generally allows you to walk any subset of the tree from a given node. Given that the only thing you can do with a Node currently is further prefix/exact match based lookups it's kinda of equivalent but if you have a use case where exposing the actual Node API of a subtree is convenient then it's probably not hard to add.

The Iterator itself has a reference to the node that was sought with any call to SeekPrefix so you could just make it return that node explicitly. I've not thought through whether this is a good idea or not in detail but off the to of my head it seems reasonable and likely a smallish change.

vassudanagunta · 2019-05-23T18:06:53Z

Thanks for the detail reply, @banks.

My use-case is fast prefix-based lookups of file/URL paths, as well as glob matching, which I can implement by seeking to glob's wildcard-free prefix, then walking the tree from there to find matches on the wildcard-containing suffix.

I will often have to do repeated lookups (not walks) relative to a specific node (e.g. a directory). You might call these subprefix lookups. Having to do it from the root every time means several points of redundancy: (1) having to concatenate the prefix (the directory/base path) and the subprefix (the relative path), (2) all the redundant string-[]byte conversions and then (3) go-immutable-radix having to rewalk the tree down to the prefix, even though it already did that for me many times just a microsecond ago.

Being a Go newb and only an open source programmer on the side at that, I'm not sure which makes most sense:

The cost is relatively insignificant; just pay it! (context: this is for Hugo, must support hundreds/thousand/sometimes tens of thousands of paths)
Update go-immutable-radix to provide a way of getting a public handle on an arbitrary Node. (I could do that, with tests, and submit a PR)
Roll my own file-path optimized radix tree-like index, because for file paths "/" nodes are special.

banks · 2019-05-24T16:01:41Z

I think that use case sounds legit! It would almost be easy - just adding a `Node()` method to the iterator in iter.go that returns `i.node` would kinda work for some of your examples. The awkward part is that because the iterator is only designed to iterate the leaves and not the internal nodes, it would depend on the struture of your keys and which nodes have been collapsed by prefix compression as to exactly what worked which is a bit ugly to expose to the user to reason about. It might be possible to use rawIterator (raw_iter.go) to expose the nodes in a nicer way. Overall though whether it's worth trying to build a nice-enough API that is also more efficient that just doing the lookup every time is a hard call - I'd probably not do that until I had real benchmark data to show it was important unless you want to do it just for the learning experience!

…

On Thu, May 23, 2019 at 7:07 PM Vas Sudanagunta ***@***.***> wrote: Thanks for the detail reply, @banks <https://github.com/banks>. My use-case is fast prefix-based lookups of file/URL paths, as well as glob matching, which I can implement by seeking to glob's wildcard-free prefix, then walking the tree from there to find matches on the wildcard-containing suffix. I will often have to do repeated lookups (not walks) relative to a specific node (e.g. a directory). You might call these *subprefix* lookups. Having to do it from the root every time means two points of redundancy: (1) having to concatenate the prefix (the directory/base path) and the subprefix (the relative path) and then (2) go-immutable-radix having to rewalk the tree down to the prefix, even though it already did that for me many times just a microsecond ago. Being a Go newb and only a programmer on the side at that, I'm not sure which makes most sense: 1. The cost is relatively insignificant; just pay it! 2. Update go-immutable-radix to provide a way of getting a public handle on an arbitrary Node. (I could do that, with tests, and submit a PR) 3. Roll my own file-path optimized radix tree-like index, because for file paths "/" nodes are special. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#22?email_source=notifications&email_token=AAA5QU7EQ27MSIWNNZT2Z43PW3MMHA5CNFSM4F2RPA2KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWDA64Q#issuecomment-495325042>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAA5QU72LE5GROST7YKQVWDPW3MMHANCNFSM4F2RPA2A> .

vassudanagunta · 2019-05-25T05:51:09Z

@banks I have too many things to learn! I wanted to know if it would be relatively trivial (It didn't seem trivial to figure out if it would be trivial; perhaps that should have been my first clue. Thanks again for your help and for your work.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Queryable Sub-Trees #22

Queryable Sub-Trees #22

vassudanagunta commented Oct 10, 2018 •

edited

Loading

vassudanagunta commented May 23, 2019

banks commented May 23, 2019

vassudanagunta commented May 23, 2019 •

edited

Loading

banks commented May 24, 2019 via email

vassudanagunta commented May 25, 2019

Queryable Sub-Trees #22

Queryable Sub-Trees #22

Comments

vassudanagunta commented Oct 10, 2018 • edited Loading

vassudanagunta commented May 23, 2019

banks commented May 23, 2019

vassudanagunta commented May 23, 2019 • edited Loading

banks commented May 24, 2019 via email

vassudanagunta commented May 25, 2019

vassudanagunta commented Oct 10, 2018 •

edited

Loading

vassudanagunta commented May 23, 2019 •

edited

Loading