Question about efficiency (lazy loading) #111
TL;DR: what you describe is not currently supported.
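That is correct, though it's worth noting that what is output is a `Vec` of references (`&Value`), not owned values. It would also be possible to have an iterator-based approach that yields one result at a time. So, I guess there are two potential feature requests here:

1) streaming: querying a JSON document that is too large to fit in memory, which means the input itself has to be parsed incrementally; and
2) lazy evaluation: querying an in-memory `Value` through an iterator, so that only as much of the query is evaluated as the caller actually consumes.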
I suppose that 2) is possible. 1), however, will be quite challenging, because it would probably need an entirely new library/API and would have to handle the JSON parsing itself. For 1), you might be able to reuse the JSONPath parser from …
Makes sense, thanks. Do you think an intermediate solution for the second one could be possible? If the whole JSON object fits into memory, and there are a lot of matches but I only want the Nth one, could it stop looking once it finds that one? You could basically keep the same API as the current `NodeList` (`first`, `get(index)`, `slice`, ...) but stop evaluating once the request is satisfied. This would not be an explicit `Iterator` that you query lazily, but rather a "fetch what I want and then stop" approach. I understand that the first one (the streaming solution) is a lot of work, though. I was hoping Serde would already have enough support for it, but I'm not that familiar with the crate.
It is certainly possible, but it would need a separate API, so there will still be some work to figure out the implementation. The current implementation takes the input JSONPath, parses it into an abstract syntax tree, and evaluates that tree against the input value. It uses a recursive approach, which has the advantage of not needing to hold onto any state to produce a correct result. The downside is that the resulting API can't provide the efficiency wins you're describing.
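To make the trade-off concrete, consider a recursive evaluator that does not return a `Vec`. Instead, it hands each match to a callback that may return `ControlFlow::Break`. Such an evaluator can still stop early. This is roughly the "fetch what I want and then stop" idea from the previous comment, without a full `Iterator`. The sketch below is hypothetical and self-contained, not the library's code. `Value` is a tiny stand-in for `serde_json::Value`, only a descendant query `$..<key>` is supported, and `visit`, `collect_all`, `nth_match`, `obj` and `sample` are illustrative names.

```rust
use std::ops::ControlFlow;

/// Tiny stand-in for `serde_json::Value` (hypothetical). Object members are
/// kept in a Vec so that the order of matches is deterministic.
#[derive(Debug, PartialEq)]
enum Value {
    Num(i64),
    Arr(Vec<Value>),
    Obj(Vec<(String, Value)>),
}

/// Recursively evaluate the descendant query `$..<key>`, handing each match to
/// `f`. Returning `ControlFlow::Break` from `f` stops the whole traversal.
fn visit<'a, B>(
    v: &'a Value,
    key: &str,
    f: &mut impl FnMut(&'a Value) -> ControlFlow<B>,
) -> ControlFlow<B> {
    match v {
        Value::Num(_) => {}
        Value::Arr(items) => {
            for item in items {
                visit(item, key, &mut *f)?;
            }
        }
        Value::Obj(members) => {
            for (k, child) in members {
                if k == key {
                    f(child)?;
                }
                visit(child, key, &mut *f)?;
            }
        }
    }
    ControlFlow::Continue(())
}

/// Eager evaluation (like returning a Vec): every match is collected.
fn collect_all<'a>(root: &'a Value, key: &str) -> Vec<&'a Value> {
    let mut out = Vec::new();
    let _: ControlFlow<()> = visit(root, key, &mut |m| {
        out.push(m);
        ControlFlow::Continue(())
    });
    out
}

/// Stop as soon as the n-th (0-based) match is found; earlier matches are
/// only counted, never stored.
fn nth_match<'a>(root: &'a Value, key: &str, n: usize) -> Option<&'a Value> {
    let mut seen = 0;
    let flow = visit(root, key, &mut |m| {
        if seen == n {
            return ControlFlow::Break(m);
        }
        seen += 1;
        ControlFlow::Continue(())
    });
    match flow {
        ControlFlow::Break(m) => Some(m),
        ControlFlow::Continue(()) => None,
    }
}

/// Build an object from (key, value) pairs.
fn obj(members: Vec<(&str, Value)>) -> Value {
    Value::Obj(members.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// The document `[{"id": 1, "child": {"id": 2}}, {"id": 3}]`.
fn sample() -> Value {
    Value::Arr(vec![
        obj(vec![("id", Value::Num(1)), ("child", obj(vec![("id", Value::Num(2))]))]),
        obj(vec![("id", Value::Num(3))]),
    ])
}

fn main() {
    let doc = sample();
    // Collects every match.
    println!("{:?}", collect_all(&doc, "id"));
    // Stops at the second match; the object {"id": 3} is never visited.
    println!("{:?}", nth_match(&doc, "id", 1));
}
```

The recursion itself still holds no explicit state. Early exit only needs the small counter captured by the callback. What this style cannot offer is a pull-based `Iterator`, because the call stack is what remembers where the traversal left off.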
I would really push for an API like this:

```rust
impl JsonPath {
    fn query_iter<'a>(&self, value: &'a serde_json::Value) -> QueryIter<'a> {
        /* ... */
    }
}

struct QueryIter<'a> { /* ... */ }

impl<'a> Iterator for QueryIter<'a> {
    type Item = &'a serde_json::Value;
    /* ... */
}
```

That is obviously oversimplified, but it is the basic shape of the API. Anyhow, I don't know whether I will have time to attempt this any time soon. Perhaps over the holidays if I get some spare time, but unfortunately I can't guarantee it.
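One way the `QueryIter` idea could be realized is to move the recursion's call stack into the iterator as an explicit stack. Each `next()` call then does only the work needed to reach the next match. This is a hypothetical, self-contained sketch, not the library's implementation. `Value` is a tiny stand-in for `serde_json::Value`, the only supported query is a descendant query `$..<key>`, and `Step`, `QueryIter::new`, `obj` and `sample` are illustrative names.

```rust
/// Tiny stand-in for `serde_json::Value` (hypothetical).
#[derive(Debug, PartialEq)]
enum Value {
    Num(i64),
    Arr(Vec<Value>),
    Obj(Vec<(String, Value)>),
}

/// Pending work: either a node still to be searched or a match to yield.
enum Step<'a> {
    Visit(&'a Value),
    Emit(&'a Value),
}

/// Hypothetical lazy iterator over the results of `$..<key>`.
struct QueryIter<'a> {
    key: &'a str,
    stack: Vec<Step<'a>>,
}

impl<'a> QueryIter<'a> {
    fn new(root: &'a Value, key: &'a str) -> Self {
        QueryIter { key, stack: vec![Step::Visit(root)] }
    }
}

impl<'a> Iterator for QueryIter<'a> {
    type Item = &'a Value;

    fn next(&mut self) -> Option<&'a Value> {
        // Keep working only until the next match is ready.
        while let Some(step) = self.stack.pop() {
            match step {
                Step::Emit(v) => return Some(v),
                Step::Visit(Value::Num(_)) => {}
                Step::Visit(Value::Arr(items)) => {
                    // Push in reverse so items are visited in document order.
                    self.stack.extend(items.iter().rev().map(Step::Visit));
                }
                Step::Visit(Value::Obj(members)) => {
                    for (k, child) in members.iter().rev() {
                        self.stack.push(Step::Visit(child));
                        // Pushed last, so a matching child is yielded before
                        // its own descendants are searched.
                        if k == self.key {
                            self.stack.push(Step::Emit(child));
                        }
                    }
                }
            }
        }
        None
    }
}

/// Build an object from (key, value) pairs.
fn obj(members: Vec<(&str, Value)>) -> Value {
    Value::Obj(members.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// The document `[{"id": 1, "child": {"id": 2}}, {"id": 3}]`.
fn sample() -> Value {
    Value::Arr(vec![
        obj(vec![("id", Value::Num(1)), ("child", obj(vec![("id", Value::Num(2))]))]),
        obj(vec![("id", Value::Num(3))]),
    ])
}

fn main() {
    let doc = sample();
    // Only the work needed for the first match is done.
    println!("{:?}", QueryIter::new(&doc, "id").next());
    // The first match is skipped (not stored) and the second is returned.
    println!("{:?}", QueryIter::new(&doc, "id").nth(1));
    // All matches, produced one at a time.
    for m in QueryIter::new(&doc, "id") {
        println!("{m:?}");
    }
}
```

Because `QueryIter` implements `Iterator`, `next`, `nth`, `take` and `for` loops all come for free. The price is that the traversal state (the `stack`) must now be held explicitly, which is exactly the state the recursive approach avoids.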
This issue is more of a question about the way the library works. If the answer is that this is currently not supported, consider it a feature request instead.
`JsonPath::query()` seems to return a `Vec<Value>`, which suggests to me that it collects all results from a query at once and returns them. When querying very large files (e.g., a 100 GB JSON file that doesn't fit in memory), or when I only need a small part of the results, this is not ideal. For example, if I only want the first or second element, there's no point in storing all the others, or even in querying them. Similarly, if I want to fetch all the items, it would be useful to loop over an `Iterator<Value>` that progressively retrieves more as I request them.

Does this library support such queries? I'm essentially looking for a `LazyJsonPath` or `JsonPath::lazy_query()`, or something similar, that only fetches the results I actually want, only when I request them, rather than everything up front.

Examples
Example 1: I only want the first element, so I'd expect `query()` to stop as soon as it finds a result.

Example 2: I only want the second element, so I'd expect `query()` to stop as soon as it finds two results. It should not store the first one, just skip over it and increment a counter, since I don't need it anyway.

Example 3: I want all results, but they don't fit in memory all at once, so they should be streamed instead.
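If a query were exposed as an `Iterator`, Rust's standard iterator adaptors would already give Examples 1 and 2 this behavior: `next()` and `nth(n)` pull items only until the requested match is found. The sketch below uses a plain slice of numbers as a hypothetical stand-in for query results, with an even number playing the role of a matching node. It counts how many items are examined. It does not address Example 3, which would additionally require a streaming JSON parser so that the input itself never has to be fully in memory.

```rust
use std::cell::Cell;

/// Return the n-th (0-based) even number in `items`, recording in `examined`
/// how many items were looked at before the search stopped.
/// "Even number" is a hypothetical stand-in for "node matching the query".
fn nth_even(items: &[i64], n: usize, examined: &Cell<usize>) -> Option<i64> {
    items
        .iter()
        // Count every item that is actually pulled through the pipeline.
        .inspect(|_| examined.set(examined.get() + 1))
        // Keep only "matches"; non-matching items are discarded immediately.
        .filter(|&&x| x % 2 == 0)
        // Skip `n` matches without storing them, return the next one, stop.
        .nth(n)
        .copied()
}

fn main() {
    let items: [i64; 8] = [1, 2, 3, 4, 5, 6, 7, 8];

    // Example 1: first match only.
    let examined = Cell::new(0);
    let first = nth_even(&items, 0, &examined);
    println!("first = {:?}, examined {} items", first, examined.get());

    // Example 2: second match; the first match is skipped, not collected.
    let examined = Cell::new(0);
    let second = nth_even(&items, 1, &examined);
    println!("second = {:?}, examined {} items", second, examined.get());
}
```

Only the items up to the requested match are examined, and skipped matches are counted rather than stored, which is the behavior Example 2 asks for.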