-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQLiteTrie: use pygtrie for in memory traversal #21
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #21 +/- ##
==========================================
+ Coverage 79.50% 79.80% +0.29%
==========================================
Files 11 11
Lines 605 604 -1
Branches 106 105 -1
==========================================
+ Hits 481 482 +1
+ Misses 112 110 -2
Partials 12 12 ☔ View full report in Codecov by Sentry. |
def _node_factory_wrapper(path_conv, path, children, *value): | ||
return node_factory( | ||
path_conv, path, children, self._load(path, value[0] if value else None) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
node factory value is defined as an optional positional arg in pygtrie (since there is a difference between a node having no value set and a node value being set to None
)
dvc-data and dvc tests pass with this PR |
trie = PyGTrie() | ||
for key, value in self.items(): | ||
trie[key] = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this requires reading the whole trie into memory, right? That should become a problem at some size and kinda defeats the purpose a little bit it seems like. Also reading so much will likely become a problem on slower filesystems. Again, kinda defeats the general purpose of sqltrie.
After researching, what do you think about sql-based view as discussed in the previous PR? Just trying to understand if that would be the long term solution.
closing in favor of #24 |
Should fix iterative/dvc#9914
After looking at this some more we can just simplify it to use an in-memory trie specifically for traversal, and leave everything else as-is (so all other view operations still reflect the live db)
DVC main with this PR
DVC main without PR:
DVC 3.10