-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature request: Support large tables with FFI data keys #115
Comments
There is a BountySource bounty available to anybody who can solve this problem such that the custom hashtables referenced above could be retired in favor of a standard solution. |
So FFI data interning? Sounds like a pain. |
Sadly, the general case would be bulky and inherently inefficient when implemented like how it is suggested. Ie not particularly different from mirroring the key as it is done now (ie just use string as your lookup key). The code would be fairly intrusive, too. What about this: Allow storing Lua objects directly in ffi types. That way you can implement any exotic lookup structure you want (ie fitting tightly your use case). Modification to luajit would be relatively simple - extend gc to traverse ctypes, and implement somewhat tricky tweak to allow rematerialization of Lua type from cdata and the opposite in the JIT frontend - basically we need to emit a type guard and potentially fork a new trace whenever we rematerialize a Lua value from ctype (very similiar to table load). Users could now write highly specialized trie/b-trees/sparse tables which would be still pretty indistinguishable from Lua tables. However, there would be far more uses for this. C types could now anchor other c types, currently a frequent bug as anchoring is not intuitive. A lot of ugly warts of LuaJIT codebase could be discarded - complex and bug-prone C function recording present in LuaJIT can be in some instances replaced by simple Lua wrappers which would handle guards and Lua type marshalling instead, and then simply FFI call to given C function. This is already possible in select cases (such as string handling functions, as those usually just return strings), but it can't be done universally, yet. Obviously, all of this would remain unsafe - it would be still user's responsibility to not instantiate a dangling Lua GCval pointer at any given time (because gc could stomp on it), and they'd have to handle barriers too, as we'd emit those only with explicit stores, but could not in cases like memcpy or a cast pointer. Let me know what you think about this. |
@katlogic The latest implementation we are using is snabbco/snabb#722. This is an FFI hashtable written in Lua. The lookup function being written in Lua seems problematic for production use, prone to side-traces, and so we are gradually adopting a lookup function written in assembler with the Lua-mode DynASM. The assembler version is currently optimized for loading values from outside of cache i.e. it aims to exploit out-of-order execution to put all relevant loads "in flight" at the same time. Could be that a different implementation would make sense for workloads with more temporal locality and expect to be able to hit the cache at some level. See Igalia/snabb#97 for more background. So bringing that on topic here... Could LuaJIT somehow provide us with a table that supports this use case? Are the LuaJIT hashtable routines already as good or better for our workloads than the ones we are writing ourselves? If so that could save us work and also benefit other applications. Having said that: the software we are writing has hard-realtime constraints on the order of 100 microseconds and so any interaction with the garbage collector seems likely to be off-limits. So the implementation you sketch does not sound like it would be suitable for our application. Could also be that we are fundamentally better off with hand-coded hashtables that suit our specific workloads -- I suppose that is the question I am hoping to answer here. |
@katlogic reading your comment again I suppose the feedback is that it probably does make sense to stick with the strategy we have i.e. implement special-case tables in FFI without integrating them with LuaJIT internals. I close this issue for now but we could always reopen it if things look different in the future. |
You don't need to do that. Write the lookup function in C, and call it via FFI. Do this always whenever you encounter trace explosion.
This sounds extremely cumbersome, but same solution as I stated above.
LuaJIT code emits gc steps at predictable times, namely when there is allocation. Simply avoid allocations, and you won't be hit by gc steps. It should be noted that Lua(JIT) internal table implementation invokes allocator as well, whenever it needs to resize the table.
No, Lua tables are extremely generic, as they cater to a lot of uses. They obviously don't fit targeted workloads pretty well, but must stay general as there are simply too many users. What you're asking for boils down to user-definable hashing function for table indexing AND/OR arbitrarily-sized keys - both of which would be value-owned, not separate values on their own. That would break several core assumptions made all over LuaJIT internals. Anyway, what SnabbSwitch would benefit from seems to be be something more akin to compressed trie, where the prefix levels store the keys as they go (so keys themselves are prefix-compressed when shared with a lot of objects). Obviously this is incredibly specific and not particularly useful to other users. |
@katlogic Thank you very much for the detailed feedback! |
Implement emit_jmp() #define.
Feature request: extend Lua tables to provide a native solution for large tables based on FFI data keys.
The solution would render these custom hashtable implementations/discussions obsolete:
The basic requirements would be to allow FFI data for keys, to support millions of elements, and to be fast enough to be a real alternative to the above (including soft-realtime requirements re: GC sweeps).
To solve this once-and-for-all would seem very satisfying since this is a common need and tables are traditionally Lua's big strength.
The text was updated successfully, but these errors were encountered: