Skip to content

Commit

Permalink
Merge PR #722 (Add lib.ctable module) into max-next
Browse files Browse the repository at this point in the history
  • Loading branch information
eugeneia committed Feb 10, 2016
2 parents eae7da6 + 5b9400a commit 323902a
Show file tree
Hide file tree
Showing 3 changed files with 670 additions and 0 deletions.
2 changes: 2 additions & 0 deletions src/doc/genbook.sh
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,8 @@ $(cat ../apps/socket/README.md)
# Libraries
$(cat ../lib/README.md)
## Hardware
$(cat ../lib/hardware/README.md)
Expand Down
202 changes: 202 additions & 0 deletions src/lib/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,202 @@
## Specialized data structures

### Ctable (lib.ctable)

A *ctable* is a hash table whose keys and values are instances of FFI
data types. In Lua parlance, an FFI value is a "cdata" value, hence the
name "ctable".

A ctable is parameterized for the specific types for its keys and
values. This allows for the table to be stored in an efficient manner.
Adding an entry to a ctable will copy the value into the table.
Logically, the table "owns" the value. Lookup can either return a
pointer to the value in the table, or copy the value into a
user-supplied buffer, depending on what is most convenient for the user.

As an implementation detail, the table is stored as an open-addressed
robin-hood hash table with linear probing. This means that to look up a
key in the table, we take its hash value (using a user-supplied hash
function), map that hash value to an index into the table by scaling the
hash to the table size, and then scan forward in the table until we find
an entry whose hash value is greater than or equal to the hash in
question. Each entry stores its hash value, and empty entries have a
hash of `0xFFFFFFFF`. If the entry's hash matches and the entry's key
is equal to the one we are looking for, then we have our match. If the
entry's hash is greater than our hash, then we have a failure. Hash
collisions are possible as well of course; in that case we continue
scanning forward.

The distance travelled while scanning for the matching hash is known as
the *displacement*. The table measures its maximum displacement, for a
number of purposes, but you might be interested to know that a maximum
displacement for a table with 2 million entries and a 40% load factor is
around 8 or 9. Smaller tables will have smaller maximum displacements.

The ctable has two lookup interfaces. One will perform the lookup as
described above, scanning through the hash table in place. The other
will fetch all entries within the maximum displacement into a buffer,
then do a branchless binary search over that buffer. This second
streaming lookup can also fetch entries for multiple keys in one go.
This can amortize the cost of a round-trip to RAM, in the case where you
expect to miss cache for every lookup.

To create a ctable, first create a parameters table specifying the key
and value types, along with any other options. Then call `ctable.new`
on those parameters. For example:

```lua
local ctable = require('lib.ctable')
local ffi = require('ffi')
local params = {
key_type = ffi.typeof('uint32_t'),
value_type = ffi.typeof('int32_t[6]'),
hash_fn = ctable.hash_i32,
max_occupancy_rate = 0.4,
initial_size = math.ceil(occupancy / 0.4)
}
local ctab = ctable.new(params)
```

— Function **ctable.new** *parameters*

Create a new ctable. *parameters* is a table of key/value pairs. The
following keys are required:

* `key_type`: An FFI type (LuaJIT "ctype") for keys in this table.
* `value_type`: An FFI type (LuaJT "ctype") for values in this table.
* `hash_fn`: A function that takes a key and returns a hash value.

Hash values are unsigned 32-bit integers in the range `[0,
0xFFFFFFFF)`. That is to say, `0xFFFFFFFF` is the only unsigned 32-bit
integer that is not a valid hash value. The `hash_fn` must return a
hash value in the correct range.

Optional entries that may be present in the *parameters* table include:

* `initial_size`: The initial size of the hash table, including free
space. Defaults to 8 slots.
* `max_occupancy_rate`: The maximum ratio of `occupancy/size`, where
`occupancy` denotes the number of entries in the table, and `size` is
the total table size including free entries. Trying to add an entry
to a "full" table will cause the table to grow in size by a factor of
2. Defaults to 0.9, for a 90% maximum occupancy ratio.
* `min_occupancy_rate`: Minimum ratio of `occupancy/size`. Removing an
entry from an "empty" table will shrink the table.

#### Methods

Users interact with a ctable through methods. In these method
descriptions, the object on the left-hand-side of the method invocation
should be a ctable.

— Method **:resize** *size*

Resize the ctable to have *size* total entries, including empty space.

— Method **:insert** *hash*, *key*, *value*, *updates_allowed*

An internal helper method that does the bulk of updates to hash table.
*hash* is the hash of *key*. This method takes the hash as an explicit
parameter because it is used when resizing the table, and that way we
avoid calling the hash function in that case. *key* and *value* are FFI
values for the key and the value, of course.

*updates_allowed* is an optional parameter. If not present or false,
then the `:insert` method will raise an error if the *key* is already
present in the table. If *updates_allowed* is the string `"required"`,
then an error will be raised if *key* is *not* already in the table.
Any other true value allows updates but does not require them. An
update will replace the existing entry in the table.

Returns the index of the inserted entry.

— Method **:add** *key*, *value*, *updates_allowed*

Add an entry to the ctable, returning the index of the added entry. See
the documentation for `:insert` for a description of the parameters.

— Method **:update** *key*, *value*

Update the entry in a ctable with the key *key* to have the new value
*value*. Throw an error if *key* is not present in the table.

— Method **:lookup_ptr** *key*

Look up *key* in the table, and if found return a pointer to the entry.
Return nil if the value is not found.

An entry pointer has three fields: the `hash` value, which must not be
modified; the `key` itself; and the `value`. Access them as usual in
Lua:

```lua
local ptr = ctab:lookup(key)
if ptr then print(ptr.value) end
```

Note that pointers are only valid until the next modification of a
table.

— Method **:lookup_and_copy** *key*, *entry*

Look up *key* in the table, and if found, copy that entry into *entry*
and return true. Otherwise return false.

— Method **:remove_ptr** *entry*

Remove an entry from a ctable. *entry* should be a pointer that points
into the table. Note that pointers are only valid until the next
modification of a table.

— Method **:remove** *key*, *missing_allowed*

Remove an entry from a ctable, keyed by *key*.

Return true if we actually do find a value and remove it. Otherwise if
no entry is found in the table and *missing_allowed* is true, then
return false. Otherwise raise an error.

— Method **:selfcheck**

Run an expensive internal diagnostic to verify that the table's internal
invariants are fulfilled.

— Method **:dump**

Print out the entries in a table. Can be expensive if the table is
large.

— Method **:iterate**

Return an iterator for use by `for in`. For example:

```lua
for entry in ctab:iterate() do
print(entry.key, entry.value)
end
```

#### Hash functions

Any hash function will do, as long as it produces values in the `[0,
0xFFFFFFFF)` range. In practice we include some functions for hashing
byte sequences of some common small lengths.

— Function **ctable.hash_32** *number*

Hash a 32-bit integer. As a `hash_fn` parameter, this will only work if
your key type's Lua representation is a Lua number. For example, use
`hash_32` on `ffi.typeof('uint32_t')`, but use `hashv_32` on
`ffi.typeof('uint8_t[4]')`.

— Function **ctable.hashv_32** *ptr*

Hash the first 32 bits of a byte sequence.

— Function **ctable.hashv_48** *ptr*

Hash the first 48 bits of a byte sequence.

— Function **ctable.hashv_64** *ptr*

Hash the first 64 bits of a byte sequence.
Loading

0 comments on commit 323902a

Please sign in to comment.