Add lib.ctable module. #722

Merged
merged 8 commits into from
Feb 10, 2016
Changes from 6 commits
12 changes: 6 additions & 6 deletions src/Makefile
@@ -16,7 +16,7 @@ CSRC = $(shell find . -regex '[^\#]*\.c' -not -regex './arch/.*' -printf '%P '
CHDR = $(shell find . -regex '[^\#]*\.h' -printf '%P ')
ASM = $(shell find . -regex '[^\#]*\.dasl' -printf '%P ')
ARCHSRC= $(shell find . -regex '^./arch/[^\#]*\.c' -printf '%P ')
- RMSRC = $(shell find . -name README.md.src -printf '%P ')
+ MDSRC = $(shell find . -regex '[^\#]*\.md.src' -printf '%P ')
# regexp is to include program/foo but not program/foo/bar
PROGRAM = $(shell find program -regex '^[^/]+/[^/]+' -type d -printf '%P ')
# sort to eliminate potential duplicate of programs.inc
@@ -30,7 +30,7 @@ ARCHOBJ:= $(patsubst %.c,obj/%_c.o, $(ARCHSRC))
ASMOBJ := $(patsubst %.dasl,obj/%_dasl.o, $(ASM))
JITOBJS:= $(patsubst %,obj/jit_%.o,$(JITSRC))
EXTRAOBJS := obj/jit_tprof.o obj/jit_vmprof.o obj/strict.o
- RMOBJS := $(patsubst %.src,%,$(RMSRC))
+ MDOBJS := $(patsubst %.src,%,$(MDSRC))
INCOBJ := $(patsubst %.inc,obj/%_inc.o, $(INCSRC))
EXE := bin/snabb $(patsubst %,bin/%,$(PROGRAM))

@@ -69,7 +69,7 @@ $(EXE): snabb bin
@echo -n "BINARY "
@ls -sh $@

- markdown: $(RMOBJS)
+ markdown: $(MDOBJS)

test: $(TESTMODS) $(TESTSCRIPTS)

@@ -152,7 +152,7 @@ $(JITOBJS): obj/jit_%.o: ../lib/luajit/src/jit/%.lua $(OBJDIR)
$(Q) luajit -bg -n $(patsubst obj/jit_%.o, jit.%, $@) $< $@


- $(RMOBJS): %: %.src
+ $(MDOBJS): %: %.src
$(E) "MARKDOWN $@"
$(Q) scripts/process-markdown $< > $@

@@ -206,8 +206,8 @@ clean:
$(Q)-rm -rf $(CLEAN)

mrproper: clean
- $(E) "RM $(RMOBJS)"
- $(Q)-rm -rf $(RMOBJS)
+ $(E) "RM $(MDOBJS)"
+ $(Q)-rm -rf $(MDOBJS)

benchmarks:
$(Q) (scripts/bench.sh)
4 changes: 4 additions & 0 deletions src/doc/genbook.sh
@@ -61,6 +61,10 @@ $(cat ../lib/hardware/README.md)

$(cat ../lib/protocol/README.md)

+ ## Specialized data structures
+
+ $(cat ../lib/README.ctable.md)
+
## Snabb NFV

$(cat ../program/snabbnfv/README.md)
198 changes: 198 additions & 0 deletions src/lib/README.ctable.md
@@ -0,0 +1,198 @@
### `ctable` (lib.ctable)

A ctable is a hash table whose keys and values are instances of FFI
data types. In Lua parlance, an FFI value is a "cdata" value, hence the
name "ctable".

A ctable is parameterized by the specific types of its keys and
values. This allows the table to be stored efficiently.
Adding an entry to a ctable will copy the value into the table.
Logically, the table "owns" the value. Lookup can either return a
pointer to the value in the table, or copy the value into a
user-supplied buffer, depending on what is most convenient for the user.

As an implementation detail, the table is stored as an open-addressed
Contributor Author

Separate implementation details (these should be comments in the source) from user documentation

We can certainly remove implementation details from here, but the performance characteristics of this table are user-visible features and it's hard to explain them without describing how the table works. WDYT?

robin-hood hash table with linear probing. This means that to look up a
key in the table, we take its hash value (using a user-supplied hash
function), map that hash value to an index into the table by scaling the
hash to the table size, and then scan forward in the table until we find
an entry whose hash value is greater than or equal to the hash in
question. Each entry stores its hash value, and empty entries have a
hash of `0xFFFFFFFF`. If the entry's hash matches and the entry's key
is equal to the one we are looking for, then we have our match. If the
entry's hash is greater than our hash, then we have a failure. Hash
collisions are possible as well of course; in that case we continue
scanning forward.
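
The lookup procedure above can be modelled in a few lines of plain Lua.
This is a sketch for illustration only, not the `lib.ctable`
implementation: the names `EMPTY`, `hash_to_index`, `lookup` and the
`slots` layout are assumptions, and a real ctable stores FFI structs
rather than Lua tables.

```lua
-- Illustrative model only; names and layout are assumptions, not lib.ctable's.
local EMPTY = 0xFFFFFFFF      -- hash stored in empty slots
local HASH_SPACE = 2^32       -- number of distinct 32-bit values

-- Scale a hash to its preferred slot in a table of `size` slots.
local function hash_to_index(hash, size)
   return math.floor(hash * size / HASH_SPACE)
end

-- `slots` is 0-based, ordered by hash, and padded with EMPTY slots past
-- `size` so that a scan always terminates.
local function lookup(slots, size, hash, key)
   local i = hash_to_index(hash, size)
   -- Skip entries whose hash sorts before ours.
   while slots[i].hash < hash do i = i + 1 end
   -- Equal hashes may be collisions, so compare keys too.
   while slots[i].hash == hash do
      if slots[i].key == key then return slots[i] end
      i = i + 1
   end
   return nil  -- hit a larger hash or an empty slot: key is absent
end

-- A 4-slot table with 2 padding slots; 'b' is displaced by one slot.
local slots = {
   [0] = { hash = 0x10000000, key = 'a', value = 1 },
   [1] = { hash = 0x20000000, key = 'b', value = 2 },
   [2] = { hash = 0x90000000, key = 'c', value = 3 },
   [3] = { hash = EMPTY }, [4] = { hash = EMPTY }, [5] = { hash = EMPTY },
}
```

With this data, `lookup(slots, 4, 0x20000000, 'b')` starts at slot 0,
skips the smaller hash there, and finds the entry in slot 1, a
displacement of one.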

The distance travelled while scanning for the matching hash is known as
the *displacement*. The table tracks its maximum displacement for a
number of purposes. As a point of reference, the maximum displacement
for a table with 2 million entries and a 40% load factor is around 8 or
9. Smaller tables will have smaller maximum displacements.

The ctable has two lookup interfaces. One will perform the lookup as
described above, scanning through the hash table in place. The other
will fetch all entries within the maximum displacement into a buffer,
then do a branchless binary search over that buffer. This second
streaming lookup can also fetch entries for multiple keys in one go.
This can amortize the cost of a round-trip to RAM, in the case where you
expect to miss cache for every lookup.
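
To make the search step of the second interface more concrete, here is
a sketch over a buffer of hashes that has already been copied out of
the table. It is an ordinary binary search in plain Lua, so it does not
show the branchless property; `first_not_less` and the 0-based buffer
layout are assumptions for illustration.

```lua
-- Illustrative only: find the first position in buf[0..len-1] (sorted by
-- hash) whose hash is not less than `hash`. Returns `len` if there is none.
local function first_not_less(buf, len, hash)
   local lo, n = 0, len
   while n > 0 do
      local half = math.floor(n / 2)
      if buf[lo + half] < hash then
         -- Everything up to and including the probe is too small.
         lo = lo + half + 1
         n = n - half - 1
      else
         -- The answer is at or before the probe.
         n = half
      end
   end
   return lo
end
```

Because the buffer length is bounded by the maximum displacement, the
number of probes is small and fixed for a given table.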

To create a ctable, first create a parameters table specifying the key
and value types, along with any other options. Then call `ctable.new`
on those parameters. For example:

```lua
local ctable = require('lib.ctable')
local ffi = require('ffi')
-- Expected number of entries; an example value, adjust for your use case.
local occupancy = 2e6
local params = {
   key_type = ffi.typeof('uint32_t'),
   value_type = ffi.typeof('int32_t[6]'),
   hash_fn = ctable.hash_32,
   max_occupancy_rate = 0.4,
   initial_size = math.ceil(occupancy / 0.4)
}
local ctab = ctable.new(params)
```

— Function **ctable.new** *params*
Contributor Author

Prepend method definitions with the class name. E.g. ctable:new instead of :new

Why is ctable:new a function? Shouldn't it be a method?

This is a function exported by the module. It is not a method on an object. I believe this documentation is correct; LMK if there are more concerns.


Create a new ctable. *params* is a table of key/value pairs. The
following keys are required:

* `key_type`: An FFI type for keys in this table.
* `value_type`: An FFI type for values in this table. (In the future,
`value_type` will be optional; a nil `value_type` will create a
set).
* `hash_fn`: A function that takes a key and returns a hash value.

Hash values are unsigned 32-bit integers in the range `[0,
0xFFFFFFFF)`. That is to say, `0xFFFFFFFF` is the only unsigned 32-bit
Contributor Author

This isn't a typo: [] indicate inclusive bounds, () indicate exclusive bounds. I think it's explained in the next sentence.

integer that is not a valid hash value. The `hash_fn` must return a
hash value in the correct range.
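
If you write your own hash function, one simple way to stay inside the
valid range is to reduce the result modulo `0xFFFFFFFF`. The helper
below is hypothetical and not part of `lib.ctable`; it assumes the
input is a non-negative integer.

```lua
-- Hypothetical helper, not part of lib.ctable: fold any non-negative
-- integer into the valid hash range [0, 0xFFFFFFFF).
local function to_valid_hash(h)
   return h % 0xFFFFFFFF
end
```

The reduction maps `0xFFFFFFFF` to 0 and leaves every smaller value
unchanged, so it only affects the one reserved value when the input is
already a 32-bit hash.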

Optional entries that may be present in the *params* table include:

* `initial_size`: The initial size of the hash table, including free
space. Defaults to 8 slots.
* `max_occupancy_rate`: The maximum ratio of `occupancy/size`, where
`occupancy` denotes the number of entries in the table, and `size` is
the total table size including free entries. Trying to add an entry
to a "full" table will cause the table to grow in size by a factor of
2. Defaults to 0.9, for a 90% maximum occupancy ratio.
* `min_occupancy_rate`: Minimum ratio of `occupancy/size`. Removing an
entry from an "empty" table will shrink the table.
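
As a rough illustration of how `initial_size` and `max_occupancy_rate`
interact, the following plain-Lua sketch simulates the growth rule
described above. The function name and the exact threshold comparison
are assumptions; shrinking is not modelled.

```lua
-- Illustrative model of growth only. Returns the table size after adding
-- `n` entries to an empty table.
local function size_after_adds(initial_size, max_occupancy_rate, n)
   local size, occupancy = initial_size, 0
   for _ = 1, n do
      -- Assumed check: double while one more entry would exceed the ceiling.
      while occupancy + 1 > size * max_occupancy_rate do
         size = size * 2
      end
      occupancy = occupancy + 1
   end
   return size
end
```

Under these assumptions a default table (8 slots, 0.9 rate) holds 7
entries before its first doubling. Sizing the table up front with
`math.ceil(expected_entries / max_occupancy_rate)` avoids resizes while
the table fills.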

#### Methods

Users interact with a ctable through methods. In these method
descriptions, the object on the left-hand-side of the method invocation
should be a ctable.

— Method **:resize** *size*
Contributor Author

Prepend method definitions with the class name. E.g. ctable:new instead of :new

So, new is a function, but there are many methods like resize. I was a bit worried though that there would be confusion between the module name and the class name, and for that reason left off the LHS on these definitions. In fact partly for that reason we don't mention the name "class" in these docs, as it doesn't really exist. What's the right solution here? Is this acceptable as I've described it above?

Member

Alex has made the precedent for OO style in Snabb and I have accepted/adopted his conventions because I felt they were good. In these terms the module ctable is a class, which would have a class method new to instantiate ctable objects. You can find examples for this in lib.protocol and its documentation. (Generally this is how I do it and what I recommend: look for prior examples in existing code and try to avoid breaking conventions where possible.)

Contributor Author

I do the same, but in this particular case I have been really weirded out by the OO conventions of the protocol libraries; I find them distinctly non-snabby. Compare to core.packet for example. Sometimes I avoid using the protocol libraries just because I find them too weird!

Member

To add to that I just noticed the hash functions of the ctable module, from an OO perspective these would probably be class methods as well? There is a precedent in lib.protocol where header classes have class and instance methods as well but the distinction is not made in the documentation. I guess this is a FIXME, I am somewhat out of answers.

Member

Yes they are definitely distinct from core.* (but I don't think that is necessarily an issue), imho the question is if the style works well for the API. But, we are digressing again. I say do as you see fit for now.

Edit: but please add the prefixes, e.g. “Function ctable.new ... returns a ctable.” and “Method ctable:foo ...”

Contributor

I don't think the hash_32 are class methods as they don't expect to be operating on a class.

At least in a different language (like python), new() would be a class method, returning an instance of the ctable class. But the hash functions don't return an instance of a class, just a value, so they are just functions that happen to be residing in this module.

Contributor Author

Just one more note:

Edit: but please add the prefixes, e.g. “Function ctable.new ... returns a ctable.” and “Method ctable:foo ...”

In Function ctable.new, the left hand side ctable is the module:

local ctable = require('lib.ctable')

But in “Method ctable:foo ...”, the left-hand-side is just a ctable. This method is not available otherwise! For example you can't get to this function except as a property of a ctable. Notably it is not an export of the `ctable' module. You sure you want to document it in these two ways?


Resize the ctable to have *size* total entries, including empty space.

— Method **:insert** *hash* *key* *value* *updates_allowed*

An internal helper method that does the bulk of updates to the hash
table. *hash* is the hash of *key*. This method takes the hash as an
explicit parameter because it is used when resizing the table, and that
way we avoid calling the hash function in that case. *key* and *value*
are FFI values for the key and the value.

*updates_allowed* is an optional parameter. If not present or false,
then the `:insert` method will raise an error if the *key* is already
present in the table. If *updates_allowed* is the string `"required"`,
then an error will be raised if *key* is *not* already in the table.
Any other true value allows updates but does not require them. An
update will replace the existing entry in the table.
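
The three modes of *updates_allowed* can be summarized with a small
decision function. This is a plain-Lua model of the rules stated above,
not code from `lib.ctable`; `insert_action` and its `present` argument
are names invented for the sketch.

```lua
-- Model of the updates_allowed rules. `present` says whether the key is
-- already in the table. Returns the action taken, or raises an error.
local function insert_action(present, updates_allowed)
   if present then
      -- nil or false: updating an existing key is an error.
      if not updates_allowed then error("key is already present") end
      return 'update'
   end
   -- "required": the key must already exist.
   if updates_allowed == 'required' then error("key not found") end
   return 'insert'
end
```

For example, `insert_action(true, true)` returns `'update'`, while
`insert_action(false, 'required')` raises an error.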

Returns the index of the inserted entry.

— Method **:add** *key* *value* *updates_allowed*

Add an entry to the ctable, returning the index of the added entry. See
the documentation for `:insert` for a description of the parameters.

— Method **:update** *key* *value*

Update the entry in a ctable with the key *key* to have the new value
*value*. Throw an error if *key* is not present in the table.

— Method **:lookup_ptr** *key*

Look up *key* in the table, and if found, return a pointer to the entry.
Return nil if the key is not found.

An entry pointer has three fields: the `hash` value, which must not be
modified; the `key` itself; and the `value`. Access them as usual in
Lua:

```lua
local ptr = ctab:lookup_ptr(key)
if ptr then print(ptr.value) end
```

Note that pointers are only valid until the next modification of a
table.

— Method **:lookup_and_copy** *key* *entry*

Look up *key* in the table, and if found, copy that entry into *entry*
and return true. Otherwise return false.

— Method **:remove_ptr** *entry*

Remove an entry from a ctable. *entry* should be a pointer that points
into the table. Note that pointers are only valid until the next
modification of a table.

— Method **:remove** *key* *missing_allowed*

Remove an entry from a ctable, keyed by *key*.

Return true if we actually do find a value and remove it. Otherwise if
no entry is found in the table and *missing_allowed* is true, then
return false. Otherwise raise an error.
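
The return and error behaviour of `:remove` can be modelled with an
ordinary Lua table standing in for the ctable. This is a sketch of the
contract described above, with a hypothetical free function `remove`;
it says nothing about how the real method is implemented.

```lua
-- Model of the :remove contract, using a plain Lua table as the store.
local function remove(tab, key, missing_allowed)
   if tab[key] ~= nil then
      tab[key] = nil
      return true            -- found and removed
   end
   if missing_allowed then
      return false           -- absent, but the caller tolerates that
   end
   error("key not found in table")
end
```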

— Method **:selfcheck**

Run an expensive internal diagnostic to verify that the table's internal
invariants are fulfilled.

— Method **:dump**

Print out the entries in a table. Can be expensive if the table is
large.

— Method **:iterate**

Return an iterator for use by `for in`. For example:

```lua
for entry in ctab:iterate() do
print(entry.key, entry.value)
end
```

#### Hash functions

Any hash function will do, as long as it produces values in the right
range. In practice we include some functions for hashing byte sequences
of some common small lengths.

— Function **ctable.hash_32** *number*
Hash a 32-bit integer. As a `hash_fn` parameter, this will only work if
your key type's Lua representation is a Lua number. For example, use
`hash_32` on `ffi.typeof('uint32_t')`, but use `hashv_32` on
`ffi.typeof('uint8_t[4]')`.

— Function **ctable.hashv_32** *ptr*
Hash the first 32 bits of a byte sequence.

— Function **ctable.hashv_48** *ptr*
Hash the first 48 bits of a byte sequence.

— Function **ctable.hashv_64** *ptr*
Hash the first 64 bits of a byte sequence.