Standard tables for fast lookups #718

Closed
plajjan opened this issue Jan 19, 2016 · 10 comments

Comments

@plajjan
Contributor

plajjan commented Jan 19, 2016

I think it would be useful for Snabb to have a couple of standard libraries of table implementations, well suited for network forwarding (i.e. very fast and deterministic lookup and update times), that can be included from an app.

We need equal matching (ARP, Ethernet forwarding, potentially firewall stuff) as well as longest prefix matching for routing. While flexible key length would be sweet, I suppose it might be easier to optimise if we keep to a set of standard key lengths: 16 bits (TCP/UDP ports), 32 (IPv4), 48 (MAC) and 128 (IPv6) come to mind.

It would also be nice if all tables behaved in roughly the same way, i.e. in the methods they expose and in what data can be stored: for example, allowing a bit of extra data per key, or a reference to some other data structure where more information can be stored (think next-hop, action or whatever for a key).
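
For illustration, here is a minimal Lua sketch of the kind of shared interface I mean; the method names are made up, and the plain Lua table backing is only a stand-in for a real FFI-based implementation:

local M = {}
M.__index = M

-- key_size and value_size in bytes, fixed per table (e.g. 4-byte keys for IPv4).
function M.new(key_size, value_size)
   return setmetatable({key_size = key_size, value_size = value_size,
                        entries = {}}, M)
end

function M:add(key, value)  self.entries[key] = value end
function M:lookup(key)      return self.entries[key]  end  -- nil on miss
function M:remove(key)      self.entries[key] = nil   end

return M

An app could then do local t = map.new(4, 4); t:add(dst_ip, nexthop_index) without caring which implementation is behind it.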

First candidate for inclusion is @andywingo's PodHashMap in lwAFTR, which implements equal matching for 32-bit keys.

@lukego
Member

lukego commented Jan 19, 2016

YES!

See also LuaJIT/LuaJIT#115 which has a bounty attached.

@wingo
Contributor

wingo commented Jan 19, 2016

Note that the PodHashMap supports keys of any length. It has optimized equality routines for keys that are numbers and keys that are 4 and 8 bytes long. Adding more optimized comparators is easy.

For longest-prefix matching, you could look up several masked copies of the key, one per prefix length, in parallel using a lookup streamer. Or you could use binary search, depending on how many routes there are. Probably parallel lookups over the N prefix lengths are better.
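
A sequential Lua sketch of that masked-lookup idea (the real thing would batch the lookups through the streamer; tables[pl] holding the length-pl routes, and its lookup method, are assumptions):

local bit = require("bit")

-- tables[pl]: an equal-match table with all routes of prefix length pl,
-- keyed by the IPv4 address masked to pl bits (keys must be masked the
-- same way when inserted).
local function lpm_lookup(tables, dst)        -- dst: IPv4 address as uint32
   for pl = 32, 0, -1 do                      -- most specific first
      local mask = pl == 0 and 0 or bit.lshift(0xffffffff, 32 - pl)
      local t = tables[pl]
      if t then
         local nh = t:lookup(bit.band(dst, mask))
         if nh then return nh end
      end
   end
end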

I suspect the PodHashMap fulfills most requirements. It's on my list to submit to Snabb core soonish. Incidentally if you have any opinions about names, they are welcome :) cdatamap, cdatahash, ffimap, ffihash, podmap, etc...

@plajjan
Contributor Author

plajjan commented Jan 19, 2016

And speaking of LPM, I find it very interesting that Juniper seem to have opted for a combination of bloom filters and equal-match tables for implementing LPM. I don't think it's explicitly stated, but it can kind of be derived from the information they present; see http://forums.juniper.net/t5/Data-Center-Technologists/Juniper-QFX10002-Technical-Overview/ba-p/270358

My interpretation is that they have as many tables as there are prefix lengths, so 32 for IPv4, and then use Bloom filters to quickly evaluate whether it is probable that a given table has a match for the destination address. Something like:

for pl in range(32, -1, -1):                 # from /32 down to /0, most specific first
    if bloom_check(dst_address, table[pl]):  # "maybe in this table"
        nh = lookup(dst_address, table[pl])  # real equal-match lookup
        if nh:
            return nh

So, start with a bloom check on the /32 table; if the answer is "maybe", do an actual equal match in that table and return the result. If the bloom check says no, check the next, less specific table, and so forth. I suppose you can do parts of this in parallel with outstanding memory lookups and the like.

It feels awfully wasteful. With different prefix lengths for the different tables, I suppose the hashes used for the bloom check need to be recalculated for every prefix length. I'm not sure about the difference in speed between bloom + lookup versus just the lookup.

They claim 1 search for LPM.

Given these limitations, we still wanted 1 read (search) per packet for LPM. Juniper’s Bloom Filter Technology allows us to get 1 search for LPM in nearly all cases with a high number of entries, which enables the high logical scale.

I don't understand how that would work, unless they have a different definition of "search". It's probably not "memory accesses", which is what I'm thinking of.

Still, it would be interesting to see a test of this. I guess one could glue together the PodHashMap with some bloom-filter code for a proof of concept. Like I said, it feels wasteful (and thus slow), but I'm not very familiar with the low-level performance of "normal" LPM tables. Maybe there are advantages to this approach, or maybe it's only beneficial when you have those HMC chips.
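
For what it's worth, the filter side of such a proof of concept could be as small as the toy below (Lua, with a made-up hash mixer; a real one would use an FFI bit vector and proper hashing):

local bit = require("bit")
local band, bxor, ror = bit.band, bit.bxor, bit.ror

local Bloom = {}
Bloom.__index = Bloom

function Bloom.new(log2size)
   -- bit array stored as a Lua table of booleans, purely for illustration
   return setmetatable({mask = bit.lshift(1, log2size) - 1, bits = {}}, Bloom)
end

local function mix(key, seed)                 -- made-up 32-bit mixer
   local x = bxor(key, seed * 0x9e3779b9)
   x = bxor(x, ror(x, 15))
   return bxor(x, ror(x, 7))
end

function Bloom:add(key)                       -- key: masked address as uint32
   for seed = 1, 3 do self.bits[band(mix(key, seed), self.mask)] = true end
end

function Bloom:maybe_contains(key)            -- false positives are possible
   for seed = 1, 3 do
      if not self.bits[band(mix(key, seed), self.mask)] then return false end
   end
   return true
end

One filter per prefix-length table, consulted before the real lookup in the loop above, would be enough to measure whether the bloom check actually pays for itself.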

@wingo
Contributor

wingo commented Jan 19, 2016

plajjan -- it can make sense, though. If you are working with 32-bit keys, keep one exhaustive table per prefix length, and associate N bytes of data with each key, then the tables sum to roughly 2^33 entries, i.e. (N*8) GB of data. You can fetch all 32 prefix lengths in parallel (well, I think L2 can only have 16 reads outstanding on Haswell, but that will change at some point, and many of the shorter-prefix tables will actually fit in L2). So with 6 bytes of data per IP, mapping an L3 address to an L2 address, you just have 48 GB of data. It's a bit much, but it's doable.
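
To make that arithmetic concrete:

-- total entries across the per-prefix-length tables: sum of 2^pl for pl = 0..32
local entries = 0
for pl = 0, 32 do entries = entries + 2^pl end
print(entries)                --> 8589934591, roughly 2^33 entries
print(entries * 6 / 2^30)     --> ~48, i.e. about 48 GB at 6 bytes per entry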

@wingo
Contributor

wingo commented Jan 19, 2016

Alternatively, you could use a hash table for the longer prefix matches and an exhaustive packed table for the shorter matches. That would probably be the sweet spot if memory is at a premium, or for routing IPv6.
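
One rough shape that split could take (a Lua sketch; the /20 cut-off, the string keys and the table layouts are all arbitrary choices for illustration):

local bit = require("bit")
local band, rshift, lshift = bit.band, bit.rshift, bit.lshift

-- short_table: packed array of 2^20 next-hop ids indexed by the top 20 bits
--              of the address, with every /0../20 route expanded into it.
-- long_table:  hash table keyed by "maskedaddr/prefixlen" strings.
local function hybrid_lookup(short_table, long_table, dst)
   for pl = 32, 21, -1 do                     -- longest prefixes first
      local masked = band(dst, lshift(0xffffffff, 32 - pl))
      local nh = long_table[masked .. "/" .. pl]
      if nh then return nh end
   end
   return short_table[rshift(dst, 12)]        -- fall back to the packed table
end

The usual trade-off applies: expanding the short prefixes into the packed table means an update to a short route touches many slots.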

@wingo
Contributor

wingo commented Jan 19, 2016

Another possibility would be to store a side table mapping IP to prefix length. This table would probably be much smaller than the entire routing table, so you could store it as a sorted vector and use binary search. If there were 256 distinct ranges of IP addresses corresponding to different prefix lengths in the table, then the search is just 8 fetch+compare+cmov steps, with no branches, and the whole thing is likely in L2. Then you use the prefix length to look into the hash table.
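
In Lua that could look roughly like this (the ranges vector and the routes table are assumed to be built ahead of time; a real version would be branch-free over an FFI vector):

local bit = require("bit")

-- ranges: array of {lo, hi, len}, sorted by lo, covering the whole address
-- space and mapping [lo, hi) -> prefix length.
local function find_prefix_len(ranges, dst)
   local lo, hi = 1, #ranges                  -- plain binary search
   while lo < hi do
      local mid = math.floor((lo + hi) / 2)
      if dst >= ranges[mid].hi then lo = mid + 1 else hi = mid end
   end
   return ranges[lo].len
end

-- routes: an equal-match table keyed by (masked address, prefix length).
local function lpm_lookup(ranges, routes, dst)
   local len = find_prefix_len(ranges, dst)
   local mask = len == 0 and 0 or bit.lshift(0xffffffff, 32 - len)
   return routes:lookup(bit.band(dst, mask), len)
end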

@plajjan
Contributor Author

plajjan commented Jan 19, 2016

@andywingo Can you elaborate on the side table to map IP to prefix length? :) I don't understand what you mean. What's the key for that table? Why would there be 256 distinct prefixes in that table if we are dealing with, let's say, a DFZ table (~700k routes)?

It's worth remembering that the DFZ isn't uniform, e.g. there are way more /24 prefixes than /8 prefixes. I know the MSC on the CRS-1 used this to optimise the FIB: they did a 5-stage lookup in a TreeBitmap where each stage could match a different number of bits, tuned to what is popular on the Internet. Paper: http://cseweb.ucsd.edu/~varghese/PAPERS/ccr2004.pdf

@wingo
Contributor

wingo commented Jan 19, 2016

@plajjan so I am very ignorant and I want to make that clear :) Thanks for your patience :)

Let's say you have 700K routes. You put them all in a PodHashMap keyed by lowest IP and prefix length. On the side, you sort them in ascending order of lowest IP address, treating them as a map from [ip_low, ip_high) -> prefix_len. You coalesce contiguous ranges that map to the same prefix_len, and fill in any gaps with the outer prefix_len. How many entries would you end up with? If it's a few hundred or a few thousand, then you can consider using binary search over that side table, as the lwaftr does to look up PSID parameters (we use the rangemap for this: https://github.com/Igalia/snabbswitch/blob/lwaftr_rambutan/src/apps/lwaftr/rangemap.lua). If it approaches the cardinality of the routing table itself, then it's a loss.
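
A rough sketch of just the coalescing step (it leaves out nested prefixes and the gap filling with the outer prefix_len):

-- routes: array of {ip = network address as uint32, len = prefix length},
-- sorted by ip.  Produces [lo, hi) -> len ranges, merging neighbours that
-- share a prefix length.
local function build_side_table(routes)
   local ranges = {}
   for _, r in ipairs(routes) do
      local lo = r.ip
      local hi = lo + 2 ^ (32 - r.len)
      local last = ranges[#ranges]
      if last and last.len == r.len and last.hi == lo then
         last.hi = hi                          -- extend the previous range
      else
         ranges[#ranges + 1] = {lo = lo, hi = hi, len = r.len}
      end
   end
   return ranges
end

Counting #ranges for a real DFZ dump would answer the "few hundred or a few thousand?" question directly.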

@eugeneia
Member

Closing because #722 was merged; please reopen on demand.

@plajjan
Contributor Author

plajjan commented May 13, 2016

I just added a new, more specific issue to cover LPM tables. I suppose ctable is fine for now for equal matching... (it's actually a tad slow).

dpino added a commit to dpino/snabb that referenced this issue Mar 28, 2017