Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove lnd gossip workaround, clean up for modern spec #4184

Conversation

rustyrussell
Copy link
Contributor

We can use a counter not a bitmap since the spec insists replies be in order.

We remove an old LND workaround, and add another.

Then we make our code more efficient for the "query every single channel!" case.

Copy link
Collaborator

@m-schmoock m-schmoock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits and a failing testcase because of the large reply splitting optimization:

tests/test_gossip.py::test_gossip_query_channel_range (line 703)

    # It should definitely have split
    l2.daemon.wait_for_log('queue_channel_ranges full: splitting')

u16 i, old_num, added;
const struct channel_update_timestamps *ts;
/* Zero means "no timestamp" */
const static struct channel_update_timestamps zero_ts;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure if it's really safer on some wired compiler or platform, but a {0} initialization can't hurt:

Suggested change
const static struct channel_update_timestamps zero_ts;
const static struct channel_update_timestamps zero_ts = {0, 0};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure; it's implied but better to be clear!

@@ -414,31 +434,14 @@ static void get_checksum_and_timestamp(struct routing_state *rstate,
}

/* FIXME: This assumes that the tlv type encodes into 1 byte! */
static size_t tlv_len(const tal_t *msg)
static size_t tlv_len(size_t num_entries, size_t size)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this / should this be inlined?

Suggested change
static size_t tlv_len(size_t num_entries, size_t size)
static inline size_t tlv_len(size_t num_entries, size_t size)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we never inline except in headers; the compiler is pretty smart.

I used to say "inline is the register keyword of the 90s" which just shows how long I've been saying it:)

#include <ccan/list/list.h>
#include <ccan/short_types/short_types.h>
#include <ccan/timer/timer.h>
#include <common/bigsize.h>
#include <common/node_id.h>
#include <wire/peer_wire.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This created a duplicate import in gossipd.c line 67 which fails on the code checks.
I recommend removing it from the c file.

if (timestamps_tlv) {
ts = decode_channel_update_timestamps(tmpctx,
timestamps_tlv);
if (!ts || tal_count(ts) != tal_count(scids)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are at it we can extract the pure decoding error message from the unequal count:

Suggested change
if (!ts || tal_count(ts) != tal_count(scids)) {
if (!ts) {
return towire_errorfmt(peer, NULL,
"reply_channel_range can't decode timestamps.");
}
if (tal_count(ts) != tal_count(scids)) {
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, OK.

query_option_flags, &tstamps, &csums);

limit = max_entries(query_option_flags);
off = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throughout the rest of this function the offset variable size_t off is untouched and will always be constant 0. It will only used for adding or subtracting on some integer or indexes. I think this is a leftover of some prior code state of yours ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it's for the splitter logic where a single block can't fit the reply, and therefore the later replies will start from off. But as @m-schmoock points out it isn't set anywhere, so presumably it should be set to n on line 605, before decrementing n-- to make it fit :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed.

@ElementsProject ElementsProject deleted a comment from m-schmoock Nov 6, 2020
@ElementsProject ElementsProject deleted a comment from cdecker Nov 6, 2020
…ge replies.

The spec (since d4bafcb67dcf1e4de4d16224ea4de6b543ae73bf in March
2020) requires that reply_channel_range be in order (and all
implementations did this anyway).

But when I tried this, I found that LND doesn't (always) obey this,
since don't divide on block boundaries.  So we have to loosen the
constraints here a little.

We got rid of the old LND compat handling though, since everyone should
now be upgraded (there are CVEs out for older LNDs).

Signed-off-by: Rusty Russell <[email protected]>
Changelog-Removed: Support for receiving full gossip from ancient LND nodes.
@rustyrussell rustyrussell force-pushed the guilt/remove-lnd-gossip-workaround branch from 287a382 to d1e231f Compare November 9, 2020 06:29
@rustyrussell
Copy link
Contributor Author

Fixed the un-updated off variable, and fixed up the various changes which broke tests and check-source.

It's not (yet?) compulsory to have the timestamps, but handing them around
together makes sense (a missing timestamp has the same effect as a zero
timestamp).

Signed-off-by: Rusty Russell <[email protected]>
We used to create the entire reply, the if it was too big, split in
half and retry.

Now that the main network is larger, this always happens with a full
request, which is inefficient.

Instead, produce a reply assuming no compression, then compress as a
bonus.  This is simpler and more efficient, at cost of sending more
packets.

I also renamed an internal dev var to make it clearer.

Signed-off-by: Rusty Russell <[email protected]>
Thanks to m-schmook's feedback.

Signed-off-by: Rusty Russell <[email protected]>
@rustyrussell rustyrussell force-pushed the guilt/remove-lnd-gossip-workaround branch from d1e231f to cefac90 Compare November 9, 2020 09:34
@m-schmoock
Copy link
Collaborator

For some strange reasons I can't resolve comments and suggestions. Just ignore them, everything I found is addressed

Copy link
Contributor

@niftynei niftynei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK cefac90

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants