Insteon: ALDB_i2 Link Scan, Link Data Received out of Order May Cause Queue to Stall #258

krkeegan · 2013-09-26T16:40:41Z

This issue was discovered by @pmatis.

The sequence of events appears to be as follows. These events occur while scanning an ALDB_i2 link table:

1. MH issues a read command for link address 0FE7
2. ACK Received
3. Link Data for 0FE7 is received
4. MH issues a read command for link address 0FDF
5. ACK Received
6. Link Data for 0FE7 is received <-- duplicate out of order packet
   6a. Because of the long delay, this packet is not caught as a duplicate packet
7. This message is passed to the ALDB_i2 Link parser which perceives this as a corrupt response.
8. The parser then tries to queue a request to read link address 0FDF again
9. MH catches this as an attempt to queue a command already in the queue and ignores it
10. Another ACK is received
    10a. It is unclear what this is in response to
11. The ALDB_i2 Link parser ignores the ACK because it can't correlate it to a sent message
12. Link Data for 0FDF is finally received
13. The ALDB_i2 Link parser ignores the link data claiming that an ACK was not received.
14. At this point, the queue for the device stalls and nothing else happens.
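
The long-delay case in 6a can be illustrated with a small model of time-windowed duplicate detection. This is only a sketch in Python, not MisterHouse's Perl code: the class name `DuplicateFilter`, the one-second window, and the packet strings are all assumptions made for illustration, and the real duplicate check may work differently.

```python
class DuplicateFilter:
    """Toy time-windowed duplicate detector (hypothetical, not MisterHouse code)."""

    def __init__(self, window_seconds=1.0):
        # Assumed window length; the real value is not given in this issue.
        self.window_seconds = window_seconds
        self._last_seen = {}  # packet payload -> time it was last received

    def is_duplicate(self, packet, now):
        """Return True if the same packet arrived within the window."""
        previous = self._last_seen.get(packet)
        self._last_seen[packet] = now
        return previous is not None and (now - previous) <= self.window_seconds


dup_filter = DuplicateFilter(window_seconds=1.0)

# First arrival of the link data for 0FE7.
first = dup_filter.is_duplicate("link-data:0FE7", now=0.0)
# A prompt repeat falls inside the window and is caught.
prompt_repeat = dup_filter.is_duplicate("link-data:0FE7", now=0.4)
# A repeat that arrives after a long delay falls outside the window,
# so it reaches the parser as if it were a fresh response.
late_repeat = dup_filter.is_duplicate("link-data:0FE7", now=5.0)

print(first, prompt_repeat, late_repeat)
```

In this model the late repeat is indistinguishable from new data, which matches the behaviour described above: the stale 0FE7 record is handed to the parser while 0FDF is the pending request.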

Quick Diagnosis:

At step 8, I don't think the parser should try to queue a new message request. Instead, the parser should simply decline to acknowledge the response, which should cause the message handler to retry the message in its normal course of action.
Steps 11 and 13: It is unclear to me why the parser first claims that it cannot correlate the ACK to any sent message, but then claims that an ACK it was expecting was never received.
It is also unclear why the queue timer is being cleared and never reset. This is what causes the entire process to stall.
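
The retry behaviour suggested in the first point can be sketched as follows. This is a simplified Python model under stated assumptions, not the MisterHouse implementation: `PendingMessage`, `MessageHandler`, the retry count of 3, and the method names are all hypothetical. The idea shown is that the parser only reports whether it accepted a response, and the handler's own timer drives any resend.

```python
class PendingMessage:
    """A sent command that is still waiting for a usable response."""

    def __init__(self, command, max_retries=3):  # retry count is an assumption
        self.command = command
        self.retries_left = max_retries


class MessageHandler:
    """Toy model: one pending message, resent when its timer expires."""

    def __init__(self):
        self.pending = None
        self.sent_log = []

    def send(self, command):
        self.pending = PendingMessage(command)
        self.sent_log.append(command)

    def on_response(self, handled):
        # Only a response the parser accepted clears the pending message.
        if handled:
            self.pending = None

    def on_timer_expired(self):
        # Normal retry path: resend the pending message until retries run out.
        if self.pending is None:
            return
        if self.pending.retries_left > 0:
            self.pending.retries_left -= 1
            self.sent_log.append(self.pending.command)
        else:
            self.pending = None  # give up after the last retry


handler = MessageHandler()
handler.send("read 0FDF")
handler.on_response(handled=False)  # stale 0FE7 data: parser declines it
handler.on_timer_expired()          # handler retries on its own
handler.on_response(handled=True)   # correct 0FDF data arrives
print(handler.sent_log, handler.pending)
```

Because the parser never queues anything itself in this model, there is no second request to collide with the one already pending, and the pending message survives until it is either answered or exhausted.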

on_read_write_aldb now returns 1 or 0 to indicate whether the current message should be cleared. When a bad message arrived, on_read_write_aldb attempted to requeue the message that was currently pending. However, _process_message did not clear the pending message until after this routine had run. As a result, the new message was not queued because it duplicated the pending one, and then the pending message was cleared anyway. This stalled the message queue. Fixes bug hollie#258
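
The ordering problem described in this commit message can be reproduced in a small Python model. This is only a sketch: the real `_process_message` and `on_read_write_aldb` are Perl routines in MisterHouse, and the `DeviceQueue` class, the `honor_decision` flag, and the method bodies here are assumptions that mimic the described behaviour, not the actual code.

```python
class DeviceQueue:
    """Toy model of a per-device message queue with one pending message."""

    def __init__(self, honor_decision):
        # honor_decision=False models the old behaviour, True the fixed one.
        self.honor_decision = honor_decision
        self.pending = None
        self.queue = []

    def queue_message(self, command):
        # Duplicate guard: refuse a command that is pending or already queued.
        if command == self.pending or command in self.queue:
            return False
        self.queue.append(command)
        return True

    def send_next(self):
        self.pending = self.queue.pop(0) if self.queue else None

    def on_read_write_aldb(self, response_ok):
        """Stand-in for the parser: returns 1 to clear the pending message, else 0."""
        if response_ok:
            return 1
        if not self.honor_decision:
            # Old behaviour: try to requeue the read. It is rejected as a
            # duplicate because the same command is still pending.
            self.queue_message(self.pending)
        return 0

    def process_message(self, response_ok):
        clear = self.on_read_write_aldb(response_ok)
        if not self.honor_decision:
            clear = 1  # old behaviour: always clear after the parser runs
        if clear:
            self.pending = None


def run(honor_decision):
    dq = DeviceQueue(honor_decision)
    dq.queue_message("read 0FDF")
    dq.send_next()
    dq.process_message(response_ok=False)  # stale 0FE7 data arrives
    return dq


old, new = run(honor_decision=False), run(honor_decision=True)
print(old.pending, old.queue)  # nothing pending, nothing queued: stalled
print(new.pending, new.queue)  # read still pending, so it can be retried
```

In the old ordering the requeue is rejected and the pending message is then discarded, leaving the device with no work in flight. Letting the parser's return value decide keeps the read pending.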

Missed one instance in which the queued message should not be cleared: it should not be cleared on an unhandled mem action either. Further fix to hollie#258

ghost assigned krkeegan Sep 26, 2013

krkeegan mentioned this issue Sep 27, 2013

Insteon: Pass Message Clearing Decision to on_read_write_aldb #260

Merged

krkeegan closed this as completed Oct 21, 2013
