Sequential PLM Scenes May be Ignored #26

krkeegan · 2012-12-17T17:39:14Z

This is a weird one, and I am still diagnosing it, but this is what I believe is happening so far.

Issue:

If two PLM scene commands are issued sequentially, the second command may never be sent. In this scenario, each Scene controls only one device. Specifically, they are surrogate scenes for KPL buttons. The basic print log shows something as follows:
PLM_Scene_1->set('on')
Received acknowledgement from PLM_SCENE_1
PLM_Scene_2->set('on')
Received acknowledgement from PLM_SCENE_1 (This is not a typo, the print log shows the first scene as acknowledged twice.)

Current Theory:

Setting the insteon debug level to 3 provides some additional detail. If left at the default settings, it looks like the PLM_Scene_1 command is issued and an acknowledgement is received. MH then tries to send the PLM_Scene_2 command but receives a "PLM is extremely busy" error 15 and claims that the command will be retried in 1 second.

While waiting to retry the command, MH receives a command from the PLM. The PLM reports this as being a second acknowledgement from PLM_SCENE_1. (I suspect at this point that MH erroneously deletes the pending command for PLM_Scene_2 that is waiting, believing that this command has been acknowledged).

If I up the Insteon_xmit setting to .7 seconds. Everything seems to work fine. The Insteon:3 debug log shows the following general sequence:
PLM_Scene_1->set('on')
Received acknowledgement from PLM_SCENE_1
Received All_Link Cleanup success from PLM_SCENE_1

PLM_Scene_2->set('on')
Received acknowledgement from PLM_SCENE_2
Received All_Link Cleanup success from PLM_SCENE_2

So I am not an Insteon guru, but it looks like we should be expecting two acknowledgments to PLM Scenes (at least this specific type of scene). While the first acknowledgment is specific to the one linked device. I believe the second acknowledgement is basically saying "All of the linked devices in this scene have reported in successfully."

I suspect that with the Insteon_xmit setting at the default level, that the All-Link Cleanup success message is why the PLM is busy when we attempt to send the PLM_Scene_2 command.

As noted above, I further suspect that at the default setting, the All-Link Cleanup success message causes MH to delete the pending PLM_SCENE_2 command.

Solutions:

Fix the code so that MH only removes a pending message if it receives a correct acknowledgement. Right now it looks like the comparison is not working perfectly.
If PLM Scenes will always receive an ALL-Link Cleanup Success message, maybe we should wait for it to come in.
Some combination of 1 and 2.

Status:

I am still working on diagnosing where in the code this all happens, if anyone smarter than me knows where to look or has some suggestion, please let me know.

Previously, insteon cleanup messages received by MH, would cause MH to clear the active message withoug confirming if the cleanup message corresponded to the active message. If multiple PLM scenes were sent sequentially, a cleanup message from one scene may have caused MH to clear out the following scenes command. This patch only clears the active message if the cleanup message matches the active message.

mstovenour · 2012-12-22T14:33:00Z

Great catch, I think I've seen this too when sending two commands in a row to the same device. Both the existing code and the proposed fix concern me. I'm concerned that the existing behavior tried to remove a message that hasn't been sent. IMO, this should never happen and should be part of the check. At the same time I agree that the response matching logic needs to be stronger (to cover my case). I think this should be fixed in two ways. We could add (if it doesn't already exist) a "sent" flag in the insteon message object that is false initially and set when the message is sent. Then the insteon message object should have a match_ack() function. This function returns true if the ack matches the message and "always" returns false if the "sent" flag is false (to cover your case). The matching logic in the match_ack() function is otherwise very similar to what you wrote. We could then call the match_ack() function with the ack message while scanning the message stack. Would you be willing to try to fix it this way? I don't mind giving it a go otherwise.

krkeegan · 2012-12-22T17:15:32Z

I agree that checking to see if a message has even been sent would be a nice addition. However, at the moment, it is not on the top of my list to add to the code. I am not the world's greatest perl programmer and I am struggling as it is to figure out the structure of greg's design.

Even without checking if a message has been sent, I think my proposed code still works. In order for a message to be cleared, it has to be from the same device, for the same group, with the same command. A message which has not been sent which matches these criteria would be a duplicate and unnecessary anyways.

A quick solution I can think of, is to check the send_attempts. However, send_attempts iterates when the PLM reports a busy code "15". So a number greater than 0 is not necessarily a sign that a message was sent. We could alter this behavior, but it could result in an infinite loop on the off chance that the PLM continually fails and responds only with busy codes.

mstovenour, if you have the ability to add your suggested code, please by all means do so. Otherwise, I will mark this as an issue for me to come back to in the future.

mstovenour · 2013-01-03T12:59:12Z

I'm still wading through the code to get a better feel for how it works and haven not come to a well formed conclusion for this issue. I will admit that the code you introduced is worse case benign; I don't see any harm in the changes. However I still think maybe something more interesting is going on. The 2412U PLM dev guide from 2007 (i.e. covers i1 and i2 but not i2CS) uses the term "all-link group" for a mh "scene". Apparently there are a number of things that happen when we ask the PLM to send an all-link group command. 1) PLM sends an all-link group broadcast towards the responders. 2) the PLM then sends individual clean up messages to each linked responder. 3a) each linked device that gets the all-link clean up will send an all-link clean up ack (this arrives at MH as an insteon message plm command 0x50). 3b) for each responder that doesn't respond to the PLM's all link clean up, the PLM will send a All-link clean up failure report (0x56) to MH. 4) Finally the PLM will send an all-link cleanup status report (0x58) to MH with either ACK(06) or NACK(15). ACK indicates that the clean up completed. NACK indicates that the clean up was aborted due to other insteon traffic. The interesting part is that the code to process this last PLM command has a semantic error. The author tried to use == to compare a string. I'm not sure if this is part of your issue or not. I'll come back to this once I get scenes working with my I2CS devices. Right now I can't test any of the PLM all-link logic.
lib/Insteon_PLM.pm line 648:
if ($self->active_message && $self->active_message->isa('Insteon::InsteonMessage')
&&($self->active_message->command_type == 'all_link_send'))

krkeegan · 2013-01-03T18:15:32Z

Good catch Michael, I hadn't noticed the attempt to compare a string with ==.

I tested it out and this by itself doesn't solve the above issue. This is because, best I can tell the message that I found to be the cause of the problem is interpreted as a "standard insteon message" and is handled prior to reaching this logic.

Best I can tell the short all_link_clean_status message is simply dumped in the current code. Correcting the logic to use "eq" doesn't seem to have any effect because the current code doesn't really do much with the cleanup messages. I think Greg intended on coming back to this issue and never did.

…df74f1fb Fix for Issue #26. Mered after krkeegan has been running this fix for 3 weeks without issues.

Harmon

This reverts commit ca118e5, reversing changes made to d096140.

This was referenced Dec 22, 2012

Fix for Issue #27 #30

Closed

Update debug to note I2CS devices as potential sources of error #31

Closed

hollie added a commit that referenced this issue Jan 13, 2013

Merge pull request #29 from krkeegan/253afad710f4c8bac5f08a842e14dc4a…

88c7b8f

…df74f1fb Fix for Issue #26. Mered after krkeegan has been running this fix for 3 weeks without issues.

krkeegan closed this as completed Jan 18, 2013

hplato pushed a commit that referenced this issue Feb 2, 2022

Merge pull request #26 from waynieack/Harmon

ca118e5

Harmon

hplato pushed a commit that referenced this issue Feb 2, 2022

Revert "Merge pull request #26 from waynieack/Harmon"

d04d123

This reverts commit ca118e5, reversing changes made to d096140.

hplato mentioned this issue Aug 22, 2023

Mqtt fixes #861

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sequential PLM Scenes May be Ignored #26

Sequential PLM Scenes May be Ignored #26

krkeegan commented Dec 17, 2012

mstovenour commented Dec 22, 2012

krkeegan commented Dec 22, 2012

mstovenour commented Jan 3, 2013

krkeegan commented Jan 3, 2013

Sequential PLM Scenes May be Ignored #26

Sequential PLM Scenes May be Ignored #26

Comments

krkeegan commented Dec 17, 2012

Issue:

Current Theory:

Solutions:

Status:

mstovenour commented Dec 22, 2012

krkeegan commented Dec 22, 2012

mstovenour commented Jan 3, 2013

krkeegan commented Jan 3, 2013