Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add QCA chip reset, Link detection, Version information #599

Merged
merged 19 commits into from
Apr 2, 2024

Conversation

corneliusclaussen
Copy link
Contributor

@corneliusclaussen corneliusclaussen commented Mar 25, 2024

Describe your changes

In some scenarios, QCA7k requires a chip reset after setting up keys. This PR adds an option (not enabled by default) to send a RS_DEV.REQ custom vendor MMU to reset the chip. Works at least on QCA7k.

Requires add-qca-reset branch from libslac as well.

Update: Now also waits for link to actually come up using LINK_STATUS message before switching to MATCHED state. Also this option is disabled by default for now as it may only work on Qualcomm chips.

Issue ticket number and link

Checklist before requesting a review

  • I have performed a self-review of my code
  • I have made corresponding changes to the documentation
  • I read the contribution documentation and made sure that my changes meet its requirements

@corneliusclaussen corneliusclaussen added the include-in-release Tag that this PR should be included in the current merge window for the upcoming release if possible label Mar 25, 2024
@lategoodbye
Copy link

I heard about this problem, but never saw it by myself on a setup. So i'm interested in the details of the cause (QCA7K firmware version, Linux QCA driver, what's the symptom of the QCA7K which requires a reset).

@corneliusclaussen
Copy link
Contributor Author

I heard about this problem, but never saw it by myself on a setup. So i'm interested in the details of the cause (QCA7K firmware version, Linux QCA driver, what's the symptom of the QCA7K which requires a reset).

I tried to understand for quite a while and gave up. It depends on the EV. Some EV never trigger the bug, some trigger it almost always (E.g. Volvo XC40 with a PLC sniffer attached, it does not do this without the sniffer though).
On most cars it happens quite seldomly (maybe once in 100 charging sessions or so).

It happens both with firmware 1.2.5 and 3.0.0 on the EVSE side, linux driver version:

qcaspi spi0.0: ver=0.2.7-i, clkspeed=12000000, burst_len=5000, pluggable=0

The symptoms are as follows:

After unplug, a new key is set using CM_SET_KEY.REQ. A matching CNF is received. I also verified that the correct key is set by reading it back using plctool. If the bug occurs, the chip seems to behave normally from a linux sw point of view, it replies to all requests, no errors are logged in debugfs/dmesg or ethtool.

On the next plug in, SLAC is performed correctly. After the matching cnf, the EV tries to send an association request (CM_ASSOC.REQ or so) to the EVSE PLC and it does not reply to it (we saw this on an external PLC sniffer). Since this is PLC to PLC chip communication we cannot do much about it, it is a firmware bug in the QCA as far as I understand. The result is that the link does not come up and no SDP is possible etc.

The link is never reported as up (using plctool -L). EVerest does not verify this at the moment as it is a non-standard PLC message and hence gets stuck after slac. But even if, the chip never recovers, so even restarting SLAC would not help at all in this case.

I verified in a lot of testing on the Charin events in Valencia and Cleveland (where i had good reproducability) that resetting the chip after setting new keys after unplug works 100% of the time, this is what is implemented here.
This is a cleaned-up version of the hacks we did during the testival.

It is disabled by default though, I have not verified if other PLC chips (lumissil, vertexcom) implement the RS_DEV command (they also don't need it as I never had problems with stability with those).

Next step would be to implement link up detection in a separate PR and restart SLAC in that case as there are probably still cases which are not solved by this PR.

@corneliusclaussen
Copy link
Contributor Author

I also discussed this with several car OEMs and they said that also on the EV side they reset the QCA on every unplug due to some firmware bugs and said it is good practice to do so. I prefer to switch to Lumissil or VertexCom instead ;-)

@lategoodbye
Copy link

Thanks for the detailed explanation

slac::messages::HomeplugMessage hp_message;
hp_message.set_protocol_version(protocol_version);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a new method, defined in libslac? If so, the dependencies.yaml should also be updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it depends on EVerest/libslac#7 which needs to be merged first. Then dependency yaml can be updated

@corneliusclaussen
Copy link
Contributor Author

Added another commit to also wait for link to be established before entering matched state. Successfully tested on real cars that it retries if the link does not come up. Also this option is disabled by default as it may only work on Qualcomm chips.

@corneliusclaussen corneliusclaussen requested a review from a-w50 March 26, 2024 11:19
type: boolean
default: false
chip_reset_delay_ms:
description: Delay between SET_KEY.CNF and RS_DEV.REQ

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: i've noticed that the QCA7K does not always send a SET_KEY.CNF (or the host has a chance to receive it). Unfortunately i never had the time to investigate this further. AFAIR this was the case for firmware before 3.0.0 and non-fast SLAC configuration which does a real reset after CM_SET_KEY, but i'm not totally sure.

This is just a warning, not a request to change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. The non-fast SLAC config, does that mean the QCA resets itself when setting a key? I think we always used a config where the QCA does not automatically reset when the key is changed, that is why we do a reset here manually afterwards. I have not seen the issue of missing CNF in our setup, but it would make sense that e.g. the QCA itself resets before the CNF packet is really transmitted to the host

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So maybe using fast config option is preffered as we have more control over when (or if at all) we need to reset

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. The non-fast SLAC config, does that mean the QCA resets itself when setting a key? I think we always used a config where the QCA does not automatically reset when the key is changed, that is why we do a reset here manually afterwards. I have not seen the issue of missing CNF in our setup, but it would make sense that e.g. the QCA itself resets before the CNF packet is really transmitted to the host

From my understanding the behavior in this case is the following:

  • EVSE host send CM_SET_KEY.REQ
  • QCA7000 stores the the NMK in the internal flash
  • QCA7000 send CM_SET_KEY.CNF
  • QCA7000 triggers a reset, which clears the messages buffers

Yes, i agree this process is not recommend for production because of the flash wearout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fast way also writes it to flash memory, after power cycle the key is still there. I think the chip was designed for home usage where the PLC modems are paired a few times per lifetime. It was not really designed for changing a key very often.

Copy link

@lategoodbye lategoodbye Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, the other advantage of fast way was avoiding the reset which takes ~ 7 second before SLAC is really operational again.

@corneliusclaussen
Copy link
Contributor Author

I checked on VertexCom that it still works when the reset and link detect options are disabled. I'm not aware of VertexCom or Lumissil vendor specific messages that could allow for reset/link detection/version information etc. Anyone knows?

@lategoodbye
Copy link

lategoodbye commented Mar 27, 2024

I checked on VertexCom that it still works when the reset and link detect options are disabled. I'm not aware of VertexCom or Lumissil vendor specific messages that could allow for reset/link detection/version information etc. Anyone knows?

We implemented the link detection in a GreenPHY compatible way (not vendor specific) in our propietary stack, which has been successful tested on QCA7000 and VertexCom. Is the involved MME sufficient?

Version information is always vendor specific AFAIK.

@corneliusclaussen
Copy link
Contributor Author

Is the involved MME sufficient?

Which one did you use?

@corneliusclaussen
Copy link
Contributor Author

corneliusclaussen commented Mar 27, 2024

Is the involved MME sufficient?

Which one did you use?

CM_NW_INFO? NSCM_LINK_STATS?

@barsnick
Copy link
Contributor

CM_NW_INFO? NSCM_LINK_STATS?

CM_NW_STATS.CNF->NumSTAs

@corneliusclaussen
Copy link
Contributor Author

corneliusclaussen commented Mar 27, 2024

CM_NW_INFO? NSCM_LINK_STATS?

CM_NW_STATS.CNF->NumSTAs

ok. Funny to see how many ways there are to find out if there is a link. Lumissil has a bunch of different packages implemented as well.

Edit: Lumissil supports CM_NW_INFO but not CM_NW_STATS

@corneliusclaussen corneliusclaussen changed the title Add QCA chip reset Add QCA chip reset, Link detection, Version information Mar 27, 2024
@corneliusclaussen corneliusclaussen force-pushed the add-qca-reset branch 2 times, most recently from 69bb59a to 5ad58f6 Compare March 28, 2024 15:18
@corneliusclaussen
Copy link
Contributor Author

Now link detection and version information also working for Lumissil. VertexCom is supported, but only without link detection/version information/reset as of now. Will add vendor specific support for VertexCom in a later pull request when we get access to the documentation.

Signed-off-by: Cornelius Claussen <[email protected]>
Signed-off-by: Cornelius Claussen <[email protected]>
a-w50 and others added 13 commits March 28, 2024 19:34
Signed-off-by: aw <[email protected]>
Signed-off-by: aw <[email protected]>
Signed-off-by: Cornelius Claussen <[email protected]>
Signed-off-by: Cornelius Claussen <[email protected]>
Signed-off-by: Cornelius Claussen <[email protected]>
Signed-off-by: Cornelius Claussen <[email protected]>
Signed-off-by: Cornelius Claussen <[email protected]>
Copy link
Contributor

@a-w50 a-w50 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fine so far. Please test with the latest changes. If everything works as expected, feel free to merge. Remember to squash.

@hikinggrass hikinggrass merged commit 3d2b001 into main Apr 2, 2024
4 of 5 checks passed
@hikinggrass hikinggrass deleted the add-qca-reset branch April 2, 2024 12:57
hikinggrass added a commit that referenced this pull request Apr 2, 2024
* Add QCA chip reset
* Option: wait for link status ready before signalling DLINK_READY=connected

Signed-off-by: Cornelius Claussen <[email protected]>

* chore: using type traits
* refactoring
* formatting

Signed-off-by: aw <[email protected]>

* Slac: print PLC device attributes for log analysis
* Update description in manifest
* Move qualcomm specific messages to namespace
* Move version logging into separate function
* set default to false
* Add Lumissil vendor messages
* fix lumissil link detection and version information
* Update description in manifest

Signed-off-by: Cornelius Claussen <[email protected]>

* Bump libslac dependency version to v0.3.0

Signed-off-by: Kai-Uwe Hermann <[email protected]>

* chore(code-quality)
* chore(clang-format)

Signed-off-by: aw <[email protected]>

---------

Signed-off-by: Cornelius Claussen <[email protected]>
Signed-off-by: aw <[email protected]>
Signed-off-by: Kai-Uwe Hermann <[email protected]>
Co-authored-by: aw <[email protected]>
Co-authored-by: Kai-Uwe Hermann <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
include-in-release Tag that this PR should be included in the current merge window for the upcoming release if possible
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants