[X10Express] Serial connection to the MPM is sporadically lost #4411

Open
1 task done
ParkerEde opened this issue Dec 12, 2023 · 42 comments
Labels
bug 🪲 Something isn't working


@ParkerEde
Contributor

Is there an existing issue for this problem?

  • I have searched the existing issues

What part of EdgeTX is the focus of this bug?

Transmitter firmware

Current Behavior

Two weeks ago, with ETX 2.9.2 on my Horus X10S Express and an IRange IRX+ (MPM) connected to a Blade130S, I suddenly lost all control. This has happened twice in a short space of time. Luckily nothing happened to the Blade 130S (DSMX); it simply fell out of the air. The red control LED on the IRange IRX+ module then flashed; normally it is lit permanently. What is going on that the serial connection to the MPM can no longer be kept stable?
Last weekend I had the same problem again. There is definitely still a problem in the serial control of the MPM.
Unfortunately, the problem occurs very sporadically.

I have since carried out further tests on the workbench. I connected the MPM to the Horus X10S Express with a Y-cable and attached a logic analyzer to it. Initially, the CPPM signal looks normal. When the error occurred almost two days later, I looked at the CPPM signal and it was constantly LOW.

Perhaps this problem only occurs on transmitters that have the fast serial connection for external ACCESS in the external module bay?

Expected Behavior

The CPPM signal to the MPM should remain stable.

Steps To Reproduce

  1. Power on the X10 Express and select a model using the external MPM with the DSMX protocol
  2. Wait for minutes, hours, or days
  3. Suddenly the CPPM signal is lost (held LOW) and the red LED on the IRangeX+ module starts flashing

Version

2.9.2

Transmitter

FrSky X10 Express / X10S Express (ACCESS)

Operating System (OS)

No response

OS Version

No response

Anything else?

There may be a connection to issue #4357.

@ParkerEde ParkerEde added bug 🪲 Something isn't working triage Bug report awaiting review / sorting labels Dec 12, 2023
@ParkerEde
Contributor Author

ParkerEde commented Dec 12, 2023

I have just reproduced the problem with a Radiomaster TX16S MKII and the ext. IRangeX+. This occurs in exactly the same way as with the Horus X10 Express and has nothing to do with the serial highspeed for ACCESS.

The only difference is that after a few seconds the CPPM connection was automatically re-established, and at that moment the Blade130S (DSMX) spun up its rotors. So the whole thing is very dangerous.
Addendum: This happened because I had not checked "deactivate Ch. mapping" in the settings.

With the Horus X10 Express, the connection was never re-established automatically.

@ParkerEde
Contributor Author

@raphaelcoeffic what do you think about this?

@3djc
Collaborator

3djc commented Dec 12, 2023

Any chance the issue lies with your MPM module? Since 2.9.2 has been live for some time already and you seem to be the only one with this issue, I wonder.

@ParkerEde
Contributor Author

I had also considered this at first, hence the check with the logic analyzer. I think that if the MPM were at fault, the correct signal would still be present on the CPPM line and only the MPM would misbehave. But the fact is that CPPM is completely LOW.
In addition, I never had any problems with the module on ETX 2.9.1 and older ETX versions.

@3djc
Collaborator

3djc commented Dec 12, 2023

There are MPM hardware failures that can short the signal pin, and that could look like this. Have you tried going back to 2.9.1 and leaving it running long enough?

@mha1
Contributor

mha1 commented Dec 12, 2023

@ParkerEde shot a video of his IRange IRX+ MPM showing the red LED flashing at a 0.5 s on/off interval. https://www.multi-module.org/using-the-module/troubleshooting lists this as "slow blink":


Of course it might still be a faulty MPM, but I wouldn't bet on it. @ParkerEde: do you have access to another external MPM?

@ParkerEde
Contributor Author

I will continue to observe and report back. At the moment, however, I would say that the problem should not be coming from the MPM. Also, if the error occurs again, I can pull the CPPM signal line out of the MPM and then measure it on the module bay side. If the line is still LOW, it should be clear that the fault is not in the MPM. Do you see it the same way?

No, this is the only one.

@ParkerEde
Contributor Author

ParkerEde commented Dec 13, 2023

Now I have had the problem again. I disconnected the CPPM line from the MPM and evaluated it with the logic analyzer. It is completely LOW, so the MPM can be ruled out as the cause.
After measuring the CPPM line, I plugged it back into the MPM and the LED continued to flash. I left the system in this state. After about 15-20 minutes, the LED was permanently on again and the module worked again. So the CPPM line not only drops to LOW sporadically, but also recovers by itself at some point.

@ParkerEde
Contributor Author

I have now been running the Horus X10S Express with the MPM for hours without powering on the Blade130S (DSMX), and the error does not occur. I am now running the system with the Blade130S powered on but without the S.PORT line connected to the MPM. I assume that the error will then not occur.

@ParkerEde
Contributor Author

When the S.PORT line is not connected to the MPM, the connection is fully maintained and everything is OK. I can now say with certainty that the telemetry transmitted to ETX via S.PORT causes the CPPM line to drop to LOW at some point (in my case always between 1 minute and 2 hours).

Something seems to be going wrong with the telemetry in ETX so that the entire CPPM connection subsequently breaks down.

These tests are all based on a Spektrum DSMX connection to a Blade130S.

@ParkerEde
Contributor Author

And finally, here is some additional information. If the S.Port line is connected, but I deactivate telemetry in the MPM settings, the error does not occur. This means that the activated telemetry must be causing something in ETX to go wrong, which results in CPPM changing to LOW level.

@ParkerEde
Contributor Author

I have now tested an MPM from a friend. It behaves exactly like my own.

@richardclli
Collaborator

Could you please check what is the latest version of EdgeTX that is good, and which version is bad. This can help to identify the problem.

@ParkerEde
Contributor Author

Could you please check what is the latest version of EdgeTX that is good, and which version is bad. This can help to identify the problem.

Yes, I will try. But this will be a hard job.

Here you will find a recording of CPPM (D0) and S.PORT (D1). At the end you can see where CPPM goes LOW. Since the error definitely depends on telemetry being active, I think it can help to see what data was sent on the S.PORT beforehand.
I used sigrok PulseView for the recording.
Spektrum-CPPM-SPORT-CPPMlow.zip

@ParkerEde
Contributor Author

Could you please check what is the latest version of EdgeTX that is good, and which version is bad. This can help to identify the problem.

With v2.8.5 the error has not occurred for me yet. I have rebuilt v2.9.0 and v2.9.1 via Gitpod for the X10 Express, and with those the error occurs within a few seconds.

@richardclli
Collaborator

richardclli commented Dec 17, 2023

Could you please check what is the latest version of EdgeTX that is good, and which version is bad. This can help to identify the problem.

With v2.8.5 the error has not yet occurred for me. I have rebuilt v2.9.0 and v2.9.1 via GITpod for X10Express. The error occurs within a few seconds.

So it happens in 2.9.0 as well, right?

Interesting. I flew my TX16S + MPM + SPM4649T with battery voltage telemetry on 2.9.0 firmware for some time and did not observe any problems. Which receiver model are you using?

@pfeerick
Member

pfeerick commented Dec 17, 2023 via email

@richardclli
Collaborator

Another question: did the problem only affect external modules? Has any internal module been confirmed to have this problem?

@ParkerEde
Contributor Author

ParkerEde commented Dec 18, 2023

Only the external MPM, and only DSMX with telemetry. In my case, a Blade130S and a Blade 230 Smart.
The external MPM with FrSky X2 LBT and telemetry, for example, works well.

@raphaelcoeffic
Member

raphaelcoeffic commented Dec 18, 2023

only external MPM and only DSMX with telemetry. In my case Blade130S and Blade 230 smart. External MPM and Frsky X2 LBT with telemetry for example works well.

Do you think you could catch some sample of the telemetry packets? We have an issue in the parsers that is poorly handled.

@ParkerEde
Contributor Author

Could you please check what is the latest version of EdgeTX that is good, and which version is bad. This can help to identify the problem.

yes I will try. But this will be a hard job.

Here you will find a recording of CPPM (D0) and S.PORT (D1). At the end you can see where CPPM goes low. Since there is definitely a dependency of the error on the activated telemetry, I think it can help to see what data was sent on the S.PORT before. I used Sigrok pulseview for the recording. Spektrum-CPPM-SPORT-CPPMlow.zip

Here you'll find a trace

@mha1
Contributor

mha1 commented Dec 18, 2023

@raphaelcoeffic have a close look at the LA trace. Decode it with the MPM S.Port UART settings 100000,8,e,1 and you'll find bit errors, indicated by parity and framing errors. The reason is a HIGH level of only 1.8 V. Disconnecting S.Port and measuring the MPM output directly shows the expected 3.3 V. It looks like the MPM can't drive a proper HIGH on the S.Port pin. Can you think of a misconfiguration of the radio's UART (pull-up/pull-down?)?

Cross-checking this with my TX16S on 2.9.2 and a RadioMaster external MPM shows proper MPM S.Port levels, with or without a connection to the radio's S.Port pin.
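For context on the 100000,8,e,1 decoder setting: it means 100000 baud, 8 data bits, even parity and 1 stop bit, so every byte occupies 11 bit times (1 start + 8 data + 1 parity + 1 stop), i.e. 110 µs on the wire. If the HIGH level is marginal, individual bits can be sampled wrongly: a single flipped data or parity bit shows up as a parity error, and a stop bit sampled as LOW shows up as a framing error. Below is a minimal C sketch of the even-parity check a decoder applies to each byte; the function names are hypothetical and this is not EdgeTX or MPM code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits per UART character for 8E1: 1 start + 8 data + 1 parity + 1 stop. */
#define UART_8E1_BITS 11u

/* Hypothetical helper: time on the wire for one 8E1 character, in microseconds. */
static unsigned uart_char_time_us(unsigned baud)
{
    return (UART_8E1_BITS * 1000000u) / baud;   /* 100000 baud -> 110 us */
}

/* Hypothetical helper: even parity means the parity bit is chosen so that
   the total number of 1 bits (8 data bits + parity bit) is even.
   A single wrongly sampled bit makes the count odd and fails this check. */
static bool even_parity_ok(uint8_t data, unsigned parity_bit)
{
    unsigned ones = parity_bit & 1u;
    for (unsigned i = 0; i < 8; i++)
        ones += (data >> i) & 1u;
    return (ones & 1u) == 0u;
}
```

Note that parity only detects an odd number of flipped bits; two flipped bits in the same byte pass the check, which is why corrupt bytes can still reach a decoder even when many errors are flagged.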

@ParkerEde
Contributor Author

I now have an X10S Express from my friend here, and the S.PORT level is exactly the same as mine, only 1.8V.

@mha1
Contributor

mha1 commented Dec 18, 2023

@raphaelcoeffic @3djc a hypothesis about the error this issue describes, the loss of serial communication to the MPM: as the trace shows, there are a lot of UART parity and framing errors, i.e. corrupt bytes. Assuming those corrupt bytes make it into an MPM telemetry frame which is then accepted as a good frame, a protocol decoder with no or insufficient means to filter out the corrupt frame might cause follow-up issues, e.g. buffer overruns, which in turn might cause other unpredictable issues.

I have no experience with Spektrum telemetry, so I can't tell whether this is a likely scenario. Can you?
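To make the hypothesis concrete, here is one defensive receive pattern, sketched in C. It assumes a hypothetical frame layout (one length byte followed by that many payload bytes) and a UART driver that reports a per-byte error flag; it is not EdgeTX's actual MPM or Spektrum telemetry parser, and `frame_rx_t`, `frame_rx_push` and `FRAME_MAX` are illustrative names. The two properties it shows are that every buffer write is bounded regardless of what a possibly corrupted length byte says, and that a frame containing any byte flagged with a parity or framing error is dropped as a whole.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_MAX 32  /* hypothetical maximum payload size */

typedef struct {
    uint8_t buf[FRAME_MAX];
    size_t len;       /* payload bytes collected so far */
    size_t expected;  /* length announced by the header; 0 = waiting for header */
    bool corrupt;     /* set if any byte of the current frame had a UART error */
} frame_rx_t;

void frame_rx_reset(frame_rx_t *rx)
{
    memset(rx, 0, sizeof *rx);
}

/* Feed one received byte plus the UART error flag (parity/framing).
   Returns true only when a complete frame without UART errors has been
   received; rx->buf[0..rx->len) then holds its payload. */
bool frame_rx_push(frame_rx_t *rx, uint8_t byte, bool uart_error)
{
    if (rx->expected == 0) {
        /* Expecting a length byte: reject implausible or errored headers. */
        if (uart_error || byte == 0 || byte > FRAME_MAX)
            return false;
        rx->expected = byte;
        rx->len = 0;
        rx->corrupt = false;
        return false;
    }

    rx->corrupt = rx->corrupt || uart_error;
    rx->buf[rx->len++] = byte;  /* safe: len < expected <= FRAME_MAX here */

    if (rx->len < rx->expected)
        return false;

    rx->expected = 0;           /* next byte is a new header */
    return !rx->corrupt;        /* drop the whole frame if any byte was bad */
}
```

A real parser would additionally verify a checksum or CRC if the protocol provides one; this sketch only shows the bounds and error-flag handling.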

@raphaelcoeffic
Member

crosschecking this with my tx16s on 2.9.2 and a rm external mpm shows proper mpm sport levels with or without connection to the radio's sport pin.

Yeah, maybe some pull-up / pull-down settings, that could be. But not sure why the thing would then fully break after a while...

@gagarinlg
Member

So, two issues:
  • Check the pull-up/pull-down settings
  • Fix the parser so it does not crash when receiving bad packets

@mha1
Contributor

mha1 commented Dec 18, 2023

@raphaelcoeffic

Yeah, maybe some pull-up / pull-down settings, that could be. But not sure why the thing would then fully break after a while...

Did you read my hypothesis? If a protocol decoder doesn't throw out corrupt frames, it's just a matter of probability if and when some data combination is hit that causes unpredictable errors, like writing to memory it shouldn't. Again, just a hypothesis, but fueled by the fact that there is no loss of serial if there is no connection to the S.Port pin (no telemetry processing) or if telemetry is disabled in the settings. It really looks like something wrong fed to the protocol decoder causes this issue.

I believe we are looking at two problem areas: one is the electrical issue, the other is the question of whether the Spektrum protocol handling is resilient against corrupt data.

@mha1
Contributor

mha1 commented Dec 18, 2023

Talking about probability, check out the vast number of UART warnings. They are all parity and framing errors, most likely due to the insufficient HIGH level.


@ParkerEde
Contributor Author

Just to make sure we're all on the same page. My friend gave me an X10S Express with his IRangeX+ MPM and his Blade130S to test. So the complete test setup. I have now tested with his components and the result is exactly the same as with my own components. I am very happy that we have all parts in duplicate and can therefore completely rule out a hardware error.

@raphaelcoeffic
Member

@ParkerEde Did you try with all sensors deleted, meaning, you connect everything as usual, telemetry is ON, but no sensors are listed?

@raphaelcoeffic
Member

Also, did you happen to test with the latest nightly build? (Please back up everything first; you will need the backup when going back to 2.9.2.)

@ParkerEde
Contributor Author

@ParkerEde Did you try with all sensors deleted, meaning, you connect everything as usual, telemetry is ON, but no sensors are listed?

yes, it's the same

Also, did you happen to test with the latest nightly build? (please backup everything before.... you will need the backup when going back to 2.9.2)

Yes, it's the same with main

@raphaelcoeffic
Member

@ParkerEde Did you try with all sensors deleted, meaning, you connect everything as usual, telemetry is ON, but no sensors are listed?

yes, it's the same

Also, did you happen to test with the latest nightly build? (please backup everything before.... you will need the backup when going back to 2.9.2)

Yes, it's the same with main

OK, so it might not be related to the Spektrum telemetry parsing, as I thought it could be...

@gagarinlg
Member

Are you sure? The packets are coming in anyway, so some parsing should still be done, right?

@raphaelcoeffic
Member

Are you sure? They packets are coming anyways, so some parsing should be done, right?

Indeed, I just realised that in the case where the sensor is not present, we just won't store the value, but still do everything else.

@raphaelcoeffic
Member

@ParkerEde I made you two firmware builds, one with the Spektrum telemetry packet processing disabled and one where it is enabled. Both are based on the latest 2.9 branch. I added the regular (enabled) one just to see if my GCC version has any influence on the issue (I use GCC 11.3).

X10 Express Test Firmware.zip

@ParkerEde
Contributor Author

I have now run the "x10e-enabled-spektrum.bin" for 2 hours, 1 hour and 4 hours (three different tests). The problem did not occur once.

@ParkerEde
Contributor Author

@raphaelcoeffic
I have now run "x10e-disabled-spektrum.bin" for 2 hours, 1.5 hours and 5 hours (three different tests). The problem did not occur once.

@ParkerEde
Contributor Author

I have now done a great many tests and can say very precisely that the problem is due to PR #3055. The problem does not occur with commit d80adc6; from commit 72193ed on, I was able to reproduce it.

@ParkerEde
Contributor Author

Hi @raphaelcoeffic, is there anything else I can do or find out about this problem that will bring us one step closer to a solution?

@pfeerick pfeerick removed the triage Bug report awaiting review / sorting label Feb 11, 2024
@ParkerEde
Contributor Author

@raphaelcoeffic @3djc @mha1 @gagarinlg
Are there still efforts to solve this problem?

@ParkerEde
Contributor Author

I have now sold my Horus X10S Express, so I no longer have the hardware and am no longer available for testing.

7 participants