Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ESP32 OTA] Critical fix - Make sure that the BT coreTask gets suspended when not needed #795

Merged
merged 2 commits into from
Nov 6, 2020
Merged

Conversation

mtiutiu
Copy link
Contributor

@mtiutiu mtiutiu commented Nov 6, 2020

After some tinkering with this project (which I very much like now so congrats for the great work) I spotted a flaw which seems to be pretty critical for me. Maybe someone else is experiencing it also but let's try to give some details first.

Steps to reproduce:

  • Build a ESP32 gateway that supports Bluetooth also
  • Upload firmware via serial
  • Configure WiFi and other stuff if this is the first time configuration
  • Let it run and monitor the system via platformio run --target monitor in another terminal
  • After it gets configured and connected to your personal access point try to upload the firmware again using only OTA this time

Expected results:

  • The upload process should go smooth until the end with no failures

Actual results:

  • OTA process fails somewhere in the middle or almost at the end sometimes (even in the beginning from time to time)

When inspecting the monitoring terminal we see this:

N: Stop BLE processing
N: Found 9 devices, scan number 2 end deinit controller
E (64831) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (64890) task_wdt:  - IDLE0 (CPU 0)
E (64901) task_wdt: Tasks currently running:
E (64901) task_wdt: CPU 0: ipc0
E (64901) task_wdt: CPU 1: IDLE1
E (64901) task_wdt: Aborting.
abort() was called at PC 0x401039f8 on core 0

Backtrace: 0x400942f8:0x3ffbe190 0x40094529:0x3ffbe1b0 0x401039f8:0x3ffbe1d0 0x400876fd:0x3ffbe1f0 0x40083139:0x3ffb9d60 0x40089b82:0x3ffb9d80 0x40082e3f:0x3ffb9da0 0x4009071d:0x3ffb9dc0
  #0  0x400942f8:0x3ffbe190 in invoke_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:707
  #1  0x40094529:0x3ffbe1b0 in abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:707
  #2  0x401039f8:0x3ffbe1d0 in task_wdt_isr at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/task_wdt.c:252
  #3  0x400876fd:0x3ffbe1f0 in _xt_lowint1 at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/xtensa_vectors.S:1154
  #4  0x40083139:0x3ffb9d60 in esp_intr_noniram_enable at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/intr_alloc.c:776
  #5  0x40089b82:0x3ffb9d80 in spi_flash_op_block_func at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/spi_flash/cache_utils.c:88
  #6  0x40082e3f:0x3ffb9da0 in ipc_task at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/ipc.c:62
  #7  0x4009071d:0x3ffb9dc0 in vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c:355 (discriminator 1)

After digging in I noticed that there's a dedicated coreTask that gets pinned to the core 0 of the ESP32 wonder chip and has a priority of 1. Added to that there's a flag called ProcessLock which makes the coreTask trigger the BLE detection or not. So far so good but its logic has a flaw in the sense that we cannot let a FreeRTOS task sitting in an infinite loop and just keep it doing almost nothing in the else branch when OTA is in progress.

Let's paste here the original code so that we can see what the else branch does:

if (!ProcessLock) {
 ....
 do the BLE stuff
 ...
} else {
  Log.trace(F("BLE core task canceled by processLock" CR));
}

If the debug level is not set to TRACE then the else branch does NOTHING !!! If I set the debug level to TRACE then the task does something more or less useful - I let you decide here 😃.

So in the end it must be either suspended because it doesn't do anything useful anyway or maybe use a vTaskDelay so that other tasks pinned to the same core get a chance to run and in this case one is important in particular - the IDLE task which has priority 0 and that task also feeds the watchdog. See where I'm going? That's why the above ugly thing happens and the watchdog gets triggered which resets the chip and bye bye OTA.

This PR should fix this issue and I already tested it actually because I wasn't able to reflash my GW anymore via OTA after enabling BT support as it was crashing constantly with the above exception.

The fix is pretty simple without changing too much the existing logic and what it does basically is to suspend the coreTask when locking is requested and resume it when locking is not required anymore - simple as that. After applying this fix I was able to perform OTA updates without a glitch.
The only thing that happens and which is normal BTW is when your sketch occupies more that 90% of flash memory and that is - [E][ArduinoOTA.cpp:344] _runUpdate(): Update ERROR: Not Enough Space. But this is not related to the current issue anyways (for me it happened because I wanted a Gateway that supports both BLE and RF - but it worked after a few retries 😃 ).

Hope it helps. Please feel free to add comments or if there's something wrong in what I said above the corrections are welcome.

Thanks and keep up the good work!

@1technophile 1technophile added this to the v0.9.6 milestone Nov 6, 2020
@1technophile
Copy link
Owner

Great finding, thanks!

@1technophile 1technophile merged commit a6e6d98 into 1technophile:development Nov 6, 2020
@DigiH
Copy link
Collaborator

DigiH commented Mar 27, 2021

Thanks for this. Now I can delete my "Set scan interval to 2 minutes" shortcut in MQTT.fx, which I always used just before OTA updates :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants