Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Service Proposal: FOTAService #46

Open
1 of 2 tasks
AGlass0fMilk opened this issue Mar 17, 2021 · 25 comments · May be fixed by #60
Open
1 of 2 tasks

Service Proposal: FOTAService #46

AGlass0fMilk opened this issue Mar 17, 2021 · 25 comments · May be fixed by #60
Labels
service Service proposal or submission

Comments

@AGlass0fMilk
Copy link
Member

BLE Standard Service:

  • Yes
  • No

Proposed UUID(s) or link to specification document:

See specification and documentation on fota-service branch here:

https://github.com/ARMmbed/mbed-os-experimental-ble-services/tree/5328cacea69452901a691b7380366b6a188cf503/services/FOTA

@pan- @noonfom @paul-szczepanek-arm I took aspects from our discussion in #13 and wrote up a more formal specification README, taking the LinkLossService as an example.

The newly-renamed FOTAService strips out some complexity from my original DFUService proposal. I think the new specification covers the most basic use cases while providing for more advanced features and extensions in the future.

I have yet to refactor the code, but I'm planning to split off the BlockDevice dependency into standard FOTAService::EventHandlers that we provide. A user can implement their own custom logic if they wish, but we can provide a basic event handler for writing to a given block device. This also serves as a way for us to provide modular extensions (eg: delta updates, trigger connection parameter updates, etc) in the future as part of our standard library of event handlers.

Before I go much further on the code and tests, I wanted to get your feedback.

Let me know what you think and if I missed any corner cases.

@AGlass0fMilk AGlass0fMilk added the service Service proposal or submission label Mar 17, 2021
@pan-
Copy link
Member

pan- commented Mar 17, 2021

Thanks for the comprehensive description. It looks excellent and it is easy to understand how the software would work thanks to the diagrams you provided.

I'm afraid of an overflow of the sequential fragment ID. Maybe we can pack fragments into block (not in the sense of BockDevice block) or packet of data that should not exceed a size of 256 * MTU -1. A new opcode can be introduced to start the packet transaction; it includes the packet size. A new status can be added to report that a packet has been successfully received.

To extend on that idea, it should be possible for a server to indicate to the client the recommended block size to avoid XOFF due to internal processing.

When XOFF is sent by the server, the client may have queued several fragment. In such case it won't know which block must be retransmitted. The fragment ID would be a valuable information to communicate when the server resume transfer with XON. In general, XOFF and XON just like SYNC_LOST could include the expected fragment ID byte.

One last thing: it would be useful to report to the client the version of the update service so it can adapt if something has to be changed in the flow.

@AGlass0fMilk
Copy link
Member Author

Thanks for the comprehensive description. It looks excellent and it is easy to understand how the software would work thanks to the diagrams you provided.

Thanks 😄

I'm afraid of an overflow of the sequential fragment ID. Maybe we can pack fragments into block (not in the sense of BockDevice block) or packet of data that should not exceed a size of 256 * MTU -1. A new opcode can be introduced to start the packet transaction; it includes the packet size. A new status can be added to report that a packet has been successfully received.

To extend on that idea, it should be possible for a server to indicate to the client the recommended block size to avoid XOFF due to internal processing.

What is your fear of the overflow? The DFUService (#13) I tested worked fine with an overflow at 127. For anything bad to happen, 256 SYNC_LOST notifications would need to be dropped, in which case the link integrity is probably so poor that an update will likely fail anyways.

I don't like the idea of introducing another complexity, the "block size" parameter.

@pan- Can you expand on what your fear of the fragment ID roll over is? I might be convinced 😁

When XOFF is sent by the server, the client may have queued several fragment. In such case it won't know which block must be retransmitted. The fragment ID would be a valuable information to communicate when the server resume transfer with XON. In general, XOFF and XON just like SYNC_LOST could include the expected fragment ID byte.

I think this is a reasonable and simple change.

One last thing: it would be useful to report to the client the version of the update service so it can adapt if something has to be changed in the flow.

Also reasonable and simple change. Should it be a separate characteristic or something like an op-code with a status notification response?

@pan-
Copy link
Member

pan- commented Mar 18, 2021

I'm afraid of clients with high latency which might miss the SYNC_LOST signal and continue while continuing to send fragments (I'm looking at you BlueZ!).
With a very similar transfer architecture, I've seen over a 100 fragments dropped before the client receive the error message.

We can try what is proposed for a start and let integrity checking to the firmware installer.

The client should be able to go backward if the fragment id reported in SYNC_LOSS is greater than the current fragment id on the server.

@AGlass0fMilk
Copy link
Member Author

@pan- Added fragment ID to flow control packets as suggested. 0843ce3

@AGlass0fMilk
Copy link
Member Author

@pan- Added FOTAVersionStringCharacteristic to specification

@AGlass0fMilk
Copy link
Member Author

@pan- @noonfom

I'm having trouble writing unit tests for the FOTA service. The service requires mbed::Span, which then includes mbed_assert.h, which subsequently includes a bunch of platform retargeting headers that then cause redefinition errors with platform defines. It also requires mbed_trace.h, which may complicate things.

Right now I have it at the point where this is the build error:

(python3-mbed) gdbeckstein@magic-man:~/Documents/ble-fota-dev/mbed-os-experimental-ble-services/tests$ cmake --build cmake_build
[3/3] Linking CXX executable UNITTESTS/FOTA/ble-service-fota-unittest
FAILED: UNITTESTS/FOTA/ble-service-fota-unittest 
: && /usr/bin/c++ -g -O0 --coverage -DUNITTEST -g -rdynamic UNITTESTS/FOTA/CMakeFiles/ble-service-fota-unittest.dir/test_FOTA.cpp.o UNITTESTS/FOTA/CMakeFiles/ble-service-fota-unittest.dir/home/gdbeckstein/Documents/ble-fota-dev/mbed-os-experimental-ble-services/services/FOTA/source/FOTAService.cpp.o -o UNITTESTS/FOTA/ble-service-fota-unittest  mbed-os/UNITTESTS/fakes/ble/libmbed-os-fakes-ble.a  mbed-os/UNITTESTS/fakes/libmbed-os-fakes-event-queue.a  lib/libgmock_maind.a  lib/libgmockd.a  lib/libgtestd.a  -lpthread && :
/usr/bin/ld: UNITTESTS/FOTA/CMakeFiles/ble-service-fota-unittest.dir/home/gdbeckstein/Documents/ble-fota-dev/mbed-os-experimental-ble-services/services/FOTA/source/FOTAService.cpp.o: in function `mbed::Span<unsigned char const, -1l>::Span(unsigned char const*, long)':
/home/gdbeckstein/Documents/ble-fota-dev/mbed-os-experimental-ble-services/tests/cmake_build/../mbed-os/platform/include/platform/Span.h:629: undefined reference to `mbed_assert_internal'
/usr/bin/ld: /home/gdbeckstein/Documents/ble-fota-dev/mbed-os-experimental-ble-services/tests/cmake_build/../mbed-os/platform/include/platform/Span.h:630: undefined reference to `mbed_assert_internal'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

I had to add

target_compile_definitions(
${TEST_NAME} PUBLIC MBED_CONF_BLE_SERVICE_FOTA_BSC_BUFFER_SIZE=128
MBED_CONF_BLE_SERVICE_FOTA_CONTROL_BUFFER_SIZE=16
MBED_CONF_BLE_SERVICE_FOTA_STATUS_BUFFER_SIZE=16
BLE_FEATURE_GATT_SERVER=1
)

to get the service to compile -- the LinkLossService CMake file does not seem to require these configurations.

I've pushed up my most recent changes to the fota-service branch so you can try to build/run the tests yourself if necessary.

Any assistance would be greatly appreciated! I will have to move onto integration testing at this point -- I have customer projects that need this service.

@pan-
Copy link
Member

pan- commented Mar 24, 2021

That's odd, I was thinking Span.h was part of the update we made on Mbed OS. @paul-szczepanek-arm Can you advise on this ?

@paul-szczepanek-arm
Copy link
Member

update we made on Mbed OS? You mean the cmake unit tests update? I don't think we use spans in tests anywhere so it's possible it doesn't work and will require a fake. Can you make a PR with the test? I'll see if we can include the span fix in the cmake unit PR.

@paul-szczepanek-arm
Copy link
Member

maybe the solution is to have a mbed_assert.h replacement

@noonfom
Copy link

noonfom commented Mar 24, 2021

@AGlass0fMilk It looks like the assert stub is not being compiled. It's a bit strange because other files in the same location are being linked. Also, we appear to include all stubs in the top-level unit testing cmake. We will need to investigate this further...

For now, can you add mbed_assert_stub.cpp to the target sources:

target_sources(${TEST_NAME}
    PRIVATE
        test_FOTA.cpp
        ${SERVICES_PATH}/FOTA/source/FOTAService.cpp
        ${MBED_PATH}/UNITTESTS/stubs/mbed_assert_stub.cpp
)

In relation to the config macros, I think you have to use target_compile_definitions because our setup currently does not support the mbed config system, i.e. we do not set the config path. However, I'd like to bounce this off @paul-szczepanek-arm as he's better positioned to comment on such issues.

@paul-szczepanek-arm
Copy link
Member

Misunderstood the problem, turns out there's an issue with mbed-os stubs. This is now tracked in jira.

@paul-szczepanek-arm
Copy link
Member

Add
mbed-stubs
into target_link_libraries(${TEST_NAME}
in your unit test.

The bigger problem is rebase onto github-ci. Are you OK with me recreating this branch with your code changes squashed on a branch from the github-ci?

@AGlass0fMilk
Copy link
Member Author

@pan- @paul-szczepanek-arm @noonfom
I wrote some integration tests for the FOTA service. Let me know if you see anything missing and if you can run the tests that would be fantastic:

https://github.com/ARMmbed/mbed-os-experimental-ble-services/tree/a7978c5af65868e66bc0c7aac1a7ef5e940d3387

@paul-szczepanek-arm
Copy link
Member

I enabled the tests on the new branch: ARMmbed:fota-service-github-ci please use that for your PR

@noonfom
Copy link

noonfom commented Mar 30, 2021

@AGlass0fMilk Thanks for adding the integration tests. I ran them on Windows 10; and, except for the usual problems with the bleak .NET backend, everything worked fine. However, I am curious: why do you wrap BSC writes in an asyncio.wait_for() w/timeout=0.2:

await asyncio.wait_for(self.client.write_gatt_char(UUID_BINARY_STREAM_CHAR, packet), timeout=0.2)

Other comments:

  • I ran the tests on a nucleo-wb55rg, which does not have the extra cordio-ll label, and this is likely to the be case for other targets too. Could you please move the cordio-ll.max-acl-size override into the nRF-specific config. in your mbed_app.json?

  • Do we need a data integrity test: test_data_correctness, as it's implicitly performed in the other tests?

  • I wonder should we provide a mechanism for testing FOTA Commit and other teardown sequences? For example, the client could initiate a disconnection once the binary has been transferred, like you suggest here.

  • I think we should move the common and descriptors folders into the new dependencies folder. Or, maybe we can add them to mbed-os-ble-utils, and then clone that during bootstrap? @pan- thoughts?

By the way, I really like the README. The sequence diagrams make it easy to understand the service and tests. IMO, we should add similar documentation for the other services.

@AGlass0fMilk
Copy link
Member Author

AGlass0fMilk commented Mar 30, 2021

However, I am curious: why do you wrap BSC writes in an asyncio.wait_for() w/timeout=0.2:

In my time working with the bleak library, I've encountered a strange issue where writes can take a very long time. The delay seems to be arbitrary and adding a small delay using asyncio.wait_for seems to resolve the issue.

See this comment here: hbldh/bleak#338 (comment)

I ran the tests on a nucleo-wb55rg, which does not have the extra cordio-ll label, and this is likely to the be case for other targets too. Could you please move the cordio-ll.max-acl-size override into the nRF-specific config. in your mbed_app.json?

Essentially what I'm trying to accomplish with these configurations is to set the preferred MTU to 128 bytes. Can you comment on what similar configuration parameters for the other BLE-enabled targets would be? I should mention I do all my testing on nRF52-based targets. I do have a Nucleo WB55 that I could test on however.

Do we need a data integrity test: test_data_correctness, as it's implicitly performed in the other tests?

We can remove it.

I wonder should we provide a mechanism for testing FOTA Commit and other teardown sequences? For example, the client could initiate a disconnection once the binary has been transferred, like you suggest here.

This is application-specific behavior and so I didn't think it should be tested. There is no real firmware update happening, just a testing of the transfer and error correction mechanisms.

I think we should move the common and descriptors folders into the new dependencies folder. Or, maybe we can add them to mbed-os-ble-utils, and then clone that during bootstrap? @pan- thoughts?

I think we can open the scope of this library a little bit to include things like Descriptor classes and extensions on the base GattAttributes and such. This discussion is also relevant for #18

By the way, I really like the README. The sequence diagrams make it easy to understand the service and tests. IMO, we should add similar documentation for the other services.

Thanks! I based the basic format off of the LinkLossService documentation. The README served as a guide as I developed the service and tests.

@pan-
Copy link
Member

pan- commented Apr 1, 2021

@AGlass0fMilk We really like the proposal and are willing to help to get this through. I think a first step would be to open a pull request, it is easier to comment code here. A second step could be to tell us which area you would need help with. We've been thinking of several:

  • The Mbed team can create a complete example from the service so we can exercise the API and improve the documentation explaining how to integrate the service in an application.
  • Unit testing: Contrary to integration tests which covers lots of ground, the service would benefits from unit testing. The Mbed team can complement what is present today but we need to know which area you want to cover.
  • Client: To use easily this service a client is required. I think we should explore how we can offer clients that would work on several platforms. But maybe that should happen in the discussions section.

@AGlass0fMilk
Copy link
Member Author

Hi @pan-, I just got back from a long weekend holiday. I will also be out the 9th through the 19th FYI.

I will open a pull request this week so your team can continue work on this.

* The Mbed team can create a complete example from the service so we can exercise the API and improve the documentation explaining how to integrate the service in an application.

Yes, that would be useful. Are you planning to actually showcase firmware being updated? If so, do you plan on bringing in something like mcuboot or something simpler (ie: not secure, for demonstration only)?

* Unit testing: Contrary to integration tests which covers lots of ground, the service would benefits from unit testing. The Mbed team can complement what is present today but we need to know which area you want to cover.

I was going to base the unit tests of those for the LinkLossService, eg: test the UUIDs added to the service and such. Additionally, the unit tests could make sure the service logic conforms to what is specified in the flow diagrams in the FOTAService readme.

* Client: To use easily this service a client is required. I think we should explore how we can offer clients that would work on several platforms. But maybe that should happen in the discussions section.

The easiest client to make would be to simply take the integration test python code and instead make a real FOTA client for it. This would cover use cases on Windows, Linux, and Mac, since bleak is cross-platform. If someone wants to use it with a different language like a C++ desktop app, they can base their implementation off the python code.

I would love to have example clients for Android/iOS. I actually got a Macbook set up recently to play with this on iOS but I haven't had time to yet.

Ultimately, what I'd like to do is make a cookiecutter or copier template that would let users quick start an Mbed application that uses the Mbed-OS mcuboot bootloader, BLE, and the FOTAService here.

@pan-
Copy link
Member

pan- commented Apr 7, 2021

Yes, that would be useful. Are you planning to actually showcase firmware being updated? If so, do you plan on bringing in something like mcuboot or something simpler (ie: not secure, for demonstration only)?

I think we would start with a mock then replace it with an mcuboot implementation.

A decent client written in python and bleak could be a good first step and it shouldn't be too complex to write.
However, the main targets for consumers of the API are Android/iOS. On a different project we did a client in C++ and interfaced it with the underlying platform. It worked well with Mbed OS, Android or even the browser with Emscripten and webbluetooth. However there's more works required to get there.

I really like the idea of a template generator for a Bluetooth project where all the boilerplate for firmware update and setting up BLE is made for you.

@AGlass0fMilk
Copy link
Member Author

@pan- Can you create an ARMMbed repository for the BLE DFU example? I'm working on the implementation now and I can put the work there so your team can view it and work on it as well.

@pan-
Copy link
Member

pan- commented Apr 7, 2021

I created one here and invited you for collaboration. As soon as we have something ready we will move it into mbed-os-example-ble.

@AGlass0fMilk
Copy link
Member Author

@pan- I see some work being done by @noonfom on some FOTA mocks and testing infrastructure.

Is there anything I could help with? I just got back from vacation. I have a lot of customer projects ongoing but many of them require BLE FOTA so I would like to continue pushing this to production quality.

@noonfom
Copy link

noonfom commented Apr 22, 2021

@AGlass0fMilk I hope you enjoyed your vacation!

@pan- and I were thinking that it would be good to add two FOTA Service examples to mbed-os-example-ble:

  1. A mock example that demonstrates the functionality of the service without a bootloader. This might showcase the sending of the binary via the Binary Stream Characteristic and the various mechanisms for ensuring data correctness, such as flow control and resynchronisation. The uploading of the binary could be mocked during a FOTA commit by reading it from the BlockDevice and writing it to the serial port to be checked by the client. This is similar to the application you wrote for the FOTA integration tests except for the inclusion of the BlockDevice. The idea here is to create an example that can be used as a starting point when integrating the FOTA Service with a bootloader.
  2. An example that demonstrates integration with MCUboot. This might be similar, if not identical, to the example you are working on in mbed-os-experimental-ble-dfu-example. It would be great if this example was a natural extension of the mock example.

I started working on (1) during our last team sprint. Basically, I took your example and stripped out MCUboot.

I also encountered the issue you reported here. Turns out the BLE trace helper for UUIDs was performing rogue memory accesses. Perhaps, you could cherry-pick this commit and try your example again?

Please let me know what you think of our plan for the examples and if there's anyway we can help with the FOTA Service PR.

@AGlass0fMilk
Copy link
Member Author

@noonfom Sorry for the long delay, I've been working on customer projects constantly since I got back.

I agree with the path forward that you've set -- it makes complete sense to have a basic example that doesn't put any constraints on the bootloader.

As for the FOTA service PR, I will open one soon.

According to the Mbed roadmap (and as was discussed on the governance call), there's a plan to integrate mcuboot into Mbed in Q4 of this year. I'm hoping we can move that up as that's what I've been working on at present. I think it's relevant to this service as well.

I have a bunch of scripts to automate building the bootloader and application, signing the application, and creating factory programming binaries (bootloader + signed app merged) as well as OTA update binaries (just signed application). These are all shell scripts written on Ubuntu though and I think we can streamline the whole process using CLI2/cmake. I am not yet familiar enough with cmake to come up with exactly how though.

@pan- @noonfom would you be interested in setting up a call where I can demo the state of my mcuboot + Mbed integration and we can formulate a plan to refine it so it's up to Mbed's quality standards and is easier to use?

@pan-
Copy link
Member

pan- commented May 6, 2021

According to the Mbed roadmap (and as was discussed on the governance call), there's a plan to integrate mcuboot into Mbed in Q4 of this year. I'm hoping we can move that up as that's what I've been working on at present. I think it's relevant to this service as well.

Definitely, that's why I don't want to wait Q4 to start working on the Mbed OS example with MCU boot, what you've done, what has been done with the CSPs example should help in moving MCU boot integration into Mbed faster. So what happen in Q4 is mostly upstreaming.

@pan- @noonfom would you be interested in setting up a call where I can demo the state of my mcuboot + Mbed integration and we can formulate a plan to refine it so it's up to Mbed's quality standards and is easier to use?

Please send an invite. I would be interested in discussing about MCUboot but also continuous integration and CMake integration.

@AGlass0fMilk AGlass0fMilk linked a pull request Jun 3, 2021 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
service Service proposal or submission
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants