Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

USB MSC fails 'Command Set Test' from USB3CV. #26275

Closed
pabigot opened this issue Jun 18, 2020 · 16 comments · Fixed by #30293
Closed

USB MSC fails 'Command Set Test' from USB3CV. #26275

pabigot opened this issue Jun 18, 2020 · 16 comments · Fixed by #30293
Assignees
Labels
area: USB Universal Serial Bus bug The issue is a bug, or the PR is fixing a bug platform: nRF Nordic nRFx priority: low Low impact/importance bug

Comments

@pabigot
Copy link
Collaborator

pabigot commented Jun 18, 2020

USB MSC fails 'Command Set Test' from USB3CV.
Platform: nRF52840dk (nRF52840dk_nrf52840).

Issuing DATA OUT
Issuing CSW : try 1
CSW residue returned = 0x0
CSW status returned = 0x0
Issuing Command Set Test for Op Code 0xaa, Test Variation #5
Issuing CBW (attempt #1):
|----- CBW LUN                  = 0x0
|----- CBW Flags                = 0x0
|----- CBW Data Transfer Length = 0x10000
|----- CBW CDB Length           = 0xc
|----- CBW CDB-00 = 0xaa
|----- CBW CDB-01 = 0x0
|----- CBW CDB-02 = 0x0
|----- CBW CDB-03 = 0x0
|----- CBW CDB-04 = 0x0
|----- CBW CDB-05 = 0x0
|----- CBW CDB-06 = 0x0
|----- CBW CDB-07 = 0x0
|----- CBW CDB-08 = 0x0
|----- CBW CDB-09 = 0x80
|----- CBW CDB-10 = 0x0
|----- CBW CDB-11 = 0x0
Issuing DATA OUT
DATA Bulk Request timed out!
Failed DATA phase : should have been success or stall!
(MSC: 5.2.3) Devices must consider the CBW meaningful if no reserved bits are set, the LUN number indicates a LUN supported by the device, bCBWCBLength is in the range of 1 through 16, and the length and content of the CBWCB field are appropriate to the SubClass.
Re-enumerating device
Detected hard error, issuing BOT MSC Reset


+ + + + + 

Getting Device Type
Issuing INQUIRY
Issuing CBW (attempt #1):
|----- CBW LUN                  = 0x0
|----- CBW Flags                = 0x80
|----- CBW Data Transfer Length = 0x24
|----- CBW CDB Length           = 0x6
|----- CBW CDB-00 = 0x12
|----- CBW CDB-01 = 0x0
|----- CBW CDB-02 = 0x0
|----- CBW CDB-03 = 0x0
|----- CBW CDB-04 = 0x24
|----- CBW CDB-05 = 0x0
CBW Bulk Request timed out!
Failed CBW phase, should have been successful!
(MSC: 5.2.3) Devices must consider the CBW meaningful if no reserved bits are set, the LUN number indicates a LUN supported by the device, bCBWCBLength is in the range of 1 through 16, and the length and content of the CBWCB field are appropriate to the SubClass.
Re-enumerating device
Detected hard error, issuing BOT MSC Reset
Failed INQUIRY request
(MSC: 5.7.1) All devices must support the commands marked as required in 4.2 based on PDT.  Devices that support all the commands marked as required for Bootable devices will be reported as Bootable.



READ CAPACITY(16) not supported.

The issue was introduced by #24696. cc @pabigot.

Originally posted by @emob-nordic in #14302 (comment)

@eobalski eobalski added area: USB Universal Serial Bus platform: nRF Nordic nRFx labels Jun 18, 2020
@pabigot
Copy link
Collaborator Author

pabigot commented Jun 18, 2020

I'm unable to replicate this on a Linux host with current master with the nrf52840dk_nrf52840 board-specific configurations removed and building samples/subsys/usb/mass with:

west build -b nrf52840dk_nrf52840 -- -DOVERLAY_CONFIG=overlay-ram-disk.conf
*** Booting Zephyr OS build zephyr-v2.3.0-442-g150e18de6e18  ***
[00:00:00.005,157] <inf> main: The device is put in USB mass storage mode.
tirzah[72]$ ll /media/pab/ZEPHYR\ \ USB/
total 1
-rw-r--r-- 1 pab pab 62 May 27  2005 README.TXT
tirzah[73]$ echo foo > !$/test
echo foo > /media/pab/ZEPHYR\ \ USB//test
tirzah[74]$ ll /media/pab/ZEPHYR\ \ USB/
total 1
-rw-r--r-- 1 pab pab 62 May 27  2005 README.TXT
-rw-r--r-- 1 pab pab  4 Jun 18 07:13 test
tirzah[75]$ 

If the overlay and configuration in samples/subsys/usb/mass/boards are not removed the sample uses the external flash even with the overlay (edit: however the exposed mass disk is from RAM, not external flash).

In neither case do I get the output described.

Presumably there's some external tool being used to produce the failure, so it'll have to be diagnosed by somebody who has access to it.

@eobalski
Copy link
Collaborator

Additional info: It looks like the Host is unable to perform 64kB write to the USB disk, it stops ~43kB
Here are logs from USB logic analyzer just after the fixes for USB MSC (author @jfischer-phytec-iot , #23307):
(7fdcdd8)
image

And here build from (88c0545) #24696
image

@pabigot
Copy link
Collaborator Author

pabigot commented Jun 18, 2020

What about at 28a6829, just before #24696's commits?

I really don't see how the changes in that commit could introduce the problem. The only change to the mass subsystem was aligning the page buffer.

@eobalski
Copy link
Collaborator

Neither do I but from that commit the test starts to fail. I will take some time to investigate it.

@eobalski eobalski self-assigned this Jun 18, 2020
@eobalski
Copy link
Collaborator

Works just fine with overlay-ram-disk.conf. Something is fishy in the way the device exposes the flash drive through USB interface.

@pabigot
Copy link
Collaborator Author

pabigot commented Jun 18, 2020

Works just fine with overlay-ram-disk.conf. Something is fishy in the way the device exposes the flash drive through USB interface.

That's a lot more plausible. I suspect a mismatch between the 4 KiBy block size and the block size the test and Windows is expecting. Maybe setting CONFIG_MASS_STORAGE_BULK_EP_MPS=4 would have an effect?

I'll look into this when I'm working zephyr next week.

@eobalski
Copy link
Collaborator

I don't think so. EP size is well defined by USB spec and its just number of Bytes that could be send at once. It has nothing to the block sizes of flash storage. Furthermore FS USB device Bulk endpoints could be 8, 16, 32 or 64 bytes in size.

@eobalski
Copy link
Collaborator

I'll look into this when I'm working zephyr next week.

I will do a pre-study of that.

@carlescufi carlescufi added the bug The issue is a bug, or the PR is fixing a bug label Jun 18, 2020
@pabigot
Copy link
Collaborator Author

pabigot commented Jun 23, 2020

Looking at the details of the test image capture above it appears the test involves trying to write FAT file system data to the mass storage device.

That isn't going to work with the littlefs file system, because the flash device is configured for littlefs operations using 16 4 KiBy blocks.

Does a generic mass storage device have to present a disk architecture that is compatible with a FAT file system? If not, the test failure is not due to a bug in Zephyr.

@eobalski
Copy link
Collaborator

Hi, the test tool is using FAT fs data to write/read form the device so I expected it to work when I change:

diff --git a/samples/subsys/usb/mass/boards/nrf52840dk_nrf52840.conf b/samples/subsys/usb/mass/boards/nrf52840dk_nrf52840.conf
index ae48b5c29f..8f6f69ff4c 100644
--- a/samples/subsys/usb/mass/boards/nrf52840dk_nrf52840.conf
+++ b/samples/subsys/usb/mass/boards/nrf52840dk_nrf52840.conf
@@ -8,7 +8,7 @@ CONFIG_NORDIC_QSPI_NOR_FLASH_LAYOUT_PAGE_SIZE=4096
 CONFIG_MASS_STORAGE_DISK_NAME="NAND"
 
 CONFIG_FILE_SYSTEM=y
-CONFIG_FILE_SYSTEM_LITTLEFS=y
+CONFIG_FAT_FILESYSTEM_ELM=y
 CONFIG_FLASH_LOG_LEVEL_INF=y
 
 CONFIG_FLASH_MAP=y

Same result as with littlefs. I think the file system may have nothing to it. I tried to find requirements for USB 3CV test tool when it comes to filesystem but I can not find any info about it. @jfischer-phytec-iot any clues?

@eobalski eobalski added the priority: low Low impact/importance bug label Jun 24, 2020
@eobalski
Copy link
Collaborator

eobalski commented Jul 9, 2020

I checked with Flash based app (internal flash, not the MX25R64) with LITTLEFS file system and the tests are passing. I am not sure what is going on here.
Patch applied before testing:
Removed ./samples/subsys/usb/mass/boards/nrf52840dk_nrf52840.overlay

diff --git a/samples/subsys/usb/mass/boards/nrf52840dk_nrf52840.conf b/samples/subsys/usb/mass/boards/nrf52840dk_nrf52840.conf
index ae48b5c29f..0f42340af7 100644
--- a/samples/subsys/usb/mass/boards/nrf52840dk_nrf52840.conf
+++ b/samples/subsys/usb/mass/boards/nrf52840dk_nrf52840.conf
@@ -1,7 +1,3 @@
-# Storage is on MX25R64
-CONFIG_NORDIC_QSPI_NOR=y
-CONFIG_NORDIC_QSPI_NOR_FLASH_LAYOUT_PAGE_SIZE=4096
-
 # Must match the registered name for the selected disk access variant:
 # "RAM" via DISK_RAM_VOLUME_NAME for DISK_ACCESS_RAM
 # "NAND" via DISK_FLASH_VOLUME_NAME for DISK_ACCESS_FLASH
@@ -11,12 +7,14 @@ CONFIG_FILE_SYSTEM=y
 CONFIG_FILE_SYSTEM_LITTLEFS=y
 CONFIG_FLASH_LOG_LEVEL_INF=y
 
+CONFIG_MPU_ALLOW_FLASH_WRITE=y
+
 CONFIG_FLASH_MAP=y
 CONFIG_FLASH_PAGE_LAYOUT=y
 
 CONFIG_DISK_ACCESS_FLASH=y
-CONFIG_DISK_FLASH_DEV_NAME="MX25R64"
-CONFIG_DISK_FLASH_START=0x0
+CONFIG_DISK_FLASH_DEV_NAME="NRF_FLASH_DRV_NAME"
+CONFIG_DISK_FLASH_START=0xf8000
 CONFIG_DISK_VOLUME_SIZE=0x10000
 CONFIG_DISK_FLASH_MAX_RW_SIZE=4096
 CONFIG_DISK_FLASH_ERASE_ALIGNMENT=0x1000

There are tons of errors form mass storage class saying:

<err> usb_msc: !! Disk Read Error 127 !
. . .
<err> usb_msc: !!!!! Disk Write Error 124 !!!!!

@github-actions
Copy link

github-actions bot commented Sep 8, 2020

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions github-actions bot added the Stale label Sep 8, 2020
@eobalski eobalski removed the Stale label Sep 16, 2020
@eobalski
Copy link
Collaborator

I had another look into this issue. The fact that the test is failing is because the write is so slow that it gets time outed. This is not the fail by itself. The test fails due to fact than in case of timeout the device is being Reset through MSC reset. This is the part that does not work and need fix. Im working to find a solution for that.

@eobalski
Copy link
Collaborator

The fact that the test is failing exposed two weaknesses within the sample:

  • mass storage has no good mechanism for reset recovery (I have workaround that for Nordic).
  • mass storage for nRF52840dk is using external onboard flash memory (64kB) accessed with QSPI. The sample uses disk access API to access the exposed disk for writes and reads. I didn't go much into detail but AFAICT the qspi API is blocking and writes/reads are pretty slow.
    Documentation for external flash used in nRF52840DK:
    Screenshot from 2020-10-30 12-02-06

Measurements:
saleale_mass_storage
legend:
get 64B - time measure of reading from USB Ep
ep_out_cb - toggled each time the callback is called, meaning that the data has arrived (64B)
Qd write - writing to disk time.
After each 8th read form EP there is exactly 512 Bytes of data stored, 64 (ep buf size) * 8 = 512. Mass storage queues the task to write the data to the disk. This is blocking everything until finished. The write takes up to 120ms. Because of that the USB cannot respond to any traffic for up to this time making it fail the test.

Additional context

Screenshot from USB logic analyzer of the USB traffic:
ellisys_mass_storage
The writing is blocked that over 1100 packets are dropped because USB is not able get the CPU.

Screenshot from USB3CV test tool:
usb3cv_mass_storage

This is not easy task to be fixed as a simple bug fix. To fix that we need:

  • async QSPI API
    and/or
  • async disk access API which IMO should internally handle async writes/reads build on top of backends used.

Summon @pabigot, @kamlaz, @carlescufi (fyi) + whoever is responsible for disk access API. Could not any Codeowner for that.

@carlescufi
Copy link
Member

This is not easy task to be fixed as a simple bug fix. To fix that we need:

  • async QSPI API
    and/or
  • async disk access API which IMO should internally handle async writes/reads build on top of backends used.

For the time being the proposed solution would be instead to offload the calls to the disk access API to a separate queue or thread, and find a way to report to the Host that the data has not been written yet to prevent new data from coming in.

@eobalski
Copy link
Collaborator

...

For the time being the proposed solution would be instead to offload the calls to the disk access API to a separate queue or thread, and find a way to report to the Host that the data has not been written yet to prevent new data from coming in.

There is no documented way to stop Host from sending when the device is busy. USB is driven by Host and the only thing we can do i not accepting the data, what we clearly do. The problem is writes are slow and the test timeouts.

@jfischer-no jfischer-no assigned jfischer-no and unassigned eobalski Nov 16, 2020
jfischer-no added a commit to jfischer-no/zephyr that referenced this issue Nov 25, 2020
Due to relatively long sector-erase times of the QSPI in SoC Flash,
and the fact that MSC block size and Flash sector size are different,
and disk subsystem accesses are not cached,
this sample fails during USB CV test.
Reduce partition size for nrf52840dk_nrf52840 as a workaround.

The RAM disk configuration is not affected by this
and has shorter runtimes during testing.

Resolves: zephyrproject-rtos#26275

Signed-off-by: Johann Fischer <[email protected]>
jfischer-no added a commit to jfischer-no/zephyr that referenced this issue Nov 27, 2020
USB MSC sample has been expanded over time. Config overlays
for RAM disk and flash disks were added. Board specific overlays
followed. It was overlooked that they forced a specific
configuration for the nrf52840dk_nrf52840 board,
even if it was not explicitly desired
(for example RAM-disk). This also caused strange behaviour
during automatic MSC USB3CV tests (which explicit selects RAM-disk)
so that nrf5340dk_nrf5340_cpuapp board passed test
but nrf52840dk_nrf52840 failed.

Rework disk and file system configuration, and initialization
code, allow to use FAT file system on top of RAM disk.

This sample can be built with none or one of two supported
file systems, LittleFS or FATFS. Disk subsystem can be flash
or RAM based. LittleFS only works with flash disk.
There are four useful possibilities:
- RAM disk without any file system for testing (default)
- RAM disk with FAT file system
- flash disk with FAT file system
- flash disk with LittleFS
Flash disk configurations is only available (as before) for
nordic,qspi-nor compatible, but only need the device tree overlay
per platform without a config overlay.

This path also revises test cases.
Remove invalid "flash" tag from depends_on key.
Remove unnecessary gpio tag and exclude native platform.

Resolves: zephyrproject-rtos#26275

Signed-off-by: Johann Fischer <[email protected]>
jfischer-no added a commit to jfischer-no/zephyr that referenced this issue Nov 30, 2020
USB MSC sample has been expanded over time. Config overlays
for RAM disk and flash disks were added. Board specific overlays
followed. It was overlooked that they forced a specific
configuration for the nrf52840dk_nrf52840 board,
even if it was not explicitly desired
(for example RAM-disk). This also caused strange behaviour
during automatic MSC USB3CV tests (which explicit selects RAM-disk)
so that nrf5340dk_nrf5340_cpuapp board passed test
but nrf52840dk_nrf52840 failed.

Rework disk and file system configuration, and initialization
code, allow to use FAT file system on top of RAM disk.

This sample can be built with none or one of two supported
file systems, LittleFS or FATFS. Disk subsystem can be flash
or RAM based. LittleFS only works with flash disk.
There are four useful possibilities:
- RAM disk without any file system for testing (default)
- RAM disk with FAT file system
- flash disk with FAT file system
- flash disk with LittleFS
Flash disk configurations is only available (as before) for
nordic,qspi-nor compatible, but only need the device tree overlay
per platform without a config overlay.

This path also revises test cases.
Remove invalid "flash" tag from depends_on key.
Remove unnecessary gpio tag and exclude native platform.

Resolves: zephyrproject-rtos#26275

Signed-off-by: Johann Fischer <[email protected]>
carlescufi pushed a commit that referenced this issue Dec 2, 2020
USB MSC sample has been expanded over time. Config overlays
for RAM disk and flash disks were added. Board specific overlays
followed. It was overlooked that they forced a specific
configuration for the nrf52840dk_nrf52840 board,
even if it was not explicitly desired
(for example RAM-disk). This also caused strange behaviour
during automatic MSC USB3CV tests (which explicit selects RAM-disk)
so that nrf5340dk_nrf5340_cpuapp board passed test
but nrf52840dk_nrf52840 failed.

Rework disk and file system configuration, and initialization
code, allow to use FAT file system on top of RAM disk.

This sample can be built with none or one of two supported
file systems, LittleFS or FATFS. Disk subsystem can be flash
or RAM based. LittleFS only works with flash disk.
There are four useful possibilities:
- RAM disk without any file system for testing (default)
- RAM disk with FAT file system
- flash disk with FAT file system
- flash disk with LittleFS
Flash disk configurations is only available (as before) for
nordic,qspi-nor compatible, but only need the device tree overlay
per platform without a config overlay.

This path also revises test cases.
Remove invalid "flash" tag from depends_on key.
Remove unnecessary gpio tag and exclude native platform.

Resolves: #26275

Signed-off-by: Johann Fischer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: USB Universal Serial Bus bug The issue is a bug, or the PR is fixing a bug platform: nRF Nordic nRFx priority: low Low impact/importance bug
Projects
None yet
4 participants