Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drivers: spi: spi_mcux_lpspi: inconsistent chip select behaviour #16544

Closed
henrikbrixandersen opened this issue Jun 2, 2019 · 35 comments · Fixed by #82877
Closed

drivers: spi: spi_mcux_lpspi: inconsistent chip select behaviour #16544

henrikbrixandersen opened this issue Jun 2, 2019 · 35 comments · Fixed by #82877
Assignees
Labels
area: GPIO area: SPI SPI bus bug The issue is a bug, or the PR is fixing a bug platform: NXP Drivers NXP Semiconductors, drivers platform: NXP NXP priority: low Low impact/importance bug

Comments

@henrikbrixandersen
Copy link
Member

Describe the bug
The MCUX LPSPI SPI driver handles SPI chip selects inconsistently when comparing GPIO CS and "native" controller CS handling over multipart transfers.

With GPIO CS enabled (where the LPSPI controller does not control the CS line), the CS line is kept asserted through the entire transfer. This is opposed to what happens when the CS line is controlled by the LPSPI controller itself; then the CS line is deasserted between the different parts of the transfer.

To Reproduce

#include <zephyr.h>
#include <misc/printk.h>
#include <spi.h>

#define SPI_LABEL     "SPI_1"
#define GPIO_CS_LABEL "GPIO_0"
#define GPIO_CS_PIN   16

void main(void)
{
	struct spi_config spi_cfg;
	struct device *spi_dev;
	struct spi_cs_control spi_cs;
	u8_t cmd[2] = { 0xaa, 0x55 }; /* 2 bytes command */
	u8_t response[2];             /* 2 bytes response */

	const struct spi_buf tx_bufs[] = {
		{
			.buf = cmd,
			.len = sizeof(cmd),
		},
	};
	const struct spi_buf_set tx = {
		.buffers = tx_bufs,
		.count = ARRAY_SIZE(tx_bufs),
	};
	const struct spi_buf rx_bufs[] = {
		{
			.buf = NULL,
			.len = sizeof(cmd), /* 2 dummy bytes */
		},
		{
			.buf = response,
			.len = sizeof(response),
		},
	};
	const struct spi_buf_set rx = {
		.buffers = rx_bufs,
		.count = ARRAY_SIZE(rx_bufs),
	};

	spi_dev = device_get_binding(SPI_LABEL);
	if (!spi_dev) {
		printk("SPI device not found\n");
		return;
	}

	spi_cfg.operation = SPI_OP_MODE_MASTER | SPI_TRANSFER_MSB |
		SPI_WORD_SET(8);
	spi_cfg.frequency = 1000000;
	spi_cfg.slave = 2;

#if 0
	spi_cs.gpio_dev = device_get_binding(GPIO_CS_LABEL);
	if (!spi_cs.gpio_dev) {
		printk("SPI GPIO CS device not found\n");
		return;
	}
	spi_cs.gpio_pin = GPIO_CS_PIN;
	spi_cs.delay = 1; /* us */
	spi_cfg.cs = &spi_cs;
#else
	spi_cfg.cs = NULL;
#endif

	spi_transceive(spi_dev, &spi_cfg, &tx, &rx);
}

Expected behavior
The CS line remains asserted through all the parts of the multipart transfer.

Impact
Deasserting the CS line in the middle of a transfer causes problem e.g. when communicating with SPI EEPROMs.

Screenshots or console output
When using GPIO CS:
GPIO CS

When using LPSPI CS:
SPI CS

Environment (please complete the following information):

  • OS: GNU/Linux
  • Toolchain Zephyr SDK
  • Commit SHA or Version used: c88c919

Additional context
This was spotted when trying to use a SPI EEPROM with the TWR-KE18F board, but it is not limited to that board nor to the KE1xF SoC series, as far as I can tell.

@henrikbrixandersen henrikbrixandersen added bug The issue is a bug, or the PR is fixing a bug area: SPI SPI bus platform: NXP NXP labels Jun 2, 2019
@nashif nashif added the priority: medium Medium impact/importance bug label Jun 4, 2019
@ioannisg
Copy link
Member

@MaureenHelm is this issue being looked at?

@MaureenHelm
Copy link
Member

@MaureenHelm is this issue being looked at?

Not yet, but going to try to look at it tomorrow.

@MaureenHelm
Copy link
Member

This is caused by the underlying MCUX SDK LPSPI driver missing a feature to hold the chip select active after a transfer. The underlying MCUX SDK DSPI driver used on frdm_k64f has a kDSPI_MasterActiveAfterTransfer flag that we need in the LPSPI driver used on KE1xF and i.MX RT10xx. I will file an internal NXP ticket to add this feature.

Decreasing the priority of this bug from medium to low since we can work around it with GPIO CS.

@MaureenHelm MaureenHelm added priority: low Low impact/importance bug and removed priority: medium Medium impact/importance bug labels Aug 28, 2019
@henrikbrixandersen
Copy link
Member Author

This is still an issue with the most recent MCUX SDK (I was just bitten by this issue on bringing up a newly developed board today). Should we change all affected in-tree boards to use cs-gpios until this is resolved?

@agansari
Copy link
Collaborator

@henrikbrixandersen i've encountered similar issues with SPI on LPC55xxx, see my 2 pulls:

@pdgendt
Copy link
Collaborator

pdgendt commented May 17, 2023

Re-opening after discord discussion, as this issue is still relevant.

@pdgendt pdgendt reopened this May 17, 2023
@pdgendt pdgendt assigned decsny and unassigned MaureenHelm May 17, 2023
@decsny decsny removed the Stale label May 17, 2023
@aedancullen
Copy link
Contributor

aedancullen commented May 18, 2023

If CONFIG_SPI_MCUX_FLEXCOMM_DMA=y, this issue also appears with the spi_mcux_flexcomm driver.

(If CONFIG_SPI_MCUX_FLEXCOMM_DMA is not enabled, then spi_mcux_flexcomm has correct CS behavior regardless of whether GPIO or hardware CS is used.)

@github-actions github-actions bot added the Stale label Jul 18, 2023
@pdgendt pdgendt removed the Stale label Jul 18, 2023
@github-actions github-actions bot added the Stale label Sep 17, 2023
@DerekSnell DerekSnell removed the Stale label Sep 18, 2023
@zephyrproject-rtos zephyrproject-rtos deleted a comment from github-actions bot Sep 18, 2023
@zephyrproject-rtos zephyrproject-rtos deleted a comment from github-actions bot Sep 18, 2023
@decsny
Copy link
Member

decsny commented Jun 17, 2024

@henrikbrixandersen to be clear, I am finding the wrong behavior now with both gpios and native

... sorry about this, I just returned to this in the spirit of LTS bug fixing and realized I accidentally left the pin muxed to the native CS before when I tested the GPIO (both mux options were in the overlay), now I see the difference, woops

@decsny
Copy link
Member

decsny commented Jun 21, 2024

@henrikbrixandersen @pdgendt I found the cause of the problem, but fixing it requires relatively major changes, can you describe the reasons why do you want to use the native chip select instead of GPIO so we can determine priority of this

@henrikbrixandersen
Copy link
Member Author

@henrikbrixandersen @pdgendt I found the cause of the problem, but fixing it requires relatively major changes, can you describe the reasons why do you want to use the native chip select instead of GPIO so we can determine priority of this

We have been using GPIO CS until
now because of this issue, but they result in an increased latency compared to using the controller provided CS lines.

@decsny
Copy link
Member

decsny commented Jun 21, 2024

@henrikbrixandersen @pdgendt I found the cause of the problem, but fixing it requires relatively major changes, can you describe the reasons why do you want to use the native chip select instead of GPIO so we can determine priority of this

We have been using GPIO CS until now because of this issue, but they result in an increased latency compared to using the controller provided CS lines.

I am assuming you are using the synchronous API, so by latency do you mean the transceive function cpu time is significantly longer because of the need to control GPIO CS at the beginning and end of the transfer?

@henrikbrixandersen
Copy link
Member Author

I am assuming you are using the synchronous API, so by latency do you mean the transceive function cpu time is significantly longer because of the need to control GPIO CS at the beginning and end of the transfer?

Yes. The controlling the GPIOs at each end of the transaction takes up a significant portion of the total transaction time.

We've learnt to work around it. It would be nice to see it fixed, but it no longer has high priority for us.

@decsny
Copy link
Member

decsny commented Jun 21, 2024

I am assuming you are using the synchronous API, so by latency do you mean the transceive function cpu time is significantly longer because of the need to control GPIO CS at the beginning and end of the transfer?

Yes. The controlling the GPIOs at each end of the transaction takes up a significant portion of the total transaction time.

We've learnt to work around it. It would be nice to see it fixed, but it no longer has high priority for us.

Okay, one more question for you, as I am trying to figure out what the behaviour is supposed to be according to the (relatively undocumented) zephyr API: would you expect the chip select to deassert at the end of all the buffers passed to spi_transceive being clocked in/out, or after clocking out the the number of bits specified in the data frame size of the struct spi_config . operation flag?

@henrikbrixandersen
Copy link
Member Author

Okay, one more question for you, as I am trying to figure out what the behaviour is supposed to be according to the (relatively undocumented) zephyr API: would you expect the chip select to deassert at the end of all the buffers passed to spi_transceive being clocked in/out, or after clocking out the the number of bits specified in the data frame size of the struct spi_config . operation flag?

I would expect the chip select to be deasserted only after all buffers in a transaction has been clocked in/out.

@decsny
Copy link
Member

decsny commented Jun 21, 2024

Okay, one more question for you, as I am trying to figure out what the behaviour is supposed to be according to the (relatively undocumented) zephyr API: would you expect the chip select to deassert at the end of all the buffers passed to spi_transceive being clocked in/out, or after clocking out the the number of bits specified in the data frame size of the struct spi_config . operation flag?

I would expect the chip select to be deasserted only after all buffers in a transaction has been clocked in/out.

Okay, but if we take that to be the contract of the zephyr spi_transceive api, then I am wondering what is the purpose of the data frame size in the operation field of the spi_config.

To me it seems like the proper use of the api is to set the SPI_HOLD_ON_CS bit in the operation field of the spi_config struct, then release it with the spi_release api. I think otherwise it seems like the chip select delimits the amount of bits specified in the frame size. With the hold_on_cs bit set, then I guess as far as I can see the size of the frame is only seen to determine the appropriate return value in spi_transceive? Maybe @tbursztyka can help explain to me what is the expected behavior here. Either way the lpspi driver is currently implemented wrong.

@henrikbrixandersen
Copy link
Member Author

Hold on CS is for keeping the CS line asserted after the transaction (or across multiple transactions).

@tbursztyka
Copy link
Collaborator

Okay, one more question for you, as I am trying to figure out what the behaviour is supposed to be according to the (relatively undocumented) zephyr API: would you expect the chip select to deassert at the end of all the buffers passed to spi_transceive being clocked in/out, or after clocking out the the number of bits specified in the data frame size of the struct spi_config . operation flag?

I would expect the chip select to be deasserted only after all buffers in a transaction has been clocked in/out.

Okay, but if we take that to be the contract of the zephyr spi_transceive api, then I am wondering what is the purpose of the data frame size in the operation field of the spi_config.

To me it seems like the proper use of the api is to set the SPI_HOLD_ON_CS bit in the operation field of the spi_config struct, then release it with the spi_release api. I think otherwise it seems like the chip select delimits the amount of bits specified in the frame size. With the hold_on_cs bit set, then I guess as far as I can see the size of the frame is only seen to determine the appropriate return value in spi_transceive? Maybe @tbursztyka can help explain to me what is the expected behavior here. Either way the lpspi driver is currently implemented wrong.

As @henrikbrixandersen mentioned, CS line has to be asserted for the whole transaction. The buffers given to spi_transceive() are scatter-gather type, so the overall anyway represent one and only one transaction.

(The data frame size name in the documentation relates to the word size. This is mandatory to know how to interpret the buffers and how to r/w to/from the controller.
The SPI_HOLD_ON_CS is a specific configuration bit where you can request the CS line to stay asserted after the spi_transceive() call. There were a few use case supporting that feature back then.)

@decsny
Copy link
Member

decsny commented Jun 25, 2024

Okay, one more question for you, as I am trying to figure out what the behaviour is supposed to be according to the (relatively undocumented) zephyr API: would you expect the chip select to deassert at the end of all the buffers passed to spi_transceive being clocked in/out, or after clocking out the the number of bits specified in the data frame size of the struct spi_config . operation flag?

I would expect the chip select to be deasserted only after all buffers in a transaction has been clocked in/out.

Okay, but if we take that to be the contract of the zephyr spi_transceive api, then I am wondering what is the purpose of the data frame size in the operation field of the spi_config.
To me it seems like the proper use of the api is to set the SPI_HOLD_ON_CS bit in the operation field of the spi_config struct, then release it with the spi_release api. I think otherwise it seems like the chip select delimits the amount of bits specified in the frame size. With the hold_on_cs bit set, then I guess as far as I can see the size of the frame is only seen to determine the appropriate return value in spi_transceive? Maybe @tbursztyka can help explain to me what is the expected behavior here. Either way the lpspi driver is currently implemented wrong.

As @henrikbrixandersen mentioned, CS line has to be asserted for the whole transaction. The buffers given to spi_transceive() are scatter-gather type, so the overall anyway represent one and only one transaction.

(The data frame size name in the documentation relates to the word size. This is mandatory to know how to interpret the buffers and how to r/w to/from the controller. The SPI_HOLD_ON_CS is a specific configuration bit where you can request the CS line to stay asserted after the spi_transceive() call. There were a few use case supporting that feature back then.)

Okay, thanks, I think I was confused because in the LPSPI hardware, a word, frame, and transfer are 3 different things, then the NXP SDK has a different meaning of what a "transfer" is, and then the zephyr API has a different definition for these things as well than those. In the case of the LPSPI hardware, a "word" is how wide the writes to the transmit register are, whereas a "frame" consists of anywhere between 8-4K bits and therefore potentially multiple writes. I see the maximum data frame size in zephyr is 64 bits, is this because it's expected the register width will be up to 64 bits on some platforms? My question basically is, when I implement this driver, based on what I described, would it make sense to correlate the zephyr frame size to the LPSPI word size, and choose the lpspi "frame" size to be whatever is most convenient? Does the data frame size in the zephyr API impose/imply any structure on the contents of the spi buffers?

BTW I reread some of the comments in the header around the HOLD_ON_CS bit and what you are saying about it does make sense now when I read it like that. Again, I have been swimming in competing definitions of the same terms, so I was slightly confused at first.

@tbursztyka
Copy link
Collaborator

I see the maximum data frame size in zephyr is 64 bits, is this because it's expected the register width will be up to 64 bits on some platforms? My question basically is, when I implement this driver, based on what I described, would it make sense to correlate the zephyr frame size to the LPSPI word size, and choose the lpspi "frame" size to be whatever is most convenient? Does the data frame size in the zephyr API impose/imply any structure on the contents of the spi buffers?

The dfs size is all about the spi device you are dealing with and what the controller can support. When this API was designed, there were no identified devices or controllers able to deal with more than 32bits dfs. And I think on the controller's side I haven't seen any supporting 64bits since. Many are stuck to 8bits only even.
It does affect how you structure the buffers yes. An hypothetical 4bits dfs protocol on a spi device would require to use only 4 bits per-buffer byte. You cannot cram 2 frames on one byte, the api does not support such optimization (it would most likely require to shift the byte in the driver before giving it to the controller, it's a waste of time). And anyway since we make spi device drivers to be portable, they are using the most common controller dfs supported and buffers are meant to follow this.
Actually, I quickly looked, there is only one driver on a spi device that uses 16 bits as word size.

@decsny
Copy link
Member

decsny commented Jun 25, 2024

I see the maximum data frame size in zephyr is 64 bits, is this because it's expected the register width will be up to 64 bits on some platforms? My question basically is, when I implement this driver, based on what I described, would it make sense to correlate the zephyr frame size to the LPSPI word size, and choose the lpspi "frame" size to be whatever is most convenient? Does the data frame size in the zephyr API impose/imply any structure on the contents of the spi buffers?

The dfs size is all about the spi device you are dealing with and what the controller can support. When this API was designed, there were no identified devices or controllers able to deal with more than 32bits dfs. And I think on the controller's side I haven't seen any supporting 64bits since. Many are stuck to 8bits only even. It does affect how you structure the buffers yes. An hypothetical 4bits dfs protocol on a spi device would require to use only 4 bits per-buffer byte. You cannot cram 2 frames on one byte, the api does not support such optimization (it would most likely require to shift the byte in the driver before giving it to the controller, it's a waste of time). And anyway since we make spi device drivers to be portable, they are using the most common controller dfs supported and buffers are meant to follow this. Actually, I quickly looked, there is only one driver on a spi device that uses 16 bits as word size.

Given this, can I make the assumption that the data frames will be multiples of 8 bits? Because that would greatly simplify the implementation. LPSPI supports odd numbers of bits to be in the frame all the way up to 4K as I mentioned, and our HAL driver has a lot of control logic to account for this, if I can just make the assumption that the data frame / word size in the buffers from the zephyr API will be 8, 16, or 32 bits that would greatly simplify things. Is it expected by for the dfs in the operation field to ever be something like 9, 15, 27 or anything weird like this in zephyr?

BTW, the minimum word size for the LPSPI is 2 bits :o

@tbursztyka
Copy link
Collaborator

in practice expect multiple of 8bits yes. Many controllers or HAL, afaik, do not propose the possibility of a finer grain config of the word size. It was meant to be flexible, but the reality is that if you want to make portable code you need to go for the most commonly understood config.

Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions github-actions bot added the Stale label Aug 31, 2024
@ofirshe
Copy link
Contributor

ofirshe commented Sep 4, 2024

I'm encountering the same issue as I described here #77999.
Any suggestions on how to resolve it ?

@decsny
Copy link
Member

decsny commented Sep 4, 2024

I'm encountering the same issue as I described here #77999. Any suggestions on how to resolve it ?

NXP is planning to contribute some rework of the lpspi driver soon, there is no quick fix, the existing driver just doesn't meet the zephyr api and is broken in a lot of ways. This issue is actually boiling over lately on a lot of fronts so it will be addressed soon

@ofirshe
Copy link
Contributor

ofirshe commented Sep 4, 2024

Thanks for your response, @decsny. I hope this issue gets resolved soon.

Copy link

github-actions bot commented Nov 4, 2024

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions github-actions bot added the Stale label Nov 4, 2024
@decsny decsny removed the Stale label Nov 4, 2024
@ofirshe
Copy link
Contributor

ofirshe commented Nov 27, 2024

Hey,

Could you please let us know when NXP plans to address this issue? Currently, the driver defaults to relying on the hardware CS mode, which may not function as expected for new users attempting to utilize it.

Thanks.

@decsny
Copy link
Member

decsny commented Dec 5, 2024

I am working on it now.

@decsny
Copy link
Member

decsny commented Dec 13, 2024

after 6 years, we may have a solution finally to the oldest nxp platform issue in zephyr...

can anyone following this issue please try out the version of the driver in #82877 to see if it is working for your use case? as far as I can tell the native CS works with that now and the testing is all passing, but would like someone to see if it works when someone actually tries using it.

I am concerned if there is some race conditions that I have not found that the test doesn't catch or something. I found it difficult to support this API properly for the LPSPI due to the lpspi control fifo scheme being the CS control along with the stalling TX behavior. And I had to basically rewrite the entire driver to get this to work properly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: GPIO area: SPI SPI bus bug The issue is a bug, or the PR is fixing a bug platform: NXP Drivers NXP Semiconductors, drivers platform: NXP NXP priority: low Low impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.