EasyDMA doesn't read from Flash #37
Comments
Experienced the same just today.
4th option: copy to RAM. When implementing embedded_hal traits, where one does not have control over the external calling code, it could be necessary to copy from flash to RAM and then use EasyDMA (which of course feels wrong). I needed to solve this problem, so I created a rough implementation as proof of concept (it needs some cleanup before I can turn it into a pull request).
@simonsso true, copying to RAM is always an option; however, I'd like to avoid excess copies when not needed. I think the approach you're taking in simonsso@cee4745 is definitely interesting!
I like the fact that your implementation throws an error when the buffer is not in RAM. What I do not really like is the implicit copy. It's nice if you just expect the SPI bytes to be pumped out, but for some cases (for example, SPI abuse for driving LEDs) this can kill you. At least on the nRF52832 there was a very tiny time gap between SPI transactions, which really can kill you in terms of timing (we had some major issues, even when directly chaining the transactions via the PPI on-chip bus). So I would rather have a separate method to assemble that buffer, which you can use in case of an erroneous transaction due to RAM boundaries. You could even chain that easily via results like this:
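A sketch of the kind of Result chaining meant here. All names (`Spim`, `copy_to_ram`, the error variants) are hypothetical stand-ins, not the real nrf52-hal API; the peripheral is mocked so the control flow can run on a host:

```rust
// Sketch of explicit copy-then-retry chaining. `Spim` is a mock; the
// `in_ram` flag simulates the pointer-range check a real driver would do.

#[derive(Debug, PartialEq)]
enum Error {
    DMABufferNotInDataMemory,
    BufferTooLong,
}

struct Spim;

impl Spim {
    /// Mock transfer: rejects buffers that are not RAM-backed.
    fn write(&mut self, buf: &[u8], in_ram: bool) -> Result<usize, Error> {
        if in_ram {
            Ok(buf.len())
        } else {
            Err(Error::DMABufferNotInDataMemory)
        }
    }
}

/// The conscious, explicit copy step the caller chains in on error.
fn copy_to_ram<'a>(src: &[u8], ram: &'a mut [u8]) -> Result<&'a [u8], Error> {
    let dst = ram.get_mut(..src.len()).ok_or(Error::BufferTooLong)?;
    dst.copy_from_slice(src);
    Ok(dst)
}

fn main() {
    let mut spim = Spim;
    const FLASH_DATA: [u8; 4] = [0xde, 0xad, 0xbe, 0xef]; // a flash constant on a real chip
    let mut ram_buf = [0u8; 8];

    // Chained via Result: the copy only happens when the direct write errors.
    let sent = spim
        .write(&FLASH_DATA, false)
        .or_else(|_| copy_to_ram(&FLASH_DATA, &mut ram_buf).and_then(|b| spim.write(b, true)));

    assert_eq!(sent, Ok(4));
}
```

The point of the shape is that nothing is hidden: the error is visible in the types, and the copy is a call the user writes out.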
That is, if you really always want to copy when things go south. Maybe you all see that differently, but I would prefer such a variant.
Please elaborate, I don't understand your concern. If you are using this trait and have your buffer in RAM, there will be no copying. The check in do_spi_dma_transfer is a double check when using the traits where the buffer copy has already been done; it is needed for the traits which do not yet implement this feature.
No, actually the trait will do the copying for you: https://github.com/nrf-rs/nrf52-hal/blob/master/nrf52-hal-common/src/spim.rs#L95. It is 100% copying ;) This happens implicitly inside the write function, which I highly dislike.
Yes, but at line 84 there is the same check. The first part is the no-copy version and the second part the copy version -- at least that is what the code is intended to do. It must be done silently inside this write function; otherwise this trait will not work for code using embedded_hal with the buffer in flash -- rewriting all such code is not possible.
Yes, I know it doesn't work from flash. And no, it doesn't have to happen silently: you can easily require a conscious copy before the transfer, and otherwise it'll error.
@Yatekii How do you mean? How could we solve this in another way for an embedded_hal trait user, given that the same code must work on several platforms? If the buffer were required to be mutable it would end up in accessible memory without a copy, but that would not implement the traits as specified. Or is there a way I am not aware of?
Maybe to help you understand why I don't like the implicit copy inside the write function:
Above I already gave you an example of how this could be done explicitly.
This is NOT a problem that the trait has to, or must, fix. All the others don't have to copy either. Actually, I would argue that it's absolutely not in the trait's spirit to copy implicitly. So as shown in the code sample, I would simply separate the copy code (also the batching code, for the above reason) from the write function.
In general, I think there are two opposing thoughts here:

1. The embedded_hal implementation should "just work" the same way on every platform, even if that means an implicit copy to RAM.
2. The HAL should never add hidden overhead; a buffer in flash should produce an explicit error, and any copy should be a conscious choice by the caller.
In general, IMO the users of the embedded-hal traits are the ones most likely to want portable, "just works" behavior.

In terms of performance impact: at 64 MHz, copying 255 bytes takes about 64 clock cycles, plus a little bit more to retrigger the transaction; let's call it 128 clock cycles. This is roughly the amount of time taken to send 1 bit at 500 kHz, or 1 byte at 4 MHz. This is certainly nonzero, though likely not significant for most use cases.

@Yatekii do you think people with high performance needs are likely to use the embedded-hal traits, rather than either using the nrf crate directly, or creating a wrapper type that implements the behavior they need? I feel like most people will investigate the actual source code when their hardware doesn't work, and if they realize that they should pre-buffer in RAM, they will remove the unexpected overhead. I also feel like most people will never notice this, or more likely will notice the RAM used for buffering first.
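A quick host-side sanity check of that arithmetic, using only the numbers quoted in the comment above:

```rust
// Back-of-envelope check: 128 cycles at 64 MHz equals one bit time at
// 500 kHz and one byte time at 4 MHz. Pure arithmetic, no HAL involved.
fn main() {
    let cpu_hz = 64_000_000.0_f64; // nRF52 core clock
    let copy_cycles = 128.0_f64; // ~64 cycles for the 255-byte copy, doubled for retrigger overhead
    let copy_secs = copy_cycles / cpu_hz; // 2 microseconds

    let one_bit_at_500khz = 1.0 / 500_000.0; // seconds per bit at 500 kHz
    let one_byte_at_4mhz = 8.0 / 4_000_000.0; // seconds per byte at 4 MHz

    assert!((copy_secs - one_bit_at_500khz).abs() < 1e-12);
    assert!((copy_secs - one_byte_at_4mhz).abs() < 1e-12);
    println!("copy overhead = {} us", copy_secs * 1e6);
}
```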
Sure, it's hard to build a working has-it-all interface; I don't expect that to be a thing. But when I give a buffer to a write function, I expect it to be used as-is. I agree that the performance overhead should be negligible; still, if it can be avoided, I like that option. The trait is intentionally designed in such a way that it can return a custom Error, so why not use that feature here? I really don't think it is asking too much to copy manually and just return an error if someone forgets. I mean, one of the cooler features is built-in Result types that you HAVE to check to avoid a warning. Whenever I have to do that, I actually look in the docs for what kind of errors could happen, which will actually tell me there is a caveat. If you just use the write function with the implicit copy, you never see that caveat.

The problem is that investigating a seemingly trivial problem always takes the most time, in my experience. I am not so much concerned about performance for the general case, but there are some cases (I had some of those) where it mattered. And writing your own interface, even though there are two unfavorable ones already, seems kinda silly to me if we can prevent this simply by explicitly calling a second function :)

One real good example where I would mind all of this: I wanted to build my Neopixel driver on the embedded-hal traits so any chip with a UART/SPI can use them. This would most likely not be possible with this implementation :) It's not exactly fair to field this example here though, as my driver needs a non-blocking DMA interface to enable high throughput whilst still being able to calculate animations, but it's a real use case I know of.

I think both approaches are totally valid and I won't be pissed if either one gets chosen, but my vote goes to the explicit one.
@Yatekii You make a very good point. I'm undecided right now, but you've almost got me convinced.
I think an argument can be made that a HAL, as the lowest-level safe API, shouldn't hide these things from the user. Instead it should make any potential problem explicit, and leave the "just works" to higher-level layers.
I think that depends on whether the people with high performance needs are writing a library or an application. I think it's reasonable to assume that a driver with very precise timing needs would be written using embedded-hal. I also wouldn't be surprised if some poor soul would be driven half-mad by weird timing issues, before thinking to look at the right code and discovering the implicit copy. So yeah, I'm still undecided, but definitely leaning in @Yatekii's direction.
Right now in the SPI driver there are both approaches: the low-level write function (the one which takes a pin as argument), which will fail with an error if the buffer is not in RAM, and the embedded_hal trait, which copies the buffer. For me the case is very clear: when working with code from other sources implementing the same interfaces, it MUST work the same for all platforms and all drivers. If for every driver I bring into my project I must check not only what traits they need but also whether they handle custom errors the intended way, incompatibility will very quickly become an unmanageable problem. I was intending to implement the copy buffer for all functions, but as I have been putting my arguments down in writing in this thread, I see they are only important for the external traits in, for instance, embedded_hal. @hannobraun I think embedded_hal should be this "just works" higher-level layer. @Yatekii wrote:
Correct, but only if the buffer is read-only in flash. This is not done if the caller provides a buffer in RAM: then there is no copy, and the buffer for the nRF52840 will be split only when it reaches the 16-bit limit.
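A sketch of the kind of splitting mentioned here. The 65 535-byte limit is taken from the "16-bit limit" in the comment; actual MAXCNT widths vary per chip, and `dma_chunks` is an illustrative helper, not the crate's API:

```rust
// Split a buffer into pieces no larger than an assumed DMA length limit.
const MAX_DMA_LEN: usize = 65_535; // assumed 16-bit transfer-length limit

fn dma_chunks(buf: &[u8]) -> impl Iterator<Item = &[u8]> {
    // `chunks` yields slices of at most MAX_DMA_LEN bytes; the last may be shorter.
    buf.chunks(MAX_DMA_LEN)
}

fn main() {
    let buf = vec![0u8; 100_000];
    let lens: Vec<usize> = dma_chunks(&buf).map(|c| c.len()).collect();
    assert_eq!(lens, vec![65_535, 34_465]);
    println!("transfer split into chunks of {:?} bytes", lens);
}
```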
I am aware of all of this :) I never talked about any other case than the one where flash contents are copied to RAM. You mention that all HAL implementations should behave the same way. I agree completely, but the current implementation with the hidden copy fails to do exactly that: it silently adds overhead and potentially different timing behavior when it decides that's necessary, without ever notifying the user. Having a custom error was intended by the writers of the trait; I think it should be used. If we want perfect simplicity with random implicit behavior, we chose the wrong language, I think. Also, if you write a lib that can potentially fail exactly that write, you can simply wrap the generic error type in your own error type and hand that out, and the final user of your lib gets perfect error handling. I don't even think there is a simplicity problem :) What I mean is this:
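A sketch of the error-wrapping idea (all type and function names are hypothetical; the HAL write is mocked so the propagation can be shown end to end):

```rust
// A driver wraps the HAL's transfer error in its own error enum, so the
// final user of the driver still gets proper, typed error handling.

#[derive(Debug, PartialEq)]
enum SpimError {
    DMABufferNotInDataMemory,
}

#[derive(Debug, PartialEq)]
enum DriverError {
    Bus(SpimError), // the wrapped HAL error
    BadResponse,    // a driver-level failure of its own
}

impl From<SpimError> for DriverError {
    fn from(e: SpimError) -> Self {
        DriverError::Bus(e)
    }
}

/// Mock of the HAL write: fails when the buffer is not RAM-backed.
fn hal_write(buf_in_ram: bool) -> Result<(), SpimError> {
    if buf_in_ram {
        Ok(())
    } else {
        Err(SpimError::DMABufferNotInDataMemory)
    }
}

fn driver_send(buf_in_ram: bool) -> Result<(), DriverError> {
    hal_write(buf_in_ram)?; // `?` converts SpimError into DriverError::Bus
    Ok(())
}

fn main() {
    assert_eq!(driver_send(true), Ok(()));
    assert_eq!(
        driver_send(false),
        Err(DriverError::Bus(SpimError::DMABufferNotInDataMemory))
    );
}
```

The `From` impl plus `?` is all the plumbing a driver needs; nothing about the flash/RAM caveat gets swallowed on the way up.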
That's a very nice way to handle things and not make them complicated.
I still agree with @Yatekii. I think the fact of the matter is, we can't make it behave the same on all platforms, because the platforms have different capabilities. The implicit copy does not make it behave the same, because on the level we're working on, timing matters. So the question simply becomes, which problem are we going to cause our users?

A. The implicit copy: everything appears to work everywhere, but with hidden overhead and potentially different timing behavior.
B. The explicit error: transfers from flash fail visibly, and the caller must consciously copy to RAM first.
My vote is for option B, the explicit error. In my opinion, driving a few people crazy to shield a larger group from straightforward problems is a bad trade-off.
I prefer Option A. I write some code which I want to run on several platforms, and it also uses third-party drivers for some hardware components. Those drivers use embedded_hal traits. I cannot put hardware-dependent code where the trait is called. With option B, the third-party driver will receive an error when writing some of its read-only data to SPI, but this will be detected in a place where it cannot be handled. I don't see how I could get those drivers working without allowing write with a buffer in flash.
I agree that this isn't the solution. A driver would need to pass that error back to its calling code. The solution would then be to copy the data to RAM in the first place, wherever that is appropriate. This could be in the driver code or in the calling code. A driver that receives a slice from its calling code could also decide to implement the same implicit copy that is currently implemented in our code. The difference would be that the driver author would have knowledge of the use case and its timing requirements, which we don't (and can't) have.

I realize this sucks, and that it has a negative impact. However, that negative impact takes the form of an obvious error. Either I can fix it in my own code, or at least I know where to open an issue or send a pull request. With the implicit copy, the fix for the problem would be exactly the same, except it would be non-obvious and require hours, days, maybe even weeks of debugging. Granted, it would affect a smaller number of people.

You also said the following earlier, which I think I have an answer for now:

"I think embedded_hal should be this 'just works' higher-level layer."
I don't think it can be. It operates on the level of peripherals, and it can't know whether those peripherals are used for something non-critical or to bit-bang a highly time-sensitive protocol. A higher-level layer that "just works" would have to be above that, in a driver API or something equally high-level.
Maybe @jamesmunns, @jscarrott & @wez can provide their opinions too. I think #50 depends on this too (it does implicit batching).
@Yatekii @hannobraun I haven't had a chance to look at this project for a few months (hopefully work will quiet down again soon), so I don't feel I can really comment on the specifics of this issue, but I think I can talk generally. This does raise an interesting architecture question: we have a library that runs on multiple platforms but can't always behave the same way on all of them. This can't be the only place this sort of issue will arise, and I would prefer a consistent approach. It might be simple in this case to add an implicit workaround, but it might not be in other cases, so I would like to see all cases made explicit. As an aside, I can't remember ever being upset at a library for forcing me to deal with an error case, but I definitely remember the times a library has tried to be clever and implicitly "help" me.
@nrf-rs/nrf52 do we maybe want to bring this up with the embedded WG, or maybe the embedded-hal team? I think this is probably worth discussing for the proposed "patterns book". I think the approach taken by @simonsso is very good if we want a "works at all costs" approach (though I think that is too strong a wording: you can get optimal behavior if you know to send data out of RAM, though I admit it is also possible to get it wrong and receive no notification). It might be good to discuss with the embedded-hal team whether the goal is portability at all costs, or portability with optimal behavior as the primary goal.
I found this issue after some confusion.

Summary

I think the two main points brought up in this thread are both equally valid (summarized below):

1. Implicit copies make drivers built on the embedded_hal traits portable: the same driver code works on every platform without modification.
2. Implicit copies hide overhead and timing changes from the user, so the abstraction is no longer zero-cost.
To these points, issues have been brought up that seem contradictory to the above points:

1. An explicit error surfaces inside third-party driver code, in a place where it cannot be handled.
2. An implicit copy silently changes timing, which can break time-critical use cases.
Example

Take a hypothetical example of a sensor that communicates via UART. This driver uses a 4-byte sync word before each command. For efficiency, the device driver crate stores this 4-byte sync word as a constant and sends it before each command to the device. The sync word is stored in flash, but commands are RAM-based. When an end user comes and utilizes the provided device crate, it won't work without a copy from flash to RAM, so the user has to clone the device crate and modify it specifically for use with this platform, which breaks the modularity that crates provide.

Proposal

Performing implicit copies without the end user's knowledge should be avoided, since this would not be a zero-cost abstraction. Instead, I propose that we develop a cargo feature within the HAL crate to control this behavior explicitly.
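The failure mode in the example can be sketched like this (hypothetical driver code; `SYNC_WORD` would sit in flash on a real target, and the RAM check is simulated by a flag so it runs on a host):

```rust
// Hypothetical UART sensor driver. On a real nRF52, `SYNC_WORD` lives in
// flash, so handing it straight to an EasyDMA-backed write fails.
const SYNC_WORD: [u8; 4] = [0xAA, 0x55, 0xAA, 0x55];

#[derive(Debug, PartialEq)]
enum Error {
    BufferNotInRam,
}

/// Stand-in for the EasyDMA-backed write: rejects non-RAM buffers.
/// On hardware this flag would be a pointer-range check against Data RAM.
fn dma_write(buf: &[u8], buf_is_in_ram: bool) -> Result<usize, Error> {
    if buf_is_in_ram {
        Ok(buf.len())
    } else {
        Err(Error::BufferNotInRam)
    }
}

fn send_command(cmd: &[u8]) -> Result<usize, Error> {
    dma_write(&SYNC_WORD, false)?; // fails: the sync word is a flash constant
    dma_write(cmd, true)
}

fn main() {
    // The fix the end user is forced into: copy the constant into RAM first.
    let mut sync_in_ram = [0u8; 4];
    sync_in_ram.copy_from_slice(&SYNC_WORD);

    assert_eq!(send_command(&[0x01]), Err(Error::BufferNotInRam));
    assert_eq!(dma_write(&sync_in_ram, true), Ok(4));
    println!("flash buffer rejected, RAM copy accepted");
}
```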
Makes sense to me.
In general, EasyDMA does not work with items in flash, e.g. a `&'static str`; the datasheet notes that EasyDMA can only access Data RAM. I did this accidentally, and did not receive any visible error when sending via UARTE. I think there would be 3 ways to address this:

- `debug_assert` that the slice pointer is in the RAM region
- perform a runtime check and return an error, along the lines of `if !(RAM_LOWER <= buf.as_ptr() as usize && buf.as_ptr() as usize <= RAM_UPPER) { return Err(...); }`
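A host-runnable sketch of that runtime check. `0x2000_0000` is the nRF52 Data RAM base; the 64 KiB size here is an assumption, since the real upper bound depends on the chip variant:

```rust
// Pointer-range check for EasyDMA buffers, with assumed RAM bounds.
const RAM_LOWER: usize = 0x2000_0000; // nRF52 Data RAM base
const RAM_UPPER: usize = 0x2001_0000; // assumed 64 KiB RAM (chip-dependent)

fn slice_in_ram(ptr: usize, len: usize) -> bool {
    // Both the start and the end of the slice must lie inside the window.
    ptr >= RAM_LOWER && ptr.saturating_add(len) <= RAM_UPPER
}

fn main() {
    // A buffer inside the assumed RAM window passes...
    assert!(slice_in_ram(0x2000_8000, 255));
    // ...while a flash address fails, as does a buffer running past RAM's end.
    assert!(!slice_in_ram(0x0000_4000, 4));
    assert!(!slice_in_ram(0x2000_FF00, 512));
    println!("bounds check behaves as expected");
}
```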