Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu/esp8266: improvements of I2C implementation #10127

Merged
merged 1 commit into from
Oct 10, 2018

Conversation

gschorcht
Copy link
Contributor

Contribution description

This PR

  1. improves the timing of the SDA and SCL signals which seems to critical in the I2C software implementation of the ESP8266,
  2. introduces the internal function _i2c_clear which clears the bus, and
  3. renames internal _i2c_*_sda and _i2c_*_scl functions to function names that are more clear, e.g., _i2c_clear_sda to _i2c_sda_low.

On 1:
The use of GPIOs in GPIO_OD_PU mode seems to be too slow for a some slaves when switching from PASSIVE HIGH to ACTIVE LOW and vice versa. Therefore, the GPIOs are now configured in GPIO_IN_PU mode with open-drain output driver. Signal levels are then realized as follows:

  • HIGH: The GPIO is used in the configured GPIO_IN_PU mode. In this mode, the output driver is in open drain mode and the GPIO is pulled up.
  • LOW: The output driver of the GPIO is temporarily activated (corresponds to the GPIO_OD_PU mode). The output value 0, which is written during initialization, then actively controls the output to low.

On 2:
Sometimes a slave blocks and drives the SDA line permanently low while SCL is kept high. Send some clock pulses in that case to clear the bus. This change improves the stability.

Testing procedure

Compile and flash the test application for a sensor which uses I2C. For example, I2C communication was not possible for MPU9250. MPU9150 driver can be used to test it:

make flash -C tests/driver_mpu9150 BOARD=esp8266-...

Issues/PRs references

Fixes #10115.

@miri64 miri64 added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Area: drivers Area: Device drivers Platform: ESP Platform: This PR/issue effects ESP-based platforms labels Oct 9, 2018
@miri64 miri64 requested review from MrKevinWeiss and dylad October 9, 2018 12:32
*/
gpio_init (_i2c_bus[dev].scl, GPIO_IN_PU);
gpio_init (_i2c_bus[dev].sda, GPIO_IN_PU);
gpio_clear (_i2c_bus[dev].scl);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not an ESP user, but I am wondering why you need to use gpio_clear after declare it as input ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's just for performance reasons. Logic HIGH is realized by disabling the output driver and leaving the output floating and pulled up to HIGH. Logically LOW is realized by activating the output driver and writing a 0 to drive the output active low. To avoid having to write the output every time you change from HIGH to LOW, this is done only once during initialization.

Copy link
Member

@dylad dylad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall changes look OK to me but I am not familiar with this family and I don't have any hardware to test..
What about you @MrKevinWeiss ?

@MrKevinWeiss
Copy link
Contributor

Yup yup, I will test and confirm. I have all of the hardware just a matter of having the time.

@MrKevinWeiss MrKevinWeiss added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Oct 10, 2018
@MrKevinWeiss
Copy link
Contributor

Test Findings with NodeMCU board (esp8266-esp-12x)

Manual testing was done with tests/periph_i2c firmware with the slave board being PHiLIP-N with fw rev 10009

So I played around with this and was not able to see a difference in quality on the scope between this and master. (white is master and yellow is this pr scl line)
esp8266i2c
esp8266i2coverlay

I don't know if the internal pullup resistors are different when using the input vs the od output with pullups.

Furthermore I was able to read constantly from PHiLIP (the testing tool) but when attaching the scope I recieved errors (this is probably because the scope adds capacitance).

From the signal quality measured, do you think it would be better to slow it down so it is more robust?

Regarding the lockup problem. I believe I know what you are talking about. It can be reproduced on many sensors by sending a i2c_read with NO_STOP flag followed by an i2c read and if the read data for the following byte has a high MSB (ex 0x80) and the address is low (ex <= 0x3F) it will lock up the scl line.

That behavior shouldn't happen with normal i2c transactions (no need for a repeated start on a read, only write).

I am not sure if sending in extra clocks is the correct thing to do, however, if we aren't thinking about multi-master setups I can't think why this would be a problem and it does seem to prevent an unpleasant lockup.

@gschorcht
Copy link
Contributor Author

I don't know if the internal pullup resistors are different when using the input vs the od output with pullups.

The pullups should be the same, but since there is no official documentation for ESP8266 and its pullups, nobody knows that.

From the signal quality measured, do you think it would be better to slow it down so it is more robust?

Unfortunatly, I couldn't take a look on signal quality with an oscilloscope. I only had a logic analyzer. What I can tell is that I had problems to communicate with a number of slaves with the master implementation, e.g., HTU21D, MPU9250. Downsizing the speed to I2C_SPEED_NORMAL or even I2C_SPEED_LOW didn't help. Since these slaves worked with the Ardunio core for ESP8266 without any problems, I took a look to the source. The difference was that the Arduino core uses the GPIOs in GPIO_IN_PU mode instead of GPIO_OD_PU for passive HIGH. After implementing passive HIGH in the same way, all these slaves are working fine.

From the signal quality measured, do you think it would be better to slow it down so it is more robust?

I don't think that it is necessary. The problem described above happened also at lower speeds. Also, I could use other slaves that support higher speeds with I2C_SPEED_FAST_PLUS. Other I2C implementations for ESP8266, for example esp-open-rtos, also use 400 kbps as the default speed. So I don't think that it is necessary to slow down.

Regarding the lockup

The problem was not that the SCL line was locked at low, but the SDA line. I could observe this sporadically with the master implementation. When I was looking through different I2C implementations because of this problem, I found the following comment in one I2C implementation: "Some slave devices will die by accident and keep the SDA in low level, in this case, master should send several clock to make the slave release the bus." There is no comment on how to deal with this situation in I2C specification.

Maybe it's not a problem anymore, but it does not bother. So I would leave it as it is.

if we aren't thinking about multi-master setups

If we would have no multi-master setups, a lot of things would be easier an more effiecient, e.g., the checks for arbitration losts. Should the RIOT I2C be multi-master capable? I didn't find any comment in I2C API documentation.

@MrKevinWeiss
Copy link
Contributor

Good enough reason for me for switching to the IN_PU.

I did mean the sda line not scl line actually (sorry). This is because the slave gets caught in the middle of clocking out it's data, holding the line down. The master can't do anything because in order to start or clock the sda must be free. It sounds like the problem I had a while back. Either way I think that is a good enough reason to include that in.

I don't think multi-master is that supported in RIOT (I am sure it can be used but more may be needed for proper error handling and recovery).

Copy link
Contributor

@MrKevinWeiss MrKevinWeiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvments, thanks for the work!
ACK!

@MrKevinWeiss MrKevinWeiss added Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines Reviewed: 3-testing The PR was tested according to the maintainer guidelines Reviewed: 4-code-style The adherence to coding conventions by the PR were reviewed according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines CI: needs squashing labels Oct 10, 2018
@MrKevinWeiss
Copy link
Contributor

Maybe squash and have a bit more detailed commit message then we should be good to go.

@dylad
Copy link
Member

dylad commented Oct 10, 2018

I don't think multi-master is that supported in RIOT (I am sure it can be used but more may be needed for proper error handling and recovery).

@MrKevinWeiss is right, we had multi-master support in mind during the I2C refactoring and we made small improvements with check for bus arbitration but the refactoring was already too big for handling it there. So there is still more to do if we really want to implement it.

@gschorcht
Copy link
Contributor Author

I don't think multi-master is that supported in RIOT (I am sure it can be used but more may be needed for proper error handling and recovery).

Perhaps a pseudomodule periph_i2c_mm should be defined in order to be able to explicitly activate the multi-master support if required. This would save some code size if multi-master setup is not needed. But maybe it would only be an advantage for software implementations and esp8266 i2c is the only software implementation where the code size is not a problem.

@dylad
Copy link
Member

dylad commented Oct 10, 2018

Perhaps a pseudomodule periph_i2c_mm should be defined in order to be able to explicitly activate the multi-master support if required

Actually this is not a bad idea but I think this can also benefit to hardware implementation.

The commit
- improves the timing of the SDA and SCL signals that fixes communication problems with some slaves (RIOT-OS#10115),
- introduces the internal function _i2c_clear which clears the bus when SDA line is locked at LOW, and
- renames internal _i2c_*_sda and _i2c_*_scl functions to function names that are more clear, e.g., _i2c_clear_sda to _i2c_sda_low.
@MrKevinWeiss
Copy link
Contributor

Yup but outside the scope of this PR. @dylad are we good to merge this one? Maybe we open a separate issue for better multi-master support?

@dylad
Copy link
Member

dylad commented Oct 10, 2018

@dylad are we good to merge this one

If your tests are satisfying then go !

Maybe we open a separate issue for better multi-master support?

Good idea, let's see if people are interested by such feature.

@MrKevinWeiss MrKevinWeiss merged commit 9a75ee0 into RIOT-OS:master Oct 10, 2018
@MrKevinWeiss MrKevinWeiss added the Impact: minor The PR is small in size and might only require a quick look of a knowledgeable reviewer label Oct 10, 2018
@gschorcht
Copy link
Contributor Author

Thanks

@gschorcht gschorcht deleted the esp8266_i2c_fix branch October 12, 2018 12:07
@gschorcht
Copy link
Contributor Author

@MrKevinWeiss

I am not sure if sending in extra clocks is the correct thing to do, however, if we aren't thinking about

Just for information, the bus clear procedure is described in I2C specification [1]:

"3.1.16 Bus clear
...
If the data line (SDA) is stuck LOW, the master should send nine clock pulses. The device
that held the bus LOW should release it sometime within those nine clocks. If not, then
use the HW reset or cycle power to clear the bus.
"
[1] NXP Semiconductors: The I²C-Bus Specification and User Manual, Rev. 6 — 4 April 2014

@MrKevinWeiss
Copy link
Contributor

Nice find! I wonder if the hardware i2c periphs can do that too. Something for me to experiment/read up on! (I am happy this is a master problem and not a slave)

@gschorcht
Copy link
Contributor Author

I wonder if the hardware i2c periphs can do that too.

When the SoC or MCU intergrates some standard I2C chip as IP core, probably yes. Otherwise, probably not. But even if all masters support it, I doubt that all slaves can. Therefore, the fallback option specified is to do a HW reset (if there is a pin at all) or to cycle power to clear the bus.

Unfortunately, I had to learn that my _i2c_clear implementation is not 100% compliant with the specification because I generate 10 clock pulses instead of 9. That's probably not a problem, but I should change when other changes are become necessary.

@gschorcht
Copy link
Contributor Author

@MrKevinWeiss While writing a driver for a sensor that is connected via I2C, I ran into a problem with mutex-based aquire/release handling. I'm asking you because of my impression that you were involved in the refactoring when the aquire/release approach was introduced and your are now responsible for i2c.

Let's suppose the current task is reading a number of data bytes from the sensor by calling any i2c_read_* function. The i2c device is aquired (the mutex is locked) and reading the data bytes is started. Now, the sensor generates an interrupt and the ISR has to read the interrupt status from the sensor. When it tries to access the i2c device by any i2c_read_* function, the device is busy and the trial to lock the mutex blocks the current thread in which the interrupt was generated, in our case the thread that is reading the data bytes from the sensor. That is, the ISR is blocked and does not return to the data reading thread which leads to a deadlock.

Was this case considered during the redesign?

@MrKevinWeiss
Copy link
Contributor

I think the i2c_acquire and release were there before the refactoring and functionality was not changed as far as I know (aside from exposing it in the tests). @dylad lead the refactoring but I will do my best maintain it.

Regarding the scenarios I am not sure how the scheduling in the riot kernel goes. I would assume if the mutex is locked (or the bus is acquired) when the ISR is trying to use the i2c bus it should try to acquire then yield due to the mutex being locked and (eventually) when the thread that has the mutex finally releases it it should be able to execute a read. This would be a priority inversion case (common with mutexes) but not a deadlock case (some RTOS's have special mutexes that raise the priority of the blocking thread until the mutex is free).

Am I thinking of the right scenario?

i2c_acquire
i2c_read(sensor_0)
>>> interrupt from sensor_1
>>> i2c_acquire (but can't because mutex is locked)
>> tasks that are higher priority than sensor_0 but less or equal to sensor_1 irq
(finish reading)
i2c_release
>>> i2c_read(sensor_1)
>>> i2c_release

@kaspar030
Copy link
Contributor

ISR's cannot call blocking mutex_lock()!

@gschorcht
Copy link
Contributor Author

ISR's cannot call blocking mutex_lock()!

Yes, this is exactly what I thought. However, this implies that an ISR must not use any I2C function. A possible approach would be to have a separate thread for the device driver and the ISR simply sends a message just to inform that an interrupt happened and has to be handled.
But, does it make really sense to create a separate thread for each device driver? Surely not. It would be in contradiction to the RIOT design rule to keep the number of threads small. And of course in principle, a device driver shouldn't be a separate process in an operating systems.

IMHO it is a very common case for interrupt driven devices that the source of the interrupt has to be determined within an ISR by accessing the device, in case of sensors accessing the i2c bus.

Might it be an architecture design question? IPC's msg_send for example is automatically non blocking in interrupt context.

@dylad
Copy link
Member

dylad commented Oct 19, 2018

this implies that an ISR must not use any I2C function

TBH I think this is a good approach because read I2C in a ISR context take much time (specially at 100kHz if we must read several bytes) I don't really like to block the whole program because ISR want to retrieve some data.

@kaspar030
Copy link
Contributor

Might it be an architecture design question?

There's not much we can do. ISR's share one stack, thus they cannot block without blocking all other ISRs, too. Would we implement a per-ISR stacks, that would have the same RAM overhead as threads.

IPC's msg_send for example is automatically non blocking in interrupt context.

Yes, and that is a source for hard-to-detect bugs. msg_send() from ISR does not send the message if the receiver cannot receive it (directly or via queue). That means if msg_send() is used to convey ISR events to user-space, they might get lost if the return value is not checked.

How would a mutex_lock() from ISR behave if the mutex is not free? Just become "mutex_try_lock()" automatically?

i2c_acquire() would need to check the return value for each mutex_lock() call, each caller of i2c_acquire() would need to check for the corresponding error, ...

All that added complexity for allowing i2c access from within ISRs, which kills any scheduling guarantees anyways.

The way to save threads would be to use a shared event handling thread, e.g., using the event module.

@gschorcht
Copy link
Contributor Author

@kaspar030

All that added complexity for allowing i2c access from within ISRs, which kills any scheduling guarantees anyways.

Agreed.

The way to save threads would be to use a shared event handling thread, e.g., using the event module.

Ok, I see, thanks for this hint, it was not easy to find it in documentataion. I will try to solve the ISR I2C access problem of the driver using the event queue.

@gschorcht
Copy link
Contributor Author

@dylad

I don't really like to block the whole program because ISR want to retrieve some data.

Although it takes only about 0.1 ms to get one interrupt status byte at 400 kHz, you are right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: drivers Area: Device drivers CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Impact: minor The PR is small in size and might only require a quick look of a knowledgeable reviewer Platform: ESP Platform: This PR/issue effects ESP-based platforms Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines Reviewed: 3-testing The PR was tested according to the maintainer guidelines Reviewed: 4-code-style The adherence to coding conventions by the PR were reviewed according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants