
Timer.read_us takes 8.5µs, making wait_us unable to wait for less than that on Nucleo F103RB #7397

Closed
LaurentLouf opened this issue Jul 2, 2018 · 17 comments


@LaurentLouf
Contributor

LaurentLouf commented Jul 2, 2018

Description

  • Target : Nucleo F103RB
  • Toolchain : GNU ARM Embedded Toolchain 7-2017q4-major Release
  • Mbed-CLI 1.7.3 (mbed --version), mbed compile --profile release
  • Mbed-OS : #5bf483ee1
  • Code to reproduce issue
#include "mbed.h"
Timer timer;

int main() {

    timer.start();
    DigitalOut test(PA_1);
    int t1, t2, t3, t4, t5, t6, t7, t8;
    int t11, t12, t13, t14, t15, t16, t17, t18;

    while (1) {
        t1 = timer.read_us();
        wait_us(1);
        t2 = timer.read_us();
        wait_us(1);
        t3 = timer.read_us();
        wait_us(1);
        t4 = timer.read_us();
        wait_us(1);
        t5 = timer.read_us();
        wait_us(1);
        t6 = timer.read_us();
        wait_us(1);
        t7 = timer.read_us();
        wait_us(1);
        t8 = timer.read_us();

        t11 = timer.read_us();
        test = 0;
        test = 1;
        test = 0;
        test = 1;
        test = 0;
        test = 1;
        test = 0;
        test = 1;
        test = 0;
        test = 1;
        test = 0;
        test = 1;
        test = 0;
        test = 1;
        t12 = timer.read_us();
        test = 0;

        printf("%d \t%d \t%d \t%d \t%d \t%d \t%d \t%d\r\n\t%d \t %d\r\n", t1, t2, t3, t4,
               t5, t6, t7, t8, t11, t12);
        wait(0.5);
    }
}

Which produces

38 55 73 90 107 125 142
150 158

And more graphically: (two attached screenshots of the measured timings)

So I was investigating timing problems, and one thing led to another: I realized that timer.read_us and wait_us seem to be quite slow on my board. Since they both use ticker_read_us, I guess the problem lies somewhere there. I've Googled around, and this problem doesn't seem widespread (among other things, it breaks OneWire timings, so there should be a few reports of that), so it may come from my setup, but the example code is pretty straightforward. Any idea?

Issue request type

[ ] Question
[ ] Enhancement
[X] Bug

@jeromecoutant
Collaborator

jeromecoutant commented Jul 2, 2018

Hi

Yes, there has been an issue since version 5.9.1.

This is tracked by #7316

A first patch is #7352, but it doesn't look like everything is solved yet.
Maybe you can try it on your side?

Thx


@LaurentLouf
Contributor Author

LaurentLouf commented Jul 2, 2018

I applied patch #7352, recompiled with the same settings, and the results are the same.

@jeromecoutant
Collaborator

jeromecoutant commented Jul 4, 2018

Hi @LaurentLouf
Could you try the latest version of #7352?
Thx


@bcostm
Contributor

bcostm commented Jul 5, 2018

I don't think this PR will change anything, because it modifies HAL_GetTick(), which is only used by HAL_Delay() and not by wait_us().

I changed your example a little bit and did this instead:

    while (1) {
        test = 1;
        t1 = timer.read_us();
        wait_us(1);
        test = 0;
        t2 = timer.read_us();
        wait_us(5);
        test = 1;
        t3 = timer.read_us();
        wait_us(10);
        test = 0;
        t4 = timer.read_us();
        wait_us(50);
        test = 1;
        t5 = timer.read_us();
        wait_us(100);
        test = 0;
        t6 = timer.read_us();
        wait_us(500);
        test = 1;
        t7 = timer.read_us();
        wait_us(1000);
        test = 0;
        t8 = timer.read_us();
        printf("1us=%d  5us=%d  10us=%d  50us=%d  100us=%d  500us=%d  1000us=%d\r\n", t2-t1, t3-t2, t4-t3, t5-t4, t6-t5, t7-t6, t8-t7);
        wait(0.5);
    }

and the result is:

1us=16  5us=19  10us=24  50us=65  100us=114  500us=514  1000us=1012
1us=16  5us=19  10us=24  50us=65  100us=113  500us=514  1000us=1013
1us=15  5us=20  10us=24  50us=65  100us=114  500us=513  1000us=1014
1us=16  5us=20  10us=23  50us=65  100us=114  500us=514  1000us=1013
1us=15  5us=20  10us=24  50us=65  100us=114  500us=513  1000us=1014
1us=16  5us=16  10us=23  50us=65  100us=114  500us=513  1000us=1014
1us=16  5us=23  10us=24  50us=65  100us=114  500us=514  1000us=1013
1us=16  5us=16  10us=24  50us=64  100us=114  500us=513  1000us=1014

Below a 100µs delay, wait_us() is not precise. I don't know what precision we can expect with such short delays?


@LaurentLouf
Contributor Author

LaurentLouf commented Jul 6, 2018

Investigating a little bit further, the problem seems to come from ticker_read_us. I used this very simple sketch and watched the result on the scope. Without any modification, the burst length is around 8µs. Commenting out the call to slicetime() in Timer::read_high_resolution_us shrinks the burst length to a bit less than 2µs.

#include "mbed.h"
Timer timer;

int main() {
    timer.start();
    DigitalOut test(PA_1);

    while (1) {
        test = 1;
        timer.read_us();
        test = 0;
        wait(0.5);
    }
}

So it basically comes down to this:

us_timestamp_t ticker_read_us(const ticker_data_t *const ticker)
{
    initialize(ticker);

    core_util_critical_section_enter();
    update_present_time(ticker);
    core_util_critical_section_exit();

    return ticker->queue->present_time;
}

Since initialize is only performed once, this leaves update_present_time as the part that may be too slow.


@jeromecoutant
Collaborator

jeromecoutant commented Jul 6, 2018

Hi @bulislaw @c1728p9
Can you check last comment ?
Thx

@0xc0170
Contributor

0xc0170 commented Jul 13, 2018

Nucleo F103RB - what is the core clock for this target? How does it compare to an F4, for instance?
What profile was used for these measurements (how high are the optimizations)?

@LaurentLouf What are your timing requirements?

@LaurentLouf
Contributor Author

LaurentLouf commented Jul 16, 2018

@0xc0170 the core clock for this target is 72MHz, compared to 84MHz for the F4. I use the standard release profile (--profile release). The timing requirements are those of the OneWire protocol, so typically for the DS18B20 I should be able to wait 15µs reliably (or else I'll only read 1s).
(attached screenshot of the protocol timing requirements)

If you check the results from @bcostm, you see that waiting 10µs already yields an effective 24µs wait. So if I use wait for this protocol, I can't simply wait for the times specified by the protocol, which makes the code quite hard to maintain (and to port from one platform to another).

@0xc0170
Contributor

0xc0170 commented Jul 17, 2018

Below a 100µs delay, wait_us() is not precise. I don't know what precision we can expect with such short delays?

@ARMmbed/mbed-os-hal Can you review these requirements? I do not see it documented what we should expect from the wait API, only the resolution.

@LaurentLouf Given the requirements your app has, I would use the ticker API. In this case, the us ticker should be used.

@jeromecoutant
Collaborator

jeromecoutant commented Aug 14, 2018

@ARMmbed/mbed-os-hal Can you review these requirements? I do not see it documented what we should expect from the wait API, only the resolution.

bump :-)

Thx


@jeromecoutant
Collaborator

jeromecoutant commented Oct 2, 2018

@ARMmbed/mbed-os-hal Can you review these requirements?

@kjbracey
Contributor

kjbracey commented Oct 3, 2018

Software takes time to run, and that includes reading the time. At that clock rate, you're getting <=72 instructions in 1us. Almost any moderate piece of code will take a couple of microseconds.

Maybe this platform's timer read is a tad slow, but I don't think it's unreasonable.

Code that uses "wait_us(15)" to space things will always wait longer than 15us. The delay is a minimum, and it should normally be absolutely legal to run slow for any protocol that can be done using software-wiggling like this.

In this case it seems you do need a maximum pulse length. That is an unusual requirement for something using manual pin-wiggling.

To achieve it I would suggest a software CPU loop, not a timer loop. I don't think the us_timer can realistically be guaranteed to that sort of precision. And you must do it with interrupts disabled. Unfortunately we don't have a wait_ns call in the OS to do this - maybe we should add it.

There is a local delay_ns implementation in https://github.com/ARMmbed/mbed-os/blob/master/components/802.15.4_RF/atmel-rf-driver/source/NanostackRfPhyAtmel.cpp that works for M0-M7. I suggest using that as delay_ns(1000), and remembering to disable interrupts around it.

@kjbracey
Contributor

kjbracey commented Jul 1, 2019

I believe this is addressed by #10609 - wait_us has a fast path, and it's possible for other users to use that fast path. There is not (yet) any fast path for Timer::read_us itself though.

@ciarmcom
Member

ciarmcom commented Oct 2, 2020

Thank you for raising this detailed GitHub issue. I am now notifying our internal issue triagers.
Internal Jira reference: https://jira.arm.com/browse/IOTOSM-2242

@kjbracey
Contributor

Just realised this is still open. I'd say it can be closed - #10609 and #12897 have squished out all the inefficiencies in the usticker paths, so I think it's as fast as it's going to get, and #9812 brought the wait_ns alternative mentioned above into the main tree.

@kjbracey
Contributor

Ping @0xc0170

@adbridge
Contributor

adbridge commented Jun 7, 2021

Just realised this is still open. I'd say it can be closed - #10609 and #12897 have squished out all the inefficiencies in the usticker paths, so I think it's as fast as it's going to get, and #9812 brought the wait_ns alternative mentioned above into the main tree.
