Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] uORB::Subscription subscribe directly to uORB device node object #11176

Closed
wants to merge 5 commits into from

Conversation

dagar
Copy link
Member

@dagar dagar commented Jan 9, 2019

TL;DR - we can save a relatively large amount of memory (maybe 20KB) and some cpu by updating uORB to bypass file descriptors.

Background

On NuttX PX4 uORB uses file descriptors to subscribe to a publication (open file), copy new data (read file), and various other operations (ioctl). This is flexible, but requires a relatively large amount of memory on a small system that doesn't scale well with additional tasks. It's also led to the invention of virtual files to extend px4 to Linux and QuRT.

On NuttX a file descriptor requires...

  • 4 bytes for the fd int
  • 16 bytes for the actual fd (stored in a fixed size array per NuttX task)
  • 8 bytes for the uORB SubscriberData (+ 8 bytes overhead for the allocation)
    = 36 bytes per orb subscribe

The larger problem is that each task has an array of file descriptors (16 bytes each) and we currently have a maximum of 54 file descriptors per task (CONFIG_NFILE_DESCRIPTORS), which still isn't enough for some logging configurations. A typical system has 20-25 tasks, which is 17-20 KB of memory.

Proposal

In this pull request I've updated the uORB Subscription wrapper class to be able to "subscribe" directly to a uORB device node. This bypasses the file descriptor issues entirely and offers a number of advantages.

Memory

Instead of a file descriptor the uORB::Subscription class now contains a pointer to the device node, required metadata to find the node before it exists (more on this later), and the equivalent of SubscriberData (13 bytes total). I also have a template variation in mind on the actual ORB_ID type that would remove for need for storing metadata in memory for a total of 8 bytes per subscription.

This PR updates the mavlink and logger modules to use uORB::Scription and also reduces the number of file descriptors from 54 -> 30. The result on a pixracer configured as a generic multicopter is an ~11 KB memory savings, and a ~4% decrease in cpu.

master

nsh> free
	     total       used       free    largest
Mem:        232928     200672      32256      30624
nsh> logger status
INFO  [logger] Running in mode: all
INFO  [logger] Full File Logging Running:
INFO  [logger] Log file: /fs/microsd/log/2019-01-09/04_26_58.ulg
INFO  [logger] Wrote 4.20 MiB (avg 26.36 KiB/s)
INFO  [logger] Since last status: dropouts: 89 (max len: 0.754 s), max used buffer: 4476 / 14336 B

PR (includes a 6 KB increase in the logger buffer, total free ~ 11 KB more than master)

nsh> free
	     total       used       free    largest
Mem:        232928     195680      37248      22160
nsh> logger status
INFO  [logger] Running in mode: all
INFO  [logger] Full File Logging Running:
INFO  [logger] Log file: /fs/microsd/log/2019-01-09/04_22_55.ulg
INFO  [logger] Wrote 2.62 MiB (avg 28.44 KiB/s)
INFO  [logger] Since last status: dropouts: 36 (max len: 0.574 s), max used buffer: 18870 / 20480 B

A secondary benefit is that subscribing via uORB::Subscription will not force the creation of a uORB::DeviceNode before publication. Device nodes created with no publishers is another few kilobytes of wasted memory in a typical system.

Performance

Additionally, bypassing file descriptors is also slightly faster. It doesn't seem like much, but these functions are called thousands of times every second. Below you can see the difference system wide after having only converted mavlink and logger. This will be helpful when the high rate modules are updated to use uORB::Subscription.

master

orb_check vehicle_local_position: 1000 events, 2527us elapsed, 2.53us avg, min 2us max 3us 0.500us rms
orb_stat vehicle_local_position: 1000 events, 2165us elapsed, 2.17us avg, min 2us max 3us 0.371us rms
orb_copy vehicle_local_position: 1000 events, 4269us elapsed, 4.27us avg, min 4us max 5us 0.444us rms

PR (same test, but using uORB::Subscription)

orb_check vehicle_local_position: 1000 events, 904us elapsed, 0.90us avg, min 1us max 1us 0.295us rms
orb_stat vehicle_local_position: 1000 events, 890us elapsed, 0.89us avg, min 1us max 1us 0.313us rms
orb_copy vehicle_local_position: 1000 events, 2691us elapsed, 2.69us avg, min 2us max 3us 0.462us rms

Overall system before

 PID COMMAND                   CPU(ms) CPU(%)  USED/STACK PRIO(BASE) STATE FD
   0 Idle Task                   66337 52.296   476/  748   0 (  0)  READY  3
   1 hpwork                       1853  1.678   680/ 1780 249 (249)  w:sig 16
   2 lpwork                        122  0.088   720/ 1780  50 ( 50)  w:sig  8
   3 init                        12158  0.000  1856/ 2580 100 (100)  w:sem  3
 440 mavlink_shell                9870  0.000   960/ 2020 100 (100)  w:sem  3
  11 dataman                         2  0.000   728/ 1180  90 ( 90)  w:sem  4
  40 mavlink_if0                  5562  4.946  1656/ 2540 100 (100)  READY 33
  41 mavlink_rcv_if0               418  0.441  1096/ 2836 175 (175)  w:sem 33
 461 top                           202  1.501  1304/ 1684 255 (255)  RUN    3
 436 log_writer_file               351  0.971   592/ 1148  60 ( 60)  READY 34
 165 sensors                      6583  3.180  1328/ 1964 249 (249)  w:sem 14
 167 commander                     988  0.883  1800/ 3212 140 (140)  w:sig 31
 170 commander_low_prio              2  0.000   552/ 2996  50 ( 50)  w:sem 31
 178 mavlink_if1                  9518  9.717  1720/ 2532 100 (100)  READY 36
 179 mavlink_rcv_if1               605  0.441  1504/ 2836 175 (175)  w:sem 36
 241 gps                            90  0.088  1064/ 1444 220 (220)  w:sig  5
 270 mavlink_if2                  1040  0.971  1488/ 2484 100 (100)  w:sig 24
 271 mavlink_rcv_if2               426  0.353  1120/ 2836 175 (175)  w:sem 24
 285 fmu                          2258  2.031   752/ 1324 252 (252)  w:sem  8
 380 ekf2                        10763  9.628  4536/ 6572 250 (250)  w:sem 20
 388 mc_att_control               4465  4.151   968/ 1660 251 (251)  w:sem 17
 396 mc_pos_control                933  0.883  1000/ 1860 250 (250)  w:sem 11
 405 navigator                     124  0.088   936/ 1764 105 (105)  w:sem 16
 432 logger                       2252  4.593  3032/ 3540 245 (245)  w:sem 34

Processes: 24 total, 5 running, 19 sleeping, max FDs: 54
CPU usage: 46.64% tasks, 1.06% sched, 52.30% idle
DMA Memory: 5120 total, 2048 used 2048 peak
Uptime: 115.100s total, 66.337s idle
nsh> 

Overall system after (includes 6KB increase in logger buffer)

 PID COMMAND                   CPU(ms) CPU(%)  USED/STACK PRIO(BASE) STATE FD
   0 Idle Task                   68793 56.183   476/  748   0 (  0)  READY  3
   1 hpwork                       1941  1.590   680/ 1780 249 (249)  w:sig 16
   2 lpwork                        122  0.176   784/ 1780  50 ( 50)  w:sig  8
   3 init                        12328  0.000  1816/ 2580 100 (100)  w:sem  3
 439 mavlink_shell                9210  0.000   960/ 2020 100 (100)  w:sem  3
  11 dataman                         1  0.000   728/ 1180  90 ( 90)  w:sem  4
  40 mavlink_if0                  5299  4.328  1608/ 2540 100 (100)  w:sig  6
  41 mavlink_rcv_if0               442  0.353  2304/ 3916 175 (175)  w:sem  6
 446 top                           668  1.413  1312/ 1684 255 (255)  RUN    3
 432 log_writer_file               429  0.441   592/ 1148  60 ( 60)  w:sem  8
 165 sensors                      6718  3.180  1288/ 1964 249 (249)  w:sem 14
 167 commander                    1020  0.795  1776/ 3212 140 (140)  w:sig 22
 170 commander_low_prio              3  0.088   552/ 2996  50 ( 50)  w:sem 22
 178 mavlink_if1                  9809  8.745  1720/ 2532 100 (100)  READY  8
 179 mavlink_rcv_if1               469  0.353  2520/ 3916 175 (175)  w:sem  8
 241 gps                            94  0.000  1064/ 1444 220 (220)  w:sig  5
 270 mavlink_if2                   945  0.795  1488/ 2484 100 (100)  w:sig  6
 271 mavlink_rcv_if2               461  0.353  2264/ 3916 175 (175)  w:sem  6
 285 fmu                          2442  2.120   848/ 1324 252 (252)  w:sem  8
 380 ekf2                        11008  9.717  4536/ 6572 250 (250)  w:sem 20
 388 mc_att_control               4428  3.886   952/ 1660 251 (251)  w:sem 17
 396 mc_pos_control               1038  0.883  1024/ 1860 250 (250)  w:sem 11
 405 navigator                     162  0.176   856/ 1764 105 (105)  w:sem 14
 422 logger                       1975  3.180  3024/ 3540 245 (245)  w:sem  8

Processes: 24 total, 3 running, 21 sleeping, max FDs: 30
CPU usage: 42.58% tasks, 1.24% sched, 56.18% idle
DMA Memory: 5120 total, 2048 used 2048 peak
Uptime: 119.744s total, 68.794s idle

Future Ideas

  • enum class generation for template types
  • bitset for orb_exists
  • some mechanism to declare topics as multi or not?
  • updateBlocking() to replace poll?

TODO

  • rename SubscriptionBase -> Subscription, Subscription -> SubscriptionData
  • look into removing the linked list
  • copy and move constructors needed (logger array push_back)
  • fmu-v2 flash
  • can we get rid of virtualization? Dropping the LinkedList helps.
  • test plan, in particular concurrency issues

@dagar
Copy link
Member Author

dagar commented Jan 9, 2019

Note - this is still early and needs quite a bit more work. I'd mainly like to start getting feedback on the overall design/direction and any other thoughts.

@dagar dagar added the proposal label Jan 9, 2019
@julianoes
Copy link
Contributor

Nice work! I think this is viable but I might not have all the context because I don't know why these file descriptors were used in the first place. I looked at the diff but it's quite big and seems to contain a lot of overall cleanup, so it's hard to see the meat of the change.

What I wonder is if this changes anything about the semantics of polling on topics. I was always assuming that was one of the reasons file descriptors were used.

@TSC21
Copy link
Member

TSC21 commented Jan 9, 2019

What's the impact of this on the FastRTPS communications?

Copy link
Member

@bkueng bkueng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your analysis is spot-on! And it's exactly where we need to reduce overhead to get a lean system.

Some initial thoughts:

  • I don't see major problems with the design. Obviously we need to be careful with Subscription, since it accesses uORB internals now (e.g. see next point, and also regarding address separation if we ever do that).
  • We'll need to check the impact on concurrency/thread-safety.
  • I'd make the default Subscription class not contain the topic data, and add a SubscriptionData (or whatever name) class that contains it, to encourage the use of subscription w/o data (for the cases where we can keep the data on the stack for example).
  • can we avoid the virtual overhead of Subscription?
  • Detail: if we could keep the add_topic in Logger generic to add all instances automatically, I'd prefer that. No big deal though.

I also have a template variation in mind on the actual ORB_ID type that would remove for need for storing metadata in memory for a total of 8 bytes per subscription.

I was also thinking of doing that but never got to it so far.

What I wonder is if this changes anything about the semantics of polling on topics.

I assume we'd still use the existing interface for that?

@@ -104,7 +166,7 @@ typedef SubscriptionBase SubscriptionTiny;
/**
* The subscription base class as a list node.
*/
class __EXPORT SubscriptionNode : public SubscriptionBase, public ListNode<SubscriptionNode *>
class SubscriptionNode : public SubscriptionInterval, public ListNode<SubscriptionNode *>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While at it, can you remove the linked list?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I'd like to, but need to review existing uses (mostly LPE at this point).

@dagar
Copy link
Member Author

dagar commented Jan 9, 2019

Nice work! I think this is viable but I might not have all the context because I don't know why these file descriptors were used in the first place.

I don't know the full history, but in theory it's nice if you can rely on existing system facilities for nearly everything (open/close/read/write/poll). We've gradually grown out various levels of wrapper and obfuscation though.

I looked at the diff but it's quite big and seems to contain a lot of overall cleanup, so it's hard to see the meat of the change.

The first commit is the uORB change itself, the rest are updates to use it. In particular look at uORBDeviceNode.

What I wonder is if this changes anything about the semantics of polling on topics. I was always assuming that was one of the reasons file descriptors were used.

For now we'd continue using a regular subscription and poll, but we'll be able to do it through these eventually. It could be something like an alternate update call that blocks (eg updateBlocking().

@dagar
Copy link
Member Author

dagar commented Jan 9, 2019

What's the impact of this on the FastRTPS communications?

None initially, but uORB::Subscription could grow a blocking update method that would then be good to consider within the RTPS bridge.

@dagar
Copy link
Member Author

dagar commented Jan 9, 2019

  • I'd make the default Subscription class not contain the topic data, and add a SubscriptionData (or whatever name) class that contains it, to encourage the use of subscription w/o data (for the cases where we can keep the data on the stack for example).

Good idea.

  • can we avoid the virtual overhead of Subscription?

Probably not in the first pass, but I think we'll be able to get there eventually. At the moment SubscriptionBase is fairly flexible to support the mavlink and logger usage, but a template version intended for embedding directly in modules (the vast majority of our orb usage) can be much simpler.

  • Detail: if we could keep the add_topic in Logger generic to add all instances automatically, I'd prefer that. No big deal though.

Kind of a side note, but it seems to me that topics should be explicitly multi-capable or single only. I think I'll come back to this idea later. For logger I was trying to avoid creating 4x SubscriptionBases for every topic when only a small portion of them are multi. Any suggestions?

I also have a template variation in mind on the actual ORB_ID type that would remove for need for storing metadata in memory for a total of 8 bytes per subscription.

I was also thinking of doing that but never got to it so far.

We need to update the generation to produce something like the enum class you created for params. I was also hoping to find a way to tie the topic and data structure together (not sure if this will be controversial) to make the declaration trivial. For example uORB::Subscription<px4::vehicle_status> would use vehicle_status_s references for update, and perhaps even have a getter that simply returns a vehicle_status_s by value.

What I wonder is if this changes anything about the semantics of polling on topics.

I assume we'd still use the existing interface for that?

Yes for now, but I'd like to add a blocking update function here as well (later). I think we can do slightly better than the performance of NuttX poll, while making it hard to misuse.

@dagar dagar force-pushed the pr-uorb_subscribe_direct branch from 738d2b0 to 20b2896 Compare January 11, 2019 15:25
@dagar dagar changed the title [RFC] uORB::Subscription subscribe directly to device node [WIP] uORB::Subscription subscribe directly to uORB device node object Jan 16, 2019
@dagar dagar removed the proposal label Jan 18, 2019
@dagar dagar force-pushed the pr-uorb_subscribe_direct branch from 20b2896 to 8d66a16 Compare January 18, 2019 18:01
@dagar dagar added this to the Release v1.9.0 milestone Jan 18, 2019
@dagar dagar force-pushed the pr-uorb_subscribe_direct branch 2 times, most recently from 8c90f4c to cc1575e Compare January 22, 2019 19:32
@dagar dagar force-pushed the pr-uorb_subscribe_direct branch 2 times, most recently from 32193b9 to 8d054a6 Compare January 23, 2019 23:30
@dagar dagar mentioned this pull request Jan 24, 2019
@dagar dagar force-pushed the pr-uorb_subscribe_direct branch 2 times, most recently from 7104888 to 3735558 Compare February 5, 2019 21:05
@dagar dagar modified the milestones: Release v1.9.0, Release v1.10.0 Feb 11, 2019
@dagar dagar force-pushed the pr-uorb_subscribe_direct branch from 3735558 to 13b1ce2 Compare February 11, 2019 01:01
@dagar dagar mentioned this pull request Mar 2, 2019
2 tasks
@dagar
Copy link
Member Author

dagar commented Jun 7, 2019

Implemented! #12040

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants