-
Notifications
You must be signed in to change notification settings - Fork 6.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib: cbprintf: add support for deferred formatting #30675
Conversation
8f1d04b
to
13cb193
Compare
e530096
to
d7d9b71
Compare
@pabigot thanks for posting this, looks very good. I think that it could be extended with 2 macros:
I can look into static packing as i think that it will give significant performance boost. Currently it takes 22us to pack |
I don't understand how static package generation could work since you have to know what the formatting is going to do to determine what value gets packed. Also the But go ahead and put together what you have in mind so it's more clear. |
@pabigot please take a look at https://github.com/nordic-krch/zephyr/commits/nordic/20201211b |
Thanks. I've fixed the swapped ldbl/dbl packing in this PR. I've also adopted your fix for Two requests for when you move to a PR for the log infrastructure:
|
7948d1f
to
f27c585
Compare
f27c585
to
2542d7a
Compare
Rebased on #31133; handling |
The extraction of values from varargs was inlined in cbvprintf() because va_list arguments cannot be passed into subroutines by reference, and there was only one place that needed this functionality. An upcoming enhancement requires extracting the arguments but processing them later, so outline the extraction code. Signed-off-by: Peter Bigot <[email protected]>
Format width and precision can be extracted from arguments. A negative value is interpreted by changing flag state and using the non-negative value. Refactor pull_va_args() so the raw value from the stack is preserved in the state so it can be packed, deferring the interpretation to point-of-use. Signed-off-by: Peter Bigot <[email protected]>
An upcoming enhancement supports conversion with information provided from sources other than a va_list. Add a level of indirection so the core conversion operation isn't tied to the stdarg API. Signed-off-by: Peter Bigot <[email protected]>
Several tests require additional stack space to accommodate a deeper call stack due to inability to inline functions in an upcoming enhancement. Signed-off-by: Peter Bigot <[email protected]>
In applications like logging the call site where arguments to formatting are available may not be suitable for performing the formatting, e.g. when the output operation can sleep. Add API that supports capturing data that may be transient into a buffer that can be saved, and API that then produces the output later using the packaged arguments. Signed-off-by: Peter Bigot <[email protected]>
2542d7a
to
7534d1f
Compare
I'm looking at this PR and something makes me feel uneasy. Let me explain. I don't dispute the need for deferred printouts, either for log gathering or Firstly is code complexity. This is using non negligible code footprint and Secondly I think the Lastly this makes coupling between the log facility and the print facility Now here's what I think could be done. I might be completely in the weeds Each pieces should be optimized to do only one thing and do it well. The logging facility should do just that: gathering data, store it Coupling between those two should be as small as possible so any of those parts So what the logging facility should gather is actually the format string On the log consumption side then we only need to reconstruct the va_list Am I missing something obvious? |
The whole thing makes me uneasy.
I'm not sure that's true, but I'm happy to withdraw this and have somebody else come up with an alternative. |
On Wed, 23 Feb 2021, Peter A. Bigot wrote:
I'm not sure that's true, but I'm happy to withdraw this and have somebody else come up with an alternative.
I'll try to prototype something to validate the idea.
|
@npitre please not that this PR is followed by #31102 which introduces "static" packaging (with the one here would be called "runtime"). Static packaging is attempting to form a package at compile time as simple assignments to an array. It is utilizing Before @pabigot proposed that i was experimenting with just dumping va_list since it is on the stack but there are two main issues there:
|
Also when pursuing alternative solutions keep in mind #32373 which identifies that |
On Wed, 24 Feb 2021, Krzysztof Chruscinski wrote:
Before @pabigot proposed that i was experimenting with just dumping va_list since it is on the stack but there are two main issues there:
- platform dependency, in case of 86_64 it is not even continues array on the stack
Yeah... same thing on Aarch64. That's a real abomination!
However there is a way around it. You could do:
```
aarch64_va_list_packer(int a, int b, int c, int d,
int e, int f, int g, char *fmt, ...)
{
va_list ap;
va_start(ap, fmt);
void content = ap.__stack;
...
}
#define va_list_packer(fmt, ...) \
aarch64_va_list_packer(0, 0, 0, 0, 0, 0, 0, fmt, __VA_ARGS__)
}
```
This way all arguments are overflowed to the stack. I'm sure a similar
trick can be used on x86-64.
- storing string arguments by value cannot be done. It is very important from deferred logging perspective to have string arguments in self-contained package
The trick of only storing pointers to strings when they live in
read-only memory is a nice one. Otherwise you may still append those
strings to the va_list blob, and prefix them with an offset value
indicating where the pointer to those string is located in the va_list
blob. This way, when unpacking you just have to stuff the new string
location at the indicated va_list position and the va_list consumer
won't care.
And if you want to compact things further, you could even remove string
pointers from the va_list buffer since you would already have the
information about where to re-insert them (although this would need a
some memmove() cycles).
Is this worth it? The code for this would be rather simple. Something
along those lines:
```
int length = 0;
signed char mode = -1;
while (*++fmt) {
if (mode < 0) {
if (*fmt == '%') {
mode = 0;
}
continue;
} else {
switch (*fmt) {
case '%':
mode = -1;
continue;
case 'd':
case 'u':
case 'x':
case 'c':
mode = -1;
case '*':
length += sizeof(int);
continue;
case 'f':
case 'g':
length = ROUND_UP(__alignof(double));
length += sizeof(double);
break;
case 'l':
if (fmt[1] == 'l') {
ALIGN_SIZE(length, long long);
fmt++;
} else {
ALIGN_SIZE(length, long);
}
break;
case 's':
char *s = va_list.__stack + length;
queue_string(s);
...
break;
}
mode = -1;
}
}
memcpy(buffer, va_list.__stack, length);
length += append_queued_strings(buffer + length);
return length;
```
What do you think?
|
OK! As always this wasn't as simple as anticipated, but I think I
managed something that isn't too bad.
Please see PR #32734 for an alternative to this one.
|
Superseded by #32734 |
In applications like logging the call site where arguments to formatting are available may not be suitable for performing the formatting, e.g. when the output operation can sleep. Add API that supports capturing transient formatting data into a buffer.
Also API that produces the output later using the packaged data.
These changes increase stack requirements on the vprintk path from 152 bytes to 208 bytes on ARM, in part because code that could formerly be inlined must now be a separate function as it can be invoked from multiple locations. Other architectures have different magnitude changes, particularly SPARC which incurs a 96 byte cost for every nested call.
Storing the data could be changed fairly easily to use a ring buffer if that's preferred.
Relates to #30353 (comment).