Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Logging subsystem overhaul #30353

Closed
nordic-krch opened this issue Dec 1, 2020 · 16 comments · Fixed by #31535
Closed

RFC: Logging subsystem overhaul #30353

nordic-krch opened this issue Dec 1, 2020 · 16 comments · Fixed by #31535
Assignees
Labels
area: Logging LTS Long term release branch related RFC Request For Comments: want input from the community

Comments

@nordic-krch
Copy link
Contributor

nordic-krch commented Dec 1, 2020

Introduction

Logging subsystem has been in zephyr for more than 2 years. During that time there were feature requests which could not be fulfilled with current design. As number of those requests has grown I've started to look into overhaul of logger internals. Overhaul does not plan to change logger application API but it will change backend API. Currently logger has following limitations:

Further weaknesses:

  • processing log message takes a lot of stack and flash as log message arguments need to be resolved to vla function:
    static void std_print(struct log_msg *msg,
  • internal handling of log messages is troublesome because of fragmented nature.

There were suggestions (@andyross) to switch to use ring buffer instead of current fragmented messages.

Proposed change

Plan is to rework internals of the logger to store log message in a ring buffer. This will require that backends handle message synchronously (make a copy if needed).

There will be only one type of a log message (instead of current distinction between standard and hexdump). Message will consist of header, arguments, data (optional), transient strings (optional). Logger will attempt to store arguments in a way that it can be directly casted to va_list (validated on majority of platforms). String arguments will point to space within the message.

_Generic keyword (https://en.cppreference.com/w/c/language/generic) will be used to detect types of logger message arguments at compile time. Any float arguments will be promoted to double, any char * arguments will trigger further processing which will involve detecting %s arguments, calculating length needed to store strings, copying strings to the message body. log_strdup will be deprecated.

This approach shall solve all weaknesses and limitations mentioned above.

As already mentioned, internal architecture change will impact backend interface. Currently, log_message handling is asynchronous, message is alive until last backend indicates that message is not used. This approach may give better memory utilization in certain scenarios because backend does not need to copy the message but, in general, new approach will more memory efficient due to ring buffer. Current approach comes from nrf5 sdk where it was used in a bare metal applications where asynchronous handling is more welcomed. In RTOS environment value of this approach diminishes.

Ring buffer needed for the logger may be generic since it is a variable length messages buffer with following arguments:

  • multiple producers
  • 3 modes: saturate, overwrite the oldest, block thread context until available
  • two consumers: standard and panic which may interrupt standard

There are further requests which does not imply architecture change and they will be gradually fixed:

@andyross please let me know if you have any comments.

Dependencies

Logger backends interface will change. It will be simplified as there will be only single message type, message will not be fragmented and arguments will form va_list which can be directly passed to the formatter.

Concerns and Unresolved Questions

_Generic keyword is C11 (#30105) and it is unclear which C standard is currently used by Zephyr. However, it currently compiles without enforcing C11. I didn't who enables C11.

@nordic-krch nordic-krch added area: Logging RFC Request For Comments: want input from the community labels Dec 1, 2020
@nordic-krch nordic-krch self-assigned this Dec 1, 2020
@koffes
Copy link
Collaborator

koffes commented Dec 3, 2020

Great! In relation to this work, I would like to highlight #23021 as well

@carlescufi carlescufi changed the title RFC: Logger subsystem overhaul RFC: Logging subsystem overhaul Dec 3, 2020
@pabigot
Copy link
Collaborator

pabigot commented Dec 7, 2020

I want to point out again that float arguments are already promoted to double when passed to a function that lacks a prototype (or in a varargs position), so there is nothing to do for "Any float arguments will be promoted to double". What has to be done is detecting that the value to be formatted takes more than one word on the stack.

I added cbprintf_arglen() to help with that issue, but that doesn't help with caching transient strings. If the refactored log system still needs to copy out state for later processing rather than perform it immediately then cbprintf could also be enhanced with a function that walks the format string extracting the specifications and provides them to a callback function, which would allow detecting string parameters precisely.

That wouldn't work with CBPRINTF_NANO, though, which lacks the format specification processing infrastructure. So if it can be done with _Generic that would be more flexible. I'm just not clear on how generic varargs functions would work.

@nordic-krch
Copy link
Contributor Author

@pabigot
So with _generic i am able to serialize all log arguments into an array, detect strings and make the copy into the message. What i missing now is on backend side when string needs to be formatted to be able to get args from serialized array instead va_list.

I was thinking about extending cbprint api with something like:

union cbprint_args {
  va_list ap;
  void *stream;
};

int z_cbvprintf(char_out, ctx, fmt, union cbprint_args args, bool is_va_list);

static inline cbvprintf(char_out, ctx, fmt, va_list ap)
{
  union cbprint_args args = {.ap = ap};
  z_cbvprintf(char_out, ctx, fmt, args, true);
}

static inline cbsprint(char_out, ctx, fmt, void *stream)
{
  union cbprint_args args = {.stream = stream};
 z_cbvprintf(char_out, ctx, fmt, args, false);
}

then change current cbvprint body and replace va_arg with something like:

#define arg_get(is_va_list, args, type) is_va_list ? va_arg(args.va_list, type) : get_from_stream(args.stream, type)

didn't yet try it but it seems to be doable.
What do you think?

@pabigot
Copy link
Collaborator

pabigot commented Dec 8, 2020

Can you point me to the proposed solution using _Generic? And how you determine the amount of space required for the stream with that interface?

I think the general idea of what you propose has potential, but I'm still concerned about ensuring argument processing is consistent throughout. Unless you're matching arguments against format specifications, which would require the callback API I proposed, I don't believe you can correctly detect strings. E.g. how do you handle a char* argument in a position that corresponds to %p? Making a copy of the string pointed to by the argument would be wrong.

I have some ideas of how to generalize the cbprintf API but want to understand the rest of your solution architecture before determining the most flexible enhancement.

@nordic-krch
Copy link
Contributor Author

All macro wizardry starts here: https://github.com/nordic-krch/zephyr/blob/7960fd342b65f1779e0c47c1dfc360237d61bf4d/include/logging/log_msg2.h#L399

First, _Generic is used to detect if there are any char * arguments by summing using FOR_EACH on arguments. If there are no char * then case is simple, there are no strings to copy. If there are any char * then complex path is taken (Z_LOG_MSG_EXT). In that path, next step is to build an array which on nth index contains pointer to the char * pointer in the serialized buffer or null if corresponding argument is other than char *. Next z_log_get_s_mask is used to get bit mask of %s in the string. Next, having those 3 things number of bytes needed for those strings are determined. Space is allocated. Then message is filled, strings are copied into message and array with pointers are used to update char * pointers in the serialized arguments.

The most important thing is that all this complex operations not compiled if log message does not contain char *.

There are two versions of macros: one that writes directly to the space allocated for message and second version which creates array with serialized arguments on the stack and pass it to the function. This is a bit slower but may take less rom and is suitable for user space. I will experiment to determine if it makes sense to keep both versions.

I will experiment tomorrow with cbprint extension.

@pabigot
Copy link
Collaborator

pabigot commented Dec 8, 2020

First, _Generic is used to detect if there are any char * arguments by summing using FOR_EACH on arguments.

But that's wrong. If I write:

LOG_DBG("label %s at %p", cfg->label, cfg->label);

I expect to see the string value of cfg->label and the pointer value of cfg->label. Displaying the address of some copy of the string made by the logging infrastructure is not acceptable.

The only way I can see to get correct output is to walk the format string, decode the specifiers, and use the specifier character to determine that a copy of a string needs to be stored somewhere. No macro wizardry would be involved.

I have API in mind that would do this and be flexible enough to handle a few other use cases I can imagine. Let me prototype that API and see whether it'll do what you need.

@nordic-krch
Copy link
Contributor Author

The only way I can see to get correct output is to walk the format string, decode the specifiers, and use the specifier character to determine that a copy of a string needs to be stored somewhere. No macro wizardry would be involved.

yes, that is exactly as you wrote. _Gereric is used to determine at compile time if extended version of macro is used or the simple one. Extended involves call to z_log_get_s_mask which decode specifiers and returns mask of %s types. So if if have LOG_DBG("label addr: %p", (char *)label) it will compile extended version but z_log_get_s_mask will return 0 and no strings will be copied.

In case of your example, logger will present exactly what you want, %p will point to original label address but label string will be copied into message payload and formatted from there.

@nordic-krch
Copy link
Contributor Author

Also it is possible to apply extension only to char * and not to const char * to save space but it may not always be correct (eg. if transient string comes as const char * function parameter).

@pabigot
Copy link
Collaborator

pabigot commented Dec 8, 2020

Ew. OK; where's z_log_get_s_mask()? Can you point me to the branch?

@nordic-krch
Copy link
Contributor Author

this function is already used in master:

uint32_t z_log_get_s_mask(const char *str, uint32_t nargs)

The branch i'm working on: https://github.com/nordic-krch/zephyr/tree/log_64bits

@pabigot
Copy link
Collaborator

pabigot commented Dec 8, 2020

Thanks; I see that now. It is another decoding of the format specifier, which I'd hoped to avoid because parsing the format sequence to extract conversions correctly is not trivial. (For example z_log_get_s_mask() will do the wrong thing if %n or the * width or precision flags are present.)

But my first attempt at sharing the conversion extraction fails, because of the way va_list objects need to be processed. It could be made to work by turning a fairly long sequence of code into a macro, and using that macro in multiple functions. Or possibly by completely inverting the cbprintf_complete implementation. That has some potential but I'm not going to dive into it right now.

My attempt is at: https://github.com/pabigot/zephyr/commits/nordic/20201208b

Please read the message of the top commit, which identifies the core problem. If, in principle, you think something like what's prototyped in the API could work, and agree that we should try to re-use the existing conversion extraction code, I'll revisit this.

@nordic-krch
Copy link
Contributor Author

That is more or less what I have in mind regarding struct cbprintf_packed_args but I assume that it is possible to build struct cbprintf_packed_args only based on types of arguments provided to the string using _Generic.

Walk might be second option when __VA_ARGS__ is not available and va_list is used (e.g. hooking external logger functions).

@nordic-krch
Copy link
Contributor Author

And definitely z_log_get_s_mask() belongs in cbprintf.

@pabigot
Copy link
Collaborator

pabigot commented Dec 9, 2020

Can we make logging select CBPRINTF_COMPLETE? Because if it has to work with nano the cbprintf changes will be a problem.

IMO it's reasonable to require complete support when using logging. @carlescufi you particularly wanted nano as an option, what do you think?

@carlescufi
Copy link
Member

carlescufi commented Dec 11, 2020

Can we make logging select CBPRINTF_COMPLETE? Because if it has to work with nano the cbprintf changes will be a problem.

IMO it's reasonable to require complete support when using logging. @carlescufi you particularly wanted nano as an option, what do you think?

Logging is big enough as a subsystem that selecting CBPRINTF_COMPLETE when enabling it is reasonable to me. The nano version of cbprintf is there for tiny apps that do not use the logging subsystem, just the console/printk.

@pabigot
Copy link
Collaborator

pabigot commented Dec 14, 2020

The solution proposed here as I understand it is based on external processing of the log format string and argument va_list to determine what each parameter is, mutating some arguments (such as transient strings), and saving them all off the stack for subsequent processing.

At a high level this seems reasonable, but I believe the packaging of data that will be processed by cbprintf must be done by cbprintf, for both correctness and maintainability. An example of a case that can't easily be handled externally is a %.5s print specification, for which the argument may be char * pointing to RAM text that has no terminating NUL: it must be copied, and knowing that only the first 5 characters will ever be accessed is the only way to make a copy. A more complicated related case is %.*s where the number of characters to display is provided in a preceding argument. Handling that sort of thing requires much more than knowing which positional arguments correspond to strings as with the existing z_log_get_s_mask().

I've created #30675 which extends the cbprintf infrastructure with something that captures formatting information into a buffer, allowing it to be processed later.

I have some hopes that this would make use of _Generic and varargs macro wizardry around log calls unnecessary, though that may be optimistic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Logging LTS Long term release branch related RFC Request For Comments: want input from the community
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants