Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

access log: add JSON logging mode #4693

Merged

Conversation

aa-stripe
Copy link
Contributor

@aa-stripe aa-stripe commented Oct 11, 2018

These are the code changes to add a JSON access logging mode. I am working on the docs/release notes, but wanted to get started on code review!

Description:

  • Add a new config option under access_log called json_format. This is a single level dictionary that contains strings as keys, and envoy access log format specifiers (such as %PROTOCOL%) as values. The specifiers will be replaced with actual values at logging time. I call this dictionary the "format dictionary" (as opposed to "format string").
  • You can specify only one of format (format string) or json_format (format dictionary). If neither are there, we fall back to the default string format.
  • Add the correct plumbing inside the configuration parsing to handle this.
  • Add a new access log formatter class that is instantiated with the format dictionary. It maintains the mapping of dictionary keys to loggers
  • Create a new class called FormatterProvider, to distinguish things that actually extract the information from a request. The things that combine together a bunch of FormatterProviders are still called Formatters. This is primarily a semantic/naming difference, but imo these are two conceptually separate things. There is, however no API difference, and if people are truly opposed to this, I could just merge them back into one Formatter class. This also provides a better foundation for adding more log formats in the future.
  • At present, only one specifier per key in the format dictionary is allowed. This is because the whole point of JSON logging is to make logs easily machine-parseable. If you can include multiple formats in the same field, then you'll be right back to parsing those manually
  • At present, only top-level keys are allowed in the format dictionary. This is validated at config load time. In the future, we can expand this to have nested dictionaries.

Risk Level:
Low. It's an optional feature that has to be explicitly enabled.

Testing:
Unit testing for the actual formatter, and config load. Also manually tested using an example config file.

Docs:

  • Amended Access Log docs to create a notion of "Format Strings" and "Format Dictionaries".
    • Put things that are common to access logs in general under "Format Rules", and then distinguished how strings and dictionaries are different.
    • Called out restrictions on format dictionaries
  • Added protobuf comments for format and json_format

Release Notes:
Will have to add Release notes

Fixes #2692

Signed-off-by: Aaltan Ahmad <[email protected]>
…om google::protobuf to Protobuf

Signed-off-by: Aaltan Ahmad <[email protected]>
Signed-off-by: Aaltan Ahmad <[email protected]>
Signed-off-by: Aaltan Ahmad <[email protected]>
Signed-off-by: Aaltan Ahmad <[email protected]>
Signed-off-by: Aaltan Ahmad <[email protected]>
@aa-stripe aa-stripe force-pushed the aa-stripe/2692-json-logging-second-try branch from 864f205 to 9064213 Compare October 11, 2018 20:36
@aa-stripe aa-stripe changed the title Aa stripe/2692 json logging second try access log: add JSON logging mode Oct 11, 2018
Signed-off-by: Aaltan Ahmad <[email protected]>
@aa-stripe
Copy link
Contributor Author

aa-stripe commented Oct 11, 2018

@alyssawilk This should work now. I think doing the rebase/signoff thing after rebasing on master confused things a bit.

cc'ing @mattklein123 @htuch @ggreenway

@aa-stripe
Copy link
Contributor Author

I also noticed the Once you submit a PR, please do not rebase it guidance

@htuch htuch self-assigned this Oct 11, 2018
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! Some comments to get started.

string format = 2;
oneof access_log_format {
string format = 2;
google.protobuf.Struct json_format = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment describing how this is interpreted? E.g. can we use the same custom access log formats in leaf strings, limitations on depth of nesting, etc.

Copy link
Contributor Author

@aa-stripe aa-stripe Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, this will be coming soon with the docs updates.

  • protobuf comment

public:
virtual ~FormatterProvider() {}

virtual std::string format(const Http::HeaderMap& request_headers,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a Doxygen comment here with args/return explanation?

Copy link
Contributor Author

@aa-stripe aa-stripe Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

  • doxygen comment on FormatterProvider

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Felt like it was redundant to call out the args, so just added an explanation and then an @return

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Envoy include/envoy APIs do have a verbose/redundant format, the best way to think of them is as official documentation that could ship without even the code (even though nobody uses them that way today, this is the style..).

@@ -84,5 +84,20 @@ class Formatter {

typedef std::unique_ptr<Formatter> FormatterPtr;

/**
* Interface for access log provider
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put this explanation that you have of Formatter vs FormatterProvider in the PR commit message here as well?

Copy link
Contributor Author

@aa-stripe aa-stripe Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

  • Add FormatterProvider explanation

for (const auto& pair : format_mapping) {
auto providers = AccessLogFormatParser::parse(pair.second);

// Enforce that each key only has one format specifier in it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this restriction? Why can't leaves have arbitrary format strings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good question. I mentioned it in the PR summary too:

At present, only one specifier per key in the format dictionary is allowed. This is because the whole point of JSON logging is to make logs easily machine-parseable. If you can include multiple formats in the same field, then you'll be right back to parsing those manually

The reason I chose to do this is to make the values sane. I didn't really see the value in doing something like:
{"protocol_and_duration": "%PROTOCOL% %DURATION%"}
because during log parsing you'd just have to manually parse that (with a regex or whatever) again.

The other use case would be to put plaintext in there, like:
{"protocol": "my protocol was %PROTOCOL%"}, which also doesn't seem particularly useful.

To me, it seemed saner to keep things simple. However this is an arbitrary restriction, and I'm not married to it. If you feel strongly that there should be multiple specifiers per key, let me know and I'll go ahead and lift that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my thinking was that the implementation of general purpose "%FOO% is %BAR% is already essentially free, you just reuse the existing access log formatter for leaf entries.

I can see legitimate uses here, e.g. where you are building semi-structured JSON for consumption by tools that don't care about the innermost structure.

It seems easier to just do this, as you have less validation to implement :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see that argument, but I feel that having a forcing function to keep logs structured is net win. Additionally, it's easy to add multi-token support in the future, but hard to take it back.

Similar to disallowing nested dictionaries I propose we stick to single-token leaves for the time being, and then expand that if we see demand for it. As you mentioned it'll be a really simple change if we decide to do it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, with the Envoy APIs we like to provide users flexibility and not to be too opinionated on their use. I generally would prefer if we relaxed the requirement, since it's not very hard to implement - we already have the formatter that should be able to do this in a single line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok great, will tackle this

  • relax one formatter requiredment

writer.String(pair.second.c_str());
}
writer.EndObject();
return fmt::format("{}\n", strbuf.GetString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative to using RapidJSON here is to build a google.protobuf.Struct and then do a Proto3 --> JSON conversion. I kind of prefer this, unless you want fine grained control over the output format.

Copy link
Contributor Author

@aa-stripe aa-stripe Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can look into this. Shouldn't be too complex, although there is a fair bit of boilerplate with constructing Proto3 objects (see the unit tests I wrote in config.cc). But I'll move forward with this anyway.

  • use protobuf to serialize instead of JSON

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turned out to be pretty straightforward

std::map<std::string, std::string> JsonFormatterImpl::to_map(
const Http::HeaderMap& request_headers, const Http::HeaderMap& response_headers,
const Http::HeaderMap& response_trailers, const StreamInfo::StreamInfo& stream_info) const {
std::map<std::string, std::string> output;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this to be ordered? Otherwise use unordered_map

Copy link
Contributor Author

@aa-stripe aa-stripe Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason. Unordered should be fine

  • switch to unordered_map

const Http::HeaderMap& response_trailers,
const StreamInfo::StreamInfo& stream_info) const override;

std::map<std::string, std::string> to_map(const Http::HeaderMap& request_headers,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be public? Should it just be a helper function in the .cc anonymous namespace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah not too happy about this either, the reasons were:

  1. it's kinda hard to test this otherwise. exposing the map function makes it easy to just compare dicts instead of going through a parse/unparse dance. this isn't a super strong reason though
  2. If we ever support other structured log formats (like proto), then the idea is that we would just rely on this function to create a c++ map, and then each new format would be a simple "serializer" that writes that out (even the current json stuff can be thought of like that). In that case, testing the map function provides more peace of mind

Copy link
Contributor Author

@aa-stripe aa-stripe Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you feel strongly though, lmk and I'm happy to make it anonymous

if (fal_config.access_log_format_case() == envoy::config::accesslog::v2::FileAccessLog::kFormat ||
fal_config.access_log_format_case() ==
envoy::config::accesslog::v2::FileAccessLog::ACCESS_LOG_FORMAT_NOT_SET) {
if (fal_config.format().empty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the logic for non-JSON different to before here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not strictly different, it has just been moved inside the check for
if (fal_config.access_log_format_case() == envoy::config::accesslog::v2::FileAccessLog::kFormat || fal_config.access_log_format_case() == envoy::config::accesslog::v2::FileAccessLog::ACCESS_LOG_FORMAT_NOT_SET)

i.e:

If the format key was provided, or neither key was provided, fall back to the old logic
else if the json_format key was provided, use the json logic
else something bad happened, bail

@@ -37,6 +51,18 @@ ProtobufTypes::MessagePtr FileAccessLogFactory::createEmptyConfigProto() {

std::string FileAccessLogFactory::name() const { return AccessLogNames::get().File; }

std::map<const std::string, const std::string>
FileAccessLogFactory::convert_json_format_to_map(ProtobufWkt::Struct json_format) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convertJsonFormatToMap, same other comemnts as above.

Copy link
Contributor Author

@aa-stripe aa-stripe Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • case change convertJsonFormatToMap

envoy::config::accesslog::v2::FileAccessLog fal_config;
fal_config.set_path("/dev/null");

auto json_format = new ProtobufWkt::Struct;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't want to be doing any raw new here, I think either make it stack allocated or a make_unique.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was having some trouble with that earlier, will give it another try

Copy link
Contributor Author

@aa-stripe aa-stripe Oct 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as far as I can tell, calling set_allocated_* with a non-null pointer causes protobuf to take ownership of that object, including releasing the object later (https://developers.google.com/protocol-buffers/docs/reference/cpp-generated#embeddedmessage). So I think using new is actually ok here.

Indeed, if I switch to using a unique_ptr and pass it in with .get(), I get a crash, presumably because the object gets freed twice, once by protobuf and the other by c++. It does work fine if you use make_unique and then .release() the pointer when passing it to set_allocated. I'm happy to do that if you want, although functionally, the effect is the same. Other option is to leave a comment on the call to new explaining why this is ok here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usual pattern is to have protobuf do this for you by reordering and doing a fal_config.mutable_allocated_json_format() first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Figured this out. Thanks for the tip, that was the missing piece

Signed-off-by: Aaltan Ahmad <[email protected]>
Signed-off-by: Aaltan Ahmad <[email protected]>
@aa-stripe
Copy link
Contributor Author

Thanks for the feedback @htuch! I agree with most of it, and I'll be updating the code shortly. Left a couple of explanations as well. In the meantime I've just some docs changes.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, taken another pass at it. Looks like it's getting into shape.

* Each value in the dictionary can only contain one command operator.


Command Operators
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I think "command operator" is not the right terminology, but I see it was existing before this PR. I probably would have used "format expressions" or the like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see if it's pretty straightforward to change this. I personally prefer "format specifiers" since it lines up with the C terminology

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left this as is for now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this was more just a non-actionable observation.

*/
class Formatter {
public:
virtual ~Formatter() {}

/**
* Return a formatted access log line.
* @param request_headers supplies thte request headers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: thte typo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


/**
* Extract a value from the provided headers/trailers/stream.
* @param request_headers supplies thte request headers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This got copy/pasted :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh! Fixed

* @param response_headers supplies the response headers.
* @param response_trailers supplies the response trailers.
* @param stream_info supplies the stream info.
* @return a string containing the complete formatted access log line.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: FWIW @return std::string containing.. is better Envoy Doxygen style.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -71,11 +71,20 @@ typedef std::shared_ptr<Instance> InstanceSharedPtr;

/**
* Interface for access log formatter.
* Formatters combine the output of FormatterProviders into a log output line.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this interface, it's a bit unclear how they do this combining; they don't take FormatterProviders as arguments to format..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. The combining takes place in the subclass which isn't entirely clear at this level. I changed to Formatters provide a complete access log output line for the given headers/trailers/stream.

const Http::HeaderMap& response_headers,
const Http::HeaderMap& response_trailers,
const StreamInfo::StreamInfo& stream_info) const {
auto output_map = this->toMap(request_headers, response_headers, response_trailers, stream_info);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: const auto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

std::string log_line;
auto conversion_status = ProtobufUtil::MessageToJsonString(output_struct, &log_line);
if (!conversion_status.ok()) {
throw EnvoyException(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't safely throw exceptions at runtime on the data path in Envoy, only during construction (config ingestion). Otherwise, Envoy will behave badly. We should have a more graceful failure in this scenario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, which do you prefer:

  • Return an empty string
  • Return a string containing something to the effect of Error converting access log output to JSON
  • pointer shenanigans to return an error code (I'm against this personally)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably JSON conversion error, it will be easier to understand WTF happened when this fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

const StreamInfo::StreamInfo& stream_info) const override;

std::unordered_map<std::string, std::string>
toMap(const Http::HeaderMap& request_headers, const Http::HeaderMap& response_headers,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want this accessible in tests, I suggest creating a test peer. I'm not sure why we need to test this internal behavior though, I'd skip it; it's pretty trivial.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took another look at this. If we stop using this function to test and make it private, then we'll be relying on either

  1. parsing the JSON string in order to do our asserts or
  2. hardcoding an expected JSON string and comparing the raw string

I'll attempt 1, and see how it goes. I suppose it's a more accurate test anyway. I was worried 2 might be susceptible to the order of keys changing and therefore randomly failing tests

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, 1 sounds good!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


private:
std::vector<FormatterProviderPtr> providers_;
std::map<const std::string, Envoy::AccessLog::FormatterProviderPtr> json_output_format_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment everywhere there is std::map indicating why order matters (or switch to unordered_map)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Will do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Order didn't matter here either, changed to unordered

@aa-stripe
Copy link
Contributor Author

Thanks for the valuable comments. I will upload changes soon

@aa-stripe
Copy link
Contributor Author

I do have one question for you here: #4693 (comment)

Signed-off-by: Aaltan Ahmad <[email protected]>
Signed-off-by: Aaltan Ahmad <[email protected]>
Signed-off-by: Aaltan Ahmad <[email protected]>
Signed-off-by: Aaltan Ahmad <[email protected]>
Signed-off-by: Aaltan Ahmad <[email protected]>
Signed-off-by: Aaltan Ahmad <[email protected]>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some nits and test structure suggestions.

* Each value in the dictionary can only contain one command operator.


Command Operators
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this was more just a non-actionable observation.

const Http::HeaderMap& response_trailers,
const StreamInfo::StreamInfo& stream_info) const {
const auto output_map =
this->toMap(request_headers, response_headers, response_trailers, stream_info);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you should be able to just write toMap rather than this->toMap. Personally I would inline toMap since it is now private, but up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the this->

@@ -382,6 +382,210 @@ TEST(AccessLogFormatterTest, startTimeFormatter) {
}
}

TEST(AccessLogFormatterTest, JsonFormatterTest) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you break these into seperate TESTs. It's best to have distinct tests to control reruns and have finer grained visibility into specific failures. Also, can you add a one line // description above each?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Was following the style of the file, will break them up now

StreamInfo::MockStreamInfo stream_info;
Http::TestHeaderMapImpl request_header{};
Http::TestHeaderMapImpl response_header{};
Http::TestHeaderMapImpl response_trailer{};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: {} not needed for these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

"Didn't find a registered implementation for name: 'INVALID'");
}

TEST(FileAccessLogConfigTest, FileAccessLogJsonConversionTest) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

string_value.set_string_value(std::string{"some_nested_value"});

ProtobufWkt::Value struct_value;
(*struct_value.mutable_struct_value()->mutable_fields())[std::string{"some_nested_key"}] =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: do you need these std::string conversions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I must have been doing something else wrong earlier when I put them in. Removed

JsonFormatterImpl formatter(key_mapping);

const auto parsed = Json::Factory::loadFromString(
formatter.format(request_header, response_header, response_trailer, stream_info));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you factor this repeated match pattern into a helper?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aa-stripe!

@htuch htuch merged commit de03926 into envoyproxy:master Oct 18, 2018
@aa-stripe
Copy link
Contributor Author

Thank you!

@mattklein123
Copy link
Member

@aa-stripe if possible can you do a follow up PR to add a release note? Thank you!

soya3129 pushed a commit to soya3129/envoy that referenced this pull request Oct 19, 2018
Add a new config option under access_log called json_format. This is a single level dictionary that contains strings as keys, and envoy access log format specifiers (such as %PROTOCOL%) as values. 

The specifiers will be replaced with actual values at logging time. I call this dictionary the "format dictionary" (as opposed to "format string").

You can specify only one of format (format string) or json_format (format dictionary). If neither are there, we fall back to the default string format.

Add the correct plumbing inside the configuration parsing to handle this.

Add a new access log formatter class that is instantiated with the format dictionary. It maintains the mapping of dictionary keys to loggers

Create a new class called FormatterProvider, to distinguish things that actually extract the information from a request. The things that combine together a bunch of FormatterProviders are still called Formatters. This is primarily a semantic/naming difference, but imo these are two conceptually separate things. There is, however no API difference, and if people are truly opposed to this, I could just merge them back into one Formatter class. This also provides a better foundation for adding more log formats in the future.

At present, only one specifier per key in the format dictionary is allowed. This is because the whole point of JSON logging is to make logs easily machine-parseable. If you can include multiple formats in the same field, then you'll be right back to parsing those manually

At present, only top-level keys are allowed in the format dictionary. This is validated at config load time. In the future, we can expand this to have nested dictionaries.

Risk Level: Low. It's an optional feature that has to be explicitly enabled.

Testing: Unit testing for the actual formatter, and config load. Also manually tested using an example config file.

Docs:

Amended Access Log docs to create a notion of "Format Strings" and "Format Dictionaries".
Put things that are common to access logs in general under "Format Rules", and then distinguished how strings and dictionaries are different.
Called out restrictions on format dictionaries
Added protobuf comments for format and json_format

Signed-off-by: Aaltan Ahmad <[email protected]>

Signed-off-by: Yang Song <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants