-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filebeat modules: first phase of the Go implementation #3333
Conversation
7a7dbb7
to
e5ab950
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only had a first quick look and left some minor comments.
type ModuleConfig struct { | ||
Module string `config:"module" validate:"required"` | ||
Enabled *bool `config:"enabled"` | ||
Filesets map[string]*FilesetConfig `config:"filesets"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this also required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand the question. Whether we need Filesets
part of the ModuleConfig
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't at least 1 fileset required in the config? Talking about the validate
part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, gotcha :). It's actually that all filesets are enabled by default and you need to explicitly disable them. So a simple valid config would be:
modules:
- name: nginx
If you want to disable the error fileset, you need to write:
modules:
- name: nginx
filesets.error.enabled: false
Note that I went for having the filesets a dictionary with all settings, unlike in Metricbeat were it's an array with the names. The reason is that I think it's going to be rare to disable filesets, and there lots of fileset specific settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Is there a reason you prefix the fileset config options with fileset
. Namespace conflicts? The convention in metricbeat is to just use the metricset name, so above it would be error.enabled
.
An other option about would be to use filesets: ["access"]
if you want only one. If filesets
is not around, all sets are loaded. I agree this makes sense in the filebeat module case. But this change would require the previous change, as otherwise filesets
would have 2 meanings.
The following configs would have the same outcome.
modules:
- name: nginx
error.enabled: false
modules:
- name: nginx
filesets: ["access"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side node: I like that the fileset can be disabled with the enabled: false
flag. Makes it command line friendly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for prefixing with filesets
is just to keep the implementation simple, in the sense that the ModuleConfig can be just:
type ModuleConfig struct {
Module string `config:"module" validate:"required"`
Enabled *bool `config:"enabled"`
Filesets map[string]*FilesetConfig `config:"filesets"`
}
And future module level settings can be added easily. In order to get rid of filesets
, I'd need something like:
type ModuleConfig map[string]*common.Config
and then have code to extract the "enabled" and "name". I can look into doing that, although I'm not sure the extra code complexity is justified.
Btw, I wrote an overview on the user interaction here: #3159
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would vote for the additional complexity to replace filesets from the config and be consistent with metricbeat. This also make the command line options look more natural as there it is already without filesets
. The nice part about having *common.Config that it provides already an Enabled()
function, so not even unpacking is needed.
I didn't check the code details on this yet, but one solution is to work in general with *common.Config objects which are passed around and unpacking happens locally. So also each fileset will unpack its own config options
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The command lines don't include filesets
, just the configuration. For the cli, the options are prefixed with module.fileset
, so for example nginx.access
. From there on, the CLI flags match the structure in the configuration file. So to set close_eof
you would type nginx.access.prospector.close_eof
. But anyway, I'll look into getting rid of filesets
, that would be cleaner.
There are more things to a fileset config than the prospector. Each fileset has configurable variables, which can be used in the prospector template and for deciding the pipeline. For example, you can set format: json
and that variable is used in all the places we need it. I'd expect the module variables to be what 90% users need, and changing something directly in the prospector is "expert" mode. The pipeline itself is also a configuration at the fileset level.
} | ||
} | ||
|
||
return cfg, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a debug log entry here outputting the final merged config would be nice to have.
|
||
// Modules related command line flags. | ||
var ( | ||
modulesFlag = flag.String("module", "", "List of enabled modules (comma separated)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be modules
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had "modules" in the first prototype, then thought that "module" would be nicer for the most common case. But you're right, I think "modules" makes more sense overall.
registry map[string]map[string]*Fileset // module -> fileset -> Fileset | ||
} | ||
|
||
// newModuleRegistry readsy and loads the configured module into the registry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/readsy/reads
fields: | ||
source_type: apache2-access | ||
pipeline_id: {{beat.pipeline_id}} | ||
input_type: log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change means each fileset can max have 1 prospector (which I think is a good limitation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, multiple prospectors didn't really work out with the requirement of being able to override some of the prospector settings when needed.
e5ab950
to
bb13529
Compare
@ruflin I added another commit to move all filesets one level up in the configuration (so get rid of the Overall, I think this is a good change. |
38ce7dc
to
f674899
Compare
jenkins, test it |
@@ -30,13 +33,32 @@ func New(b *beat.Beat, rawConfig *common.Config) (beat.Beater, error) { | |||
if err := rawConfig.Unpack(&config); err != nil { | |||
return nil, fmt.Errorf("Error reading config file: %v", err) | |||
} | |||
|
|||
registry, err := fileset.NewModuleRegistry(config.Modules) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we introduce more an more different registries, perhaps we should call the variable moduleRegistry here, just to be sure there is no confusion.
|
||
// applyTemplate applies a Golang text/template | ||
func applyTemplate(vars map[string]interface{}, templateString string) (string, error) { | ||
tpl, err := template.New("test").Parse(templateString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be text
or test
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that doesn't really matter, but also "test" is not really good since this is not a test :). I'll make it text, thanks.
enabled: true | ||
|
||
# Format. Options are with_plugins and no_plugins | ||
#var.pipeline: with_plugins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking if we should change this from var
to option
. I somehow dislike var
as it is not very descriptive. It is a configuration option for the metricset. As these are "metricset" specific perhaps we can even remove the var
completely and have it under fileset.access.pipeline
. As most fileset have pipeline
and paths
these two could be basic options shared across all filesets like ingest_pipeline
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was considering inlining var
as well, like we did for fileset
. That would be the nicest CLI experience but the chances of collisions are higher in there.
The reason I called them "vars" is that they can reference each other. But I think I'm good with option
as well, have to think some more about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first thought was also to namespace it based on the conflicts. But I think the number of options will be quite small here and should not correlate with the existing once. As these are the variables for the template and (probably) rarely used, it's also ok if it is perhaps sometimes a little bit a longer name that prevents a conflict. It's great now to have prospector in it's own namespace :-)
This rewrites the following parts of the prototype in Go: * Reading of the filesets * Evaluation of variables, including the builtin variables * Creation of the prospectors configurations and starting them * Overriding of the variables from the config file and from the command line * The -module and -M CLI flags * The ability to override variables from multiple filesets using wildcards, for example: `-M "nginx.*.var.paths=[/var/log/syslog*]"` At this point, the system tests are still using the filebeat.py wrapper, but only for loading the ingest node pipeline. The prospector configuration part and the overriding of settings is done by Filebeat itself. Part of elastic#3159.
f674899
to
ea58bdb
Compare
This rewrites the following parts of the prototype in Go:
-M "nginx.*.var.paths=[/var/log/syslog*]"
At this point, the system tests are still using the filebeat.py wrapper, but
only for loading the ingest node pipeline. The prospector configuration part
and the overriding of settings is done by Filebeat itself.
Part of #3159.