Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filebeat modules: first phase of the Go implementation #3333

Merged
merged 1 commit into from
Jan 16, 2017

Conversation

tsg
Copy link
Contributor

@tsg tsg commented Jan 10, 2017

This rewrites the following parts of the prototype in Go:

  • Reading of the filesets
  • Evaluation of variables, including the builtin variables
  • Creation of the prospectors configurations and starting them
  • Overriding of the variables from the config file and from the command line
  • The -module and -M CLI flags
  • The ability to override variables from multiple filesets using wildcards, for example: -M "nginx.*.var.paths=[/var/log/syslog*]"

At this point, the system tests are still using the filebeat.py wrapper, but
only for loading the ingest node pipeline. The prospector configuration part
and the overriding of settings is done by Filebeat itself.

Part of #3159.

@tsg tsg added the review label Jan 10, 2017
@tsg tsg mentioned this pull request Jan 10, 2017
22 tasks
@tsg tsg force-pushed the filebeat_modules_golang branch 2 times, most recently from 7a7dbb7 to e5ab950 Compare January 11, 2017 08:22
Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only had a first quick look and left some minor comments.

type ModuleConfig struct {
Module string `config:"module" validate:"required"`
Enabled *bool `config:"enabled"`
Filesets map[string]*FilesetConfig `config:"filesets"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this also required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the question. Whether we need Filesets part of the ModuleConfig?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't at least 1 fileset required in the config? Talking about the validate part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha :). It's actually that all filesets are enabled by default and you need to explicitly disable them. So a simple valid config would be:

modules:
- name: nginx

If you want to disable the error fileset, you need to write:

modules:
- name: nginx
  filesets.error.enabled: false

Note that I went for having the filesets a dictionary with all settings, unlike in Metricbeat were it's an array with the names. The reason is that I think it's going to be rare to disable filesets, and there lots of fileset specific settings.

Copy link
Contributor

@ruflin ruflin Jan 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Is there a reason you prefix the fileset config options with fileset. Namespace conflicts? The convention in metricbeat is to just use the metricset name, so above it would be error.enabled.

An other option about would be to use filesets: ["access"] if you want only one. If filesets is not around, all sets are loaded. I agree this makes sense in the filebeat module case. But this change would require the previous change, as otherwise filesets would have 2 meanings.

The following configs would have the same outcome.

modules:
- name: nginx
  error.enabled: false
modules:
- name: nginx
  filesets: ["access"]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side node: I like that the fileset can be disabled with the enabled: false flag. Makes it command line friendly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for prefixing with filesets is just to keep the implementation simple, in the sense that the ModuleConfig can be just:

type ModuleConfig struct {
	Module   string                    `config:"module"     validate:"required"`
	Enabled  *bool                     `config:"enabled"`
	Filesets map[string]*FilesetConfig `config:"filesets"`
}

And future module level settings can be added easily. In order to get rid of filesets, I'd need something like:

type ModuleConfig map[string]*common.Config

and then have code to extract the "enabled" and "name". I can look into doing that, although I'm not sure the extra code complexity is justified.

Btw, I wrote an overview on the user interaction here: #3159

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would vote for the additional complexity to replace filesets from the config and be consistent with metricbeat. This also make the command line options look more natural as there it is already without filesets. The nice part about having *common.Config that it provides already an Enabled() function, so not even unpacking is needed.

I didn't check the code details on this yet, but one solution is to work in general with *common.Config objects which are passed around and unpacking happens locally. So also each fileset will unpack its own config options

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsg Looking at #3159

Shouldn't it be:

modules:
- name: nginx
  access:
    close_eof: true

I removed the prospector path above. So it is actually map[string]ProspectorConfig. Each module in the end is 1 prospector. Or are there other configs inside access that could occur?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The command lines don't include filesets, just the configuration. For the cli, the options are prefixed with module.fileset, so for example nginx.access. From there on, the CLI flags match the structure in the configuration file. So to set close_eof you would type nginx.access.prospector.close_eof. But anyway, I'll look into getting rid of filesets, that would be cleaner.

There are more things to a fileset config than the prospector. Each fileset has configurable variables, which can be used in the prospector template and for deciding the pipeline. For example, you can set format: json and that variable is used in all the places we need it. I'd expect the module variables to be what 90% users need, and changing something directly in the prospector is "expert" mode. The pipeline itself is also a configuration at the fileset level.

}
}

return cfg, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a debug log entry here outputting the final merged config would be nice to have.


// Modules related command line flags.
var (
modulesFlag = flag.String("module", "", "List of enabled modules (comma separated)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be modules?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had "modules" in the first prototype, then thought that "module" would be nicer for the most common case. But you're right, I think "modules" makes more sense overall.

registry map[string]map[string]*Fileset // module -> fileset -> Fileset
}

// newModuleRegistry readsy and loads the configured module into the registry.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/readsy/reads

fields:
source_type: apache2-access
pipeline_id: {{beat.pipeline_id}}
input_type: log
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change means each fileset can max have 1 prospector (which I think is a good limitation)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, multiple prospectors didn't really work out with the requirement of being able to override some of the prospector settings when needed.

@tsg tsg force-pushed the filebeat_modules_golang branch from e5ab950 to bb13529 Compare January 11, 2017 08:57
@tsg
Copy link
Contributor Author

tsg commented Jan 13, 2017

@ruflin I added another commit to move all filesets one level up in the configuration (so get rid of the filesets key word). The added code complexity is acceptable, and the other main drawback is that we can't have module or enabled as a fileset name. That's not a problem now, but it might become one if we'll add more config options at the module level.

Overall, I think this is a good change.

@tsg tsg force-pushed the filebeat_modules_golang branch from 38ce7dc to f674899 Compare January 16, 2017 09:12
@tsg
Copy link
Contributor Author

tsg commented Jan 16, 2017

jenkins, test it

@@ -30,13 +33,32 @@ func New(b *beat.Beat, rawConfig *common.Config) (beat.Beater, error) {
if err := rawConfig.Unpack(&config); err != nil {
return nil, fmt.Errorf("Error reading config file: %v", err)
}

registry, err := fileset.NewModuleRegistry(config.Modules)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we introduce more an more different registries, perhaps we should call the variable moduleRegistry here, just to be sure there is no confusion.


// applyTemplate applies a Golang text/template
func applyTemplate(vars map[string]interface{}, templateString string) (string, error) {
tpl, err := template.New("test").Parse(templateString)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be text or test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that doesn't really matter, but also "test" is not really good since this is not a test :). I'll make it text, thanks.

enabled: true

# Format. Options are with_plugins and no_plugins
#var.pipeline: with_plugins
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking if we should change this from var to option. I somehow dislike var as it is not very descriptive. It is a configuration option for the metricset. As these are "metricset" specific perhaps we can even remove the var completely and have it under fileset.access.pipeline. As most fileset have pipeline and paths these two could be basic options shared across all filesets like ingest_pipeline?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering inlining var as well, like we did for fileset. That would be the nicest CLI experience but the chances of collisions are higher in there.

The reason I called them "vars" is that they can reference each other. But I think I'm good with option as well, have to think some more about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought was also to namespace it based on the conflicts. But I think the number of options will be quite small here and should not correlate with the existing once. As these are the variables for the template and (probably) rarely used, it's also ok if it is perhaps sometimes a little bit a longer name that prevents a conflict. It's great now to have prospector in it's own namespace :-)

This rewrites the following parts of the prototype in Go:

* Reading of the filesets
* Evaluation of variables, including the builtin variables
* Creation of the prospectors configurations and starting them
* Overriding of the variables from the config file and from the command line
* The -module and -M CLI flags
* The ability to override variables from multiple filesets using wildcards,
  for example:
    `-M "nginx.*.var.paths=[/var/log/syslog*]"`

At this point, the system tests are still using the filebeat.py wrapper, but
only for loading the ingest node pipeline. The prospector configuration part
and the overriding of settings is done by Filebeat itself.

Part of elastic#3159.
@tsg tsg force-pushed the filebeat_modules_golang branch from f674899 to ea58bdb Compare January 16, 2017 12:36
@ruflin ruflin merged commit 2eb95e4 into elastic:master Jan 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants