Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow a plugin to provide more than one option #787

Closed
pombredanne opened this issue Oct 4, 2017 · 15 comments
Closed

Allow a plugin to provide more than one option #787

pombredanne opened this issue Oct 4, 2017 · 15 comments

Comments

@pombredanne
Copy link
Member

We should have a way for a plugin to provide not one, but several CLI options
Also a plugin should be able to provide fully configured click Options as options and not be constrained in what the options args are.

This is needed whenever more than one option is required or when more advanced settings need to be provided for a given option. This would be needed for scan plugins where several have multiple options.

@yashdsaraf ping :)

@yashdsaraf
Copy link
Collaborator

@pombredanne Currently each pre-scan plugin works in a way that if an option such as --ignore abc is passed, any plugin in the pre-scan entrypoint with the entry name ignore is initialized with the user input abc.
If a plugin provides more than one option, what do you think would be a better approach?

  • Initialize the plugin with different user inputs depending on the options?
    for e.g --ignore-glob '*txt' --ignore-regex '^\.' will call the ignore plugin as
plugin('*.txt')
plugin('^\.')

or

  • Pass an iterable with all the user inputs and options to a single plugin object and leave it to the plugin to decide how to process that data.
    for e.g --ignore-glob '*txt' --ignore-regex '^\.' will call the ignore plugin as
plugin(dict('ignore-glob'='*.txt', 'ignore-regex'='^\.'))

@yashdsaraf
Copy link
Collaborator

Also, should we add a common plugin prefix to the options? say if ignore plugin defines 2 options glob and regex, do we add the plugin name such as --ignore-glob and --ignore-regex or we let the plugins practice complete autonomy in the naming scheme?

@pombredanne
Copy link
Member Author

@yashdsaraf Thanks... here is my thinking on this:

  1. We should generalize the notion of plugin to decouple the UI (e.g. the CLI option name) from the actual declaration (as an entry point). This would mean that every plugin would have the same structure and could contribute or more options as they please and define these options entirely.

  2. With this generalization a plugin becomes something that receives an iterable of resources and returns an iterable of resources. We should therefore better specify this resource object. In the future a plugin could express interest only in some resource that match certain criteria too.

With this, the only thing that a plugin would declare in the entry point is its "name" and its plugin callable for a given stage of the scan (pre, scan, post, output). All options would be provided by the plugin class itself. And all plugins would have the same interface: act on a stream of Resources.

This would mean reworking the way all the plugins work today to a single common way.
When they are invoked is the only thing that would be special to a given plugin.
This would IMHO pave nicely the support for scan plugins (and later the support for plugins that could be written in another language than Python)

Does this make sense?

@yashdsaraf
Copy link
Collaborator

Okay, this is what I got from that

  1. All the plugins will have a similar structure, i.e
  • They all take an iterable of resources as input and return the same after processing
  • They all define their own click option(s)
  • Only structural difference would be -- at what stage of a scan is a plugin called
  1. The Resource class will need some modifications to be handled by all the plugins

I think it's better to start restructuring the plugins to take in iterable of resource, rest points will follow automatically.

@pombredanne
Copy link
Member Author

perfect!

@yashdsaraf
Copy link
Collaborator

@pombredanne Like you said, ultimately all the plugins would return an iterable of resources. In this case, how would the output plugins work?

@pombredanne
Copy link
Member Author

Good point: the output plugins could eitherr:

  • return an iterable too.... as there could be more than one and the caller (e.g. the CLI) would just consume this iterable doing nothing after piping through the output plugins. e.g. with multiple outputs the output is not special anymore.
  • OR we tee/send an iterable to each output plugins and they just consume that ass they please

The later is likely cleaner

@yashdsaraf
Copy link
Collaborator

I'm sorry I didn't get that, could you give an example of what should happen if a user calls scancode cli with multiple format options.

@pombredanne
Copy link
Member Author

See the discussion in #789

@yashdsaraf
Copy link
Collaborator

Ah, got it. Each option supplied by the format plugins would take the output file path as input.
Also, what happens if a path is not given? Do we print the output to stdout or simply abort due to the missing param?

@pombredanne
Copy link
Member Author

@yashdsaraf no path given would mean stdout output which would be messy if there are more than one ; ) but that's OK, we cannot prevent folks to shoot themselves in the foot

@pombredanne
Copy link
Member Author

@yashdsaraf I think what would be most useful for @haikoschol is to have this available for post-scan plugins right now. Do you think you can have a look into this? Or shall I take a stab?

haikoschol pushed a commit to haikoschol/scancode-toolkit that referenced this issue Nov 24, 2017
pombredanne pushed a commit that referenced this issue Dec 8, 2017
@pombredanne
Copy link
Member Author

Shoot, in working to encapsulate Resources with PyFilesystem, I did hit PyFilesystem/pyfilesystem2#120 and #688 .... sigh. I need to beef things up to make things work :|

@pombredanne
Copy link
Member Author

FYI I submitted a PR to PyFilesystem/pyfilesystem2#121 in order to have proper handling of non-fs decoable bytes-only paths. Based on this I am going forward with the refactoring of the Resource and scan pipeline around PyFilesystem2 which should provide a clean abstraction and simplify vastly creating new plugins

pombredanne pushed a commit that referenced this issue Jan 17, 2018
pombredanne added a commit that referenced this issue Jan 17, 2018
pombredanne added a commit that referenced this issue Jan 17, 2018
Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue Jan 17, 2018
pombredanne added a commit that referenced this issue Jan 17, 2018
pombredanne added a commit that referenced this issue Jan 17, 2018
pombredanne added a commit that referenced this issue Jan 17, 2018
pombredanne added a commit that referenced this issue Jan 26, 2018
 * otherwise some test fail in somce layouts

Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue Jan 26, 2018
Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue Jan 26, 2018
pombredanne added a commit that referenced this issue Jan 26, 2018
Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue Jan 26, 2018
pombredanne added a commit that referenced this issue Feb 1, 2018
 * all plugins are now required to declare which attributes they add to
   the scan as a list of attr attributes as declare the sort order in 
   which these should appear on the scan results
 * To suppprt these plugin-contributed scan attributes, a new Resource
   subclass is created on the fly based on plugin provided attributes
 * Since Scanners functions return the actual attributes that are set
   directty on the Resource oject, the API has been updated accordingly.
   Scanner functions now return a mapping.

 * the whole Resource caching system has been updated: resources are
   either entirely in memory or entirely serialized on disk for caching. 
   The new --max-in-memory option sets how many Resources are kept in
   RAM, default to 10K.
 * codebases are no longer cached in a ugly globals

 * only pre-scan plugins can defined a set of "requires" scans that are
   executed in a scan stage before the "pre-scan" (a tad of an ugly
   name).
 * no option is used as a default anymoe: all options need to be set
   explicitly (such as mark-source needing info)
 * codebase.remove_resource can now remove resource during a walk safely 
 * info is now a proper scanner and is no longer super special
 * the plugin entrypoint name is NOT used anymore as the primary key
   for scan results attribute of a scan
 * cli.run_scanners is a new function to encapsulate the scanner runs

Signed-off-by: Philippe Ombredanne <[email protected]>
JonoYang pushed a commit that referenced this issue Feb 1, 2018
JonoYang pushed a commit that referenced this issue Feb 1, 2018
 * ensure that a File option used in output cannot bet set to the value
   of a command line option. This is done with FileOptionType, a
   click.File subclass that verifies that a file name is not one of the
   avaialble options and fails if this is the case.
   Reported by @JonoYang
 * move CommandLineOption to scancode.__init__.py
 * add new args to CommandLineOption: requires and conflicts. Each are
   a sequence of other option names that either conflict with this
   option or are require to be set with this option. --strip-root,
   --full-root, --verbose, --quiet and plugin options with option
   dependencies are now using this feature. The CLI fails early
   if requirements are not met or conflicts exist. Added a few simple
   tests for this.
 * add new sort_order arg to CommandLineOption and used it to
   set a relative sort order for options helps in a help group
 * add new documentation CLI help_group for help and other related
   options. Reorganized CLI help
 * renamed json-realted output CLI options to plain json-*

Signed-off-by: Philippe Ombredanne <[email protected]>
JonoYang pushed a commit that referenced this issue Feb 1, 2018
JonoYang pushed a commit that referenced this issue Feb 1, 2018
Signed-off-by: Philippe Ombredanne <[email protected]>
JonoYang pushed a commit that referenced this issue Feb 1, 2018
 * for now all stages are catching exceptions and rethrowing them with
   an error message(except for scan stages that already catch
   exceptions). This stops the executaion. A better way would be to
   continue trucking and disable the faulty plugins

Signed-off-by: Philippe Ombredanne <[email protected]>
JonoYang pushed a commit that referenced this issue Feb 1, 2018
 * the plugin for license reindexing is updated in next commit

Signed-off-by: Philippe Ombredanne <[email protected]>
JonoYang pushed a commit that referenced this issue Feb 1, 2018
Signed-off-by: Philippe Ombredanne <[email protected]>
JonoYang pushed a commit that referenced this issue Feb 1, 2018
 * reported by @haikoschol at
  #885 (comment)

Signed-off-by: Philippe Ombredanne <[email protected]>
JonoYang pushed a commit that referenced this issue Feb 1, 2018
 * reported by @haikoschol

Signed-off-by: Philippe Ombredanne <[email protected]>
JonoYang pushed a commit that referenced this issue Feb 1, 2018
 * reported by @haikoschol
 * timings are now correctly added if requested to each Resource
 * there is a test to ensure this works and is included in JSON 
   and JSON lines outputs
 * minor refactor of cli_test_utils.py and other minor clieanups

Signed-off-by: Philippe Ombredanne <[email protected]>
JonoYang pushed a commit that referenced this issue Feb 1, 2018
 * tests were failing on Windows otehrwise

Signed-off-by: Philippe Ombredanne <[email protected]>
JonoYang pushed a commit that referenced this issue Feb 1, 2018
 * this can happen if info scan failed somehow

Signed-off-by: Philippe Ombredanne <[email protected]>
JonoYang pushed a commit that referenced this issue Feb 1, 2018
 * otherwise some test fail in somce layouts

Signed-off-by: Philippe Ombredanne <[email protected]>
JonoYang pushed a commit that referenced this issue Feb 1, 2018
Signed-off-by: Philippe Ombredanne <[email protected]>
JonoYang pushed a commit that referenced this issue Feb 1, 2018
JonoYang pushed a commit that referenced this issue Feb 1, 2018
JonoYang pushed a commit that referenced this issue Feb 1, 2018
JonoYang pushed a commit that referenced this issue Feb 1, 2018
 * all plugins are now required to declare which attributes they add to
   the scan as a list of attr attributes as declare the sort order in 
   which these should appear on the scan results
 * To suppprt these plugin-contributed scan attributes, a new Resource
   subclass is created on the fly based on plugin provided attributes
 * Since Scanners functions return the actual attributes that are set
   directty on the Resource oject, the API has been updated accordingly.
   Scanner functions now return a mapping.

 * the whole Resource caching system has been updated: resources are
   either entirely in memory or entirely serialized on disk for caching. 
   The new --max-in-memory option sets how many Resources are kept in
   RAM, default to 10K.
 * codebases are no longer cached in a ugly globals

 * only pre-scan plugins can defined a set of "requires" scans that are
   executed in a scan stage before the "pre-scan" (a tad of an ugly
   name).
 * no option is used as a default anymoe: all options need to be set
   explicitly (such as mark-source needing info)
 * codebase.remove_resource can now remove resource during a walk safely 
 * info is now a proper scanner and is no longer super special
 * the plugin entrypoint name is NOT used anymore as the primary key
   for scan results attribute of a scan
 * cli.run_scanners is a new function to encapsulate the scanner runs

Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue Feb 6, 2018
 * also ensure that the extractcode cli does not depend
   on scancode cli.

Signed-off-by: Philippe Ombredanne <[email protected]>
@pombredanne
Copy link
Member Author

Merged in develop with #885 !

JonoYang pushed a commit that referenced this issue Feb 8, 2018
 * also ensure that the extractcode cli does not depend
   on scancode cli.

Signed-off-by: Philippe Ombredanne <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants