Allow cache location configuration #357

Closed
pombredanne opened this issue Nov 7, 2016 · 7 comments

@pombredanne
Member

Since this cache can eventually grow large and does not play well with network storage (NTFS shares or NFS), the location where it is stored on disk should be configurable, ideally with environment variables.

@pombredanne
Member Author

pombredanne commented Dec 1, 2016

Here is my proposal for this:

  • For development usage (from a git checkout) we would detect the presence of a .git directory and, as today, use a .cache directory (or maybe a .tmp directory) at the root of the checkout.

  • For non-development installations (i.e. without a .git directory) we would by default store all the temporary and cached files in a .scancode directory in the user home directory. Since multiple installations and versions can be installed at the same time, we would use a unique key for each installation as a subdirectory of the .scancode directory. That key would be computed from a hash of the code tree, the same way it is computed today for the license cache validity check, see https://github.com/nexB/scancode-toolkit/blob/c57dab57ff74723e783b7c0dfeb7032f9bb7e84e/src/licensedcode/cache.py#L61. This would ensure that the subdirectory is unique to an installation, and that a new temp directory is created even if some files in that installation are updated manually.

  • Additionally, if a SCANCODE_TEMP_DIR environment variable is defined and points to a directory path, it would be used instead, overriding the location used for cache and temp files in both dev and regular mode.
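A minimal Python sketch of the lookup order proposed above. This is not scancode's code: `resolve_temp_dir()` and `tree_checksum()` are hypothetical names, and hashing relative paths, sizes and modification times is an assumption standing in for the license cache validity check linked above, which may hash different attributes.

```python
import hashlib
import os
from pathlib import Path


def tree_checksum(root):
    """Hash relative paths, sizes and mtimes of all files under root.

    Hypothetical stand-in for the license cache validity check: any file
    added, removed or modified in the installation yields a different key.
    """
    digest = hashlib.sha1()
    root = Path(root)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        stat = path.stat()
        digest.update(str(path.relative_to(root)).encode("utf-8"))
        digest.update(str(stat.st_size).encode("ascii"))
        digest.update(str(stat.st_mtime_ns).encode("ascii"))
    return digest.hexdigest()


def resolve_temp_dir(install_root, env=None, home=None):
    """Pick the cache/temp base directory following the three proposed rules."""
    env = os.environ if env is None else env
    # Rule 3: an explicit SCANCODE_TEMP_DIR wins in every mode.
    override = env.get("SCANCODE_TEMP_DIR")
    if override:
        return Path(override)
    install_root = Path(install_root)
    # Rule 1: a .git directory means a development checkout.
    if (install_root / ".git").is_dir():
        return install_root / ".cache"
    # Rule 2: a per-installation subdirectory of ~/.scancode.
    home = Path.home() if home is None else Path(home)
    return home / ".scancode" / tree_checksum(install_root)
```

Passing `env` and `home` explicitly keeps the function testable without touching the real environment or home directory.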

Lastly, we could also include some quick and crude stats on the amount of temporary disk space used in the current temp dir.
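Such crude stats could be as simple as walking the temp dir and summing file sizes. `disk_usage()` below is a hypothetical helper sketched with the standard library, not part of scancode.

```python
import os


def disk_usage(root):
    """Return (file_count, total_bytes) for all files under root.

    Crude by design: sizes are apparent sizes (symlinks are followed by
    os.path.getsize), not blocks actually allocated on disk.
    """
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Files may vanish while a scan is running: skip them.
            try:
                total_bytes += os.path.getsize(path)
            except OSError:
                continue
            file_count += 1
    return file_count, total_bytes
```

If the temp dir does not exist yet, `os.walk()` yields nothing and the result is `(0, 0)`.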

In all cases, deleting the whole temp directory would have no impact (unless this is done while a scan is in progress, of course).

Feedback welcomed!

@pombredanne pombredanne modified the milestones: v2.1, v2.0 Feb 21, 2017
@pombredanne pombredanne removed this from the v2.1 milestone Oct 4, 2017
@pombredanne pombredanne added this to the v3.0 milestone Oct 20, 2017
@haikoschol
Contributor

@pombredanne Your proposal makes sense to me and I'd like to give implementing it a try (in order to fix #685). Just two questions/suggestions:

  • Why look for a .git directory to detect development mode instead of SCANCODE_DEV_MODE?
  • Wouldn't it be easier to use the same directory structure in both cases for temporary files (i.e. always use .cache/scancode/<unique_id>/...)?

@pombredanne
Member Author

@haikoschol excellent!

Why look for a .git directory to detect development mode instead of SCANCODE_DEV_MODE?

Well, either way works: your call. SCANCODE_DEV_MODE is a bit of a wart I created IMHO. I am not very happy about it.

Wouldn't it be easier to use the same directory structure in both cases for temporary files (i.e. always use .cache/scancode/<unique_id>/...)?

That works too. Having the same layout everywhere is indeed better: either .scancode/ or .cache/scancode/ everywhere is best.

@haikoschol
Contributor

True, a tag file is not ideal, but I think the concept of "dev mode" and "user mode" is very useful. The advantage of a tag file is that it is explicit and easy to create for testing, reproducing bugs, etc.

@pombredanne
Member Author

so now you are convincing me that it may not be such a wart after all :P

haikoschol pushed a commit to haikoschol/scancode-toolkit that referenced this issue Dec 21, 2017
haikoschol pushed a commit to haikoschol/scancode-toolkit that referenced this issue Dec 21, 2017
This follows the logic proposed in aboutcode-org#357 and is intended to fix aboutcode-org#685.

Signed-off-by: Haiko Schol <[email protected]>
@pombredanne
Member Author

See instead this #685 (comment) for a comprehensive analysis and a suggested solution for both cache and temp files.

pombredanne added a commit that referenced this issue Jan 23, 2018
This is based on the design in
#685 (comment)

 * add new scancode_config.py with centralized global defaults for cache
   and temp_dir, SCANCODE_DEV_MODE, and scancode version.
 * add --cache-dir and --temp-dir as new CLI options
 * ensure that plugins can receive all CLI args when they are called
 * ensure that all accesses to temp dirs and cache files properly use
   the top-level cache and temp args
 * refactor code that creates temp directories codebase-wide to always
   accept an argument which is the base dir under which this is created
 * refactor licensedcode cache to use the cache_dir as an option
 * refactor scancode.cli to use the temp_dir for the per-scan cache
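As an illustration of the "always accept a base dir" refactoring, here is a minimal sketch of such a helper. The name `get_temp_dir(base_dir, prefix)` is hypothetical and the real scancode helper may differ in name and signature; only `tempfile.mkdtemp()` and `os.makedirs()` are standard library calls.

```python
import os
import tempfile


def get_temp_dir(base_dir, prefix="scancode-"):
    """Create and return a new unique temporary directory under base_dir.

    Hypothetical helper: callers always pass the base directory (for
    instance the value of --temp-dir) instead of relying on the system
    default temp location.
    """
    # Create the base dir on first use; a no-op if it already exists.
    os.makedirs(base_dir, exist_ok=True)
    return tempfile.mkdtemp(prefix=prefix, dir=base_dir)
```

Making the base directory a required argument means no code path can silently fall back to the system temp dir.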

 * Fixed bug when an output option is followed by another option and not
   a file name (@JonoYang reported this). This will raise an error.

 * Fixed bug on dates that were not properly filtered in the test
   results comparison

 * refactor output filter plugins to be regular codebase plugins that
   process a whole codebase. They now set an is_filtered Resource
   attribute.
  * move output filter processing entirely inside the Codebase/Resource
    processing
  * Modified resource.Codebase/resource walking code accordingly and
    removed the sort option from walk(): walk() and children() now
    always return sorted resources. The default sort order of the
    resource tree is files first, then directories, then case-insensitive name
  * fix bug on incorrect file_counts
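One reading of this default sort order, sketched in Python: at each level files come before directories, and each group is ordered by case-insensitive name. `Node` is a hypothetical minimal stand-in for `resource.Resource`, not its actual definition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical minimal stand-in for resource.Resource.
    name: str
    is_file: bool
    children: List["Node"] = field(default_factory=list)


def sort_key(node):
    # Files first (False sorts before True), then case-insensitive name.
    return (not node.is_file, node.name.lower())


def walk(node):
    """Yield descendants depth-first, top-down, siblings in sort_key order."""
    for child in sorted(node.children, key=sort_key):
        yield child
        yield from walk(child)
```

Because sorting happens inside `walk()`, callers get a stable, reproducible order without passing a sort option.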

 * wrap calls to plugins in try/except to catch plugin errors during plugin
   runs and exit cleanly with a message. This is now done in a function
   for all plugins except scanners
 * add concrete kwargs for options to plugin methods. Plugins now
   receive all the CLI options as kwargs and there is no more ugly
   Command Option tuple.
 * cleanup output to use only a process_codebase and not a save_results
 * Since the full stack trace of all errors is now reported, ensure that
   only the first and last lines of an error message are used for test
   results comparison.

 * add siblings(), has_children(), has_parent() and has_siblings()
   methods to resource.Resource
 * add size_count for descendants size to a Resource. Also available
   in JSON outputs. The size attribute of a dir is now always 0.
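A sketch of how `size_count` could be aggregated bottom-up. `Res` is a hypothetical stand-in for `resource.Resource`; only the rule that a directory's own `size` is always 0 comes from the commit message above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Res:
    # Hypothetical stand-in for resource.Resource.
    name: str
    is_file: bool
    size: int = 0
    children: List["Res"] = field(default_factory=list)
    size_count: int = 0


def compute_size_counts(res):
    """Set size_count on every directory to the total size of its descendant files.

    Directories keep size == 0 and files keep size_count == 0. Returns the
    total file size under res (or res.size when res is a file).
    """
    if res.is_file:
        return res.size
    total = 0
    for child in res.children:
        total += compute_size_counts(child)
    res.size = 0
    res.size_count = total
    return total
```

A single post-order pass fills every directory, since each subtree total is reused by its parent.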

 * removed plugincode.output._TEST_MODE flag. 
 * renamed SCANCODE_DEBUG* flags.
 * other minor refactorings and formatting

Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue Jan 23, 2018
 * also do not use metavar in help

Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue Jan 23, 2018
pombredanne added a commit that referenced this issue Jan 23, 2018
pombredanne added a commit that referenced this issue Jan 23, 2018
 * make sure skip_filtered is used where needed and only there
 * ensure counts are correct in various cases

Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue Jan 24, 2018
pombredanne added a commit that referenced this issue Jan 24, 2018
pombredanne added a commit that referenced this issue Jan 24, 2018
pombredanne added a commit that referenced this issue Jan 24, 2018
 * this restores the proper cache warmup

Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue Jan 24, 2018
 * in licensedcode.cached

Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue Jan 24, 2018
 * this is now essentially a copy of commoncode.fileutils.create_dir()

Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue Jan 24, 2018
JonoYang pushed a commit that referenced this issue Feb 1, 2018
JonoYang pushed a commit that referenced this issue Feb 1, 2018
Signed-off-by: Philippe Ombredanne <[email protected]>
JonoYang pushed a commit that referenced this issue Feb 1, 2018
@pombredanne
Member Author

This has been implemented as part of the #885 merge. Closing!

pombredanne added a commit that referenced this issue May 31, 2018
 * this will allow to support facets where one file can be in multiple
   facets

Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue May 31, 2018
 * this is generally useful when building plugins with options

Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue May 31, 2018
facets are based on path only

Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue May 31, 2018
pombredanne added a commit that referenced this issue May 31, 2018
Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue May 31, 2018
Signed-off-by: Philippe Ombredanne <[email protected]>