Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow cache location configuration #878

Closed

Conversation

haikoschol
Copy link
Contributor

This is a first attempt to address #357 and #685.

I tried to move around as few things as possible, but I'm still not totally happy with the result. IMHO The most problematic part are import side effects, especially the creation of directories.

Instead of the environment variable SCANCODE_TEMP_DIR the code looks for SCANCODE_TMP, because that was already in use in commoncode/fileutils.py.

@pombredanne
Copy link
Member

@haikoschol Thank you!

@codecov
Copy link

codecov bot commented Dec 22, 2017

Codecov Report

Merging #878 into develop will increase coverage by 0.09%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #878      +/-   ##
===========================================
+ Coverage    79.06%   79.16%   +0.09%     
===========================================
  Files           93       93              
  Lines        11456    11454       -2     
===========================================
+ Hits          9058     9067       +9     
+ Misses        2398     2387      -11
Impacted Files Coverage Δ
src/licensedcode/cache.py 96.77% <100%> (-0.45%) ⬇️
src/licensedcode/__init__.py 100% <100%> (+4.76%) ⬆️
src/scancode/cache.py 88.43% <100%> (+0.06%) ⬆️
src/scancode/__init__.py 80% <100%> (-5.72%) ⬇️
src/commoncode/fileutils.py 85.79% <100%> (+2.25%) ⬆️
src/licensedcode/index.py 79.36% <0%> (+0.87%) ⬆️
src/scancode/api.py 77.24% <0%> (+1.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a908319...c381712. Read the comment docs.

Signed-off-by: Haiko Schol <[email protected]>
Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a few comments for your review.

from functools import partial
from hashlib import md5
from os.path import abspath
from os.path import dirname
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind keeping the imports sorted in a given "block" (e.g. here the stdlib imports)

from os.path import join
from os.path import exists

from commoncode import fileutils


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda like 2 empty lines after imports

else:
cache_root = os.path.expanduser('~')

# tree_hash = tree_checksum()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this

from os.path import join
from functools import partial
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep imports sorted here and elsewhere

def get_or_build_index_through_cache(
check_consistency=DEV_MODE,
return_index=True,
# used for testing only
_tree_base_dir=src_dir,
_tree_base_dir=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to minimize the need to import global variables. It tricky to avoid circular imports and with lower-case names, it is easy to get them mixed up with local variables.


from scancode import scans_cache_dir

from licensedcode import DEV_MODE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be across all scrancode, not just licensedcode

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I thought about what would be a better place for it, probably commoncode. But it seemed like moving a few of those globals would drag a whole lot of stuff with it, so I opted for keeping changes to a minimum.
I can try to move all globals to commoncode/__init__.py (and uppercase them while I'm at it). Should I?

if dev_mode:
cache_root = dirname(dirname(dirname(abspath(__file__))))
else:
cache_root = os.path.expanduser('~')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may need an abspath too

cache_root = os.environ.get(TMP_DIR_ENV)

if cache_root and not os.path.exists(cache_root):
raise OSError('{} is set to non-existent directory: {}'.format(TMP_DIR_ENV, cache_root))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why raise an error there? The directory should be created instead IMHO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting the environment variable requires user intervention (unlike automatically picking cache dirs based on install method). Therefore I'd argue the user wants to be notified if there is a typo in the path, for example.
Creating a directory in the wrong place and proceeding with the scan might lead to filling up the wrong file system and could be time-consuming to diagnose.


if not cache_root:
if dev_mode:
cache_root = dirname(dirname(dirname(abspath(__file__))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the root of scancode's code, right?

Copy link
Contributor Author

@haikoschol haikoschol Dec 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I decided against using the global src_code to avoid too many imports across modules. But see my comment below regarding DEV_MODE.

@sschuberth
Copy link
Collaborator

@pombredanne Can we please get an updated review from you on this? This is blocking to resolve #685 for us.

@pombredanne
Copy link
Member

@sschuberth Looking at this now! Sorry for the delay

@pombredanne
Copy link
Member

@haikoschol I would like to rework this on top of the new code in #885 based on the detailed description in #685 (comment)

@haikoschol
Copy link
Contributor Author

As discussed offline, this has been superseded by #885.

@haikoschol haikoschol closed this Feb 6, 2018
@haikoschol haikoschol deleted the 357-cache-location branch March 31, 2023 03:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants