feat: gdal concurrency TDE-457 #96
scripts/standardising.py
Outdated
from scripts.files.files_helper import get_file_name_from_path, is_tiff
from scripts.gdal.gdal_helper import run_gdal
from scripts.logging.time_helper import time_in_ms


def standardising(files: List[str]) -> List[str]:
def start_standardising(files: List[str], argo_env: bool) -> List[str]:
Would be good to do something like cryptocurrency: int and specify the number of workers to use, as argo_env doesn't really make much sense to the caller.
Thanks, could you elaborate further on cryptocurrency, please?
LOL, darn autocorrect! I meant concurrency. So then this just takes the number of threads rather than an opaque is_in_argo.
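A minimal sketch of how a caller could supply the worker count as a plain integer instead of argo_env. The `--source` and `--concurrency` flags and the `positive_int` and `parse_args` helpers are hypothetical, not taken from the PR; the idea is only that whoever launches the script (for example an Argo workflow) decides how many workers to use.

```python
import argparse
from typing import List, Optional


def positive_int(value: str) -> int:
    """Parse a worker count, rejecting zero and negative values."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError("concurrency must be at least 1")
    return number


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Standardise TIFF files")
    # Hypothetical flags: the caller (e.g. an Argo workflow) chooses the worker count.
    parser.add_argument("--source", nargs="+", required=True, help="files to standardise")
    parser.add_argument("--concurrency", type=positive_int, default=1, help="number of parallel workers")
    return parser.parse_args(argv)


args = parse_args(["--source", "a.tiff", "b.tiff", "--concurrency", "4"])
print(args.source, args.concurrency)
```

start_standardising(args.source, args.concurrency) would then receive an explicit count, and running locally simply uses the default of 1.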
@@ -85,7 +85,7 @@ def parse_path(path: str) -> S3Path:
path (str): A S3 path.

Returns:
S3Path (NamedTupe): s3_path.bucket (str), s3_path.key (str)
Not 100% related to this, but doesn't VS Code automatically detect the types of all of these? We shouldn't be doubling up the typing in both the function definition and the docstring, as that often leads to them getting out of sync.
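To illustrate the suggestion, a sketch of a Google-style docstring that leaves the types to the annotations. The S3Path fields (bucket, key) come from the docstring in the diff; the urlparse-based body is an assumption, not the repository's implementation. typing.get_type_hints shows that the types remain machine-readable from the signature alone.

```python
from typing import NamedTuple, get_type_hints
from urllib.parse import urlparse


class S3Path(NamedTuple):
    bucket: str
    key: str


def parse_path(path: str) -> S3Path:
    """Parse an S3 path into its bucket and key.

    Args:
        path: An S3 path, e.g. "s3://bucket/prefix/file.tiff".

    Returns:
        The bucket and key of the path.
    """
    parsed = urlparse(path)
    # netloc holds the bucket; the parsed path keeps a leading "/" that S3 keys omit.
    return S3Path(parsed.netloc, parsed.path[1:])


print(parse_path("s3://my-bucket/imagery/file.tiff"))
print(get_type_hints(parse_path))
```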
scripts/standardising.py
Outdated
output_files.append(tmp_file_path)

if concurrency:
    with Pool(concurrency) as p:
Are there any downsides to doing with Pool(1) as p instead of having two different code paths for running the script?
I didn't want to add any overhead for local file processing, but I could test to see if it is an issue.
It does not seem to be an issue.
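A sketch of the single-code-path version, assuming a hypothetical standardise_file function stands in for the per-file GDAL step. It uses multiprocessing.pool.ThreadPool, which has the same interface as the Pool in the diff but runs its workers as threads, so the sketch runs without the `if __name__ == "__main__"` guard that process pools need under the spawn start method. With concurrency=1 the files are simply processed one at a time, so no separate sequential branch is needed.

```python
from multiprocessing.pool import ThreadPool
from typing import List


def standardise_file(path: str) -> str:
    # Hypothetical stand-in for the per-file GDAL work; returns the output path.
    return path.replace(".tiff", "_standardised.tiff")


def start_standardising(files: List[str], concurrency: int) -> List[str]:
    # One code path: a pool of size 1 just runs the files one after another.
    with ThreadPool(concurrency) as pool:
        # map blocks until every file is done and preserves the input order.
        return pool.map(standardise_file, files)


print(start_standardising(["a.tiff", "b.tiff"], concurrency=1))
```

Swapping ThreadPool for multiprocessing.Pool gives the process-based version from the diff; with Pool(1) the extra cost is starting one worker process and pickling the arguments and results.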
No description provided.