feat: gdal concurrency TDE-457 #96

Merged: 13 commits, Aug 23, 2022

Contributor

@amfage commented Aug 19, 2022

No description provided.

@amfage marked this pull request as ready for review August 22, 2022 06:18
@amfage requested a review from a team as a code owner August 22, 2022 06:18
from scripts.files.files_helper import get_file_name_from_path, is_tiff
from scripts.gdal.gdal_helper import run_gdal
from scripts.logging.time_helper import time_in_ms


-def standardising(files: List[str]) -> List[str]:
+def start_standardising(files: List[str], argo_env: bool) -> List[str]:
Member

would be good to do something like cryptocurrency: int and specify the number of workers to use, as argo_env doesn't really make too much sense to the caller.

Contributor Author

Thanks, could you elaborate further on cryptocurrency please?

Member

LOL darn autocorrect! concurrency

so then this just takes the number of threads rather than an opaque is_in_argo
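The suggestion above can be sketched roughly as follows. This is only an illustration, not the code that was merged: standardise_one is a hypothetical stand-in for the per-file GDAL work, and a thread pool from the standard library is assumed here purely so the sketch runs anywhere. The point is that the caller passes a worker count instead of a flag describing its environment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def standardise_one(file: str) -> str:
    """Hypothetical stand-in for the per-file GDAL work."""
    return f"standardised_{file}"


def start_standardising(files: List[str], concurrency: int) -> List[str]:
    """Process files using `concurrency` workers chosen by the caller."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    with ThreadPoolExecutor(max_workers=concurrency) as executor:
        # executor.map yields results in the same order as `files`
        return list(executor.map(standardise_one, files))
```

A caller running in Argo could then pass a larger number and a local run could pass 1, without the function needing to know which environment it is in.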

@amfage requested a review from blacha August 22, 2022 22:54
@@ -85,7 +85,7 @@ def parse_path(path: str) -> S3Path:
path (str): A S3 path.

Returns:
S3Path (NamedTuple): s3_path.bucket (str), s3_path.key (str)
Member

Not 100% related to this, but doesn't VS Code detect the types of all of these automatically? We shouldn't be doubling up the typing in both the function definition and the docs, as it often leads to them being out of sync.
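For illustration, here is one way the docstring could look with the types kept only in the signature. The S3Path fields follow the docstring quoted above, but the function body is an assumption (the real parse_path implementation is not shown in this conversation) and uses urlparse from the standard library as a stand-in.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class S3Path(NamedTuple):
    bucket: str
    key: str


def parse_path(path: str) -> S3Path:
    """Split an S3 path into its bucket and key.

    Args:
        path: An S3 path, for example "s3://bucket/key".

    Returns:
        The bucket and key of the path.
    """
    # Assumed implementation: the bucket is the URL host, the key is the rest.
    parsed = urlparse(path)
    return S3Path(bucket=parsed.netloc, key=parsed.path.lstrip("/"))
```

With the types stated once in the signature, editors and type checkers can still report them, and there is no second copy in the docstring to drift out of sync.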

output_files.append(tmp_file_path)

if concurrency:
    with Pool(concurrency) as p:
Member

Are there any downsides to doing with Pool(1) as p instead of having two different code paths for running the script?

Contributor Author

I didn't want to add any overhead for local file processing, but I could test to see if it is an issue.

Contributor Author

It does not seem to be an issue.
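A minimal sketch of the single code path discussed in this thread, under stated assumptions: process_file is a hypothetical placeholder for the real per-file work, and ThreadPool (which shares the Pool interface) is swapped in for multiprocessing.Pool so the example runs without a `__main__` guard. It only shows that a pool of size 1 gives the same results as a plain loop, so a separate serial branch is not needed for correctness.

```python
from multiprocessing.pool import ThreadPool
from typing import List


def process_file(file: str) -> str:
    """Hypothetical placeholder for the per-file work."""
    return file.upper()


def run_serial(files: List[str]) -> List[str]:
    """The separate non-pooled path that the single code path would replace."""
    return [process_file(f) for f in files]


def run_pooled(files: List[str], concurrency: int = 1) -> List[str]:
    """One code path for every case; concurrency=1 behaves like the serial loop."""
    with ThreadPool(concurrency) as pool:
        # pool.map blocks until all results are ready and keeps input order
        return pool.map(process_file, files)
```

Whether the pool adds measurable overhead for local runs depends on the workload; the author reports above that it did not seem to be an issue.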

@kodiakhq bot merged commit 3f4a3be into master Aug 23, 2022
@kodiakhq bot deleted the feat/gdal-concurrency-tde-457 branch August 23, 2022 04:33
@github-actions bot mentioned this pull request Aug 23, 2022