
Tips for Handling Large High-Resolution TIFF Images with Multiple Channels in the arc-analysis Pipeline #1178

Open · vyshakha opened this issue on Jan 10, 2025 · 1 comment

vyshakha commented on Jan 10, 2025:

I am working with large, very high-resolution TIFF images that contain many channels. Specifically, these images have 19+ channels and are quite large (around 60,000 pixels per side), both in pixel dimensions and in file size.
I am encountering challenges processing these images efficiently within the arc-analysis pipeline, particularly in terms of memory usage, processing time, and potential issues with scaling to such large datasets.
Could anyone share tips or best practices for working with large, high-resolution, multi-channel TIFF images in the arc-analysis pipeline? Are there recommended techniques (besides splitting the image into multiple FOVs) for optimizing memory usage, improving processing speed, or handling such data more effectively?

vyshakha added the question label on Jan 10, 2025
alex-l-kong (Contributor) commented:

Hi @vyshakha, how much memory do you have to work with on your machine?

Loading an entire 60000x60000 image with 19 channels can take well over 50 GB of memory assuming an np.float64 representation (which our pipeline uses on a per-FOV basis). Even with 64 GB of RAM, this could crash the pipeline if several other processes are running in the background.
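For reference, the back-of-envelope arithmetic looks like this (the 60,000 × 60,000 and 19-channel figures are taken from the issue description; this is illustrative math, not pipeline code):

```python
import numpy as np

# Approximate in-memory size of a single FOV loaded as a dense array.
height, width, n_channels = 60_000, 60_000, 19

bytes_per_value = np.dtype(np.float64).itemsize  # 8 bytes
total_bytes = height * width * n_channels * bytes_per_value

print(f"one channel: {height * width * bytes_per_value / 1e9:.1f} GB")  # ~28.8 GB
print(f"full stack : {total_bytes / 1e9:.1f} GB")                       # ~547.2 GB

# Halving precision halves the footprint:
for dtype in (np.float32, np.float16):
    gb = height * width * n_channels * np.dtype(dtype).itemsize / 1e9
    print(f"{np.dtype(dtype).name}: {gb:.1f} GB")
```

Even a single channel at float64 is roughly 29 GB, so a machine with 64 GB can only hold a couple of full-resolution channels at once.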

If you have access to an HPC with significantly more RAM, it may be easiest to run the pipeline there. You could also try lowering the precision to np.float32 or np.float16 during preprocessing, but this requires more computational expertise: it involves changing the underlying loading functionality and casting the subsetted training dataset back to np.float64 (I believe the SOM function explicitly requires this to maintain maximum precision when determining the weights).
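A minimal sketch of what that precision change could look like, assuming the channel TIFFs are read with tifffile; the function names and structure here are placeholders, since the pipeline's actual loading code differs:

```python
import numpy as np
import tifffile

# Hypothetical helper: read a channel TIFF and downcast immediately, so
# a full float64 copy of the image never materializes. In the real
# pipeline, the equivalent .astype(...) change would need to go inside
# its own loading functions.
def load_channel(path, dtype=np.float32):
    return tifffile.imread(path).astype(dtype, copy=False)

# Only the (much smaller) subsetted training data gets cast back to
# float64 before SOM training, which reportedly expects full precision:
def prepare_training_subset(subset):
    return subset.astype(np.float64)
```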

You should also use an aggressive subsetting parameter (subset_proportion = 0.01 or even lower) so that the subsetted training dataset can fit in memory; see the sketch below.
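To make concrete what an aggressive subset buys you, here is an illustrative sketch (the pipeline performs this subsetting internally via the subset_proportion argument; the array below is a random stand-in, not real data):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in for the flattened (n_pixels, n_channels) matrix the pixel
# clustering step trains on. Real data would be far larger.
n_pixels, n_channels = 1_000_000, 19
pixel_matrix = rng.random((n_pixels, n_channels), dtype=np.float32)

subset_proportion = 0.01  # the aggressive setting suggested above
n_keep = int(n_pixels * subset_proportion)
keep_idx = rng.choice(n_pixels, size=n_keep, replace=False)

# Upcast only the small subset to float64 for SOM training.
train_subset = pixel_matrix[keep_idx].astype(np.float64)

print(train_subset.shape)         # (10000, 19)
print(train_subset.nbytes / 1e6)  # ~1.5 MB vs ~152 MB for the full matrix at float64
```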
