Add mention of creating zip files
timj committed Jan 14, 2025
1 parent c93081a commit e4f7f4d
Showing 3 changed files with 23 additions and 0 deletions.
12 changes: 12 additions & 0 deletions DMTN-306.tex
@@ -90,6 +90,18 @@ \subsection{Registration of files to replicate}

Only files that require replication to another facility are registered with Rucio, so Rubin's instance of Rucio is aware only of files that are replicated between processing facilities. Files that are local to a facility and not subject to replication are known only to that facility's Butler and are not registered in Rucio. Since the US Data Facility receives a complete copy of all final data products, any file that is not replicated is, by definition, an intermediate product of the processing that does not need to be persisted.

\begin{figure}[h]
\includegraphics[width=0.9\textwidth, center]{images/file_count_and_file_sizes.png}
\caption{Number of files and total file sizes from a data preview processing run.}
\label{fig:filecount}
\end{figure}

The pipeline processing generates many ancillary files in addition to pixel data.
A data preview processing run \cite{10.1051/epjconf/20242950404} showed that the number of JSON and YAML files is of the same order as the number of FITS and Parquet data files (see Fig.\ \ref{fig:filecount}).
Because the ancillary files are much smaller (sometimes only a few kB each), the per-file overheads of transferring them can be very large compared with the volume of data actually moved.
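To see why this matters, consider a simple model, offered as an illustrative sketch rather than a description of Rucio's actual behaviour: each file is assumed to cost a fixed time for transfer setup and bookkeeping, plus its size divided by the available bandwidth. The function names and all parameter values below (10{,}000 files of 5\,kB, 1\,s per file, 100\,MB/s) are hypothetical and are not taken from the processing run.

```python
"""Toy model of per-file transfer overhead.

All parameter values used below are hypothetical illustrations,
not measurements from any Rubin processing run.
"""


def transfer_time(n_files: int, total_bytes: float,
                  per_file_overhead_s: float,
                  bandwidth_bytes_per_s: float) -> float:
    """Return the modelled time in seconds to transfer n_files files.

    Assumes a fixed cost per file (setup, catalogue bookkeeping) plus a
    purely bandwidth-limited transfer of the total data volume.
    """
    return n_files * per_file_overhead_s + total_bytes / bandwidth_bytes_per_s


def overhead_fraction(n_files: int, total_bytes: float,
                      per_file_overhead_s: float,
                      bandwidth_bytes_per_s: float) -> float:
    """Fraction of the modelled transfer time spent on per-file overhead."""
    total = transfer_time(n_files, total_bytes,
                          per_file_overhead_s, bandwidth_bytes_per_s)
    return n_files * per_file_overhead_s / total


if __name__ == "__main__":
    n, size = 10_000, 5_000           # hypothetical: 10,000 files of 5 kB each
    overhead, bandwidth = 1.0, 100e6  # hypothetical: 1 s per file, 100 MB/s
    separate = transfer_time(n, n * size, overhead, bandwidth)
    packed = transfer_time(1, n * size, overhead, bandwidth)  # same bytes, one file
    print(f"separate files: {separate:.1f} s, one archive: {packed:.1f} s")
    frac = overhead_fraction(n, n * size, overhead, bandwidth)
    print(f"overhead share when sent separately: {frac:.3%}")
```

Packing the same bytes into a single file leaves the bandwidth term unchanged and divides the per-file term by the number of files packed. The model ignores archive header bytes and the cost of building the archive.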
To mitigate this problem we have modified the Butler infrastructure to allow the small files from a single processing run to be combined into one or more Zip files.
Each Zip file contains the Butler metadata needed for the Butler to retrieve the individual files it holds, whilst Rucio only has to track and transfer the single Zip file.
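A minimal sketch of the idea, using Python's standard \texttt{zipfile} module, is shown below. It is not the Butler implementation: the index member name \texttt{index.json}, the dataset identifiers, and the functions \texttt{pack\_small\_files} and \texttt{read\_dataset} are hypothetical, and the real Butler metadata is assumed to be considerably richer than a mapping from dataset to archive member.

```python
import io
import json
import zipfile

# Hypothetical name of the archive member holding the index.
INDEX_NAME = "index.json"


def pack_small_files(records):
    """Combine many small files into a single in-memory Zip archive.

    records: iterable of (dataset_id, member_path, payload) tuples, where
    dataset_id is a hypothetical string identifier, member_path is the
    path to use inside the archive and payload is the file content (bytes).
    An index member maps each dataset_id to its member_path.
    """
    buf = io.BytesIO()
    index = {}
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dataset_id, member_path, payload in records:
            zf.writestr(member_path, payload)
            index[dataset_id] = member_path
        # Write the index last so it describes every member added above.
        zf.writestr(INDEX_NAME, json.dumps(index, sort_keys=True))
    return buf.getvalue()


def read_dataset(archive, dataset_id):
    """Return the content of one dataset without unpacking the whole archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        index = json.loads(zf.read(INDEX_NAME))
        return zf.read(index[dataset_id])


if __name__ == "__main__":
    records = [
        ("ds-001", "run1/metadata/task_a.json", b'{"ok": true}'),
        ("ds-002", "run1/config/task_a.yaml", b"param: 1\n"),
    ]
    archive = pack_small_files(records)
    print(read_dataset(archive, "ds-002"))  # prints b'param: 1\n'
```

Because the Zip central directory is stored at the end of the file, a reader that can seek, for example via HTTP range requests, can in principle locate and extract one member without fetching the whole archive.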

\subsection{Ingestion at reception}
\label{ingestion}

Binary file added images/file_count_and_file_sizes.png
11 changes: 11 additions & 0 deletions references.bib
@@ -378,3 +378,14 @@ @INPROCEEDINGS{2024SPIE13101E..1MF
adsurl = {https://ui.adsabs.harvard.edu/abs/2024SPIE13101E..1MF},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

@article{10.1051/epjconf/20242950404,
author = {Le Boulc’h, Quentin and Hernandez, Fabio and Mainetti, Gabriele},
title = {{The Rubin Observatory’s Legacy Survey of Space and Time DP0.2 processing campaign at CC-IN2P3}},
doi = {10.1051/epjconf/202429504049},
url = {https://doi.org/10.1051/epjconf/202429504049},
journal = {EPJ Web of Conf.},
year = 2024,
volume = 295,
pages = {04049},
}
