diff --git a/README.md b/README.md index 12b69561..0f98ddaf 100644 --- a/README.md +++ b/README.md @@ -1,25 +1,40 @@ # k4FWCore (key4hep FrameWork Core) -k4FWCore is a Gaudi package that provides the PodioDataService, which allows to -use podio-based event data models like EDM4hep in Gaudi workflows. +k4FWCore is a Gaudi package that provides the IOSvc, which makes it possible to +use EDM4hep in Gaudi workflows. -k4FWCore also provides the `k4run` script used to run Gaudi steering files. +k4FWCore also provides the `k4run` script used to run Gaudi steering files. See the [documentation](doc/k4run-args.md) for more information. ## Components ### Basic I/O -#### k4DataSvc +| Current | Legacy | Description | +|---------|--------|-------------| +| IOSvc | k4DataSvc | Service handling the PODIO types and collections | +| Reader | PodioInput | Algorithm to read data from input files on disk. | +| Writer | PodioOutput | Algorithm to write data to an output file on disk. | +| MetadataSvc | MetaDataHandle | Service/Handle handling user-defined metadata | -Component wrapping the PodioDataService to handle PODIO types and collections. +See the [documentation](doc/PodioInputOutput.md) for more information. -#### PodioInput +### Auxiliary -Algorithm to read data from one or multiple input file(s) on disk. +### Collection Merger -#### PodioOutput +Algorithm merging multiple collections of the same type into a single collection. -Algorithm to write data to an output file on disk. +### EventHeaderCreator + +Algorithm creating a new `edm4hep::EventHeaderCollection` data object. + +### EventCounter + +Algorithm counting processed events and printing a heartbeat. + +### UniqueIDGenSvc + +Service generating unique, reproducible numbers to be used for seeding the RNGs used by the algorithms. See the [documentation](doc/uniqueIDGen.md) for more information. ## k4run ``` @@ -57,6 +72,8 @@ print(my_opts[0].foo) * Gaudi +* EDM4hep + ## Installation and downstream usage. k4FWCore is a CMake project. 
After setting up the dependencies (use for example `source /cvmfs/sw.hsf.org/key4hep/setup.sh`) diff --git a/doc/LegacyPodioInputOutput.md b/doc/LegacyPodioInputOutput.md new file mode 100644 index 00000000..5c1d24d9 --- /dev/null +++ b/doc/LegacyPodioInputOutput.md @@ -0,0 +1,136 @@ + +# Legacy reading and writing EDM4hep files in Gaudi with the `k4DataSvc` + +:::{caution} +`k4DataSvc` is a legacy service previously used in k4FWCore for reading and writing data in EDM4hep or other data models based on PODIO. + +The currently used service is `IOSvc`, which offers improved, streamlined functionality and better support for modern workflows. For detailed documentation on `IOSvc`, refer to [this documentation](PodioInputOutput.md). +::: + +This page will describe the usage of legacy [k4FWCore](https://github.com/key4hep/k4FWCore) +facilities to read and write EDM4hep. This page also assumes a certain +familiarity with Gaudi, i.e. most of the snippets just show a minimal +configuration part, and not a complete runnable example. + +## The `k4DataSvc` + +Whenever you want to work with EDM4hep in the Gaudi based framework of Key4hep, +you will need to use the `k4DataSvc` as *EventDataSvc*. You can instantiate and +configure this service like the following + +```python +from Gaudi.Configuration import * +from Configurables import k4DataSvc + +evtSvc = k4DataSvc("EventDataSvc") +``` + +**It is important that the name is `EventDataSvc` in this case, as +this is what Gaudi assumes.** Once you have the `k4DataSvc` instantiated, +you still have to make the `ApplicationMgr` aware of it, by making sure that the +`evtSvc` is in the list of the *external services* (`ExtSvc`): + +```python +from Configurables import ApplicationMgr +ApplicationMgr( + # other args + ExtSvc = [evtSvc] +) +``` + +## Reading events + +To read events you will need to use the `PodioInput` algorithm in addition to +the [`k4DataSvc`](#the-k4datasvc). 
Currently, you will need to pass the input +file to the `k4DataSvc` via the `input` option but pass the collections that you +want to read to the `PodioInput`. We are working on making this (discussion +happens in this [issue](https://github.com/key4hep/k4FWCore/issues/105)). The +parts of your options file related to reading EDM4hep files will look something +like this + +```python +from Configurables import PodioInput, k4DataSvc + +evtSvc = k4DataSvc("EventDataSvc") +evtSvc.input = "/path/to/your/input-file.root" + +podioInput = PodioInput() +``` + +It is possible to change the input file from the command line via +```bash +k4run --EventDataSvc.input= +``` + +By default the `PodioInput` will read all collections that are available from +the input file. It is possible to limit the collections that should become +available via the `collections` option + +```python +podioInput.collections = [ + # List of collection names that should be made available +] +``` + +## Writing events + +To write events you will need to use the `PodioOutput` algorithm in addition to +the [`k4DataSvc`](#the-k4datasvc): + +```python +from Configurables import PodioOutput + +podioOutput = PodioOutput("PodioOutput", filename="my_output.root") +``` + +By default this will write the complete event contents to the output file. + +### Writing only a subset of collections + +Sometimes it is desirable to limit the collections to a subset of all available +collections from the EventStore. The `PodioOutput` allows to do this via the +`outputCommands` option that takes a list of `keep` or `drop` commands. Each +command must consist of the `keep`/`drop` command and a target. The target is a +collection name that may include the `?` or `*` wildcard patterns. 
This might +look like the following + +```python +podioOutput.outputCommands = ["keep *"] +``` + +which will keep everything (the default), while + +```python +podioOutput.outputCommands = ["drop *"] +``` + +will simply drop all collections and effectively write an empty file (apart from +some metadata). A common pattern is to `"drop *"` and then selectively add +`keep` commands for the collections to retain, e.g. to only keep the highest level MC and reco +information: + +```python +podioOutput.outputCommands = [ + "drop *", + "keep MCParticlesSkimmed", + "keep PandoraPFOs", + "keep RecoMCTruthLink", +] +``` diff --git a/doc/PodioInputOutput.md b/doc/PodioInputOutput.md index 600327f0..ab41066d 100644 --- a/doc/PodioInputOutput.md +++ b/doc/PodioInputOutput.md @@ -18,115 +18,214 @@ limitations under the License. --> # Reading and writing EDM4hep files in Gaudi -The facilities to read and write EDM4hep (or in general event data models based -on podio) are provided by [`k4FWCore`](https://github.com/key4hep/k4FWCore). -This page will describe their usage, but not go into too much details of their -internals. This page also assumes a certain familiarity with Gaudi, i.e. most of -the snippets just show a minimal configuration part, and not a complete runnable -example. +The facilities to read and write EDM4hep (or in general event data models based on podio) are provided by [k4FWCore](https://github.com/key4hep/k4FWCore). This page will describe their usage, but not go into too much detail about their internals. This page also assumes a certain familiarity with Gaudi, i.e. most of the snippets just show a minimal configuration part, and not a complete runnable example. -## The `k4DataSvc` +## Accessing event data -Whenever you want to work with EDM4hep in the Gaudi based framework of Key4hep, -you will need to use the `k4DataSvc` as *EventDataSvc*. 
You can instantiate and -configure this service like the following +`IOSvc` is an external Gaudi service for reading and writing EDM4hep files. The service should be imported from `k4FWCore` and named "IOSvc" as other components may look for it under this name. ```python -from Gaudi.Configuration import * -from Configurables import k4DataSvc +from k4FWCore import IOSvc -evtSvc = k4DataSvc("EventDataSvc") +io_svc = IOSvc("IOSvc") # or just IOSvc(), as the name "IOSvc" is used by default ``` -**It is important that the name is `EventDataSvc` in this case, as otherwise -this is an assumption from Gaudi.** Once you have the `k4DataSvc` instantiated, -you still have to make the `ApplicationMgr` aware of it, by making sure that the -`evtSvc` is in the list of the *external services* (`ExtSvc`): +After instantiation the service should be registered as an external service in the `ApplicationMgr`. Similarly, it's important to import the `ApplicationMgr` from `k4FWCore`: ```python -from Configurables import ApplicationMgr +from k4FWCore import ApplicationMgr + ApplicationMgr( # other args - ExtSvc = [evtSvc] + ExtSvc=[ + io_svc, + # other services + ] ) ``` -## Reading events - -To read events you will need to use the `PodioInput` algorithm in addition to -the [`k4DataSvc`](#the-k4datasvc). Currently, you will need to pass the input -file to the `k4DataSvc` via the `input` option but pass the collections that you -want to read to the `PodioInput`. We are working on making this (discussion -happens in this [issue](https://github.com/key4hep/k4FWCore/issues/105)). The -parts of your options file related to reading EDM4hep files will look something -like this +### Reading events -```python -from Configurables import PodioInput, k4DataSvc +The `IOSvc` supports reading EDM4hep ROOT files. Files written with either the ROOT TTree or the RNTuple backend are supported, with the backend inferred automatically from the files themselves. 
-evtSvc = k4DataSvc("EventDataSvc") -evtSvc.input = "/path/to/your/input-file.root" +The `Input` property can be used to specify the input. The `IOSvc` will not read any files unless the `Input` property is specified. -podioInput = PodioInput() +::::{tab-set} +:::{tab-item} Python +```python +io_svc.Input = "input.root" ``` - -It is possible to change the input file from the command line via -```bash -k4run --EventDataSvc.input= +::: +:::{tab-item} CLI +```sh +k4run --IOSvc.Input input.root ``` +::: +:::: -By default the `PodioInput` will read all collections that are available from -the input file. It is possible to limit the collections that should become -available via the `collections` option +:::{note} +The value assigned to the `Input` will be processed as is, in particular without regular expression or glob expansion. +::: +A list of filenames can be given in order to specify multiple input files: + +::::{tab-set} +:::{tab-item} Python ```python -podioInput.collections = [ - # List of collection names that should be made available -] +io_svc.Input = ["input.root", "another_input.root", ] +``` +::: +:::{tab-item} CLI +```sh +k4run --IOSvc.Input input.root another_input.root ``` +::: +:::: -## Writing events -To write events you will need to use the `PodioOutput` algorithm in addition to -the [`k4DataSvc`](#the-k4datasvc): +During processing, for each event in the Gaudi event loop the `IOSvc` will read a frame from the input and populate the Gaudi Transient Event Store (TES) with the collections stored in that frame. 
-```python -from Configurables import PodioOutput +The `FirstEventEntry` property of `IOSvc` can be used to start processing from a given frame instead of from the first frame in the input: -podioOutput = PodioOutput("PodioOutput", filename="my_output.root") +::::{tab-set} +:::{tab-item} Python +```python +io_svc.FirstEventEntry = 7 # default 0 ``` +::: +:::{tab-item} CLI +```sh +k4run --IOSvc.FirstEventEntry 7 +``` +::: +:::: + +A list of collection names can be assigned to the `CollectionNames` property of `IOSvc` to limit the number of collections that will be populated. Without specifying the `CollectionNames` all present collections will be read and put into TES. -By default this will write the complete event contents to the output file. +::::{tab-set} +:::{tab-item} Python +```python +io_svc.CollectionNames = ["MCParticles", "SimTrackerHits"] +``` +::: +:::{tab-item} CLI +```sh +k4run --IOSvc.CollectionNames "MCParticles" "SimTrackerHits" +``` +::: +:::: -### Writing only a subset of collections +### Writing events -Sometimes it is desirable to limit the collections to a subset of all available -collections from the EventStore. The `PodioOutput` allows to do this via the -`outputCommands` option that takes a list of `keep` or `drop` commands. Each -command must consist of the `keep`/`drop` command and a target. The target is a -collection name that may include the `?` or `*` wildcard patterns. This might -look like the following +The `IOSvc` supports writing EDM4hep to the ROOT output. The `Output` property can be used to specify the output. The `IOSvc` will not write any files unless the `Output` property is specified. +::::{tab-set} +:::{tab-item} Python ```python -podioOutput.outputCommands = ["keep *"] +io_svc.Output = "output.root" ``` +::: +:::{tab-item} CLI +```sh +k4run --IOSvc.Output output.root +``` +::: +:::: + +:::{note} +Unlike the `Input`, the `Output` property should be a single string even when writing multiple files is expected. 
When the size limit for an output file is reached, the system will automatically open a new file and start writing to it. +::: -which will keep everything (the default), while +The writing backend can be specified with the `OutputType` property of `IOSvc`. The allowed values are `"ROOT"` for TTree-based output and `"RNTuple"` for RNTuple-based output. By default the `"ROOT"` backend is used. +::::{tab-set} +:::{tab-item} Python ```python -podioOutput.outputCommands = ["drop *"] +io_svc.OutputType = "RNTuple" ``` +::: +:::{tab-item} CLI +```sh +k4run --IOSvc.OutputType "RNTuple" +``` +::: +:::: -will simply drop all collections and effectively write an empty file (apart from -some metadata). A common pattern is to `"drop *"` and then selectively adding -`keep` collections to keep, e.g. to only keep the highest level MC and reco -information: +During processing, at the end of each event from the Gaudi event loop the `IOSvc` will write a frame with the collections present in the TES. By default all the collections will be written. The `outputCommands` property of `IOSvc` can be used to specify commands to select which collections should be written. For example, the following commands will skip writing all the collections except for the collections named `MCParticles1`, `MCParticles2` and `SimTrackerHits`: +::::{tab-set} +:::{tab-item} Python ```python -podioOutput.outputCommands = [ +io_svc.outputCommands = [ "drop *", - "keep MCParticlesSkimmed", - "keep PandoraPFOs", - "keep RecoMCTruthLink", + "keep MCParticles1", + "keep MCParticles2", + "keep SimTrackerHits", ] ``` +::: +:::{tab-item} CLI +```sh +k4run --IOSvc.outputCommands \ + "drop *" \ + "keep MCParticles1" \ + "keep MCParticles2" \ + "keep SimTrackerHits" +``` +::: +:::: + +## Accessing metadata + +k4FWCore provides the `MetadataSvc` that allows accessing user metadata in PODIO-based data models. 
There is no need to instantiate the `MetadataSvc` explicitly when using `IOSvc` as `IOSvc` can instantiate it on its own if needed. + +When both the `Input` and `Output` properties of `IOSvc` are defined, all the metadata originally present in the input will be propagated to the output, together with any user metadata created during processing. + +Unlike event data, metadata is not exposed to users through the Gaudi TES and cannot be accessed directly by algorithms in the same way. Instead, handling metadata is encapsulated within the algorithm implementation itself. For more details on how this is managed, refer to the developer documentation. + + +## Migrating from the legacy `k4DataSvc` + +Migrating from the legacy `k4DataSvc` or `PodioDataSvc` is rather straightforward. At the steering-file level, the `k4DataSvc` (or `PodioDataSvc`) should be replaced with the `IOSvc`, while the `PodioInput` and `PodioOutput` algorithms should be removed. For example: + +```diff +-from Configurables import k4DataSvc +-from Configurables import PodioInput +-from Configurables import PodioOutput ++from k4FWCore import IOSvc +from k4FWCore import ApplicationMgr +from Configurables import SelectorAlg + +-podioevent = k4DataSvc("EventDataSvc") +-podioevent.input = "example_input.root" ++io_svc = IOSvc("IOSvc") ++io_svc.Input = "example_input.root" + +-inp = PodioInput() +-inp.collections = ["MCParticles", "SimTrackerHits", "TrackerHits", "Tracks"] ++io_svc.CollectionNames = ["MCParticles", "SimTrackerHits"] + +alg = SelectorAlg( + "Selector", + InputParticles="MCParticles", + InputHits="SimTrackerHits", + Output="SelectedParticles", +) + +-oup = PodioOutput() +-oup.filename = "example_output.root" +-oup.outputCommands = ["drop MCParticles"] ++io_svc.Output = "example_output.root" ++io_svc.outputCommands = ["drop MCParticles"] + + +ApplicationMgr( +- TopAlg=[inp, alg, oup], ++ TopAlg=[alg], + EvtSel="NONE", +- ExtSvc=[podioevent], ++ ExtSvc=[io_svc], +) +``` + +Both functional algorithms and classic 
algorithms are compatible with either `IOSvc` or `PodioDataSvc`.
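
As a rough illustration of the `keep`/`drop` commands accepted by `outputCommands`, the standalone sketch below mimics the selection in plain Python. It does not use k4FWCore at all. `select_collections` is a hypothetical helper, and it assumes that commands are evaluated in order, that the last matching command wins, and that collections are kept unless a command drops them. Check the actual behaviour against k4FWCore before relying on it.

```python
from fnmatch import fnmatchcase


def select_collections(names, commands):
    """Return the collection names that survive a list of keep/drop commands.

    Hypothetical helper for illustration only, not part of k4FWCore.
    """
    selected = []
    for name in names:
        keep = True  # assumption: everything is kept unless a command drops it
        for command in commands:
            action, pattern = command.split(maxsplit=1)
            if action not in ("keep", "drop"):
                raise ValueError(f"unknown command: {command!r}")
            # fnmatchcase handles the `?` and `*` wildcards (it also accepts
            # `[...]` character sets, which the documentation does not mention)
            if fnmatchcase(name, pattern):
                keep = action == "keep"  # assumption: the last match wins
        if keep:
            selected.append(name)
    return selected


available = ["MCParticles1", "MCParticles2", "SimTrackerHits", "TrackerHits"]
commands = ["drop *", "keep MCParticles?", "keep SimTrackerHits"]
print(select_collections(available, commands))
# prints ['MCParticles1', 'MCParticles2', 'SimTrackerHits']
```

With an empty command list the sketch keeps every collection, which matches the documented default of writing all collections.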