Automatic switching between CPU and GPU for DQMEDAnalyzer #35879
A new Issue was created by @sroychow Suvankar Roy Chowdhury. @Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign dqm, heterogeneous |
New categories assigned: heterogeneous,dqm @jfernan2,@ahmad3213,@rvenditti,@fwyzard,@emanueleusai,@makortel,@pbo0,@pmandrik you have been requested to review this Pull request/Issue and eventually sign? Thanks |
@mmusich currently I don't think that is possible. The reason is that the Since an You should be able to achieve the same effect simply with
However, with this approach, how do you disentangle what has been running on the CPU and what on the GPU? |
@fwyzard aren't all DQM modules |
Ah, good point, I'm stuck to the pre-transition approach based on |
It shouldn't but perhaps we still haven't got the gist of what is requested. |
Then this

```python
monitorpixelTrackSoA = SwitchProducerCUDA(
    cpu = DQMEDAnalyzer('SiPixelPhase1MonitorTrackSoA',
        pixelTrackSrc = cms.InputTag("pixelTracksSoA@cpu"),
        TopFolderName = cms.string("SiPixelHeterogeneous/PixelTrackSoA")
    ),
    cuda = DQMEDAnalyzer('SiPixelPhase1MonitorTrackSoA',
        pixelTrackSrc = cms.InputTag("pixelTracksSoA@cuda"),
        TopFolderName = cms.string("SiPixelHeterogeneous/PixelTrackSoA")
    )
)
```

should work in principle. Still
|
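The "@cpu" / "@cuda" suffixes above select a specific case of a SwitchProducer's output. As a plain-Python illustration only (a hypothetical helper, not the actual cms.InputTag implementation), such a tag can be thought of as a product label plus an optional explicit case:

```python
# Hypothetical sketch of "label@case" tag parsing; the real cms.InputTag
# logic lives in CMSSW, this only illustrates the naming convention.
def parse_tag(tag):
    """Split 'label@case' into (label, case); case is None if absent."""
    label, _, case = tag.partition("@")
    return label, case or None
```

For example, parse_tag("pixelTracksSoA@cuda") yields ("pixelTracksSoA", "cuda"), while a plain label yields a None case, meaning "whichever branch the switch resolved to".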
Sorry, you wrote
so I assumed you meant inside the HLT running online. |
first, I didn't write it, I am not the author of the issue :)
No, we're trying to address requests from PPD/ TSG about CPU/GPU validation. |
Whoops, sorry ...
Ah, OK. |
I also think that the details of what exactly is being requested would be crucial to figure out the best course of action. To leading order,
just reading the |
|
I suspect that this should work
cuda does not mean: run on cuda |
In ECAL we have been looking at similar things recently and tried the |
@thomreis I am not sure I understand your issue very well. Commenting from my experience, for modules aimed at monitoring collections, you should use the same tag for a GPU and a CPU workflow (e.g. |
Hi @sroychow, that was in fact what I meant when I said I want to change a parameter regardless of whether the CPU or the GPU case of the SwitchProducer is used. In my example the SwitchProducer would be defined in some cff file with the InputTag for both, the |
Any later customizations of

```python
s = SwitchProducerCUDA(cpu=..., cuda=...)
for case in s.parameterNames_():
    getattr(s, case).parameter = "new value"
```
|
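Since the FWCore classes themselves are not needed to see why this loop works, here is a minimal mock sketch (hypothetical MockModule/MockSwitchProducer classes, not the real CMSSW types) showing that iterating over parameterNames_() customizes every case, regardless of which branch is eventually chosen at run time:

```python
# Mock stand-ins for DQMEDAnalyzer and SwitchProducerCUDA, only to
# demonstrate the customization loop above; not the real FWCore classes.
class MockModule:
    def __init__(self, **params):
        self.__dict__.update(params)

class MockSwitchProducer:
    def __init__(self, **cases):
        self._cases = dict(cases)
        for name, module in cases.items():
            setattr(self, name, module)

    def parameterNames_(self):
        # the case names ('cpu', 'cuda') are exposed as parameters
        return list(self._cases)

s = MockSwitchProducer(cpu=MockModule(folder="DQM/Old"),
                       cuda=MockModule(folder="DQM/Old"))

# customize the parameter in every case of the switch
for case in s.parameterNames_():
    getattr(s, case).folder = "DQM/New"
```

After the loop both branches carry the new value, so the job behaves the same whichever case the switch selects.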
I'd actually like to understand better the use cases for different code/configuration for CPU and CUDA in DQM. The SwitchProducer-based infrastructure assumes that the downstream consumers do not care where exactly their input data products were produced. For example, if the producer and consumer were in different jobs, this SwitchProducer-based approach would not work in general. Could you tell us more about what exactly you intend to be different for the CPU- vs GPU-produced input data? (e.g. #35879 (comment) suggests different folders for histograms) |
@makortel if my understanding is correct, the different folders are for the output of the histograms in the DQM root file coming from the DQM modules, in order to distinguish the monitoring of the CPU vs GPU collections. But nothing different is expected from the input collections. |
For ECAL we want to make event-by-event CPU vs. GPU comparison plots. That requires both input collections, but that part of the DQM module should only run on GPU machines (and only on a subset of events, obviously, because otherwise there would be no point in reconstructing on GPUs in the first place). One thing we tried was a cff file with the following, but with that we had the issue I described earlier:

```python
import FWCore.ParameterSet.Config as cms
from HeterogeneousCore.CUDACore.SwitchProducerCUDA import SwitchProducerCUDA
from DQM.EcalMonitorTasks.EcalMonitorTask_cfi import ecalMonitorTask as _ecalMonitorTask

ecalMonitorTask = SwitchProducerCUDA(
    cpu = _ecalMonitorTask.clone()
)

# Customization to run the CPU vs GPU comparison task if the job runs on a GPU enabled machine
from Configuration.ProcessModifiers.gpu_cff import gpu
from DQM.EcalMonitorTasks.GpuTask_cfi import ecalGpuTask

gpu.toModify(ecalMonitorTask,
    cuda = _ecalMonitorTask.clone(workerParameters = dict(GpuTask = ecalGpuTask)))
gpu.toModify(ecalMonitorTask.cuda.workers, func = lambda workers: workers.append("GpuTask"))
```
|
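The gpu.toModify calls above only take effect when the gpu process modifier is active for the job. A toy model of that behaviour (a hypothetical MockModifier class, not the real FWCore Modifier) can make the mechanism concrete:

```python
# Toy version of a CMSSW process modifier: it applies the requested
# attribute changes only if the modifier is active in this job.
# This is an illustrative sketch, not the FWCore implementation.
class MockModifier:
    def __init__(self, active):
        self.active = active

    def toModify(self, obj, **kwargs):
        if self.active:
            for name, value in kwargs.items():
                setattr(obj, name, value)

class Cfg:
    pass

task = Cfg()
task.cuda = None

# inactive modifier: the configuration is left untouched
MockModifier(active=False).toModify(task, cuda="gpu-branch")
assert task.cuda is None

# active modifier: the cuda branch is attached
MockModifier(active=True).toModify(task, cuda="gpu-branch")
```

This is why the cuda case only exists in GPU workflows: without the modifier, the SwitchProducer is left with just its cpu branch.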
I would also like to point out this nice talk by A. Bocci about GPU in DQM listing all the possibilities: https://indico.cern.ch/event/975162/contributions/4106441/ |
Disclaimer n. 1: I've typed all this directly on GitHub without testing any of it - hopefully I didn't make many mistakes, but don't expect this to be 100% error-free.

Disclaimer n. 2: names are not my forte; if you find better names for what I suggest, please go ahead with them!

### Background

I think that the complexity here comes from the fact that we want to have a single workflow configuration that does different things (two different sets of validation plots) depending on whether a GPU is available or not:
IMHO this is not something that should be handled "automatically" by the presence or absence of a GPU, but at the level of the definition of the workflow.
Then we could (try to) run each workflow on a different machine:
Then
(*) depending on what we think should be the behaviour of the workflow

The bottom line is, I would not try to find a technical solution for this problem, because it should have a different definition altogether. The current behaviour in the
So, running without the
Then we can add a second modifier (e.g.
Let me try to give some made-up examples...

### Make DQM plots of some reconstructed quantities

Let's say the original configuration was

```python
monitorStuff = DQMEDAnalyzer('MonitorStuff',
    src = cms.InputTag('someStuff'),
    folder = cms.string('DQM/Folder/Stuff')
)

stuffValidationTask = cms.Task(monitorStuff)
```

Once GPUs are involved, we have three options:

1. use the same folder for the CPU and GPU quantities
2. use different folders for the CPU and GPU quantities, and fill only one of them in a given job
3. use different folders for the CPU and GPU quantities, and fill both in the same job

To achieve 2. we have two options.

```python
monitorStuff = SwitchProducerCUDA(
    cpu = DQMEDAnalyzer('MonitorStuff',
        src = cms.InputTag('someStuff'),
        folder = cms.string('DQM/Folder/Stuff')
    ),
    cuda = DQMEDAnalyzer('MonitorStuff',
        src = cms.InputTag('someStuff'),
        folder = cms.string('DQM/Folder/StuffOnGPU')
    )
)
```

It would be great if somebody could actually test it and let us know if it works :-)

If the collections being monitored are not from a SwitchProducer:

```python
monitorStuff = SwitchProducerCUDA(
    cpu = DQMEDAnalyzer('MonitorStuff',
        src = cms.InputTag('someStuffOnCPU'), # or 'someStuff@cpu'
        folder = cms.string('DQM/Folder/Stuff')
    ),
    cuda = DQMEDAnalyzer('MonitorStuff',
        src = cms.InputTag('someStuffOnGPU'), # or 'someStuff@cuda'
        folder = cms.string('DQM/Folder/StuffOnGPU')
    )
)
```

Finally, 3. is just a variation of the last option:

```python
monitorStuff = DQMEDAnalyzer('MonitorStuff',
    src = cms.InputTag('someStuff@cpu'),
    folder = cms.string('DQM/Folder/Stuff')
)

monitorStuffOnGPU = DQMEDAnalyzer('MonitorStuff',
    src = cms.InputTag('someStuff@cuda'),
    folder = cms.string('DQM/Folder/StuffOnGPU')
)
```

The configuration for 2. (either option, though the first one is simpler) or 3. can be generated from the configuration of 1. with an appropriate modifier. For 2. (first option):

```python
_monitorStuff = monitorStuff.clone()

monitorStuff = SwitchProducerCUDA(
    cpu = _monitorStuff.clone()
)

gpu_validation.toModify(monitorStuff,
    cuda = _monitorStuff.clone(
        src = 'someStuff',
        folder = 'DQM/Folder/StuffOnGPU'
    )
)
```

While for 3. a new module needs to be added to a Task or Sequence:

```python
gpu_validation.toModify(monitorStuff,
    src = 'someStuff@cpu'
)

monitorStuffOnGPU = monitorStuff.clone(
    src = 'someStuff@cuda',
    folder = 'DQM/Folder/StuffOnGPU'
)

_stuffValidationTask_gpu = stuffValidationTask.copy()
_stuffValidationTask_gpu.add(monitorStuffOnGPU)
gpu_validation.toReplaceWith(stuffValidationTask, _stuffValidationTask_gpu)
```

### Make DQM plots of GPU-vs-CPU reconstructed quantities

If we have a single DQM module that can do both the traditional validation and the GPU-vs-CPU comparison, we have a few options. The configuration for performing only the traditional validation could be:

```python
monitorAndCompareStuff = DQMEDAnalyzer("MonitorAndCompareStuff",
    reference = cms.InputTag('someStuff'),
    target = cms.InputTag('') # leave empty not to do any comparison
)
```

As in the previous example, if
The configuration for performing the traditional validation and the GPU-vs-CPU comparison could be:

```python
monitorAndCompareStuff = DQMEDAnalyzer("MonitorAndCompareStuff",
    reference = cms.InputTag('someStuff@cpu'),
    target = cms.InputTag('someStuff@cuda')
)
```

Whether the
Also in this case, the second configuration could be generated starting from the first by an appropriate modifier:

```python
gpu_validation.toModify(monitorAndCompareStuff,
    reference = 'someStuff@cpu',
    target = 'someStuff@cuda'
)
```
|
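All of the options above rely on the SwitchProducer picking one branch per job. A simplified model of that choice (the priority order and the availability test are assumptions for illustration, not the actual CMSSW implementation) is:

```python
# Illustrative sketch: pick the switch case to run, preferring 'cuda'
# when a usable GPU is present, falling back to 'cpu' otherwise.
# This is NOT the real SwitchProducerCUDA resolution code.
def resolve_switch(cases, cuda_available):
    for name in ("cuda", "cpu"):
        if name in cases and (name != "cuda" or cuda_available):
            return name
    raise RuntimeError("no runnable case in the switch")
```

Under this model a configuration with both branches runs the cuda case on a GPU machine and the cpu case elsewhere, while a cpu-only configuration always runs the cpu case.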
I have tried to implement something along the lines of Andrea's comments (https://github.com/cms-sw/cmssw/compare/master...thomreis:ecal-dqm-addGpuTask?expand=1), based on GPU vs. CPU comparison code from @alejands in PR #35946, but the matrix tests mostly fail with an exception (running on lxplus without a GPU):
I am not quite sure what that means and if there would be a way to change EcalDQMonitorTask to be compliant (if it is actually EcalDQMonitorTask that causes this). Note that if the |
Hi Thomas, thanks for the test.
Can I ask what behaviour you are trying to achieve?
Anyway, it looks like we (currently) cannot use the SwitchProducer for a DQMEDAnalyzer.
Matti, is this something you think should be added? Or do we look for a different solution?
|
If the
Of course there is in principle no need to give the
Could you elaborate a bit more on what the DQMEDAnalyzer does that prevents its use within a SwitchProducer? |
IMHO this should actually crash: you are explicitly asking to run something on GPU when one is not present.
It looks like the
In principle - with the current approach where the "branch" chosen by the |
I see. So the
If |
👍 |
I fully agree.
It seems to me that all the presented use cases so far are really about knowledge of whether the data product was produced on CPU or GPU. The |
Hi @makortel, I'm not sure what the action plan is to get this fixed; should we have a discussion, or is it not necessary? |
I understood @thomreis found a different solution for his use case ("Make DQM plots of GPU-vs-CPU reconstructed quantities" in Andrea's #35879 (comment)). The exact use case of the issue description (Andrea's option 1, "same folder for CPU and GPU quantities", in #35879 (comment)) works out of the box. For the use case of Vincenzo in #35879 (comment) (Andrea's option 2, "different folder for CPU and GPU quantities, fill only one of those in a job", in #35879 (comment)), we are going to implement something like Provenance telling if a data product was produced on a CPU or a GPU (actual implementation will likely be different, but I hope this gives the idea). Andrea's option 3, "different folder for CPU and GPU quantities, fill both in a job", in #35879 (comment) would be best implemented with a specific Modifier (as Andrea wrote). |
Just to add that this use case can technically be implemented already today by using the information stored in the event provenance. For example, for an event product
While it can be done, this model doesn't scale well for many uses or an evolving configuration. Therefore we're planning to introduce a simpler record at the process level along the lines of "whether GPU offloading was enabled or not". Some more details are in #30044 (where any feedback on that approach would be welcome). |
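As a toy illustration of the provenance-based approach (the data layout and helper below are made up for this sketch, not the CMSSW Provenance API), one could record which branch of a switch produced each product and inspect it afterwards:

```python
# Hypothetical sketch: decide CPU vs GPU origin from provenance-like
# records mapping a product label to the name of the producing branch.
# The "@cuda" suffix convention mirrors the SwitchProducer case names.
def produced_on_gpu(provenance, label):
    producer = provenance[label]
    return producer.endswith("@cuda")

# example provenance records (made up for illustration)
prov = {
    "pixelTracksSoA": "pixelTracksSoA@cuda",  # produced by the cuda case
    "someStuff": "someStuff@cpu",             # produced by the cpu case
}
```

A DQM module could use such a lookup to steer, say, the output folder, which is why a simpler process-level record ("GPU offloading enabled or not") scales better than per-product bookkeeping.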
From the Tracker DQM side, we are developing DQM modules to monitor HLT products (e.g. pixel tracks and vertices in SoA format) which can be produced either on a GPU or on a CPU. Right now in our tests, the same module is modified with the `gpu` modifier to use the correct product in a GPU workflow. Given that we want to run this at HLT, I wanted to understand if we can have a `SwitchProducer` mechanism for `DQMEDAnalyzer`, so that we can do something like this. Can framework experts give some guidance on this?
@arossi83 @mmusich @tsusa @connorpa