Save to file after each event #58

Merged
merged 11 commits into from
Apr 1, 2022

soleti (Collaborator) commented Mar 30, 2022

This PR changes the way we save the results of the simulation to file: they are now written after each event rather than at the end of the full simulation. This fixes issue #57. The trade-off is slightly lower efficiency, since the results have to be copied from GPU memory after every event.
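A minimal sketch of the per-event pattern, written in plain NumPy rather than larnd-sim's actual HDF5 output code: each event's results are written and flushed as soon as the event finishes, so a crash part-way through still leaves every earlier event on disk. `PACKET_DTYPE`, `simulate_event`, `run_streaming`, and the `crash_at` parameter are hypothetical stand-ins for illustration, and a flat binary file written with `ndarray.tofile` replaces the real HDF5 writer.

```python
import os
import tempfile

import numpy as np

# Hypothetical record layout; larnd-sim's real packet format is different.
PACKET_DTYPE = np.dtype([("event_id", "i8"), ("charge", "f4")])


def simulate_event(event_id):
    """Hypothetical stand-in for the per-event simulation.

    Events 0, 3, 6, ... produce no packets, mimicking empty events.
    """
    n_packets = event_id % 3
    return np.array([(event_id, 0.5 * i) for i in range(n_packets)],
                    dtype=PACKET_DTYPE)


def run_streaming(n_events, path, crash_at=None):
    """Simulate events one by one, appending each event's output to `path`."""
    with open(path, "wb") as f:
        for event_id in range(n_events):
            if event_id == crash_at:
                raise RuntimeError(f"simulated crash at event {event_id}")
            packets = simulate_event(event_id)
            # Write this event immediately instead of collecting all events
            # in memory and writing them once at the end of the run.
            packets.tofile(f)
            f.flush()


out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "packets.bin")
try:
    run_streaming(5, path, crash_at=3)
except RuntimeError:
    pass
# Packets from events 0-2 (event 0 is empty) survive the crash.
print(np.fromfile(path, dtype=PACKET_DTYPE)["event_id"])  # prints [1 2 2]
```

The cost mentioned above shows up here as one write (and, in the real code, one device-to-host copy) per event instead of one per run.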

soleti (Collaborator, Author) commented Mar 30, 2022

@peter-madigan, for some reason I can't add you as a reviewer, but I would appreciate it if you could take a quick look.

chenel commented Mar 30, 2022

Unfortunately, the occasionally empty events of the official "ND-LAr+TMS" simulation seem to be causing a problem here:

  File "cli/simulate_pixels.py", line 417, in <module>
    fire.Fire(run_simulation)
  File "/usr/local/lib/python3.8/dist-packages/fire-0.4.0-py3.8.egg/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.8/dist-packages/fire-0.4.0-py3.8.egg/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.8/dist-packages/fire-0.4.0-py3.8.egg/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "cli/simulate_pixels.py", line 380, in run_simulation
    event_id_list_batch = np.concatenate(event_id_list, axis=0)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate
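The traceback comes from calling `np.concatenate` on an empty list, which NumPy rejects; this is presumably what happens when a batch contains only empty events and therefore contributes no arrays. The thread does not show the fix that was pushed, so the guard below is only one possible approach: return a zero-length array of the expected dtype when there is nothing to concatenate. `concatenate_or_empty` is a hypothetical helper name.

```python
import numpy as np


def concatenate_or_empty(arrays, dtype):
    """Concatenate a list of 1-D arrays, tolerating an empty list.

    np.concatenate([]) raises "ValueError: need at least one array to
    concatenate", so an empty batch returns a zero-length array instead.
    """
    if len(arrays) == 0:
        return np.empty(0, dtype=dtype)
    return np.concatenate(arrays, axis=0)


# A batch made only of empty events contributes no arrays at all.
print(concatenate_or_empty([], dtype=np.int64).shape)  # prints (0,)
print(concatenate_or_empty([np.array([3, 4]), np.array([7])], dtype=np.int64))  # prints [3 4 7]
```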

soleti (Collaborator, Author) commented Mar 30, 2022

Can you try now, @chenel?

chenel commented Mar 31, 2022

Progress! I now get through the first 262 events of my ~10K-event sample. Unfortunately, I'm now running out of memory on my GPU (~11 GB of VRAM). :(

I'm going to try on a machine with a better GPU (more VRAM), but I'm posting this here in case it is evidence that something else is wrong...

chenel commented Mar 31, 2022

Sad panda. About 30% of the way through the file (event 2634/8581):

Traceback (most recent call last):
  File "cli/simulate_pixels.py", line 418, in <module>
    fire.Fire(run_simulation)
  File "/usr/local/lib/python3.8/dist-packages/fire-0.4.0-py3.8.egg/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.8/dist-packages/fire-0.4.0-py3.8.egg/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.8/dist-packages/fire-0.4.0-py3.8.egg/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "cli/simulate_pixels.py", line 387, in run_simulation
    _, _, last_time = fee.export_to_hdf5(event_id_list_batch,
  File "/gpfs/slac/staas/fs1/g/neutrino/jwolcott/app/larnd-sim/larndsim/fee.py", line 207, in export_to_hdf5
    io_group = detector.MODULE_TO_IO_GROUPS[module_id][io_group-1]
KeyError: 0

I don't see any other output for this particular event.

peter-madigan (Member)

Sorry for the slow response; I don't have my computer with me this week, but I'll take a look as soon as I'm back.

soleti (Collaborator, Author) commented Mar 31, 2022

@chenel, can you send me the path of your input file? When it crashes, does the output file contain the events simulated so far?

chenel commented Mar 31, 2022

(For the record, the file was sent via Slack. There is an output file, which is generally healthy, but it's missing the tracks product; apparently that is still being saved at the end.)

soleti (Collaborator, Author) commented Mar 31, 2022

OK, there was a missing check in the pixel-finding algorithm. It should work now; let me know if it doesn't.
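The check itself isn't shown in this thread. As an illustration only, a common form of such a check is a bounds test on the computed pixel indices, so that a point lying just outside the pixel plane is dropped instead of being mapped to a pixel (and from there to a module ID, like the 0 in the `KeyError` above) that the detector lookup does not know. `find_pixel` and its parameters are hypothetical and are not larnd-sim's API.

```python
import math


def find_pixel(x, y, x_min, y_min, pitch, n_x, n_y):
    """Return the (i, j) indices of the pixel containing (x, y).

    Returns None when the point falls outside the n_x-by-n_y pixel grid
    whose lower-left corner is at (x_min, y_min).
    """
    i = math.floor((x - x_min) / pitch)
    j = math.floor((y - y_min) / pitch)
    # The bounds check: without it, out-of-plane points yield indices
    # that do not correspond to any real pixel.
    if not (0 <= i < n_x and 0 <= j < n_y):
        return None
    return i, j


print(find_pixel(0.5, 1.5, x_min=0.0, y_min=0.0, pitch=1.0, n_x=4, n_y=4))  # prints (0, 1)
print(find_pixel(-0.1, 1.5, x_min=0.0, y_min=0.0, pitch=1.0, n_x=4, n_y=4))  # prints None
```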

chenel commented Mar 31, 2022

I'll set a test running.

chenel commented Apr 1, 2022

So close!

Simulating events...: 100%|███████████████| 8581/8581 [2:08:07<00:00,  1.12it/s]
Traceback (most recent call last):
  File "cli/simulate_pixels.py", line 413, in <module>
    fire.Fire(run_simulation)
  File "/usr/local/lib/python3.8/dist-packages/fire-0.4.0-py3.8.egg/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.8/dist-packages/fire-0.4.0-py3.8.egg/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.8/dist-packages/fire-0.4.0-py3.8.egg/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "cli/simulate_pixels.py", line 404, in run_simulation
    output_file['configs'].attrs['pixel_layout'] = pixel_layout
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/usr/local/lib/python3.8/dist-packages/h5py/_hl/group.py", line 288, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
ValueError: Invalid location identifier (invalid location identifier)

Did I miss updating something somehow?

soleti (Collaborator, Author) commented Apr 1, 2022

Oops, I forgot to open the file before writing the config. It should work now 🤞
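The `ValueError: Invalid location identifier` in the previous traceback is what h5py raised when `output_file['configs']` was accessed through a handle whose file was not open at that point. A minimal sketch of writing the config after reopening the output file: mode `"a"` opens an existing file for reading and writing (creating it if it does not exist), and `require_group` returns the `configs` group, creating it if needed. The function name `write_pixel_layout` and the layout file name are hypothetical.

```python
import os
import tempfile

import h5py


def write_pixel_layout(path, pixel_layout):
    """Reopen the output file and store the pixel layout as an attribute."""
    # "a": read/write if the file exists, create it otherwise.
    with h5py.File(path, "a") as output_file:
        configs = output_file.require_group("configs")
        configs.attrs["pixel_layout"] = pixel_layout


path = os.path.join(tempfile.mkdtemp(), "output.h5")
# Per-event writes would already have opened and closed the file by now.
with h5py.File(path, "w") as f:
    f.create_group("configs")
write_pixel_layout(path, "layout.yaml")  # hypothetical layout file name
with h5py.File(path, "r") as f:
    print(f["configs"].attrs["pixel_layout"])
```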

soleti self-assigned this Apr 1, 2022

chenel commented Apr 1, 2022

Victory at last! It finished successfully and the file seems to be healthy. 🎉
(I don't understand why there are 21616 packets with packet_type 7, given that there are only 10K events in the edep-sim file; I thought these were supposed to be event boundaries only. Unless this is likely to be evidence that something went wrong in saving, though, we can move the discussion elsewhere.)

soleti (Collaborator, Author) commented Apr 1, 2022

Those are trigger packets, not just event dividers; you can have more than one per event. I'll merge this and investigate further at some point.
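One quick way to see whether having more trigger packets than events fits this explanation is to count trigger packets per event and check how many events have more than one. The sketch assumes each packet can be tagged with the event it came from; the `event_id` array and the function name are hypothetical, while `packet_type == 7` is the value discussed above.

```python
import numpy as np

TRIGGER_PACKET_TYPE = 7  # packet_type value discussed in this thread


def triggers_per_event(packet_type, event_id, n_events):
    """Count trigger packets for each event ID in range(n_events)."""
    is_trigger = packet_type == TRIGGER_PACKET_TYPE
    return np.bincount(event_id[is_trigger], minlength=n_events)


# Toy data: three events, one of which has two trigger packets.
packet_type = np.array([7, 0, 7, 7, 0, 7])
event_id = np.array([0, 0, 0, 1, 1, 2])
counts = triggers_per_event(packet_type, event_id, n_events=3)
print(counts)                            # prints [2 1 1]
print(counts.sum(), (counts > 1).sum())  # prints 4 1
```

A total well above the number of events, spread over many events with counts above 1, would be consistent with multiple triggers per event rather than a saving problem.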

soleti merged commit 30bb776 into DUNE:master Apr 1, 2022
soleti deleted the stream branch April 1, 2022 21:04
soleti added a commit that referenced this pull request Apr 27, 2022
Save to file after each event