New functionality in BP5 file writer, what do we name the API that triggers it? #3060

Closed
eisenhauer opened this issue Feb 17, 2022 · 1 comment

Comments

@eisenhauer
Member

The BP5 File engine has some new functionality and we need to decide how to expose it to the user. It has some similarities, conceptually or otherwise, to existing Writer-side engine functions PerformPuts() and Flush(), so let me differentiate them a bit:

  • PerformPuts()
    • When can it happen: Only in the middle of a step
    • Collective or not? NOT
    • What it sounds like it does: Move data to disk
    • What it really does: copy data in Put(Deferred) to internal buffers, as if you had done Put(Sync) instead
    • This isn’t a very valuable function in BP3/4, since it just triggers a copy that would happen eventually anyway. It was introduced at least in part to provide symmetry with PerformGets().
    • BP5 comments – currently this either does nothing (data is already in buffers) or is a terrible idea (the only data not in buffers isn't there because it's huge and BP5 could write it directly)
  • Flush()
    • When can it happen: Only between steps
    • Collective or not? Collective
    • What it sounds like it does: Move data to disk, whenever it’s called
    • What it really does: Ensure that any queued whole timesteps (like in temporal aggregation) get written to disk
    • BP5 comments – Currently this is a no-op because BP5 doesn’t do temporal aggregation, so there’s never anything to flush
  • New call
    • New BP5 functionality that can push already-Put() data to disk in the middle of a step, reducing memory necessary for ADIOS buffering (and as a side effect making it possible to reuse memory from a Put(Deferred))
    • Collective or not? Collective
    • This only writes actual user data to storage. It does not push out any metadata, so that partial data isn’t accessible (no partial timesteps!), it’s just pushed to disk and doesn’t have to take up space in memory. This call likely cannot have any effect on streaming (which only works with whole timesteps), so it’s really a file-only functionality.
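
To make the three behaviours above concrete, here is a minimal toy model in C++. It is not ADIOS2 code: `ToyWriter` and all of its members are hypothetical, "disk" is just a counter, and `PerformDataWrite()` is used only as a placeholder name for the new call. It also ignores the collective/non-collective distinction and the BP5 detail that large deferred data may be written directly rather than buffered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a file-writing engine (hypothetical, NOT the ADIOS2 implementation).
class ToyWriter
{
public:
    void BeginStep()
    {
        assert(!m_InStep);
        m_InStep = true;
    }

    // Put(Deferred): only remember where the user's data lives; copy nothing yet.
    // The caller must keep `data` alive and unchanged until it has been consumed.
    void PutDeferred(const std::vector<double> &data)
    {
        assert(m_InStep);
        m_Deferred.push_back(&data);
    }

    // PerformPuts(): copy deferred data into the internal buffer,
    // as if Put(Sync) had been used. Nothing reaches "disk".
    void PerformPuts()
    {
        assert(m_InStep);
        for (const std::vector<double> *d : m_Deferred)
            m_Buffer.insert(m_Buffer.end(), d->begin(), d->end());
        m_Deferred.clear();
    }

    // The new call (placeholder name): push already-Put() data to "disk"
    // mid-step and release the buffer. No metadata is written, so the
    // partial step does not become visible to readers.
    void PerformDataWrite()
    {
        PerformPuts();
        m_ValuesOnDisk += m_Buffer.size();
        m_Buffer.clear();
    }

    // EndStep(): write whatever is left, then publish the step (metadata).
    void EndStep()
    {
        PerformDataWrite();
        ++m_VisibleSteps;
        m_InStep = false;
    }

    std::size_t BufferedValues() const { return m_Buffer.size(); }
    std::size_t ValuesOnDisk() const { return m_ValuesOnDisk; }
    std::size_t VisibleSteps() const { return m_VisibleSteps; }

private:
    bool m_InStep = false;
    std::vector<const std::vector<double> *> m_Deferred; // pending Put(Deferred) data
    std::vector<double> m_Buffer;                        // internal engine buffer
    std::size_t m_ValuesOnDisk = 0;                      // stands in for file contents
    std::size_t m_VisibleSteps = 0;                      // steps a reader could see
};
```

In this model, calling `PerformPuts()` after a `PutDeferred()` only grows the internal buffer, while calling `PerformDataWrite()` empties it and leaves the number of visible steps unchanged until `EndStep()`. That is the memory-saving effect described for the new call.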

The new functionality has some of the same user-visible effects as the old BP3/4 PerformPuts() (allowing buffer reuse), it does what PerformPuts() sounds like it should do, and the old functionality is at best useless, so it’s tempting to make PerformPuts() trigger the new functionality in BP5. BUT the new call has to be collective and the old PerformPuts() was not, so exposing this as PerformPuts() might break old code. Conceptually what we’re doing is a data flush, but the existing Flush() is called at a different time (between steps, not mid-step) and does something different, so we probably need to introduce a new user-visible engine function to make this new functionality available. Given all this, what name do you like?

  • FlushData()
  • FlushIntermediateData()
  • PerformDataWrite()
  • WriteAdiosData()
  • WriteData()

Any other suggestions?

g

@eisenhauer
Member Author

So, no opinions? @pnorbert, probably makes sense to get this in before we feature-freeze, so I'm going to just make the call myself if nobody speaks up. Personally, I'm leaning towards PerformDataWrite(). That keeps the "Perform" prefix for something that can happen inside the step and I think is sufficiently clear as to what it's doing.
