Some BP5 Serialization documentation, mostly writer-side perspective. #4372

Merged
merged 1 commit into from
Oct 18, 2024
258 changes: 258 additions & 0 deletions source/adios2/toolkit/format/bp5/BP5Base.h
@@ -19,6 +19,264 @@
#pragma warning(disable : 4250)
#endif

/*
* BP5 Metadata Marshalling is based upon FFS, which provides the
* ability to serialize a C-style pointer-based data structure
* (starting with a base struct) and to deserialize it in-place on
* the receiving side.
*
* Normally, in order to use FFS, an application must fully describe
* the base structure using an FMFieldList, where each element
* describes a field in the structure, including the field's name,
* basic type (integer, float, etc.), size and offset from the start
* of the structure. In "normal" scenarios, as in SST, this is
* straightforward because we're describing a structure that exists
* at compile-time and all of those things are compile-time static.
* However, ADIOS metadata represents information about variables
* that we don't know about until run-time, so if we're going to use
* FFS here, things have to be a bit more dynamic. In particular,
* we'll represent ADIOS metadata with a "virtual" structure, one
* whose description we'll construct on the fly and which will only
* ever exist virtually, making up offsets as we go. We just have to
* be careful about keeping things aligned appropriately because we
* want this to land on the receiver and be appropriately aligned
* there. (Normally the compiler takes care of this, but this
* virtual structure is never seen by a compiler, so we're doing it.)
* The field name that we specify to FFS is also important because we
* use it to communicate a lot of information between writer and
* reader. While it always contains the variable name, it also
* encodes the variable type (local or global, atomic or array,
* compressed, derived, etc.). Because the variable name only
* appears in the metametadata (ffs format), this is a great place to
* put more static information about the variable, specifically
* anything that is fixed after definition and doesn't change on a
* per-timestep basis. More on names later.
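*
* A minimal sketch of the "virtual structure" bookkeeping described
* above, assuming natural alignment (each field aligned to its own
* size). SketchField, SketchLayout and AlignUp are hypothetical names
* used only for illustration; the real code builds FFS FMField
* entries instead.
*
```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one field description (name, size, offset).
struct SketchField
{
    std::string Name;
    std::size_t Size;
    std::size_t Offset;
};

// Round Offset up to the next multiple of Align (Align: power of two).
inline std::size_t AlignUp(std::size_t Offset, std::size_t Align)
{
    assert(Align != 0 && (Align & (Align - 1)) == 0);
    return (Offset + Align - 1) & ~(Align - 1);
}

// Hypothetical description of a struct that only exists at run-time.
struct SketchLayout
{
    std::vector<SketchField> Fields;
    std::size_t Size = 0; // bytes used so far

    // Append a field, assuming its alignment equals its size.
    std::size_t AddField(const std::string &Name, std::size_t ElemSize)
    {
        std::size_t Offset = AlignUp(Size, ElemSize);
        Fields.push_back({Name, ElemSize, Offset});
        Size = Offset + ElemSize;
        return Offset;
    }
};
```
*
* Adding a 1-byte field and then an 8-byte field places the second
* at offset 8, leaving 7 padding bytes, much as a compiler would for
* a static struct.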
*
* To manage the structure on the writer side, we principally track
* two things: the FMFieldList that represents the description of
* the virtual struct, and a malloc'd region where we build the
* virtual struct itself. While the description is interpreted by
* FFS, the most important thing for BP5 to remember is each field's
* offset, because that's where the (meta)data will go. When we
* Marshal a simple atomic value (local or global), we
* calculate an appropriately aligned new offset in the buffer, add
* to the FMFieldList (maintained in Info.MetaFields on the writer)
* and copy the data into the virtual field at that offset in the
* buffer. On future timesteps, the field already exists, so we just
* use the offset and copy the data into the buffer. Arrays are a
* bit more complex, but let's start with the simple case. FFS
* supports substructures, i.e., fields which are themselves
* structures, and we use that feature for all array representations.
* There are several things that may change on a per-timestep basis
* for arrays, including Shape, Count and Offset values (which are
* themselves arrays), and we also need to track the location of the
* related data block (offset in this rank's data segment). Except
* for Shape (which we assume is set for at least this timestep), all
* of these things are per-block.
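*
* The atomic-value path described earlier in this paragraph can be
* sketched as follows. This is an illustration under assumptions,
* not the BP5 implementation: SketchMarshaller, OffsetOf and
* FormatChanged are hypothetical names, and a std::map stands in for
* the writer's internal bookkeeping.
*
```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical writer-side state: per-variable offsets plus the buffer.
struct SketchMarshaller
{
    std::map<std::string, std::size_t> OffsetOf;
    std::vector<char> MetadataBuf;
    bool FormatChanged = false; // set when a new field is added

    void MarshalValue(const std::string &Name, const void *Data, std::size_t Size)
    {
        // Sketch restriction: sizes are powers of two, alignment == size.
        assert(Size != 0 && (Size & (Size - 1)) == 0);
        auto It = OffsetOf.find(Name);
        if (It == OffsetOf.end())
        {
            // First Put of this variable: choose an aligned offset, grow.
            std::size_t Offset = (MetadataBuf.size() + Size - 1) & ~(Size - 1);
            MetadataBuf.resize(Offset + Size, 0);
            It = OffsetOf.emplace(Name, Offset).first;
            FormatChanged = true;
        }
        // Every step: copy the value into its existing slot.
        std::memcpy(MetadataBuf.data() + It->second, Data, Size);
    }
};
```
*
* On later timesteps the lookup succeeds, so only the memcpy runs and
* the field description is left untouched.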
*
* Back to FFS capabilities for a moment. FFS's pointer-based
* structures include dynamically-sized arrays, and the size of those
* arrays must be specified by an integer-typed field in that
* structure. There are three different array lengths required here.
* Shape is of length Dims (how many dimensions the array has),
* DataBlockLocation is of length BlockCount (how many blocks were
* written on this rank), and for Count and Offsets we must have
* those per-block, so the length is Dims*BlockCount. To satisfy
* FFS's constraints, that means we must have integer fields
* representing all three lengths in the array metadata struct, and
* we need pointers to the dynamic arrays representing Shape, Count,
* Offsets, and DataBlockLocation. These are the BASE_FIELDS below
* and the FFS FMField entries are BASE_FIELD_ENTRIES in BP5Base.cpp.
* While more complex array metadata entries are necessary, these
* must be the first fields in those structures. While there can't
* be a static struct declaration for all of the metadata, there is a
* static declaration for the array metadata substructure,
* MetaArrayRec below. Mostly you'll see this used like this:
*
* MetaArrayRec *MetaEntry = (MetaArrayRec *)((char *)(MetadataBuf) + Rec->MetaOffset);
*
* This gives us a nice way of accessing the key fields in an array's
* metadata entry.
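*
* For orientation, here is a sketch of how the three lengths and
* four pointers named above fit together. SketchArrayRec and EntryAt
* are hypothetical names, and size_t is an assumed field type; the
* authoritative declaration is MetaArrayRec in this header.
*
```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of the base array fields described in the text.
struct SketchArrayRec
{
    std::size_t Dims;               // number of dimensions
    std::size_t BlockCount;         // blocks written on this rank
    std::size_t DBCount;            // Dims times BlockCount
    std::size_t *Shape;             // length Dims
    std::size_t *Count;             // length DBCount
    std::size_t *Offsets;           // length DBCount
    std::size_t *DataBlockLocation; // length BlockCount
};

// Locate an array's entry inside the virtual metadata buffer.
inline SketchArrayRec *EntryAt(void *MetadataBuf, std::size_t MetaOffset)
{
    assert(MetaOffset % alignof(SketchArrayRec) == 0);
    char *Base = static_cast<char *>(MetadataBuf);
    return reinterpret_cast<SketchArrayRec *>(Base + MetaOffset);
}
```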
*
* So, what about more complex arrays? All of our compression
* operators require the length of the compressed data as input to
* the uncompress operator. Generally we don't include data block
* length as part of metadata because it's easily calculated from the
* Count values and the length of the data type, but in order to
* support compression we have to communicate it from the writer to
* the reader so we can uncompress. Therefore every field with an
* operator has as its next field (after BASE_FIELDS) DataBlockSize.
* Like DataBlockLocation, this is per block (and so its FFS
* description also uses BlockCount). This arrangement is
* represented by the struct MetaArrayRecOperator below. Note that
* BP5 does not itself use the DataBlockSize in the metadata. The
* size of the compressed data is returned from the compression
* operator, and is used by BP5 to copy that data into the data
* block, but after that it is only passed to the Uncompress operator
* on the receiving side, so operators like MGARD may choose to use
* this differently.
*
* The last case is arrays that also have Min/Max stats associated
* with them. Since this can be combined with operators, that gives
* us two more possible structs for array metadata, a plain array
* with Min/Max or an array with an operator and Min/Max; these are
* represented by the structs MetaArrayRecMM and
* MetaArrayRecOperatorMM below. Note that MinMax in that struct is
* a char*, but obviously the data type of Min/Max depends upon the
* element type of the array. How does that work? The actual size
* in bytes of the MinMax array is BlockCount * sizeof(array element)
* * 2, but in order to avoid introducing yet another integer-typed
* size value into the structure we've gone to some effort in order
* to leverage the existing BlockCount value. In particular, there
* are a number of FMField lists for the MM and OperatorMM arrays,
* each giving FFS a different element size for the MinMax array.
* ADIOS types of size 1 use MetaArrayRecMM1List, those of size 2 use
* MetaArrayRecMM2List, etc., up to MetaArrayRecMM16List, which would
* be used by long double. Note that BP5 doesn't define or support
* MinMax for string, complex, or structure types.
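*
* The size arithmetic and list selection above can be sketched as
* below. MinMaxBytes and MinMaxListName are hypothetical helpers, and
* the intermediate sizes 4 and 8 are an assumption read from the
* "etc." in the text; only the 1, 2 and 16 list names are stated
* explicitly.
*
```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes in the MinMax array: one Min and one Max per block.
inline std::size_t MinMaxBytes(std::size_t BlockCount, std::size_t ElemSize)
{
    assert(ElemSize != 0);
    return BlockCount * ElemSize * 2;
}

// Field list name for an element size, following the naming pattern
// in the text. Returns an empty string for sizes the sketch ignores.
inline std::string MinMaxListName(std::size_t ElemSize)
{
    switch (ElemSize)
    {
    case 1:
    case 2:
    case 4:
    case 8:
    case 16:
        return "MetaArrayRecMM" + std::to_string(ElemSize) + "List";
    default:
        return "";
    }
}
```
*
* Because the element size is baked into the chosen list, FFS can
* size the MinMax array from BlockCount alone (with a per-element
* size of twice the array element size), which is why no extra
* length field is needed.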
*
* For each of the array variations above, when we add the field
* associated with that array to the metadata field list, we specify
* the appropriate FieldList in the FFS "field_type" value, and
* allocate space for the relevant structure in the virtual metadata
* struct we're building.
*
* We mentioned field names above. We actually encode a lot of
* information into the FFS field names, including the variable name,
* shape, element_size, ADIOS type, any operator that might be
* applied, the name of the substructure (if the array is a struct
* type), and even the expression that is to be used for derived
* variables. These are all encoded in different ways. For example,
* the basic shape of the variable is encoded in the three-letter
* prefix of the FFS field name: GlobalValue = "BPg", GlobalArray =
* "BPG", JoinedArray = "BPJ", LocalValue = "BPl", LocalArray = "BPL".
* The details of the encoding are buried in the logic, but the
* important bit is knowing that there's a lot of information there
* and some of
* it (like the expression) is base64 encoded to avoid having special
* characters in the FFS field name. From the BP5 point of view,
* anything that can be encoded in the field name is a good thing
* because it travels in the metametadata, not the metadata, so it
* only gets moved around if the field set changes.
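*
* As an illustration of the prefix part only, a sketch of mapping
* between shapes and the three-letter prefixes listed above.
* SketchShape, PrefixFor and ShapeFromFieldName are hypothetical
* names; everything after the prefix in a real field name follows
* the encoding in the BP5 sources and is not modelled here.
*
```cpp
#include <cassert>
#include <string>

// Hypothetical enum for the five variable shapes named in the text.
enum class SketchShape
{
    GlobalValue,
    GlobalArray,
    JoinedArray,
    LocalValue,
    LocalArray
};

inline const char *PrefixFor(SketchShape S)
{
    switch (S)
    {
    case SketchShape::GlobalValue: return "BPg";
    case SketchShape::GlobalArray: return "BPG";
    case SketchShape::JoinedArray: return "BPJ";
    case SketchShape::LocalValue:  return "BPl";
    case SketchShape::LocalArray:  return "BPL";
    }
    assert(false && "unhandled SketchShape");
    return "";
}

// Recover the shape from the first three characters of a field name.
inline bool ShapeFromFieldName(const std::string &FieldName, SketchShape &Out)
{
    const SketchShape All[] = {SketchShape::GlobalValue, SketchShape::GlobalArray,
                               SketchShape::JoinedArray, SketchShape::LocalValue,
                               SketchShape::LocalArray};
    for (SketchShape S : All)
    {
        if (FieldName.compare(0, 3, PrefixFor(S)) == 0)
        {
            Out = S;
            return true;
        }
    }
    return false;
}
```
*
* Note that the comparison is case-sensitive: "BPg" and "BPG" differ
* only in the case of the last letter.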
*
* Speaking of changes, there are some details that are omitted above
* to get the main points across, so let's fill in some of them.
* First, when you Put the first block of an array, we fill out the
* Dims field, init BlockCount to 1, DBCount (the Dims*BlockCount
* value) to Dims and then we malloc memory to hold a copy of the
* Shape, Count and Offset values. (We need to copy these anyway as
* part of serialization as they must be captured at the time of Put,
* so we can't, say, just reference the values in the VariableBase
* class.) For LocalArrays, the Shape pointer stays NULL, as does
* the Offsets pointer (the Start values). If there's another Put()
* on that variable after the first, we add 1 to BlockCount,
* increment DBCount by Dims, and realloc() the Count and Offsets
* arrays so that we can add the new Count and Offset values after
* the ones that are already there. Counting blocks from 0, this
* means that the Count values for block 1 start at Count[Dims], for
* block 2 they start at Count[2*Dims], etc. At the
* end of the timestep after using FFSencode() to serialize the
* metadata, FMfree_var_rec_elements() is used to free() all these
* subarrays that we've malloc'd. It understands the structure of
* our entire Metadata structure, walks the field list and
* deallocates appropriately. Once this has been done, we can
* memset() the whole metadata structure back to zeros and we're
* ready to start again. (All pointers NULL and counts are zero.)
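*
* The per-block growth described above can be sketched like this.
* SketchBlocks, AppendBlock and ResetForNextStep are hypothetical
* names; in particular the reset here uses plain free() and
* assignment where the real code relies on
* FMfree_var_rec_elements() and memset().
*
```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical subset of an array's metadata entry.
struct SketchBlocks
{
    std::size_t Dims = 0;
    std::size_t BlockCount = 0;
    std::size_t DBCount = 0; // Dims times BlockCount
    std::size_t *Count = nullptr;
    std::size_t *Offsets = nullptr;
};

// Append one block's Count and Offsets values (each of length Dims).
inline void AppendBlock(SketchBlocks &Rec, std::size_t Dims,
                        const std::size_t *Count, const std::size_t *Offsets)
{
    assert(Dims != 0 && (Rec.BlockCount == 0 || Rec.Dims == Dims));
    const std::size_t NewDB = Rec.DBCount + Dims;
    const std::size_t Bytes = NewDB * sizeof(std::size_t);
    Rec.Count = static_cast<std::size_t *>(std::realloc(Rec.Count, Bytes));
    Rec.Offsets = static_cast<std::size_t *>(std::realloc(Rec.Offsets, Bytes));
    assert(Rec.Count != nullptr && Rec.Offsets != nullptr);
    // New values go after the ones that are already there.
    std::memcpy(Rec.Count + Rec.DBCount, Count, Dims * sizeof(std::size_t));
    std::memcpy(Rec.Offsets + Rec.DBCount, Offsets, Dims * sizeof(std::size_t));
    Rec.Dims = Dims;
    Rec.BlockCount += 1;
    Rec.DBCount = NewDB;
}

// Free the sub-arrays and zero the entry for the next timestep.
inline void ResetForNextStep(SketchBlocks &Rec)
{
    std::free(Rec.Count);
    std::free(Rec.Offsets);
    Rec = SketchBlocks{};
}
```
*
* With Dims == 2, the second Put() lands its Count values at
* Count[2] and Count[3], i.e. at Count[1 * Dims] for block index 1.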
*
* When we do start again with the next timestep, we don't start from
* scratch with a new Fieldlist and virtual structure, but instead
* try to reuse the old one. The anticipation is that step-based HPC
* applications are highly regular and the set of variables that is
* output on step N+1 is likely the same as what was output for
* step N. So when we get a Put() for a variable, we look up its
* entry in internal bookkeeping and if it has an entry in the
* structure we reuse it, putting the appropriate data in the virtual
* structure as described above. This is fine if we write the exact
* same set of variables in subsequent steps, but what if we don't?
* Well, if we write a new variable, then the procedure above
* happens, but we also take steps to make sure that we generate new
* MetaMetaData (i.e., re-register the format with FFS). We do this
* by setting the Info.MetaFormat value to NULL.
*
* Handling a non-written variable is done differently. We don't
* really want to bear the cost of new MetaMetaData frequently
* (because MetaMetaData can be big), so instead we're willing to
* bear the costs of not using some of the data in the virtual
* structure. So if the app Puts an atomic variable on timestep N,
* but skips it on N+1, we essentially leave that fraction of the
* metadata buffer unused in N+1. It's transmitted or stored, but it
* doesn't contain anything useful. But the reader still needs to
* know that it wasn't written, so BP5 metadata carries with it a
* bitmap showing if a variable that is part of the metadata has
* actually been written and is valid. This bitmap, contained in the
* BitField[BitFieldCount] fields in the MetadataFieldList, is the
* ultimate authority as to what has been written. Variables are
* assigned an index in order when they are first entered into
* metadata and if the bit at that index isn't set, that variable
* wasn't written on that timestep.
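*
* A minimal sketch of such a "was it written" bitmap. SketchBitField
* is a hypothetical type, and storing the bits in size_t words is an
* assumption made for illustration; the real layout is whatever the
* BitField[BitFieldCount] entries in MetadataFieldList describe.
*
```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Number of bits in one word of the sketch bitmap.
constexpr std::size_t SketchBitsPerWord = sizeof(std::size_t) * CHAR_BIT;

// Hypothetical bitmap with one bit per variable index.
struct SketchBitField
{
    std::vector<std::size_t> Words; // Words.size() plays BitFieldCount

    // Mark the variable at Index as written on this timestep.
    void Set(std::size_t Index)
    {
        const std::size_t Word = Index / SketchBitsPerWord;
        if (Word >= Words.size())
            Words.resize(Word + 1, 0);
        Words[Word] |= (std::size_t(1) << (Index % SketchBitsPerWord));
    }

    // True if the variable at Index was written on this timestep.
    bool Test(std::size_t Index) const
    {
        const std::size_t Word = Index / SketchBitsPerWord;
        if (Word >= Words.size())
            return false;
        return (Words[Word] >> (Index % SketchBitsPerWord)) & std::size_t(1);
    }

    // Start of a new timestep: nothing has been written yet.
    void ClearAll()
    {
        assert(Words.size() <= Words.max_size());
        Words.assign(Words.size(), 0);
    }
};
```
*
* A reader would consult Test(Index) before trusting the bytes at a
* variable's offset, since a skipped variable leaves stale or zeroed
* space behind.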
*
* Now, this does bring up a vulnerability with BP5. If an
* application were to write a lot of variables on one step and then
* never use them again, we might end up with a big metadata block
* that mostly carried unused (junk) bytes. We have not yet run into
* this in a real application, so it isn't specifically handled. In
* an ideal world, one would look at the "occupancy rate" of
* metadata in EndStep() and make a decision that for either this
* timestep or the next, we'd start from scratch with an empty field
* list. There's a tradeoff here. Do this too often and we've got
* big MetaMetadata costs; do it too rarely and our metadata has a
* lot of useless bytes. Future work. Note that this is mostly a
* writer-side thing to fix/optimize. The reader will appropriately
* handle new metadata, including new metametadata.
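*
* Purely as an illustration of the tradeoff (nothing like this
* exists in BP5 today), one possible heuristic is sketched below.
* OccupancyRate and SketchRebuildPolicy are hypothetical, and the
* threshold and patience values are made up.
*
```cpp
#include <cassert>
#include <cstddef>

// Fraction of the metadata buffer holding fields written this step.
inline double OccupancyRate(std::size_t UsedBytes, std::size_t TotalBytes)
{
    assert(UsedBytes <= TotalBytes);
    if (TotalBytes == 0)
        return 1.0;
    return static_cast<double>(UsedBytes) / static_cast<double>(TotalBytes);
}

// Hypothetical policy: rebuild only after several sparse steps in a
// row, so one unusual step does not trigger new MetaMetaData.
struct SketchRebuildPolicy
{
    double MinOccupancy = 0.5;      // made-up threshold
    unsigned SparseStepsNeeded = 3; // made-up patience
    unsigned SparseSteps = 0;       // consecutive sparse steps so far

    // Call once per EndStep(); true means start from an empty field list.
    bool ShouldRebuild(double Occupancy)
    {
        SparseSteps = (Occupancy < MinOccupancy) ? SparseSteps + 1 : 0;
        if (SparseSteps < SparseStepsNeeded)
            return false;
        SparseSteps = 0;
        return true;
    }
};
```
*
* Requiring several consecutive sparse steps is one way to avoid
* paying the MetaMetaData cost too often; other policies are equally
* plausible.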
*
* The stuff above applies to ADIOS variables, but attributes are
* always handled separately. In the initial FFS-marshalling
* implementation, Attributes, while separate, were handled very
* similarly to variables. That is, there was a field list and
* virtual structure maintained where we entered attributes much like
* Global and local values are described above. There was a
* metametadata generated for it and it was moved around like other
* metametadata blocks. This old way of doing things is still
* present in the code and gets used if MarshalAttribute is called by
* the engine. Engines that use this marshal all attributes in
* EndStep(), calling MarshalAttribute for all attributes and only
* doing this when some attribute has changed. The resulting
* Attribute data always contains *all* the current attribute values,
* a situation that works out well for engines like SST where readers
* might join after timestep 0. The SST writer can save the most
* recent Attribute data block and provide it to a newly-joined
* reader so that it has all available attributes.
*
* However, this encoding mechanism has some significant
* disadvantages in almost all situations. This separation of
* metametadata and metadata was designed for Variables, where the
* set of variables was likely to be reused without changes
* repeatedly. However, attributes aren't like that, particularly in
* the original situation where attributes once set can never change.
* Then we're only doing this when we add an attribute, we're always
* generating new MetaMetadata whenever we have a change, and
* MetaMetadata + Metadata size is always going to be bigger than
* some simpler encoding mechanism. So the BP5 file engine now does
* things differently. It calls OnetimeMarshalAttribute() which uses
* a simpler FFS representation for attributes with the attribute
* "name" being part of the data, not part of the metametadata as it
* is with variables. This means that the metametadata never
* changes, so we don't have the same issues as with the prior
* approach. That metametadata struct (BP5AttrStruct) describes a
* relatively simple structure with two lists, one for attributes of
* any non-string type, and the other a list of string and
* array-of-string attributes. Generally we only want attributes to
* appear here when they change, so the BP5Writer calls
* OnetimeMarshalAttribute() whenever it gets the NotifyEngineAttribute
* call (whenever an attribute changes). However, it also gets called
* in BeginStep if that step is the first ever called, because some
* attributes may have been defined before the engine was ever
* created. In the BP5 file engine, attribute blocks then only ever
* contain an
* attribute once, unless the attribute changes in which case it will
* appear again. This is not such a good situation for SST because
* of the late-coming-reader issue, so that still uses the old
* marshaling mechanism.
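*
* The "only when it changes" behaviour of the file engine can be
* modelled roughly as below. SketchAttrEngine and its members are
* hypothetical, attribute values are simplified to strings, and no
* FFS encoding is shown; this models only when an attribute lands in
* a step's attribute block.
*
```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using SketchAttrBlock = std::vector<std::pair<std::string, std::string>>;

// Hypothetical model of one-time attribute marshalling.
struct SketchAttrEngine
{
    std::map<std::string, std::string> Attributes; // current values
    SketchAttrBlock StepBlock;                     // marshalled this step
    bool AnyStepBegun = false;

    // Called whenever an attribute is defined or changed.
    void NotifyAttribute(const std::string &Name, const std::string &Value)
    {
        assert(!Name.empty());
        Attributes[Name] = Value;
        if (AnyStepBegun)
            StepBlock.emplace_back(Name, Value);
    }

    void BeginStep()
    {
        if (AnyStepBegun)
            return;
        // First step ever: pick up attributes defined before this point.
        for (const auto &KV : Attributes)
            StepBlock.emplace_back(KV.first, KV.second);
        AnyStepBegun = true;
    }

    // Hand back this step's attribute block and start a fresh one.
    SketchAttrBlock EndStep()
    {
        SketchAttrBlock Block;
        Block.swap(StepBlock);
        return Block;
    }
};
```
*
* A step with no attribute changes yields an empty block, which is
* exactly what makes this scheme awkward for late-joining SST
* readers.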
*
*/

namespace adios2
{
namespace format