Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mesh: Remove Data Order #194

Merged
merged 1 commit into from
Nov 13, 2018
Merged

Conversation

ax3l
Copy link
Member

@ax3l ax3l commented Apr 24, 2018

The data order for indices arises naturally from its flattened memory layout.
This also holds true for logical access to the data.

This affects the base standard attributes:

  • axisLabels
  • gridSpacing
  • gridGlobalOffset
  • shape of constant record components

and the ED-PIC extension attributes:

  • fieldBoundary
  • particleBoundary

Implements issues: #125 and #129

Description

dataOrder is superfluous.

By defining 1D-flattened dimensions like axisLabel are ordered by slowest to fastest varying index one can omit dataOrder. Slowest to fastest varying index here only means the logical flattened memory layout that one assumes for this data format.

This is possible since one does not care how data was written but just about how to label/name indices when reading data. E.g., one would increase from fastest to slowest dimension for all flattened descriptions over dimensions to read data efficiently without seeking.

Let's have a C-like to Fortran-like example

A C-matrix A[i,j,k] is written with i index standing for "x", j for "y" and k for "z". The associated axisLabels is in the index order k-j-i (slowest is first) which is ("x", "y", "z").

A Fortran-reader reads this matrix as M(a,b,c) with the index data order a-b-c (c is varies slowest). The axisLabels attribute reads ("x", "y", "z"). So one will assign the fastest varying index c the label "z" , b the label "y" and the fastest varying index a an "x". Index order and index label/name conserved - hurray! ✨

Affected Components

  • base
  • EXT: ED-PIC

Logic Changes

Before one had to to write the access-index order of the writing code. Now one writes the labels in the order "slowest-to-fastest varying index".

This does not change C-style, but Fortran-style access.

Writer Changes

How does this change affect data writers?

Yes, Fortran-style writers need to invert their order now since the fastest varying index is the last.
C-style writers (C/C++/Python/Java/...) can just omit the dataOrder attribute now.

Unaffected besides the now removed attribute:

Does this pull request change the interpretation of existing data writers?

Fortran/Matlab/R/Julia codes: ...

Reader Changes

How does this change affect data readers?

Yes, all readers can now just label/name flattened dimension accesses to e.g. labels of indices in the order "slowest-to-fastest index".

What would a reader need to change? Link implementation examples!

Data Converter

Mesh records that declare dataOrder='F' need to invert the attributes mentioned above.

The unused dataOrder attribute should be removed as well.

@ax3l ax3l added the major change non-backwards compatible change label Apr 24, 2018
@ax3l ax3l requested review from RemiLehe and C0nsultant April 24, 2018 13:51
Copy link
Member

@C0nsultant C0nsultant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplifying the standard by integrating dataOrder into axisLabels improves understandability a lot. The user will have to be aware of this order, but now ambiguities that reference axisLabels (but not dataOrder) are resolved.

STANDARD.md Outdated
same definition)
- description: this attribute assigns for a matrix `A_{i,j,k}`
human-readible namings for the indices `i`, `j` and `k`
- advice to implementors: dimensions shall be ordered from fastest to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This clarification really makes the PR valuable.

@mccoys
Copy link

mccoys commented Apr 25, 2018

If I understand correctly, C-order F-order data will not be actually modified, but only the following attributes must have the order of their elements reversed when writing from Fortran?

  • axisLabels
  • gridSpacing
  • gridGlobalOffset
  • shape of constant record components
  • fieldBoundary
  • particleBoundary

@ax3l
Copy link
Member Author

ax3l commented Apr 25, 2018

@mccoys from a C/C++/Python writer perspective (Smilei), yes just invert the order of those attributes when writing openPMD 2.0

update: we changed the order from originally proposed fastest-to-slowest to slowest-to-fastest.

From a Fortran writer perspective, yes just invert the order of those attributes when writing openPMD 2.0
From a C/C++/Python writer perspective everything stays the same.

@ax3l ax3l added the needs decision solutions emerged but we did not decide on the final solution yet label May 15, 2018
@ax3l
Copy link
Member Author

ax3l commented May 15, 2018

There is also an alternative solution to this change, ordering from slowest-to-fastest varying index.

With that the existing Fortran-written files would need to change their index and we would generally have everything in C-like order. Due to the dominance of Py/C++ in our code bases, one could decide for that solution and have more consistency with e.g. numpy attributes >:-)

STANDARD.md Outdated
information if you need to invert the access to
`axisLabels` (and other attributes that use the
same definition)
- description: this attribute assigns for a matrix `A_{i,j,k}`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify the meaning of assigns for, or maybe use some other phrasing?

Copy link
Member

@RemiLehe RemiLehe May 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possibility is

- description: human-reading labels for the `N` axes of the mesh,
ordered *from fastest to slowest varying index when accessing the
mesh contiguously (as 1D flattened logical memory)`

Copy link
Member Author

@ax3l ax3l May 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I wanted to write assigns human-readible namings for the indices ... to a matrix ...

@ax3l
Copy link
Member Author

ax3l commented May 15, 2018

@RemiLehe is it ok for you as well if I change the order to slowest to fastest varying index? :)

VC: ok with all.

@ax3l ax3l force-pushed the topic-removeDataOrder branch from c8bc604 to 08a6d76 Compare May 15, 2018 17:02
@ax3l
Copy link
Member Author

ax3l commented May 15, 2018

all right, I updated the PR and PR description! :)

The data order for indices arises naturally from its flattened memory layout.
This also holds true for logical access to the data.
@ax3l ax3l force-pushed the topic-removeDataOrder branch from 08a6d76 to a6a5b53 Compare May 15, 2018 17:20
STANDARD.md Outdated
a matrix `A[i,j,k]` will have its first index
as the slowest varying index (e.g. `i`);
- if you access a ND array Fortran-like,
a matrix `A[i,j,k]` will have its last index
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fortran spells arrays A(i,j,k)

STANDARD.md Outdated
its slowest varying index name and `axisLabels`: `("z", "y", "x")`
- 3D `cartesian` mesh accessed in C-like as `A[x,y,z]` will have `x` as
its slowest varying index name and `axisLabels`: `("x", "y", "z")`
- 2D `cartesian` mesh accessed in Fortran-like as `A[x,y]` will have `y` as
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A(x,y)

STANDARD.md Outdated
its slowest varying index name and `axisLabels`: `("x", "y", "z")`
- 2D `cartesian` mesh accessed in Fortran-like as `A[x,y]` will have `y` as
its slowest varying index name and `axisLabels`: `("y", "x")`
- `thetaMode` accessed Fortran-like `A[r,z]`, `axisLabels`: `("z", "r")`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A(r,z)

@ax3l ax3l removed the needs decision solutions emerged but we did not decide on the final solution yet label May 15, 2018
@ax3l
Copy link
Member Author

ax3l commented Nov 8, 2018

@RemiLehe this PR should be ready to merge. Validator and Updater are up-to-date now :)

@RemiLehe RemiLehe merged commit b5363b7 into openPMD:upcoming-2.0.0 Nov 13, 2018
@ax3l ax3l deleted the topic-removeDataOrder branch November 13, 2018 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major change non-backwards compatible change
Projects
Status: Implemented
Development

Successfully merging this pull request may close these issues.

4 participants