Reading Fortran data using python interface #1665

germasch · 2019-08-09T15:49:35Z

When reading data written by a Fortran code using python, indices end up being reversed:

Sample Fortran code (it's based on the included heatmap test)

program TestBPWriteReadHeatMap2D
  use mpi
  use adios2

  implicit none

  type(adios2_adios) :: adios
  type(adios2_io) :: ioPut
  type(adios2_engine) :: bpWriter
  type(adios2_variable) :: var

  real(kind=8) :: temperature(0:4, 0:9)

  integer(kind=8), parameter :: ishape(2) = (/5, 10/)
  integer(kind=8), parameter :: istart(2) = (/0, 0/)
  integer :: ierr

  call mpi_init(ierr)
  call adios2_init(adios, MPI_COMM_WORLD, adios2_debug_mode_on, ierr)
  
  call random_number(temperature)
  print*, 'sample temperature(2,8) =', temperature(2, 8)

  call adios2_declare_io(ioPut, adios, 'HeatMapWrite', ierr)
  call adios2_define_variable(var, ioPut, 'temperature', adios2_type_dp, &
                              2, ishape, istart, ishape, &
                              adios2_constant_dims, ierr)
  call adios2_open(bpWriter, ioPut, 'HeatMap2D_f.bp', adios2_mode_write, &
                   ierr)
  call adios2_put(bpWriter, var, temperature, ierr)
  call adios2_close(bpWriter, ierr)

  call adios2_finalize(adios, ierr)
  call mpi_finalize(ierr)

end program TestBPWriteReadHeatMap2D

Output:

[kai@macbook build-fortran (pr/fix-mgard *)]$ bin/test_fortran
 sample temperature(2,8) =  0.13062623044039723

Reading the bp file from Python:

import adios2
fh = adios2.open("/Users/kai/build/ADIOS2/build-fortran/HeatMap2D_f.bp", "r")
t = fh.read('temperature')
print(t[2, 8])

gives

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-10-cf0fcf0017a8> in <module>
      2 fh = adios2.open("/Users/kai/build/ADIOS2/build-fortran/HeatMap2D_f.bp", "r")
      3 t = fh.read('temperature')
----> 4 print(t[2, 8])

IndexError: index 8 is out of bounds for axis 1 with size 5

To access the Fortran data element t(2, 8) from Python, one has to reverse the index order:

>>> print(t[8, 2])
0.13062623044039723

This is related to #1661 , but actually fixable without API additions.

The text was updated successfully, but these errors were encountered:

williamfgc · 2019-08-09T17:06:30Z

The way Python APIs are designed is to follow bpls. Rarely data users will look into the source code that generate the files (in fact, it could be some commit somewhere). bpls is assumed to be the starting point or a Python inspection function. Our ecosystem tools are natural to those languages (C++, Python). Changing this would imply changing bpls as well.

germasch · 2019-08-09T17:59:01Z

It'd certainly be nice to figure out the dimensions of a dataset without looking at the source code that has written it, but unfortunately, that's exactly what's currently not possible. If I give you the bp file that the sample code created, how can you tell whether it's a 10x5 or a 5x10 matrix? You can use bpls, and it'll tell you 10x5, but that's only true if you know that the code that wrote it is was C or Python. If it was written by Fortran or Matlab, it was actually a 5x10 matrix, even though bpls says differently. And that's easily fixable. At the minimum, bpls could say

double   temperature  {10, 5} /* transposed from original {5, 10} */

though I think it'd be better if bpls showed it correctly (ie, what it was when written, not transposed).

Python could (and should, IMO) just load it as a 5x10 dataset, because that's what was written, no performance penalty. If someone runs a C program to create a 512x128 image and gives it to you, you'll get a 512x128 image in python, as you'd expect. If someone else creates their images in Fortran and gives you their data file, python/adios2 will present it as a 128x512 image. Sure, from your view that's correct, because you define correct as "what bpls shows" rather than what the original dataset was, but that's probably not what most people expect. People in general don't really care about thinking about row-major vs col-major, transposing dims and indices or whatever. What they care about is that when they write a MxN dataset and read it back they get a MxN dataset, no matter how it's stored on either end or in the file.

You can google "hdf5 transpose" or "hdf5 reverse dims" and you'll easily find 100s of people being confused by their design. I think one should learn from that.

williamfgc · 2019-08-09T18:25:10Z

Sure, from your view that's correct, because you define correct as "what bpls shows" rather than what the original dataset was, but that's probably not what most people expect.

adios2 preserves fast and slow index information, on top of that the C++ read API tells you if the dimensions were reversed. I'd argue that most people would expect handling numpy array as row-major that's why it's their default, the order options are very confusing, order : {‘K’, ‘A’, ‘C’, ‘F’} I have to admit.

BTW, I am not denying it's confusing (it's since underlying languages are different). I am trying examples in #1661 to understand better the interactions.

pnorbert · 2019-08-09T18:48:28Z

So this particular thread's suggestion is that we should not reverse dimensions automatically because, "What they care about is that when they write a MxN dataset and read it back they get a MxN dataset, no matter how it's stored on either end or in the file."

In adios design, in general since the beginning, we consciously did not want to allow for any feature that slows down adios all the time for the sake of some user who wants that feature. For this case, we have a fast (and maybe ugly) way of reading data. And the rules have been fairly simple: you write in Fortran/Matlab and read in C/python or vice versa, you see your dimensions reversed.

The suggestion that we should represent the dimensions in all languages as it was written goes against our design philosophy. Scalable performance comes before comfort requirements.

So this is for this thread. However, the request and discussion is there to allow for defining the layout and querying the layout, which will need some changes and that discussion is going on in the other thread. That will allow more control for those users who need that control.

germasch · 2019-08-09T18:50:32Z

So what I'm proposing is actually like 'K', that is, preserve the original order (on disk / in the BP3 stream). If the data was created in Fortran, it was written in col-major and would be read and presented as col major. And vice versa.

germasch · 2019-08-09T19:01:23Z

@pnorbert: What I'm proposing is to not reverse the dimensions if the language supports both data layouts. I'm not proposing to transpose data ever, because it's too expensive (and because it's a change in behavior).

So if you write row-major data from C and read it in Fortran, which only supports col-major, the only option is to reverse dims, and one just has to live with that.

If you write col-major data from Fortran and read it in Python, though, you can make a ndarray that points to the original (untransposed) data and tell python that it's in column major order, so you get the data back with the original index order. That alleviates a lot of confusion when using the python API as a user. Python's default layout is 'K', which is whatever layout the data is already in. In fact, python doesn't even have a function to query the underlying data layout from an ndarray. You can use .strides to draw your own conclusions, though it could be row-major, col-major or many other things.

For C++, nothing would change by default. But querying the underlying layout, a program like bpls could go and say "I'm presenting the data layout as if it were row-major" because I've been written in C++, but I'll let you know that it's actually transposed from what it originally was. Or it could actually still read it back as row-major, but transpose it when displaying it. Because people who use bpls probably don't care about what language bpls is written in, they care about the data that's in the bp file.

williamfgc · 2019-08-09T19:02:02Z

@pnorbert thanks for explanation. Yeah, I am trying to understand in #1661 the different options for @germasch proposed APIs. I am taking this as an opportunity to see if we have an opportunity (pun intended). First starting to understand how well-established libraries handle this, basically boost, numpy order, Eigen.

germasch · 2019-08-09T19:16:52Z

I'd add xtensor and kokkos to the list. I'm not familiar with the boost types, but I believe all the other ones support both row-major and col-major.

pnorbert · 2019-08-09T19:56:31Z

Okay, I see. I am a backward C programmer who assumed python and hence numpy was a row-major language. Now I accept hardcore Numpy users may want to use it more flexibly.

So how do we make this simpler with the API just for Python, simpler than for C++ in the other thread? You need to inquire the adios variable first to know the layout in the input, then allocate a numpy array (with your chosen layout) and then read the data with adios.

In the high-level API, in a serial python script, an extra flag in f.read() could do the trick:

fh = adios2.open("/Users/kai/build/ADIOS2/build-fortran/HeatMap2D_f.bp", "r")
t = fh.read('temperature', layout='K')

Maybe we could get the user intention in a layout flag in adios2.open()?

germasch · 2019-08-09T20:23:24Z

Yes,

t = fh.read('temperature', layout='K')

is exactly what I was going to propose except that I'd call it order='K' for consistency with numpy.

I think there's an open issue about what the default should be, because I think the natural thing would be K as that's what people who don't think about layouts would naturally expect. But it's a change from current behavior, which I suppose would be specified as order='C'. And changing behavior is not a good thing...

williamfgc · 2019-08-09T20:46:22Z

@germasch agree, I'd stick with numpy's convention, so users (and I mean users, not application developers) don't have to add to their learning curve.

williamfgc · 2019-08-09T20:51:51Z

Maybe we could get the user intention in a layout flag in adios2.open()?

@pnorbert adding a stream-level setting is a good suggestion, the high-level Python API allows for a call to set_parameter (open is deferred until the first read).

fh = adios2.open("/Users/kai/build/ADIOS2/build-fortran/HeatMap2D_f.bp", "r")
f.set_parameter('order', 'K')
t = fh.read('temperature')

pnorbert · 2019-08-09T21:06:06Z

@williamfgc What about the low-level API? I tried to address both with a flag in open().

williamfgc · 2019-08-09T21:33:39Z

@pnorbert that's a good question and I don't have a good answer, there are a few options:

Use adios2::Operator of type "order" and use either IO.AddOperation or Variable.AddOperation for the granularity. This option won't change current APIs.
Use IO::SetParameter and Variable::SetOrder (this would be new).
Add IO::SetOrder (this would be new) and Variable::SetOrder (this would be new).

germasch mentioned this issue Aug 14, 2019

WIP: return fortran arrays in original order in python high-level interface #1681

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reading Fortran data using python interface #1665

Reading Fortran data using python interface #1665

germasch commented Aug 9, 2019

williamfgc commented Aug 9, 2019

germasch commented Aug 9, 2019

williamfgc commented Aug 9, 2019 •

edited

Loading

pnorbert commented Aug 9, 2019

germasch commented Aug 9, 2019

germasch commented Aug 9, 2019

williamfgc commented Aug 9, 2019 •

edited

Loading

germasch commented Aug 9, 2019

pnorbert commented Aug 9, 2019

germasch commented Aug 9, 2019

williamfgc commented Aug 9, 2019 •

edited

Loading

williamfgc commented Aug 9, 2019 •

edited

Loading

pnorbert commented Aug 9, 2019

williamfgc commented Aug 9, 2019

Reading Fortran data using python interface #1665

Reading Fortran data using python interface #1665

Comments

germasch commented Aug 9, 2019

williamfgc commented Aug 9, 2019

germasch commented Aug 9, 2019

williamfgc commented Aug 9, 2019 • edited Loading

pnorbert commented Aug 9, 2019

germasch commented Aug 9, 2019

germasch commented Aug 9, 2019

williamfgc commented Aug 9, 2019 • edited Loading

germasch commented Aug 9, 2019

pnorbert commented Aug 9, 2019

germasch commented Aug 9, 2019

williamfgc commented Aug 9, 2019 • edited Loading

williamfgc commented Aug 9, 2019 • edited Loading

pnorbert commented Aug 9, 2019

williamfgc commented Aug 9, 2019

williamfgc commented Aug 9, 2019 •

edited

Loading

williamfgc commented Aug 9, 2019 •

edited

Loading

williamfgc commented Aug 9, 2019 •

edited

Loading

williamfgc commented Aug 9, 2019 •

edited

Loading