Add ranged properties #452

Closed
35 commits
c18cf0c
Intermediate state adding ranged properties.
JPBergsma Dec 31, 2022
7dc50f2
Merge branch 'develop' into JPBergsma/Add_ranged_properties
JPBergsma Jan 3, 2023
6afcf9c
first draft for ranged properties.
JPBergsma Jan 6, 2023
1a55c2a
Merge branch 'develop' into JPBergsma/Add_ranged_properties
JPBergsma Jan 6, 2023
b597a67
Removed average, set, min and max fields for now as these become quit…
JPBergsma Jan 9, 2023
37db878
Small corrections.
JPBergsma Jan 9, 2023
a832751
Added how to treat missing values for requested range.
JPBergsma Jan 11, 2023
16aba07
changed description field
JPBergsma Jan 12, 2023
b1d69a8
Apply suggestions from code review
JPBergsma Jan 12, 2023
edbfc25
Merge branch 'Materials-Consortia:develop' into JPBergsma/Add_ranged_…
JPBergsma Jan 12, 2023
14de45d
Apply suggestions from code review Vaitkus
JPBergsma Jan 17, 2023
906db81
intermediate state from implementing code review.
JPBergsma Jan 17, 2023
1feb4a9
Merge branch 'JPBergsma/Add_ranged_properties' of https://github.com/…
JPBergsma Jan 17, 2023
a96dffe
Processed comments rartino.
JPBergsma Jan 18, 2023
15f599c
Small corrections.
JPBergsma Feb 15, 2023
73905dc
Added returned range property.
JPBergsma Feb 15, 2023
c6834f3
Added extra explanation values field.
JPBergsma Feb 16, 2023
b0cc94c
Apply suggestions from code review
JPBergsma Feb 17, 2023
0cee1e6
Merge branch 'develop' into JPBergsma/Add_ranged_properties
JPBergsma Mar 6, 2023
d7c8a9c
Processed comments rickard and a few more small improvements.
JPBergsma Mar 9, 2023
d1e8d74
Further changes after proof reading.
JPBergsma Mar 9, 2023
139c70e
further refinements.
JPBergsma Mar 9, 2023
916d6f2
Changed 'n_' to 'n' for ranged metadata properties tio be consistent …
JPBergsma Mar 13, 2023
ee3651e
Changed wording of range_id field after suggestion Rickard.
JPBergsma Mar 13, 2023
1f794b6
placed subsequent sentences on seperate lines.
JPBergsma Mar 16, 2023
169a1f4
Processed points discussed with Rickard.
JPBergsma Mar 24, 2023
f513596
Added per entry next field + small corrections.
JPBergsma Mar 28, 2023
65b8ad1
Apply suggestions from code review
JPBergsma May 2, 2023
94db38c
Adjusted the description next fields for ranged properties.
JPBergsma May 2, 2023
033ea11
Corrected range name for _exmpl_ranged_thermostat.
JPBergsma May 25, 2023
3762d30
Added that querying on properties in the range dictionary is optional.
JPBergsma May 30, 2023
8332567
Specifically mention that support for queries directly on the values …
JPBergsma May 31, 2023
a87b301
Updated example to latest version metadata proposal and added more ex…
JPBergsma Jun 2, 2023
7e9b4f4
Improved explanation returned_range field.
JPBergsma Jun 2, 2023
4b298c5
Small corrections.
JPBergsma Jun 2, 2023
240 changes: 240 additions & 0 deletions optimade.rst
@@ -21,6 +21,7 @@ OPTIMADE API specification v1.2.0~develop

entry : names of type of resources, served via OPTIMADE, pertaining to data in a database.
property : data item that belongs to an entry.
ranged-property : a property that can be returned in parts and that supports slicing.
val : value examples that properties can be.
:val: is ONLY used when referencing values of actual properties, i.e., information that belongs to the database.
type : data type of values.
@@ -67,6 +68,8 @@ OPTIMADE API specification v1.2.0~develop

.. role:: property(literal)

.. role:: ranged-property(literal)

.. role:: val(literal)

.. role:: type(literal)
@@ -442,6 +445,110 @@ For example, the following query can be sent to API implementations `exmpl1` and

:filter:`filter=_exmpl1_band_gap<2.0 OR _exmpl2_band_gap<2.5`


Ranged Properties
-----------------

Ranged properties are used for properties that are too large to be returned by default for every entry in a response.
The server can limit the size of the response and require the client to perform further queries to retrieve the rest of the data.
Ranged properties also allow the server to support slicing, so that the client can request only a subsection of the values.

- **Requirements/Conventions**:

- **Support**: OPTIONAL support in implementations.
- A ranged property can be recognized by the presence of the field :field:`range` in the metadata of the property, i.e., in the field :field:`<property_name>_meta`.
- For a ranged property, the server MAY return :val:`null` or only a part of the values of the property under the field :field:`<property_name>`, so that the size of each entry remains limited and many entries can be returned in a single response.
In that case, a links object MUST be provided in the field :field:`<property_name>_meta.range.next` from which the next part of the property can be retrieved.
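The following is a minimal Python sketch, not part of the specification, of how a client might detect ranged properties among the attributes of an entry that have already been decoded from JSON into a dictionary.
The function name ``ranged_properties`` is hypothetical; the sketch relies only on the presence of :field:`range` in :field:`<property_name>_meta` and treats a non-null :field:`next` inside it as a sign that more data remains to be fetched.

.. code:: python

    from typing import Any, Dict


    def ranged_properties(attributes: Dict[str, Any]) -> Dict[str, bool]:
        """Map each ranged property name to True if its data is complete in this response.

        A property counts as ranged when `<name>_meta` contains a `range` dictionary;
        a non-null `range.next` link means that more data has to be fetched.
        """
        result: Dict[str, bool] = {}
        for key, meta in attributes.items():
            # Skip plain properties and metadata dictionaries without a `range` field.
            if not key.endswith("_meta") or not isinstance(meta, dict) or "range" not in meta:
                continue
            name = key[: -len("_meta")]
            result[name] = meta["range"].get("next") is None
        return result

For example, an entry whose :field:`species_at_sites_meta` contains ``{"range": {"next": null}}`` would be reported as ``{"species_at_sites": True}``.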

The dictionary under :field:`<property_name>_meta.range` MUST include the following fields:

- :field:`range_ids`: list of strings.
A list with an identifier for each dimension of the property.
If dimensions in two or more properties share the same identifier in :field:`range_ids`, those dimensions should be treated as the same dimension.
For example, both the :property:`energy` and the :property:`cartesian_site_positions` of a molecular dynamics trajectory share the identifier :val:`frames`.
This means that the energy at index x (in the dimension labelled :val:`frames`) belongs to the :property:`cartesian_site_positions` at the same index x.

- :field:`indexable_dim`: list of strings.
The list of dimensions for which the client can request a subrange via the :query-param:`property_ranges` query parameter.

- :field:`data_range`: list of dictionaries.
This field describes how the values are distributed over the different dimensions.
It contains one dictionary per dimension, each with the following fields:

- :field:`name`: string.
The name of the dimension as given in the :field:`range_ids` field.

- :field:`start`: integer.
The index of the first value of the property in this dimension.
The indexing is 1-based, so the lowest value an index can have is 1.

- :field:`step`: integer.
If the values are regularly spaced in this dimension, this value indicates the difference in index between subsequent values.
If there is no regular spacing between the values, the value of this field MUST be :val:`null`.

- :field:`stop`: integer.
The index of the last value of this property in this dimension.

- :field:`contains_null`: boolean.

Review comment (JPBergsma, Contributor Author, May 24, 2023):

Should "contains_null" not be a more generic metadata property that could also apply to other non-ranged fields?
P.S. This would also allow querying on whether a property contains_null, now that I have placed in the description that support for queries on the fields under "range" is optional.

This field indicates whether some of the values of the property are :val:`null`.
This may be the case when values are missing, or when the data is sparse but the server still presents it as a regular array to enable slicing.
SHOULD be a queryable property with support for all mandatory filter features.

- :field:`dim_size`: list of integers.
The size of the range for each indexable dimension.
The order is the same as in the :field:`range_ids` field.

- :field:`nvalues`: integer.
The total number of values in the property.
SHOULD be a queryable property with support for all mandatory filter features.

- :field:`next`: a `JSON API links object <http://jsonapi.org/format/1.0/#document-links>`__.
Review comment (Contributor):

We've had this trouble elsewhere in the specification and keep running into it. The reference to a JSON API links object in a section that is meant to be data-format agnostic doesn't really work. If someone is trying to represent the output data in HDF5 (apparently someone already implemented that...), what does a "JSON API links object" mean inside HDF5?

So, the easy solution (suggested below) is to skip the extra degrees of freedom provided by JSON API links objects when we have links appearing in data-agnostic output, and just say that it is a string with a URL. However, given how often we are running into this issue, I'm starting to wonder if we rather should extend the base OPTIMADE data types with URL, so that the JSON API output format section generally can say that when the OPTIMADE specification says that something is "a link", you are meant to put a JSON API links object there.

Suggested change::

    - - :field:`next`: a `JSON API links object <http://jsonapi.org/format/1.0/#document-links>`__.
    + - :field:`next`: String.

Reply (JPBergsma, Contributor Author, May 2, 2023):

I think we define URLs in many places as a JSON API links object, like the home page field for providers in the meta field, for example. So it seems logical to do it here as well. I will change the description a bit, so it is more like the others.

I however do think that the values we assign are often not JSON API links objects. But instead, values that would qualify as members for JSON API links objects. The value of such a member can also be a string, so it should not be any more problematic to store as the dictionaries we currently have.

Should I make a separate PR to change JSON API links objects to something like a valid member for a JSON API links object elsewhere in the specification?

P.S. I have written code which outputs the response in the HDF5 format. You can test it with https://optimade-trajectory-demo.matcloud.xyz/v1/structures?response_format=hdf5. Unfortunately, h5py does not support storing a list of dicts, so I have to convert this to a dict of dicts.


If more data is available for the property, this field contains a URL pointing to the entry to which this property belongs, from which the next set of values for this property can be retrieved.
If all the data for this property has been returned, the value SHOULD be :val:`null`.

- :field:`more_data_available`: boolean.
:field-val:`false` if all the values in the requested range have been returned, and :field-val:`true` if the returned values are incomplete.
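A client that needs the complete property can follow the :field:`next` links until no further link is given.
The Python sketch below is illustrative only: ``collect_ranged_values`` and ``fetch_attributes`` are hypothetical names, where ``fetch_attributes`` stands for any callable that retrieves a link and returns the entry's attributes as a dictionary (for example, a wrapper around an HTTP GET).
It also assumes that consecutive parts continue along the first dimension, as in the trajectory example, so that they can simply be appended.

.. code:: python

    from typing import Any, Callable, Dict, List


    def collect_ranged_values(
        attributes: Dict[str, Any],
        name: str,
        fetch_attributes: Callable[[str], Dict[str, Any]],
        max_requests: int = 1000,
    ) -> List[Any]:
        """Concatenate the parts of ranged property `name` by following `next` links.

        Assumes the server splits the data along the first dimension only, so
        consecutive parts can simply be appended to each other.
        """
        values: List[Any] = list(attributes.get(name) or [])
        range_meta = attributes.get(f"{name}_meta", {}).get("range", {})
        for _ in range(max_requests):
            link = range_meta.get("next")
            if link is None or not range_meta.get("more_data_available", True):
                return values
            # `next` may be a plain URL string or a JSON API link object with an "href" member.
            url = link["href"] if isinstance(link, dict) else link
            attributes = fetch_attributes(url)
            values.extend(attributes.get(name) or [])
            range_meta = attributes.get(f"{name}_meta", {}).get("range", {})
        raise RuntimeError(f"no complete data for {name!r} after {max_requests} requests")

The ``max_requests`` limit is a defensive choice of the sketch, so that a misbehaving server cannot keep the client in an endless loop.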


If :field:`<property_name>` contains data (i.e., it is neither :val:`null` nor an empty list), the following additional fields can be present.
Querying is not relevant for these fields and SHOULD NOT be supported.

- :field:`nreturned_values`: integer.
The number of values that have been returned.
This value SHOULD be present.

- :field:`indexes`: list of lists of integers.
If the values are not regularly spaced along the dimensions, this list holds the indexes for each value.
The order of the indexes must match the order in the field :field:`range_ids`.
MUST be present if the field :field:`step` in :field:`data_range` has the value :val:`null` for any of the dimensions, i.e., when the values are not regularly distributed over the grid.
Otherwise, it SHOULD NOT be present.

- :field:`returned_range`: list of dictionaries.
The range covering the returned data.
It contains the same fields as the :field:`data_range` field; these fields, however, only apply to the returned values and not to all the values of the property.
This field MUST be present.
For dimensions where the field :field:`step` in :field:`data_range` is :val:`null`, the value of :field:`step` in :field:`returned_range` MUST match the step size used in the :query-param:`property_ranges` query parameter.
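Because ranges are 1-based and inclusive, the number of indexes covered by one dimension of :field:`data_range` or :field:`returned_range` is ``(stop - start) // step + 1``, and for a regular grid the total number of values is the product over all dimensions.
The Python sketch below (function names hypothetical, not part of the specification) could be used, for example, to cross-check :field:`nreturned_values`; it only applies when every :field:`step` is non-null.

.. code:: python

    from math import prod
    from typing import Dict, List, Optional


    def count_in_dimension(start: int, stop: int, step: Optional[int]) -> int:
        """Number of indexes in an inclusive, 1-based (start, stop, step) range."""
        if step is None:
            # Irregular dimension: the count has to come from the `indexes` field instead.
            raise ValueError("step is null; count the entries of `indexes` instead")
        if step < 1:
            raise ValueError("step must be a positive integer")
        if stop < start:
            return 0
        return (stop - start) // step + 1


    def values_in_range(range_dicts: List[Dict]) -> int:
        """Number of values on the regular grid described by a data_range/returned_range list."""
        return prod(count_in_dimension(d["start"], d["stop"], d["step"]) for d in range_dicts)

With frames 1 to 99 in steps of 2 and the full particles and xyz dimensions of size 3, this gives 50 × 3 × 3 = 450 values, the :field:`nreturned_values` of the trajectory example.
For sparse data such as :field:`_exmpl_ranged_thermostat`, the grid count overstates the number of values, so the length of :field:`indexes` should be used instead.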

In addition to these fields in the metadata, entries that support accessing data via the :query-param:`property\_ranges` query parameter SHOULD support the per-entry fields :field:`next` and :field:`more_data_available` to enable returning all the requested data:

- :field:`next`: a `JSON API links object <http://jsonapi.org/format/1.0/#document-links>`__.
Review comment (Contributor):

Suggested change::

    - - :field:`next`: a `JSON API links object <http://jsonapi.org/format/1.0/#document-links>`__.
    + - :field:`next`: String.

If data is requested for multiple properties at the same time, but the total amount of data is too large to be returned in one response, this field contains a link from which the remainder of the data can be obtained.
Responses supplied via this link MUST contain all the values of all the requested properties that lie within the returned range.
The server MAY choose to return a range that is smaller than the requested range; in that case, another :field:`next` link SHOULD be provided.
If all the requested data for this entry has been returned, the value SHOULD be :val:`null`.

- :field:`more_data_available`: boolean.
:field-val:`false` if all the values in the requested range have been returned, and :field-val:`true` if the returned values are incomplete.
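The specification does not prescribe how a server computes the range requested by a :field:`next` link.
One possible strategy, sketched below in Python under the assumption that the server returns a contiguous leading part of the requested range, is to restart the requested range one step after the last index already returned.
The helpers ``remaining_range`` and ``build_next_link`` are hypothetical; for a request of ``frames(1,200,2)`` where frame 99 was the last one returned, they yield a link requesting ``frames(101,200,2)``.

.. code:: python

    from typing import Dict, Optional, Tuple
    from urllib.parse import urlencode

    Range = Tuple[int, int, int]  # (start, stop, step): 1-based and inclusive


    def remaining_range(requested: Range, last_returned: int) -> Optional[Range]:
        """Part of a requested range after the last index already returned (assumed on the grid)."""
        _, stop, step = requested
        next_start = last_returned + step
        if next_start > stop:
            return None  # everything was returned, so `next` should be null
        return (next_start, stop, step)


    def build_next_link(entry_url: str, fields: str, ranges: Dict[str, Range]) -> str:
        """Format a `next` URL carrying `response_fields` and `property_ranges`."""
        spec = ",".join(f"{dim}({a},{b},{c})" for dim, (a, b, c) in ranges.items())
        # Keep parentheses and commas unescaped so the range syntax stays readable.
        query = urlencode({"response_fields": fields, "property_ranges": spec}, safe="(),")
        return f"{entry_url}?{query}"

Dimensions that were returned in full, such as ``particles`` and ``xyz`` in the trajectory example, can simply be repeated with their original ranges.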


Responses
=========

@@ -697,6 +804,118 @@ An example of a full response:
]
}


- Several examples of how ranged properties can be returned in the JSON format:

.. code:: jsonc

    {
      "cartesian_site_positions": [
        [[2.36, 5.36, 9.56], [7.24, 3.58, 0.56], [8.12, 6.95, 4.56]],
        [[2.38, 5.37, 9.56], [7.24, 3.57, 0.58], [8.11, 6.93, 4.58]],
        [[2.39, 5.38, 9.55], [7.23, 3.57, 0.59], [8.10, 6.93, 4.57]]
        // ...
      ],
      "cartesian_site_positions_meta": {
        "range": {
          "range_ids": ["frames", "particles", "xyz"],
          "indexable_dim": ["frames", "particles", "xyz"],
          "data_range": [{
            "name": "frames",
            "start": 1,
            "step": 1,
            "stop": 200
          }, {
            "name": "particles",
            "start": 1,
            "step": 1,
            "stop": 3
          }, {
            "name": "xyz",
            "start": 1,
            "step": 1,
            "stop": 3
          }],
          "dim_size": [200, 3, 3],
          "nvalues": 1800,
          "nreturned_values": 450,
          "returned_range": [{
            "name": "frames",
            "start": 1,
            "step": 2,
            "stop": 99
          }, {
            "name": "particles",
            "start": 1,
            "step": 1,
            "stop": 3
          }, {
            "name": "xyz",
            "start": 1,
            "step": 1,
            "stop": 3
          }],
          "contains_null": false,
          "more_data_available": true,
          "next": "https://example.com/optimade/v1/structures/id123456?response_fields=cartesian_site_positions&property_ranges=frames(101,200,2),particles(1,3,1),xyz(1,3,1)"
        },
        // ...
      },
      "species_at_sites": ["He", "Ne", "Ar"],
      "species_at_sites_meta": {
        "range": {
          "indexable_dim": ["particles"],
          "dim_size": [3],
          "range_ids": ["particles"],
          "data_range": [{
            "name": "particles",
            "start": 1,
            "step": 1,
            "stop": 3
          }],
          "nreturned_values": 3,
          "nvalues": 3,
          "returned_range": [{
            "name": "particles",
            "start": 1,
            "step": 1,
            "stop": 3
          }],
          "contains_null": false,
          "more_data_available": false,
          "next": null
        },
        // ...
      },
      "_exmpl_ranged_thermostat": [20, 40, 60],
      "_exmpl_ranged_thermostat_meta": {
        "range": {
          "indexable_dim": ["frames"],
          "dim_size": [200],
          "data_range": [{
            "name": "frames",
            "start": 1,
            "step": null,
            "stop": 80
          }],
          "range_ids": ["frames"],
          "nvalues": 3,
          "nreturned_values": 3,
          "indexes": [[1], [20], [80]],
          "contains_null": false,
          "returned_range": [{
            "name": "frames",
            "start": 1,
            "step": 1,
            "stop": 80
          }],
          "more_data_available": false,
          "next": null
        },
        // ...
      }
    }
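In the :field:`_exmpl_ranged_thermostat` example, only three values are stored, at frames 1, 20 and 80, and their positions are given by :field:`indexes`.
As described for :field:`contains_null`, a server may instead present sparse data as a regular array padded with :val:`null` to enable slicing.
The following Python sketch converts the first representation into the second, with ``None`` standing in for :val:`null`; it handles one dimension only, and ``densify_1d`` is a hypothetical helper, not part of the specification.

.. code:: python

    from typing import Any, List, Optional


    def densify_1d(values: List[Any], indexes: List[List[int]], size: int) -> List[Optional[Any]]:
        """Place irregularly indexed values into a regular list, padding gaps with None."""
        dense: List[Optional[Any]] = [None] * size
        # Each entry of `indexes` holds exactly one index in the one-dimensional case.
        for value, (index,) in zip(values, indexes):
            dense[index - 1] = value  # OPTIMADE indexes are 1-based, Python lists are 0-based
        return dense

Calling ``densify_1d([20, 40, 60], [[1], [20], [80]], 200)`` gives a list of 200 items in which all but three are ``None``; a server returning data in this form would set :field:`contains_null` to :field-val:`true`.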


HTTP Response Status Codes
--------------------------

@@ -880,6 +1099,27 @@ Standard OPTIONAL URL query parameters not in the JSON API specification:
If provided, these fields MUST be returned along with the REQUIRED fields.
Other OPTIONAL fields MUST NOT be returned when this parameter is present.
Example: :query-url:`http://example.com/optimade/v1/structures?response_fields=last_modified,nsites`
- **property\_ranges**: specifies which ranges should be used when returning ranged properties.
Support is OPTIONAL in general; however, property definitions may deviate from this and place stricter requirements on servers.
The value consists of the name of a dimension directly followed by a range.
A range consists of a pair of parentheses ("(", ASCII 40 (0x28) and ")", ASCII 41 (0x29)) enclosing three integers separated by commas (",", ASCII 44 (0x2C)).
The first integer specifies the first index in that dimension for which values should be returned.
The second integer specifies the last index for which values should be returned.
The third integer specifies the step size in that dimension.
Ranges can be specified for multiple dimensions by separating them with a comma.
Databases MUST apply these ranges to properties for which the dimension is listed under :field:`indexable_dim`; otherwise, the database MAY return more data than was specified in the range.
The ranges are 1-based, i.e., the first value has index 1, and inclusive, i.e., for the range :val:`(10,20,1)` the last value returned belongs to index 20.
If a dimension is not specified, the whole range in that dimension is assumed to be requested.
If no value is present at a given set of indexes, no value SHOULD be returned for it.
However, when a value is explicitly set to :val:`null` and :val:`null` has a meaning beyond indicating that no value has been defined, :val:`null` MUST be returned.

Example:

A database has a :entry:`structure` entry with id :val:`id_12345` and a ranged property :property:`test_field` with the two-dimensional data values :val:`[[9.64, 7.52, 0.69], [4.82, 8.35, 3.26], [4.82, 2.78, 7.87], [5.49, 3.48, 1.65]]`.
In addition, the field :field:`range_ids` has the value :val:`["frames", "xyz"]`.
A client can then make the request :query-url:`http://example.com/optimade/v1/structures/id_12345?property_ranges=frames(1,4,2),xyz(2,3,1)&response_fields=test_field`.
The response is then a single-entry response for the structure :val:`id_12345` in which the :property:`test_field` property is included with the values :val:`[[7.52, 0.69], [2.78, 7.87]]`, i.e., components 2 and 3 of frames 1 and 3.
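The parsing and slicing described above can be sketched in Python as follows.
This is an illustration under stated assumptions, not a reference implementation: ``parse_property_ranges`` and ``apply_ranges`` are hypothetical names, dimension names are assumed to consist of letters, digits and underscores, spaces are tolerated, and the data is assumed to be held as nested lists ordered as in :field:`range_ids`.

.. code:: python

    import re
    from typing import Any, Dict, List, Tuple

    # Assumed grammar for one item: a dimension name followed by "(start,stop,step)".
    _ITEM = r"([A-Za-z_][A-Za-z0-9_]*)\((\d+),(\d+),(\d+)\)"


    def parse_property_ranges(value: str) -> Dict[str, Tuple[int, int, int]]:
        """Parse e.g. 'frames(1,4,2),xyz(2,3,1)' into {'frames': (1, 4, 2), 'xyz': (2, 3, 1)}."""
        compact = value.replace(" ", "")
        if not re.fullmatch(rf"{_ITEM}(?:,{_ITEM})*", compact):
            raise ValueError(f"malformed property_ranges value: {value!r}")
        ranges = {name: (int(a), int(b), int(c)) for name, a, b, c in re.findall(_ITEM, compact)}
        if any(start < 1 or step < 1 for start, _, step in ranges.values()):
            raise ValueError("start and step must be at least 1")
        return ranges


    def apply_ranges(data: Any, range_ids: List[str], ranges: Dict[str, Tuple[int, int, int]]) -> Any:
        """Slice nested lists; dimensions missing from `ranges` are returned in full."""
        if not range_ids:
            return data
        start, stop, step = ranges.get(range_ids[0], (1, len(data), 1))
        # The 1-based inclusive range (start, stop, step) is the Python slice [start - 1:stop:step].
        return [apply_ranges(item, range_ids[1:], ranges) for item in data[start - 1 : stop : step]]

Applied to the :property:`test_field` values with ``range_ids=["frames", "xyz"]`` and the parsed value ``frames(1,4,2),xyz(2,3,1)``, the sketch returns ``[[7.52, 0.69], [2.78, 7.87]]``, matching the example response.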


Additional OPTIONAL URL query parameters not described above are not considered to be part of this standard, and are instead considered to be "custom URL query parameters".
These custom URL query parameters MUST be of the format "<database-provider-specific prefix><url\_query\_parameter\_name>".