Add ranged properties #452

Closed
35 commits
c18cf0c
Intermediate state adding ranged properties.
JPBergsma Dec 31, 2022
7dc50f2
Merge branch 'develop' into JPBergsma/Add_ranged_properties
JPBergsma Jan 3, 2023
6afcf9c
first draft for ranged properties.
JPBergsma Jan 6, 2023
1a55c2a
Merge branch 'develop' into JPBergsma/Add_ranged_properties
JPBergsma Jan 6, 2023
b597a67
Removed average, set, min and max fields for now as these become quit…
JPBergsma Jan 9, 2023
37db878
Small corrections.
JPBergsma Jan 9, 2023
a832751
Added how to treat missing values for requested range.
JPBergsma Jan 11, 2023
16aba07
changed description field
JPBergsma Jan 12, 2023
b1d69a8
Apply suggestions from code review
JPBergsma Jan 12, 2023
edbfc25
Merge branch 'Materials-Consortia:develop' into JPBergsma/Add_ranged_…
JPBergsma Jan 12, 2023
14de45d
Apply suggestions from code review Vaitkus
JPBergsma Jan 17, 2023
906db81
intermediate state from implementing code review.
JPBergsma Jan 17, 2023
1feb4a9
Merge branch 'JPBergsma/Add_ranged_properties' of https://github.com/…
JPBergsma Jan 17, 2023
a96dffe
Processed comments rartino.
JPBergsma Jan 18, 2023
15f599c
Small corrections.
JPBergsma Feb 15, 2023
73905dc
Added returned range property.
JPBergsma Feb 15, 2023
c6834f3
Added extra explanation values field.
JPBergsma Feb 16, 2023
b0cc94c
Apply suggestions from code review
JPBergsma Feb 17, 2023
0cee1e6
Merge branch 'develop' into JPBergsma/Add_ranged_properties
JPBergsma Mar 6, 2023
d7c8a9c
Processed comments rickard and a few more small improvements.
JPBergsma Mar 9, 2023
d1e8d74
Further changes after proof reading.
JPBergsma Mar 9, 2023
139c70e
further refinements.
JPBergsma Mar 9, 2023
916d6f2
Changed 'n_' to 'n' for ranged metadata properties tio be consistent …
JPBergsma Mar 13, 2023
ee3651e
Changed wording of range_id field after suggestion Rickard.
JPBergsma Mar 13, 2023
1f794b6
placed subsequent sentences on seperate lines.
JPBergsma Mar 16, 2023
169a1f4
Processed points discussed with Rickard.
JPBergsma Mar 24, 2023
f513596
Added per entry next field + small corrections.
JPBergsma Mar 28, 2023
65b8ad1
Apply suggestions from code review
JPBergsma May 2, 2023
94db38c
Adjusted the description next fields for ranged properties.
JPBergsma May 2, 2023
033ea11
Corrected range name for _exmpl_ranged_thermostat.
JPBergsma May 25, 2023
3762d30
Added that querying on properties in the range dictionary is optional.
JPBergsma May 30, 2023
8332567
Specifically mention that support for queries directly on the values …
JPBergsma May 31, 2023
a87b301
Updated example to latest version metadata proposal and added more ex…
JPBergsma Jun 2, 2023
7e9b4f4
Improved explanation returned_range field.
JPBergsma Jun 2, 2023
4b298c5
Small corrections.
JPBergsma Jun 2, 2023
259 changes: 259 additions & 0 deletions optimade.rst
@@ -442,6 +442,125 @@ For example, the following query can be sent to API implementations `exmpl1` and

:filter:`filter=_exmpl1_band_gap<2.0 OR _exmpl2_band_gap<2.5`


Ranged Properties
-----------------

Ranged properties are used for properties that are too large to be returned by default for every entry in a response.
The server can therefore choose to limit the size of the response by not returning (all) the values.
The client then has to perform another query to retrieve the rest of the data.
Ranged properties can also be used by the server to support slicing, so that the client can request only a subsection of the values.

- **Requirements/Conventions**:

- **Support**: OPTIONAL support in implementations.
- A ranged property can be recognized by the presence of the field :field:`range` in the metadata of the property, i.e., in the field :field:`<property_name>` under the per-entry :field:`meta` field.
- For a ranged property, the server MAY return :val:`null` or only a part of the values of the property under the field :field:`<property_name>`, so the size of the entries remains limited, and many entries can be returned in a single response.
In that case, a links object MUST be provided in the field :field:`meta.<property_name>.range.next` from which the next part of the property is returned.
- Support for queries on the fields under :field:`range` is OPTIONAL.
- As ranged properties can have many values, support for queries on these values is OPTIONAL.

The metadata field of the ranged property, :field:`meta.<property_name>.range`, MUST include these fields:

- :field:`range_ids`: list of strings.
A list with an identifier for each dimension of the property.
If, within one entry, dimensions of two or more properties share the same :field:`range_id`, those dimensions should be thought of as the same dimension.
For example, both the :property:`energy` and :property:`cartesian_site_positions` of a molecular dynamics trajectory share a range_id of :val:`frames`.
This means that the energy at index x (in the dimension labelled by this range_id) belongs to the cartesian_site_positions at the same index x.

- :field:`indexable_dim`: list of strings.
The list of range_ids of the dimensions for which slicing is supported, i.e. the client can request a subrange via the :query-param:`property_ranges` query parameter.

- :field:`data_range`: list of dictionaries.
This field describes how the values are distributed in the different dimensions.
It consists of a dictionary for each dimension.
This dictionary has the fields:

- :field:`name`: string.
The name of the dimension as given in the :field:`range_ids` field.

- :field:`start`: integer.
The index of the first value of the property in this dimension.
The indexing is 1-based, so the lowest value an index can have is 1.

- :field:`step`: integer.
If the values are regularly spaced in this dimension, this value indicates the difference in index between subsequent values.
If there is no regular spacing between the values, the value of this property MUST be :val:`null`.

- :field:`stop`: integer.
The index of the last value of this property in this dimension.

- :field:`contains_null`: boolean.
Review comment by @JPBergsma (Contributor Author), May 24, 2023:

Should "contains_null" not be a more generic metadata property that could also apply to other, non-ranged fields?
P.S. This would also allow querying on whether a property contains_null, now that I have stated in the description that support for queries on the fields under "range" is optional.

This value indicates whether some of the values of the property are :val:`null`.
This may be the case when values are missing or if the data is sparse, but the server still wants to present the data as a regular array to enable slicing.

- :field:`dim_size`: list of integers.

The size of the range for each indexable dimension.
The order is the same as in the :field:`range_ids` field.

- :field:`nvalues`: integer.

The total number of values in the property.
SHOULD be a queryable property with support for all mandatory filter features.

- :field:`next`: `JSON API links object <http://jsonapi.org/format/1.0/#document-links>`_

If there is still more data available for the property, this field MUST contain a URL from which the next set of values for this property can be obtained.
The `JSON API links object <http://jsonapi.org/format/1.0/#document-links>`__, containing the URL, is either a string, or a links object, which can contain the following fields:

- **href**: a string containing the URL.
- **meta**: a meta object containing non-standard meta-information about the next link.

If all the data for this property has been returned, the value SHOULD be :val:`null`.

- :field:`more_data_available`: boolean.

:field-val:`false` if all the values in the requested range have been returned, and :field-val:`true` if the returned values are incomplete.


If the :field:`<property_name>` contains data, i.e., it is neither :val:`null` nor an empty list, the following additional properties can be present.
Querying is not relevant for these properties and SHOULD NOT be supported.

- :field:`nreturned_values`: integer.

The number of values that have been returned.
This value SHOULD be present.

- :field:`indexes`: list of lists of integers.

If the values are not regularly spaced along the dimensions, this list holds the indexes for each value.
The order of the indexes must match the order in the field :field:`range_ids`.
MUST be present if the field :field:`step` of any of the dimensions in :field:`data_range` has the value :val:`null`, i.e., when the values are not regularly distributed over the grid.
Otherwise, it SHOULD NOT be present.

- :field:`returned_range`: list of dictionaries.

The range covering the returned data.
The dictionaries contain the same fields as those of the :field:`data_range` field.
In this case these fields, however, only apply to the returned values and not all the values of the property.
This field MUST be present when values are returned.
For dimensions where the field :field:`data_range.step` is not defined, the value of the field :field:`returned_range.step` MUST match the step size used in the query parameter :query-param:`property_ranges`.

In addition to these fields in the metadata, entries which support accessing data via the :query-param:`property_ranges` query parameter SHOULD support per-entry :field:`next` and :field:`more_data_available` fields to enable returning the remainder of the data for all properties for the rest of the range.

- :field:`next`: `JSON API links object <http://jsonapi.org/format/1.0/#document-links>`__.

If data is requested for multiple properties at the same time, but the total amount of data is too large to be returned in one response, this field contains a link from which the remainder of the data can be obtained.
Responses supplied via this next link MUST contain all the values for all the requested properties that lie within the requested range and have not yet been returned.
The server MAY again choose to return only a part of the values.
In that case, another next link SHOULD be provided for the remaining values.
If all the data for this entry has been returned, the value SHOULD be :val:`null`.

The `JSON API links object <http://jsonapi.org/format/1.0/#document-links>`__, containing the URL, is either a string, or a links object, which can contain the following fields:

- **href**: a string containing the URL.
- **meta**: a meta object containing non-standard meta-information about the next link.

- :field:`more_data_available`: boolean.

:field-val:`false` if all the values in the requested range have been returned, and :field-val:`true` if the returned values are incomplete.
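
The way a client might follow the per-property :field:`next` links can be illustrated with a minimal Python sketch.
It is not part of the proposed specification text.
The helper ``fetch`` is hypothetical: it is assumed to take a URL and return the entry object (with ``attributes`` and ``meta``) of the resulting single-entry response.
The sketch also assumes that the successive parts can simply be concatenated along the first dimension, which the text above does not guarantee.

```python
from typing import Callable, Optional, Union

# A JSON API links value: a URL string, a links object with "href", or null.
LinksValue = Union[str, dict, None]


def link_to_url(link: LinksValue) -> Optional[str]:
    """Return the URL held by a JSON API links value, or None if there is none."""
    if link is None:
        return None
    if isinstance(link, str):
        return link
    return link["href"]


def collect_values(entry: dict, prop: str, fetch: Callable[[str], dict]) -> list:
    """Gather the values of `prop` by following meta.<prop>.range.next links.

    `fetch` is a hypothetical helper (an assumption of this sketch) that maps
    a URL to the entry object of the response served at that URL.
    """
    values = []
    while True:
        part = entry["attributes"].get(prop)
        if part is not None:  # the server MAY return null instead of values
            values.extend(part)
        url = link_to_url(entry["meta"][prop]["range"].get("next"))
        if url is None:  # null: all data for this property has been returned
            return values
        entry = fetch(url)
```

Accepting both the string form and the object form of the link mirrors the two shapes of the JSON API links object described above.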


Responses
=========

@@ -697,6 +816,121 @@ An example of a full response:
]
}


- Several examples of how ranged properties can be returned in the JSON format:

.. code:: jsonc

   {
     "attributes":{
       "cartesian_site_positions": [[[2.36, 5.36, 9.56],[7.24, 3.58, 0.56],[8.12, 6.95, 4.56]],
                                    [[2.38, 5.37, 9.56],[7.24, 3.57, 0.58],[8.11, 6.93, 4.58]],
                                    [[2.39, 5.38, 9.55],[7.23, 3.57, 0.59],[8.10, 6.93, 4.57]]
                                    // ...
                                   ],
       "species_at_sites": ["He", "Ne", "Ar"],
       "_exmpl_ranged_thermostat": [20, 40, 60],
       // ...
     },
     "meta":{
       "cartesian_site_positions": {
         "range": {
           "range_ids": ["frames","particles","xyz"],
           "indexable_dim": ["frames","particles","xyz"],
           "data_range": [{
             "name": "frames",
             "start": 1,
             "step": 1,
             "stop": 200,
           },{
             "name": "particles",
             "start": 1,
             "step": 1,
             "stop": 3,
           },{
             "name": "xyz",
             "start": 1,
             "step": 1,
             "stop": 3,
           }],
           "dim_size": [200, 3, 3],
           "nvalues": 1800,
           "nreturned_values": 900,
           "returned_range": [{
             "name": "frames",
             "start": 1,
             "step": 2,
             "stop": 100,
           },{
             "name": "particles",
             "start": 1,
             "step": 1,
             "stop": 3,
           },{
             "name": "xyz",
             "start": 1,
             "step": 1,
             "stop": 3,
           }],
           "contains_null": false,
           "more_data_available": true,
           "next": "https://example.com/optimade/v1/structures/id123456?response_fields=cartesian_site_positions&property_ranges=frames(101,200,2),particles(1,3,1),xyz(1,3,1)"
         }
       },
       "species_at_sites": {
         "range": {
           "indexable_dim": ["particles"],
           "dim_size": [3],
           "range_ids": ["particles"],
           "data_range": [{
             "name": "particles",
             "start": 1,
             "step": 1,
             "stop": 3,
           }],
           "nreturned_values": 3,
           "nvalues": 3,
           "returned_range":[{
             "name": "particles",
             "start": 1,
             "step": 1,
             "stop": 3,
           }],
           "contains_null": false,
           "more_data_available": false,
           "next": null
         }
       },
       "_exmpl_ranged_thermostat": {
         "range": {
           "indexable_dim":["frames"],
           "dim_size": [200],
           "data_range": [{
             "name": "frames",
             "start": 1,
             "step": null,
             "stop": 80,
           }],
           "range_ids": ["frames"],
           "nvalues": 3,
           "nreturned_values": 3,
           "indexes": [[1], [20], [80]],
           "contains_null": false,
           "returned_range":[{
             "name": "frames",
             "start": 1,
             "step": null,
             "stop": 80,
           }],
           "more_data_available": false,
           "next": null
         }
       }
     },
     // ...
   }
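
To make the relation between the returned values and their indexes concrete, the following Python sketch derives the index tuple of every returned value from the :field:`range` metadata shown above.
It is an illustration only: the function name ``returned_indexes`` is hypothetical, and the sketch assumes that values are ordered with the last dimension varying fastest, which the proposal does not state explicitly.

```python
from itertools import product


def returned_indexes(range_meta: dict) -> list:
    """Return one 1-based index tuple per returned value.

    Irregularly spaced data carries an explicit "indexes" list; otherwise the
    indexes are generated from the start/stop/step of each "returned_range"
    dimension (stop is inclusive).
    """
    if "indexes" in range_meta:
        return [tuple(index) for index in range_meta["indexes"]]
    axes = [
        range(dim["start"], dim["stop"] + 1, dim["step"])
        for dim in range_meta["returned_range"]
    ]
    # Assumption: the last dimension varies fastest.
    return list(product(*axes))


# The irregularly spaced thermostat property from the example above.
thermostat_meta = {
    "indexes": [[1], [20], [80]],
    "returned_range": [{"name": "frames", "start": 1, "step": None, "stop": 80}],
}
thermostat_by_frame = dict(zip(returned_indexes(thermostat_meta), [20, 40, 60]))
```

Here ``thermostat_by_frame`` maps the frame index tuples ``(1,)``, ``(20,)`` and ``(80,)`` to the thermostat values 20, 40 and 60.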


HTTP Response Status Codes
--------------------------

@@ -880,6 +1114,31 @@ Standard OPTIONAL URL query parameters not in the JSON API specification:
If provided, these fields MUST be returned along with the REQUIRED fields.
Other OPTIONAL fields MUST NOT be returned when this parameter is present.
Example: :query-url:`http://example.com/optimade/v1/structures?response_fields=last_modified,nsites`
- **property\_ranges**: specifies which ranges should be used when returning ranged properties.
In general, support is OPTIONAL; property definitions may, however, deviate from this and place stricter requirements on servers.
It consists of the name of a dimension directly followed by a range.
A range consists of a pair of parentheses ("(", ASCII 40 (0x28), and ")", ASCII 41 (0x29)) enclosing three integers, which are separated by commas (",", ASCII 44 (0x2C)).
The first integer specifies the first index in that dimension for which values should be returned.
The second integer specifies the last index for which values should be returned.
The third integer specifies the step size in that dimension.
Ranges can be specified for multiple dimensions by separating them with a comma.
Databases MUST use these ranges for properties where the dimension is listed under :field:`indexable_dim`.
If this is not the case, the database MAY return more data than was specified in the range.
The ranges are 1-based, i.e., the first value has index 1, and inclusive, i.e., for the range :val:`(10,20,1)` the last value returned belongs to index 20.
If a dimension is not specified, it is assumed that the whole range in that dimension is requested.
If no value is present for a given set of indexes, no value SHOULD be returned.
However, when a value is explicitly set to :val:`null` and :val:`null` has a meaning beyond indicating that no value has been defined, :val:`null` MUST be returned.
In case the requested property_range covers more data than the server wants to return, the server may choose to return only a part of the data.
For each combination of indexes for which data is returned, however, all the values for all requested properties need to be returned.
If the server does not return all the requested data, a link MUST be provided in the :field:`next` field, that applies to an entry as a whole, from which the remainder of the data can be retrieved.
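
The syntax described above can be sketched in Python as a pair of helpers that build and parse a :query-param:`property_ranges` value.
The function names are hypothetical, and the set of characters accepted in a dimension name (letters, digits and underscores) is an assumption of this sketch, not something the text above defines.

```python
import re

# One "<dimension>(<first>,<last>,<step>)" item; the name pattern is an assumption.
_RANGE_ITEM = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\((\d+),(\d+),(\d+)\)")


def format_property_ranges(ranges: dict) -> str:
    """Build a property_ranges value from {dimension: (first, last, step)}."""
    return ",".join(
        f"{name}({first},{last},{step})"
        for name, (first, last, step) in ranges.items()
    )


def parse_property_ranges(value: str) -> dict:
    """Parse a property_ranges value into {dimension: (first, last, step)}."""
    ranges = {}
    pos = 0
    while pos < len(value):
        match = _RANGE_ITEM.match(value, pos)
        if match is None:
            raise ValueError(f"invalid range at position {pos}")
        name, first, last, step = match.groups()
        ranges[name] = (int(first), int(last), int(step))
        pos = match.end()
        if pos == len(value):
            break
        # Items are separated by a single comma; a trailing comma is rejected.
        if value[pos] != "," or pos + 1 == len(value):
            raise ValueError(f"expected ',' followed by a range at position {pos}")
        pos += 1
    return ranges
```

A simple split on commas would not work here, because commas also separate the three integers inside each pair of parentheses; matching one complete item at a time avoids that ambiguity.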


Example:

A database has a :entry:`structure` entry with id: :val:`id_12345` and a ranged property :property:`test_field` with the two-dimensional data values :val:`[[9.64, 7.52, 0.69], [4.82, 8.35, 3.26], [4.82, 2.78, 7.87], [5.49, 3.48, 1.65]]`.
In addition, the field :field:`range_ids` has the value :val:`["frames", "xyz"]`.
A client can then make the request :query-url:`http://example.com/optimade/v1/structures/id_12345?property_ranges=frames(1,4,2),xyz(2,3,1)&response_fields=test_field`.
The response is then a single entry response for structure :val:`id_12345` where the :property:`test_field` property is included with the values :val:`[[7.52, 0.69], [2.78, 7.87]]`.
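
The selection in this example can be reproduced with a short Python sketch of 1-based, inclusive slicing over nested lists.
The function ``slice_ranged`` is a hypothetical illustration of the semantics described above, not a prescribed server implementation.

```python
def slice_ranged(values, ranges):
    """Select values from nested lists using 1-based, inclusive ranges.

    `ranges` holds one (first, last, step) tuple per dimension, outermost
    dimension first, in the order given by range_ids.
    """
    if not ranges:
        return values
    first, last, step = ranges[0]
    return [
        slice_ranged(values[index - 1], ranges[1:])  # convert to 0-based
        for index in range(first, last + 1, step)    # `last` is inclusive
    ]


test_field = [
    [9.64, 7.52, 0.69],
    [4.82, 8.35, 3.26],
    [4.82, 2.78, 7.87],
    [5.49, 3.48, 1.65],
]
# frames(1,4,2) selects frames 1 and 3; xyz(2,3,1) selects components 2 and 3.
selected = slice_ranged(test_field, [(1, 4, 2), (2, 3, 1)])
```

With the data above, ``selected`` equals ``[[7.52, 0.69], [2.78, 7.87]]``, matching the response described in the example.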


Additional OPTIONAL URL query parameters not described above are not considered to be part of this standard, and are instead considered to be "custom URL query parameters".
These custom URL query parameters MUST be of the format "<database-provider-specific prefix><url\_query\_parameter\_name>".