JSON/TOML backend: introduce abbreviated IO modes #1493

Merged: 28 commits, Dec 16, 2024
Commits
3b06c09
Introduce dataset template mode to JSON backend
franzpoeschel Aug 4, 2023
0258000
Write used mode to JSON file
franzpoeschel Aug 4, 2023
a48e92e
Use Attribute::getOptional for snapshot attribute
franzpoeschel Feb 23, 2023
e6e5357
Introduce attribute mode
franzpoeschel Aug 4, 2023
0f2d33f
Add example 14_toml_template.cpp
franzpoeschel Aug 4, 2023
52e518d
Use Datatype::UNDEFINED to indicate no dataset definition in template
franzpoeschel Mar 10, 2023
e03d21d
Extend example
franzpoeschel May 19, 2022
ce3aab4
Test short attribute mode
franzpoeschel Aug 7, 2023
06d7ec1
Copy datatypeToString to JSON implementation
franzpoeschel Aug 7, 2023
b29d64d
Fix after rebase: Init JSON config in parallel mode
franzpoeschel Sep 22, 2023
76f2421
Fix after rebase: Don't erase JSON datasets when writing
franzpoeschel Sep 22, 2023
d5534cb
openpmd-pipe: use short modes for test
franzpoeschel Oct 12, 2023
7983763
Less intrusive warnings, allow disabling them
franzpoeschel Oct 12, 2023
f413d75
TOML: Use short modes by default
franzpoeschel Oct 12, 2023
429ade5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 24, 2023
c309e89
Documentation
franzpoeschel Nov 24, 2023
e5f1177
Short mode in default in openPMD >= 2.
franzpoeschel Nov 24, 2023
183c1a1
Short value by default in TOML
franzpoeschel Mar 19, 2024
395cd10
Store the openPMD version information in the IOHandler
franzpoeschel Mar 19, 2024
d4b6f88
Fixes
franzpoeschel Mar 26, 2024
afa0165
Adapt test to recent rebase
franzpoeschel Jun 7, 2024
50ea53c
toml11 4.0 compatibility
franzpoeschel Aug 5, 2024
61a84c6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 15, 2024
d5b35e2
wip: cleanup
franzpoeschel Dec 11, 2024
346c37e
wip: cleanup
franzpoeschel Dec 13, 2024
be3543a
Cleanup
franzpoeschel Dec 13, 2024
7f81e62
Extensive testing
franzpoeschel Dec 13, 2024
985a59f
CI fixes
franzpoeschel Dec 16, 2024
wip: cleanup
franzpoeschel committed Dec 13, 2024
commit 346c37e91d4912e3b7c5323fa26a5ad077d8b6a9
7 changes: 4 additions & 3 deletions docs/source/backends/json.rst
@@ -54,9 +54,10 @@ Stored as an actual dataset, an **openPMD dataset** is a JSON object with three

Stored as a **dataset template**, an openPMD dataset is represented by three JSON keys:

- * ``datatype`` (required): As above.
- * ``extent`` (required): A list of integers, describing the extent of the dataset.
- * ``attributes``: As above.
+ * ``datatype`` (required): As above.
+ * ``extent`` (required): A list of integers, describing the extent of the dataset.
+   This replaces the ``data`` key from the non-template representation.
+ * ``attributes``: As above.

This mode stores only the dataset metadata.
Chunk load/store operations are ignored.
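
To make the template representation concrete, here is a minimal stand-alone sketch that renders the JSON object described by the key list above. The helper `renderDatasetTemplate` is hypothetical and not part of openPMD-api, the datatype string `DOUBLE` is only an illustrative value, and the optional `attributes` key is left out for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of openPMD-api): renders the JSON object
// that a dataset template corresponds to, using the two required keys
// "datatype" and "extent". No "data" key is written, since a template
// stores only metadata.
std::string renderDatasetTemplate(
    std::string const &datatype, std::vector<std::uint64_t> const &extent)
{
    std::ostringstream out;
    out << "{\"datatype\": \"" << datatype << "\", \"extent\": [";
    for (std::size_t i = 0; i < extent.size(); ++i)
    {
        if (i > 0)
        {
            out << ", ";
        }
        out << extent[i];
    }
    out << "]}";
    return out.str();
}
```

Calling `renderDatasetTemplate("DOUBLE", {10, 20})` yields an object with only `datatype` and `extent`. In the non-template representation, a `data` key would hold the n-dimensional array in place of `extent`.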
6 changes: 3 additions & 3 deletions docs/source/details/backendconfig.rst
@@ -243,11 +243,11 @@ A full configuration of the JSON backend:

The TOML backend is configured analogously, replacing the ``"json"`` key with ``"toml"``.

- All keys found under ``hdf5.dataset`` are applicable globally as well as per dataset.
+ All keys found under ``json.dataset`` are applicable globally as well as per dataset.
Explanation of the single keys:

* ``json.dataset.mode`` / ``toml.dataset.mode``: One of ``"dataset"`` (default) or ``"template"``.
In "dataset" mode, the dataset will be written as an n-dimensional (recursive) array, padded with nulls (JSON) or zeroes (TOML) for missing values.
-   In "template" mode, only the dataset metadata (type, extent and attributes) are stored and no chunks can be written or read.
- * ``json.attribute.mode`` / ``toml.attribute.mode``: One of ``"long"`` (default in openPMD 1.*) or ``"short"`` (default in openPMD 2.*).
+   In "template" mode, only the dataset metadata (type, extent and attributes) are stored and no chunks can be written or read (i.e. write/read operations will be skipped).
+ * ``json.attribute.mode`` / ``toml.attribute.mode``: One of ``"long"`` (default in openPMD 1.*) or ``"short"`` (default in openPMD 2.* and generally in TOML).
The long format explicitly encodes the attribute type in the dataset on disk, the short format only writes the actual attribute as a JSON/TOML value, requiring readers to recover the type.
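
As an illustration of the two keys documented above, the following sketch builds a backend configuration string that selects template mode for datasets and short mode for attributes. The helper `abbreviatedConfig` is a hypothetical name introduced here. The key layout follows the `json.dataset.mode` / `json.attribute.mode` paths from the text, and passing `"toml"` instead of `"json"` reflects the note that the TOML backend is configured analogously.

```cpp
#include <string>

// Hypothetical helper: builds a configuration string for the given
// backend key ("json" or "toml"), selecting the abbreviated modes
// dataset.mode = "template" and attribute.mode = "short".
std::string abbreviatedConfig(std::string const &backendKey)
{
    return "{\"" + backendKey +
        "\": {\"dataset\": {\"mode\": \"template\"}, "
        "\"attribute\": {\"mode\": \"short\"}}}";
}

// Assumption: such a string would be handed to openPMD-api as the
// options argument when opening a Series, for example
//   openPMD::Series series(
//       "data.json", openPMD::Access::CREATE, abbreviatedConfig("json"));
// That call is left as a comment so this sketch compiles without the
// library.
```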
16 changes: 16 additions & 0 deletions include/openPMD/Dataset.hpp
@@ -41,7 +41,23 @@ class Dataset
public:
enum : std::uint64_t
{
+ /**
+  * Setting one dimension of the extent as JOINED_DIMENSION means that
+  * the extent along that dimension will be defined by the sum of all
+  * parallel processes' contributions.
+  * Only one dimension can be joined. For store operations, the offset
+  * should be an empty array and the extent should give the actual
+  * extent of the chunk (i.e. the number of joined elements along the
+  * joined dimension, equal to the global extent in all other
+  * dimensions). For more details, refer to
+  * docs/source/usage/workflow.rst.
+  */
JOINED_DIMENSION = std::numeric_limits<std::uint64_t>::max(),
+ /**
+  * Some backends (i.e. JSON and TOML in template mode) support the
+  * creation of dataset with undefined datatype and extent.
+  * The extent should be given as {UNDEFINED_EXTENT} for that.
+  */
UNDEFINED_EXTENT = std::numeric_limits<std::uint64_t>::max() - 1
};
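
The two sentinel values documented in this hunk can be illustrated with a stand-alone sketch. The constants below copy the enum values shown above. The helpers `isUndefinedExtent` and `resolveJoinedDimension` are hypothetical and only mirror the behaviour described in the comments; they are not openPMD-api functions.

```cpp
#include <cstdint>
#include <limits>
#include <numeric>
#include <vector>

using Extent = std::vector<std::uint64_t>;

// Stand-alone copies of the two sentinel values from the enum above.
constexpr std::uint64_t JOINED_DIMENSION =
    std::numeric_limits<std::uint64_t>::max();
constexpr std::uint64_t UNDEFINED_EXTENT =
    std::numeric_limits<std::uint64_t>::max() - 1;

// Hypothetical helper: true if the extent is the single-element
// placeholder {UNDEFINED_EXTENT}.
bool isUndefinedExtent(Extent const &extent)
{
    return extent.size() == 1 && extent[0] == UNDEFINED_EXTENT;
}

// Hypothetical helper: replaces the joined dimension by the sum of the
// per-process contributions along it; all other dimensions are kept.
Extent resolveJoinedDimension(
    Extent declared, std::vector<std::uint64_t> const &contributions)
{
    for (auto &dim : declared)
    {
        if (dim == JOINED_DIMENSION)
        {
            dim = std::accumulate(
                contributions.begin(), contributions.end(), std::uint64_t{0});
            break; // only one dimension can be joined
        }
    }
    return declared;
}
```

Because both sentinels sit at the top of the `std::uint64_t` range, they cannot be confused with a realistic dataset extent.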

44 changes: 30 additions & 14 deletions include/openPMD/IO/JSON/JSONIOHandlerImpl.hpp
@@ -267,6 +267,10 @@ class JSONIOHandlerImpl : public AbstractIOHandlerImpl
*/
FileFormat m_fileFormat{};

+ /*
+  * Under which key do we find the backend configuration?
+  * -> "json" for the JSON backend, "toml" for the TOML backend.
+  */
std::string backendConfigKey() const;

/*
@@ -278,6 +282,10 @@ class JSONIOHandlerImpl : public AbstractIOHandlerImpl

std::string m_originalExtension;

+ /*
+  * Was the config value explicitly user-chosen, or are we still working with
+  * defaults?
+  */
enum class SpecificationVia
{
DefaultValue,
@@ -288,30 +296,33 @@ class JSONIOHandlerImpl : public AbstractIOHandlerImpl
// Dataset IO mode //
/////////////////////

- enum class IOMode
+ enum class DatasetMode
{
Dataset,
Template
};

- IOMode m_mode = IOMode::Dataset;
- SpecificationVia m_IOModeSpecificationVia = SpecificationVia::DefaultValue;
- bool m_printedSkippedWriteWarningAlready = false;
+ // IOMode m_mode{};
+ // SpecificationVia m_IOModeSpecificationVia =
+ // SpecificationVia::DefaultValue; bool m_printedSkippedWriteWarningAlready
+ // = false;

- struct DatasetMode
+ struct DatasetMode_s
{
- IOMode m_IOMode;
+ // Initialized in init()
+ DatasetMode m_mode{};
SpecificationVia m_specificationVia;
bool m_skipWarnings;

template <typename A, typename B, typename C>
operator std::tuple<A, B, C>()
{
return std::tuple<A, B, C>{
- m_IOMode, m_specificationVia, m_skipWarnings};
+ m_mode, m_specificationVia, m_skipWarnings};
}
};
- DatasetMode retrieveDatasetMode(openPMD::json::TracingJSON &config) const;
+ DatasetMode_s m_datasetMode;
+ DatasetMode_s retrieveDatasetMode(openPMD::json::TracingJSON &config) const;

///////////////////////
// Attribute IO mode //
@@ -323,11 +334,16 @@ class JSONIOHandlerImpl : public AbstractIOHandlerImpl
Long
};

- AttributeMode m_attributeMode = AttributeMode::Long;
- SpecificationVia m_attributeModeSpecificationVia =
- SpecificationVia::DefaultValue;
+ struct AttributeMode_s
+ {
+ // Will be modified in init() based on the openPMD version and the
+ // active file format (JSON/TOML)
+ AttributeMode m_mode{};
+ SpecificationVia m_specificationVia = SpecificationVia::DefaultValue;
+ };
+ AttributeMode_s m_attributeMode;

- std::pair<AttributeMode, SpecificationVia>
+ AttributeMode_s
retrieveAttributeMode(openPMD::json::TracingJSON &config) const;

// HELPER FUNCTIONS
@@ -376,7 +392,7 @@ class JSONIOHandlerImpl : public AbstractIOHandlerImpl
// essentially: m_i = \prod_{j=0}^{i-1} extent_j
static Extent getMultiplicators(Extent const &extent);

- static std::pair<Extent, IOMode> getExtent(nlohmann::json &j);
+ static std::pair<Extent, DatasetMode> getExtent(nlohmann::json &j);

// remove single '/' in the beginning and end of a string
static std::string removeSlashes(std::string);
@@ -434,7 +450,7 @@ class JSONIOHandlerImpl : public AbstractIOHandlerImpl

// check whether the json reference contains a valid dataset
template <typename Param>
- IOMode verifyDataset(Param const &parameters, nlohmann::json &);
+ DatasetMode verifyDataset(Param const &parameters, nlohmann::json &);

static nlohmann::json platformSpecifics();
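
The templated conversion operator on `DatasetMode_s` is presumably there so that a returned struct can be unpacked into separate variables with `std::tie`, which needs a conversion to a tuple of references. The following stand-alone sketch reproduces that pattern under several assumptions: the enumerator `ExplicitlySet` is an illustrative name (the second enumerator of `SpecificationVia` is not shown in this diff), and `exampleDatasetMode` is a hypothetical stand-in for `retrieveDatasetMode`.

```cpp
#include <tuple>

enum class DatasetMode
{
    Dataset,
    Template
};

// Only DefaultValue is visible in the diff; ExplicitlySet is an
// illustrative placeholder for the user-chosen case.
enum class SpecificationVia
{
    DefaultValue,
    ExplicitlySet
};

struct DatasetMode_s
{
    DatasetMode m_mode{};
    SpecificationVia m_specificationVia;
    bool m_skipWarnings;

    // Converts to any three-element tuple, including the tuple of
    // references that std::tie produces.
    template <typename A, typename B, typename C>
    operator std::tuple<A, B, C>()
    {
        return std::tuple<A, B, C>{
            m_mode, m_specificationVia, m_skipWarnings};
    }
};

// Hypothetical stand-in for retrieveDatasetMode().
DatasetMode_s exampleDatasetMode()
{
    DatasetMode_s res;
    res.m_mode = DatasetMode::Template;
    res.m_specificationVia = SpecificationVia::ExplicitlySet;
    res.m_skipWarnings = true;
    return res;
}

// Unpacks the stand-in result into three separate variables via std::tie.
bool templateModeRequested()
{
    DatasetMode mode{};
    SpecificationVia via{};
    bool skipWarnings = false;
    std::tie(mode, via, skipWarnings) = exampleDatasetMode();
    return mode == DatasetMode::Template &&
        via == SpecificationVia::ExplicitlySet && skipWarnings;
}
```

A fixed `std::tuple<DatasetMode, SpecificationVia, bool>` return type would also work with `std::tie`. The struct has the advantage of keeping named members for code that prefers `result.m_mode` over positional access.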
