RFC: Configuration File Options #13

jamesmunns · 2024-03-26T13:46:27Z

We'll eventually need three kinds/sources of configuration:

- CLI
- Environment Variables
- Configuration File(s)

The CLI option is generally straightforward: we'll start with clap, as in #12. There are also reasonable env variable libraries, which I'll likely add soon.
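To make the three sources concrete, here is a minimal Rust sketch of how they might be layered. The `PartialConfig` type, its fields, and the CLI > env > file precedence are all assumptions for illustration; this RFC does not decide a precedence order.

```rust
// Hypothetical sketch: none of these names or fields come from river itself.

/// Settings as a single source may provide them; `None` means "not set here".
#[derive(Debug, Default, Clone, PartialEq)]
struct PartialConfig {
    threads: Option<usize>,
    listen_addr: Option<String>,
}

impl PartialConfig {
    /// Keep every value already set in `self`, and fill the gaps from `lower`.
    fn fill_from(self, lower: PartialConfig) -> PartialConfig {
        PartialConfig {
            threads: self.threads.or(lower.threads),
            listen_addr: self.listen_addr.or(lower.listen_addr),
        }
    }
}

/// Assumed precedence (not stated in this RFC): CLI beats env, env beats file.
fn resolve(cli: PartialConfig, env: PartialConfig, file: PartialConfig) -> PartialConfig {
    cli.fill_from(env).fill_from(file)
}
```

With this shape, a value given on the command line would win over the same key in the environment or the file, while keys set only in the file would still come through.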

The harder question is: What to do about config files?

EDIT/NOTE: The relevant configuration requirements are here, though configuration will touch nearly every component in the system. As a (primarily, at least) headless application, configuration inputs serve as the largest UX interface offered by river.

Option A: Simple Text Formats

This is the class of JSON(5), TOML, YAML, INI, RON, RSON, but also more expressive ones like NestedText (cc #3), or KDL. These are essentially libraries for describing structured data in some portable, (sometimes) human readable, and (sometimes) typed way.

As a note, the upstream pingora project has decided to use YAML for these purposes.

They are straightforward to parse and emit, but are limited to the existing syntax and capabilities of the language.

PROS:
- Ser/De work has already been done
- Straightforward to understand how they work
- Already understood by users and IDEs
CONS:
- Limited expressiveness
- May lead to awkward or hard to understand constructs as configuration complexity increases

Option B: Use an existing "lite" programming language

This is the class of starlark, as used by various build systems. There are likely others (like Nix from NixOS), but this is perhaps the most well known as a configuration language. (While writing this, I found another one, tyson, which is intended as a "lite" version of TypeScript.)

This uses a limited subset of an existing programming language (Python) to allow for greater expressiveness and to avoid repetition that might otherwise require metaprogramming or templating systems, as can be found with other languages.

These run some sort of user provided script/recipe, and "boil it down" to a single structured data output. The idea is that instead of using external tools to create verbose/complex configuration files, we provide a simple language for users to utilize instead.

As a downside, we intend to support WASM scripting later, which means that we'll essentially have TWO language VMs in the river binary, which might be an okay choice, but feels unfortunate. That being said, WASM may not directly be a great choice for Config, as discussed in the buck2 repo.

PROS:
- Starlark is used by at least two large companies, work has already been done
- Might be familiar to some users already
- Likely similar enough to Python to be understood by IDEs, if not there is an LSP server
- Allows for more expressive configuration than flat files
CONS:
- Heavyweight option (wrt code and complexity) compared to simple text formats
- Another language for users to learn
- Might be the second (or third) language necessary for contributors to use/understand

Option C: Write our own configuration file language

Tools like NGINX and Caddy (cc #1) have their own Domain Specific Language (DSL) to specify configuration. We would be able to specify our own configuration language, more closely related to river's actual operation.

This has a fun blend of pros and cons from the other two options, however note that as this language doesn't exist today, take the pros with a grain of salt! It is very easy to have rose tinted glasses for a tool that doesn't exist yet, but be critical of the downsides of tools that DO exist today:

PROS:
- We can match the expressiveness of the river application to the language itself
- If we need to make changes, we will be able to
CONS:
- We have to write a language spec and parser
- We will have to live with no IDE support, or write an LSP for it
- We will have to teach users the syntax if we don't mimic an existing tool
- Still a second (or third) language necessary for contributors to use/understand

Option D: Compromise (e.g. A then B/C)

This option proposes choosing Option A for now, until we hit the level of complexity necessary to justify choosing Option B or Option C.

This signs us up for EITHER a breaking change in the future (when we do the switchover), or maintenance of two configuration systems (if we support both), or maybe both (if we support the legacy option for a while before dropping it).

PROS:
- Simple now
- Won't get complicated until river is further along
- See other benefits of A
CONS:
- Signing us up TODAY for legacy/dual maintenance and/or breaking changes TOMORROW
- Need to manage and plan for the switchover process

Option E: Defer (e.g. Nothing now, then A/B/C later)

This choice is like D, but instead of using a config file temporarily, we ONLY use CLI/Env arguments for now, and totally defer the choice of a config format.

Current Implementor Position

I am of the opinion that we should go with Option D.

- We have bigger fish to fry right now, though not for TOO long, as the configuration file is likely to become the primary UX of the application.
- We will have a better idea of the needs of the configuration file format once we start implementing Request Path Control.
- It might be worth it to still have an "easy mode" config format for the easy things, with the ability to fall back to a more expressive language later.
- We'll likely need (multiple) breaking changes in the config format between now and "1.0" anyway, and migrations will be easier with a well defined and simple config format.

I am open to dissenting voices, particularly if you think there is an option missed, if the PROS/CONS listed are misleading or incomplete, or if you think Option B/C are worth chasing NOW.

Please refrain from discussing which of B/C you would like to pursue, I'll open a follow-up issue for that later.


jamesmunns · 2024-03-26T13:49:41Z

As a side note, river does not have an established RFC process, and I am the only implementor at the moment. I feel that an issue exploring the options and tradeoffs is "enough paper" for now, and we can implement a more formal process later.

jiripospisil · 2024-03-26T14:04:07Z

> Option B: Use an existing "lite" programming language

There's also pkl by Apple which I like the most but it's new and Rust implementations are still in early stages.

mcpherrinm · 2024-03-26T14:38:22Z

Using a language like json or yaml that is easily generated by other tools does mean that it’s possible to do options B or C outside of the context of the River codebase more easily. Pkl and starlark seem like they can be used to generate json easily.

I think starting with just environment variables is a totally reasonable option though, and the config format would essentially become whatever you’re deploying River with (like a systemd unit file, k8s pod spec) which isn’t so bad.

But it does seem likely to get awkward as it gets big, or especially as it needs structure like lists or maps.

djc · 2024-03-26T14:39:34Z

TOML now, Starlark later seems like a good path to me, FWIW.

johnpyp · 2024-03-26T15:40:42Z

Caddy's real configuration format is actually json at the core, and the DSL just sits on top of it (another kind of "layered complexity").

For that reason, option D does seem reasonable particularly during pre-stable development when the requirements and demands are still being ironed out.

It would also be nice to be unopinionated about the base language (e.g support JSON, YAML, and TOML, easily done with serde) then just choose a "primary" one to use in examples, but parse any of them into the target config structs.

jamesmunns · 2024-03-26T15:43:41Z

> It would also be nice to be unopinionated about the base language (e.g support JSON, YAML, and TOML, easily done with serde) then just choose a "primary" one to use in examples, but parse any of them into the target config structs.

I mostly agree with you, but not all structs can be expressed in all three. For example, TOML requires tables to be in the trailing position, while this is not the case for JSON.

That being said - I don't think this approach is terribly unreasonable.

Thanks for the context re: the Caddy DSL!

oowl · 2024-03-27T04:53:17Z

I'd like to recommend another approach for achieving some of these DSL configuration goals. For example, OpenResty's lua-nginx-module tries to embed the full Lua language in the Nginx runtime, which fits more dynamic use cases. E.g. https://github.com/Kong/kong is built on top of OpenResty and Nginx using the Lua programming language.

jamesmunns · 2024-03-27T15:51:06Z

I appreciate the input from everyone! I'll leave this issue open until I have an initial implementation, but I plan to go with:

- Pick Option D - leave further discussion re: Options B/C for later
- Start with TOML now as the primary config format, with the intent to also support JSON (maybe as --config-json=PATH and --config-toml=PATH, then maybe making a call to prefer one over the other as things go).
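As a rough illustration of the two-flag idea, the sketch below picks `--config-toml=PATH` or `--config-json=PATH` out of an argument list. It is hand-rolled with the standard library only to stay self-contained; in river this would presumably be handled by clap. `ConfigSource`, `parse_config_flag`, and the rule that the two flags are mutually exclusive are assumptions, not decisions from this thread.

```rust
use std::path::PathBuf;

/// Which file format was requested, and where the file lives (hypothetical type).
#[derive(Debug, PartialEq)]
enum ConfigSource {
    Toml(PathBuf),
    Json(PathBuf),
}

/// Scan the arguments for one of the two config flags.
/// Returns `Ok(None)` when neither flag is present.
fn parse_config_flag(args: &[&str]) -> Result<Option<ConfigSource>, String> {
    let mut found: Option<ConfigSource> = None;
    for arg in args {
        let source = if let Some(path) = arg.strip_prefix("--config-toml=") {
            ConfigSource::Toml(PathBuf::from(path))
        } else if let Some(path) = arg.strip_prefix("--config-json=") {
            ConfigSource::Json(PathBuf::from(path))
        } else {
            // Not a config flag; ignore it here.
            continue;
        };
        // Assumption: giving both flags (or one flag twice) is an error.
        if found.is_some() {
            return Err("only one config file flag may be given".to_string());
        }
        found = Some(source);
    }
    Ok(found)
}
```

Carrying the format in the enum variant means the loader never has to guess the format from the file extension.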

(I'm a little biased against YAML currently, and YAML in Rust is having a moment this week, so I'm going to avoid it for now, but don't take that as a permanent decision, just what I'm going to do now).

The intent will be to have an internal Rust struct that represents the actual full configuration data, with short leaps from the deserialized TOML + JSON (and later DSL/tool output) to convert to that internal format. This decoupling will also facilitate internal changes (without breaking the config file format), as well as breaking changes to config file formats (without breaking the internal representation).
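A minimal sketch of that decoupling might look like the following. The struct names, fields, and the default thread count are hypothetical, and the file-facing struct is built by hand here; in practice it would presumably derive serde's `Deserialize` and be filled from TOML or JSON.

```rust
use std::convert::TryFrom;
use std::net::SocketAddr;

/// Shape of the config as a user writes it in a file (hypothetical fields).
#[derive(Debug, Default)]
struct FileConfig {
    threads: Option<usize>,
    listen: Vec<String>,
}

/// Internal representation that the rest of the application consumes.
#[derive(Debug, PartialEq)]
struct InternalConfig {
    threads: usize,
    listeners: Vec<SocketAddr>,
}

impl TryFrom<FileConfig> for InternalConfig {
    type Error = String;

    /// The "short leap": validate and normalise the file-level data.
    fn try_from(file: FileConfig) -> Result<Self, Self::Error> {
        let listeners = file
            .listen
            .iter()
            .map(|s| {
                s.parse::<SocketAddr>()
                    .map_err(|e| format!("bad listen address {s:?}: {e}"))
            })
            .collect::<Result<Vec<_>, _>>()?;
        Ok(InternalConfig {
            // Assumed default; nothing in this thread specifies one.
            threads: file.threads.unwrap_or(1),
            listeners,
        })
    }
}
```

Under this layout, renaming a key in the file format only touches `FileConfig` and the conversion, while code that consumes `InternalConfig` stays unchanged.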

jamesmunns · 2024-03-28T14:13:19Z

This is implemented (basically) in #14. I haven't added JSON support yet, but I'm open to PRs, or can do this when there is interest/demand (otherwise I'll defer it to avoid managing multiple formats for now).

This PR implements the decision made in #13 to go with TOML for now, leaving the door open for a more complex configuration language in the future. Very few configuration options are exposed, I expect to introduce more shortly to allow for initial "out of the box" Server setup.

alexandru0-dev · 2024-04-01T15:16:13Z

Here are my 2 cents:
Premise: I'm a nix enthusiast.

Going with option D makes the most sense, but I would suggest thinking hard about whether option B is really needed before committing to a language as a configuration method,
as it increases complexity in development, testing and deployment.
It also forces the end user into a DSL or programming language (which can be a benefit or not).
Tools like pkl can still generate static configs, so the result is basically option A.

Using option A would also make it easier to create nix packages and modules, as nix can convert to toml, yaml, etc.

I would advise against making your own DSL, as it increases complexity, slows development and shifts the main focus of the project.

I'm happy to help and discuss options

keithmattix · 2024-04-05T17:13:01Z

Hi! Big fan of the project; excited to see discussion + development already kicking off! I come from a service mesh/envoy background, so I'm really interested in the configuration format discussions and the eventual possibility of being able to use the xDS protocol for dynamically programming a fleet of River proxies. I'd love to get some opinions on if something like this would be useful for others' use-cases as well

jamesmunns added the F-Configuration (Functionality relating to configuration) and Q-RFC (Questions that are open to comments) labels Mar 26, 2024

jamesmunns added this to the Kickstart Spike 1 milestone Mar 26, 2024

This was referenced Mar 26, 2024

Layered Complexity in Configuration #1

Open

Consider using (something like) NestedText as config format #3

Closed

Create tool that translates Nginx configs to River configs #5

Open

jamesmunns mentioned this issue Mar 28, 2024

Begin implementing configuration #14

Merged

jamesmunns closed this as completed in #14 Mar 28, 2024

jamesmunns mentioned this issue Jun 11, 2024

Add support for KDL configuration #42

Merged
