RFC: Configuration File Options #13

Closed
jamesmunns opened this issue Mar 26, 2024 · 11 comments · Fixed by #14
Labels
F-Configuration Functionality relating to configuration Q-RFC Questions that are open to comments

Comments

@jamesmunns
Collaborator

jamesmunns commented Mar 26, 2024

We'll eventually need three kinds/sources of configuration:

  • CLI
  • Environment Variables
  • Configuration File(s)

The CLI option is generally straightforward: we'll start with clap, as in #12. There are also reasonable env variable libraries, which I'll likely add soon.
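
As a rough illustration of how the three sources might be combined, here is a minimal std-only sketch. The `PartialConfig` type, its fields, and the precedence order (CLI over environment over file) are assumptions made for the example; nothing in this issue fixes them.

```rust
/// Hypothetical settings. Each field is `None` when a source did not set it.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct PartialConfig {
    pub listen_addr: Option<String>,
    pub threads: Option<usize>,
}

impl PartialConfig {
    /// Overlay `self` on top of `lower`: values set in `self` win,
    /// unset values fall through to `lower`.
    pub fn or(self, lower: PartialConfig) -> PartialConfig {
        PartialConfig {
            listen_addr: self.listen_addr.or(lower.listen_addr),
            threads: self.threads.or(lower.threads),
        }
    }
}

/// Assumed precedence for this sketch: CLI > environment > config file.
pub fn resolve(cli: PartialConfig, env: PartialConfig, file: PartialConfig) -> PartialConfig {
    cli.or(env).or(file)
}
```

With this shape, each source only has to produce a `PartialConfig`, and the merge order lives in one place (`resolve`), so changing the precedence later would be a one-line change.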

The harder question is: What to do about config files?

EDIT/NOTE: The relevant configuration requirements are here, though configuration will touch nearly every component in the system. As a (primarily, at least) headless application, configuration inputs serve as the largest UX interface offered by river.

Option A: Simple Text Formats

This is the class of JSON(5), TOML, YAML, INI, RON, RSON, but also more expressive ones like NestedText (cc #3), or KDL. These are essentially libraries for describing structured data in some portable, (sometimes) human readable, and (sometimes) typed way.

As a note, the upstream pingora project has decided to use YAML for these purposes.

They are straightforward to parse and emit, but are limited to the existing syntax and capabilities of the language.

  • PROS:
    • Ser/De work has already been done
    • Straightforward to understand how they work
    • Already understood by users and IDEs
  • CONS:
    • Limited expressiveness
    • May lead to awkward or hard to understand constructs as configuration complexity increases

Option B: Use an existing "lite" programming language

This is the class of starlark, as used by various build systems. There are likely others (like Nix from NixOS), but this is perhaps the most well known as a configuration language. (While writing this, I found another one, tyson, which is intended as a "lite" version of TypeScript.)

This uses a limited subset of an existing programming language (Python) to allow for greater expressiveness, or avoiding repetition that might require metaprogramming or templating systems, as can be found with other languages.

These run some sort of user-provided script/recipe that "boils down" to a single structured data output. The idea is that instead of using external tools to create verbose/complex configuration files, we provide a simple language for users to utilize instead.

As a downside, we intend to support WASM scripting later, which means that we'll essentially have TWO language VMs in the river binary, which might be an okay choice, but feels unfortunate. That being said, WASM may not directly be a great choice for Config, as discussed in the buck2 repo.

  • PROS:
    • Starlark is used by at least two large companies, work has already been done
    • Might be familiar to some users already
    • Likely similar enough to Python to be understood by IDEs, if not there is an LSP server
    • Allows for more expressive configuration than flat files
  • CONS:
    • Heavyweight option (wrt code and complexity) compared to simple text formats
    • Another language for users to learn
    • Might be the second (or third) language necessary for contributors to use/understand

Option C: Write our own configuration file language

Tools like NGINX and Caddy (cc #1) have their own Domain Specific Language (DSL) to specify configuration. We would be able to specify our own configuration language, more closely related to river's actual operation.

This has a fun blend of pros and cons from the other two options, however note that as this language doesn't exist today, take the pros with a grain of salt! It is very easy to have rose tinted glasses for a tool that doesn't exist yet, but be critical of the downsides of tools that DO exist today:

  • PROS:
    • We can match the expressiveness of the river application to the language itself
    • If we need to make changes, we will be able to
  • CONS:
    • We have to write a language spec and parser
    • We will have to live with no IDE support, or write an LSP for it
    • We will have to teach users the syntax if we don't mimic an existing tool
    • Still a second (or third) language necessary for contributors to use/understand

Option D: Compromise (e.g. A then B/C)

This option proposes choosing Option A for now, until we hit the level of complexity necessary to justify choosing Option B or Option C.

This signs us up for EITHER a breaking change in the future (when we do the switchover), or maintenance of two configuration systems (if we support both), or maybe both (if we support the legacy option for a while before dropping it).

  • PROS:
    • Simple now
    • Won't get complicated until river is further along
    • See other benefits of A
  • CONS:
    • Signing us up TODAY for legacy/dual maintenance and/or breaking changes TOMORROW
    • Need to manage and plan for the switchover process

Option E: Defer (e.g. Nothing now, then A/B/C later)

This choice is like D, but instead of using a config file temporarily, we ONLY use CLI/Env arguments for now, and totally defer the choice of a config format.

Current Implementor Position

I am of the opinion that we should go with Option D.

  • We have bigger fish to fry right now, though not for TOO long, as the configuration file is likely to become the primary UX of the application
  • We will have a better idea of the needs of the configuration file format, once we start implementing Request Path Control
  • It might be worth it to still have an "easy mode" config format for the easy things, with the ability to fall back to a more expressive language later.
  • We'll likely need (multiple) breaking changes in the config format between now and "1.0" anyway, and migrations will be easier with a well defined and simple config format.

I am open to dissenting voices, particularly if you think there is an option missed, if the PROS/CONS listed are misleading or incomplete, or if you think Option B/C are worth chasing NOW.

Please refrain from discussing which of B/C you would like to pursue, I'll open a follow-up issue for that later.

@jamesmunns jamesmunns added F-Configuration Functionality relating to configuration Q-RFC Questions that are open to comments labels Mar 26, 2024
@jamesmunns jamesmunns added this to the Kickstart Spike 1 milestone Mar 26, 2024
@jamesmunns
Collaborator Author

jamesmunns commented Mar 26, 2024

As a side note, river does not have an established RFC process, and I am the only implementor at the moment. I feel that an issue exploring the options and tradeoffs is "enough paper" for now, and we can implement a more formal process later.

@jiripospisil

Option B: Use an existing "lite" programming language

There's also pkl by Apple which I like the most but it's new and Rust implementations are still in early stages.

@mcpherrinm

Using a language like json or yaml that is easily generated by other tools does mean that it’s possible to do options B or C outside of the context of River codebase more easily. Pkl and starlark seem like they can be used to generate json easily.

I think starting with just environment variables is a totally reasonable option though, and the config format would essentially become whatever you’re deploying River with (like a systemd unit file, k8s pod spec) which isn’t so bad.

But it does seem likely to get awkward as it gets big, or especially as it needs structure like lists or maps.
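
To make the awkwardness concrete, here is a minimal sketch of what reading a list from a single environment variable could look like. The comma-separated encoding and the function names are assumptions made for illustration, not anything river defines.

```rust
/// Split a comma-separated value such as "127.0.0.1:80,127.0.0.1:443"
/// into trimmed, non-empty entries.
pub fn parse_list(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

/// Read a list from the environment. An unset (or non-UTF-8) variable
/// is treated as an empty list in this sketch.
pub fn list_from_env(name: &str) -> Vec<String> {
    std::env::var(name).map(|v| parse_list(&v)).unwrap_or_default()
}
```

A flat list is still manageable this way, but nested structure (a map of listeners to upstreams, say) would need an ad hoc encoding on top, which is where a structured file format starts to look more attractive.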

@djc
Collaborator

djc commented Mar 26, 2024

TOML now, Starlark later seems like a good path to me, FWIW.

@johnpyp

johnpyp commented Mar 26, 2024

Caddy's real configuration format is actually json at the core, and the DSL just sits on top of it (another kind of "layered complexity").

For that reason, option D does seem reasonable particularly during pre-stable development when the requirements and demands are still being ironed out.

It would also be nice to be unopinionated about the base language (e.g support JSON, YAML, and TOML, easily done with serde) then just choose a "primary" one to use in examples, but parse any of them into the target config structs.

@jamesmunns
Collaborator Author

It would also be nice to be unopinionated about the base language (e.g support JSON, YAML, and TOML, easily done with serde) then just choose a "primary" one to use in examples, but parse any of them into the target config structs.

I mostly agree with you, but not all structs can be expressed in all three. For example, TOML requires tables to be in the trailing position, while this is not the case for JSON.

That being said - I don't think this approach is terribly unreasonable.

Thanks for the context re: the Caddy DSL!

@oowl

oowl commented Mar 27, 2024

I want to recommend another way to achieve some of the DSL configuration goals. For example, OpenResty's lua-nginx-module brings the full Lua language into the Nginx runtime, which fits more dynamic use cases. E.g. https://github.com/Kong/kong is built on top of OpenResty and Nginx using the Lua programming language.

@jamesmunns
Collaborator Author

I appreciate the input from everyone! I'll leave this issue open until I have an initial implementation, but I plan to go with:

  • Pick Option D - leave further discussion re: Options B/C for later
  • Start with TOML now as the primary config format, with the intent to also support JSON (maybe as --config-json=PATH and --config-toml=PATH, then maybe making a call to prefer one over the other as things go).
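
A minimal sketch of how the two flags floated above could be told apart is shown below. It uses only the standard library rather than clap, and the `ConfigSource` type and the "at most one flag" rule are assumptions made for the example.

```rust
use std::path::PathBuf;

/// Which config file format was requested, and where the file lives.
#[derive(Debug, PartialEq)]
pub enum ConfigSource {
    Toml(PathBuf),
    Json(PathBuf),
}

/// Scan arguments for `--config-toml=PATH` or `--config-json=PATH`.
/// Returns `Ok(None)` if neither flag is present, and an error if more
/// than one config flag is given (an assumption of this sketch).
pub fn config_source<I, S>(args: I) -> Result<Option<ConfigSource>, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut found = None;
    for arg in args {
        let arg = arg.as_ref();
        let next = if let Some(path) = arg.strip_prefix("--config-toml=") {
            ConfigSource::Toml(PathBuf::from(path))
        } else if let Some(path) = arg.strip_prefix("--config-json=") {
            ConfigSource::Json(PathBuf::from(path))
        } else {
            // Not a config flag; ignore it here.
            continue;
        };
        if found.is_some() {
            return Err("only one config file flag may be given".to_string());
        }
        found = Some(next);
    }
    Ok(found)
}
```

In a real binary this would be called with something like `std::env::args()`, and the resulting variant would select which deserializer to run.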

(I'm a little biased against YAML currently, and YAML in Rust is having a moment this week, so I'm going to avoid it for now, but don't take that as a permanent decision, just what I'm going to do now).

The intent will be to have an internal Rust struct that represents the actual full configuration data, with short leaps from the deserialized TOML + JSON (and later DSL/tool output) to convert to that internal format. This decoupling will also facilitate internal changes (without breaking the config file format), as well as breaking config file formats (without breaking the internal representation).
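
The decoupling described above might look roughly like the following sketch. All module, type, and field names here are hypothetical, the default of one thread is an assumption, and the file-facing struct is built by hand to keep the example std-only (in practice it would be produced by a TOML or JSON deserializer).

```rust
use std::convert::TryFrom;
use std::net::SocketAddr;

/// File-facing shape: loosely typed, mirrors what a user would write.
pub mod toml_v1 {
    #[derive(Debug, Clone)]
    pub struct FileConfig {
        pub threads: Option<usize>,
        /// Listener addresses as "host:port" strings.
        pub listeners: Vec<String>,
    }
}

/// Internal shape: validated, fully populated, used by the rest of the app.
pub mod internal {
    use std::net::SocketAddr;

    #[derive(Debug, Clone, PartialEq)]
    pub struct Config {
        pub threads: usize,
        pub listeners: Vec<SocketAddr>,
    }
}

/// The "short leap" from the file format to the internal representation.
impl TryFrom<toml_v1::FileConfig> for internal::Config {
    type Error = String;

    fn try_from(file: toml_v1::FileConfig) -> Result<Self, Self::Error> {
        let listeners = file
            .listeners
            .iter()
            .map(|s| {
                s.parse::<SocketAddr>()
                    .map_err(|e| format!("bad listener {:?}: {}", s, e))
            })
            .collect::<Result<Vec<_>, _>>()?;
        Ok(internal::Config {
            // Assumed default for this sketch when the file omits `threads`.
            threads: file.threads.unwrap_or(1),
            listeners,
        })
    }
}
```

A second file format (or a later `toml_v2`) would get its own `TryFrom` impl targeting the same `internal::Config`, so changes on either side stay local to the conversion.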

@jamesmunns
Collaborator Author

This is (basically) implemented in #14. I haven't added JSON support yet, but I'm open to PRs, or can do this when there is interest/demand (otherwise I'll defer it, to avoid maintaining multiple formats for now).

jamesmunns added a commit that referenced this issue Mar 28, 2024
This PR implements the decision made in #13 to go with TOML for now, leaving the door open
for a more complex configuration language in the future.

Very few configuration options are exposed, I expect to introduce more shortly to allow for
initial "out of the box" Server setup.
@alexandru0-dev

Here are my 2 cents:
Premise: I'm a nix enthusiast.

Going with option D makes the most sense, but I would suggest really thinking about whether option B is needed before committing to a language as a configuration method, as it increases complexity in development, testing and deployment.
It also forces the end user into a DSL or programming language (which can be a benefit or not).
Tools like pkl can still generate static configs, so the result is basically option A.

Using option A would also make it easier to provide nix packages and modules, as nix can output TOML, YAML, etc.

I would advise against making your own DSL, as it increases complexity, slows development, and shifts the main focus of the project.

I'm happy to help and discuss options

@keithmattix

keithmattix commented Apr 5, 2024

Hi! Big fan of the project; excited to see discussion + development already kicking off! I come from a service mesh/envoy background, so I'm really interested in the configuration format discussions and the eventual possibility of being able to use the xDS protocol for dynamically programming a fleet of River proxies. I'd love to get some opinions on whether something like this would be useful for others' use-cases as well.
