Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

is there way to make both --opt-with-dash and environment variable work at the same time? #469

Closed
braindevices opened this issue Nov 5, 2024 · 18 comments · Fixed by #489
Assignees

Comments

@braindevices
Copy link

braindevices commented Nov 5, 2024

test using #468

import os
import traceback
from pydantic import AliasChoices, AliasGenerator, BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict, version
import sys
import pydantic.version

class SubModel(BaseModel):
    v1: str = "default"
    v2: bytes = b"hello"
    v3: int


class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_prefix="MYTEST_",
        env_nested_delimiter="__",
        cli_parse_args=True,
        nested_model_default_partial_update=True,
        alias_generator=AliasGenerator(lambda s: s.replace('_', '-'))
    )

    v0: str = "ok"
    sub_model: SubModel = SubModel(v1="top default", v3=33)


class Settings1(BaseSettings):
    model_config = SettingsConfigDict(
        env_prefix="MYTEST_",
        env_nested_delimiter="__",
        cli_parse_args=True,
        nested_model_default_partial_update=True
    )

    v0: str = "ok"
    sub_model: SubModel = SubModel(v1="top default", v3=33)


class Settings2(BaseSettings):
    model_config = SettingsConfigDict(
        env_prefix="MYTEST_",
        env_nested_delimiter="__",
        cli_parse_args=True,
        extra="ignore",
        nested_model_default_partial_update=True,
        alias_generator=AliasGenerator(validation_alias=lambda field_name: AliasChoices(field_name, field_name.replace("_", "-")))
    )

    v0: str = "ok"
    sub_model: SubModel = SubModel(v1="top default", v3=33)


def help(cls: BaseSettings):
    sys.argv = [
        cls.__name__,
        "-h"
    ]
    try:
        cls()
    except SystemExit:
        print("sys.exit() was called and handled.")
    print("After exit")

def test_cli(cls: BaseSettings, dash: bool, disable_cli: bool = False):
    sys.argv = [
        'setting',
        *(["--sub-model.v1=cli" if dash else "--sub_model.v1=cli"] if not disable_cli else [])
    ]
    print("argv:", sys.argv)
    # for k, v in os.environ.items():
    #     if k.startswith("MYTEST_"):
    #         print(k, v)
    try:
        a = cls()
        print(f"in test cli: {cls.__name__}()=", a)
    except Exception:
        traceback.print_exc()

def main():
    print("pydnatic version:", pydantic.version.VERSION)
    print("pydantic_settings version:", version.VERSION)

    help(Settings)
    help(Settings1)
    help(Settings2)

    # test_cli(Settings, dash=True)
    os.environ["V0"] = "env no prefix"
    os.environ["SUB_MODEL_V1"] = "env no prefix"
    os.environ["SUB_MODEL_V2"] = "env no prefix"
    os.environ["SUB_MODEL__V1"] = "env__no__prefix"
    os.environ["SUB_MODEL__V2"] = "env__no__prefix"
    os.environ["MYTEST_V0"] = "env with prefix"
    os.environ["MYTEST_SUB_MODEL__V1"] = "env with prefix"
    os.environ["MYTEST_SUB_MODEL__V2"] = "env with prefix"
    test_cli(Settings1, dash=False)
    test_cli(Settings, dash=True)
    test_cli(Settings2, dash=False)
    test_cli(Settings2, dash=True)
    test_cli(Settings2, dash=False, disable_cli=True)


if __name__ == "__main__":
    main()

output:

pydnatic version: 2.9.2
pydantic_settings version: 2.6.1
usage: Settings [-h] [--v0 str] [--sub-model JSON] [--sub-model.v1 str] [--sub-model.v2 bytes] [--sub-model.v3 int]

options:
  -h, --help            show this help message and exit
  --v0 str              (default: ok)

sub-model options:
  --sub-model JSON      set sub-model from JSON string
  --sub-model.v1 str    (default: top default)
  --sub-model.v2 bytes  (default: b'hello')
  --sub-model.v3 int    (default: 33)
sys.exit() was called and handled.
After exit
usage: Settings1 [-h] [--v0 str] [--sub_model JSON] [--sub_model.v1 str] [--sub_model.v2 bytes] [--sub_model.v3 int]

options:
  -h, --help            show this help message and exit
  --v0 str              (default: ok)

sub_model options:
  --sub_model JSON      set sub_model from JSON string
  --sub_model.v1 str    (default: top default)
  --sub_model.v2 bytes  (default: b'hello')
  --sub_model.v3 int    (default: 33)
sys.exit() was called and handled.
After exit
usage: Settings2 [-h] [--v0 str] [--sub_model JSON] [--sub_model.v1 str] [--sub_model.v2 bytes] [--sub_model.v3 int]

options:
  -h, --help            show this help message and exit
  --v0 str              (default: ok)

sub_model options:
  --sub_model JSON, --sub-model JSON
                        set sub_model from JSON string
  --sub_model.v1 str, --sub-model.v1 str
                        (default: top default)
  --sub_model.v2 bytes, --sub-model.v2 bytes
                        (default: b'hello')
  --sub_model.v3 int, --sub-model.v3 int
                        (default: 33)

argv: ['setting', '--sub_model.v1=cli']
in test cli: Settings1()= v0='env with prefix' sub_model=SubModel(v1='cli', v2=b'env with prefix', v3=33)
argv: ['setting', '--sub-model.v1=cli']
in test cli: Settings()= v0='env no prefix' sub_model=SubModel(v1='cli', v2=b'hello', v3=33)
argv: ['setting', '--sub_model.v1=cli']
in test cli: Settings2()= v0='env no prefix' sub_model=SubModel(v1='top default', v2=b'hello', v3=33)
argv: ['setting', '--sub-model.v1=cli']
in test cli: Settings2()= v0='env no prefix' sub_model=SubModel(v1='top default', v2=b'hello', v3=33)
argv: ['setting']
in test cli: Settings2()= v0='env no prefix' sub_model=SubModel(v1='top default', v2=b'hello', v3=33)

So the Settings, use alias_generator=AliasGenerator(lambda s: s.replace('_', '-')) which cause the env_prefix has no effect, the sub_model become sub-model so it won't find any sub_model in env.

Settings1, doe not have the alias generator, so the env options works normally.

Settings2, attempt to use alias_generator=AliasGenerator(validation_alias=lambda field_name: AliasChoices(field_name, field_name.replace("_", "-"))) to allow both --sub-model and env_* options to work. But it raise error without extra="ignore":

pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings2
sub-model
  Extra inputs are not permitted [type=extra_forbidden, input_value={'v1': 'env__no__prefix', 'v2': 'env__no__prefix'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/extra_forbidden

with extra="ignore", both '--sub_model.v1=cli' and '--sub-model.v1=cli' has no effect.

@hramezani
Copy link
Member

@kschwab Could you take a look?

@kschwab
Copy link
Contributor

kschwab commented Nov 21, 2024

@braindevices, @hramezani, this is an issue specific to EnvSettingsSource. Typically, environment variables only allow letters and the underscore character, e.g.:

$ export HELLO-WORLD=today # fails
bash: export: 'HELLO-WORLD=today': not a valid identifier
$ export HELLO_WORLD=today # passes

So, in theory the correct solution is Settings2, which would allow for use of --sub-model at the CLI and SUB_MODEL for environment variables. This is identical to #406. Printing what the sources return for Settings2 with and without - at the CLI:

argv: ['Settings2', '--sub_model.v1=cli']
CliSettingsSource {'sub-model': {'v1': 'cli'}}
InitSettingsSource {}
EnvSettingsSource {'v0': 'env no prefix', 'sub-model': {'v1': 'env__no__prefix', 'v2': 'env__no__prefix'}}
DotEnvSettingsSource {}
SecretsSettingsSource {}
DefaultSettingsSource {'sub_model': {'v1': 'top default', 'v2': b'hello', 'v3': 33}}
in test cli: Settings2()= v0='env no prefix' sub_model=SubModel(v1='top default', v2=b'hello', v3=33)

argv: ['Settings2', '--sub-model.v1=cli']
CliSettingsSource {'sub-model': {'v1': 'cli'}}
InitSettingsSource {}
EnvSettingsSource {'v0': 'env no prefix', 'sub-model': {'v1': 'env__no__prefix', 'v2': 'env__no__prefix'}}
DotEnvSettingsSource {}
SecretsSettingsSource {}
DefaultSettingsSource {'sub_model': {'v1': 'top default', 'v2': b'hello', 'v3': 33}}
in test cli: Settings2()= v0='env no prefix' sub_model=SubModel(v1='top default', v2=b'hello', v3=33)

The difference between this issue and #406 is the MYTEST_ prefix which the EnvSettingsSource is missing during resolution. i.e., it resolves to the non-prefixed environment variables, "env no prefix", instead of the prefixed ones, "env with prefix".

@kschwab
Copy link
Contributor

kschwab commented Nov 21, 2024

Also, @hramezani there is still an issue with alias resolution consistency we'll need to address. Actually, I think this has potential to be general issue for any source. In the above example, the {'v1': 'cli'} should be propagated downwards but it doesn't because of mismatched alias resolution (i.e. 'sub-model' vs 'sub_model'), similar to #466 but in EnvSettingsSource.

e.g., printing what CliSettingsSource resolves to before internally passing it to EnvSettingsSource for resolution:

argv: ['Settings2', '--sub_model.v1=cli']
{'sub_model.v1': 'cli'}                        # what CLI settings resolved
CliSettingsSource {'sub-model': {'v1': 'cli'}} # what internal env settings resolved
:
EnvSettingsSource {'sub-model': {'v1': 'env__no__prefix', 'v2': 'env__no__prefix'}, ...}
:
in test cli: Settings2()= v0='env no prefix' sub_model=SubModel(v1='top default', v2=b'hello', v3=33)

argv: ['Settings2', '--sub-model.v1=cli']
{'sub_model.v1': 'cli'}                        # what CLI settings resolved
CliSettingsSource {'sub-model': {'v1': 'cli'}} # what internal env settings resolved
:
EnvSettingsSource {'sub-model': {'v1': 'env__no__prefix', 'v2': 'env__no__prefix'}, ...}
:
in test cli: Settings2()= v0='env no prefix' sub_model=SubModel(v1='top default', v2=b'hello', v3=33)

Because of "sub-model" vs "sub_model" difference it will not propagate correctly. In case of AliasChoices(field_name, field_name.replace("_", "-")), "sub_model" alias has higher precedence to "sub-model", but EnvSettingsSource currently resolves to "sub-model".

@braindevices
Copy link
Author

braindevices commented Nov 22, 2024

@kschwab the order does not even related to AliasChoices, if I make the sub-model has higher priority:

        alias_generator=AliasGenerator(
            validation_alias=lambda field_name: AliasChoices(
                field_name.replace("_", "-"),
                field_name
            )
        )

The final result won't change, because now the DefaultSettingsSource switch to 'sub-model':

argv: ['setting', '--sub_model.v1=cli']
CliSettingsSource: {'sub_model': {'v1': 'cli'}}
InitSettingsSource: {}
EnvSettingsSource: {'v0': 'env no prefix', 'sub_model': {'v1': 'env__no__prefix', 'v2': 'env__no__prefix'}}
DotEnvSettingsSource: {}
SecretsSettingsSource: {}
DefaultSettingsSource: {'sub-model': {'v1': 'top default', 'v2': b'hello', 'v3': 33}}
v0='env no prefix' sub_model=SubModel(v1='top default', v2=b'hello', v3=33)
in test cli: Settings2()= v0='env no prefix' sub_model=SubModel(v1='top default', v2=b'hello', v3=33)
argv: ['setting', '--sub-model.v1=cli']
CliSettingsSource: {'sub_model': {'v1': 'cli'}}
InitSettingsSource: {}
EnvSettingsSource: {'v0': 'env no prefix', 'sub_model': {'v1': 'env__no__prefix', 'v2': 'env__no__prefix'}}
DotEnvSettingsSource: {}
SecretsSettingsSource: {}
DefaultSettingsSource: {'sub-model': {'v1': 'top default', 'v2': b'hello', 'v3': 33}}
v0='env no prefix' sub_model=SubModel(v1='top default', v2=b'hello', v3=33)
in test cli: Settings2()= v0='env no prefix' sub_model=SubModel(v1='top default', v2=b'hello', v3=33)

So the DefaultSettingsSource has the correct priority. But the CliSettingsSource and EnvSettingsSource use the last field_key presented which is kind of reversed priority

@kschwab
Copy link
Contributor

kschwab commented Nov 22, 2024

The final result won't change, because now the DefaultSettingsSource switch to 'sub-model':

Correct. It's an internal issue related to alias resolution consistency. Externally it will be the same result.

the order does not even related to AliasChoices

It's related because internally it impacts how aliases are resolved during merging. It's a general problem, but for built-in sources they should all use the same strategy.

But the CliSettingsSource and EnvSettingsSource use the last field_key presented which is kind of reversed priority

Technically it's just EnvSettingsSource. CliSettingsSource uses same resolution as DefaultSettingsSource, but the final result is handled by EnvSettingsSource, which is why they appear to align.

In any case, they all just need to use the same strategy. It is a separate issue from the MYTEST_ prefix problem.

I'll leave the MYTEST_ prefix for @hramezani to resolve. However, I can pick up the alias resolution issue under a separate PR. I'll try to get to it soon.

@hramezani
Copy link
Member

hramezani commented Nov 26, 2024

Thanks @kschwab and @braindevices for working on this issue.

I'll leave the MYTEST_ prefix for @hramezani to resolve

@kschwab, Could you explain the MYTEST_ prefix issue and provide an MRE? Unfortunately, the description is too long. It would be great if you could help me resolve the issue.

@kschwab
Copy link
Contributor

kschwab commented Nov 26, 2024

Sure @hramezani. Below is MRE with explanation of issue:

class SubModel(BaseModel):
    v1: str = 'model default'

class SettingsPass(BaseSettings):
    model_config = SettingsConfigDict(
        env_nested_delimiter='__',
        nested_model_default_partial_update=True,

        # Alias generator that yields "sub_model" and "sub-model".
        alias_generator=AliasGenerator(
            validation_alias=lambda s: AliasChoices(s, s.replace('_', '-'))
        )
    )

    sub_model: SubModel = SubModel(v1='top default')


class SettingsFail(SettingsPass):
    # Same as SettingsPass, but uses environment prefix "MYTEST_"
    model_config = SettingsConfigDict(env_prefix='MYTEST_')


# Passes when not using a prefixed environment variable.
os.environ['SUB_MODEL__V1'] = 'env without prefix'
assert SettingsPass().model_dump() == {
    'sub_model': {'v1': 'env without prefix'}
}


# Fails when using a prefixed environment variable. It is related to using the
# AliasGenerator in combination with environment prefix.
os.environ['MYTEST_SUB_MODEL__V1'] = 'env with prefix'
assert SettingsFail().model_dump() == {
    'sub_model': {'v1': 'env with prefix'}
}

I suspect it is related to new functionality added in pydantic/pydantic#10424, which now must be accounted for.

Nevermind, it was already supported. So some issue with the above alias generator in combination with environment prefix.

@kschwab
Copy link
Contributor

kschwab commented Nov 26, 2024

Also, @braindevices @hramezani I opened #481 to resolve the separate alias resolution issue.

@hramezani
Copy link
Member

Thanks @kschwab for the fix.

for the SettingsFail model that has env_prefix and alias together, I think they don't work together and we already mentioned it in the doc:

image

@kschwab
Copy link
Contributor

kschwab commented Nov 27, 2024

@hramezani thanks, I forgot about this.

@braindevices this is why the MYTEST_ prefix env vars are ignored on aliased env vars. Otherwise, for non-prefixed alias env vars it will work with cli --opt-with-dash etc. after #481 fix. Is this issue ok to be closed?

@braindevices
Copy link
Author

braindevices commented Nov 28, 2024

@kschwab @hramezani thanks for the heads up. Hmmm, this makes the whole solution a bit messier.

Does this means I need to do something like AliasChoices("MYTEST_" + field_name, field_name, field_name.replace("_", "-")) which will be very messy, and if I do not use prefix, then if have other process using pydantic setting with EnvSettingsSource, there will be a lot of conflict.

Wonder if we need to rethink about how should we handle the environment var names and the - to _ converting in CliSettingsSource.

A. maybe we should put a flag in CliSettingsSource to convert between - and _, in the end the key name should always be XX_YY, then we do not need to use AliasChoices
B. in EnvSettingsSource, we enforce prefix when resolve the field name to environment var name. Why we decide not to apply the prefix to alias?

@hramezani
Copy link
Member

Does this means I need to do something like AliasChoices("MYTEST_" + field_name, field_name, field_name.replace("_", "-")) which will be very messy, and if I do not use prefix, then if have other process using pydantic setting with EnvSettingsSource, there will be a lot of conflict.

I think yes. if you want to have both env_prefix and alias together.

A. maybe we should put a flag in CliSettingsSource to convert between - and _, in the end the key name should always be XX_YY, then we do not need to use AliasChoices

@kschwab do you think this is a good idea? how complex is it?

B. in EnvSettingsSource, we enforce prefix when resolve the field name to environment var name. Why we decide not to apply the prefix to alias?

Don't know what was the reason. I think at some point we should make aliases with env_prefix work but I am not sure it will happen soon.

@kschwab
Copy link
Contributor

kschwab commented Dec 1, 2024

A. maybe we should put a flag in CliSettingsSource to convert between - and _, in the end the key name should always be XX_YY, then we do not need to use AliasChoices

@kschwab do you think this is a good idea? how complex is it?

I don't think this would be too difficult, it's a great suggestion. I think it makes sense to have a flag like this, e.g. cli_kebab_case, which would replace the need for AliasGenerator. It also helps work around issue B as well which is a plus.

I'll work on putting up a PR soon.

@kschwab
Copy link
Contributor

kschwab commented Dec 1, 2024

@braindevices thanks again for the suggestion on option A. That actually came out really clean and is a nice addition. Wish we would have had that in there sooner 👍🏾

Opened #489 for resolution.

@braindevices
Copy link
Author

braindevices commented Dec 1, 2024

@kschwab thanks a lot for the implementation. But here is some question about the PR.

!!! note
CLI kebab case does not apply to subcommand or positional arguments, which must still use aliasing.

  • How does it behave for _ in nested field name? for example sub_model.v_test will it be sub-model.v-test or sub_model.v_test
  • what do you mean does not apply to subcommand? does it means the sub_cmd will still be sub_cmd instead of sub-cmd? or do you means the sub_cmd --sub_arg will not become sub_cmd --sub-arg?
    if it is cannot handle --sub-arg then it is surprising and probably wrong. if just sub_cmd should remains the sub_cmd then it is probably ok, if we can still use alias. The envsettingsource for sub command args seems very complicate and probably already broken in some kind of way? Remember if we use kebab format then people would expect git like sub command git remote-add ... so it will be very awkward that sub_cmd will not change to sub-cmd automatically

I think the test case should include these cases to make sure that the behaviour is defined.

@kschwab
Copy link
Contributor

kschwab commented Dec 2, 2024

How does it behave for _ in nested field name? for example sub_model.v_test will it be sub-model.v-test or sub_model.v_test

It will be --sub-model.v-test.

what do you mean does not apply to subcommand? does it means the sub_cmd will still be sub_cmd instead of sub-cmd?

Correct. Subcommand and positional args would not have kebab applied. All other args would, e.g. sub_cmd --sub-arg.

Remember if we use kebab format then people would expect git like sub command git remote-add ... so it will be very awkward that sub_cmd will not change to sub-cmd automatically

My concern here is that cli_kebab_case forces kebab case for any alias/name (at CLI only), which feels more heavy handed in subcommand and positonal arg cases. However, I agree it's more consistent. Will update.

@kschwab
Copy link
Contributor

kschwab commented Dec 3, 2024

PR has been updated with feedback. cli_kebab_case now applies to all args.

@braindevices
Copy link
Author

@kschwab thanks, it looks perfect now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants