Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

asyncssh API change breaks parsing of config #54

Open
martijnwgnr opened this issue Dec 17, 2024 · 10 comments
Open

asyncssh API change breaks parsing of config #54

martijnwgnr opened this issue Dec 17, 2024 · 10 comments

Comments

@martijnwgnr
Copy link

martijnwgnr commented Dec 17, 2024

The v2.19.0 release of asyncssh has added support for hostname canonicalization. With this change, two new arguments were added to SSHClientConfig.load: canonical and final. We should also pass these parameters in sshfs, because right now the local_user and user are passed in their places, which breaks parsing the config in config.py:

sshfs/sshfs/config.py

Lines 23 to 31 in 34b52b2

return SSHClientConfig.load(
last_config,
config_files,
reload,
local_user,
user,
host,
port,
)

https://github.com/ronf/asyncssh/blob/95756fa4eb54826adb9156b87443d945fd1ed4f8/asyncssh/config.py#L428-L430

@shcheklein
Copy link
Collaborator

@ronf can we by chance make these new arguments into kwargs for example of find some other way to keep it backward compatible? I can bump the minimum required version, but if it's an easy change on your end would prefer to keep it less strict (it's important in some environments, e.g. it takes time for new packages to become available).

@ronf
Copy link

ronf commented Dec 18, 2024

SSHClientConfig is not a public API in AsyncSSH. It's used internally by AsyncSSH outside of the config module but it's not one of the symbols exposed in asyncssh/__init__.py. Importing from anything not present in the top-level asyncssh namespace could lead to the kind of issue you're seeing here.

If you want to preload some configuration options, the preferred approach is to use "options" rather than "config". If you do use config, the only supported mechanism is to pass in a list of path names of config files to load (which is handled as part of setting "options").

If I were to make these keyword args with default values, the new functionality of "match canonical" and "match final" would be broken as a result of not setting these values correctly. Calls to this "load" function really need to be coming from asyncssh/connection.py, which has enough information to set those new arguments properly.

Do you know why sshfs felt the need for this wrapper? How is it used?

@shcheklein
Copy link
Collaborator

thanks, @ronf

the preferred approach is to use "options" rather than "config".

can you point me to the class / code please?

Do you know why sshfs felt the need for this wrapper? How is it used?

I think it's an utility function that was needed for DVC (to preload some config indeed)

https://github.com/iterative/dvc-ssh/blob/main/dvc_ssh/__init__.py#L44

@ronf
Copy link

ronf commented Dec 19, 2024

From what I can see, that _prepare_credentials method is never called. It seems to be trying to pull some fields from its own keyword args but then pull other args from a user config file, but translating those into its own names for things rather than letting AsyncSSH read the user config directly. Unfortunately, using AsyncSSH's config parser outside of its options framework isn't supported, as the AsyncSSH options and config classes have a lot of tight interactions.

Generally speaking, the expectation is that you'll pass AsyncSSH a config="filename" argument on a top-level call like asyncssh.connect() if you want to read an OpenSSH config file, or it will default to reading from ~/.ssh/config if you don't specify something else or pass in None to disable that. Loading options in advance is possible but not recommended in the case of OpenSSH config files, as the values they set can change depending on things like the host, port, username, etc. in a request. So, the config file needs to be parsed each time to return the proper results.

I think it would be possible to take the existing keyword args in the DVC code and convert those to AsyncSSH "options" arguments (see the SSHClientConnectionOptions class), and allow one of those arguments for the user config to be passed in as a config argument (or just let AsyncSSH use the default) if you wanted to allow some values to come from an OpenSSH config file when they weren't already specified as options directly. You'd then pass the resulting SSHClientConnectionOptions instance as an options argument to a call like asynsssh.connect().

In OpenSSH 2.15.0 and later, there's a class method called construct() on SSHClientConnectionOptions which is the recommended way to construct the options object. This makes sure to do all the evaluation in an executor thread, to avoid blocking the asyncio event loop when evaluating some of the options.

You could also convert the keyword args to a new dictionary of AsyncSSH keyword args and just pass those in as keyword args to asyncssh.connect() directly, without constructing an SSHClientConnectionOptions object yourself. That's by far the simplest and cleanest of the approaches, I think. In some cases, some of the keywords might need to be renamed to match the AsyncSSH naming if you want to maintain the existing names DVC is using, but that should be straightforward.

@99-NinetyNine
Copy link

  1. "asyncssh<=2.18.0",
    was one solution as in pull: ssh remote fails, error in asyncssh 2.19.0, missing arguments iterative/dvc#10656
    this approach ensures backward compatibility.

  2. But for more patches or other reasons, if up to date async ssh is to be used, then I suggest changing the parse method directly. As I have done.

  3. Further, This code in https://github.com/fsspec/sshfs/blob/main/setup.cfg was updated 2 years ago. So I think eventually, any new pip install fails. But only previous installations keeps working since they had downloaded the earlier versions of async ssh.
    image

Conclusion:

  • So, checking version of async ssh in the code and calling method accordingly if we sepcify versions (<= or >=) manner.
  • Or, update config parse.py to a exact asyncssh version. (== in setup). I prefer this approach.

@99-NinetyNine
Copy link

I have provided the PR. Haha, test case is to be updated. The string is returned for config.get("ProxyCommand") but is compared against a list by using split(). After this I think it should work, :)

@99-NinetyNine
Copy link

image
that failed test is cause I think.. I changed test files :) lol. Anyways, it is clear now.
split() is causing issue.

@ronf
Copy link

ronf commented Jan 9, 2025

I strongly recommend against pinning a specific version of AsyncSSH here. That would mean that anyone using this package would not be able to pick up any security vulnerability fixes, or other bug fixes.

@shcheklein
Copy link
Collaborator

@ronf thanks for the detailed response.

I also don't see an immediate need (see some questions / caveats below) to DVC to try to build an effective config itself. The logic seems to go back many years ago to this ticket:

iterative/dvc#1613

and this PR:

iterative/dvc#1965

It was using back then paramiko to parse the config and do the connection. I wonder if paramiko didn't have a way to override values in connect / init - e.g. I want to pass my own (from DVC config) port number, or additional key file, etc. Thus we were parsing user ssh config and creating our own dict.

Now, we might not need all of this.


A few caveats that I can immediately see:

  1. We show user, hostname, port when we ask for a password / passphrase in prepare credentials. That won't be as precise at the very least (we can't show a proper hostname w/o parsing the config I guess). Can be annoying when you don't see proper, effective user / hostname when you asked for a password / passphrase.

  2. username falls back to the getpass.getuser() value in:

        login_info["username"] = (
            config.get("user")
            or config.get("username")
            or user_ssh_config.get("User")
            or getpass.getuser()
        )

will assyncssh do the same in connect?

  1. login_info["proxy_command"] = user_ssh_config.get("ProxyCommand") - just not sure why we have a separate line at all for this. I assume it should be also parsed automatically by connect? will it?

  2. Key files:

    if config.get("keyfile"):
           raw_keys.append(config.get("keyfile"))
    elif user_ssh_config.get("IdentityFile"):
           raw_keys.extend(user_ssh_config.get("IdentityFile"))

Can we pass an additional key file to asyncssh? Not override completely, but append to the ones in the SSH config?

@ronf if you have quick suggestions / thoughts on these items ^^ I would appreciate any help.

@ronf
Copy link

ronf commented Jan 22, 2025

Starting with the easiest, the username is gotten from one of the following (in order of precedence):

  • A 'username' keyword arg in the connect() call
  • A 'username' keyword arg in an SSHClientConnectionOptions
  • A 'User' config setting in an SSH config file
  • The local user, queried by getpass.getuser()

Really, there's a value local_username is always gotten from getpass.getuser() first and that is available as one of the fields you can match on in a config file, but this value will also become the 'User' value when parsing the config if it is not explicitly set. However, if a username is set directly in connect() or via options, any User value in config files is ignored.

As for ProxyCommand, it shouldn't need to be special-cased. It can come from keyword args, options, or config, with a similar precedence to the above, but with the default being unset. I'm guessing the "login_info" you show here is related to trying to convert from SSH config values to Paramiko values you mentioned above.

As for showing user/host/port, there are really two different sets of values here. There's the requested user, host, and port and there's the rewritten version of those after parsing config files, if you decide to support those. If your application passes values to connect() via keyword arguments or options, that will take precedence over any config files, which may or may not be what you want.

As for keys, you can have multiple IdentityFile lines in a config file that can be conditionally matched, building up a set of keys to load for all the config lines which match. If you specify keys via keyword arguments or options, though, you have to specify the complete list (doing any conditional logic yourself to build that list).

Note that you can load multiple config files. However, there isn't an easy way right now to state that you want to load the system default config if it exists and then some config file of your own (or vice-versa in the ordering). You'd have to check yourself to see if the system config file existed and then specify the complete list of files to load, being careful to pick the order, as most config settings will lock on the first config rule which matches, ignoring later lines that match. IdentityFile is an exception in that regard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants