Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow specifying connection URL in the environment #1789

Merged
merged 2 commits into from
Apr 13, 2023

Conversation

raboof
Copy link
Contributor

@raboof raboof commented Dec 6, 2021

Since Omeka just passes through these options to the Doctrine DBAL it
didn't seem elegant to give Omeka 'knowledge' of the individual fields.
Instead 'just' allowing to set the URL is sufficient, since fields from
the URL will overwrite the individual settings.

https://www.doctrine-project.org/projects/doctrine-dbal/en/latest/reference/configuration.html#getting-a-connection

Fixes #1765

@raboof raboof force-pushed the allow-setting-db-via-environment branch from f829497 to 045dd58 Compare December 7, 2021 08:05
@zerocrates
Copy link
Member

My only concern here is it seems that the URL specified would be able to override the engine, which feels like a pretty forseeable trouble point.

@raboof
Copy link
Contributor Author

raboof commented Dec 17, 2021

it seems that the URL specified would be able to override the engine

That sounds correct to me, yes

which feels like a pretty forseeable trouble point.

What kind of trouble do you expect?

Are you thinking of security concerns? My impression is if an attacker can write to the environment they can pretty much do anything they like already (via influencing other more low-level variables like LD_PRELOAD or the PATH - I'm sure there are more vectors), so this doesn't seem to make things worse.

From a usability standpoint, of course if people select unsupported engines they are unlikely to work, but that seems expected :).

@zerocrates
Copy link
Member

My concern wasn't so much for security but just support, as a pretty accessible footgun potentially. It's not necessarily a big problem just because only a certain subset of users will use this feature. Charset would be a bigger issue as it's easy to forget and leaving it to a bad default can cause invisible but insidious problems, but since we can provide a default for that it should be fine.

This probably won't go in for 3.2, our upcoming release, but could go into the next one.

@raboof
Copy link
Contributor Author

raboof commented Dec 20, 2022

This probably won't go in for 3.2, our upcoming release, but could go into the next one.

Might be a candidate for 4.0.0?

@zerocrates
Copy link
Member

We had a lot going on for 4.0.0 and, as you've seen, this didn't make it. But I haven't forgotten about it and we should be able to get it in for 4.1.0.

@raboof
Copy link
Contributor Author

raboof commented Apr 7, 2023

We had a lot going on for 4.0.0 and, as you've seen, this didn't make it.

No worries, congrats on the release! Happily running it in prod (with this patch ;) )

But I haven't forgotten about it and we should be able to get it in for 4.1.0.

Would be great!

@zerocrates
Copy link
Member

A quick note: have you tried this out?

A typical way users set variables (Apache SetEnv, passing through PHP-FPM) doesn't get picked up by the getenv here, presumably because of the second "local only" argument being used.

@raboof
Copy link
Contributor Author

raboof commented Apr 7, 2023

A quick note: have you tried this out?

Yes ;)

A typical way users set variables (Apache SetEnv, passing through PHP-FPM) doesn't get picked up by the getenv here, presumably because of the second "local only" argument being used.

Ah, that could be - I'm setting it as an environment variable when starting a docker container, where the docker startup script starts PHP-FPM and nginx. I'm not sure if there's any security implications from setting local_only to false - the docs are a bit vague :/ - if you know it's safe I'm happy to flip it, otherwise let's do some research.

@zerocrates
Copy link
Member

I tried to dig into that a little too, since as you say the docs are a little vague.

I'm pretty sure I have it figured out: really this specifically comes from a problem with the env var HTTP_PROXY: it's a quasi-standard for setting a proxy to use for outgoing HTTP connections, but it's also the variable that an incoming Proxy: header from an HTTP request will get because of how CGI handles headers.

getenv normally checks first for variables coming from the SAPI (fpm, Apache, etc.) and then calls the OS getenv only if there was no result from the SAPI, and this is problematic for the HTTP_ variables in particular since it will end up trusting the value from a request over one from the process environment.

In our case... I don't think there's an issue using getenv without the second argument: this variable isn't derived from a header so there's no risk there of the type that lead to this functionality existing in the first place. There's a pattern of doing the "local only" check, then the regular getenv... but all that does is reverse which source "wins" if the variable is specified differently in both places.

My understanding of the distinction here is that the way you have it now will not work if you try to set the var using SetEnv in Apache (or fastcgi_param in the nginx config), but will work if it's set in the worker process's environment (so, if it's in the FPM server's startup environment and the FPM clear_env setting is off, or it's explicitly set with the env FPM setting).

One-arg getenv will respect the SetEnv/fastcgi_param value, or one in the worker process env, and it will prefer the former over the latter. I don't see any reason to be upset or concerned with that order of things so I think one-arg getenv is fine here.

@raboof raboof force-pushed the allow-setting-db-via-environment branch from 045dd58 to f9c7c88 Compare April 8, 2023 09:27
@raboof
Copy link
Contributor Author

raboof commented Apr 8, 2023

it's a quasi-standard for setting a proxy to use for outgoing HTTP connections, but it's also the variable that an incoming Proxy: header from an HTTP request will get because of how CGI handles headers.

😆

I don't think there's an issue using getenv without the second argument: this variable isn't derived from a header so there's no risk there of the type that lead to this functionality existing in the first place.

Agreed - as OMEKA_CONNECTION_URL doesn't start with HTTP_ there's no risk of allowing an attacker to trick us into connecting to a different URL.

My understanding of the distinction here is that the way you have it now will not work if you try to set the var using SetEnv in Apache (or fastcgi_param in the nginx config), but will work if it's set in the worker process's environment (so, if it's in the FPM server's startup environment and the FPM clear_env setting is off, or it's explicitly set with the env FPM setting).

That matches what I see, yes

One-arg getenv will respect the SetEnv/fastcgi_param value, or one in the worker process env, and it will prefer the former over the latter. I don't see any reason to be upset or concerned with that order of things so I think one-arg getenv is fine here.

Agreed!

@zerocrates
Copy link
Member

OK, I've poked around with this and I'm pretty happy with it. The ability for it to work with the database.ini absent, and correctly throw the error if both the URL and the file are absent, seems to work well.

I think on my prior concern about this setting the engine... it's already an "advanced" feature so I'm thinking we're probably okay not guarding specifically against people setting an engine that just won't work.

One thing did occur to me: let's go ahead and change the env var name here to OMEKA_DB_CONNECTION_URL, just so we're clear on it being the DB and not any other "connection" we might make.

raboof added 2 commits April 12, 2023 09:07
Since Omeka just passes through these options to the Doctrine DBAL it
didn't seem elegant to give Omeka 'knowledge' of the individual fields.
Instead 'just' allowing to set the URL is sufficient, since fields from
the URL will overwrite the individual settings.

https://www.doctrine-project.org/projects/doctrine-dbal/en/latest/reference/configuration.html#getting-a-connection

Fixes omeka#1765
@raboof raboof force-pushed the allow-setting-db-via-environment branch from f9c7c88 to 3e005ea Compare April 12, 2023 07:08
@raboof
Copy link
Contributor Author

raboof commented Apr 12, 2023

let's go ahead and change the env var name here to OMEKA_DB_CONNECTION_URL, just so we're clear on it being the DB and not any other "connection" we might make.

Agreed, updated & re-tested

@zerocrates zerocrates merged commit 6b0967c into omeka:develop Apr 13, 2023
@zerocrates
Copy link
Member

OK, we're good to go.

So I'd expect this to be out (eventually) in 4.1.0.

@raboof
Copy link
Contributor Author

raboof commented Apr 14, 2023

Great, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

allow reading database credentials from environment
2 participants