function of validateRegex in repoproviders #1920

minrk · 2025-01-29T13:04:56Z

repoproviders have a validateRegex field, which can apparently be used in the frontend to validate specs. This means specs are validated twice: once in the frontend for rendering purposes, and a second time with a different implementation for actual build errors. But since these implementations are not shared, different invalid specs may present differently.

As it is, the result is some inconsistency in handling different invalid specs:

https://mybinder.org/v2/nosuch/1234 renders a 404 page for no matching provider (though the http status is actually 200)
https://mybinder.org/v2/gh/org/ renders a generic 404 page for the invalid gh spec, though no information about the invalid spec is presented
https://mybinder.org/v2/gh/org/repo// passes validateRegex, but doesn't pass the more correct server-side spec validation, so the error about the invalid spec is presented in the build log (pre-new UI behavior):

Spec is not of the form "user/repo/ref", provided: "org/repo". Did you mean "org/repo/HEAD"?

As far as I can tell, validateRegex is used exclusively for react page routing. I think the error messages from server-side resolving is clearer, so I propose removing validateRegex, and replace the per-provider ?<buildSpec> regex here with ?<buildSpec>.+? I think that would result in more consistent error handling and clearer error messages, because there would be only one source of errors.

Does that make sense @yuvipanda ? Is there another use for the duplicate validateRegex I'm missing?

The text was updated successfully, but these errors were encountered:

minrk · 2025-01-29T13:28:05Z

the validation for gists appears to be missing a ? to make the ref group optional, too. I can work on fixing the issues in the validateRegexes if they should stay, but I'll wait since I think it probably makes more sense to remove the field altogether, if my understanding so far is correct (it may not be!).

yuvipanda · 2025-01-29T16:15:37Z

The original goal for putting them in the API and doing this in the frontend was to support URL parsing in other frontends, such as jupyterhub-fancy-profiles so they can have compatible URLs. I also want to explore a potentially different model for federation where the frontend makes routing choices, both for better resilience and the ability to credit federation members better. So the frontend app would be a separate app that simply talks to binderhub almost on API only mode.

This is how I had actually written it, and only had to add the provider specific handlers back to support opengraph cards and couldn't see a way to do it without that :(

I also felt that one of the reasons the frontend got complicated last time was because it got a little more entangled than I'd like, between the frontend and the backend.

So, I agree that we should try to consolidate all routing in one place. However, my preference is that we consolidate that all in the frontend.

In the longer run, I want us to do a v3 scheme based on autodetecting identifiers (via https://github.com/yuvipanda/repoproviders, WIP design doc in https://hackmd.io/e-8aqmQgRMGomFiAEJKJuQ). And in that design doc, I've written that avoiding regexes is one of the goals haha.

So here are what I think are possible paths forward:

Fix the validateRegex errors for now, and let it be.
Find a way to get rid of backend routing completely. This is my ideal scenario.
Rip out validateRegex and leave only backend routing. I'm also kinda ok with this, because I want to make more significant changes with the repoproviders and those changes include avoiding regexes.

Happy for you to actually take any of these paths forward @minrk!

minrk · 2025-01-30T10:12:15Z

Thanks for writing down your goals! That makes a lot of sense. I'm not 100% sure I know precisely what you mean by 'routing', but I'm going to take it to mean the URL/path handling, which is done in the React Router, but also the tornado Handler list.

If I've understood correctly, it's actually not the backend routing (splitting on provider id) that I see a problem with, but rather the restrictive spec parsing in validateRegex, which is ~unused.

I see a substantial difference between the two error URLs:

https://mybinder.org/v2/gh/partial/spec (invalid spec)
https://mybinder.org/random/wrong/path

but right now, both are treated the same: HTTP status 200, rendering a 404 page, which is not ideal. Removing validateRegex would result in the first URL presenting a fairly actionable "spec should look like org/repo/ref" error, instead of generic 404.

If we remove validateRegex, then we know the URL pattern is /v2/{provider.id}/{spec} where spec is "the rest" and let the server-side provider deal with it, including reporting on its validity (including resolution, etc.). I think that's consistent with your goal, but I'm not 100% sure. I'm not sure I see the benefit in parsing and validating the input argument to the provider before passing it along, when the main thing it accomplishes so far is to give a less informative error than the provider itself can give.

So my inclination based on what I can see so far is to take option 3 and remove validateRegex, at least for now, so the frontend still needs to understand provider id and which part of the URL is the spec, but not so much the contents of the spec, and the backend is responsible for interpreting the spec, which includes validation, since that's already implemented and can provide informative errors, etc. The routing can still be in the frontend, but for routing purposes, the frontend only understands 'spec' as a single opaque chunk handled as "the rest of the URL" and doesn't need to be parsed by the frontend into its components.

I think the detect regex serves a more clearly useful purpose, and should stick around (though I fully share your v3 goal of moving URL->builder to the server side without any regexes).

minrk · 2025-01-30T10:56:22Z

After writing all that, I'm actually okay with keeping validateRegex if it's used to show informative error messages in the UI. The main thing I don't think we should do is use it for routing, and always use .* for spec in the Router.

I see that there is an error message using validateRegex in LoadingPage, but I'm pretty sure this condition can never be triggered, specifically because LoadingPage is prohibited from handling a page that doesn't match validateRegex. If the only change we make is removing validateRegex from the Router in App.jsx, then the formatError branch is taken.

Here is what a user sees for the 3 cases, visiting the invalid /v2/gh/invalid URL:

current, with validateRegex used in the Router (generic 404):

only remove validateRegex from Router, let frontend validate and present error:

remove validateRegex from the frontend altogether, let backend handle invalid specs:

The backend clearly has the best chance to present a useful error, because it can run arbitrary code to prepare the error message in a provider-specific and mistake-specific way. The frontend can only present either a totally uninformative error, or make understanding regular expressions the user's problem (we could address this with an additional invalidSpecMessage field to present, instead of showing the regex to the user).

WDYT about these two choices, now:

remove validateRegex altogether, leave it to the server-side provider to present errors (i.e. invalid spec is an equivalent error to an unresolvable spec, which makes the most sense to me)
remove validateRegex from routing and add single 'spec.invalidMessage' or 'spec.description' to present when the frontend fails validation, rather than the raw regex?

rgaiacs mentioned this issue Jan 29, 2025

Trailing / in repo URL is not stripped #1919

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

function of validateRegex in repoproviders #1920

function of validateRegex in repoproviders #1920

minrk commented Jan 29, 2025

minrk commented Jan 29, 2025

yuvipanda commented Jan 29, 2025

minrk commented Jan 30, 2025

minrk commented Jan 30, 2025

function of validateRegex in repoproviders #1920

function of validateRegex in repoproviders #1920

Comments

minrk commented Jan 29, 2025

minrk commented Jan 29, 2025

yuvipanda commented Jan 29, 2025

minrk commented Jan 30, 2025

minrk commented Jan 30, 2025