Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: replace glob-to-regexp with URLPattern #392

Merged
merged 7 commits into from
Jul 26, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion node/config.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -198,7 +198,7 @@ test('Loads function paths from the in-source `config` function', async () => {
expect(routes[2]).toEqual({ function: 'framework-func1', pattern: '^/framework-func1/?$' })
expect(routes[3]).toEqual({ function: 'user-func1', pattern: '^/user-func1/?$' })
expect(routes[4]).toEqual({ function: 'user-func3', pattern: '^/user-func3/?$' })
expect(routes[5]).toEqual({ function: 'user-func5', pattern: '^/user-func5/.*/?$' })
expect(routes[5]).toEqual({ function: 'user-func5', pattern: '^/user-func5(?:/(.*))/?$' })

expect(postCacheRoutes.length).toBe(1)
expect(postCacheRoutes[0]).toEqual({ function: 'user-func4', pattern: '^/user-func4/?$' })
Expand Down
18 changes: 9 additions & 9 deletions node/manifest.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ test('Generates a manifest with display names', () => {
}
const manifest = generateManifest({ bundles: [], declarations, functions, internalFunctionConfig })

const expectedRoutes = [{ function: 'func-1', pattern: '^/f1/.*/?$' }]
const expectedRoutes = [{ function: 'func-1', pattern: '^/f1(?:/(.*))/?$' }]
expect(manifest.function_config).toEqual({
'func-1': { name: 'Display Name' },
})
Expand All @@ -65,7 +65,7 @@ test('Generates a manifest with a generator field', () => {
}
const manifest = generateManifest({ bundles: [], declarations, functions, internalFunctionConfig })

const expectedRoutes = [{ function: 'func-1', pattern: '^/f1/.*/?$' }]
const expectedRoutes = [{ function: 'func-1', pattern: '^/f1(?:/(.*))/?$' }]
const expectedFunctionConfig = { 'func-1': { generator: '@netlify/[email protected]' } }
expect(manifest.routes).toEqual(expectedRoutes)
expect(manifest.function_config).toEqual(expectedFunctionConfig)
Expand All @@ -79,13 +79,13 @@ test('Generates a manifest with excluded paths and patterns', () => {
]
const declarations: Declaration[] = [
{ function: 'func-1', path: '/f1/*', excludedPath: '/f1/exclude' },
{ function: 'func-2', pattern: '^/f2/.*/?$', excludedPattern: '^/f2/exclude$' },
{ function: 'func-2', pattern: '^/f2(?:/(.*))/?$', excludedPattern: '^/f2/exclude$' },
{ function: 'func-3', path: '/*', excludedPath: '/**/*.html' },
]
const manifest = generateManifest({ bundles: [], declarations, functions })
const expectedRoutes = [
{ function: 'func-1', pattern: '^/f1/.*/?$' },
{ function: 'func-2', pattern: '^/f2/.*/?$' },
{ function: 'func-1', pattern: '^/f1(?:/(.*))/?$' },
{ function: 'func-2', pattern: '^/f2(?:/(.*))/?$' },
{ function: 'func-3', pattern: '^/.*/?$' },
]

Expand Down Expand Up @@ -127,7 +127,7 @@ test('Filters out internal in-source configurations in user created functions',
]
const declarations: Declaration[] = [
{ function: 'func-1', path: '/f1/*' },
{ function: 'func-2', pattern: '^/f2/.*/?$' },
{ function: 'func-2', pattern: '^/f2(?:/(.*))/?$' },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked the impact of changing from /.* to (?:/(.*)) with this go benchmark:

import (
	"regexp"
	"testing"
)

func BenchmarkCompileGroup(b *testing.B) {
	for n := 0; n < b.N; n++ {
		regexp.Compile("^/f2(?:/(.*))/?$")
	}
}

func BenchmarkCompile(b *testing.B) {
	for n := 0; n < b.N; n++ {
		regexp.Compile("^/f2/.*/?$")
	}
}

func BenchmarkExecuteGroup(b *testing.B) {
	r, _ := regexp.Compile("^/f2(?:/(.*))/?$")
	for n := 0; n < b.N; n++ {
		r.MatchString("/f2/test/")
		r.MatchString("/different")
	}
}

func BenchmarkExecute(b *testing.B) {
	r, _ := regexp.Compile("^/f2/.*/?$")
	for n := 0; n < b.N; n++ {
		r.MatchString("/f2/test/")
		r.MatchString("/different")
	}
}

It's naïve, so i'm not sure how much we can trust it, but it shows that the group has a slight negative impact on performance (while having the same semantics, I think):

=== RUN   BenchmarkCompileGroup
BenchmarkCompileGroup
BenchmarkCompileGroup-10          461311              2570 ns/op         5000 B/op          61 allocs/op
=== RUN   BenchmarkCompile
BenchmarkCompile
BenchmarkCompile-10               542733              2152 ns/op         4240 B/op          53 allocs/op
=== RUN   BenchmarkExecuteGroup
BenchmarkExecuteGroup
BenchmarkExecuteGroup-10         7894125               152.6 ns/op          0 B/op           0 allocs/op
=== RUN   BenchmarkExecute
BenchmarkExecute
BenchmarkExecute-10              8257365               145.7 ns/op          0 B/op           0 allocs/op
PASS

This is a ~17% hit in compile time and ~5% in matching time.
Is there an easy way for us to prevent this change from adding capture groups to the output?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could remove capturing groups ourselves, since we're parsing the regular expression into an AST and transforming it. But I don't think that's a huge priority.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay 👍 Maybe we can capture this in an issue, so we don't forget about it.

]
const userFunctionConfig: Record<string, FunctionConfig> = {
'func-1': {
Expand Down Expand Up @@ -178,7 +178,7 @@ test('Includes failure modes in manifest', () => {
]
const declarations: Declaration[] = [
{ function: 'func-1', path: '/f1/*' },
{ function: 'func-2', pattern: '^/f2/.*/?$' },
{ function: 'func-2', pattern: '^/f2(?:/(.*))/?$' },
]
const userFunctionConfig: Record<string, FunctionConfig> = {
'func-1': {
Expand Down Expand Up @@ -288,8 +288,8 @@ test('Generates a manifest with layers', () => {
{ function: 'func-2', path: '/f2/*' },
]
const expectedRoutes = [
{ function: 'func-1', pattern: '^/f1/.*/?$' },
{ function: 'func-2', pattern: '^/f2/.*/?$' },
{ function: 'func-1', pattern: '^/f1(?:/(.*))/?$' },
{ function: 'func-2', pattern: '^/f2(?:/(.*))/?$' },
]
const layers = [
{
Expand Down
13 changes: 7 additions & 6 deletions node/manifest.ts
Original file line number Diff line number Diff line change
@@ -1,8 +1,6 @@
import { promises as fs } from 'fs'
import { join } from 'path'

import globToRegExp from 'glob-to-regexp'

import type { Bundle } from './bundle.js'
import { Cache, FunctionConfig, Path } from './config.js'
import { Declaration, parsePattern } from './declaration.js'
Expand All @@ -11,6 +9,7 @@ import { FeatureFlags } from './feature_flags.js'
import { Layer } from './layer.js'
import { getPackageVersion } from './package_json.js'
import { nonNullable } from './utils/non_nullable.js'
import { ExtendedURLPattern } from './utils/urlpattern.js'

interface Route {
function: string
Expand Down Expand Up @@ -166,14 +165,16 @@ const generateManifest = ({
}

const pathToRegularExpression = (path: string) => {
// We use the global flag so that `globToRegExp` will not wrap the expression
// with `^` and `$`. We'll do that ourselves.
const regularExpression = globToRegExp(path, { flags: 'g' })
const pattern = new ExtendedURLPattern({ pathname: path })

// Removing the `^` and `$` delimiters because we'll need to modify what's
// between them.
const source = pattern.regexp.pathname.source.slice(1, -1)

// Wrapping the expression source with `^` and `$`. Also, adding an optional
// trailing slash, so that a declaration of `path: "/foo"` matches requests
// for both `/foo` and `/foo/`.
const normalizedSource = `^${regularExpression.source}\\/?$`
const normalizedSource = `^${source}\\/?$`

return normalizedSource
}
Expand Down
7 changes: 7 additions & 0 deletions node/utils/urlpattern.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
import { URLPattern } from 'urlpattern-polyfill'

export class ExtendedURLPattern extends URLPattern {
// @ts-expect-error Internal property that the underlying class is using but
// not exposing.
regexp: Record<string, RegExp>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good idea. This will most probably not be available once this is available in Node.js natively. Also looking at the docs of how URLPattern is supposed to be used, I do not think it matches our usecase of converting a glob to a regex.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mdn docs mention this library: https://github.com/pillarjs/path-to-regexp maybe we should use this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will most probably not be available once this is available in Node.js natively.

I know that this property will not be exposed by the native URLPattern because it's not part of the spec, but that doesn't mean we can't continue to use the polyfill with its comprehensive test suite even after the primitive makes its way into Node natively.

Also looking at the docs of how URLPattern is supposed to be used, I do not think it matches our usecase of converting a glob to a regex.

The native URLPattern does not emit regular expressions, it just uses them internally. That's why in this PR I'm exposing the internal property where the regex is stored. The way we're using it is definitely not how it's built to be used; we're using it opportunistically to leverage its spec-backed syntax but emit a different output.

The mdn docs mention this library: https://github.com/pillarjs/path-to-regexp maybe we should use this?

That is a user-defined library though, not a standard. The whole point of this is to say that the path expressions you write when defining edge functions use the URLPattern spec from the web platform, which you can experiment with equally in Deno, Node, or the browser.

Using path-to-regexp wouldn't bring any benefits over the library we use at the moment, since they're both defined in user space.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, up to you. I personally feel this is a terrible idea to rely on the internals and implementation details of other packages.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate that, but can you help find a better idea?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I share the sentiment of this feeling "bad", but I think it's an example of us knowing the rules enough that we're allowed to break them. Since we're using a polyfill, the underlying implementation won't depend on the runtime, and since we have tests we'll notice when a change in the polyfill breaks this. I think it's fine - and if we find problems, as a last resort we can still reimplement the spec ourselves.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern here is that we're opening ourselves to the possibility of the internals of the polyfill changing, as it's not a public API. At the very least we should pin an exact version, but personally I'd be more inclined to say we should vendor the polyfill and add the subclass to our fork.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least we should pin an exact version

Totally. I actually meant to do this but ended up forgetting about it. Done in b308434.

My concern here is that we're opening ourselves to the possibility of the internals of the polyfill changing, as it's not a public API

If the internals change and the property we're using disappears or gets renamed, our own tests would catch it.

I'm not opposed to using our own fork if this makes people more comfortable — in fact, we do already have a fork, which I created in order to push a change upstream.

But if we go down that path, I would rather keep it on its own repo from where we can easily merge upstream changes, run the tests, etc. Not sure what we get by vendoring it in with Edge Bundler?

}
35 changes: 11 additions & 24 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 1 addition & 2 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,6 @@
"@commitlint/cli": "^17.0.0",
"@commitlint/config-conventional": "^17.0.0",
"@netlify/eslint-config-node": "^7.0.1",
"@types/glob-to-regexp": "^0.4.1",
"@types/node": "^14.18.32",
"@types/semver": "^7.3.9",
"@types/uuid": "^9.0.0",
Expand Down Expand Up @@ -83,7 +82,6 @@
"execa": "^6.0.0",
"find-up": "^6.3.0",
"get-port": "^6.1.2",
"glob-to-regexp": "^0.4.1",
"is-path-inside": "^4.0.0",
"jsonc-parser": "^3.2.0",
"node-fetch": "^3.1.1",
Expand All @@ -94,6 +92,7 @@
"regexp-tree": "^0.1.24",
"semver": "^7.3.5",
"tmp-promise": "^3.0.3",
"urlpattern-polyfill": "^8.0.2",
"uuid": "^9.0.0"
}
}