Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parameterized routing & page loaders #1523

Merged
merged 92 commits into from
Aug 29, 2024
Merged

parameterized routing & page loaders #1523

merged 92 commits into from
Aug 29, 2024

Conversation

mbostock
Copy link
Member

@mbostock mbostock commented Jul 17, 2024

This introduces two concepts: page loaders allow server-side rendering of dynamic Markdown pages at build time (like data loaders, but for pages), while parameterized routes allow both page and data loaders to generate multiple outputs with varying parameter values.

Note: after this merges, or if you check out this branch locally, you may need to run git clean -i to remove the previous copies of theme pages.

TODO

  • Parameterized data loader routing
  • Parameter in file name ([param].csv.js)
  • Parameter in folder name ([param]/data.csv.js)
  • Partially-parameterized names? (prefix-[param].csv.js)
  • Names with multiple parameters? ([param1]-[param2].csv.js)
  • Extract parameters from route (--param=foo)
  • Parameterized page routing
  • Parameterized JavaScript components (rewriting observable.params.param)
  • Parameterized JavaScript code blocks (rewriting observable.params.param)
  • Parameterized JavaScript code expressions (rewriting observable.params.param)
  • Respect shadowing when rewriting observable.params.param
  • Allow static string concatenation within FileAttachment calls
  • Parameterized stylesheets (no rewriting; not recommended but should work)
  • Parameterized static files (no rewriting; not recommended but should work)
  • Parameterized archives
  • Don’t allow empty parameter values to match (path/to/.csv)
  • Don’t allow empty parameter names ([].csv.js)
  • Parameter value enumeration from config during build
  • Page loaders
  • Fix sidebar and pager links
  • Fix search indexing
  • Default paths should include page loaders, too (just not parameterized routes) Future work
  • Rename the paths config option to params dynamicPaths
  • Documentation for parameterized routing
  • Documentation for page loaders
  • Tests for parameterized routing
  • Tests for page loaders

Fixes #245.
Fixes #931.

@mbostock
Copy link
Member Author

Precedence:

# literal request
/data/foo.csv

# first look for exact matches for each interpreter
/data/foo.csv.js
/data/foo.csv.py
/data/foo.csv.go

# parameterized data loader
/data/[user].csv.js --user=foo

# look for parameterized parents (1)
/[resource]/foo.csv.js --resource=data

# look for parameterized parents (2)
/[resource]/[user].csv.js --resource=data --user=foo

# then look for parent extractors
/data.zip
/data.zip.js
/data.zip.py
/data.zip.go
/data.tar.gz.js
/data.tar.gz.py
/data.tar.gz.go

# then look for parameterized extractors (3)
/[resource].zip.js --resource=data

Valid vs. invalid parameters:

# valid parameters
/[name]/foo.csv.js
/data/[name].csv.js

# invalid parameters
/foo-[name]/foo.csv.js
/[name]-bar/foo.csv.js
/data/[name]-bar.csv.js

@mbostock
Copy link
Member Author

mbostock commented Jul 25, 2024

Starting to head down the rabbit hole… 🎩 🐇 🤔

So far, this PR implements dynamic routing for data loaders: you can say e.g. FileAttachment("foo.json") and it will be served by [param].json.js with --param=foo. So far so good, but this isn’t especially useful as the parameter value (foo) must be declared statically (since FileAttachment requires a static string as argument, but moreover because we require all files to be statically declared so we know what to build).

It’ll be more useful when we also have dynamic routing for pages, so you can have e.g. /users/mbostock as a page, which is backed by /users/[user].md. But there’s no way to statically reference /users/mbostock/data.json as a relative path that I can think of. So we could instead have a page /users/mbostock/ backed by /users/[user]/index.md, and then a relative (static) reference to ./data.json resolves to /users/mbostock/data.json backed by /users/[user]/data.json.js which works. Yay.

But now imagine that we also have a JavaScript component /users/[user]/widget.js. This too could be imported as ./widget.js from /users/[user]/index.md. Should this be considered a “parameterized” component? Should there be a way to reference the parameter value within the code, such as process.env.USER, that gets baked-into the component when it is generated? Or should this be served as /users/[user]/widget.js and we effectively treat it as a static route within the dynamic route, serving the same content across all parameter values?

For now, I’m tempted to do the latter, since I think adding parameterized aliasing for JavaScript components (and likewise stylesheets) would introduce quite a bit of complexity since we need to track both the serving path (/users/mbostock/widget.js) and the source path (/users/[user]/widget.js) to traverse imports and compute content hashes correctly.

@mythmon
Copy link
Member

mythmon commented Jul 25, 2024

I'm excited to see progress on this feature! I have a question about use cases. For parameterized data loaders I can see a very clear use case. Parameterized pages I'm a little fuzzier about the details but still am on board. Where I struggle is the parameterized JS files.

I'm trying to think of something that I could do with a parameterized JS file that I couldn't do in Framework today. That is to say, JS files already have a natural way of passing in extra parameters, which both pages and data loaders currently lack.

import {widget} from "components/widget.js";
widget(parameter);

I'm imagining that Markdown files will have access to their parameters somehow, and could pass them into JS files as needed. Though re-reading, I see now that you haven't specified any mechanism for that. 🤔

Do you have any specific use cases in mind that could be benefit from parameterized JavaScript components?

@mbostock
Copy link
Member Author

mbostock commented Jul 25, 2024

The purpose of parameterized pages is essentially to enable parameterized data loaders (through relative paths as described in my previous comment). If we enumerate the parameter values in the config, we can build a page for each parameter value; the parameterized page can then reference a parameterized data loader as a relative path.

I don’t have any use case imagined (yet) for parameterized JavaScript components or stylesheets. It’s just something that “falls out” of having parameterized pages. I’m not trying to actively support parameterized JavaScript components, but we’ll have to do something with path rewriting to avoid it if we have parameterized pages.

@mbostock
Copy link
Member Author

As another example, say you have the parameterized page /[user]/index.md which references the file ./foo.txt. If this is a static file, i.e. there exists the file /[user]/foo.txt in the source root, then the contents of this file are the same regardless of the parameter value. So if we have both /a/index.md and /b/index.md, we don’t want to generate copies /a/foo.txt and /b/foo.txt; we want to use a single static /[user]/foo.txt that is shared by both pages. The same goes for a JavaScript component (that can’t be generated by a data loader, at least not currently).

So when we resolve references from parameterized pages, we only want to preserve the parameterized-ness for resources that are generated by data loaders. For other resources we’ll need to rewrite the paths so that they are treated statically.

@mbostock
Copy link
Member Author

mbostock commented Jul 25, 2024

I’ve thought of another challenge of trying to share static resources across parameter values.

If a parameterized page imports a JavaScript component, we want that JavaScript component to be considered static (since the code will be the same across all parameter values and we’re not trying to support parameterized components). However, that JavaScript component could reference a file, and that file could be generated by a data loader which is parameterized. Since JavaScript components live in _import while files live in _file, we need to know the full path in the source root to the file… which means the built JavaScript component can’t be considered static, because it will need to bake-in the parameterized path, e.g., FileAttachment("../../foo/data.txt", import.meta.url) where foo is the parameter value. 🤔

So I think either we should duplicate shared resources across parameterized pages (de-duplicating is just a performance optimization anyway, and can be done manually by moving the resources to a non-parameterized path) — or at least duplicate shared components.

Or maybe we avoid the concept of parameterized pages entirely. But if we don’t have parameterized pages, then how would you practically take advantage of parameterized data loaders? You can’t pass a dynamic argument to FileAttachment. And I think we really want parameterized pages anyway… without them we’d force users to parse query parameters to create the equivalent linkable view, and that feels awkward.

Or maybe we don’t have separate _import and _file roots so that relative paths continue to work? The only reason we make that distinction now is to allow you to have a .js file that’s not transpiled in _file, but that feels pretty rare.

@mbostock
Copy link
Member Author

mbostock commented Jul 25, 2024

I want to try leaning into parameterization.

I think the answer here is that we don’t try to share static assets across parameters; if you have /[user]/foo.png, it’s duplicated to /a/foo.png, /b/foo.png, etc. Maybe in the future when we have server-side rendering, only some of those assets end up being present for some parameter values. And it’s desirable to not leave any evidence as to whether foo.png is generated by a data loader or not. (Again, if you want to avoid duplication, you just move foo.png out of /[user]/ to a non-parameterized path.)

Then, for JavaScript components with parameterized paths, we likewise duplicate the components as needed: /[user]/widget.js becomes /a/widget.js, /b/widget.js, etc. And therefore any relative FileAttachment reference in widget.js will “just work”. This will require some tricky plumbing in our code to distinguish between the source path and serving path of components, but it should be doable.

In addition, we can allow parameterized components and JavaScript code blocks in parameterized pages to refer to parameter values, say as observable.params.user. Referenced parameter values are baked-in to the code when built (in transpileModule, perhaps using esbuild’s define option, in a similar fashion to process.env references in Next.js).

With a bit more magic, this approach could even allow parameter values in arguments to FileAttachment, since parameter values are known statically. This is nice because now we can have /[user].md load the corresponding /[user].json as FileAttachment(`${observable.params.user}.json`), rather than being forced to use /[user]/index.md. It also means you could have /[property]/[user].md load /[user].json which is not (otherwise) statically expressible as a relative path.

@mbostock mbostock changed the title parameterized data loaders parameterized routing Jul 26, 2024
docs/[dir]/index.md Outdated Show resolved Hide resolved
@Fil
Copy link
Contributor

Fil commented Aug 29, 2024

I tried it on pangea and got the following error:

SyntaxError: The requested module 'glob' does not provide an export named 'globSync'

The version of glob that I have in yarn.lock is ancient:

glob@^8.0.3:
  resolved "https://registry.yarnpkg.com/glob/-/glob-8.1.0.tgz#d388f656593ef708ee3e34640fdfb99a9fd1c33e"

as required by @rollup/plugin-commonjs@^25.0.7

Shouldn't glob be a dependency, not a devDependency? Or is it because of the way I link it on pangea?

  "dependencies": {
    "@observablehq/framework": "https://github.com/observablehq/framework#mbostock/parameterized",

Adding glob 11 as a dependency in pangea fixes this, but then it chokes on my version of node (21). It wants node 20 or node 22. Using node 20 fixes that, and I can now run the project.

@Fil
Copy link
Contributor

Fil commented Aug 29, 2024

When a page loader crashes, the preview server sometimes crashes.

I think the reproduction is the following:

create crash.md.js, with contents

process.stdout.write(`# Hello, world`);

open the preview server to /crash, the page shows hello world. Now edit the contents to be:

process.stdout.write(`# Hello ${name}`);

the server crashes.

trace
load /crash.md → [stale] file:///Users/fil/Source/pangea/src/crash.md.js:1
process.stdout.write(`# Hello ${name}`);
                                ^

ReferenceError: name is not defined
    at file:///Users/fil/Source/pangea/src/crash.md.js:1:33
    at ModuleJob.run (node:internal/modules/esm/module_job:218:25)
    at async ModuleLoader.import (node:internal/modules/esm/loader:329:24)
    at async loadESM (node:internal/process/esm_loader:28:7)
    at async handleMainPromise (node:internal/modules/run_main:113:12)

Node.js v20.11.1
error in 56ms: loader exited with code 1
/Users/fil/Source/pangea/node_modules/@observablehq/framework/src/loader.ts:471
      throw new Error(`loader exited with code ${code}`);
            ^


Error: loader exited with code 1
    at CommandLoader.exec (/Users/fil/Source/pangea/node_modules/@observablehq/framework/src/loader.ts:471:13)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at <anonymous> (/Users/fil/Source/pangea/node_modules/@observablehq/framework/src/loader.ts:404:11)

Node.js v20.11.1
error Command failed with exit code 1.

Note: it does not crash if you directly load the erroneous page, in that case it fails gracefully as expected.

@mbostock
Copy link
Member Author

Yes, glob should be a dependency, not a devDependency. We are using glob@10; glob@11 drops support for older versions of Node but we haven’t done the same (yet) in Framework so we shouldn’t use it. I’ve fixed it in the latest commit.

I’ll investigate the Preview server crash when a page loader crashes next. I’ve seen that before, too.

@mbostock
Copy link
Member Author

Okay, fixed the page loader crash during preview. I also fixed the logic so that the exponential backoff on reopening the socket works as intended when the socket opens and then immediately crashes because of the page loader; we should only reset the socket reopen delay when the page loader is successful.

@Fil
Copy link
Contributor

Fil commented Aug 29, 2024

I don't think it's an issue of any consequence, but it's fun to see that if you have a dynamic route [x][y].md.js and call /framework, the letters are distributed as x=framewor, y=k.

I've tried to pass an option value that starts with a dash (e.g., rotate,[x].md.js with url rotate,-45). It's not clear to me if this is safe to do, or if we're unwittingly passing the raw value when we should be passing a quoted value?

Anyhow my hint is that I could not read the negative number from parseArgs, because it errors with 'ERR_PARSE_ARGS_INVALID_OPTION_VALUE' (it's still possible to look at argv directly).

import {parseArgs} from "node:util";
const {
  values: {x}
} = parseArgs({
  options: {x: {type: "string"}}
});
//const x = +process.argv[1 + [...process.argv].findIndex((d) => d === "--x")]; // this works but…

process.stdout.write(`# Hello ${JSON.stringify(process.argv)}

${JSON.stringify({x})}`);

@mbostock
Copy link
Member Author

mbostock commented Aug 29, 2024

Okay, fixed by sending --name=value instead of sending --name and value as two separate arguments. This wasn’t an escaping problem, but an ambiguity with Node’s parseArgs where it interpreted the value as a separate argument rather than the value of the previous argument. But Framework can avoid that ambiguity by using the name=value syntax instead of passing the value separately.

@Fil
Copy link
Contributor

Fil commented Aug 29, 2024

I've tried to break it some more, but I can't find how

docs/page-loaders.md.js Outdated Show resolved Hide resolved
Co-authored-by: Philippe Rivière <[email protected]>
@mbostock mbostock enabled auto-merge (squash) August 29, 2024 17:32
docs/page-loaders.md.js Outdated Show resolved Hide resolved
\`);
~~~~

Framework’s [theme previews](./themes) are implemented as parameterized page loaders; see [their source](https://github.com/observablehq/framework/blob/main/docs/theme/%5Btheme%5D.md.ts) for a practical example.
Copy link
Contributor

@Fil Fil Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Framework’s [theme previews](./themes) are implemented as parameterized page loaders; see [their source](https://github.com/observablehq/framework/blob/main/docs/theme/%5Btheme%5D.md.ts) for a practical example.
Within this documentation, the [theme previews](./themes) are implemented as parameterized page loaders; see [their source](https://github.com/observablehq/framework/blob/main/docs/theme/%5Btheme%5D.md.ts) for a practical example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it’s clear that “theme previews” are part of documentation, no?

@Fil
Copy link
Contributor

Fil commented Aug 29, 2024

That's all I have!

@Fil Fil disabled auto-merge August 29, 2024 18:15
Copy link
Contributor

@Fil Fil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've disabled auto-merge to give you a chance to see the remaining suggestion.

This is a HUGE step forward.

@mbostock
Copy link
Member Author

I’m going to move the parameterized page loader docs into the parameterized routes docs so that it isn’t duplicated in both places. That makes the page loader docs pretty short, but that’s not too surprising since they’re very closely related to data loaders. I also want to expand the examples a bit and make them slightly more realistic.

@mbostock mbostock merged commit dfb6bda into main Aug 29, 2024
4 checks passed
@mbostock mbostock deleted the mbostock/parameterized branch August 29, 2024 20:41
@Fil Fil mentioned this pull request Aug 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Server-Side Rendering (SSR) Parameterized data loaders & pages
4 participants