Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic TypeScript definition generation from JSG RTTI #113

Merged
merged 6 commits into from
Oct 27, 2022
Merged

Conversation

mrbbot
Copy link
Contributor

@mrbbot mrbbot commented Oct 17, 2022

(re-do of #110, let's try this again 🙂 )

Hey! 👋 This PR adds support for generating TypeScript types from JSG RTTI, replacing the internal autodecl script. This forms the basis for the next workers-types version. These scripts are located in this repository rather than workers-types as they depend on Bazel outputs, and we'd like to be able to share the Bazel cache in CI. Going forward, DevProd should probably be the CODEOWNER for everything in the types directory.

To generate types, run:

$ bazel build //types:types

A gist containing the generated types can be found here. This also includes a copy of the types before TypeScript transformers are applied.

(fyi, about half of the additions are from pnpm-lock.yaml)

Implementation Notes and Questions

  1. Added a jsg::fullyQualifiedTypeName() method. This behaves like jsg::typeName(), but includes namespaces and template arguments in the returned name. Namespaces are required to differentiate nested types with the same name (e.g. DurableObjectStorageOperations::GetOptions, KvNamespace::GetOptions and R2Bucket::GetOptions). Template arguments are required to differentiate jsg::IteratorBase... types generated by JSG_ITERATOR.
  2. In workerd::jsg::rtti::Builder, replaced the symbol key from jsg::typeName() to jsg::fullyQualifiedTypeName() for the reasons above. This also affects which values should be passed as parameters to workerd::jsg::rtti::Builder::structure().
  3. Changed the rtti.capnp schema. Generally, I tried to evolve the schema in a backwards-compatible way, but it would be cleaner if we're still able to make breaking changes. /cc @mikea
    1. Added fullyQualifiedName fields to Structure and StructureType schemas for the reasons above. I was hesitant to make name fully-qualified since I wasn't sure which other code depended on RTTI.
    2. Added support for jsg::LenientOptional to RTTI. When generating TypeScript definitions, we also need to be able to distinguish between optionals expecting null (kj::Maybe) and others expecting undefined. A new name field has been added to the MaybeType schema, similar to NumberType and StringType.
    3. Similarly, we need to be able to distinguish between kj::Array, kj::ArrayPtr and jsg::Sequence, so a new name field has been added to the ArrayType schema. It may be better to make this and the previous maybe field enums instead?
    4. Added a name field in a group with nested members. This ensures members coming from JSG_NESTED_TYPE_NAMED macros have the correct names.
    5. Added iterator and asyncIterator fields to the Structure schema. Whilst there are already iterable and asyncIterable boolean fields, we need to know the full method types (especially the returned (Async)Iterator type) for [Symbol.iterator]/[Symbol.asyncIterator].
  4. Added a new api-encoder.c++ entrypoint that spits out RTTI to a file. All TypeScript generation scripts are written in TypeScript, so we can use the official TypeScript compiler API for creating/processing/printing AST nodes.
  5. Setup Bazel to build/run JavaScript/TypeScript programs. I used https://github.com/aspect-build/rules_js instead of https://github.com/bazelbuild/rules_nodejs as this seems to be the actively maintained version. Note aspect-build/rules_js requires us to use pnpm. We should be able to use this setup for packaging/publishing workerd npm packages too. /cc @penalosa

TODOs

(to follow in later PRs)

  1. Overrides: whilst correct TypeScript types are being generated without them (an improvement on autodecl 😅), they're not as accurate as they could be.
  2. Parameter names: currently parameters are named param0, param1, ... . Ideally, we'd use the actual C++ parameter names here, so we need some way of getting these into the RTTI.
  3. Compatibility dates: we currently generate a single set of types with all non-experimental compatibility flags enabled. We'd like types to depend on which flags users have enabled.
  4. CI: we'd like to build/publish types automatically on PRs/releases.
  5. Multiple files: we currently generate a large single file containing all types. We'd like to split this up into multiple purpose-related files, hence this implementation splits structures into named groups.

@jasnell
Copy link
Member

jasnell commented Oct 17, 2022

@mrbbot ... ok, so the reason we had to revert this was twofold:

  1. The bazel changes here also require bazel changes for the internal platform, and
  2. The code changes here cause compile failures on the internal platform.

@mikea also needs to review the code changes here before it can land. The changes here LGTM but Mike is the domain expert so while I might sign off it should wait to land until both of the following happen:

  1. Mike has a chance to review this and sign off, and
  2. There is a corresponding internal PR that updates the internal platform and that has a successful CI run.

@mrbbot mrbbot force-pushed the bcoll/types branch 3 times, most recently from 11d9b65 to f83ad19 Compare October 18, 2022 20:53
@penalosa
Copy link
Collaborator

This may have been fixed in newer commits, but slight bug in the version you posted; everything URL related is prefixed with URL. So, for instance, the URL type is urlURL.

@mrbbot mrbbot force-pushed the bcoll/types branch 2 times, most recently from de64f5a to 4cd2b6f Compare October 19, 2022 10:50
Copy link
Contributor

@mikea mikea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked through all changes outside of types/. Sending comments so far.

I know it could be a hustle, but maybe we can break this PR into several safer ones?
In particular I'd like to see the rtti change and npm setup as separate PRs, since they have very different risk profiles.

BUILD.bazel Show resolved Hide resolved
src/workerd/jsg/rtti.capnp Outdated Show resolved Hide resolved
src/workerd/tools/api-encoder.c++ Show resolved Hide resolved
@@ -0,0 +1,183 @@
import assert from "assert";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kentonv maybe we want this in src/types? or src/workerd/types?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kentonv .. just pinging again on this.

However, I don't think answering this should block this PR landing because we can always move this if necessary.

#
# workerd uses Node.js scripts for generating TypeScript types.

http_archive(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will definitely make embedding workerd hard.

Maybe extract this to a separate function setup_js_dependencies in bzl file so that it can be called by downstream deps if needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something, I don't think you can call load() in functions (bazelbuild/bazel#1550)? So we wouldn't be able to call rules_js_dependencies, rules_ts_dependencies, npm_translate_lock or npm_repositories in our function? We have the same problem with Rust dependencies atm: the internal repository currently duplicates all the Rust WORKSPACE rules. 😕

@mrbbot
Copy link
Contributor Author

mrbbot commented Oct 20, 2022

everything URL related is prefixed with URL. So, for instance, the URL type is urlURL.

@penalosa we'll fix this when we add overrides that can rename types. 👍 The issue is that the spec-compliant URL implementation is under an additional url:: namespace so as to not conflict with the original. We should only be including one of these sets of definitions, so renaming the url:: types back to their spec-names shouldn't be an issue.

mikea
mikea previously requested changes Oct 20, 2022
Copy link
Contributor

@mikea mikea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TS code is rather straightforward (though a bit dense). I have left some comments.

Having some testing is probably my biggest concern now.

types/README.md Show resolved Hide resolved
types/src/index.ts Outdated Show resolved Hide resolved
import ts from "typescript";
import { printNode } from "../print";

// Replaces custom Iterator-like interfaces with built-in `Iterator` types:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder: why do we generate something intermediate and patch it later, rather than generate correct property immediately?

I.e. when generating a property, check that its type is our iterator?

Copy link
Contributor Author

@mrbbot mrbbot Oct 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing it this way allows us to keep the initial generation code simple and separate concerns. As JSG_ITERATOR generates 2 structures (one for the iterator, one for the next result containing the value type we need), the initial generation functions would need to be aware of the full program as opposed to just the structure they were generating code for, making testing more difficult. This approach also allows us to use TypeScript's transformer/visitation APIs that were designed for tasks like this. I'm not too worried about squeezing performance out of this program either.


// If `typeNode` has the shape `T | undefined`, returns `T`, otherwise returns
// `undefined`.
export function maybeUnwrapOptional(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to consult generated types to glimpes information from the original structure?

Copy link
Contributor Author

@mrbbot mrbbot Oct 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we used the original structure information, we'd need to separately assert the generated type nodes had the expected optional shape T | undefined in order to extract T and satisfy TypeScript. Therefore, it makes sense to do the check like this, which narrows the TypeScript type for us. This helper is also useful in structure.ts, when generating properties.

types/src/generator/structure.ts Show resolved Hide resolved
@JacobMGEvans
Copy link
Contributor

Exciting.

// Input:
// Binary Cap’n Proto file path, defaults to reading from stdin if omitted
export async function main(args?: string[]) {
const { values: options, positionals } = util.parseArgs({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just caution here that util.parseArgs() is still considered experimental in Node.js. It's API can see breaking changes outside of normal semver rules. It's not expected to change, but it's worth keeping in mind.

Copy link
Member

@jasnell jasnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, LGTM. I don't really feel qualified to sign off on all the bazel changes, however, and I would note that @mikea has outstanding "Request Changes" here that I'm not entirely comfortable clearing just yet.


This directory contains scripts for automatically generating TypeScript types
from [JSG RTTI](../src/workerd/jsg/rtti.h).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: not blocking for this PR but a request for later: It would be worthwhile documenting the dependencies used here... specifically the minimum Node.js version. I know the bazel stuff brings in the right version but just for detail information, especially given that the tool uses a number of experimental Node.js APIs that are fairly recent additions.

@jasnell
Copy link
Member

jasnell commented Oct 27, 2022

It appears that all of @mikea's feedback has been addressed. I'll clear their "change requested" now.

@jasnell jasnell dismissed mikea’s stale review October 27, 2022 16:19

Feedback has been addressed

@jasnell
Copy link
Member

jasnell commented Oct 27, 2022

@mrbbot this should be ready to land. Once landed, remember to rebase the matching internal PR on workerd main and let the CI run again there to confirm it's ready to land.

@mrbbot mrbbot merged commit a137de8 into main Oct 27, 2022
@mrbbot mrbbot deleted the bcoll/types branch October 27, 2022 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants