RFC: proc macro include! #3200

Open pull request: 5 commits into `master`, adding `text/0000-proc-macro-include.md`.
- Feature Name: `proc_macro_include`
- Start Date: 2021-11-24
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Proc macros can now effectively `include!` other files and process their contents.
This allows proc macros both to communicate that they read external files
and to maintain spans into the external file for more useful error messages.

# Motivation
[motivation]: #motivation

- `include!` and `include_str!` would no longer need to be compiler built-ins,
and could be implemented as proc macros.
- Help incremental builds and build determinism by having proc macros tell rustc which files they read.
- Improve proc macro sandboxability and cacheability by offering a way to implement this class of
file-reading macros without using OS APIs directly.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

## For users of proc macros

Nothing changes! You'll just see nicer errors and fewer rebuilds
from procedural macros which read external files.

## For writers of proc macros

Three new functions are provided in the `proc_macro` interface crate:

```rust
/// Read the contents of a file as a `TokenStream` and add it to the build dependency graph.
///
/// The build system executing the compiler will know that the file was accessed during compilation,
/// and will be able to rerun the build when the contents of the file change.
///
/// May fail for a number of reasons, for example, if the file contains unbalanced delimiters
/// or characters that are not valid Rust tokens.
///
/// If the file fails to be read, this is not automatically a fatal error. The proc macro may
/// gracefully handle the missing file, or emit a compile error noting the missing dependency.
///
/// Source spans are constructed for the read file. If you use the spans of this token stream,
/// any resulting errors will correctly point at the tokens in the read file.
///
/// NOTE: some errors may cause panics instead of returning `io::Error`.
/// We reserve the right to change these errors into `io::Error`s later.
fn include<P: AsRef<str>>(path: P) -> Result<TokenStream, std::io::Error>;

/// Read the contents of a file as a string literal and add it to the build dependency graph.
///
/// The build system executing the compiler will know that the file was accessed during compilation,
/// and will be able to rerun the build when the contents of the file change.
///
/// If the file fails to be read, this is not automatically a fatal error. The proc macro may
/// gracefully handle the missing file, or emit a compile error noting the missing dependency.
///
/// NOTE: some errors may cause panics instead of returning `io::Error`.
/// We reserve the right to change these errors into `io::Error`s later.
fn include_str<P: AsRef<str>>(path: P) -> Result<Literal, std::io::Error>;

/// Read the contents of a file as raw bytes and add it to the build dependency graph.
///
/// The build system executing the compiler will know that the file was accessed during compilation,
/// and will be able to rerun the build when the contents of the file change.
///
/// If the file fails to be read, this is not automatically a fatal error. The proc macro may
/// gracefully handle the missing file, or emit a compile error noting the missing dependency.
///
/// NOTE: some errors may cause panics instead of returning `io::Error`.
/// We reserve the right to change these errors into `io::Error`s later.
fn include_bytes<P: AsRef<str>>(path: P) -> Result<Vec<u8>, std::io::Error>;
```

> **Review thread on `include_bytes`:**
>
> **Member:** Would it make sense for `include_bytes` to return `Literal` as well, or would that not be possible?
>
> **Member:** I think it should work, because a `Literal` can be a byte string.
>
> **Author:** Actually, yeah, I overlooked that possibility.
>
> The main limitation is that the only current interface for getting the contents out of a `Literal` is to convert it with `ToString`. `syn` does have a `.value()` for `LitByteStr` as well as `LitStr`, though, so I guess it's workable.
>
> In the short term, it's probably not good to require debug-escaping a binary file only to reparse the byte string literal if a proc macro is going to post-process the file. But if it's just including the literal, it can put the `Literal` in the token stream, and we can offer ways to extract (byte) string literals without printing them in the future.
>
> **Author:** The one limitation that needs to be solved is how spans work. Do we just say that the byte string literal contains the raw bytes of the file (even though that would be illegal in a normal byte string, and invalid UTF-8), maybe as a new "kind" of byte string, so that span offsets map directly onto the source file? Or are there multiple span positions (representing a `\xNN` in the byte string) that map to a single byte in the source file?
>
> **Member:** Hmm, what bytes are not allowed in byte string literals? Does the literal itself have to be valid UTF-8?
>
> **Author:** A Rust source file must be valid UTF-8, so the contents of a byte string literal in the source must be valid UTF-8 too. Bytes that are not < 0x80 must therefore be escaped to appear in a byte string literal.
>
> **Author:** Another question worth making explicit: what does it even mean for rustc to report a span into a binary file? I think binary includes are better served by a different API that lets rustc point into generated code, rather than trying to point into an opaque binary file.

As an example, consider a potential implementation of [`core::include`](https://doc.rust-lang.org/stable/core/macro.include.html):

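```rust
// Requires nightly `#![feature(proc_macro_diagnostic)]` in the crate root.
// `Literal::value` and `LiteralValue` are hypothetical (unimplemented);
// `proc_macro::include` is the API proposed by this RFC.
use proc_macro::{Diagnostic, Level, Span, TokenStream, TokenTree};

#[proc_macro]
pub fn include(input: TokenStream) -> TokenStream {
    let mut iter = input.into_iter();

    // A labeled block lets every error path bail out with an empty expansion.
    let result = 'main: {
        let Some(tt) = iter.next() else {
            Diagnostic::spanned(Span::call_site(), Level::Error, "include! takes 1 argument").emit();
            break 'main TokenStream::new();
        };

        // The single argument must be a string literal naming the file.
        let path = match &tt {
            TokenTree::Literal(lit) => match lit.value() {
                LiteralValue::Str(path) => Some(path),
                _ => None,
            },
            _ => None,
        };
        let Some(path) = path else {
            Diagnostic::spanned(tt.span(), Level::Error, "argument must be a string literal").emit();
            break 'main TokenStream::new();
        };

        // Tokens read this way carry spans pointing into the included file.
        match proc_macro::include(&path) {
            Ok(token_stream) => token_stream,
            Err(err) => {
                Diagnostic::spanned(
                    Span::call_site(),
                    Level::Error,
                    format!("couldn't read {path}: {err}"),
                )
                .emit();
                TokenStream::new()
            }
        }
    };

    // Any trailing tokens mean more than one argument was passed.
    if iter.next().is_some() {
        Diagnostic::spanned(Span::call_site(), Level::Error, "include! takes 1 argument").emit();
    }

    result
}
```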

(RFC note: this example uses unstable and even unimplemented features for clarity.
However, this RFC in no way requires these features to be useful on its own.)
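To make the dependency-tracking half of these functions concrete, here is a hypothetical model of what the compiler side might do when a proc macro calls `include_str`: read the file and record its path so it can later be written out as a build dependency. `DepTracker` and its methods are invented for illustration and are not part of the proposed API. Recording the path even when the read fails is an assumption of this sketch; the RFC does not specify it.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical model of the driver side of `proc_macro::include_str`:
/// every file read through it is recorded as a build dependency.
#[derive(Default)]
pub struct DepTracker {
    deps: Vec<PathBuf>,
}

impl DepTracker {
    /// Read a file as UTF-8 text and remember that it was accessed.
    /// The path is recorded even if the read fails (an assumption), so that
    /// creating the file later could also trigger a rebuild.
    pub fn include_str(&mut self, path: impl AsRef<Path>) -> io::Result<String> {
        let path = path.as_ref();
        self.deps.push(path.to_path_buf());
        fs::read_to_string(path)
    }

    /// Render the recorded dependencies as a Makefile-style line,
    /// similar in shape to the dep-info files rustc can emit.
    pub fn dep_info(&self, target: &str) -> String {
        let deps: Vec<String> = self.deps.iter().map(|p| p.display().to_string()).collect();
        format!("{target}: {}", deps.join(" "))
    }
}
```

Routing every read through one object is what gives the driver a complete list of inputs; a proc macro that calls `std::fs` directly bypasses it entirely.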

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

If a file read is unsuccessful, an encoding of the responsible `io::Error` is passed over the RPC bridge.
If a file is successfully read but fails to lex, `ErrorKind::Other` is returned.

None of these three APIs should ever cause compilation to fail.
It is the responsibility of the proc macro to fail compilation if a failed file read is fatal.
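This contract (read errors forwarded as-is, lex failures reported as `ErrorKind::Other`, and only the proc macro deciding what is fatal) can be sketched as ordinary error handling on `std::io::Error`. Both helpers are hypothetical, and treating a missing *optional* file as an empty expansion is an illustrative policy, not something the RFC prescribes.

```rust
use std::io::{self, ErrorKind};

/// Hypothetical: turn an error from the proposed `include*` functions into
/// the message a proc macro might emit as a diagnostic.
pub fn describe_include_error(path: &str, err: &io::Error) -> String {
    match err.kind() {
        ErrorKind::NotFound => format!("couldn't find {path}"),
        ErrorKind::PermissionDenied => format!("permission denied reading {path}"),
        // Per this RFC, a file that was read but failed to lex yields `Other`.
        ErrorKind::Other => format!("{path} could not be lexed as Rust tokens: {err}"),
        _ => format!("couldn't read {path}: {err}"),
    }
}

/// Hypothetical policy: a missing optional file falls back to `T::default()`;
/// every other failure becomes an error message for the macro to emit.
pub fn handle<T: Default>(path: &str, result: io::Result<T>, optional: bool) -> Result<T, String> {
    match result {
        Ok(value) => Ok(value),
        Err(err) if optional && err.kind() == ErrorKind::NotFound => Ok(T::default()),
        Err(err) => Err(describe_include_error(path, &err)),
    }
}
```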

# Drawbacks
[drawbacks]: #drawbacks

This is more API surface for the `proc_macro` crate, and the `proc_macro` bridge is already complicated.
Additionally, this is likely to lead to more proc macros which read external files.
Moving the handling of `include!`-like macros later in the compiler pipeline
is also likely significantly more complicated than the current `include!` implementation.

# Alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

- [`proc_macro::tracked_path`](https://doc.rust-lang.org/stable/proc_macro/tracked_path/fn.path.html) (unstable)

This only tells the proc_macro driver that the proc macro depends on the given path.
That is sufficient for tracking the file, since the proc macro can read the file itself,
but it can neither require the proc macro to go through this API nor provide spans for errors.

In particular, it would be nice to be able to sandbox proc macros in wasm à la [watt](https://crates.io/crates/watt)
while still letting proc macros read the filesystem (in a manner controlled by the proc_macro driver).

- Custom error type

A custom error wrapper would provide a point to attach more specific error information than just an
`io::Error`, such as the lexer error encountered by `include`. This RFC opts to use `io::Error`
directly to provide a more minimal API surface.

- Wrapped return types

Returning a string `Literal` from `include_str` and a `Vec<u8>` from `include_bytes` implies that
the entire included file must be read into memory managed by the Rust global allocator.
Alternatively, a more abstract buffer type could be used, allowing more efficient handling of
very large files, which could then be memory-mapped rather than read into a buffer.

This would likely look like `LiteralString` and `LiteralBytes` types in the `proc_macro` bridge,
but this RFC opts to use the existing `Literal` and `Vec<u8>` to provide a more minimal API surface.

- Status quo

Proc macros can continue to read files themselves and emit `include_str!` to indicate a build dependency.
This is error-prone, easy to forget, and all around not a great experience.
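The status-quo workaround can be sketched as a plain function that returns the expansion as Rust source text; a real proc macro would build a `TokenStream` instead, and `expand_status_quo` is a hypothetical name. The macro reads the file itself, then emits an otherwise unused `include_str!` purely so that rustc records the file as a dependency.

```rust
use std::fs;
use std::io;

/// Hypothetical status-quo expansion: read the file directly, then emit a
/// dummy `include_str!` so rustc tracks it. Nothing checks that the dummy
/// is present, which is exactly what makes this easy to forget.
pub fn expand_status_quo(path: &str) -> io::Result<String> {
    let contents = fs::read_to_string(path)?;
    // A real macro would post-process `contents` here.
    // `{:?}` formatting yields valid Rust string literal syntax.
    Ok(format!(
        "{{ const _: &str = include_str!({path:?}); {contents:?} }}"
    ))
}
```

Note that `include_str!` resolves relative paths against the source file containing the invocation, so implementations of this workaround usually pass an absolute path.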

# Prior art
[prior-art]: #prior-art

No known prior art.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- It would be nice for `include` to allow emitting a useful lexer error directly.
This is not currently provided for by the proposed API.
- `include!` sets the "current module path" for the included code.
It's unclear how this should behave for `proc_macro::include`,
and whether this behavior should be replicated at all.
- Should `include_str` get source code normalization (i.e. `\r\n` to `\n`)?
`include_str!` deliberately includes the string exactly as it appears on disk,
and the purpose of these APIs is to provide post-processing steps,
which could need the file to be reproduced exactly,
so the answer is likely *no*,
and the produced `Literal` should represent the exact contents of the file.
- What base directory should relative paths be resolved from?
The two reasonable answers are

- That which `include!` is relative to in the source file expanding the macro.
- That which `fs` is relative to in the proc macro execution.

Both have their merits and drawbacks.
> **Review thread on the base directory question:**
>
> **Member:** One way to support both options would be to take a `Span` that the path is relative to. That would also make multi-level includes easier: the macro includes a path relative to the Rust source file, then the included file references another relative file, which needs to be included based on the `Span` from the first `proc_macro::include_str` call.
>
> **Author:** What would `Span::mixed_site` be relative to?
>
> Also, that would kind of soft-block the feature on `Span::def_site`, while the RFC is currently written such that additional unstable features (such as span subslicing) are incremental improvements not required for the functionality to be useful. Though I suppose requiring a span would be strictly more powerful than an `include!`-style base path, so that fits into the same category.
>
> **Member:** I suppose it should behave exactly the same as an `include_str!("..")` invocation whose tokens carry a `mixed_site` span.
>
> **Member:**
>
> ```rust
> macro_rules! x {
>     () => {
>         include_str!("a")
>     };
> }
> ```
>
> Somewhat surprisingly, this looks for a file called `"a"` relative to the file in which `x!()` is invoked, not relative to the file that contains the definition above.

- Unknown unknowns.
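The two base-directory candidates from the unresolved questions above differ only in which directory a relative path is joined onto. The sketch below assumes the driver knows the path of the invoking source file and the proc macro's working directory; both function names are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Candidate 1: resolve like `include!`, relative to the directory of the
/// source file in which the macro is expanded.
pub fn resolve_like_include(invoking_file: &Path, relative: &str) -> PathBuf {
    invoking_file
        .parent()
        .unwrap_or(Path::new(""))
        .join(relative)
}

/// Candidate 2: resolve like `std::fs`, relative to the current working
/// directory of the process running the proc macro.
pub fn resolve_like_fs(cwd: &Path, relative: &str) -> PathBuf {
    cwd.join(relative)
}
```

With either function, an absolute `relative` argument replaces the base entirely, because `Path::join` behaves that way; only relative paths are affected by this choice.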

# Future possibilities
[future-possibilities]: #future-possibilities

Future expansions of the proc macro APIs are almost entirely orthogonal to this feature.
Instead, here is a short list of potential uses for this API:

- Processing a Rust-lexer-compatible DSL
- Multi-file parser specifications for toolchains like LALRPOP or pest
- Larger-scale Rust syntax experiments
- Pre-processing `include!`ed assets
- Embedding compiled-at-rustc-time shaders
- Escaping text at compile time for embedding in a document format