From 7445c707023ee03b1ab300122b7968018600fa41 Mon Sep 17 00:00:00 2001 From: CAD97 Date: Wed, 24 Nov 2021 19:31:59 -0600 Subject: [PATCH 1/5] Add proc macro include RFC --- text/0000-proc-macro-include.md | 175 ++++++++++++++++++++++++++++++++ 1 file changed, 175 insertions(+) create mode 100644 text/0000-proc-macro-include.md diff --git a/text/0000-proc-macro-include.md b/text/0000-proc-macro-include.md new file mode 100644 index 00000000000..0b13b664f0a --- /dev/null +++ b/text/0000-proc-macro-include.md @@ -0,0 +1,175 @@ +- Feature Name: `proc_macro_include` +- Start Date: 2021-11-24 +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary +[summary]: #summary + +Proc macros can now effectively `include!` other files and process their contents. +This both allows proc macros to communicate that they read external files, +and to maintain spans into the external file for more useful error messages. + +# Motivation +[motivation]: #motivation + +- `include!` and `include_str!` are no longer required to be compiler built-ins, + and could be implemented as proc macros. +- Help incremental builds and build determinism, by proc macros telling rustc which files they read. +- Improve proc macro sandboxability and cacheability, by offering a way to implement this class of + file-reading macros without using OS APIs directly. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +## For users of proc macros + +Nothing changes! You'll just see nicer errors and fewer rebuilds +from procedural macros which read external files. + +## For writers of proc macros + +Three new functions are provided in the `proc_macro` interface crate: + +```rust +/// Read the contents of a file as a `TokenStream` and add it to build dependency info. 
+/// +/// The build system executing the compiler will know that the file was accessed during compilation, +/// and will be able to rerun the build when the contents of the file changes. +/// +/// May fail for a number of reasons, for example, if the string contains unbalanced delimiters +/// or characters not existing in the language. +/// +/// If the file fails to be read, this is not automatically a fatal error. The proc macro may +/// gracefully handle the missing file, or emit a compile error noting the missing dependency. +/// +/// Source spans are constructed for the read file. If you use the spans of this token stream, +/// any resulting errors will correctly point at the tokens in the read file. +/// +/// NOTE: some errors may cause panics instead of returning `io::Error`. +/// We reserve the right to change these errors into `io::Error`s later. +fn include>(path: P) -> Result; + +/// Read the contents of a file as a string and add it to build dependency info. +/// +/// The build system executing the compiler will know that the file was accessed during compilation, +/// and will be able to rerun the build when the contents of the file changes. +/// +/// If the file fails to be read, this is not automatically a fatal error. The proc macro may +/// gracefully handle the missing file, or emit a compile error noting the missing dependency. +/// +/// NOTE: some errors may cause panics instead of returning `io::Error`. +/// We reserve the right to change these errors into `io::Error`s later. +fn include_str>(path: P) -> Result; + +/// Read the contents of a file as raw bytes and add it to build dependency info. +/// +/// The build system executing the compiler will know that the file was accessed during compilation, +/// and will be able to rerun the build when the contents of the file changes. +/// +/// If the file fails to be read, this is not automatically a fatal error. 
The proc macro may +/// gracefully handle the missing file, or emit a compile error noting the missing dependency. +/// +/// NOTE: some errors may cause panics instead of returning `io::Error`. +/// We reserve the right to change these errors into `io::Error`s later. +fn include_str>(path: P) -> Result, std::io::Error>; +``` + +As an example, consider a potential implementation of [`core::include`](https://doc.rust-lang.org/stable/core/macro.include.html): + +```rust +#[proc_macro] +pub fn include(input: TokenStream) -> TokenStream { + let mut iter = input.into_iter(); + + let result = 'main: if let Some(tt) = iter.next() { + let TokenTree::Literal(lit) = tt && + let LiteralValue::Str(path) = lit.value() else { + Diagnostic::spanned(tt.span(), Level::Error, "argument must be a string literal").emit(); + break 'main TokenStream::new(); + } + + match proc_macro::include(&path) { + Ok(token_stream) => token_stream, + Err(err) => { + Diagnostic::spanned(Span::call_site(), Level::Error, format_args!("couldn't read {path}: {err}")).emit(); + TokenStream::new() + } + } + } else { + Diagnostic::spanned(Span::call_site(), Level::Error, "include! takes 1 argument").emit(); + TokenStream::new() + } + + if let Some(_) = iter.next() { + Diagnostic::spanned(Span::call_site(), Level::Error, "include! takes 1 argument").emit(); + } + + result +} +``` + +(RFC note: this example uses unstable and even unimplemented features for clarity. +However, this RFC in no way requires these features to be useful on its own.) + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +If a file read is unsuccessful, an encoding of the responsible `io::Error` is passed over the RPC bridge. +If a file is successfully read but fails to lex, `ErrorKind::Other` is returned. + +None of these three APIs should ever cause compilation to fail. +It is the responsibility of the proc macro to fail compilation if a failed file read is fatal. 
+ +The author is unsure of the technical details required to implement this in the compiler. + +# Drawbacks +[drawbacks]: #drawbacks + +This is more API surface for the `proc_macro` crate, and the `proc_macro` bridge is already complicated. +Additionally, this is likely to lead to more proc macros which read external files. +Moving the handling of `include!`-like macros later in the compiler pipeline +(read: dependent on name resolution) +likely is also significantly more complicated than the current `include!` implementation. + +# Alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +- [`proc_macro::tracked_path`](https://doc.rust-lang.org/stable/proc_macro/tracked_path/fn.path.html) (unstable) + +This just tells the proc_macro driver that the proc macro has a dependency on the given path. +This is sufficient for tracking the file, as the proc macro can just also read the file itself, +but lacks the ability to require the proc macro go through this API, or to provide spans for errors. + +Meaningfully, it'd be nice to be able to sandbox proc macros in wasm à la [watt](https://crates.io/crates/watt) +while still having proc macros capable of reading the filesystem (in a proc_macro driver controlled manner). + +- Status quo + +Proc macros can continue to read files and use `include_str!` to indicate a build dependency. +This is error prone, easy to forget to do, and all around not a great experience. + +# Prior art +[prior-art]: #prior-art + +No known prior art. + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +- It would be nice for `include` to allow emitting a useful lexer error directly. + This is not currently provided for by the proposed API. +- Unknown unknowns. + +# Future possibilities +[future-possibilities]: #future-possibilities + +Future expansion of the proc macro APIs is almost entirely orthogonal to this feature.
+As such, here is a small list of potential uses for this API: + +- Processing a Rust-lexer-compatible DSL + - Multi-file parser specifications for toolchains like LALRPOP or pest + - Larger scale Rust syntax experimentations +- Pre-processing `include!`ed assets + - Embedding compiled-at-rustc-time shaders + - Escaping text at compile time for embedding in a document format From 1d641b37f7986626c15b354fd5834a39928d14e7 Mon Sep 17 00:00:00 2001 From: Christopher Durham Date: Thu, 25 Nov 2021 00:34:36 -0600 Subject: [PATCH 2/5] Update text/0000-proc-macro-include.md Co-authored-by: Jacob Lifshay --- text/0000-proc-macro-include.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-proc-macro-include.md b/text/0000-proc-macro-include.md index 0b13b664f0a..85dc2f30921 100644 --- a/text/0000-proc-macro-include.md +++ b/text/0000-proc-macro-include.md @@ -72,7 +72,7 @@ fn include_str>(path: P) -> Result; /// /// NOTE: some errors may cause panics instead of returning `io::Error`. /// We reserve the right to change these errors into `io::Error`s later. -fn include_str>(path: P) -> Result, std::io::Error>; +fn include_bytes>(path: P) -> Result, std::io::Error>; ``` As an example, consider a potential implementation of [`core::include`](https://doc.rust-lang.org/stable/core/macro.include.html): From b4aa253f4f048559f9b2a8441cdfc2495cb31daa Mon Sep 17 00:00:00 2001 From: CAD97 Date: Wed, 20 Apr 2022 16:55:05 -0500 Subject: [PATCH 3/5] Update for comments --- text/0000-proc-macro-include.md | 30 ++++++++++++++++++++++-------- 1 file changed, 22 insertions(+), 8 deletions(-) diff --git a/text/0000-proc-macro-include.md b/text/0000-proc-macro-include.md index 85dc2f30921..b9e8d82970b 100644 --- a/text/0000-proc-macro-include.md +++ b/text/0000-proc-macro-include.md @@ -32,7 +32,7 @@ from procedural macros which read external files. 
Three new functions are provided in the `proc_macro` interface crate: ```rust -/// Read the contents of a file as a `TokenStream` and add it to build dependency info. +/// Read the contents of a file as a `TokenStream` and add it to build dependency graph. /// /// The build system executing the compiler will know that the file was accessed during compilation, /// and will be able to rerun the build when the contents of the file changes. @@ -50,7 +50,7 @@ Three new functions are provided in the `proc_macro` interface crate: /// We reserve the right to change these errors into `io::Error`s later. fn include>(path: P) -> Result; -/// Read the contents of a file as a string and add it to build dependency info. +/// Read the contents of a file as a string literal and add it to build dependency graph. /// /// The build system executing the compiler will know that the file was accessed during compilation, /// and will be able to rerun the build when the contents of the file changes. @@ -60,9 +60,9 @@ fn include>(path: P) -> Result; /// /// NOTE: some errors may cause panics instead of returning `io::Error`. /// We reserve the right to change these errors into `io::Error`s later. -fn include_str>(path: P) -> Result; +fn include_str>(path: P) -> Result; -/// Read the contents of a file as raw bytes and add it to build dependency info. +/// Read the contents of a file as raw bytes and add it to build dependency graph. /// /// The build system executing the compiler will know that the file was accessed during compilation, /// and will be able to rerun the build when the contents of the file changes. 
@@ -84,7 +84,8 @@ pub fn include(input: TokenStream) -> TokenStream { let result = 'main: if let Some(tt) = iter.next() { let TokenTree::Literal(lit) = tt && - let LiteralValue::Str(path) = lit.value() else { + let LiteralValue::Str(path) = lit.value() + else { Diagnostic::spanned(tt.span(), Level::Error, "argument must be a string literal").emit(); break 'main TokenStream::new(); } @@ -121,15 +122,12 @@ If a file is successfully read but fails to lex, `ErrorKind::Other` is returned. None of these three APIs should ever cause compilation to fail. It is the responsibility of the proc macro to fail compilation if a failed file read is fatal. -The author is unsure of the technical details required to implement this in the compiler. - # Drawbacks [drawbacks]: #drawbacks This is more API surface for the `proc_macro` crate, and the `proc_macro` bridge is already complicated. Additionally, this is likely to lead to more proc macros which read external files. Moving the handling of `include!`-like macros later in the compiler pipeline -(read: dependent on name resolution) likely is also significantly more complicated than the current `include!` implementation. # Alternatives @@ -144,6 +142,19 @@ but lacks the ability to require the proc macro go through this API, or to provi Meaningfully, it'd be nice to be able to sandbox proc macros in wasm à la [watt](https://crates.io/crates/watt) while still having proc macros capable of reading the filesystem (in a proc_macro driver controlled manner). +- Custom error type + +A custom error wrapper would provide a point to attach more specific error information than just an +`io::Error`, such as the lexer error encountered by `include`. This RFC opts to use `io::Error` +directly to provide a more minimal API surface. + +- Wrapped return types + +Returning `Literal::string` from `include_str` and `Vec<u8>` from `include_bytes` implies that +the entire included file must be read into memory managed by the Rust global allocator.
+Alternatively, a more abstract buffer type could be used which allows more efficiently working +with very large files that could be instead e.g. memmapped rather than read into a buffer. + - Status quo Proc macros can continue to read files and use `include_str!` to indicate a build dependency. @@ -159,6 +170,9 @@ No known prior art. - It would be nice for `include` to allow emitting a useful lexer error directly. This is not currently provided for by the proposed API. +- `include!` sets the "current module path" for the included code. + It's unclear how this should behave for `proc_macro::include`, + and whether this behavior should be replicated at all. From 3029688bbc51a60045873c34533246a1c2a6217d Mon Sep 17 00:00:00 2001 From: CAD97 Date: Wed, 20 Apr 2022 17:00:15 -0500 Subject: [PATCH 4/5] Discuss normalization --- text/0000-proc-macro-include.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/text/0000-proc-macro-include.md b/text/0000-proc-macro-include.md index b9e8d82970b..e75b49fb42b 100644 --- a/text/0000-proc-macro-include.md +++ b/text/0000-proc-macro-include.md @@ -155,6 +155,9 @@ the entire included file must be read into memory managed by the Rust global all Alternatively, a more abstract buffer type could be used which allows more efficiently working with very large files that could be instead e.g. memmapped rather than read into a buffer. +This would likely look like `LiteralString` and `LiteralBytes` types in the `proc_macro` bridge, +but this RFC opts to use the existing `Literal` and `Vec<u8>` to provide a more minimal API surface. + - Status quo Proc macros can continue to read files and use `include_str!` to indicate a build dependency. @@ -173,6 +176,12 @@ No known prior art. - `include!` sets the "current module path" for the included code. It's unclear how this should behave for `proc_macro::include`, and whether this behavior should be replicated at all.
+- Should `include_str` get source code normalization (i.e. `\r\n` to `\n`)? + `include_str!` deliberately includes the string exactly as it appears on disk, + and the purpose of these APIs is to provide post-processing steps, + which could need the file to be reproduced exactly, + so the answer is likely *no*, + and the produced `Literal` should represent the exact contents of the file. - Unknown unknowns. # Future possibilities From f1f4aca0a3d8fff09e1a681175ad64fee6b97edf Mon Sep 17 00:00:00 2001 From: CAD97 Date: Wed, 20 Apr 2022 17:02:52 -0500 Subject: [PATCH 5/5] Discuss relative paths. --- text/0000-proc-macro-include.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/text/0000-proc-macro-include.md b/text/0000-proc-macro-include.md index e75b49fb42b..a525527047c 100644 --- a/text/0000-proc-macro-include.md +++ b/text/0000-proc-macro-include.md @@ -182,6 +182,13 @@ No known prior art. which could need the file to be reproduced exactly, so the answer is likely *no*, and the produced `Literal` should represent the exact contents of the file. +- What base directory should relative paths be resolved from? + The two reasonable answers are + + - That which `include!` is relative to in the source file expanding the macro. + - That which `fs` is relative to in the proc macro execution. + + Both have their merits and drawbacks. - Unknown unknowns. # Future possibilities
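Note on the relative-path question raised in PATCH 5/5: the two candidate base directories can be compared with a small standalone sketch. Nothing below is part of the proposed `proc_macro` API. `BaseDir` and `resolve` are hypothetical names invented for illustration. The sketch assumes only the documented behaviour that `include!` resolves paths relative to the directory of the invoking source file, while `std::fs` resolves them relative to the working directory of the process.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical: the two candidate base directories named in the unresolved question.
pub enum BaseDir<'a> {
    /// Path of the source file that expands the macro; relative paths are
    /// resolved against its parent directory (the `include!` behaviour).
    InvokingFile(&'a Path),
    /// Working directory of the proc macro process (the `std::fs` behaviour).
    WorkingDir(&'a Path),
}

/// Hypothetical helper: resolve `requested` against the chosen base directory.
/// Absolute paths are returned unchanged under either option.
pub fn resolve(requested: &Path, base: BaseDir<'_>) -> PathBuf {
    if requested.is_absolute() {
        return requested.to_path_buf();
    }
    match base {
        // Fall back to the empty path when the file path has no parent component.
        BaseDir::InvokingFile(file) => file
            .parent()
            .unwrap_or(Path::new(""))
            .join(requested),
        BaseDir::WorkingDir(dir) => dir.join(requested),
    }
}
```

With an invoking file at `src/db/mod.rs` and a request for `schema.sql`, the first option yields `src/db/schema.sql` wherever the compiler is launched from. The second option yields a path under the working directory, so the result depends on how the build system invokes rustc.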