RFC: proc macro include! #3200 (base: master)
proc_macro_include
Summary
Proc macros can now effectively include! other files and process their contents. This allows proc macros both to communicate that they read external files and to maintain spans into the external file for more useful error messages.
Motivation
include! and include_str! are no longer required to be compiler built-ins, and could be implemented as proc macros.
Guide-level explanation
For users of proc macros
Nothing changes! You'll just see nicer errors and fewer rebuilds from procedural macros which read external files.
For writers of proc macros
Three new functions are provided in the proc_macro interface crate: include, include_str, and include_bytes.
As an example, consider a potential implementation of core::include:
(RFC note: this example uses unstable and even unimplemented features for clarity. However, this RFC in no way requires these features to be useful on its own.)
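As a rough illustration of the API shape described here, the sketch below mimics include_str and include_bytes outside the compiler. It is not the proc_macro API. The real functions would return proc_macro::Literal, which cannot be used outside a proc macro, so StringLiteral is a hypothetical stand-in, the &str path parameter is an assumption, and std::fs does the reading in place of the proc_macro driver.

```rust
use std::fs;
use std::io;

/// Hypothetical stand-in for `proc_macro::Literal`: holds the file text verbatim.
#[derive(Debug, PartialEq)]
pub struct StringLiteral(pub String);

/// Sketch of `include_str`: the whole file as one string literal, or the `io::Error`.
/// Assumption: the path is taken as `&str`; the RFC's exact signature may differ.
pub fn include_str(path: &str) -> io::Result<StringLiteral> {
    fs::read_to_string(path).map(StringLiteral)
}

/// Sketch of `include_bytes`: the raw bytes of the file, or the `io::Error`.
pub fn include_bytes(path: &str) -> io::Result<Vec<u8>> {
    fs::read(path)
}
```

A caller gets either the contents or the io::Error and decides what to do with it. Nothing in this sketch registers a build dependency or produces spans into the file; those are the parts only the real API could provide.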
Reference-level explanation
If a file read is unsuccessful, an encoding of the responsible io::Error is passed over the RPC bridge. If a file is successfully read but fails to lex, an error of kind ErrorKind::Other is returned.
None of these three APIs should ever cause compilation to fail. It is the responsibility of the proc macro to fail compilation if a failed file read is fatal.
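The error contract above can be sketched as follows, again with stand-ins rather than the real API. Here check_delimiters is a deliberately crude, hypothetical substitute for the Rust lexer (it only checks that brackets balance and ignores strings and comments), a String stands in for TokenStream, and expand is an invented name for whatever a macro author would write to turn a failed include into a compile error.

```rust
use std::fs;
use std::io;

/// Hypothetical stand-in for lexing: only checks that delimiters balance.
fn check_delimiters(src: &str) -> io::Result<()> {
    let mut open = Vec::new();
    for c in src.chars() {
        match c {
            '(' | '[' | '{' => open.push(c),
            ')' | ']' | '}' => {
                let matched = matches!(
                    (open.pop(), c),
                    (Some('('), ')') | (Some('['), ']') | (Some('{'), '}')
                );
                if !matched {
                    // A "lex" failure is reported as ErrorKind::Other, as the RFC describes.
                    return Err(io::Error::new(io::ErrorKind::Other, "unbalanced delimiter"));
                }
            }
            _ => {}
        }
    }
    if open.is_empty() {
        Ok(())
    } else {
        Err(io::Error::new(io::ErrorKind::Other, "unclosed delimiter"))
    }
}

/// Sketch of `include`: read errors pass through unchanged, lex errors become `Other`.
pub fn include(path: &str) -> io::Result<String> {
    let src = fs::read_to_string(path)?;
    check_delimiters(&src)?;
    Ok(src)
}

/// The API itself never fails compilation; the macro decides that a failure is fatal
/// by expanding to a `compile_error!` invocation (rendered here as text).
pub fn expand(path: &str) -> String {
    match include(path) {
        Ok(tokens) => tokens,
        Err(err) => {
            let message = format!("failed to include {path}: {err}");
            format!("compile_error!({message:?});")
        }
    }
}
```

A macro for which the file is optional could instead match on the error kind and fall back to a default expansion, which is why the decision is left to the macro.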
Drawbacks
This is more API surface for the proc_macro crate, and the proc_macro bridge is already complicated. Additionally, this is likely to lead to more proc macros which read external files. Moving the handling of include!-like macros later in the compiler pipeline is also likely to be significantly more complicated than the current include! implementation.
Alternatives
proc_macro::tracked_path (unstable)
This just tells the proc_macro driver that the proc macro has a dependency on the given path. This is sufficient for tracking the file, as the proc macro can then read the file itself, but it cannot require that proc macros go through this API, and it cannot provide spans for errors.
Meaningfully, it'd be nice to be able to sandbox proc macros in wasm à la watt while still having proc macros capable of reading the filesystem (in a proc_macro driver controlled manner).
A custom error wrapper would provide a point to attach more specific error information than just an io::Error, such as the lexer error encountered by include. This RFC opts to use io::Error directly to provide a more minimal API surface.
Returning Literal::string from include_str and Vec<u8> from include_bytes implies that the entire included file must be read into memory managed by the Rust global allocator. Alternatively, a more abstract buffer type could be used, which would allow working more efficiently with very large files that could instead be memory-mapped, for example, rather than read into a buffer.
This would likely look like LiteralString and LiteralBytes types in the proc_macro bridge, but this RFC opts to use the existing Literal and Vec<u8> to provide a more minimal API surface.
Proc macros can continue to read files and use include_str! to indicate a build dependency. This is error prone, easy to forget to do, and all around not a great experience.
Prior art
No known prior art.
Unresolved questions
It would be nice for include to allow emitting a useful lexer error directly. This is not currently provided for by the proposed API.
include! sets the "current module path" for the included code. It's unclear how this should behave for proc_macro::include, and whether this behavior should be replicated at all.
Should include_str get source code normalization (i.e. \r\n to \n)? include_str! deliberately includes the string exactly as it appears on disk, and the purpose of these APIs is to provide post-processing steps, which could need the file to be reproduced exactly, so the answer is likely no, and the produced Literal should represent the exact contents of the file.
What base directory should relative paths be resolved from? The two reasonable answers are the directory that include! is relative to in the source file expanding the macro, and the directory that fs is relative to in the proc macro execution. Both have their merits and drawbacks.
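To make the two candidate base directories concrete, here is a small sketch using std::path only. The function names and the example paths are hypothetical, and the working directory is passed in explicitly because the directory a proc macro actually runs in depends on the build tool.

```rust
use std::path::{Path, PathBuf};

/// Option 1: resolve like `include!`, against the directory of the source file
/// that expands the macro.
pub fn resolve_like_include(invoking_file: &Path, rel: &str) -> PathBuf {
    invoking_file.parent().unwrap_or(Path::new("")).join(rel)
}

/// Option 2: resolve like `std::fs`, against the working directory of the
/// process executing the proc macro.
pub fn resolve_like_fs(cwd: &Path, rel: &str) -> PathBuf {
    cwd.join(rel)
}
```

With an invoking file of /ws/crate/src/lib.rs and a working directory of /ws, the relative path data/schema.sql resolves to /ws/crate/src/data/schema.sql under the first rule and to /ws/data/schema.sql under the second, so the same macro input can name two different files.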
Unknown unknowns.
Future possibilities
Future expansion of the proc macro APIs is almost entirely orthogonal to this feature. As such, here is a small list of potential uses for this API:
include!ed assets
Would it make sense for include_bytes to return Literal as well, or would that not be possible?
I think it should work because Literal can be a byte string.
Actually, yeah, I overlooked that possibility.
The main limitation is that the only current interface for getting the contents out of a Literal is to ToString it. syn does have a .value() for LitByteStr as well as LitStr, though, so I guess it's workable.
In the short term, it's probably not good to require debug-escaping a binary file and then reparsing the byte string literal if a proc macro is going to post-process the file... but if it's just including the literal, it can put the Literal in the token stream, and we can offer ways to extract (byte) string literals without printing the string literal in the future.
The one limitation which needs to be solved is how spans work. Do we just say that the byte string literal contains the raw bytes of the file (even though that would be illegal in a normal byte string, and invalid UTF-8), maybe as a new "kind" of byte string, so span offsets are mapped directly with the source file? Or are there multiple span positions (representing a \xNN in the byte string) which map to a single byte in the source file?
Hmm, what bytes are not allowed in byte string literals? Does the literal itself have to be valid UTF-8?
A Rust source file must be valid UTF-8. Thus, the contents of a byte string literal in the source must be valid UTF-8.
Bytes that are 0x80 or greater must therefore be escaped to appear in a byte string literal.
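The escaping point and the span question it raises can be illustrated with a small sketch. The function name escape_bytes is hypothetical; it uses std::ascii::escape_default to render each byte the way it could appear inside a byte string literal, and records where each file byte starts in the rendered text.

```rust
/// Render raw bytes as the body of a byte string literal, escaping every byte
/// that cannot appear verbatim. Also returns, for each input byte, the offset
/// at which its rendering starts inside the literal body.
pub fn escape_bytes(bytes: &[u8]) -> (String, Vec<usize>) {
    let mut body = String::new();
    let mut starts = Vec::with_capacity(bytes.len());
    for &b in bytes {
        starts.push(body.len());
        // `escape_default` yields only ASCII, e.g. `\xff` for the byte 0xFF.
        body.extend(std::ascii::escape_default(b).map(char::from));
    }
    (body, starts)
}
```

For the three bytes b'a', 0xFF, b'b' the body is a\xffb and the start offsets are 0, 1, and 5: the single file byte 0xFF occupies four positions in the literal, which is the many-to-one span mapping the earlier comment asks about.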
And then another question that's worth making explicit: what does it even mean for rustc to report a span into a binary file?
I think binary includes are better served by a different API that lets rustc point into generated code, rather than trying to point into an opaque binary file.
One way to support both options would be to take a Span that the path is relative to. Then it would make multi-level includes easier (the macro includes a path relative to the Rust source file, then the included file references another relative file so that needs to be included based on the Span from the first proc_macro::include_str call).
What would Span::mixed_site be relative to?
Also, that would kinda soft-block the feature on Span::def_site, while the RFC is currently written such that additional unstable features (such as span subslicing) are incremental improvements not required for the functionality to be useful...
though I suppose requiring a span would be strictly more powerful than include!-style base path, so that fits into the same category.
I suppose it should just behave exactly the same as an include_str!("..") macro invocation whose tokens carry a mixed_site span.
Somewhat surprisingly, this looks for a file called "a" relative to the file in which x!() is invoked, not relative to the file that contains the definition above.