P-E-P edited this page Jul 26, 2023 · 7 revisions

Procedural macros (often abbreviated proc macros) are a Rust mechanism that takes input code, modifies it and outputs valid Rust code. They differ from MBEs (macros by example, declared with macro_rules!).

A procedural macro must be declared in an external crate rather than in the crate containing the code to modify. This implies at least two compilation passes: first the procedural macro crate is compiled, then the code that uses the macro. Note that multiple procedural macros may live in the same crate.

Procedural macros must reside in the root of their crate.

Procedural macro details

There are three kinds of procedural macros; which one to use depends on the intent as well as the context.

trait Titi {}

#[derive(Titi)] // Derive proc macro invocation.
struct Toto;

#[tata] // Attribute proc macro invocation.
fn test() {
  tutu!(); // Bang (function-like) proc macro invocation, same syntax as regular macros!
}

These macros are compiled into a shared object that is then dynamically loaded during the compiler's macro expansion pass. In gccrs, the compiler converts the part of the AST that should be expanded back into tokens, then converts those tokens into procedural macro types, which are very similar to tokens. These types are contained in a TokenStream structure akin to std::vector<Token>.
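The second conversion step can be pictured with a minimal sketch in plain Rust: a flat list of tokens, standing in for the compiler's lexer tokens, is turned into a nested stream in which every delimited sequence becomes a group. `Token`, `TokenTree` and `to_stream` are hypothetical simplifications written for this page. They are neither gccrs's C++ types nor the real proc_macro API, whose types panic when used outside of a procedural macro.

```rust
/// Hypothetical flat token, standing in for a compiler lexer token.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Punct(char),
    Literal(String),
    Open(char),  // '(', '{' or '['
    Close(char), // ')', '}' or ']'
}

/// Simplified tree-shaped token, loosely modelled on proc_macro::TokenTree.
#[derive(Debug, Clone, PartialEq)]
enum TokenTree {
    Ident(String),
    Punct(char),
    Literal(String),
    Group(char, Vec<TokenTree>), // opening delimiter and inner stream
}

/// Converts a flat token list into a nested stream, matching delimiters with a stack.
fn to_stream(tokens: &[Token]) -> Result<Vec<TokenTree>, String> {
    // Each stack entry holds an opening delimiter and the trees collected so far.
    // The bottom entry (' ') is the top-level stream.
    let mut stack: Vec<(char, Vec<TokenTree>)> = vec![(' ', Vec::new())];
    for tok in tokens {
        match tok {
            Token::Ident(s) => stack.last_mut().unwrap().1.push(TokenTree::Ident(s.clone())),
            Token::Punct(c) => stack.last_mut().unwrap().1.push(TokenTree::Punct(*c)),
            Token::Literal(s) => stack.last_mut().unwrap().1.push(TokenTree::Literal(s.clone())),
            Token::Open(c) => stack.push((*c, Vec::new())),
            Token::Close(c) => {
                let expected = match *c {
                    ')' => '(',
                    '}' => '{',
                    ']' => '[',
                    other => return Err(format!("bad delimiter {other}")),
                };
                if stack.len() < 2 || stack.last().unwrap().0 != expected {
                    return Err(format!("unbalanced delimiter {c}"));
                }
                // Close the current group and attach it to its parent stream.
                let (open, inner) = stack.pop().unwrap();
                stack.last_mut().unwrap().1.push(TokenTree::Group(open, inner));
            }
        }
    }
    if stack.len() != 1 {
        return Err("unclosed delimiter".to_string());
    }
    Ok(stack.pop().unwrap().1)
}
```

For `fn f() { x }` this yields two identifiers followed by an empty parenthesized group and a brace group containing `x`, which is the nesting visible in the debug dump shown further down.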

Attribute

This kind of procedural macro can be applied to items, including trait implementations and trait definitions.

#[proc_macro_attribute]
pub fn my_attribute_proc_macro(attr: TokenStream, item: TokenStream) -> TokenStream

Custom derive

Custom derive procedural macros allow a given trait to be implemented automatically. The name under which the derive is invoked, usually the trait's name, is given in the macro definition's attribute.

#[proc_macro_derive(TraitName)]
pub fn my_derive_proc_macro(item: TokenStream) -> TokenStream

The macro may also declare helper attributes (here #[helper]), which it can then read on the annotated item:

#[proc_macro_derive(HelperAttr, attributes(helper))]
pub fn my_derive_proc_macro(item: TokenStream) -> TokenStream

Function-like

#[proc_macro]
pub fn my_function_proc_macro(item: TokenStream) -> TokenStream

Note that even though function-like and attribute-free derive procedural macros share the same function prototype, a function cannot be annotated as both.
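All three kinds exchange token streams, but they differ in what their output replaces. The following self-contained model uses plain strings instead of real token streams to sketch that difference: an attribute macro's output replaces the annotated item, a derive macro's output is appended after the unchanged item, and a function-like macro's output replaces the invocation. `MacroKind` and `expand` are hypothetical names chosen for illustration.

```rust
/// The three kinds of procedural macros (hypothetical model, not a real API).
#[derive(Clone, Copy)]
enum MacroKind {
    Attribute,
    Derive,
    FunctionLike,
}

/// Returns the code left in place of `input` after expansion,
/// where `output` is what the user's macro returned.
fn expand(kind: MacroKind, input: &str, output: &str) -> String {
    match kind {
        // The annotated item is consumed; only the macro output remains.
        MacroKind::Attribute => output.to_string(),
        // The item is kept as-is and the generated code is added after it.
        MacroKind::Derive => format!("{input}\n{output}"),
        // The invocation `name!(...)` is replaced by the output.
        MacroKind::FunctionLike => output.to_string(),
    }
}
```

This is why a derive macro only returns the new code (typically an impl block), while an attribute macro that wants to keep the item must emit it again, as show_content_types does below by returning item.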

Macro priority

Macros are expanded lazily, from the outermost macro to the innermost one. In the following situation:

#[alpha]
#[beta]
pub fn order() -> i32 {
    42
}

alpha sees the #[beta] attribute in its input but not its own attribute, while beta, which is expanded afterwards on alpha's output, never sees alpha.
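The outermost-first rule can be modelled with a small loop. This sketch assumes a simplified `Item` made of an attribute list (outermost first) and a body string. The `mac` closure stands in for the user's attribute macro and returns the replacement item. `Item` and `expand_attributes` are hypothetical names, and real expansion also handles macros that emit several items.

```rust
/// Simplified item: its remaining attributes (outermost first) and its body.
#[derive(Debug)]
struct Item {
    attrs: Vec<String>,
    body: String,
}

/// Expands attribute macros outermost first. Also returns, for each
/// expanded attribute, the attributes still present in the input that
/// its macro received.
fn expand_attributes<F>(mut item: Item, mut mac: F) -> (Item, Vec<(String, Vec<String>)>)
where
    F: FnMut(&str, Item) -> Item,
{
    let mut seen = Vec::new();
    while !item.attrs.is_empty() {
        // The outermost attribute is removed before its macro runs,
        // so a macro never sees its own attribute.
        let name = item.attrs.remove(0);
        seen.push((name.clone(), item.attrs.clone()));
        // Inner attributes are expanded later, on this macro's output.
        item = mac(&name, item);
    }
    (item, seen)
}
```

A consequence of this lazy order is that alpha may remove #[beta] from its output, in which case beta is never expanded at all.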

Multiple derive macros in the same derive attribute, as in the following snippet, are applied from left to right:

#[derive(Gamma, Iota, Mu, Gamma)]
union TUnion {
  toto: usize,
  tata: f32,
}

Because they are part of the same group, Iota sees neither the Mu nor the Gamma invocations, and Gamma is called twice.
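The rule for a single derive attribute can be modelled in a few lines. Every derive macro in the group receives the same input item, the outputs are collected left to right after the item, and a derive listed twice is simply called twice. `expand_derive_group` is a hypothetical helper operating on strings. It sketches the behaviour described above and is not rustc's or gccrs's implementation.

```rust
/// Simplified expansion of one `#[derive(...)]` attribute: `mac` stands in
/// for the derive macros and receives the derive name and the item.
fn expand_derive_group<F>(item: &str, derives: &[&str], mut mac: F) -> Vec<String>
where
    F: FnMut(&str, &str) -> String,
{
    // The original item is kept first; derive outputs are appended after it.
    let mut result = vec![item.to_string()];
    for name in derives {
        // Each call sees only the original item, never another derive's output.
        result.push(mac(name, item));
    }
    result
}
```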

Macro input/output

Macro input must remain valid Rust tokens: it may not be valid Rust code, but it must still be lexable. If we dump the input with the following macro, we'll see that tokens are not plain strings but rather seemingly complex enumerations:

#[proc_macro_attribute]
pub fn show_content_types(_attr: TokenStream, item: TokenStream) -> TokenStream {
    println!("{:?}", item);
    item
}
TokenStream [Ident { ident: "pub", span: #0 bytes(387..390) }, Ident { ident: "fn", span: #0 bytes(391..393) }, Ident { ident: "example", span: #0 bytes(394..401) }, Group { delimiter: Parenthesis, stream: TokenStream [], span: #0 bytes(401..403) }, Group { delimiter: Brace, stream: TokenStream [Ident { ident: "let", span: #0 bytes(410..413) }, Ident { ident: "a", span: #0 bytes(414..415) }, Punct { ch: '=', spacing: Alone, span: #0 bytes(416..417) }, Literal { kind: Float, symbol: "3.14", suffix: Some("f64"), span: #0 bytes(418..425) }, Punct { ch: ';', spacing: Alone, span: #0 bytes(425..426) }], span: #0 bytes(404..428) }]

These types are defined in the proc_macro crate:

  • TokenTree - Tagged union of a Punct, Ident, Group or Literal (see below).
  • Punct - Single punctuation character such as ;, < or =. Several Puncts may form a multi-character operator (e.g. << is made of two < Puncts).
  • Ident - Identifiers such as variable names, true, false, _ and reserved keywords (pub, as, async...).
  • Literal - Literals such as 3.14f64 in the dump above, including their optional suffix.
  • Group - Container for a TokenStream that may be enclosed in delimiters ((), {} or []). A group with no delimiter is valid.
  • TokenStream - Vector-like structure containing multiple TokenTrees.
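The way Puncts combine into operators can be sketched with stand-ins for proc_macro::Punct and proc_macro::Spacing. The real types expose as_char() and spacing(), with Spacing::Joint or Spacing::Alone, as visible in the dump above. In this simplified model, a multi-character operator such as << becomes single-character Puncts where every character except the last is Joint. `split_operator` and `join_operators` are hypothetical helpers, and the last character is assumed to be followed by whitespace, hence Alone.

```rust
/// Mirrors proc_macro::Spacing: `Joint` means the next Punct is glued to this one.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Spacing {
    Alone,
    Joint,
}

/// Simplified stand-in for proc_macro::Punct.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Punct {
    ch: char,
    spacing: Spacing,
}

/// Splits an operator such as `<<` into single-character Puncts.
fn split_operator(op: &str) -> Vec<Punct> {
    let chars: Vec<char> = op.chars().collect();
    chars
        .iter()
        .enumerate()
        .map(|(i, &ch)| Punct {
            ch,
            // Every character except the last is glued to its successor.
            spacing: if i + 1 < chars.len() { Spacing::Joint } else { Spacing::Alone },
        })
        .collect()
}

/// Rebuilds operators by gluing each Joint Punct to the following one.
fn join_operators(puncts: &[Punct]) -> Vec<String> {
    let mut ops = Vec::new();
    let mut current = String::new();
    for p in puncts {
        current.push(p.ch);
        if p.spacing == Spacing::Alone {
            ops.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        ops.push(current);
    }
    ops
}
```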

proc_macro crate

Users interact with these types and their functions directly, which means we cannot break this API. It must stay the same as rustc's, since we want any valid code accepted by rustc to also be accepted by our compiler. There is a major problem, though: this crate is closely tied to rustc, and some of its internal types depend on rustc's implementation.

We cannot work around this, as the crate is used not only by the user's procedural macro but also by rustc itself, which relies on it to define the various types exposed in the API.

Since we cannot use rustc's proc_macro crate directly with gccrs, we need to provide our own proc_macro crate, from which both the compiler and the user's procedural macro can retrieve the type definitions and their associated functions.

The user's procedural macro is written in Rust, so our proc_macro crate must expose Rust functions and types. Our compiler, however, is written in C++ and can only exchange data with Rust code through C-compatible types and functions. That's why we've created a "compatibility layer" through FFI.

The user's procedural macro is statically linked against a proc_macro library written in Rust, which is itself linked against an internal proc_macro library that the compiler can also use.

flowchart LR

U[my user procedural macro] -->|linked against| R[rust interface]
R -->|linked against| C[libproc_macro cpp]
G[gccrs] -->|linked against| C
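The layering above can be pictured with a small, self-contained Rust sketch. To keep it in one file, the "internal" library, which is C++ in gccrs, is played by `extern "C"` Rust functions. The Rust interface wraps the C-compatible type in a safe `Ident` whose constructor requests an allocation and whose destructor hands the buffer back, mirroring the allocation requests in the sequence diagram further down. `FfiIdent`, `ffi_ident_new` and `ffi_ident_free` are hypothetical names; the real FFI types of libproc_macro are not reproduced here.

```rust
/// C-compatible view of an identifier owned by the "internal" library.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct FfiIdent {
    data: *mut u8,
    len: usize,
}

/// "Internal" side (a C++ library in gccrs): allocates a copy of `len` bytes.
///
/// # Safety
/// `data` must point to `len` readable bytes.
pub unsafe extern "C" fn ffi_ident_new(data: *const u8, len: usize) -> FfiIdent {
    let bytes: Box<[u8]> = unsafe { std::slice::from_raw_parts(data, len) }.into();
    let len = bytes.len();
    FfiIdent { data: Box::into_raw(bytes) as *mut u8, len }
}

/// "Internal" side: releases an identifier returned by `ffi_ident_new`.
///
/// # Safety
/// `ident` must come from `ffi_ident_new` and must not be freed twice.
pub unsafe extern "C" fn ffi_ident_free(ident: FfiIdent) {
    let slice = std::ptr::slice_from_raw_parts_mut(ident.data, ident.len);
    drop(unsafe { Box::from_raw(slice) });
}

/// Rust interface side, standing in for the user-visible proc_macro crate.
pub struct Ident {
    raw: FfiIdent,
}

impl Ident {
    pub fn new(name: &str) -> Ident {
        // Request the allocation from the internal library.
        let raw = unsafe { ffi_ident_new(name.as_ptr(), name.len()) };
        Ident { raw }
    }

    pub fn name(&self) -> &str {
        // SAFETY: the buffer is a byte copy of a valid &str and lives until drop.
        unsafe {
            let bytes = std::slice::from_raw_parts(self.raw.data, self.raw.len);
            std::str::from_utf8_unchecked(bytes)
        }
    }
}

impl Drop for Ident {
    fn drop(&mut self) {
        // Give the buffer back to the internal library that allocated it.
        unsafe { ffi_ident_free(self.raw) }
    }
}
```

Keeping allocation and release on the same side of the boundary avoids freeing memory with a different allocator than the one that produced it.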

Right now there is a C++ library named libproc_macro that should be renamed to something such as libproc_macro_internal, while the Rust directory inside it should live on its own under the libproc_macro name.

Currently, some functions in libproc_macro that convert a given string to a TokenStream type, as defined in the proc_macro crate, are unimplemented because they need to lex/parse the string. Implementing them will require splitting the lexer/parser out of gccrs into its own module.
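To see why a lexer is needed, here is a toy version of such a conversion in Rust. The real proc_macro::TokenStream implements FromStr, so a macro can write "let a = 42;".parse::<TokenStream>(). The `Stream` and `Tok` types below are hypothetical stand-ins with a deliberately tiny lexer (identifiers, integer literals, single punctuation characters, no groups), whereas gccrs would delegate this work to its real lexer. As noted earlier for macro input, the result only has to be lexable, not valid Rust.

```rust
use std::str::FromStr;

/// Toy token, standing in for proc_macro's TokenTree leaves (no groups here).
#[derive(Debug, PartialEq)]
enum Tok {
    Ident(String),
    Literal(String),
    Punct(char),
}

/// Toy stream; only illustrates where a lexer is required.
#[derive(Debug, PartialEq)]
struct Stream(Vec<Tok>);

impl FromStr for Stream {
    type Err = String;

    fn from_str(src: &str) -> Result<Stream, String> {
        let mut toks = Vec::new();
        let mut chars = src.chars().peekable();
        while let Some(&c) = chars.peek() {
            if c.is_whitespace() {
                chars.next();
            } else if c.is_alphabetic() || c == '_' {
                // Identifier or keyword: letters, digits and underscores.
                let mut s = String::new();
                while let Some(&c) = chars.peek() {
                    if c.is_alphanumeric() || c == '_' { s.push(c); chars.next(); } else { break; }
                }
                toks.push(Tok::Ident(s));
            } else if c.is_ascii_digit() {
                // Integer literal only; no floats, suffixes or strings.
                let mut s = String::new();
                while let Some(&c) = chars.peek() {
                    if c.is_ascii_digit() { s.push(c); chars.next(); } else { break; }
                }
                toks.push(Tok::Literal(s));
            } else if c.is_ascii_punctuation() {
                toks.push(Tok::Punct(c));
                chars.next();
            } else {
                return Err(format!("unexpected character {c:?}"));
            }
        }
        Ok(Stream(toks))
    }
}
```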

The final organization might therefore look more like this:

flowchart LR

U([my user procedural macro]) -->|linked against| R[rust interface]
R -->|linked against| C[libproc_macro cpp]
G[gccrs] -->|linked against| C
G -->|dynamically load| U

Interactions

  • The compiler interacts with a user procedural macro through calls to dlopen and dlsym.
  • The user's procedural macro interacts with proc_macro types through rust calls to the libproc_macro library.
  • The proc_macro library, acting as an interface, interacts with the "internal" procedural macro library through FFI.
  • The internal proc_macro library interacts with the parser/lexer through C++ function calls.
sequenceDiagram

GCCRS->>myprocmacro: Send a tokenstream
Note right of GCCRS: Transfer using dlopen
myprocmacro-->>GCCRS: Give back a tokenstream
loop CodeExpansion
    myprocmacro->>myprocmacro: Process tokenstream and construct output tokenstream
    libproc_macro->>myprocmacro: Provides tokenstream and other rust types 
end

loop TokenConversion
    GCCRS->>GCCRS: Convert tokens to libproc_macro types back and forth
    libproc_macro_internal->>GCCRS: Provides cpp types for dlopen mechanism
end

libproc_macro->>libproc_macro_internal: Request allocations
libproc_macro_internal-->>libproc_macro: Return allocated type

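The first interaction listed above, the compiler reaching into a procedural macro through dlopen and dlsym, can be sketched in Rust, standing in for gccrs's C++ side. The example loads the running program itself (a null path) and looks up the C library's strlen. A real procedural macro shared object and its exported symbol names are outside the scope of this page. `call_symbol` is a hypothetical helper, and the RTLD_NOW value of 2 is the one used by glibc and musl on Linux.

```rust
use std::ffi::{c_void, CStr};
use std::os::raw::{c_char, c_int};

// Declarations of the POSIX dynamic-loading functions from <dlfcn.h>.
extern "C" {
    fn dlopen(filename: *const c_char, flag: c_int) -> *mut c_void;
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
    fn dlclose(handle: *mut c_void) -> c_int;
}

// Value of RTLD_NOW on Linux (glibc and musl): resolve all symbols at load time.
const RTLD_NOW: c_int = 2;

/// Opens the object at `path` (or the running program when `path` is None),
/// looks up `symbol`, calls it as `size_t f(const char *)` and closes the handle.
fn call_symbol(path: Option<&CStr>, symbol: &CStr, arg: &CStr) -> Option<usize> {
    unsafe {
        let handle = dlopen(path.map_or(std::ptr::null(), |p| p.as_ptr()), RTLD_NOW);
        if handle.is_null() {
            return None;
        }
        let sym = dlsym(handle, symbol.as_ptr());
        let result = if sym.is_null() {
            None
        } else {
            // The caller must know the real signature of the looked-up symbol.
            let f: extern "C" fn(*const c_char) -> usize = std::mem::transmute(sym);
            Some(f(arg.as_ptr()))
        };
        dlclose(handle);
        result
    }
}
```

In the real flow the path would point to the compiled procedural macro crate, and the looked-up symbol would give access to its macro entry points rather than to a C library function.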