Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding CrateId Metadata Hash method for Bazel #365

Merged
merged 17 commits into from
Sep 13, 2024

Conversation

irajtaghlidi
Copy link
Contributor

@irajtaghlidi irajtaghlidi commented Aug 14, 2024

The StableCrateId is a 64-bit hash used to identify different crates with potentially the same name. It is a hash of the crate name and all the -C metadata CLI options computed in StableCrateId::new. It is used in a variety of places, such as symbol name mangling, crate loading, and much more.

By default, all Rust symbols are mangled and incorporate the stable crate id. This allows multiple versions of the same crate to be included together. Cargo automatically generates -C metadata hashes based on a variety of factors, like the package version, source, and the target kind (a lib and test can have the same crate name, so they need to be disambiguated).

In Cargo, we can find this hash value as part of OUT_DIR value. Like ./target/debug/build/mbedtls-664cfd9779753e4e/out/.
This hash value would be used to provide to rustc later like:
LD_LIBRARY_PATH='/home/iraj/playground/target/debug/deps:/usr/lib' OUT_DIR=/home/iraj/playground/target/debug/build/mbedtls-platform-support-61808abfaac0885d/out rustc --crate-name mbedtls_platform_support --edition=2018 /home/iraj/.cargo/git/checkouts/rust-mbedtls-d7cb4be7371ceda7/b3a6f77/mbedtls-platform-support/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C embed-bitcode=no -C debuginfo=2 --cfg 'feature="std"' -C metadata=7124dd98a051d608 ......

Since Bazel acts differently we need to generate a similar hash value to provide to rustc. Also, we detect the build system by comparing the OUT_DIR value.

To be able to build rust-mbedtls with Bazel we need more changes that will register in the following PRs and this PR is just focused on solving the CrateId Metadata Hash issue.

@Pagten
Copy link
Collaborator

Pagten commented Aug 14, 2024

@irajtaghlidi can you add a description to the PR, describing what changes are being made (at a high level) and why they are being made.

Copy link
Collaborator

@Taowyoo Taowyoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@irajtaghlidi
Thank you for creating this PR.
As @Pagten mentioned, could you add summary of changes in this PR in PR description?

Also, I want to learn more about the context? I assume these changes are used for build this crate with Bazel. Could you provide some links to document about it.

mbedtls/build/crate_id.rs Outdated Show resolved Hide resolved
mbedtls/build/crate_id.rs Outdated Show resolved Hide resolved
mbedtls/build/build.rs Outdated Show resolved Hide resolved
@Pagten
Copy link
Collaborator

Pagten commented Aug 16, 2024

Context

Here's some additional context information about this PR.

rustc takes a -Cmetadata argument. From this argument and other properties, rustc calculates a "Stable Crate Id" hash, which it uses for a variety of things, including symbol name mangling.

Cargo generates the -Cmetadata argument for rustc based on a variety of factors, like the package version, source, and the target kind.

Rust-mbedtls defines a couple of symbols in C code. To allow a crate to depend on multiple versions of rust-mbedtls, all symbols need to be unique. Rust-mbedtls ensures this uniqueness for the C symbols by (at build time, using its build script) fetching the -Cmetadata value generated by Cargo and appending that value to the C code symbol names.

Unfortunately, there is no "official" way for a build script to get the -Cmetadata value from Cargo. To work around this, the rust-mbedtls build script extracts the medata value from the crate's directory path in the target directory, via the OUT_DIR env variable. That path contains a component of the form mbedtls-{metadata-hash}, from which the {metadata-hash} component is extracted.

However, because the structure of the OUT_DIR path is not an official part of the interface between Cargo and build scripts:

  1. The path structure can change in future versions of Cargo.
  2. Bazel's rules_rust does not use the same path structure.

This is one of the reasons rust-mbedtls currently cannot be compiled using Bazel and that is what this PR tries to address.

Approach of the PR

The current approach of the PR is to have the rust-mbedtls build script fall back to a different way of obtaining a -Cmetadata value when it detects that it is being compiled using Bazel. The way that the value is being calculated is:

  1. Grab the compiler version using rustc_version::version().
  2. Grab the package version using the CARGO_PKG_VERSION env var.
  3. Use a copy of the StableCrateId algorithm from rustc to hash those two components.

Review comments

I think the use of StableCrateId (step 3 above) does not really make sense. The -Cmetadata value that the build script is trying to obtain is an input to the stable crate ID generated by rustc, not an output thereof. Also, it is not important to end up with exactly the same -Cmetadata for a Bazel build as the one that Cargo would generate, so there's no need to add complexity to the build script for trying to mimic Cargo's metadata value generation algorithm.

The metadata value that is generated in the current version of the PR is based only on the compiler version and the crate version. The value generated by Cargo is based on a lot more components: the crate name, the source of the crate (crates.io or otherwise), enabled feature flags, metadata of the crate's dependencies, compilation profile, compilation target, etc. If the goal is to generate a generic metadata value that can be used for various purposes, all of these components need to be included into it. But the rust-mbedtls build script only uses the metadata value for a specific purpose: to make the C code symbols unique, to allow multiple versions of the crate to be included in the same binary. Hence it can be okay to base the symbol suffix on fewer components.

Cargo's dependency resolution algorithm unifies dependencies that have the same crate name, a semver-compatible version and the same source (the source is, e.g., crates.io, git, local, ...). So, for Cargo builds, it would be sufficient to base the symbol suffix on the crate name, crate version and the crate's source.

Rules_rust's dependency resolution algorithm is slightly different from that of Cargo. It does not take the source of a crate into account for dependency unification. This means it is not possible to have, for example, a version of a crate from crates.io in the dependency graph together with the same version of the same crate from github (this may be considered a bug/limitation of rules_rust). So, for Bazel builds, it is sufficient to base the symbol suffix on the crate name and crate version.

Therefore, I propose we make the following changes:

  1. We rename the RUST_MBEDTLS_METADATA_HASH env var to something more specific, like RUST_MBEDTLS_SYMBOL_SUFFIX.
  2. For Cargo builds, we keep the existing implementation. I.e., we keep using the metadata value extracted from the OUT_DIR path as the value for RUST_MBEDTLS_SYMBOL_SUFFIX.
  3. For Bazel builds, we hash the value of the CARGO_PKG_NAME and CARGO_PKG_VERSION env variables (e.g., using DefaultHasher), convert the output to a valid symbol suffix (e.g., by formatting the u64 output of the hasher as hex) and use that as the value for RUST_MBEDTLS_SYMBOL_SUFFIX.

Happy to hear your thought on this, @irajtaghlidi and other reviewers.

@jethrogb
Copy link
Member

jethrogb commented Aug 16, 2024

I agree with Pieter on the shortcomings mentioned.

Can you add a comment in mbedtls_malloc.c why these namespaced forward functions are needed? I don't really remember, but I think it's because mbedtls_calloc and friends are C macros that can't be properly resolved by bindgen.

mbedtls/build.rs Outdated Show resolved Hide resolved
mbedtls/build.rs Outdated Show resolved Hide resolved
mbedtls/build.rs Outdated Show resolved Hide resolved
mbedtls/build.rs Outdated Show resolved Hide resolved
@irajtaghlidi irajtaghlidi force-pushed the bazel-master branch 2 times, most recently from 0aabd80 to bbc892c Compare August 27, 2024 09:25
mbedtls/build.rs Outdated Show resolved Hide resolved
@Pagten
Copy link
Collaborator

Pagten commented Aug 27, 2024

I agree with Pieter on the shortcomings mentioned.

Can you add a comment in mbedtls_malloc.c why these namespaced forward functions are needed? I don't really remember, but I think it's because mbedtls_calloc and friends are C macros that can't be properly resolved by bindgen.

Indeed, depending on how mbedtls is configured, mbedtls_calloc and mbedtls_free are either proper symbols or are macros that evaluate to calloc and free respectively. I'm not exactly sure why, but Bindgen indeed doesn't generate any bindings for these macros. There are various open issues for Bindgen related to macros.

That said, does rust-mbedtls really support mbedtls configurations where mbedtls_calloc and mbedtls_free do not evaluate to calloc and free? If not, then couldn't we just have this in alloc.rs?

extern "C" {
    #[link_name = "free"]
    pub(crate) fn mbedtls_free(n: *mut mbedtls_sys::types::raw_types::c_void);

    #[link_name = "calloc"]
    pub(crate) fn mbedtls_calloc(
        n: mbedtls_sys::types::size_t,
        size: mbedtls_sys::types::size_t,
    ) -> *mut mbedtls_sys::types::raw_types::c_void;
}

Cargo.lock Outdated Show resolved Hide resolved
Cargo.lock Outdated Show resolved Hide resolved
Pagten
Pagten previously approved these changes Sep 4, 2024
@irajtaghlidi
Copy link
Contributor Author

bors r+

@Pagten
Copy link
Collaborator

Pagten commented Sep 4, 2024

bors r+

@Pagten
Copy link
Collaborator

Pagten commented Sep 4, 2024

@Taowyoo I see you've merge all recent PRs on this repo. Aren't we using bors anymore?

@Pagten
Copy link
Collaborator

Pagten commented Sep 4, 2024

Ah I see the bors app is no longer running: https://github.com/apps/bors .

@Taowyoo can you merge this PR to master?

@jethrogb
Copy link
Member

jethrogb commented Sep 4, 2024

That said, does rust-mbedtls really support mbedtls configurations where mbedtls_calloc and mbedtls_free do not evaluate to calloc and free?

That is the intention, yes.

mbedtls/build.rs Outdated
let crate_ = out_dir_it.next().unwrap().to_string_lossy();
assert!(crate_.starts_with("mbedtls-"));
return crate_[8..].to_owned();
} else if next == "_bs.out_dir" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check correct?

When I do a build using Bazel, my OUT_DIR env var is set as, e.g.,:

OUT_DIR='${pwd}/bazel-out/k8-fastbuild/bin/external/crate_index__mbedtls-sys-auto-2.26.1/mbedtls-sys-auto_bs.out_dir'

There is no _bs.out_dir component in that path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never seen that format. In my setup with the latest version of bazel/Rules_rust the OUT_DIR is in this format:
OUT_DIR='${pwd}/bazel-out/k8-fastbuild/bin/external/crate_index__mbedtls-0.12.3/_bs.out_dir' \
Could you please provide your WORKSPACE/Branch to reproduce it?

@Pagten Pagten self-requested a review September 4, 2024 14:00
@Pagten Pagten dismissed their stale review September 4, 2024 14:01

Remove approval

@Taowyoo Taowyoo enabled auto-merge (squash) September 4, 2024 16:53
@Taowyoo
Copy link
Collaborator

Taowyoo commented Sep 4, 2024

Hi @Pagten , I guess there are still some comments need to be resolved?
After you approve this PR, you should be able to enable auto-merge(CI is running) or merge the PR (CI is passed)

@Taowyoo
Copy link
Collaborator

Taowyoo commented Sep 4, 2024

@Taowyoo I see you've merge all recent PRs on this repo. Aren't we using bors anymore?

Yes, bors are removed in all active repo we maintained, as well as Travis CI

mbedtls/build.rs Outdated
Comment on lines 35 to 37
let path_to_hash = Path::new(out_dir_str.as_ref());
let parts_to_hash = path_to_hash
.to_str()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partially left from the previous method. Fixed now.

mbedtls/build.rs Outdated
Comment on lines 26 to 46
let last_part = out_dir_it.next().unwrap();

if last_part == "out" {
// If Cargo is used as build system.
let crate_ = out_dir_it.next().unwrap().to_string_lossy();
assert!(crate_.starts_with("mbedtls-"), "Expected directory to start with 'mbedtls-'");
return crate_[8..].to_owned(); // Return the part after "mbedtls-"
} else if out_dir_str.contains("bazel-out") {
// If Bazel is used as build system.
let parts_to_hash = out_dir_str
.split("bazel-out")
.nth(1)
.expect("Invalid Bazel out dir structure");

let string_to_hash = format!("bazel-out{}", parts_to_hash);
let mut hasher = DefaultHasher::new();
string_to_hash.hash(&mut hasher);
let hash = hasher.finish();
return format!("{:x}", hash);
} else {
panic!("Unexpected directory structure: {:?}", out_dir_str);
Copy link
Member

@jethrogb jethrogb Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let last_part = out_dir_it.next().unwrap();
if last_part == "out" {
// If Cargo is used as build system.
let crate_ = out_dir_it.next().unwrap().to_string_lossy();
assert!(crate_.starts_with("mbedtls-"), "Expected directory to start with 'mbedtls-'");
return crate_[8..].to_owned(); // Return the part after "mbedtls-"
} else if out_dir_str.contains("bazel-out") {
// If Bazel is used as build system.
let parts_to_hash = out_dir_str
.split("bazel-out")
.nth(1)
.expect("Invalid Bazel out dir structure");
let string_to_hash = format!("bazel-out{}", parts_to_hash);
let mut hasher = DefaultHasher::new();
string_to_hash.hash(&mut hasher);
let hash = hasher.finish();
return format!("{:x}", hash);
} else {
panic!("Unexpected directory structure: {:?}", out_dir_str);
let mut out_dir_it_rev = out_dir.iter().rev();
let mut out_dir_it = out_dir.iter();
if out_dir_it_rev.next().map_or(false, |p| p == "out") {
// If Cargo is used as build system.
let crate_ = out_dir_it_rev.next().unwrap().to_string_lossy();
assert!(crate_.starts_with("mbedtls-"), "Expected directory to start with 'mbedtls-'");
return crate_[8..].to_owned(); // Return the part after "mbedtls-"
} else if out_dir_it.position(|p| p == "bazel-out").is_some() {
// If Bazel is used as build system.
let mut hasher = DefaultHasher::new();
for p in out_dir_it {
p.hash(&mut hasher);
}
let hash = hasher.finish();
return format!("{:016x}", hash);
} else {
panic!("unexpected OUT_DIR format: {}", out_dir.to_string_lossy());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better implementation. Thanks.

# Configuration: c2b2998e956e739e6041daea0eaadcf22e7f68c3e819bfbfc43c92b444190896
# Execution platform: //:docker_image_platform
[237 / 238] Compiling Rust bin service1 (1 files); 0s remote, remote-cache
[237 / 238] Compiling Rust bin service1 (1 files); 10s remote, remote-cache
[237 / 238] Compiling Rust bin service1 (1 files); Downloading service1/service1; 16s remote, remote-cache
INFO: Found 4 targets...
INFO: Elapsed time: 62.137s, Critical Path: 53.13s
INFO: 10 processes: 1 internal, 9 remote.
INFO: Build completed successfully, 10 total actions

@Taowyoo Taowyoo merged commit 7116abe into fortanix:master Sep 13, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants