Skip to content

Commit

Permalink
fix: don't require system Python to perform bootstrapping (#1929)
Browse files Browse the repository at this point in the history
This is a pretty major, but surprisingly not that invasive, overhaul of
how binaries
are started. It fixes several issues and lays ground work for future
improvements.

In brief:

* A system Python is no longer needed to perform bootstrapping.
* Errors due to `PYTHONPATH` exceeding environment variable size limits
is no
  longer an issue.
* Coverage integration is now cleaner and more direct.
* The zipapp `__main__.py` entry point generation is separate from the
Bazel
  binary bootstrap generation.
* Self-executable zips now have actual bootstrap logic.

The way all of this is accomplished is using a two stage bootstrap
process. The first
stage is responsible for locating the interpreter, and the second stage
is responsible
for configuring the runtime environment (e.g. import paths). This allows
the first
stage to be relatively simple (basically find a file in runfiles), so
implementing it
in cross-platform shell is feasible. The second stage, because it's
running under the
desired interpreter, can then do things like setting up import paths,
and use the
`runpy` module to call the program's real main.

This also fixes the issue of long `PYTHONPATH` environment variables
causing an error.
Instead of passing the import paths using an environment variable, they
are embedded
into the second stage bootstrap, which can then add them to sys.path.

This also switches from running coverage as a subprocess to using its
APIs directly.
This is possible because of the second stage bootstrap, which can rely
on
`import coverage` occurring in the correct environment.

This new bootstrap method is disabled by default. It can be enabled by
setting
`--@rules_python//python/config_settings:bootstrap_impl=two_stage`. Once
the new APIs
are released, a subsequent release will make it the default. This is to
allow easier
upgrades for people defining their own toolchains.

The two-stage bootstrap ignores errors during lcov report generation,
which
partially addresses
#1434

Fixes #691

* Also fixes some doc cross references.
* Also fixes the autodetecting toolchain and directs our alias to it
  • Loading branch information
rickeylev authored Jun 2, 2024
1 parent b4b52fc commit f5b19dc
Show file tree
Hide file tree
Showing 27 changed files with 1,440 additions and 81 deletions.
15 changes: 14 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
:::{default-domain} bzl
:::

# rules_python Changelog

This is a human-friendly changelog in a keepachangelog.com style format.
Expand Down Expand Up @@ -31,7 +34,7 @@ A brief description of the categories of changes:
marked as `reproducible` and will not include any lock file entries from now
on.

* (gazelle): Remove gazelle plugin's python deps and make it hermetic.
* (gazelle): Remove gazelle plugin's python deps and make it hermetic.
Introduced a new Go-based helper leveraging tree-sitter for syntax analysis.
Implemented the use of `pypi/stdlib-list` for standard library module verification.

Expand Down Expand Up @@ -80,6 +83,16 @@ A brief description of the categories of changes:
invalid usage previously but we were not failing the build. From now on this
is explicitly disallowed.
* (toolchains) Added riscv64 platform definition for python toolchains.
* (rules) A new bootstrap implementation that doesn't require a system Python
is available. It can be enabled by setting
{obj}`--@rules_python//python:config_settings:bootstrap_impl=two_phase`. It
will become the default in a subsequent release.
([#691](https://github.com/bazelbuild/rules_python/issues/691))
* (providers) `PyRuntimeInfo` has two new attributes:
{obj}`PyRuntimeInfo.stage2_bootstrap_template` and
{obj}`PyRuntimeInfo.zip_main_template`.
* (toolchains) A replacement for the Bazel-builtn autodetecting toolchain is
available. The `//python:autodetecting_toolchain` alias now uses it.

[precompile-docs]: /precompiling

Expand Down
1 change: 1 addition & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -175,6 +175,7 @@ Issues should be triaged as follows:
functionality, should also be filed in this repository but without the
`core-rules` label.

(breaking-changes)=
## Breaking Changes

Breaking changes are generally permitted, but we follow a 3-step process for
Expand Down
31 changes: 31 additions & 0 deletions docs/sphinx/api/python/config_settings/index.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
:::{default-domain} bzl
:::
:::{bzl:currentfile} //python/config_settings:BUILD.bazel
:::

Expand Down Expand Up @@ -66,3 +68,32 @@ Values:
* `include_pyc`: Include `PyInfo.transitive_pyc_files` as part of the binary.
* `disabled`: Don't include `PyInfo.transitive_pyc_files` as part of the binary.
:::

::::{bzl:flag} bootstrap_impl
Determine how programs implement their startup process.

Values:
* `system_python`: Use a bootstrap that requires a system Python available
in order to start programs. This requires
{obj}`PyRuntimeInfo.bootstrap_template` to be a Python program.
* `script`: Use a bootstrap that uses an arbitrary executable script (usually a
shell script) instead of requiring it be a Python program.

:::{note}
The `script` bootstrap requires the toolchain to provide the `PyRuntimeInfo`
provider from `rules_python`. This loosely translates to using Bazel 7+ with a
toolchain created by rules_python. Most notably, WORKSPACE builds default to
using a legacy toolchain built into Bazel itself which doesn't support the
script bootstrap. If not available, the `system_python` bootstrap will be used
instead.
:::

:::{seealso}
{obj}`PyRuntimeInfo.bootstrap_template` and
{obj}`PyRuntimeInfo.stage2_bootstrap_template`
:::

:::{versionadded} 0.33.0
:::

::::
20 changes: 20 additions & 0 deletions docs/sphinx/api/python/index.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
:::{default-domain} bzl
:::
:::{bzl:currentfile} //python:BUILD.bazel
:::

Expand All @@ -21,3 +23,21 @@ provides:
* `PyRuntimeInfo`: The consuming target's target toolchain information

:::

::::{target} autodetecting_toolchain

A simple toolchain that simply uses `python3` from the runtime environment.

Note that this toolchain provides no build-time information, which makes it of
limited utility.

This is only provided to aid migration off the builtin Bazel toolchain
(`@bazel_tools//python:autodetecting_toolchain`), and is largely only applicable
to WORKSPACE builds.

:::{deprecated} unspecified

Switch to using a hermetic toolchain or manual toolchain configuration instead.
:::

::::
5 changes: 3 additions & 2 deletions docs/sphinx/bazel_inventory.txt
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ bool bzl:type 1 rules/lib/bool -
int bzl:type 1 rules/lib/int -
depset bzl:type 1 rules/lib/depset -
dict bzl:type 1 rules/lib/dict -
label bzl:doc 1 concepts/labels -
label bzl:type 1 concepts/labels -
attr.bool bzl:type 1 rules/lib/toplevel/attr#bool -
attr.int bzl:type 1 rules/lib/toplevel/attr#int -
attr.label bzl:type 1 rules/lib/toplevel/attr#label -
Expand All @@ -21,6 +21,7 @@ list bzl:type 1 rules/lib/list -
python bzl:doc 1 reference/be/python -
str bzl:type 1 rules/lib/string -
struct bzl:type 1 rules/lib/builtins/struct -
target-name bzl:doc 1 concepts/labels#target-names -
Name bzl:type 1 concepts/labels#target-names -
CcInfo bzl:provider 1 rules/lib/providers/CcInfo -
CcInfo.linking_context bzl:provider-field 1 rules/lib/providers/CcInfo#linking_context -
ToolchainInfo bzl:type 1 rules/lib/providers/ToolchainInfo.html -
2 changes: 1 addition & 1 deletion docs/sphinx/pip.md
Original file line number Diff line number Diff line change
Expand Up @@ -150,7 +150,7 @@ ARG=$1 # but we don't do anything with it as it's always "get"
# formatting is optional
echo '{'
echo ' "headers": {'
echo ' "Authorization": ["Basic dGVzdDoxMjPCow=="]
echo ' "Authorization": ["Basic dGVzdDoxMjPCow=="]'
echo ' }'
echo '}'
```
Expand Down
3 changes: 2 additions & 1 deletion docs/sphinx/support.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,8 @@ incremental fashion.

Breaking changes are allowed, but follow a process to introduce them over
a series of releases to so users can still incrementally upgrade. See the
[Breaking Changes](contributing#breaking-changes) doc for the process.
[Breaking Changes](#breaking-changes) doc for the process.


## Experimental Features

Expand Down
23 changes: 22 additions & 1 deletion docs/sphinx/toolchains.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
:::{default-domain} bzl
:::

# Configuring Python toolchains and runtimes

This documents how to configure the Python toolchain and runtimes for different
Expand Down Expand Up @@ -193,7 +196,7 @@ load("@rules_python//python:repositories.bzl", "py_repositories")
py_repositories()
```

#### Workspace toolchain registration
### Workspace toolchain registration

To register a hermetic Python toolchain rather than rely on a system-installed interpreter for runtime execution, you can add to the `WORKSPACE` file:

Expand Down Expand Up @@ -221,3 +224,21 @@ pip_parse(
After registration, your Python targets will use the toolchain's interpreter during execution, but a system-installed interpreter
is still used to 'bootstrap' Python targets (see https://github.com/bazelbuild/rules_python/issues/691).
You may also find some quirks while using this toolchain. Please refer to [python-build-standalone documentation's _Quirks_ section](https://gregoryszorc.com/docs/python-build-standalone/main/quirks.html).

## Autodetecting toolchain

The autodetecting toolchain is a deprecated toolchain that is built into Bazel.
It's name is a bit misleading: it doesn't autodetect anything. All it does is
use `python3` from the environment a binary runs within. This provides extremely
limited functionality to the rules (at build time, nothing is knowable about
the Python runtime).

Bazel itself automatically registers `@bazel_tools//python:autodetecting_toolchain`
as the lowest priority toolchain. For WORKSPACE builds, if no other toolchain
is registered, that toolchain will be used. For bzlmod builds, rules_python
automatically registers a higher-priority toolchain; it won't be used unless
there is a toolchain misconfiguration somewhere.

To aid migration off the Bazel-builtin toolchain, rules_python provides
{obj}`@rules_python//python:autodetecting_toolchain`. This is an equivalent
toolchain, but is implemented using rules_python's objects.
44 changes: 38 additions & 6 deletions examples/bzlmod/test.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@

import os
import pathlib
import re
import sys
import unittest

Expand Down Expand Up @@ -63,16 +64,47 @@ def test_coverage_sys_path(self):
first_item.endswith("coverage"),
f"Expected the first item in sys.path '{first_item}' to not be related to coverage",
)

# We're trying to make sure that the coverage library added by the
# toolchain is _after_ any user-provided dependencies. This lets users
# override what coverage version they're using.
first_coverage_index = None
last_user_dep_index = None
for i, path in enumerate(sys.path):
if re.search("rules_python.*~pip~", path):
last_user_dep_index = i
if first_coverage_index is None and re.search(
".*rules_python.*~python~.*coverage.*", path
):
first_coverage_index = i

if os.environ.get("COVERAGE_MANIFEST"):
self.assertIsNotNone(
first_coverage_index,
"Expected to find toolchain coverage, but "
+ f"it was not found.\nsys.path:\n{all_paths}",
)
self.assertIsNotNone(
first_coverage_index,
"Expected to find at least one uiser dep, "
+ "but none were found.\nsys.path:\n{all_paths}",
)
# we are running under the 'bazel coverage :test'
self.assertTrue(
"_coverage" in last_item,
f"Expected {last_item} to be related to coverage",
self.assertGreater(
first_coverage_index,
last_user_dep_index,
"Expected coverage provided by the toolchain to be after "
+ "user provided dependencies.\n"
+ f"Found coverage at index: {first_coverage_index}\n"
+ f"Last user dep at index: {last_user_dep_index}\n"
+ f"Full sys.path:\n{all_paths}",
)
self.assertEqual(pathlib.Path(last_item).name, "coverage")
else:
self.assertFalse(
"coverage" in last_item, f"Expected coverage tooling to not be present"
self.assertIsNone(
first_coverage_index,
"Expected toolchain coverage to not be present\n"
+ f"Found coverage at index: {first_coverage_index}\n"
+ f"Full sys.path:\n{all_paths}",
)

def test_main(self):
Expand Down
8 changes: 3 additions & 5 deletions python/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ that @rules_python//python is only concerned with the core rules.
"""

load("@bazel_skylib//:bzl_library.bzl", "bzl_library")
load("//python/private:autodetecting_toolchain.bzl", "define_autodetecting_toolchain")
load(":current_py_toolchain.bzl", "current_py_toolchain")

package(default_visibility = ["//visibility:public"])
Expand Down Expand Up @@ -318,14 +319,11 @@ toolchain_type(
# safe if you know for a fact that your build is completely compatible with the
# version of the `python` command installed on the target platform.

alias(
name = "autodetecting_toolchain",
actual = "@bazel_tools//tools/python:autodetecting_toolchain",
)
define_autodetecting_toolchain(name = "autodetecting_toolchain")

alias(
name = "autodetecting_toolchain_nonstrict",
actual = "@bazel_tools//tools/python:autodetecting_toolchain_nonstrict",
actual = ":autodetecting_toolchain",
)

# ========= Packaging rules =========
Expand Down
9 changes: 9 additions & 0 deletions python/config_settings/BUILD.bazel
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
load("@bazel_skylib//rules:common_settings.bzl", "string_flag")
load(
"//python/private:flags.bzl",
"BootstrapImplFlag",
"PrecompileAddToRunfilesFlag",
"PrecompileFlag",
"PrecompileSourceRetentionFlag",
Expand Down Expand Up @@ -52,3 +53,11 @@ string_flag(
# NOTE: Only public because its an implicit dependency
visibility = ["//visibility:public"],
)

string_flag(
name = "bootstrap_impl",
build_setting_default = BootstrapImplFlag.SYSTEM_PYTHON,
values = sorted(BootstrapImplFlag.__members__.values()),
# NOTE: Only public because its an implicit dependency
visibility = ["//visibility:public"],
)
45 changes: 45 additions & 0 deletions python/private/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -376,9 +376,54 @@ exports_files(
visibility = ["//visibility:public"],
)

filegroup(
name = "stage1_bootstrap_template",
srcs = ["stage1_bootstrap_template.sh"],
# Not actually public. Only public because it's an implicit dependency of
# py_runtime.
visibility = ["//visibility:public"],
)

filegroup(
name = "stage2_bootstrap_template",
srcs = ["stage2_bootstrap_template.py"],
# Not actually public. Only public because it's an implicit dependency of
# py_runtime.
visibility = ["//visibility:public"],
)

filegroup(
name = "zip_main_template",
srcs = ["zip_main_template.py"],
# Not actually public. Only public because it's an implicit dependency of
# py_runtime.
visibility = ["//visibility:public"],
)

# NOTE: Windows builds don't use this bootstrap. Instead, a native Windows
# program locates some Python exe and runs `python.exe foo.zip` which
# runs the __main__.py in the zip file.
alias(
name = "bootstrap_template",
actual = select({
":is_script_bootstrap_enabled": "stage1_bootstrap_template.sh",
"//conditions:default": "python_bootstrap_template.txt",
}),
# Not actually public. Only public because it's an implicit dependency of
# py_runtime.
visibility = ["//visibility:public"],
)

# Used to determine the use of `--stamp` in Starlark rules
stamp_build_setting(name = "stamp")

config_setting(
name = "is_script_bootstrap_enabled",
flag_values = {
"//python/config_settings:bootstrap_impl": "script",
},
)

print_toolchains_checksums(name = "print_toolchains_checksums")

# Used for py_console_script_gen rule
Expand Down
2 changes: 1 addition & 1 deletion python/private/autodetecting_toolchain.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ def define_autodetecting_toolchain(name):
# buildifier: disable=native-py
py_runtime(
name = "_autodetecting_py3_runtime",
interpreter = ":py3wrapper.sh",
interpreter = "//python/private:autodetecting_toolchain_interpreter.sh",
python_version = "PY3",
stub_shebang = "#!/usr/bin/env python3",
visibility = ["//visibility:private"],
Expand Down
4 changes: 3 additions & 1 deletion python/private/common/common.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -182,7 +182,7 @@ def create_cc_details_struct(
cc_toolchain = cc_toolchain,
)

def create_executable_result_struct(*, extra_files_to_build, output_groups):
def create_executable_result_struct(*, extra_files_to_build, output_groups, extra_runfiles = None):
"""Creates a `CreateExecutableResult` struct.
This is the return value type of the semantics create_executable function.
Expand All @@ -192,13 +192,15 @@ def create_executable_result_struct(*, extra_files_to_build, output_groups):
included as default outputs.
output_groups: dict[str, depset[File]]; additional output groups that
should be returned.
extra_runfiles: A runfiles object of additional runfiles to include.
Returns:
A `CreateExecutableResult` struct.
"""
return struct(
extra_files_to_build = extra_files_to_build,
output_groups = output_groups,
extra_runfiles = extra_runfiles,
)

def union_attrs(*attr_dicts, allow_none = False):
Expand Down
Loading

0 comments on commit f5b19dc

Please sign in to comment.