From ad24cc332d81b0f6f50798b2a3947a9fc0029ea2 Mon Sep 17 00:00:00 2001
From: Pablo Galindo Salgado <Pablogsal@gmail.com>
Date: Mon, 9 Dec 2024 00:19:05 +0000
Subject: [PATCH] PEP 768: Safe external debugger interface for CPython (#4158)

---
 peps/pep-0768.rst | 351 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 351 insertions(+)
 create mode 100644 peps/pep-0768.rst

diff --git a/peps/pep-0768.rst b/peps/pep-0768.rst
new file mode 100644
index 00000000000..6d347ba3f25
--- /dev/null
+++ b/peps/pep-0768.rst
@@ -0,0 +1,351 @@
+PEP: 768
+Title: Safe external debugger interface for CPython
+Author: Pablo Galindo Salgado <pablogsal@python.org>, Matt Wozniski <godlygeek@gmail.com>, Ivona Stojanovic <stojanovic.i@hotmail.com>
+Status: Draft
+Type: Standards Track
+Created: 25-Nov-2024
+Python-Version: 3.14
+
+Abstract
+========
+
+This PEP proposes adding a zero-overhead debugging interface to CPython that
+allows debuggers and profilers to safely attach to running Python processes. The
+interface provides safe execution points for attaching debugger code without
+modifying the interpreter's normal execution path or adding runtime overhead.
+
+A key application of this interface will be enabling pdb to attach to live
+processes by process ID, similar to ``gdb -p``, allowing developers to inspect and
+debug Python applications interactively in real time without stopping or
+restarting them.
+
+Motivation
+==========
+
+
+Debugging Python processes in production and live environments presents unique
+challenges. Developers often need to analyze application behavior without
+stopping or restarting services, which is especially crucial for
+high-availability systems. Common scenarios include diagnosing deadlocks,
+inspecting memory usage, or investigating unexpected behavior in real time.
+
+Very few Python tools can attach to running processes, primarily because doing
+so requires deep expertise in both operating system debugging interfaces and
+CPython internals. While C/C++ debuggers like GDB and LLDB can attach to
+processes using well-understood techniques, Python tools must implement all of
+these low-level mechanisms plus handle additional complexity. For example, when
+GDB needs to execute code in a target process, it:
+
+1. Uses ptrace to allocate a small chunk of executable memory (easier said than done)
+2. Writes a small sequence of machine code - typically a function prologue, the
+   desired instructions, and code to restore registers
+3. Saves all the target thread's registers
+4. Changes the instruction pointer to the injected code
+5. Lets the process run until it hits a breakpoint at the end of the injected code
+6. Restores the original registers and continues execution
+
+Python tools face this same challenge of code injection, but with an additional
+layer of complexity. Not only do they need to implement the above mechanism,
+they must also understand and safely interact with CPython's runtime state,
+including the interpreter loop, garbage collector, thread state, and reference
+counting system. This combination of low-level system manipulation and
+deep, domain-specific interpreter knowledge makes implementing Python debugging
+tools exceptionally difficult.
+
+The few tools (see for example `DebugPy
+<https://github.com/microsoft/debugpy/blob/43f41029eabce338becbd1fa1a09727b3cfb1140/src/debugpy/_vendored/pydevd/pydevd_attach_to_process/linux_and_mac/attach.cpp#L4>`__
+and `Memray
+<https://github.com/bloomberg/memray/blob/main/src/memray/_memray/inject.cpp>`__)
+that do attempt this resort to suboptimal and unsafe methods,
+using system debuggers like GDB and LLDB to forcefully inject code. This
+approach is fundamentally unsafe because the injected code can execute at any
+point during the interpreter's execution cycle - even during critical operations
+like memory allocation, garbage collection, or thread state management. When
+this happens, the results are catastrophic: attempting to allocate memory while
+already inside ``malloc()`` causes crashes, modifying objects during garbage
+collection corrupts the interpreter's state, and touching thread state at the
+wrong time leads to deadlocks.
+
+Various tools attempt to minimize these risks through complex workarounds, such
+as spawning separate threads for injected code, carefully timing their
+operations, or trying to select safe points at which to stop the process. However,
+these mitigations cannot fully solve the underlying problem: without cooperation
+from the interpreter, there's no way to know if it's safe to execute code at any
+given moment. Even carefully implemented tools can crash the interpreter because
+they're fundamentally working against it rather than with it.
+
+
+Rationale
+=========
+
+
+Rather than forcing tools to work around interpreter limitations with unsafe
+code injection, we can extend CPython with a proper debugging interface that
+guarantees safe execution. By adding a few thread state fields and integrating
+with the interpreter's existing evaluation loop, we can ensure debugging
+operations only occur at well-defined safe points. This eliminates the
+possibility of crashes and corruption while maintaining zero overhead during
+normal execution.
+
+The key insight is that we don't need to inject code at arbitrary points - we
+just need to signal to the interpreter that we want code executed at the next
+safe opportunity. This approach works with the interpreter's natural execution
+flow rather than fighting against it.
+
+After we described this idea to the PyPy development team, they
+`implemented this proposal in PyPy <https://github.com/pypy/pypy/pull/5135>`__,
+demonstrating both its feasibility and effectiveness. Their implementation
+shows that we can provide safe debugging capabilities with zero runtime
+overhead during normal execution. The proposed mechanism not only reduces risks
+associated with current debugging approaches but also lays the foundation for
+future enhancements. For instance, this framework could enable integration with
+popular observability tools, providing real-time insights into interpreter
+performance or memory usage. One compelling use case for this interface is
+enabling pdb to attach to running Python processes, similar to how gdb allows
+users to attach to a program by process ID (``gdb -p <pid>``). With this
+feature, developers could inspect the state of a running application, evaluate
+expressions, and step through code dynamically. This approach would align
+Python's debugging capabilities with those of other major programming languages
+and debugging tools that support this mode.
+
+Specification
+=============
+
+
+This proposal introduces a safe debugging mechanism that allows external
+processes to trigger code execution in a Python interpreter at well-defined safe
+points. The key insight is that rather than injecting code directly via system
+debuggers, we can leverage the interpreter's existing evaluation loop and thread
+state to coordinate debugging operations.
+
+The mechanism works by having debuggers write to specific memory locations in
+the target process that the interpreter then checks during its normal execution
+cycle. When the interpreter detects that a debugger wants to attach, it executes the
+requested operations only when it's safe to do so - that is, when no internal
+locks are held and all data structures are in a consistent state.
+
+
+Runtime State Extensions
+------------------------
+
+A new structure is added to ``PyThreadState`` to support remote debugging:
+
+.. code-block:: C
+
+    typedef struct _remote_debugger_support {
+        int debugger_pending_call;
+        char debugger_script[MAX_SCRIPT_SIZE];
+    } _PyRemoteDebuggerSupport;
+
+
+This structure is appended to ``PyThreadState``, adding only a few fields that
+are **never accessed during normal execution**. The ``debugger_pending_call`` field
+indicates when a debugger has requested execution, while ``debugger_script``
+provides Python code to be executed when the interpreter reaches a safe point.
+
+
+Debug Offsets Table
+-------------------
+
+
+Python 3.12 introduced a debug offsets table placed at the start of the
+PyRuntime structure. This section contains the ``_Py_DebugOffsets`` structure that
+allows external tools to reliably find critical runtime structures regardless of
+`ASLR <https://en.wikipedia.org/wiki/Address_space_layout_randomization>`__ or
+how Python was compiled.
+
+This proposal extends the existing debug offsets table with new fields for
+debugger support:
+
+.. code-block:: C
+
+    struct _debugger_support {
+        uint64_t eval_breaker;            // Location of the eval breaker flag
+        uint64_t remote_debugger_support; // Offset to our support structure
+        uint64_t debugger_pending_call;   // Where to write the pending flag
+        uint64_t debugger_script;         // Where to write the script
+    } debugger_support;
+
+These offsets allow debuggers to locate critical debugging control structures in
+the target process's memory space. The ``eval_breaker`` and ``remote_debugger_support``
+offsets are relative to each ``PyThreadState``, while the ``debugger_pending_call``
+and ``debugger_script`` offsets are relative to each ``_PyRemoteDebuggerSupport``
+structure, allowing the new structure and its fields to be found regardless of
+where they are in memory.
+
+Attachment Protocol
+-------------------
+When a debugger wants to attach to a Python process, it follows these steps:
+
+1. Locate the ``PyRuntime`` structure in the process:
+
+   - Find the Python binary (executable or libpython) in process memory (an OS-dependent procedure)
+   - Extract the ``.PyRuntime`` section offset from the binary's format (ELF/Mach-O/PE)
+   - Calculate the actual ``PyRuntime`` address in the running process by relocating the offset to the binary's load address
+
+2. Access debug offset information by reading the ``_Py_DebugOffsets`` structure at the start of the ``PyRuntime`` structure.
+
+3. Use the offsets to locate the desired thread state
+
+4. Use the offsets to locate the debugger interface fields within that thread state
+
+5. Write control information:
+
+   - Write the Python code to be executed into the ``debugger_script`` field in ``_PyRemoteDebuggerSupport``
+   - Set the ``debugger_pending_call`` flag in ``_PyRemoteDebuggerSupport``
+   - Set ``_PY_EVAL_PLEASE_STOP_BIT`` in the ``eval_breaker`` field
+
+Once the interpreter reaches the next safe point, it will execute the script
+provided by the debugger.
+
+Interpreter Integration
+-----------------------
+
+The interpreter's regular evaluation loop already includes a check of the
+``eval_breaker`` flag for handling signals, periodic tasks, and other interrupts. We
+leverage this existing mechanism by checking for debugger pending calls only
+when the ``eval_breaker`` is set, so the common path gains no additional work.
+Profiling with Linux ``perf`` shows that the new branch is highly predictable:
+the ``debugger_pending_call`` branch is never taken during normal execution,
+allowing modern CPUs to effectively speculate past it.
+
+
+When a debugger has set both the ``eval_breaker`` flag and ``debugger_pending_call``,
+the interpreter executes the provided debugging code at the next safe point.
+This all happens in a completely safe context, since the interpreter is
+guaranteed to be in a consistent state whenever the eval breaker
+is checked.
+
+.. code-block:: c
+
+    // In ceval.c
+    if (tstate->eval_breaker) {
+        if (tstate->remote_debugger_support.debugger_pending_call) {
+            tstate->remote_debugger_support.debugger_pending_call = 0;
+            if (tstate->remote_debugger_support.debugger_script[0]) {
+                if (PyRun_SimpleString(tstate->remote_debugger_support.debugger_script) < 0) {
+                    PyErr_Clear();
+                }
+                // ...
+            }
+        }
+    }
+
+
+Python API
+----------
+
+To support safe execution of Python code in a remote process without having to
+re-implement all these steps in every tool, this proposal extends the ``sys`` module
+with a new function. This function allows debuggers or external tools to execute
+arbitrary Python code within the context of a specified Python process:
+
+.. code-block:: python
+
+  def remote_exec(pid: int, code: str) -> None:
+      """
+      Executes a block of Python code in a given remote Python process.
+
+      Args:
+          pid (int): The process ID of the target Python process.
+          code (str): A string containing the Python code to be executed.
+      """
+
+An example usage of the API would look like:
+
+.. code-block:: python
+
+    import sys
+    # Execute a print statement in a remote Python process with PID 12345
+    try:
+        sys.remote_exec(12345, "print('Hello from remote execution!')")
+    except Exception as e:
+        print(f"Failed to execute code: {e}")
+
+
+Backwards Compatibility
+=======================
+
+This change has no impact on existing Python code or interpreter performance.
+The added fields are only accessed during debugger attachment, and the checking
+mechanism piggybacks on existing interpreter safe points.
+
+
+Security Implications
+=====================
+
+This interface does not introduce new security concerns as it relies entirely on
+existing operating system security mechanisms for process memory access. Although
+the PEP doesn't specify how memory should be written to the target process, in practice
+this will be done using standard system calls that are already being used by other
+debuggers and tools. Some examples are:
+
+* On Linux, the ``process_vm_readv()`` and ``process_vm_writev()`` system calls
+  are used to read and write memory from another process. These operations are
+  controlled by ptrace access mode checks - the same ones that govern debugger
+  attachment. A process can only read from or write to another process's memory
+  if it has the appropriate permissions (typically requiring either root or the
+  ``CAP_SYS_PTRACE`` capability, though less security-minded distributions may
+  allow any process running as the same uid to attach).
+
+* On macOS, the interface would leverage ``mach_vm_read_overwrite()`` and
+  ``mach_vm_write()`` through the Mach task system. These operations require
+  ``task_for_pid()`` access, which is strictly controlled by the operating
+  system. By default, access is limited to processes running as root or those
+  with specific entitlements granted by Apple's security framework.
+
+* On Windows, the ``ReadProcessMemory()`` and ``WriteProcessMemory()`` functions
+  provide similar functionality. Access is controlled through the Windows
+  security model - a process needs ``PROCESS_VM_READ`` and ``PROCESS_VM_WRITE``
+  permissions, which typically require the same user context or appropriate
+  privileges. These are the same permissions required by debuggers, ensuring
+  consistent security semantics across platforms.
+
+All mechanisms ensure that:
+
+1. Only authorized processes can read/write memory
+2. The same security model that governs traditional debugger attachment applies
+3. No additional attack surface is exposed beyond what the OS already provides for debugging
+
+The memory operations themselves are well-established and have been used safely
+for decades in tools like GDB, LLDB, and various system profilers.
+
+It’s important to note that any attempt to attach to a Python process via this
+mechanism would be detectable by system-level monitoring tools. This
+transparency provides an additional layer of accountability, allowing
+administrators to audit debugging operations in sensitive environments.
+
+Further, the strict reliance on OS-level security controls ensures that existing
+system policies remain effective. For enterprise environments, this means
+administrators can continue to enforce debugging restrictions using standard
+tools and policies without requiring additional configuration. For instance,
+leveraging Linux’s ``ptrace_scope`` or macOS’s ``taskgated`` to restrict
+debugger access will equally govern the proposed interface.
+
+By maintaining compatibility with existing security frameworks, this design
+ensures that adopting the new interface requires no changes to established
+security practices, thereby minimizing barriers to adoption.
+
+How to Teach This
+=================
+
+For tool authors, this interface becomes the standard way to implement debugger
+attachment, replacing unsafe system debugger approaches. A section in the Python
+Developer Guide could describe the internal workings of the mechanism, including
+the ``debugger_support`` offsets and how to interact with them using system
+APIs.
+
+End users need not be aware of the interface, benefiting only from improved
+debugging tool stability and reliability.
+
+Reference Implementation
+========================
+
+https://github.com/pablogsal/cpython/commits/remote_pdb/
+
+
+Copyright
+=========
+
+This document is placed in the public domain or under the CC0-1.0-Universal
+license, whichever is more permissive.