Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gazelle seems to not like PEP 695 – Type Parameter Syntax #2396

Closed
dougthor42 opened this issue Nov 12, 2024 · 9 comments · Fixed by #2496
Closed

Gazelle seems to not like PEP 695 – Type Parameter Syntax #2396

dougthor42 opened this issue Nov 12, 2024 · 9 comments · Fixed by #2496

Comments

@dougthor42
Copy link
Collaborator

TL;DR: I think we need to bump go-tree-sitter for python 3.12 support.

🐞 bug report

Affected Rule

Gazelle

Is this a regression?

I could argue it either way, lol. I'm going to go with "probably" because it's related to #1895.

Description

PEP 695 added a Type Parameter syntax that looks like:

# Before
from typing import TypeVar

_T = TypeVar("_T")

def func(a: _T, b: _T) -> _T:
    ...

# After
def func[T](a: T, b: T) -> T:
    ...

If a python file uses this new def func[T](...) syntax, and the file has if __name__ == "__main__", then Gazelle will incorrectly generate a py_library target rather than py_binary.

🔬 Minimal Reproduction

  1. Make a python file:
# foo.py
def func[T](a: T, b: T) -> T:
    ...

if __name__ == "__main__":
    print("hi")
  1. Run gazelle bazel run //:gazelle

Expected Result

py_binary(
    name = "foo",
    srcs = ["foo.py"],
)

Actual Result

py_library(
    name = "foo",
    srcs = ["foo.py"],
)

🔥 Exception or Error

None

🌍 Your Environment

Operating System:

gLinux

Output of bazel version:

$ bazel version
Bazelisk version: v1.20.0
Build label: 7.3.2
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Tue Oct 1 17:46:05 2024 (1727804765)
Build timestamp: 1727804765
Build timestamp as int: 1727804765

Rules_python version:

$ bazel mod graph                                                                                                       
<root> ([email protected])                    
├───[email protected]            
│   └─── ...
├───[email protected]                 
│   └─── ...
├───[email protected]             
│   └─── ...     
├───[email protected]                 
│   └─── ...
├───[email protected] 
│   └─── ...
├───[email protected] 
│   ├───[email protected] (*) 
│   ├───[email protected] (*) 
│   ├───[email protected] (*) 
│   ├───[email protected] (*) 
│   ├───[email protected] (*) 
│   ├───[email protected] 
│   │   └───[email protected] (*) 
│   └───[email protected] 
│       └───[email protected] (*) 
└───[email protected] 
    ├───[email protected] (*) 
    ├───[email protected] (*) 
    ├───[email protected] (*) 
    └───[email protected] (*)

Anything else relevant?

I haven't been able to find a human-readable changelog of tree-sitter or go-tree-sitter that explicitly says when it added support for python 3.12...

@dougthor42 dougthor42 changed the title Gazelle seems to lot like PEP 695 – Type Parameter Syntax Gazelle seems to not like PEP 695 – Type Parameter Syntax Nov 12, 2024
@dougthor42
Copy link
Collaborator Author

ah, dang:

// at github.com/smacker/go-tree-sitter@latest (after v0.0.0-20240422154435-0628b34cbf9c we used)
// "__main__" is the second child of b. But now, it isn't.
// we cannot use the latest go-tree-sitter because of the top level reference in scanner.c.
// https://github.com/smacker/go-tree-sitter/blob/04d6b33fe138a98075210f5b770482ded024dc0f/python/scanner.c#L1

I just tried this locally and got the same issue. @hunshcn do you have any additional insight for this?

@dougthor42
Copy link
Collaborator Author

Opened smacker/go-tree-sitter#175.

@maffoo
Copy link
Contributor

maffoo commented Nov 12, 2024

I think type parameter support was added to the tree-sitter grammer in tree-sitter/tree-sitter-python@71977ea, and should be included in grammer versions 0.20.4 and later. It looks like this was first included in go-tree-sitter in smacker/go-tree-sitter@8516e80.

@dougthor42
Copy link
Collaborator Author

This is looking more and more like an issue with tree-sitter, but I may have to update Gazelle to fallback to the old python-based parser because of limitations elsewhere.


In our code base, I narrowed things down to a single line diff:

# get_deps.py
# using this line is fine - tree-sitter correctly parses the module
def search_one_more_level(graph: dict[T, set[T]], seen: set[T], routes: list[list[T]], target: T) -> list[T] | None:

# using this line cause tree-sitter to fail parsing - see below
def search_one_more_level[T](graph: dict[T, set[T]], seen: set[T], routes: list[list[T]], target: T) -> list[T] | None:

The only difference between the two lines is the addition of [T] after the function name.

I added some debugging prints to gazelle/python/file_parser.go:

--- a/gazelle/python/file_parser.go
+++ b/gazelle/python/file_parser.go
@@ -17,6 +17,7 @@ package python
 import (
        "context"
        "fmt"
+       "log"
        "os"
        "path/filepath"
        "strings"
@@ -184,6 +185,10 @@ func (p *FileParser) Parse(ctx context.Context) (*ParserOutput, error) {
        if err != nil {
                return nil, err
        }
+       if strings.Contains(p.relFilepath, "get_deps.py") {
+               log.Printf("after parsing file, got rootNode=%q", rootNode)
+       }
 
        p.output.HasMain = p.parseMain(ctx, rootNode)

And ran gazelle.

# the "good" code logs:
gazelle: after parsing file, got rootNode="(module (comment) (comment) (comment) (expression_statement (string)) (import_statement name: (dotted_name (identifier))) ...

# the "bad" code logs:
gazelle: after parsing file, got rootNode="(ERROR (comment) (comment) (comment) (expression_statement (string)) (import_statement name: (dotted_name (identifier))) ...

So something is up with tree-sitter where it errors out but doesn't return an err. The net result of this is that the subsequent call to FileParser.parseMain fails and the file is incorrectly labeled as a library (instead of a binary).

@dougthor42
Copy link
Collaborator Author

I think type parameter support was added to the tree-sitter grammer in tree-sitter/tree-sitter-python@71977ea, and should be included in grammer versions 0.20.4 and later. It looks like this was first included in go-tree-sitter in smacker/go-tree-sitter@8516e80.

@maffoo yes I agree; the 3.12 grammar was added to tree-sitter a while ago. rules_python is currently pinning to the version just before go-tree-sitter updated the python grammar. rules_python pins 0628b34 and the python grammar was updated in the next commit 8516e80 just like you said.

image


I tried forking go-tree-sitter and hacking things to use the old scanner.cc, which doesn't reference ../array.h, but that naive approach sadly did not work.

@maffoo
Copy link
Contributor

maffoo commented Nov 15, 2024

As I understand it we are using the standard rules_go rules to import the go-tree-sitter repository:

go_repository(
name = "com_github_smacker_go_tree_sitter",
importpath = "github.com/smacker/go-tree-sitter",
sum = "h1:7QZKUmQfnxncZIJGyvX8M8YeMfn8kM10j3J/2KwVTN4=",
version = "v0.0.0-20240422154435-0628b34cbf9c",
)

And then we use two targets which are created automatically by this import:

"@com_github_smacker_go_tree_sitter//:go-tree-sitter",
"@com_github_smacker_go_tree_sitter//python",

Is it possible to define a more custom way to build that instead of relying on targets generated automatically by rules_go, since clearly rules_go is not able to build this package by default? I guess rules_go does more than just invoking go build under the hood, but maybe there's a way to customize it a bit, e.g. by telling it that the @com_github_smacker_go_tree_sitter//python target should include some files from the parent directory.

@maffoo
Copy link
Contributor

maffoo commented Nov 15, 2024

Actually, looks like the go_repository rules comes from gazelle itself, not from rules_go: https://github.com/bazel-contrib/bazel-gazelle/blob/master/repository.md#go_repository. There are a lot of options there, so maybe there's a way to customize it to work with the newer go-tree-sitter.

@maffoo
Copy link
Contributor

maffoo commented Nov 15, 2024

I checked out the go-tree-sitter repo and ran gazelle to generate build files as described in the rules_go docs. To get //python to build, I added a filegroup in the root BUILD.bazel:

filegroup(
    name = "common",
    srcs = ["alloc.c", "alloc.h", "api.h", "array.h"],
    visibility = [":__subpackages__"],
)

and then added a reference to this in the go_library in python/BUILD.bazel:

go_library(
    name = "python",
    srcs = [
        "//:common",
        "binding.go",
        ...

This was enough to get //python to build. Given that, it seems like one way to get things working with the new go-tree-sitter would be to add these changes as patches when we bring in go-tree-sitter as a go_repository.

@dougthor42
Copy link
Collaborator Author

Thanks Matthew! My next attempt was going to be to just move/copy those C flies into the python dir and tweak headers accordingly (also via a patch).

github-merge-queue bot pushed a commit that referenced this issue Nov 18, 2024
While investigating #2396 and why #2413 doesn't appear to be working for
us, I realized that one of the things I was making heavy use of was
additional parser logging that I had added. This adds some of that
logging. I also throw in some documentation because I found it helpful.

Users can (attempt to) get additional parse failure information by
setting the `GAZELLE_VERBOSE` environment variable to `1`. Eg:

```console
$ GAZELLE_VERBOSE=1 bazel run //:gazelle
```

Here are some example logs:

```console
$ GAZELLE_VERBOSE=1 bazel run //:gazelle
INFO: Invocation ID: a4e026d8-17df-426c-b1cc-d3980690dd53
...
INFO: Running command line: bazel-bin/gazelle
INFO: Streaming build results to: https://btx.cloud.google.com/invocations/a4e026d8-17df-426c-b1cc-d3980690dd53
gazelle: WARNING: failed to parse "hello/get_deps.py". The resulting BUILD target may be incorrect.
gazelle: Parse error at {Row:1 Column:0}:
def search_one_more_level[T]():
gazelle: The above was parsed as: (ERROR (identifier) (call function: (list (identifier)) arguments: (argument_list)))
gazelle: ERROR: failed to generate target "//hello:get_deps" of kind "py_library": a target of kind "pyle_py_binary" with the same name already exists. Use the '# gazelle:python_library_naming_convention' directive to change the naming convention.
$
$ bazel run //:gazelle
INFO: Invocation ID: 726c9fd6-f566-4c30-95ef-c4781ad155de
...
INFO: Running command line: bazel-bin/gazelle
INFO: Streaming build results to: https://btx.cloud.google.com/invocations/726c9fd6-f566-4c30-95ef-c4781ad155de
gazelle: WARNING: failed to parse "hello/get_deps.py". The resulting BUILD target may be incorrect.
gazelle: ERROR: failed to generate target "//hello:get_deps" of kind "py_library": a target of kind "pyle_py_binary" with the same name already exists. Use the '# gazelle:python_library_naming_convention' directive to change the naming convention.
```

---------

Co-authored-by: Richard Levasseur <[email protected]>
Co-authored-by: Richard Levasseur <[email protected]>
github-merge-queue bot pushed a commit that referenced this issue Dec 14, 2024
… Parameter Syntax) by using dougthor42's fork of go-tree-sitter (#2496)

Replaces #2413.

Fixes #2396.

This updates the `go-tree-sitter` dependency to use my fork that
includes `BUILD.bazel` files. Specifically, the `BUILD.bazel` files in
the fork include references to top-level code like `array.h` which the
original Gazelle-generated files for `go-tree-sitter` were not able to
handle. I also include the test cases that @maffoo created in #2413 and
verified that they (a) fail before the fix and (b) pass after the fix.

The fork is: https://github.com/dougthor42/go-tree-sitter

The branch that includes all changes is:
https://github.com/dougthor42/go-tree-sitter/tree/for-rules-python-gazelle-plugin

A couple notes:
+ I have a PR open to get `go-tree-sitter` into BCR
[here](bazelbuild/bazel-central-registry#3366).
However:
1. I'm having trouble getting tests to pass and to get things running
locally to validate it
2. Using BCR would not fix things for people who still use WORKSPACE
(right?)
+ The fork is _mostly_ [autogenerated BUILD.bazel files from
gazelle](smacker/go-tree-sitter@cfa9bdf)
but also contains:
+ [manual updates so that build files reference the toplevel `array.h`
and other
files](smacker/go-tree-sitter@63f89cd)
+ [replace all `smacker` with `dougthor42` so that `go build`
works](smacker/go-tree-sitter@8a73cbd)
    + various other more minor things.
+ I was unable to get `go mod edit -replace` to work, so I've just
manually updated `go.mod` and whatnot everywhere. If someone with more
go knowledge has a suggestion I'm happy to hear it.

---------

Co-authored-by: Matthew Neeley <[email protected]>
Co-authored-by: Ignas Anikevicius <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants