Semgrep integration and Move Tree-sitter grammar
Most files within this repo are auto-generated by tree-sitter. The only files you need to care about:
- grammar.js: the main grammar rules for the Move programming language;
- src/scanner.c: the external scanner used in grammar.js. Currently, it is used to scan block (document) comments and line document comments. It’s unlikely you will need to update it or add new scanners;
- batch-test.py: a Python script for testing the grammar. It will recursively scan the given paths and test files ending with .move against the grammar. Usage: python3 batch-test.py <PATH> [ ... <PATH> ]. You should run tree-sitter generate each time you modify the grammar before testing;
- .github/workflows/test-on-repo.yaml: GitHub Workflow configurations.
Before contributing to the grammar rules, install and configure tree-sitter. A good way to install it is to follow tree-sitter's Getting Started section.
It is recommended to use a node version manager for Node.js runtimes.
By the time you have finished, you should have these installed and configured:
- Node.js (optimally installed by a version manager);
- A working C compiler (for macOS users, this is shipped with Xcode Command Line Tools);
- tree-sitter installed either through cargo or npm. Be sure that tree-sitter can be found within $PATH;
- (Optional) Rust compiler and Cargo.
Additionally, you may also want to install Python for batch testing the rules.
You need to execute tree-sitter init-config under the repo to initialize tree-sitter for the first time.
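A quick way to confirm the prerequisites above are reachable from $PATH is a small check like the following. This is only a sketch and not part of the repo: the executable names (in particular `cc` for the C compiler) and the required/optional split are assumptions based on the list above, so adjust them to your setup.

```python
import shutil

# Executable names are assumptions; e.g. your C compiler may be `gcc` or `clang`.
REQUIRED_TOOLS = ["node", "cc", "tree-sitter"]
OPTIONAL_TOOLS = ["cargo", "rustc", "python3"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on $PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    for tool in missing_tools(REQUIRED_TOOLS):
        print(f"missing required tool: {tool}")
    for tool in missing_tools(OPTIONAL_TOOLS):
        print(f"missing optional tool: {tool}")
```

If nothing is printed, every listed executable was found on $PATH.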
Most likely, grammar.js is the only file requiring modifications. It is rare to update src/scanner.c.
To learn how to write grammars in the tree-sitter grammar DSL, see:
- https://tree-sitter.github.io/tree-sitter/creating-parsers#the-grammar-dsl
- https://tree-sitter.github.io/tree-sitter/creating-parsers#writing-the-grammar
In addition, a few sources you may need:
- https://github.com/tree-sitter/tree-sitter-rust: Rust’s tree-sitter grammar.
- https://github.com/tree-sitter/tree-sitter-javascript: JavaScript’s tree-sitter grammar.
- third_party/move/move-compiler/src/parser/syntax.rs: Move’s top-down parser, the de facto grammar reference.
Be aware that the documentation within syntax.rs (especially the doc comments before a parsing method) could be incomplete or wrong. You should always read the code itself for reference.
Also, when contributing to this repo, be sure to pull aptos-core periodically in case of new language features.
After you finish coding, run npm run format to format your code.
Finally, run tree-sitter generate to check
- whether grammar.js contains any syntax errors;
- whether the rules contain any conflicts. Tree-sitter’s documentation is a good reference for resolving conflicts.
To test the grammar on an individual file, run:
$ tree-sitter parse ${MOVE_FILE}
Some useful flags for debugging:
- -d: show parsing debug log;
- -D: produce the log.html file with debugging parsing graphs.
After execution, the parse tree is printed to standard output. An error message may appear on the last line, and you can jump to the reported location using its line and column numbers. Both line numbers and column numbers start from 0.
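Because the reported positions are 0-based, they are off by one in editors that count lines and columns from 1. The helper below is a hypothetical convenience, not part of the repo. It assumes positions are printed as `[row, column]` pairs, as in `(ERROR [3, 4] - [3, 10])`, so check that against the output of your tree-sitter version.

```python
import re

# Assumption: positions appear as "[row, column]" pairs in the output line.
POSITION = re.compile(r"\[(\d+), (\d+)\]")


def editor_positions(line):
    """Convert each 0-based [row, column] in `line` to a 1-based (line, column)."""
    return [(int(row) + 1, int(col) + 1) for row, col in POSITION.findall(line)]


if __name__ == "__main__":
    # Start and end of the error span, shifted to 1-based numbering.
    print(editor_positions("(ERROR [3, 4] - [3, 10])"))  # [(4, 5), (4, 11)]
```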
Remember to test the rule on a larger scale using batch-test.py.
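For orientation, batch testing of this kind can be sketched as below. This is not the code of batch-test.py, only an illustration of the behaviour described for it: walk the given paths, pick up files ending with .move, and parse each one. All function names are hypothetical, and the sketch assumes that `tree-sitter parse` exits with a non-zero status when a file fails to parse.

```python
import os
import subprocess
import sys


def collect_move_files(paths):
    """Recursively gather files ending with .move under the given paths."""
    found = []
    for path in paths:
        if os.path.isfile(path):
            if path.endswith(".move"):
                found.append(path)
            continue
        for root, _dirs, files in os.walk(path):
            for name in sorted(files):
                if name.endswith(".move"):
                    found.append(os.path.join(root, name))
    return found


def parses_cleanly(path):
    """Parse one file with the tree-sitter CLI and report success."""
    # Assumption: a non-zero exit status signals a parse error.
    result = subprocess.run(
        ["tree-sitter", "parse", path], capture_output=True, text=True
    )
    return result.returncode == 0


def failing_files(paths, check=parses_cleanly):
    """Return the .move files under `paths` for which `check` fails."""
    return [path for path in collect_move_files(paths) if not check(path)]


if __name__ == "__main__":
    for path in failing_files(sys.argv[1:]):
        print(f"failed: {path}")
```

The `check` parameter exists so the file discovery can be exercised without the tree-sitter CLI installed.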
You should run
$ npm run format
$ tree-sitter generate
before committing. Remember to include all updated generated code in your git commit.