Initial commit
cdump committed Nov 25, 2023
0 parents, commit 3b8e1c7
Showing 52 changed files with 4,643 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
benchmark/datasets/*.tar.gz filter=lfs diff=lfs merge=lfs -text
75 changes: 75 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,75 @@
name: release

permissions:
  contents: write

on:
  push:
    tags:
      - '*.*.*'

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # need tags to generate release notes

      - name: Install Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install NodeJS
        uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Install python poetry
        run: |
          curl -sSL https://install.python-poetry.org | python -
          echo "$HOME/.poetry/bin" >> $GITHUB_PATH
      - name: Python - check and build
        id: buildpy
        run: |
          poetry install
          poetry run ruff evmole
          poetry run black --check evmole
          poetry build
          echo "wheel_name=evmole-${GITHUB_REF#refs/tags/}-py3-none-any.whl" >> $GITHUB_OUTPUT
      - name: NodeJS - check and build
        id: buildjs
        run: |
          cd js/
          npm ci
          npm run build
          npm pack
          echo "tarball_name=evmole-${GITHUB_REF#refs/tags/}.tgz" >> $GITHUB_OUTPUT
      - name: Generate Release Notes
        run: |
          echo '## Changes since previous release:' > changelog.md
          git log --oneline $(git describe --tags --abbrev=0 HEAD^)..HEAD --pretty=format:"- [%h](https://github.com/cdump/evmole/commit/%H) %s" >> changelog.md
      - name: Release
        uses: softprops/action-gh-release@v1
        with:
          name: Release ${{ github.ref_name }}
          draft: false
          prerelease: false
          body_path: changelog.md
          files: |
            dist/${{ steps.buildpy.outputs.wheel_name }}
            js/${{ steps.buildjs.outputs.tarball_name }}
      - name: Publish to NPM and PyPI
        env:
          POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
        run: |
          poetry publish
          npm publish
10 changes: 10 additions & 0 deletions .gitignore
@@ -0,0 +1,10 @@
benchmark/datasets/*
!benchmark/datasets/*.tar.gz

benchmark/results/*
!benchmark/results/.gitkeep

__pycache__/
node_modules/

dist/
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Maxim Andreev

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
124 changes: 124 additions & 0 deletions README.md
@@ -0,0 +1,124 @@
# EVMole

[![PyPI](https://img.shields.io/pypi/v/evmole)](https://pypi.org/project/evmole)
[![npm](https://img.shields.io/npm/v/evmole)](https://www.npmjs.com/package/evmole)
[![license](https://img.shields.io/github/license/cdump/evmole)](./LICENSE)

Extracts [function selectors](https://docs.soliditylang.org/en/latest/abi-spec.html#function-selector) from EVM bytecode, even for unverified contracts.
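A function selector is the first 4 bytes of the Keccak-256 hash of the function's canonical signature (see the linked ABI spec). The stdlib-only sketch below illustrates that definition; `keccak256` and `selector` are hypothetical helpers, not part of EVMole's API. Python's `hashlib.sha3_256` implements the standardized SHA3-256, whose padding differs from Ethereum's Keccak-256, so the sketch carries its own compact, unoptimized Keccak-f[1600] permutation.

```python
MASK64 = (1 << 64) - 1


def _rol64(v: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((v << n) | (v >> (64 - n))) & MASK64


def _keccak_f1600(lanes: list[list[int]]) -> list[list[int]]:
    """Keccak-f[1600] permutation on a 5x5 grid of 64-bit lanes, indexed lanes[x][y]."""
    r = 1  # LFSR state used to derive the round constants
    for _ in range(24):
        # theta: XOR every lane with the parities of two neighbouring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4] for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho + pi: rotate lanes by triangular-number offsets while moving them
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear mixing along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR this round's constant into lane (0, 0)
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 (as used by Ethereum), not NIST SHA3-256."""
    rate = 136  # bytes absorbed per permutation (1088-bit rate)
    padded = bytearray(data) + b"\x01"  # Keccak pad10*1 starts with 0x01 (SHA3 uses 0x06)
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80  # final bit of the padding
    lanes = [[0] * 5 for _ in range(5)]
    for off in range(0, len(padded), rate):
        block = padded[off:off + rate]
        for i in range(rate // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = _keccak_f1600(lanes)
    # Squeeze: the first 4 lanes (32 bytes) form the digest
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def selector(signature: str) -> str:
    """First 4 bytes of keccak256(signature), hex-encoded without a 0x prefix."""
    return keccak256(signature.encode()).hex()[:8]
```

For example, `selector("deposit()")` gives `'d0e30db0'`, which is one of the two selectors in the usage output.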

- Python & JavaScript implementations
- Clean code with zero dependencies
- [Faster and more accurate](#Benchmark) than other tools
- Tested on Solidity and Vyper compiled contracts

## Usage

### JavaScript
```sh
$ npm i evmole
```
```javascript
import {functionSelectors} from 'evmole'

const code = '0x6080604052600436106025575f3560e01c8063b69ef8a8146029578063d0e30db014604d575b5f80fd5b3480156033575f80fd5b50603b5f5481565b60405190815260200160405180910390f35b60536055565b005b345f8082825460639190606a565b9091555050565b80820180821115608857634e487b7160e01b5f52601160045260245ffd5b9291505056fea2646970667358221220354240f63068d555e9b817619001b0dff6ea630d137edc1a640dae8e3ebb959864736f6c63430008170033'
console.log( functionSelectors(code) )
// Output(list): [ 'b69ef8a8', 'd0e30db0' ]
```

### Python
```sh
$ pip install evmole
```
```python
from evmole import function_selectors

code = '0x6080604052600436106025575f3560e01c8063b69ef8a8146029578063d0e30db014604d575b5f80fd5b3480156033575f80fd5b50603b5f5481565b60405190815260200160405180910390f35b60536055565b005b345f8082825460639190606a565b9091555050565b80820180821115608857634e487b7160e01b5f52601160045260245ffd5b9291505056fea2646970667358221220354240f63068d555e9b817619001b0dff6ea630d137edc1a640dae8e3ebb959864736f6c63430008170033'
print( function_selectors(code) )
# Output(list): ['b69ef8a8', 'd0e30db0']
```

See [examples](./examples) for more.

## Benchmark

<i>FP/FN</i> - [False Positive/False Negative](https://en.wikipedia.org/wiki/False_positives_and_false_negatives) errors; smaller is better

<table>
<tr>
<td>Dataset</td>
<td></td>
<td><a href="benchmark/providers/simple/"><b><i>simple</i></b></a></td>
<td><a href="benchmark/providers/whatsabi/"><b><i>whatsabi</i></b></a></td>
<td><a href="benchmark/providers/evmole-js/"><b><i>evmole-js</i></b></a> (<a href="benchmark/providers/evmole-py/"><b><i>py</i></b></a>)</td>
</tr>
<tr>
<td rowspan="3"><i><b>largest1k</b><br>1000 contracts<br>24427 functions</i></td>
<td><i>FP/FN contracts:</i></td>
<td>95 / 9</td>
<td>38 / 8</td>
<td>1 / 0 :1st_place_medal:</td>
</tr>
<tr>
<td><i>FP/FN functions:</i></td>
<td>749 / 12</td>
<td>38 / 8 :1st_place_medal: :2nd_place_medal:</td>
<td>192 / 0 :2nd_place_medal: :1st_place_medal:</td>
</tr>
<tr>
<td><i>Time:</i></td>
<td>2.06s</td>
<td>3.8s</td>
<td>1.99s (2.09s) :rocket:</td>
</tr>
<tr><td colspan="5"></td></tr>
<tr>
<td rowspan="3"><i><b>random50k</b><br>50000 contracts<br>1171102 functions</i></td>
<td><i>FP/FN contracts:</i></td>
<td>4136 / 77</td>
<td>251 / 31</td>
<td>1 / 9 :1st_place_medal:</td>
</tr>
<tr>
<td><i>FP/FN functions:</i></td>
<td>14652 / 96</td>
<td>261 / 32</td>
<td>3 / 10 :1st_place_medal:</td>
</tr>
<tr>
<td><i>Time:</i></td>
<td>32.3s</td>
<td>71.13s</td>
<td>25.63s (33.56s) :rocket:</td>
</tr>
<tr><td colspan="5"></td></tr>
<tr>
<td rowspan="3"><i><b>vyper</b><br>780 contracts<br>21244 functions</i></td>
<td><i>FP/FN contracts:</i></td>
<td>185 / 480</td>
<td>178 / 780</td>
<td>0 / 0 :1st_place_medal:</td>
</tr>
<tr>
<td><i>FP/FN functions:</i></td>
<td>197 / 12971</td>
<td>181 / 21244</td>
<td>0 / 0 :1st_place_medal:</td>
</tr>
<tr>
<td><i>Time:</i></td>
<td>1.71s</td>
<td>2.52s</td>
<td>1.58s (1.8s) :rocket:</td>
</tr>
</table>

See [benchmark/README.md](./benchmark/) for the methodology and commands to reproduce these results.

## How it works

Short: Executes code with a custom EVM and traces CALLDATA usage.

Long: TODO
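To see why tracing CALLDATA usage helps, compare it with plain pattern matching. The hypothetical `naive_selectors` below is a hedged illustration only: it is *not* EVMole's algorithm, and not necessarily what the `simple` benchmark provider does. It linearly disassembles the bytecode and collects `PUSH4` immediates, optionally only those directly followed by `EQ`, the comparison a Solidity dispatcher typically uses.

```python
def naive_selectors(code_hex: str, require_eq: bool = True) -> list[str]:
    """Hypothetical pattern-matching baseline: collect PUSH4 immediates.

    With require_eq=True, keep only PUSH4 values immediately followed by EQ.
    """
    code = bytes.fromhex(code_hex.removeprefix("0x"))
    found: list[str] = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry 1..32 immediate bytes
            width = op - 0x5F
            if op == 0x63:  # PUSH4
                followed_by_eq = pc + 5 < len(code) and code[pc + 5] == 0x14  # 0x14 = EQ
                if followed_by_eq or not require_eq:
                    found.append(code[pc + 1 : pc + 5].hex())
            pc += 1 + width  # skip the opcode and its immediate data
        else:
            pc += 1  # every other opcode (including PUSH0, 0x5F) is one byte
    return found
```

On the bytecode from the usage examples, `naive_selectors(code)` returns `['b69ef8a8', 'd0e30db0']`. Dropping the `EQ` requirement also reports `4e487b71` (the `Panic(uint256)` error selector pushed by the compiler's overflow check) and `0d137edc` (bytes inside the compiler's trailing metadata hash), i.e. false positives. The pattern also depends on how a particular compiler lays out its dispatcher.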

## License
MIT
38 changes: 38 additions & 0 deletions benchmark/Makefile
@@ -0,0 +1,38 @@
PROVIDERS ?= etherscan simple whatsabi evmole-py evmole-js
DATASETS ?= largest1k random50k vyper
DOCKER ?= docker
DOCKER_CPUS ?= 1
DOCKER_PREFIX ?= evmole-bench

DATASET=$(shell pwd)/datasets
RES=$(shell pwd)/results

BUILD_TARGETS=$(addsuffix .build, $(PROVIDERS))
RUN_TARGETS=$(foreach p,$(PROVIDERS),$(addprefix $(p)/, $(DATASETS)))
UNPACK_TARGETS=$(foreach d,$(DATASETS),$(addprefix datasets/, $(d)))

benchmark: build run

build: $(BUILD_TARGETS)

run: $(RUN_TARGETS)

$(BUILD_TARGETS):
	$(info [*] Building $(basename $@)...)
	# special hack for evmole:
	[ "$@" = "evmole-py.build" ] && cp -r ../evmole providers/evmole-py/ || true
	[ "$@" = "evmole-js.build" ] && cp -r ../js providers/evmole-js/ || true
	$(DOCKER) build -t $(DOCKER_PREFIX)-$(basename $@) providers/$(basename $@)
	[ "$@" = "evmole-py.build" ] && rm -rf providers/evmole-py/evmole || true
	[ "$@" = "evmole-js.build" ] && rm -rf providers/evmole-js/js || true

$(UNPACK_TARGETS):
	$(info [*] Unpacking $@...)
	tar -C datasets/ -zxf $@.tar.gz

.SECONDEXPANSION:
$(RUN_TARGETS): datasets/$$(notdir $$@)
	$(info [*] Running $@...)
	/bin/time -f '%e' $(DOCKER) run --cpus=$(DOCKER_CPUS) --rm -v $(DATASET)/$(notdir $@):/dataset -v $(RES):/mnt -it $(DOCKER_PREFIX)-$(subst /,,$(dir $@)) /dataset /mnt/$(subst /,_,$@).json 2> $(RES)/$(subst /,_,$@).time

.PHONY: benchmark build run $(BUILD_TARGETS) $(RUN_TARGETS)
80 changes: 80 additions & 0 deletions benchmark/README.md
@@ -0,0 +1,80 @@
# Benchmarks

Tests the accuracy and speed of different function selector extractors.

For results, refer to the [main README.md](../README.md#Benchmark).

## Methodology
1. Get N Etherscan-verified contracts, save the bytecode and ABI to `datasets/NAME/ADDR.json`.
2. Extract function selectors from the bytecode. Each tool runs inside a Docker container limited to 1 CPU (see `providers/NAME` and `Makefile`).
3. Treat the selectors from Etherscan's ABI as ground truth.
4. Compare each tool's results against it and count [False Positives and False Negatives](https://en.wikipedia.org/wiki/False_positives_and_false_negatives).
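Step 4 can be sketched as follows. This is an assumption about how the counts are defined, not a copy of `compare.py`: a contract counts as FP (or FN) if it has at least one false-positive (or false-negative) selector, and the function counts sum the per-contract set differences. `count_errors` is a hypothetical name.

```python
def count_errors(
    truth: dict[str, set[str]], found: dict[str, set[str]]
) -> dict[str, int]:
    """Count FP/FN contracts and functions for one provider on one dataset.

    truth: address -> selectors from the Etherscan ABI (ground truth)
    found: address -> selectors reported by the provider
    """
    counts = {"fp_contracts": 0, "fn_contracts": 0, "fp_functions": 0, "fn_functions": 0}
    for addr, expected in truth.items():
        got = found.get(addr, set())  # a missing result counts as "found nothing"
        false_pos = got - expected  # reported, but not in the ABI
        false_neg = expected - got  # in the ABI, but not reported
        counts["fp_functions"] += len(false_pos)
        counts["fn_functions"] += len(false_neg)
        counts["fp_contracts"] += int(bool(false_pos))
        counts["fn_contracts"] += int(bool(false_neg))
    return counts
```

With `truth = {"0xa": {"b69ef8a8", "d0e30db0"}, "0xb": {"a9059cbb"}}` and `found = {"0xa": {"b69ef8a8", "d0e30db0", "4e487b71"}}`, every counter ends up at 1: contract `0xa` has one extra selector and contract `0xb` is missing its only one.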

## Reproduce
Set the performance mode using `sudo cpupower frequency-set -g performance` and run `make` ([GNU Make](https://www.gnu.org/software/make/)) inside the `benchmark/` directory.

To use [Podman](https://podman.io/) instead of Docker: `DOCKER=podman make`


You can run individual steps; for example:
```sh
# Only build Docker images
$ make build

# Only run tests
$ make run

# Build the `etherscan` Docker image
$ make etherscan.build

# Run `etherscan` on dataset `largest1k`
$ make etherscan/largest1k
```
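In the `Makefile`, the run targets are the cross product of `PROVIDERS` and `DATASETS`, named `PROVIDER/DATASET`; each run writes `results/PROVIDER_DATASET.json` plus a `.time` file capturing the stderr of `/bin/time -f '%e'` (elapsed seconds). The hypothetical `run_targets` helper below mirrors that naming, which may be handy when scripting over the outputs:

```python
from itertools import product

# Defaults from the Makefile (both can be overridden, e.g. `PROVIDERS="simple" make`)
PROVIDERS = ["etherscan", "simple", "whatsabi", "evmole-py", "evmole-js"]
DATASETS = ["largest1k", "random50k", "vyper"]


def run_targets(providers: list[str], datasets: list[str]) -> list[tuple[str, str, str]]:
    """Return (make target, result JSON, timing file) for every provider/dataset pair."""
    out = []
    for provider, dataset in product(providers, datasets):  # provider-major, like RUN_TARGETS
        target = f"{provider}/{dataset}"
        stem = target.replace("/", "_")  # mirrors $(subst /,_,$@)
        out.append((target, f"results/{stem}.json", f"results/{stem}.time"))
    return out
```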

To process the results, run `compare.py`:
```sh
$ python3 compare.py

# compare in a web browser
$ ../.venv/bin/python3 compare.py --web-listen 127.0.0.1:8080
```


## How `datasets/` was constructed

1. Clone [tintinweb/smart-contract-sanctuary](https://github.com/tintinweb/smart-contract-sanctuary)

2. Find all Solidity contracts:
```sh
$ cd smart-contract-sanctuary/ethereum/contracts/mainnet/

# (contract_size_in_bytes) (contract_file_path)
$ find ./ -name "*.sol" -printf "%s %p\n" > all.txt
```

3. Get the ~1200 largest (by size) contracts:
```sh
$ cat all.txt | sort -rn | head -n 1200 | cut -d'/' -f3 | cut -d'_' -f1 > top.txt
```

4. Get ~55,000 random contracts:
```sh
$ cat all.txt | cut -d'/' -f3 | cut -d'_' -f1 | sort -u | shuf | head -n 55000 > random.txt
```

5. Get all Vyper contracts:
```sh
$ find ./ -type f -name '*.vy' | cut -d'/' -f3 | cut -d'_' -f1 > vyper.txt
```

6. Download contract code & ABI:
```sh
$ poetry run python3 datasets/download.py --etherscan-api-key=CHANGE_ME --addrs-list=top.txt --out-dir=datasets/largest_1k --limit=1000 --code-regexp='^0x(?!73).'
$ poetry run python3 datasets/download.py --etherscan-api-key=CHANGE_ME --addrs-list=random.txt --out-dir=datasets/random_10k --limit=10000 --code-regexp='^0x(?!73).'
$ poetry run python3 datasets/download.py --etherscan-api-key=CHANGE_ME --addrs-list=vyper.txt --out-dir=datasets/vyper --code-regexp='^0x(?!73).'
```

We use `--code-regexp='^0x(?!73).'` to:
1. Skip contracts with empty code (`{"code": "0x",`): these are self-destructed contracts.
2. Skip contracts whose code starts with `0x73` (the `PUSH20` opcode).
   Compiled Solidity libraries [begin with this code](https://docs.soliditylang.org/en/v0.8.23/contracts.html#call-protection-for-libraries). Because [non-storage structs are referred to by their fully qualified name](https://docs.soliditylang.org/en/v0.8.23/contracts.html#function-signatures-and-selectors-in-libraries) in library function signatures, libraries are not yet supported by our reference Etherscan extractor (`providers/etherscan`). This may be fixed later.
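A quick way to check which bytecode strings the filter keeps, assuming `download.py` applies the pattern with Python's `re.search` (an assumption about the script; with the leading `^` anchor, `re.match` would behave the same). `keep_contract` is a hypothetical helper:

```python
import re

# Same pattern as --code-regexp: "0x" followed by at least one character, not starting with "73"
CODE_FILTER = re.compile(r"^0x(?!73).")


def keep_contract(code: str) -> bool:
    """True if the contract's hex bytecode passes the download filter."""
    return CODE_FILTER.search(code) is not None
```

`"0x"` is rejected because `.` needs at least one character after the prefix, `"0x73..."` is rejected by the negative lookahead, and ordinary contract code such as `"0x6080604052..."` is kept.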