This project explores how far we can get with simple enumeration of all possible instantiations of a method signature to determine its type stability.
Main script: $JULIA_STS/scripts/loop-over-packages.sh. The list of packages
for processing is taken from a text file (one package name per line) that
can be supplied as the script argument. By default (no argument), the script
reads scripts/pkgs.txt, which holds the names of several popular Julia packages.
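As a minimal sketch of supplying a custom list, the snippet below writes a two-package file and passes it to the batch script. The file name my-pkgs.txt and the choice of packages are illustrative assumptions; the script is only invoked when JULIA_STS points at a checkout of this repository.

```shell
#!/bin/sh
# Hypothetical list file: any path works, as long as it holds one package name per line.
pkg_list="$(mktemp -d)/my-pkgs.txt"
printf '%s\n' Multisets DataStructures > "$pkg_list"

# Run the batch script only if JULIA_STS points at a checkout of this repository;
# otherwise just show what would be processed.
if [ -n "${JULIA_STS:-}" ] && [ -f "$JULIA_STS/scripts/loop-over-packages.sh" ]; then
    "$JULIA_STS/scripts/loop-over-packages.sh" "$pkg_list"
else
    echo "JULIA_STS is not set; the list would contain:"
    cat "$pkg_list"
fi
```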
The only requirements are Julia and internet connectivity (to download packages).
Preferably in a fresh directory, run it as follows:
❯ $JULIA_STS/scripts/loop-over-packages.sh
...
❯ $JULIA_STS/scripts/aggregate.sh
...
This should create one subdirectory per package on the list, each containing
several csv files and one txt file. The aggregate script collects
the data from the txt files into a single aggregate.csv file.
Main script: $JULIA_STS/scripts/process-package.jl. It also works as one
iteration of the loop in the batch-package approach above.
The only requirements are Julia and internet connectivity (to download packages).
Preferably in a fresh directory, run this:
❯ julia "$JULIA_STS/scripts/process-package.jl" Multisets
...
❯ ls *.csv *.txt
Multisets-agg.txt Multisets.csv
❯ $JULIA_STS/scripts/aggregate.sh
...
❯ ls *.csv
aggregate.csv Multisets.csv
Process a module (check stability of all of its methods) in a Julia session:
checkModule(MyModule)
- Assumes: MyModule is loaded.
- Effects: creates the following files in the current directory:
  - MyModule.csv with raw results of analysis;
  - MyModule-agg.txt with aggregate stats (possibly, for further analysis).
A (possibly empty) set of agg-files can be turned into a single CSV by calling
scripts/aggregate.sh. For just one file, it only adds the heading. The result
is written to aggregate.csv.
See examples in the tests directory.
Normally, we don't check methods with Any in the parameter types
because there is no sensible way to enumerate subtypes of Any without
any additional knowledge. One way to get around this is to collect
a set of types and enumerate only that set every time we see
an Any-parameter. That's what the “types database” idea is about.
The implementation with further comments in the source code lives in src/typesDB.jl (you may be interested in the format of the database). A simple example of how to use it is shown in the test suite:
julia> f(x)=1
julia> typesdb_cfg = build_typesdb_scfg("merged-small.csv")
julia> is_stable_method((@which f(1)), typesdb_cfg)
Stb(2)
The testing database in "merged-small.csv" is really small.
For a realistic example of a database with 10K types collected from
tracing test suites of popular Julia packages, refer to the types-database
directory in the julia-sts-data repository.
There is a suite of scripts to examine the git history of packages
and report how the stability of methods changed over time.
These are located in $JULIA_STS/scripts/timeline:
- batch.jl: The main script, which reads the list of packages from a file and calls the other scripts for each.
- history.jl: Takes care of git management (clones the package, checks out individual commits). Also the only part that uses multiple threads to speed up the processing.
- process-package.jl: Runs the stability checks for a given package at a given revision (a slightly modified version of scripts/process-package.jl).
- aggregate.jl: Creates a CSV summary of all the processed revisions of a package.
- filter.jl: Tries to filter the interesting stability cases with a heuristic.
- plot.jl: Creates a PDF plot that shows, for each commit, the number of unstable methods, as well as the ratios of stable and of unstable methods to all methods.
These scripts assume packages DataStructures (for OrderedDict), CSV, DataFrames, Plots
(for filtering and plotting). Please make sure they are installed:
using Pkg; Pkg.add(["DataStructures", "CSV", "DataFrames", "Plots"])
The pipeline produces several outputs (in addition to the regular output of the stability checker):
- <PKG_NAME>/<IDX>-<COMMIT_HASH>/timeline_info.csv: stores some metadata that is used during aggregation
- <PKG_NAME>.csv: contains information about individual commits (from <PKG>-agg.txt), filtering duplicate lines (i.e., only commits that cause changes)
- <PKG_NAME>-filtered.csv: further filters the results to only commits that don't change the number of methods but introduce/remove unstable ones
- <PKG_NAME>.pdf: shows three plots (if we have enough data): the absolute number of unstable methods, the ratio of stable methods to all methods, and the ratio of unstable methods to all methods. The x axis for all of these is the number of a commit as it appears in <PKG_NAME>.csv, i.e., two consecutive points are not necessarily consecutive commits, as there can be commits in between that don't change stability
- https___github_com_<PKG_REPO>_git_pkgs.txt: records the repository URL, as well as a list of all packages in that repository and their respective subdirectories
- timeline_error_log.txt: records stdout and stderr for failed runs, common for all packages (the normal log contains line numbers within this file for easier navigation)
- finished.txt: lists completed packages; these are skipped when the pipeline is restarted
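To preview which packages a restarted run would still process, the package list can be compared against finished.txt. The sketch below is only an illustration: it assumes finished.txt holds one package name per line, and the helper pending_packages is hypothetical. The actual skipping happens inside the pipeline scripts.

```shell
#!/bin/sh
# pending_packages LIST FINISHED: print the names from LIST that do not appear in FINISHED.
# Assumption: both files hold one package name per line.
pending_packages() {
    list=$1
    finished=$2
    if [ -s "$finished" ]; then
        # -F: fixed strings, -x: whole-line match, -v: invert, -f: read patterns from file.
        # "|| true" keeps the exit status at 0 when every package is already finished.
        grep -vxFf "$finished" "$list" || true
    else
        # No (or empty) finished.txt: everything is still pending.
        cat "$list"
    fi
}

# Demo on throwaway files, so the sketch runs without a real pipeline run.
demo=$(mktemp -d)
printf '%s\n' Multisets DataStructures > "$demo/packages"
printf '%s\n' Multisets > "$demo/finished.txt"
pending_packages "$demo/packages" "$demo/finished.txt"   # prints: DataStructures
```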
For example, to check the Multisets package using 8 threads, create a text file with the package name (in general, this file should contain one package name per line):
echo "Multisets" > /tmp/packages
Then run the batch.jl script:
julia --threads=8 $JULIA_STS/scripts/timeline/batch.jl \
/tmp/packages \
/tmp/scratch \
/tmp/timeline \
&> /tmp/timeline_log
To watch the progress, you can tail -f --retry /tmp/timeline_log.
The batch.jl run writes the intermediate files to /tmp/scratch
(a subdirectory for the Multisets package, which contains a subdirectory
for each processed commit, and an error log timeline_error_log.txt).
The output files (Multisets.csv, Multisets-filtered.csv, Multisets.pdf) are written to
/tmp/timeline.
The log should contain something like the following:
$ cat /tmp/timeline_log
[ Info: [2023-05-05T15:37:31.995] == Using 8 threads ==
[ Info: [2023-05-05T15:37:32.651] === Checking `https://github.com/scheinerman/Multisets.jl.git' ===
[ Info: [2023-05-05T15:37:33.762] Writing to `/tmp/scratch/https___github_com_scheinerman_Multisets_jl_git_pkgs.txt'
[ Info: [2023-05-05T15:37:34.518] Checking 65 commits
[ Info: [2023-05-05T15:37:35.261] Progress: 0.0% (0 tasks done, 0 skipped, elapsed 0.72 s, est. remaining Inf d)
[ Info: [2023-05-05T15:37:35.270] Thread #5, commit #6 (c878909), pkg Multisets: skipping (can't parse Project.toml)
...
[ Info: [2023-05-05T15:37:35.311] Thread #5, commit #6 (c878909): done in 0.27 s
[ Info: [2023-05-05T15:37:35.312] Thread #6, commit #51 (a1c89dc), pkg Multisets: processing [email protected]
...
[ Info: [2023-05-05T15:37:35.400] Writing to `/tmp/scratch/Multisets/000051-a1c89dc/timeline_info.csv'
...
[ Info: [2023-05-05T15:37:52.898] Thread #6, commit #51 (a1c89dc), pkg Multisets: done with [email protected] after 17.59 s
[ Info: [2023-05-05T15:37:52.900] Thread #6, commit #51 (a1c89dc): done in 17.86 s
...
[ Info: [2023-05-05T15:38:14.835] Progress: 83.08% (18 tasks done, 36 skipped, elapsed 40.31 s, est. remaining 8.21 s)
...
[ Info: [2023-05-05T15:38:36.918] Work distribution: [1, 1, 3, 4, 5, 4, 2, 2]
[ Info: [2023-05-05T15:38:36.983] === Aggregating results for `Multisets' ===
[ Info: [2023-05-05T15:38:37.442] Writing to `/tmp/timeline/Multisets.csv'
[ Info: [2023-05-05T15:38:53.255] Writing to `/tmp/timeline/Multisets-filtered.csv'
[ Info: [2023-05-05T15:39:21.646] Writing to `/tmp/timeline/Multisets.pdf'
[ Info: [2023-05-05T15:39:27.181] === Done with https://github.com/scheinerman/Multisets.jl.git in 1.91 m
And the results can be found in:
$ ls /tmp/scratch/
https___github_com_scheinerman_Multisets_jl_git_pkgs.txt Multisets
$ ls /tmp/scratch/Multisets/
000044-19f1220 000048-5291b2e 000052-c556e27 000056-dc50172 000060-b9e0cb9 000064-692a160
000045-09eb09c 000049-fab665c 000053-90ae494 000057-92b8a76 000061-0dd0d96 000065-fe08c9c
000046-89277a2 000050-d7bc545 000054-a26cbf1 000058-ea34a45 000062-30c9321
000047-257bbb9 000051-a1c89dc 000055-3db912f 000059-1970e9a 000063-e66d7df
$ ls /tmp/timeline
Multisets.csv Multisets-filtered.csv Multisets.pdf
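The file and directory names in the listing above appear to follow two simple conventions: the repository URL with every non-alphanumeric character replaced by an underscore, and a six-digit zero-padded commit index followed by the short commit hash. The sketch below reproduces these names under that assumption. It is inferred from the example output only, the helper names are hypothetical, and the actual rules in the timeline scripts may differ.

```shell
#!/bin/sh
# repo_list_name URL: replace every non-alphanumeric character with "_"
# and append "_pkgs.txt" (rule inferred from the example output, not from the scripts).
repo_list_name() {
    printf '%s_pkgs.txt\n' "$(printf '%s' "$1" | sed 's/[^A-Za-z0-9]/_/g')"
}

# commit_dir_name INDEX HASH: zero-pad the decimal index to six digits
# and keep the first seven characters of the hash.
commit_dir_name() {
    printf '%06d-%.7s\n' "$1" "$2"
}

repo_list_name 'https://github.com/scheinerman/Multisets.jl.git'
# prints: https___github_com_scheinerman_Multisets_jl_git_pkgs.txt
commit_dir_name 51 a1c89dc
# prints: 000051-a1c89dc
```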
If there are errors, the log will mention the file and line number to look at to see the details, such as:
[ Info: ...: failed, please check log /tmp/scratch/timeline_error_log.txt:2795580
Stdout and stderr of the failed commit can then be seen, for instance, with (mind the + before the line number):
tail -n +2795580 /tmp/scratch/timeline_error_log.txt | less
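When a run has several failures, the lookup above can be repeated for every failure message in the log. The following is a sketch under stated assumptions: it relies only on the "failed, please check log FILE:LINE" wording shown above, the helper show_failures is hypothetical, and the demo files stand in for a real log.

```shell
#!/bin/sh
# show_failures LOG [N]: for each "please check log FILE:LINE" reference in LOG,
# print N lines (default 20) of FILE starting at LINE.
show_failures() {
    log=$1
    count=${2:-20}
    grep -o 'please check log [^ ]*' "$log" | sed 's/^please check log //' |
    while IFS= read -r loc; do
        file=${loc%:*}      # everything before the last colon
        line=${loc##*:}     # the number after the last colon
        echo "== $file:$line =="
        # "tail -n +LINE" starts printing at line LINE (hence the +).
        tail -n +"$line" "$file" | head -n "$count"
    done
}

# Demo on throwaway files, so the sketch runs without a real pipeline run.
demo=$(mktemp -d)
printf '%s\n' 'ok 1' 'ok 2' 'ERROR: boom' 'stack frame' > "$demo/timeline_error_log.txt"
echo "[ Info: ...: failed, please check log $demo/timeline_error_log.txt:3" > "$demo/timeline_log"
show_failures "$demo/timeline_log" 2
```

On a real run, this would be called as show_failures /tmp/timeline_log.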
The results of the scripts are stored in the julia-sts-data repository instead of this one.