As a user, I might want to split my tests up into multiple files, corresponding to separate modules. I would do this for separation of concerns, better organization, and to reduce pollution of the global namespace (e.g. with shared data or set-up/tear-down associated with several specific testsets). However, ReTest discourages this practice due to its current model of parallelism.
Consider the following example:
```julia
# runtests.jl
using Distributed

N_WORKERS = 2
addprocs(N_WORKERS, exeflags="--project=$(Base.active_project())")

# Propagate importing ReTest across all workers
@everywhere using ReTest

# Assume that each module_x.jl defines a `ModuleX` in the global namespace.
@everywhere include("path/to/module_1.jl")
@everywhere include("path/to/module_2.jl")
...
@everywhere include("path/to/module_M.jl")

# Call ReTest from the main process, and dispatch the workers across the testsets.
retest(Module1, Module2, ..., ModuleM; ...)
```
If M=100 and each `ModuleX` defined a single testset, the user would reasonably expect ReTest to approximately halve the time taken, given the 2 workers. This is not the case: ReTest only dispatches the workers across the testsets of a single module. In this example, where every module has exactly one testset, enabling parallelism would therefore have no effect at all, and due to overhead it would simply be slower. On top of that, shuffling would also have no effect, since it too works only per module, so any hidden dependencies between modules would never be detected. This code block in `src/ReTest.jl` confirms this:
```julia
for imod in eachindex(modules)
    mod, pat = modules[imod]
    tests = alltests[imod]
    ...
    shuffle && shuffle!(tests)
    ...
    # TODO: move printer task out of worker?
    worker = @task begin
        ...
```
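For reference, the consolidation workaround looks roughly like the following: a single wrapper module whose many testsets ReTest's existing per-module dispatch can spread across workers. The file and module names here are assumptions for illustration, not part of ReTest:

```julia
# all_tests.jl -- hypothetical wrapper consolidating every testset into one
# module, so ReTest's per-module worker dispatch can still parallelize them.
module AllTests

using ReTest

# Inline the testset definitions directly (instead of defining Module1..ModuleM
# as separate modules), so all testsets live in this single module.
include("path/to/module_1_tests.jl")
include("path/to/module_2_tests.jl")
# ... and so on for the remaining files

end # module

# Then, from the main process:
# retest(AllTests; ...)
```

This recovers parallelism, but at the cost of exactly the namespace separation that motivated splitting the tests in the first place.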
While this is fine and can be worked around (e.g. by consolidating tests into a single module), it seems like an unnecessary restriction for the library to impose. Instead, it would be better if the testsets of all modules were gathered into a single global pool that is then divided across the workers. In this model of parallelism, the previewer task would work much as it does currently, going through the testsets in the order in which they appear to `retest`. Rather than having the workers proceed in lockstep, they would simply return results to the previewer process as and when they complete their testsets. When the previewer reaches a testset whose result is already available, it immediately evaluates and displays it; otherwise, it waits until one of the workers completes that testset.
This has several advantages:
- It improves parallelism in the case of many test modules and files.
- Parallel performance and good design through separation of concerns are no longer mutually exclusive goals.
- Shuffling can occur across modules rather than only per module, greatly increasing the number of possible orderings.
- More opportunities for optimizing load-balancing become available whenever the number of modules is greater than 1.
- Developers who use a pattern like a single `include("path/to/all_my_tests.jl")` can continue to do so.
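The producer/consumer scheme described above could be sketched roughly as follows. This is not ReTest's actual API: representing `testsets` as a flat vector of zero-argument closures, and the particular channel layout, are assumptions made purely for illustration (and sending closures to workers would require their definitions to be available there, e.g. via `@everywhere`):

```julia
using Distributed

# Sketch of a global testset pool divided across workers.
# `testsets` is a hypothetical flat vector of zero-argument closures,
# one per testset, already gathered from all modules.
function retest_global(testsets::Vector)
    n = length(testsets)
    jobs = RemoteChannel(() -> Channel{Int}(n))          # shared work queue
    done = [RemoteChannel(() -> Channel{Any}(1)) for _ in 1:n]
    foreach(i -> put!(jobs, i), 1:n)
    close(jobs)

    # Workers pull indices from the queue as they become free -- no lockstep.
    for p in workers()
        remote_do(p, jobs, done, testsets) do jobs, done, testsets
            while true
                i = try take!(jobs) catch; break end     # queue drained
                put!(done[i], testsets[i]())
            end
        end
    end

    # Previewer: report results in declaration order, blocking on each
    # testset's channel until some worker has finished it.
    for i in 1:n
        result = take!(done[i])
        println("testset $i: ", result)
    end
end
```

The key point is that the work queue is global, so load-balancing happens across all modules, while the previewer still preserves the original reporting order.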
I'd be happy to contribute towards this, as this seems like a significant improvement in flexibility for the developer and would go a long way towards making this a package that can be used for production.
Thanks for opening this issue, these are all very good points! I haven't hit these limitations myself yet, so your feedback from a real use case is very useful. I will try to think about it in the upcoming weeks and discuss a possible path forward with you here.