Migrate unit tests to pytest? #79
Hi @kalekundert, thanks a lot for opening this issue! I am not familiar with pytest. Let me comment briefly on the failed tests:
The following failures are suspicious, though. I will take a look; thanks for reporting that!
Ahah nice, thanks, I really appreciate this comment! 😄
I personally would favor a port to pytest too. Let me know if any help is needed.
Another aspect I like about vanilla pytest is that tests can be marked. This way, I can isolate GPU-only tests from short-runtime tests. The latter are usually the default and run for every commit (for the sake of rapid feedback). The GPU unit tests I would then schedule to run once a day or once a week.
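To make that workflow concrete, here's a minimal sketch of how such marks could be wired up. The `gpu` marker name and the test names are assumptions for illustration, not anything this project currently defines:

```python
# conftest.py -- register a custom "gpu" marker so pytest doesn't warn about it
def pytest_configure(config):
    config.addinivalue_line("markers", "gpu: tests that require a GPU")


# test_example.py -- hypothetical test module showing how marks are applied
import pytest

@pytest.mark.gpu
def test_equivariance_on_gpu():
    ...  # only runs when GPU tests are explicitly selected

def test_output_shapes():
    ...  # unmarked, so it runs in the fast per-commit suite
```

CI would then run `pytest -m "not gpu"` on every commit for rapid feedback, while a scheduled job runs `pytest -m gpu` once a day or once a week.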
I don't have any immediate plans to work on this, so feel free to start working on it yourself if you want to. I actually don't think it'd be that big a change. Pytest can run unittest-style tests, so you wouldn't have to rewrite any code. Just add some marks and configure CI to run pytest. If you do want to work on this, one thing that might be useful is this JSON file I compiled. It's just a dictionary where the keys are test names and the values are how long it takes for that test to run, in seconds. My intention was to use this to quickly mark all the "slow" tests. Some tests will be missing from this dictionary. Those are likely, but not necessarily, the ones that took >4h to complete.
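For what it's worth, here's one hedged sketch of how that timing file could be put to use: a `conftest.py` hook that marks anything over a threshold as slow, and treats tests missing from the dictionary as slow by default. The file name, the 10 s threshold, and the use of pytest node IDs as keys are all assumptions on my part:

```python
# conftest.py -- auto-mark tests as "slow" using recorded runtimes.
import json
from pathlib import Path

import pytest

SLOW_THRESHOLD_S = 10.0  # hypothetical cutoff for the fast suite

def pytest_configure(config):
    config.addinivalue_line(
        "markers", "slow: tests that take more than a few seconds"
    )

def pytest_collection_modifyitems(config, items):
    times_path = Path(__file__).parent / "test_times.json"
    if not times_path.exists():
        return
    times = json.loads(times_path.read_text())
    for item in items:
        # Tests missing from the dictionary are likely (but not necessarily)
        # the ones that hit the 4 h limit, so treat unknown as slow.
        if times.get(item.nodeid, float("inf")) > SLOW_THRESHOLD_S:
            item.add_marker(pytest.mark.slow)
```

The per-PR job would then just be `pytest -m "not slow"`.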
While I was working on #78, the test suite caused me a lot of trouble. There were two main problems:
Running the tests takes a really long time. I ended up writing some scripts to run the tests on my institution's cluster, but even then ≈50 of the tests (individual test functions, not test files) run up against a 4h time limit. The whole suite takes >34h to run, not counting any tests that were killed for exceeding the time limit. There are also a handful of tests that require >32 GB of RAM!
There are ≈30 tests that are either broken or flaky. This makes it a bit scary to develop new features, because you're not sure which test failures you're actually responsible for. And of course, even if you realize that you're not responsible for a particular failure, you have no way of knowing if you broke something relating to these tests. I do think this relates back to the first problem, because if it were practical to run the whole test suite for each commit, there would surely be fewer broken tests.
Migrating to pytest wouldn't solve the underlying issue, but it has a few features that I think would help a lot. The biggest is the ability to "mark" tests. There are two marks that I'd want to use extensively. The first is `xfail`, which indicates that a test is currently broken and expected to fail. The failures are still reported, so you know they're happening, but they don't affect whether or not the whole suite is considered to pass or fail. This means that you can require new contributions to not break any currently working tests, without having to fix all the broken/flaky tests up front. The second mark is `slow`, which would indicate that a test takes more than a few seconds to complete. With this information, GitHub could be configured to automatically run all the "not slow" tests for each PR, which would help protect against breakages. The slow tests could either be gradually made fast enough to include in the automated test suite, or just be run manually when there's some reason to.

Pytest also supports running tests in parallel across multiple processes, via the `pytest-xdist` plugin. This alone wouldn't be enough to make the tests run in a reasonable amount of time, but it would help. Finally, pytest makes it easier to share code between tests (because it doesn't require every test to belong to a class) and has features like parametrization that can cut down on boilerplate. These last two things wouldn't make the tests run any faster, but they would make them easier to write.

Is this something you'd be in favor of? I'm not asking you to do anything yourself; I'm more asking for permission to do something like this the next time I want to make a non-trivial change to the code. (With no guarantees that I'll ever actually get around to it.)
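As a concrete (and entirely hypothetical) illustration of the two marks and of parametrization, assuming each flaky test gets an explicit reason attached:

```python
import pytest

# A known-flaky test: the failure is still reported as XFAIL, but it no
# longer fails the suite, so contributors only see failures they caused.
@pytest.mark.xfail(reason="upsampling equivariance is flaky; see this issue")
@pytest.mark.slow
def test_upsampling_equivariance():
    ...

# Parametrization replaces copy-pasted near-identical tests with one
# function that runs once per parameter value.
@pytest.mark.parametrize("batch_size", [1, 4, 16])
def test_forward_shapes(batch_size):
    ...
```

The fast subset could then run in parallel with something like `pytest -m "not slow" -n auto`, where the `-n` flag comes from `pytest-xdist`.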
In case it's of any interest, here's the test report I generated for the most recent commit (94380ef):

The upsampling equivariance tests are the ones that are flaky. The mixed precision tests probably aren't really failures; I just wasn't running on nodes with GPUs.
To end on a more positive note, I should say that I really appreciate how extensive the test suite is. There are so many academic projects out there with not enough tests, but this is the only one I know of with too many tests! It's a good problem to have, I think.