Add visual regression tests with playwright #41

martinRenou · 2022-01-21T10:11:45Z

.github/workflows/test.yaml

martinRenou · 2022-01-21T14:43:16Z

@CAM-Gerlach this is a first draft of the visual regression testing. Please tell me what you think so far.

I am currently checking the diff between the actual screenshot and the expected result. We should probably check that diff with a possible percentage of error (there seems to be a slight difference in rendering fonts between Windows/Mac/Ubuntu).
Either we allow a percentage of error, or we store reference screenshots on an OS-basis: one reference directory per platform.

I am not sure which is the best, or if there is another more sensible approach?

CAM-Gerlach · 2022-01-22T04:16:47Z

Thanks, looking great so far! Some general comments:

Since running the tests with playwright is somewhat non-trivial, could this be put behind a custom pytest flag, e.g. --compare-screenshots, like --open-browser currently is (which should also be retained for manual testing and experimentation)? This is also needed so that the tests only run on a specific set of CI jobs. You could also enable the warning filters inside there.
To avoid encouraging a browser monoculture, could we use Firefox instead of Chromium?
If we do deliberately change something with the UI, or add a new test, we'll need to generate new screenshots, which should hopefully be fairly low friction to avoid excessive maintenance overhead. As such, could you add a Python script, GH Actions job or some other automated, cross-platform mechanism to do so (or at least document step by step what needs to be done)?
Right now, it looks like the artifacts from different Python versions on the same platform are overwriting each other. You could include the Python version in the filename to fix that, but the above approach (just running it on one Python version) is probably better.
I suggest running optipng on the PNG output (CI-side, if it's available); it only takes a few seconds to run on all the images but saves 25-33% space for free.
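
For illustration, a minimal sketch of how the opt-in flag from the first point could be registered in conftest.py. It assumes pytest's standard pytest_addoption hook; the helper name screenshots_enabled is hypothetical and is not taken from this PR.

```python
# conftest.py (sketch; `screenshots_enabled` is a hypothetical helper name)

COMPARE_SCREENSHOTS_OPTION = "--compare-screenshots"


def pytest_addoption(parser):
    """Register the opt-in flag with pytest's command-line parser."""
    parser.addoption(
        COMPARE_SCREENSHOTS_OPTION,
        action="store_true",
        default=False,
        help="Compare rendered pages against the stored reference screenshots",
    )


def screenshots_enabled(config):
    """Return True when the run was started with --compare-screenshots."""
    return bool(config.getoption(COMPARE_SCREENSHOTS_OPTION))
```

A fixture or test could then call screenshots_enabled(request.config) and skip the browser work when it returns False, so only the CI jobs that pass the flag pay the cost of launching a browser.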

I am currently checking the diff between the actual screenshot and the expected result. We should probably check that diff with a possible percentage of error (there seems to be a slight difference in rendering fonts between Windows/Mac/Ubuntu).
Either we allow a percentage of error, or we store reference screenshots on an OS-basis: one reference directory per platform.

I'd think per-OS screenshots, if you can manage that, since we'd need to pick and test a threshold very carefully to avoid false negatives, and there's the risk that it could miss significant changes.
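
To make the two options concrete, here is a rough sketch. The per-platform sub-directory layout, the function names and the representation of an image as a flat pixel sequence are all assumptions made for illustration, not what the PR implements.

```python
import sys
from pathlib import Path

# Hypothetical layout: one sub-directory of reference screenshots per platform.
REFERENCE_ROOT = Path("docrepr/tests/reference_screenshots")


def platform_name(sys_platform=sys.platform):
    """Map a sys.platform value to a short directory name."""
    if sys_platform.startswith("win"):
        return "windows"
    if sys_platform == "darwin":
        return "macos"
    return "linux"


def reference_path(test_id, sys_platform=sys.platform):
    """Option 1: look up the reference screenshot for the current platform."""
    return REFERENCE_ROOT / platform_name(sys_platform) / f"{test_id}.png"


def mismatch_fraction(actual, expected):
    """Option 2: fraction of pixels that differ between two pixel sequences."""
    if len(actual) != len(expected):
        raise ValueError("images differ in size")
    if not actual:
        return 0.0
    differing = sum(a != e for a, e in zip(actual, expected))
    return differing / len(actual)
```

With per-OS references the check can stay exact (mismatch_fraction(...) == 0), whereas a shared reference would need a comparison such as mismatch_fraction(...) <= threshold, with the threshold tuned carefully for the reasons given above.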

martinRenou · 2022-01-22T07:01:56Z

Thanks for the review, those are all very good points. I'll finish this on Monday 😊

martinRenou · 2022-01-24T12:28:01Z

I suggest running optipng on the PNG output (CI-side, if it's available); only takes a few seconds to run on all the images but saves 25-33% space for free

Are you suggesting this for the build artifacts? Or would you like this to be done for references that are stored in the repo as well?

martinRenou · 2022-01-24T15:33:50Z

~~Trying the action (this might not work as it's not part of master yet):~~

~~Please generate reference screenshots~~

That does not work

CAM-Gerlach · 2022-01-25T01:15:10Z

FYI, I meant to suggest this when I replied before, but you might want to remove --maxfail 5 in addopts in pytest.ini; then you'll get the full output for all tests when they fail, not just the first five. I shouldn't have added it in the first place; that line was a copy-paste from another repo I set up that needed it for some reason.

Are you suggesting this for the build artifacts? Or would you like this to be done for references that are stored in the repo as well?

Ideally for both, but most importantly for the reference screenshots, since they will be stored in the repo and each version will take up space for all time, so the smaller we can get them, the better. Note that this should be done as part of the GitHub workflow that generates them, rather than inside the test code itself.

CAM-Gerlach · 2022-01-25T01:28:35Z

Trying the action (this might not work as it's not part of master yet):

Yeah, I'm sure that's what happened. We can do a best effort on this PR, merge it and then test it and follow up if we need to fix anything with that part. Or, you can merge it on your fork and test it there, since that should work.

BTW, you can get rid of all build and install steps and replace all that with just pip install --upgrade .; also, it only needs to run on Python 3.8, which is what the visual regression tests themselves will run on.

Let me know if you need anything!

martinRenou · 2022-01-25T08:05:53Z

FYI, I meant to suggest this when I replied before, but you might want to remove --maxfail 5 in addopts in pytest.ini; then you'll get the full output for all tests when they fail, not just the first five. I shouldn't have added it in the first place; that line was a copy-paste from another repo I set up that needed it for some reason.

Right, thanks! I was a bit confused why it was behaving this way but didn't look into it.

Note that this should be done as part of the GitHub workflow that generates them, rather than inside the test code itself.

👍🏽

Or, you can merge it on your fork and test it there

Yes, I was doing this yesterday and it does not work properly. I'm looking into it.

martinRenou · 2022-01-25T09:19:21Z

I am getting there: martinRenou#8

Commenting "Please generate reference screenshots" in the PR properly triggers a commit of the references :) I still need to polish a bit and fix some remaining issues, but this is taking shape

martinRenou · 2022-01-25T17:06:05Z

I resolved all of your comments :)

As you can see from the reference screenshots, we can reproduce issue #38. I'll try to fix it in a separate PR.

CAM-Gerlach

Thanks so much for all your hard work on this, @martinRenou ! This looks really great. I just have some relatively minor comments, and then this should be good to go.

Top level comments:

I suggest running optipng on the reference PNGs output by the CI (as well as those committed here); since they will be committed irrevocably to the repo once merged, and will inexorably increase its size, the smaller we can make them, the better.
To be more clear, consistent and descriptive about what we're referring to, follow idiomatic English convention and avoid confusion with more typical meanings of "references" in English ("bibliography", people that will vouch for your character, etc), could you rename all instances of references to reference_screenshots or similar (as appropriate)?

To note, I'm not super comfortable with asyncio beyond the very basics (since we don't really use it in my main lines of work) and this doesn't seem to be too performance-critical given the complexity, but I suppose this is acceptable since it is only in the tests and mostly isolated to the one fixture...so long as you're willing to help fix any async-related issues that may come up.

Thanks again!

.github/workflows/test.yaml

.github/workflows/update_references.yaml

conftest.py

docrepr/tests/test_output.py

martinRenou · 2022-01-26T09:36:19Z

I suggest running optipng on the reference PNGs output by the CI (as well as those committed here); since they will be committed irrevocably to the repo once merged, and will inexorably increase its size, the smaller we can make them, the better.

I've implemented the compression with Pillow instead: https://github.com/spyder-ide/docrepr/pull/41/files#diff-a31c7ed5d35f5ed8233994868c54d625b18e6bacb6794344c4531e62bd9dde59R97-R99. This logic also works for the reference screenshots (it has been used for the references on this PR).

I am not sure optipng can be easily installed on all platforms? Maybe Pillow's compression should be enough? I can decrease the quality even more if you want.

To note, I'm not super comfortable with asyncio beyond the very basics (since we don't really use it in my main lines of work) and this doesn't seem to be too performance-critical given the complexity, but I suppose this is acceptable since it is only in the tests and mostly isolated to the one fixture...so long as you're willing to help fix any async-related issues that may come up.

At first I implemented it without asyncio, as playwright provides a sync API: https://github.com/microsoft/playwright-python#example

The issue was that the sync API was not cleaning up resources and shutting down the browser properly (it was generating many warnings that needed to be filtered). We can roll back to the sync API, but we'll have to add filters. There was also a warning that I was not able to filter for some reason, which is why I gave up and used the async API.
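
For readers less familiar with asyncio, a minimal sketch of the async pattern, assuming Playwright's async API from playwright.async_api and a Firefox launch. The function name take_screenshot and its parameters are hypothetical and do not reproduce the fixture in this PR.

```python
import asyncio


async def take_screenshot(url, output_path):
    """Render `url` in headless Firefox and save a full-page PNG."""
    # Imported lazily so this module can be imported without Playwright.
    from playwright.async_api import async_playwright

    # The async context manager stops the Playwright driver on exit,
    # even if something inside the block raises.
    async with async_playwright() as playwright:
        browser = await playwright.firefox.launch()
        try:
            page = await browser.new_page()
            await page.goto(url)
            await page.screenshot(path=output_path, full_page=True)
        finally:
            # Close the browser explicitly so no process is left behind.
            await browser.close()


# Example (requires `pip install playwright` and `playwright install firefox`):
# asyncio.run(take_screenshot("file:///tmp/page.html", "page.png"))
```

The try/finally around the browser and the surrounding async with block are what keep cleanup deterministic in this sketch.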

martinRenou · 2022-01-26T09:36:24Z

Thanks for the review!

CAM-Gerlach · 2022-01-27T06:25:55Z

Thanks for the quick and thorough response!

I've implemented the compression with Pillow instead: https://github.com/spyder-ide/docrepr/pull/41/files#diff-a31c7ed5d35f5ed8233994868c54d625b18e6bacb6794344c4531e62bd9dde59R97-R99 this logic also works for the reference screenshots (it has been used for the references on this PR).

Yep, I'd noticed that, thanks. Unfortunately, Pillow's optimization still appears to be leaving a lot on the table, assuming it's what you're using for the current reference screenshots: running optipng on them, even at the default (relatively low) optimization level, still results in a decrease of between 10% and 50% in the size of each image, which reduces the total size of the screenshot directory from 1.5 MB to 1.1 MB, a savings of between 25% and 30%. That's not trivial considering the repo will grow in size by nearly that much every time we regenerate them, assuming they all change (consistent optimization might or might not reduce those deltas).
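
As a quick check of the directory-level figure quoted above, using only the two sizes given (the function name is just for illustration):

```python
def savings_percent(before_mb, after_mb):
    """Relative size reduction, as a percentage of the original size."""
    return (before_mb - after_mb) / before_mb * 100


# 1.5 MB before optipng, 1.1 MB after.
print(round(savings_percent(1.5, 1.1), 1))  # 26.7
```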

It actually shouldn't be that tough to install optipng on all GHA platforms: on Windows, it's just choco install optipng, on Mac it's brew install optipng and on Ubuntu, apt install optipng; then run it with optipng docrepr/tests/reference_screenshots/*.png. That's probably the simplest approach, but you could also upload the archives from each job, then have another job that runs after all three jobs have completed that merges, optimizes and commits them. This allows the three jobs to run in parallel, make a single commit at the end and not have to install optipng on each platform, but probably isn't worth the work unless you really want to.
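
If a script is preferred over a shell one-liner, the optimization step could be sketched roughly as below. It assumes optipng is already on PATH (installed by one of the commands above); the function names are hypothetical, and the glob is expanded in Python so the call does not depend on the shell.

```python
import shutil
import subprocess
from pathlib import Path


def optipng_command(directory):
    """Build the optipng command line for every PNG in `directory`."""
    pngs = sorted(str(path) for path in Path(directory).glob("*.png"))
    return ["optipng", *pngs] if pngs else []


def optimize_screenshots(directory):
    """Run optipng in place; return False when there is nothing to run."""
    command = optipng_command(directory)
    if not command or shutil.which("optipng") is None:
        return False
    # optipng rewrites each file in place when it finds a smaller encoding.
    subprocess.run(command, check=True)
    return True
```

For example, optimize_screenshots("docrepr/tests/reference_screenshots") would process the directory named above and return False on a machine where optipng is not installed.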

The issue was that the sync API was not cleaning up resources and shutting down the browser properly (it was generating many warnings that needed to be filtered). We can roll-back to the sync API but we'll have to add filters. Also there was a warning that I was not able to filter for some reason, that's why I gave up and used the async API.

Makes sense, thanks 👍 I saw you used the non-async API before, but I figured there was a reason you switched (and noticed most of the warnings were gone).

martinRenou · 2022-01-27T09:18:48Z

Just added optipng to the update-reference-screenshots CI job :) thanks for your patience!

You can see that it runs properly here:
https://github.com/martinRenou/docrepr/runs/4963763427?check_suite_focus=true

Which was triggered by martinRenou#24

CAM-Gerlach · 2022-01-28T00:33:39Z

@martinRenou LGTM, thanks! Now that #40 is merged, the output should change for a number of tests; to avoid an extra copy of the images and them getting out of sync, do you mind rebasing this and using the CIs on your fork to generate the updated reference screenshots for each OS (make sure you use the CI-optimized ones, since different optipng builds can produce slightly different results, I've found)? Once that's done, I'll go ahead and merge this. Thanks!

martinRenou · 2022-01-28T08:58:43Z

Rebased! :)

CAM-Gerlach

LGTM, thanks @martinRenou !

martinRenou force-pushed the playwright_tests branch from 291eef6 to 405fa27 Compare January 21, 2022 10:18

martinRenou added the type:Enhancement label Jan 21, 2022

martinRenou force-pushed the playwright_tests branch 9 times, most recently from 9d22386 to e58a8f2 Compare January 21, 2022 14:34

CAM-Gerlach assigned martinRenou Jan 22, 2022

CAM-Gerlach added this to the v0.2.0 milestone Jan 22, 2022

martinRenou force-pushed the playwright_tests branch from e58a8f2 to a67fdb1 Compare January 24, 2022 12:06

martinRenou force-pushed the playwright_tests branch 2 times, most recently from 5f0c09e to 0e90d0d Compare January 24, 2022 15:31

martinRenou force-pushed the playwright_tests branch 2 times, most recently from b66e806 to 44cb1b6 Compare January 24, 2022 16:00

martinRenou force-pushed the playwright_tests branch 2 times, most recently from db1199b to c44b772 Compare January 25, 2022 13:34

martinRenou force-pushed the playwright_tests branch 6 times, most recently from 4986787 to 23d1039 Compare January 25, 2022 17:05

martinRenou marked this pull request as ready for review January 25, 2022 17:11

martinRenou requested a review from CAM-Gerlach January 25, 2022 17:11

CAM-Gerlach requested changes Jan 26, 2022

martinRenou force-pushed the playwright_tests branch 2 times, most recently from 366cd64 to 5ecb849 Compare January 26, 2022 09:29

martinRenou force-pushed the playwright_tests branch 5 times, most recently from 0bbb0d8 to f3f87ae Compare January 27, 2022 09:04

martinRenou force-pushed the playwright_tests branch from f3f87ae to f0d8f29 Compare January 28, 2022 07:56

Visual regression tests with playwright

5ede725

martinRenou force-pushed the playwright_tests branch from f0d8f29 to 5ede725 Compare January 28, 2022 08:12

martinRenou mentioned this pull request Jan 28, 2022

Release Docrepr 0.2.0 #42

Closed

CAM-Gerlach approved these changes Jan 28, 2022

CAM-Gerlach merged commit 2ea63ed into spyder-ide:master Jan 28, 2022

martinRenou deleted the playwright_tests branch January 28, 2022 10:52
