Add visual regression tests with playwright #41

Merged
merged 1 commit into from
Jan 28, 2022

Conversation

martinRenou
Collaborator

@martinRenou martinRenou commented Jan 21, 2022

Fix #39

@martinRenou martinRenou force-pushed the playwright_tests branch 9 times, most recently from 9d22386 to e58a8f2 Compare January 21, 2022 14:34
@martinRenou
Collaborator Author

@CAM-Gerlach this is a first draft of the visual regression testing. Please tell me what you think so far.

I am currently checking the diff between the actual screenshot and the expected result. We should probably check that diff with a possible percentage of error (there seems to be a slight difference in rendering fonts between Windows/Mac/Ubuntu).
Either we allow a percentage of error, or we store reference screenshots on an OS-basis: one reference directory per platform.

I am not sure which is the best, or if there is another more sensible approach?
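
To make the first option concrete, a tolerance-based comparison could look roughly like the sketch below. It is illustrative only and not the code in this PR: `diff_ratio`, `screenshots_match` and the 1% default threshold are hypothetical, and pixels are plain tuples so the block runs without an imaging library (in practice they would be read from the PNG files).

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]


def diff_ratio(actual: Sequence[Pixel], expected: Sequence[Pixel]) -> float:
    """Return the fraction of pixels that differ between two images."""
    if len(actual) != len(expected):
        # Different dimensions: treat the images as entirely different.
        return 1.0
    if not expected:
        return 0.0
    changed = sum(1 for a, e in zip(actual, expected) if a != e)
    return changed / len(expected)


def screenshots_match(actual, expected, max_ratio: float = 0.01) -> bool:
    """Accept the screenshot if at most `max_ratio` of its pixels changed."""
    return diff_ratio(actual, expected) <= max_ratio


white, black = (255, 255, 255), (0, 0, 0)
reference = [white] * 200
rendered = [white] * 199 + [black]  # one pixel out of 200 differs (0.5%)

print(screenshots_match(rendered, reference))         # within the 1% tolerance
print(screenshots_match(rendered, reference, 0.001))  # stricter threshold
```

The threshold is the delicate part of this option: a loose value hides small but real regressions, while a strict one fails on harmless font-rendering differences.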

@CAM-Gerlach CAM-Gerlach added this to the v0.2.0 milestone Jan 22, 2022
@CAM-Gerlach
Member

CAM-Gerlach commented Jan 22, 2022

Thanks, looking great so far! Some general comments:

  • Since running the tests with playwright is somewhat non-trivial, could this be put behind a custom Pytest flag, e.g. --compare-screenshots, like --open-browser currently is (which should also be retained for manual testing and experimentation)? This also is needed to only run on a specific set of CI jobs. You could also enable the warning filters inside there.
  • To avoid encouraging a browser monoculture, could we use FF instead of chromium?
  • If we do deliberately change something with the UI, or add a new test, we'll need to generate new screenshots, which should hopefully be fairly low friction to avoid excessive maintenance overhead. As such, could you add a Python script, GH Actions job or some other automated, cross-platform mechanism to do so (or at least document step by step what needs to be done)?
  • Right now, it looks like the artifacts from different Python versions on the same platform are overwriting each other—you could include the Python version in the filename to fix that, but the above approach (just running it on one Python version) is probably better
  • I suggest running optipng on the PNG output (CI-side, if it's available); only takes a few seconds to run on all the images but saves 25-33% space for free
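
The opt-in flag from the first bullet could be wired up in `conftest.py` roughly as follows. This is a sketch under assumptions, not the PR's code: `pytest_addoption`, `parser.addoption` and `config.getoption` are standard pytest hooks and calls, while the `compare_screenshots_enabled` helper and the help text are hypothetical.

```python
def pytest_addoption(parser):
    """Register the opt-in flag (pytest calls this hook at startup)."""
    parser.addoption(
        "--compare-screenshots",
        action="store_true",
        default=False,
        help="Run visual regression tests against reference screenshots",
    )


def compare_screenshots_enabled(config) -> bool:
    """Hypothetical helper: True only when the flag was passed."""
    return bool(config.getoption("--compare-screenshots"))
```

The screenshot tests would then only run with `pytest --compare-screenshots`, which lets CI enable them on a chosen subset of jobs while a plain `pytest` run stays browser-free.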

I am currently checking the diff between the actual screenshot and the expected result. We should probably check that diff with a possible percentage of error (there seems to be a slight difference in rendering fonts between Windows/Mac/Ubuntu).
Either we allow a percentage of error, or we store reference screenshots on an OS-basis: one reference directory per platform.

I'd think per-OS screenshots, if you can manage that, since we'd need to pick and test a threshold very carefully to avoid false negatives, and there's the risk that it could miss significant changes.
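
A per-OS layout could be resolved with a small helper like the one below. The directory names, the `PLATFORM_DIRS` mapping and `reference_path` are hypothetical; the thread does not show the layout the PR actually uses.

```python
import sys
from pathlib import Path

# Hypothetical layout: <root>/<platform>/<test name>.png
PLATFORM_DIRS = {"linux": "linux", "darwin": "macos", "win32": "windows"}


def reference_path(root: Path, test_name: str, platform: str = sys.platform) -> Path:
    """Return where the reference screenshot for `test_name` lives on `platform`."""
    try:
        subdir = PLATFORM_DIRS[platform]
    except KeyError:
        raise ValueError(f"No reference screenshots for platform {platform!r}") from None
    return root / subdir / f"{test_name}.png"


print(reference_path(Path("reference_screenshots"), "test_basic", "darwin"))
```

With this approach the comparison itself can stay exact, at the cost of storing and regenerating one set of images per platform.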

@martinRenou
Collaborator Author

martinRenou commented Jan 22, 2022

Thanks for the review, those are all very good points. I'll finish this on Monday 😊

@martinRenou
Collaborator Author

I suggest running optipng on the PNG output (CI-side, if it's available); only takes a few seconds to run on all the images but saves 25-33% space for free

Are you suggesting this for the build artifacts? Or would you like this to be done for references that are stored in the repo as well?

@martinRenou martinRenou force-pushed the playwright_tests branch 2 times, most recently from 5f0c09e to 0e90d0d Compare January 24, 2022 15:31
@martinRenou
Collaborator Author

martinRenou commented Jan 24, 2022

Trying the action (this might not work as it's not part of master yet):

Please generate reference screenshots

That does not work

@martinRenou martinRenou force-pushed the playwright_tests branch 2 times, most recently from b66e806 to 44cb1b6 Compare January 24, 2022 16:00
@CAM-Gerlach
Member

CAM-Gerlach commented Jan 25, 2022

FYI, I meant to suggest this when I replied before, but you might want to remove --maxfail 5 from addopts in pytest.ini; then you'll get the full output for all tests when they fail, not just the first five. I shouldn't have added it in the first place; that line was a copy-paste from another repo I set up that needed it for some reason.

Are you suggesting this for the build artifacts? Or would you like this to be done for references that are stored in the repo as well?

Ideally for both, but most importantly for the reference screenshots, since they will be stored in the repo and each version will take up space for all time, so the smaller we can get them, the better. Note that this should be done as part of the GitHub workflow that generates them, rather than inside the test code itself.

@CAM-Gerlach
Member

Trying the action (this might not work as it's not part of master yet):

Yeah, I'm sure that's what happened. We can do a best effort on this PR, merge it and then test it and follow up if we need to fix anything with that part. Or, you can merge it on your fork and test it there, since that should work.

BTW, you can get rid of all the build and install steps and replace them with just pip install --upgrade .; also, it only needs to run on Python 3.8, the version the visual regression tests themselves will run on.

Let me know if you need anything!

@martinRenou
Collaborator Author

FYI, I meant to suggest this when I replied before, but you might want to remove --maxfail 5 from addopts in pytest.ini; then you'll get the full output for all tests when they fail, not just the first five. I shouldn't have added it in the first place; that line was a copy-paste from another repo I set up that needed it for some reason.

Right, thanks! I was a bit confused why it was behaving this way but didn't look into it.

Note that this should be done as part of the GitHub workflow that generates them, rather than inside the test code itself.

👍🏽

Or, you can merge it on your fork and test it there

Yes, I was doing this yesterday and it does not work properly. I'm looking into it.

@martinRenou
Collaborator Author

I am getting there: martinRenou#8

Commenting "Please generate reference screenshots" in the PR properly triggers a commit of the references :) I still need to polish a bit and fix some remaining issues, but this is taking shape

@martinRenou martinRenou force-pushed the playwright_tests branch 2 times, most recently from db1199b to c44b772 Compare January 25, 2022 13:34
@martinRenou martinRenou force-pushed the playwright_tests branch 6 times, most recently from 4986787 to 23d1039 Compare January 25, 2022 17:05
@martinRenou
Collaborator Author

martinRenou commented Jan 25, 2022

I resolved all of your comments :)

As you can see from the references, we can reproduce this issue #38. I'll try to fix it in a separate PR.

@martinRenou martinRenou marked this pull request as ready for review January 25, 2022 17:11
Member

@CAM-Gerlach CAM-Gerlach left a comment

Thanks so much for all your hard work on this, @martinRenou ! This looks really great. I just have some relatively minor comments, and then this should be good to go.

Top level comments:

  • I suggest running optipng on the reference PNGs output by the CI (as well as those committed here); since they will be committed irrevocably to the repo once merged, and will inexorably increase its size, the smaller we can make them, the better.
  • To be clearer, more consistent and more descriptive about what we're referring to, and to avoid confusion with the more typical meanings of "references" in English (a bibliography, people who will vouch for your character, etc.), could you rename all instances of references to reference_screenshots or similar (as appropriate)?

To note, I'm not super comfortable with asyncio beyond the very basics (since we don't really use it in my main lines of work) and this doesn't seem to be too performance-critical given the complexity, but I suppose this is acceptable since it is only in the tests and mostly isolated to the one fixture...so long as you're willing to help fix any async-related issues that may come up.

Thanks again!

@martinRenou martinRenou force-pushed the playwright_tests branch 2 times, most recently from 366cd64 to 5ecb849 Compare January 26, 2022 09:29
@martinRenou
Collaborator Author

martinRenou commented Jan 26, 2022

I suggest running optipng on the reference PNGs output by the CI (as well as those committed here); since they will be committed irrevocably to the repo once merged, and will inexorably increase its size, the smaller we can make them, the better.

I've implemented the compression with Pillow instead: https://github.com/spyder-ide/docrepr/pull/41/files#diff-a31c7ed5d35f5ed8233994868c54d625b18e6bacb6794344c4531e62bd9dde59R97-R99 this logic also works for the reference screenshots (it has been used for the references on this PR).

I am not sure optipng can be easily installed on all platforms? Maybe Pillow's compression should be enough? I can decrease the quality even more if you want.

To note, I'm not super comfortable with asyncio beyond the very basics (since we don't really use it in my main lines of work) and this doesn't seem to be too performance-critical given the complexity, but I suppose this is acceptable since it is only in the tests and mostly isolated to the one fixture...so long as you're willing to help fix any async-related issues that may come up.

At first I implemented it without asyncio, as playwright provides a sync API: https://github.com/microsoft/playwright-python#example

The issue was that the sync API was not cleaning up resources and shutting down the browser properly (it was generating many warnings that needed to be filtered). We can roll back to the sync API, but we'll have to add filters. There was also a warning that I was not able to filter for some reason, which is why I gave up and used the async API.
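
For readers unfamiliar with the async API, a screenshot helper built on it might look like the sketch below. It is not the fixture from this PR: `capture_screenshot` is a hypothetical name, Firefox is chosen only because it was suggested earlier in the thread, and the Playwright import is deferred so the module can be imported without Playwright installed.

```python
async def capture_screenshot(url: str, output_path: str) -> None:
    """Render `url` in headless Firefox and save a full-page PNG."""
    # Imported lazily so this module can be imported without Playwright installed.
    from playwright.async_api import async_playwright

    # The async context manager stops the Playwright driver on exit,
    # even if navigation or the screenshot raises.
    async with async_playwright() as playwright:
        browser = await playwright.firefox.launch()
        try:
            page = await browser.new_page()
            await page.goto(url)
            await page.screenshot(path=output_path, full_page=True)
        finally:
            # Close the browser explicitly so no process is left behind.
            await browser.close()


# Example (requires `pip install playwright` and `playwright install firefox`):
# import asyncio
# asyncio.run(capture_screenshot("file:///path/to/rendered.html", "actual.png"))
```

The `async with` block plus the `finally` clause is what gives the deterministic shutdown described above; whether it removes every warning seen with the sync API is not something this sketch can confirm.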

@martinRenou
Collaborator Author

Thanks for the review!

@CAM-Gerlach
Member

Thanks for the quick and thorough response!

I've implemented the compression with Pillow instead: https://github.com/spyder-ide/docrepr/pull/41/files#diff-a31c7ed5d35f5ed8233994868c54d625b18e6bacb6794344c4531e62bd9dde59R97-R99 this logic also works for the reference screenshots (it has been used for the references on this PR).

Yep, I'd noticed that, thanks. Unfortunately, Pillow's optimization still appears to be leaving a lot on the table, assuming it's what you're using for the current reference screenshots: running optipng on them, even at the default (relatively low) optimization level, still results in a decrease of between 10% and 50% in the size of each image, which reduces the total size of the screenshot directory from 1.5 MB to 1.1 MB, a saving of between 25% and 30%. That's not trivial considering the repo will grow in size by nearly that much every time we regenerate them, assuming they all change (consistent optimization might or might not reduce those deltas).
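
The overall figure can be checked with a line of arithmetic. `percent_saved` is a hypothetical helper; the 1.5 MB and 1.1 MB totals are the ones quoted in the comment.

```python
def percent_saved(before_mb: float, after_mb: float) -> float:
    """Return the size reduction as a percentage of the original size."""
    return (before_mb - after_mb) / before_mb * 100


total = percent_saved(1.5, 1.1)
print(f"{total:.1f}% smaller")  # 26.7% smaller
```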

It actually shouldn't be that tough to install optipng on all GHA platforms: on Windows, it's just choco install optipng, on Mac it's brew install optipng and on Ubuntu, apt install optipng, then run it with optipng docrepr/tests/reference_screenshots/*.png. That's probably the simplest approach, but you could also upload the archives from each job, then have another job that runs after all three jobs have completed and merges, optimizes and commits them. This allows the three jobs to run in parallel, make a single commit at the end and not have to install optipng on each platform, but probably isn't worth the work unless you really want to.
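
If the optimization step were driven from a Python script rather than directly from the workflow, it could be sketched as follows. The helper names and the `INSTALL_COMMANDS` table are hypothetical; the install commands are copied from the comment above, and on a real CI runner they may need extra flags (for example `sudo` or `-y`).

```python
import shutil
import subprocess
from pathlib import Path

# Install commands as given in the comment above, keyed by sys.platform.
INSTALL_COMMANDS = {
    "win32": ["choco", "install", "optipng"],
    "darwin": ["brew", "install", "optipng"],
    "linux": ["apt", "install", "optipng"],
}


def optipng_command(directory: Path) -> list:
    """Build the optipng invocation for every PNG in `directory`."""
    pngs = sorted(str(path) for path in directory.glob("*.png"))
    return ["optipng", *pngs]


def optimize_screenshots(directory: Path) -> bool:
    """Run optipng in place; return False if there is nothing to do."""
    command = optipng_command(directory)
    if len(command) == 1 or shutil.which("optipng") is None:
        return False  # no PNGs found, or optipng is not installed
    subprocess.run(command, check=True)
    return True
```

Calling `optimize_screenshots(Path("docrepr/tests/reference_screenshots"))` would mirror the shell command in the comment, with the glob expanded in Python so it behaves the same on Windows.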

The issue was that the sync API was not cleaning up resources and shutting down the browser properly (it was generating many warnings that needed to be filtered). We can roll back to the sync API, but we'll have to add filters. There was also a warning that I was not able to filter for some reason, which is why I gave up and used the async API.

Makes sense, thanks 👍 I saw you used the non-async API before, but I figured there was a reason you switched (and noticed most of the warnings were gone).

@martinRenou martinRenou force-pushed the playwright_tests branch 5 times, most recently from 0bbb0d8 to f3f87ae Compare January 27, 2022 09:04
@martinRenou
Collaborator Author

martinRenou commented Jan 27, 2022

Just added optipng to the update-reference-screenshots CI job :) thanks for your patience!

You can see that it runs properly here:
https://github.com/martinRenou/docrepr/runs/4963763427?check_suite_focus=true

Which was triggered by martinRenou#24

@CAM-Gerlach
Member

@martinRenou LGTM, thanks! Now that #40 is merged, the output should change for a number of tests; to avoid an extra copy of the images and them getting out of sync, do you mind rebasing this and using the CIs on your fork to generate the updated reference screenshots for each OS (make sure you use the CI-optimized ones, since different optipng builds can produce slightly different results, I've found)? Once that's done, I'll go ahead and merge this. Thanks!

@martinRenou
Collaborator Author

Rebased! :)

@martinRenou martinRenou mentioned this pull request Jan 28, 2022
Member

@CAM-Gerlach CAM-Gerlach left a comment

LGTM, thanks @martinRenou !

@CAM-Gerlach CAM-Gerlach merged commit 2ea63ed into spyder-ide:master Jan 28, 2022
@martinRenou martinRenou deleted the playwright_tests branch January 28, 2022 10:52
Development

Successfully merging this pull request may close these issues.

Improve tests to actually check that the rendered content is what we expected
2 participants