This is a brainstorm that I've had but don't have time to work on right now so I'm recording it here so I don't forget. No one should feel obligated to make it work unless they really want to. I'll get back to it eventually.
It occurs to me that the expensive part of the test process might be the image comparisons (I really should explore where the processor time is actually spent). As such, if we could find a way to compare images faster, this would help speed things up. Obviously reducing the image size helps with this, but, as we know from experience, reducing the image resolution means that small but important differences may get overlooked (this is why we introduced the density option). What is needed is a comparison that is both fast and strict.
Looking at ImageMagick's page on image comparison, I found a section on finding duplicate images which suggests two quick methods for identifying images that are identical. The first, using MD5 checksums, won't work for us because we are comparing images created at different times, and the difference in the file metadata will cause the hashes to differ. However, the second, which uses `identify` to compute a hash signature based purely on the image data (and not the file metadata), might work. My idea is that it might be faster to run `identify` on the two images and compare the signatures than to run `compare`. I have not yet tested this idea to see if it pans out.
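For concreteness, here is a minimal sketch of what that signature check might look like. This is just an illustration I haven't tested: it assumes ImageMagick's `%#` format escape (which prints a hash computed from the pixel data only), and the file names are placeholders.

```sh
# Sketch only: compare the pixel-data signatures of two images.
# %# is ImageMagick's format escape for the image signature, which is
# computed from the image data and ignores file metadata.
expected_sig=$(identify -quiet -format "%#" expected.png)
actual_sig=$(identify -quiet -format "%#" actual.png)

if [ "$expected_sig" = "$actual_sig" ]; then
    echo "images are pixel-identical"
else
    echo "images differ (by at least one pixel, however slightly)"
fi
```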
If it does pan out, my concern is that this might be too strict. The signature comparison will only show whether the images are exactly the same, while our current `compare` check has a threshold below which differences are not considered significant. As a result, using the signature comparison as a replacement metric would cause false-positive failures. To get around this, I'm thinking about using the signature comparison as a gatekeeper check: if the signature comparison shows that the images are different, then we run the existing `compare` check to determine whether that difference is significant. This, however, involves a trade-off. Since we'd be running two comparisons on any image which is not identical, images which change (even insignificantly) would have their comparison slowed down. For this to actually save time on average, the signature comparison would need to be significantly faster, and most changes would need to affect only a small number of tests. Indeed, assuming the signature comparison is faster, it would be useful to know just how many tests need to change before this process starts costing additional time (a rough framing of that break-even point follows the sketch below). That way we could make a more educated decision about whether it's a worthwhile change.
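To make the gatekeeper idea concrete, here is a rough illustration of how the two stages could be chained. Again, this is a sketch under assumptions rather than anything measured: the AE metric, the threshold value, and the file names are placeholders, and in practice the second stage would simply be whatever the existing compare check already does.

```sh
# Sketch only: cheap signature check first, full compare only on a mismatch.
expected=expected.png
actual=actual.png
threshold=100   # placeholder; the real check's threshold would be reused here

if [ "$(identify -quiet -format '%#' "$expected")" = "$(identify -quiet -format '%#' "$actual")" ]; then
    echo "PASS (identical signatures)"
else
    # Signatures differ, so fall back to the slower comparison with its
    # existing "is this difference significant?" threshold.
    # Assumes compare reports the AE count (number of differing pixels)
    # as a plain integer on stderr.
    diff_pixels=$(compare -metric AE "$expected" "$actual" null: 2>&1)
    if [ "$diff_pixels" -le "$threshold" ]; then
        echo "PASS (difference below threshold)"
    else
        echo "FAIL ($diff_pixels pixels differ)"
    fi
fi
```

One way to frame the break-even question: if a signature check costs s per test, a full compare costs c, and a fraction p of tests end up not pixel-identical, the two-stage scheme costs roughly s + p·c per test versus c for the current scheme, so it only wins while p < 1 − s/c. If the signature check really is much cheaper than compare, that break-even fraction is close to 1, but that is exactly the thing to measure rather than assume.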
Before you spend too much time tweaking things, you might want to try the -n flag first; it runs the tests without verifying anything.
On my computer, if a full run (i.e., no cached images) takes 100% of the time, then caching all of the images saves about 8%, and running with the -n flag (which skips the image conversion and comparison completely) takes about 80%.
So, if I understand what you're saying correctly, things break down like this:
Time spent converting test expectations = 8% (the time saved by caching the images)
Time spent generating test results = 80% (the time spent when using -n)
Time spent converting test results = 8% (i.e., probably about the same as the time saved by caching images, since we're talking about converting the same number and sorts of images)
That would leave 4% as the time spent actually doing the comparisons, and thus even if my idea could save time, it isn't going to be able to save much.
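Spelling out my arithmetic with the figures above: 100% − 8% − 80% − 8% = 4%, so even if the comparison step could be made completely free, the best case is a speedup of about 100 / (100 − 4) ≈ 1.04×, i.e. roughly 4% off the total run time.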
I'll check the -n option on my machine to make sure the percentages are roughly the same, but if you're right, then clearly this sort of optimization is more effort than it could possibly be worth.