Speed up make tests
#16295
LGTM, thank you @twm! I don't get any local performance boost from these on macOS, but I tried on my Linux machine last night and got roughly 6% as well (I didn't seem to hit the |
Awesome, glad this helps! Should I keep rebasing this (and my other PRs) while they await review/merge?
Thanks for providing the context and details. I've been trying to understand how this happens, and can't seem to dig up anything. Tests that emit metrics are often configured to use the Could you retry the test suite without the modification of starting the notdatadog container, but exclude this set of tests to confirm/refute that this is happening from those tests only? Since we would prefer to keep the number of containers started for tests low, I'd prefer not to autostart another one if possible.
Would gladly accept this on its own, nice speedup!
I totally appreciate that, and am happy to dig in more. This was just the hack that let me get #16260 into reviewable shape. I'm currently working to reproduce. It would slightly surprise me — there are only about a dozen tests there — but possible. I recall from a past life that the DataDog client is aggressive about doing DNS resolution up front.
Awesome, I've extracted it to #16384.
On Given that, I think the extreme slowdown I observed was related to link congestion issues on my end. I identified some bad hardware on my network that was causing packet loss a few days after filing this. As DNS is pretty sensitive to loss, I think that may explain it. Would it still be helpful to track down the
Glad that the slowdown is no longer apparent.
I think so. Calls are explicitly allowed to Lines 90 to 91 in a1f916b
but removing that doesn't cause any failures, so that's probably not it either. You could try removing the allowance from the config and running the tests on Linux - maybe there are some sneaky differences in how container networking bridges work these days (there definitely are). If you have the time and desire to hunt further, please do!
FWIW, my colleagues @DarkaMaul and @facutuesca were able to reproduce the |
I did spend some time on this a few days ago and found that the resolutions are definitely not solely from I have two theories to look into:
The first theory seems strongest to me. I don't understand why |
3e7f14c to aadac72
I found the test suite took ~30 minutes to run on my system, with CPU and I/O curiously idle for the bulk of that period. It turns out that the tests were furiously attempting to resolve the hostname "notdatadog". As I'm working in a VM behind NAT, the queries were getting rate-limited by dnsmasq, which logged messages like this:

    Jul 16 23:16:24 vmhost dnsmasq[2230]: Maximum number of concurrent DNS queries reached (max: 150)

The quick fix is to run the notdatadog stub container alongside the tests, which gives a runtime of ~1:30 on my system.
pytest-socket resolves the hostnames in the --allow-hosts setting for each test, so removing notdatadog saves ~4k DNS resolution attempts. Note that the tests run with a stripe container present. Resolving the stripe hostname doesn't result in a DNS resolution because getaddrinfo resolves the name by reading the /etc/hosts file populated by Docker (same story for localhost).
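To make the arithmetic concrete, here is a rough Python sketch of why resolving the allow-list once per test adds up. It is only loosely modeled on the behavior described above and is not pytest-socket's actual code: `ALLOWED_HOSTS`, `resolve_allowed_hosts`, and `count_lookups` are hypothetical names, and `socket.getaddrinfo` is replaced with a stub so nothing touches the network.

```python
import socket
from unittest import mock

# Hypothetical allow-list, modeled on the hostnames mentioned in this thread.
ALLOWED_HOSTS = ["localhost", "stripe", "notdatadog"]


def resolve_allowed_hosts(hosts):
    """Resolve each allowed hostname to a set of IP address strings."""
    resolved = set()
    for host in hosts:
        try:
            infos = socket.getaddrinfo(host, None)
        except socket.gaierror:
            continue  # unresolvable name: nothing to add to the allow-list
        resolved.update(info[4][0] for info in infos)
    return resolved


def count_lookups(num_tests, hosts):
    """Count getaddrinfo calls when the allow-list is resolved once per test."""
    # Stubbed answer in the shape getaddrinfo returns, so no real DNS is used.
    fake_info = [(socket.AF_INET, socket.SOCK_STREAM, 6, "", ("127.0.0.1", 0))]
    with mock.patch("socket.getaddrinfo", return_value=fake_info) as fake:
        for _ in range(num_tests):
            resolve_allowed_hosts(hosts)
        return fake.call_count
```

With three hosts, `count_lookups(4000, ALLOWED_HOSTS)` makes one `getaddrinfo` call per host per simulated test. Only the names that are missing from /etc/hosts turn into real DNS queries, which is consistent with dropping notdatadog saving roughly one query per test.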
Some experimentation with packet capture indicates that yes, the culprit is This is ready for review. |
Just a bit late to the party here, but I can confirm from my tests that the culprit is indeed the pytest-socket hostname normalization because it calls The problem stems from

    python3 -c 'import socket; socket.getaddrinfo("not-existing", None)'  0,03s user 0,01s system 0% cpu 5,071 total

And without :

    python3 -c 'import socket; socket.getaddrinfo("not-existing", None)'  0,03s user 0,01s system 24% cpu 0,156 total

Adding
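For anyone who wants to repeat that measurement from inside Python rather than with the shell's timing output, a minimal sketch follows. `time_lookup` is a hypothetical helper, not something from this repository or from pytest-socket. It times a single `socket.getaddrinfo` call and reports whether the name resolved.

```python
import socket
import time


def time_lookup(host):
    """Return (elapsed_seconds, resolved) for a single getaddrinfo call."""
    start = time.perf_counter()
    try:
        socket.getaddrinfo(host, None)
        resolved = True
    except socket.gaierror:
        # The name could not be resolved; the elapsed time is still of interest.
        resolved = False
    return time.perf_counter() - start, resolved


# Example usage (results depend entirely on the local resolver setup):
#   elapsed, resolved = time_lookup("not-existing")
#   print(f"not-existing: {elapsed:.3f}s resolved={resolved}")
```

Running it against the same unresolvable name before and after changing the resolver configuration should show the same kind of gap as the two timings above.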
I put that there back in February, with a note that without it, running tests with debug log level spat out lots of errors. I can't seem to reproduce that now, so it's probably not necessary any more.
Indeed, that could be great!
Thanks for boiling down the reproduction issue. I'm going to go with these changes, thanks for figuring out the complex interaction with Tailscale!
COMMAND_ARGS="$@"
COMMAND_ARGS=( "$@" )
nice improvement
PR for this here: miketheman/pytest-socket#369
nodatadog
DNS requests from escaping to the wider world while running the tests (see 9da9f2a for context)

I also tried:
- METRICS_BACKEND=warehouse.metrics.NullMetrics on the tests container in docker-compose.yml
- pytest --dist=worksteal which I've seen make a big difference.

Neither had a measurable effect on runtime.