Bedrock performance tooling and initial optimisations #15513

stevejalim · 2024-11-18T11:43:51Z

This changeset adds support for profiling Bedrock using django-silk locally (or anywhere the bedrock_test image is used - but not in production).
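As a rough illustration of the "local/non-prod only" gating described above, the sketch below builds the relevant settings from an environment flag. It is not the code in this PR: the `ENABLE_DJANGO_SILK` variable name and the `profiling_settings` helper are hypothetical, and the base app and middleware lists are placeholders. Only the `silk` app label and `silk.middleware.SilkyMiddleware` path come from django-silk's own setup instructions.

```python
import os


def profiling_settings(environ=os.environ):
    """Return the settings fragments that change when profiling is on.

    ENABLE_DJANGO_SILK is a hypothetical flag name, used for illustration.
    """
    enabled = environ.get("ENABLE_DJANGO_SILK", "false").lower() in {"1", "true", "yes"}

    # Placeholder base lists; a real settings module has many more entries.
    installed_apps = ["django.contrib.staticfiles"]
    middleware = ["django.middleware.common.CommonMiddleware"]

    if enabled:
        installed_apps.append("silk")
        # django-silk is sensitive to middleware ordering; placing it first
        # here is an assumption, not a recommendation for every project.
        middleware.insert(0, "silk.middleware.SilkyMiddleware")
        # The URLconf would also need something like:
        #   path("silk/", include("silk.urls", namespace="silk"))

    return {
        "ENABLE_DJANGO_SILK": enabled,
        "INSTALLED_APPS": installed_apps,
        "MIDDLEWARE": middleware,
    }


local = profiling_settings({"ENABLE_DJANGO_SILK": "true"})
prod = profiling_settings({})
```

With the flag unset, neither the app nor the middleware is added, so a production deployment never loads the profiler.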

It also contains some optimisations - via cacheing - to reduce the DB queries executed on the busiest pages.

Significant changes and points to review

Please review this PR commit by commit, paying sceptical attention to the usage of cacheing etc. The commit messages contain details of the number of queries saved.

I know that at the moment Bedrock is using a version of Django's LocMemCache backend - this means that for 45 pods in production, we'll still get plenty of cache misses until the pods have all had a call that warms the cache. It might be that, given the TTL of the cached items, we never really get to that point in prod, but we will in Dev and Stage where there are far fewer pods.
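To put a number on the cold-pod effect, here is a small standard-library simulation. It is a sketch under simplifying assumptions: requests are spread round-robin over the pods, TTL expiry is ignored, and plain dicts stand in for the cache backends. The 45-pod figure is from the paragraph above; the request count and key name are made up.

```python
import itertools


def count_misses(pod_caches, requests, key="geo:country_codes"):
    """Send `requests` round-robin to the pods and count cache misses."""
    misses = 0
    for cache in itertools.islice(itertools.cycle(pod_caches), requests):
        if key not in cache:
            misses += 1
            cache[key] = "value loaded from the DB"
    return misses


PODS = 45
REQUESTS = 1000

# Per-process caches: every pod has its own dict, so each must warm separately.
local_caches = [{} for _ in range(PODS)]

# Shared cache: every pod sees the same dict, so one miss warms all of them.
shared = {}
shared_caches = [shared] * PODS

local_misses = count_misses(local_caches, REQUESTS)
shared_misses = count_misses(shared_caches, REQUESTS)
```

Under these assumptions the per-pod setup pays one miss per pod for each key (45 here), while a shared backend pays one in total. With short TTLs that cost recurs every time the entries expire.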

We'd certainly get more cacheing uplift if we had a shared cache backend, such as Redis. Given we now have Redis in play for the rq backend, we could switch it at this point (expanding this PR), or as a separate change - opinions welcome!

Issue / Bugzilla link

#15505

Testing

Unit tests passing should be enough here, but feel free to follow the notes in profiling/hit_popular_pages.py to test drive things yourself.
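The script itself is not reproduced in this thread. As a hypothetical stand-in for the idea (request a handful of popular paths a few times so the profiler has traffic to record), something along these lines would do. The `hit_pages` and `http_status` names, the repeat count and the base URL are assumptions; the release-notes path is the only one quoted in this PR.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical path list; extend with whichever pages matter locally.
PATHS = ["/en-US/firefox/132.0.1/releasenotes/"]


def http_status(url):
    """Fetch a URL and return its HTTP status (urlopen raises on 4xx/5xx)."""
    with urlopen(url, timeout=10) as response:
        return response.status


def hit_pages(base_url, paths=PATHS, repeats=2, fetch=http_status):
    """Request every path `repeats` times: the first pass hits a cold cache,
    later passes should hit a warm one."""
    results = []
    for _ in range(repeats):
        for path in paths:
            url = urljoin(base_url, path)
            results.append((url, fetch(url)))
    return results


# Example, assuming a server is running locally on port 8000:
# hit_pages("http://localhost:8000")
```

Passing `fetch` as a parameter keeps the loop testable without a running server.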

Questions

Is the addition of django-silk worth mentioning in formal documentation?

codecov · 2024-11-18T11:57:29Z

Codecov Report

Attention: Patch coverage is 90.47619% with 6 lines in your changes missing coverage. Please review.

Project coverage is 78.89%. Comparing base (e775d32) to head (9e6ee0f).

Files with missing lines	Patch %	Lines
bedrock/settings/base.py	44.44%	5 Missing ⚠️
bedrock/urls.py	50.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #15513      +/-   ##
==========================================
+ Coverage   78.82%   78.89%   +0.07%     
==========================================
  Files         158      158              
  Lines        8282     8338      +56     
==========================================
+ Hits         6528     6578      +50     
- Misses       1754     1760       +6

stevejalim · 2024-11-19T19:17:48Z

Talking with @bkochendorfer, we're both OK with the idea of using Redis for the Web deployments too (the same Redis we use for the CMS task queues because there's little real-world risk of task eviction if Redis gets low on memory). As such, I will follow this PR with another one focused on using Redis as an FE cache backend.

@pmac @robhudson Are you aware of any issues that are likely when switching from LocMem to Redis for a cache? Is there anywhere where we specifically exploit the local nature of it?

pmac · 2024-11-19T20:46:11Z

I don't know of any issues specific to locmem other than this will be a lot slower. I am a little concerned about using the same redis for cache and queue. We are doing this with basket, but that's a much lower volume. My understanding is that you'd use very different settings in redis for a cache vs. queue.

stevejalim · 2024-11-20T10:59:44Z

> I don't know of any issues specific to locmem other than this will be a lot slower. I am a little concerned about using the same redis for cache and queue. We are doing this with basket, but that's a much lower volume. My understanding is that you'd use very different settings in redis for a cache vs. queue.

I hear you. I'm happy to try these cache uses based on LocMemCache for now and see what the uplift is like. My leaning towards a shared cache was to reduce cache misses from pods with cold local-memory caches, but doing things one step at a time is good - we may not need to use a shared cache once the pods are warm.

robhudson · 2024-11-21T01:53:25Z

bedrock/careers/views.py

+        if qs is None:
+            qs = Position.objects.exclude(job_locations="Remote")
+            cache.set(_key, qs, settings.CACHE_TIME_SHORT)
+        return qs


Should we cache the queryset or cache the list of objects?

Ideally yeah, we'd cache the objects, not the qs, but it's a bit more fiddly than I expected.

Because these are CBVs, cacheing the queryset in get_queryset is easier to do (and to discover) than cacheing the Position objects themselves, because, as far as I can see, there's no clean way to slot in cacheing when the view sets the object_list attribute.

Later, even though we don't use pagination in these pages, the list view runs the data through paginate_queryset() which currently (Django 4.2.x) takes a queryset arg but treats it as a general iterable -- all of which means we could make get_queryset return a list of objects (from the cache), not a QuerySet, but it kind of muddies things.

What do you think?

Thanks for sharing that. It's a bit more complex than it looks at first glance. I'm actually a bit surprised we can cache the queryset objects, which seems more complex. I suppose they get pickled? But this is fine with me, I was mostly curious about your thoughts on the consideration. Thanks!
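On the pickling question: LocMemCache does pickle values when they are stored, and Django's documentation notes that pickling a QuerySet forces it to be evaluated first. The standard-library sketch below illustrates why a lazy result has to be materialised before it can go into such a cache. It does not use Django: `PickleCache`, `lazy_positions`, `get_positions` and the key name are hypothetical stand-ins, and a generator plays the role of an unevaluated query.

```python
import pickle


class PickleCache:
    """Hypothetical stand-in for a backend that pickles values on set()."""

    def __init__(self):
        self._store = {}

    def get(self, key, default=None):
        blob = self._store.get(key)
        return default if blob is None else pickle.loads(blob)

    def set(self, key, value):
        self._store[key] = pickle.dumps(value)


cache = PickleCache()
db_hits = []  # one entry per simulated query


def lazy_positions():
    """Simulate a query: record the hit, return a lazy iterable of rows."""
    db_hits.append("SELECT ... FROM positions")
    rows = [
        {"title": "Engineer", "location": "Berlin"},
        {"title": "Designer", "location": "Remote"},
    ]
    return (row for row in rows if row["location"] != "Remote")


def get_positions():
    """Cache-aside lookup that stores an evaluated list, not a lazy object."""
    positions = cache.get("careers_positions")
    if positions is None:
        positions = list(lazy_positions())  # evaluate before cacheing
        cache.set("careers_positions", positions)
    return positions


# A bare generator cannot be pickled, so it cannot be cached as-is.
try:
    cache.set("lazy", (n for n in range(3)))
    lazy_was_cacheable = True
except TypeError:
    lazy_was_cacheable = False

first = get_positions()   # cold: one simulated query
second = get_positions()  # warm: served from the pickled copy
```

Either way, what comes back from the cache is an already-evaluated copy, which is why returning a cached QuerySet from get_queryset and returning a cached list behave similarly for a non-paginated list view.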

robhudson · 2024-11-21T02:05:35Z

bedrock/careers/tests/test_models.py

 from bedrock.careers.models import Position
 from bedrock.careers.tests import PositionFactory
 from bedrock.mozorg.tests import TestCase


 class TestPositionModel(TestCase):
+    def setUp(self):
+        cache.clear()
+


for bonus points you could test that the db call doesn't get triggered on a 2nd call to the wrapped methods.

Yeah - good call. Time to break out assertNumQueries!
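The shape of such a test, sketched with the standard library only: `unittest.mock` call counting stands in for Django's `assertNumQueries` here, and `lookup_locations`, `_cache` and the fake `db` object are hypothetical. The `setUp` mirrors the `cache.clear()` added in the diff above so each test starts cold.

```python
import unittest
from unittest import mock

_cache = {}


def lookup_locations(db):
    """Hypothetical cached wrapper: only ask `db` on a cache miss."""
    if "locations" not in _cache:
        _cache["locations"] = db.fetch_locations()
    return _cache["locations"]


class TestCachedLookup(unittest.TestCase):
    def setUp(self):
        # Same role as cache.clear() in the diff: no state leaks between tests.
        _cache.clear()

    def test_second_call_skips_db(self):
        db = mock.Mock()
        db.fetch_locations.return_value = ["Berlin", "Toronto"]

        self.assertEqual(lookup_locations(db), ["Berlin", "Toronto"])  # cold
        self.assertEqual(lookup_locations(db), ["Berlin", "Toronto"])  # warm

        # The backing lookup ran for the first call only.
        db.fetch_locations.assert_called_once_with()


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCachedLookup)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a Django test the two calls would instead be wrapped in `with self.assertNumQueries(1):` for the cold call and `with self.assertNumQueries(0):` for the warm one.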

robhudson

r+wc

stevejalim requested a review from a team as a code owner November 18, 2024 11:43

stevejalim requested review from pmac and robhudson November 18, 2024 11:44

robhudson reviewed Nov 21, 2024

robhudson reviewed Nov 21, 2024

robhudson approved these changes Nov 21, 2024

stevejalim added 7 commits November 21, 2024 11:50

Add support for Django-silk profiling in local/non-prod only

d9a93e8

Also add script to hit some common/popular URLs to give django-silk some traffic to capture

Add cacheing to geo.valid_country_code for performance boost

f04e774

Saves 11 SQL queries on the releasenotes page by cacheing the country code lookup for an hour. Tested on /en-US/firefox/132.0.1/releasenotes/. Cold cache: 14 queries / 2066ms. Warm cache: 3 queries / 222ms.
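For readers who want the mechanism spelled out, a time-limited memoisation of a lookup can be sketched as follows. This is not the implementation in the commit, which uses Django's cache framework: the `cached_for` decorator, the injected clock and the hard-coded country set are all hypothetical, and the function body merely stands in for the real `geo.valid_country_code` database lookup.

```python
import functools
import time


def cached_for(seconds, clock=time.monotonic):
    """Memoise a function's results per argument tuple for `seconds`."""

    def decorator(func):
        store = {}  # args -> (expires_at, value)

        @functools.wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh: skip the lookup
            value = func(*args)
            store[args] = (now + seconds, value)
            return value

        return wrapper

    return decorator


fake_now = [0.0]  # controllable clock so expiry can be shown without sleeping
lookups = []      # one entry per simulated DB lookup


@cached_for(60 * 60, clock=lambda: fake_now[0])
def valid_country_code(code):
    lookups.append(code)  # stands in for the SQL queries
    return code.upper() in {"US", "GB", "DE"}  # made-up data


valid_country_code("us")  # cold: performs the lookup
valid_country_code("us")  # warm: served from the cache
fake_now[0] += 3601       # just over an hour later
valid_country_code("us")  # expired: performs the lookup again
```

The figures in the commit message work out to 11 of 14 queries removed and a warm response taking roughly a ninth of the cold time (222ms against 2066ms).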

Rename 'default' cache time to CACHE_TIME_SHORT to make it more meaningful a name

b6b234b

Add cacheing to Careers views to reduce number of DB hits

ecb0281

Caches derived values (location options, etc) for longer than the actual page of positions. Takes the careers listing page down from 9 queries to 2 for the CMS deployment and 0 on the Web deployment.

Cache the newsletter lookup for 6 hours

8867fa8

It's used on the newsletter form which shows up in a bunch of places on the site.

Add missing subdep to hashed requirements

858c264

Reinstate missing function call in _log() helper 🤦

9e6ee0f

stevejalim force-pushed the 15505-bedrock-perf-pass branch from 6784cc0 to 9e6ee0f November 21, 2024 12:14
