feat(scripts): add automated ODD memory usage analysis #16847
base: edge
on:
  schedule:
    - cron: '30 12 * * *'
  workflow_dispatch:
I believe this line lets us manually run an action on-the-fly from the Actions tab? Is that true?
That's correct, yeah. But it may have to be merged to the default branch first so it actually shows up in the actions tab.
I think this is complex enough that it should just be a javascript action with its own deps that lives in .github/actions/odd-memory-stats or something, but it does look great!
MIXPANEL_INGEST_SECRET: ${{ secrets.MIXPANEL_INGEST_SECRET }}
MIXPANEL_PROJECT_ID: ${{ secrets.OT_APP_MIXPANEL_ID }}
run: |
  OUTPUT=$(node ./scripts/resource-monitor/perform-memory-analysis "$MIXPANEL_INGEST_USER" "$MIXPANEL_INGEST_SECRET" "$MIXPANEL_PROJECT_ID")
these don't have to be passed through the environment; you can have this be $(node ./scripts/resource-monitor/perform-memory-analysis ${{ secrets.MIXPANEL_INGEST_USER }} for instance.
even better, and more on this later, is to just not put this through a shell action at all and implement this as a node action and use the api that it gives you, and then you don't have to deal with any of this at all
@@ -0,0 +1,84 @@
// Analysis is based on one-tailed, Pearson's correlation coefficient. |
if you implement this as a javascript action, you actually bundle locally from a separate deps list (see e.g. the helper action in oe-core; this is a rollup output plus source pair) and you can use a stats package if you want.
Closes EXEC-828
Overview
This PR adds a script and an action for performing ODD memory usage analysis in an automated manner.
Recently, we added a resource monitor to the shell layer on the ODD. The monitor scrapes various data from preselected processes and the ODD in general, pushing this data to Mixpanel.
While the data is nice and possible to analyze, there are really two problems with it:
What Results Indicate (And Don't Indicate)
This script answers "is memory consumption increasing as uptime increases on a monitored ODD process" and "is ODD memory decreasing as uptime increases", both of which point to broad, general-usage memory leaks. Note that memory leaks can also occur in short-lived function calls, views, etc., and this analysis likely would not capture those.
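As a rough illustration of the kind of question the analysis asks, here is a minimal sketch of Pearson's correlation coefficient between uptime and memory samples, plus the t statistic used for a one-tailed test. The function names and the sample data are hypothetical and are not taken from the script in this PR; the actual implementation may differ.

```javascript
// Hypothetical helper, not the PR's code: Pearson's r for two equal-length samples.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 3) {
    throw new Error('need two equal-length samples with at least 3 points')
  }
  const n = xs.length
  const meanX = xs.reduce((a, b) => a + b, 0) / n
  const meanY = ys.reduce((a, b) => a + b, 0) / n
  let sxy = 0
  let sxx = 0
  let syy = 0
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX
    const dy = ys[i] - meanY
    sxy += dx * dy
    sxx += dx * dx
    syy += dy * dy
  }
  // With zero variance in either sample the coefficient is undefined.
  if (sxx === 0 || syy === 0) return NaN
  return sxy / Math.sqrt(sxx * syy)
}

// t statistic for testing r against zero, with n - 2 degrees of freedom.
// For a one-tailed test of "memory grows with uptime", compare this value
// against the upper-tail critical value of the t distribution.
function tStatistic(r, n) {
  return (r * Math.sqrt(n - 2)) / Math.sqrt(1 - r * r)
}

// Made-up samples: uptime in hours, memory in MB.
const uptimeHours = [1, 2, 3, 4, 5]
const memoryMb = [10, 20, 30, 40, 50]
const r = pearson(uptimeHours, memoryMb)
console.log('r =', r, 't =', tStatistic(r, uptimeHours.length))
```

A strongly positive r for a process (or a strongly negative r for system memory) over many uptime samples is the pattern that would suggest a general-usage leak.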
How the script works
The script is the interesting part of this PR. On a high level, it:
chore_release-8.2.0 or 8.1.0.
AGGREGATED_PROCESSES, since for example, electron's renderer process can include a lot of unique flags that create a lot of unnecessary disparate results when taken individually. Similarly, there are a lot of processes we don't care to analyze, such as python3 processes, so these are added to a BLACKLISTED_PROCESSES and not included in the final results.
Some implementation notes
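On the process grouping described above, a minimal sketch of how aggregation and blacklisting might work. The contents of both lists, the sample shape (`processName`, `memRssMb`), and the function names are assumptions for illustration; the real AGGREGATED_PROCESSES and BLACKLISTED_PROCESSES values are not shown in this description.

```javascript
// Assumed list contents, for illustration only.
const AGGREGATED_PROCESSES = ['--type=renderer']
const BLACKLISTED_PROCESSES = ['python3']

// Returns null for blacklisted processes, a shared key for processes that
// should be aggregated, and the raw name otherwise.
function normalizeProcessName(rawName) {
  if (BLACKLISTED_PROCESSES.some(p => rawName.includes(p))) return null
  const aggregateKey = AGGREGATED_PROCESSES.find(p => rawName.includes(p))
  return aggregateKey !== undefined ? aggregateKey : rawName
}

// Groups samples by normalized process name, dropping blacklisted ones.
function groupByProcess(samples) {
  const groups = new Map()
  for (const sample of samples) {
    const key = normalizeProcessName(sample.processName)
    if (key === null) continue
    if (!groups.has(key)) groups.set(key, [])
    groups.get(key).push(sample)
  }
  return groups
}

// Made-up samples: two renderer invocations with different flags collapse
// into one group, and the python3 sample is dropped.
const exampleGroups = groupByProcess([
  { processName: 'app --type=renderer --flag-a', memRssMb: 120 },
  { processName: 'app --type=renderer --flag-b', memRssMb: 125 },
  { processName: 'python3 -m some_module', memRssMb: 40 },
  { processName: 'nginx', memRssMb: 12 },
])
console.log([...exampleGroups.keys()])
```

Grouping before running the correlation keeps one-off flag combinations from fragmenting the data into samples too small to analyze.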
I decided against parameterizing a couple of things that could arguably be parameterized, notably the number of past valid builds to report on and the timeframe to analyze (we always look at the past month of available data), because I don't think this matters that much. Once a build is built, we don't expect to see any real change in memory performance month-to-month, and if users really want to compare against previous builds with resource monitor data, it's probably better to look at previous CI runs than to do a lot of rapid querying of the Mixpanel API.
Sample output
Test Plan and Hands on Testing
Changelog
Review requests
Risk assessment
low