Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster CI: Sparse run matrix, cancel previous jobs on the same branch #635

Merged
merged 11 commits into from
Nov 13, 2024

Conversation

burnpanck
Copy link
Contributor

This PR starts using a dynamic run matrix in the four heavier workflows.

It uses a small python script to generate a subset of all possible combinations. For the two biggest workflows (Conan and CMake), it currently employs the following strategy:

  1. Test each compiler/platform with a single configuration:
  • C++20
  • std::format where supported
  • contract checking with gsl-lite
  • Debug build
  1. Then, it adds randomly selected combinations until each individual settings value appears in at least two combinations

This currently generates 26 different combinations, out of 261 total acceptable combinations (I added Clang 19 to the test-set).

Hopefully, the python code to generate the run matrix is readable enough.

Another feature I added is to use concurrency groups to cancel previous runs on the same branch that haven't completed yet

@mpusz
Copy link
Owner

mpusz commented Nov 12, 2024

Great! Thanks a lot! This should help to speed up our CI.

One note, though. clang-19 may still be broken (llvm/llvm-project#110231)

@burnpanck
Copy link
Contributor Author

One note, though. clang-19 may still be broken (llvm/llvm-project#110231)

Ah right, I observed the same locally, and I thought I'd give it a shot in CI. I'll remove Clang 19 then for now.

@burnpanck
Copy link
Contributor Author

Finally, CI passes! As always, took me a lot of trial an error until it fully worked online in CI - do you want me to squash the history here?

Overall, the changes here seem to be extraordinarily effective: With the exception of the tasks on macos runners (none of which have been picked up yet), all checks passed within less than 15 mins of the commit here. While, as of now, on @mpusz's and on @chiphogg's branches, there are still queued workflows from 10 hours ago and earlier. The immediate cancelling of earlier workflows on the branch obviously helps a lot, but it leads to commits being marked as "failed" even if no job actually failed. This would only affect commits made in rapid succession, for which we usually don't care if intermediate ones don't go green. So I believe this fixes #633 and is ready to merge.

Unfortunately, workflows started on branches prior to this PR will not get cancelled, because those workflows are not associated with any "concurrency group", but its just a couple more hours until those will timeout :-).

@mpusz mpusz linked an issue Nov 12, 2024 that may be closed by this pull request
@mpusz
Copy link
Owner

mpusz commented Nov 12, 2024

@burnpanck, thanks a lot!

Do you think that it is ready to merge? Maybe we should disable Mac OS-based CI for a while?

@burnpanck
Copy link
Contributor Author

I think we can merge. I was trying to find information about some site-wide outage on GitHub's hosted macos runners, but I couldn't find anything on the GitHub status page nor on reddit. If they are limiting the number of active jobs on macos per repo, then hopefully, in a couple hours when all the past ones timeout, we might finally catch up to the new commits (where we now only have ~8 jobs on macos as it seems). If tomorrow things still don't progress, I think we should assume severe performance issue in GitHub and disable the macos runners.

Copy link
Collaborator

@chiphogg chiphogg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go a HUGE way towards restoring sanity in terms of workload, without meaningfully sacrificing our coverage. Thank you @burnpanck!

(I didn't review the code changes, but @mpusz has already improved, and I'm excited to see this change.)

with:
python-version: 3.x
- id: set-matrix
run: python .github/generate-job-matrix.py --preset clang-tidy --seed 42 --debug combinations counts
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is some GitHub Actions wizardry I had not seen before. Cool!

@mpusz mpusz merged commit 6bcb46c into mpusz:master Nov 13, 2024
68 of 76 checks passed
@mpusz
Copy link
Owner

mpusz commented Nov 13, 2024

@burnpanck, is there a way to ensure that there is at least one configuration that tests import_std per each CI run?
For now import std; works only on clang-18 and requires cxxmodules=True && contracts=none && formatting=std::format. This is a very particular configuration, and it might be skipped using an automated approach.

@mpusz
Copy link
Owner

mpusz commented Nov 13, 2024

Also, if other compilers will get this configuration it should not enable import_std as they do not support it. This means that the logic for this flag in a GitHub Action script might need a minor change to check for the compiler type as well.

@burnpanck
Copy link
Contributor Author

Absolutely. The specific configuration is generated in generate-job-matrix.py using that CombinationCollector. It currently provides two methods; all_combinations, pulling in all combinations of a specified submatrix, and sample_combinations, pulling in a sample of combinations of a specified submatrix of the build. In both cases, the submatrix is defined by restricting to the "options" given as keyword arguments, falling back to the full set for aspects that are omitted in keyword arguments. CombinationCollector then automatically builds the union, so matrix elements are never included twice (and sample_combinations respects combinations that have already been included). So all it takes is a call to sample_combinations(formatting="std::format", contracts="none", config=configs["Clang-18 (x86-64)"]), so it samples at least one from that subset.

The code here should be exactly compatible with previous versions of the GitHub Action scripts (with the exception of the renaming from conan-config to conan_config out of convenience), as it generates the exact same GitHub Action variables. In particular, it appears that conan-config was always being set to the empty string already before, and the cxx_modules variable is set by the "config" (i.e. the compiler) directly.

Note also that this script always uses a fixed random seed (can be changed using --seed 42). So unless the random number generator changes (e.g. between python versions) or the subsets to be sampled change, it will always pick the same subset of tests. The script prints the combination to stdout (with --debug combinations), so that you can even verify locally if you like the particular random sample you got.

@mpusz
Copy link
Owner

mpusz commented Nov 13, 2024

So all it takes is a call to sample_combinations(formatting="std::format", contracts="none", config=configs["Clang-18 (x86-64)"]), so it samples at least one from that subset.

I think cxx_modules="True" is also needed? Also, it should probably be clang-18+, but clang-19 is buggy now. Could you please add such a configuration to the configs that handle import_std?

conan-config was always being set to the empty string already before

Yes, it was needed before but not anymore. I left the option for the future, though.

it will always pick the same subset of tests

Do you mean that we can repro a specific CI run, or does it mean that every CI is the same and we are not testing other configurations?

@burnpanck
Copy link
Contributor Author

burnpanck commented Nov 13, 2024

I think cxx_modules="True" is also needed?

Currently, thats part of the config "axis", which also includes the compiler. I just copied/re-encoded the "config" configurations we had before. Apparently, cxx_modules was enabled for all clang >= 17, and it still is. We could change that into another matrix "axis" though, and add a supports_cxx_module informative property to the configuration.

Could you please add such a configuration to the configs that handle import_std?

Done: #637.

Do you mean that we can repro a specific CI run, or does it mean that every CI is the same and we are not testing other configurations?

Every CI is the same; we're not testing other configurations. I considered that preferrable for reproducibility reasons, but it would be easy to change.

@mpusz
Copy link
Owner

mpusz commented Nov 13, 2024

We could change that into another matrix "axis" though, and add a supports_cxx_module informative property to the configuration.

We can rename cxx_modules to be consistent with std::format indeed. Also, we could provide a similar property for import_std and enable it only for clang-18+. With time gcc and MSVC will support modules but may still have problems with import_std so having this as a separate informative property could be useful then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Limit a number of configuration tested in CI
3 participants