cargo QoL improvements #4228
Conversation
Previously, only the `firecracker` binary could be run via `cargo run`. Now, we can also do `cargo run --bin cpu-template-helper`, for example. I am keeping the jailer excluded from the `default-members` list, as this seems to have been an intentional choice made in 3bf285c. Signed-off-by: Patrick Roy <[email protected]>
Previously, it could only compile examples from the main firecracker package. To compile examples from package "x", we need to pass `-p x`. Signed-off-by: Patrick Roy <[email protected]>
This prevents them from being compiled when only `cargo build` is run, where they are not needed and only increase build time. Signed-off-by: Patrick Roy <[email protected]>
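Taken together, the commits above rely on two pieces of cargo behavior: the workspace `default-members` list decides which packages a bare `cargo build`/`cargo run` operates on, and targets under a package's `examples/` directory are only compiled on demand. Below is a minimal sketch of both, assuming a hypothetical layout (the paths and the `demo` example name are illustrative, not the actual Firecracker manifests):

```toml
# --- workspace Cargo.toml (illustrative sketch) ---
[workspace]
members = [
    "src/firecracker",
    "src/jailer",
    "src/cpu-template-helper",
    "src/log-instrument",
]
# A bare `cargo build`/`cargo run` from the workspace root only operates on
# these packages, so `cargo run --bin cpu-template-helper` now resolves, while
# the jailer stays excluded (the intentional choice from 3bf285c).
default-members = [
    "src/firecracker",
    "src/cpu-template-helper",
    "src/log-instrument",
]

# --- src/log-instrument/Cargo.toml (illustrative sketch) ---
# Targets under examples/ are skipped by a plain `cargo build`; they are only
# compiled on demand, e.g. by `cargo build -p log-instrument --examples`,
# `cargo test`, or `cargo run -p log-instrument --example demo`.
[[example]]
name = "demo"
path = "examples/demo.rs"
```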
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #4228      +/-   ##
==========================================
- Coverage   82.21%   81.75%   -0.47%
==========================================
  Files         236      236
  Lines       29193    29187       -6
==========================================
- Hits        24001    23861     -140
- Misses       5192     5326     +134

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
This has …
I don't think this matters much; the tests are still there, after all.
It seems this is always how we approach this. If we functionally ignore it by never taking action as a result of the metric, why do we track it? Maybe we should remove it if it is just noise? This PR can act as a good example; we should clarify the story here.
To me, we track code coverage to identify gaps in our testing. E.g., a low coverage percentage can be a hint that some PR is missing unit tests, or the annotations that codecov adds to the PR diff show some uncovered lines, which then leads us to add more testing (this is how I discovered the initrd issues in #4073). I agree that looking at the metric in isolation is meaningless. To me it's a case-by-case decision on every PR. In this specific case, the coverage drops, but there is no gap in testing, so I ignore it.
I think the files were designed to be tests, so changing them to examples doesn't add much value.
If improving the build time is the only reason the tests were turned into examples, could we add details on by how much the build improves? I don't mind an additional build time of a few minutes if codecov can show me 100% coverage.
The compile time impact to … But independently of that, I feel fairly strongly about not exposing testing details as binary targets. Currently, for example, …
The build improvement is ~20 secs, which is too small to justify removing the tests. However, I understand why having those weird binaries could cause confusion, and this concern was also raised when they were first added. I agree that the bins could cause confusion where they are right now, but replacing them with a python integration test goes against our code coverage policy. I suggest we create an issue to do the right thing (moving them to the correct place while maintaining coverage), but until then I don't think we should remove them.
I want to stress that no tests are being removed in this PR. We are still exercising exactly the same code paths and checking exactly the same assertions. Here, we have a case of code being very hard to unit test (we could mock the logger, which would also solve this problem, but even then we cannot properly check any assertions, because the log messages contain thread ids, and if the logs aren't generated by a separate process, these become non-deterministic). Whenever code is very hard to unit test, and workarounds introduce drawbacks (such as accidentally vending dummy binaries to users, or adding significant complexity to the testing code, as Jonathan mentions his alternative would do), we should use integration testing for it.

It is also a matter of consistency with the rest of our testing framework: UFFD and seccomp tests are cargo examples that are driven by python integration tests (and when we did this, even promoting them to examples from crates hidden inside our testing framework went through a lot of discussion because of the same point: it puts testing code into a more prominent place alongside production code). We have also decided in the past that we value the sensibility of our testing over the code coverage number, when we removed "coverage generating" tests for debug implementations because they added no functional value. The code coverage number is a proxy for the property that we actually care about: code being tested. If we only focus on the proxy and try to game the number, we lose track of that goal, and that is when it becomes useless.
I am not aware of us having any policy regarding code coverage (I know we decided against introducing static checks in our CI for it, because it was impossible to meaningfully phrase failure conditions that do not rule out valid use cases). We only have a team-set goal to reach 85% coverage, but even here I could argue that the 81.77% reported on the HEAD of this PR is a more accurate representation of our progress towards this goal: we currently include the source code of …
I would agree that tests that run the entire binary tool with files as inputs are integration tests by nature and should not be contributing to our unit test coverage metric. In some cases (like this one) unit testing is hard, and it is fine to resort to integration tests only, while having clear visibility of the gaps in unit test coverage (which we are fine with).
We might not have a written policy, but from what I see, we concluded in multiple internal decision discussions that we use unit tests instead of integration tests for code coverage. The problem was not with this particular unit test being less efficient and more complex than the integration test; it was which approach provides better visibility. I didn't see codecov as a proxy but as a tool which provides us visibility and a metric on coverage. Even if the integration test in this case might provide us better testing, we don't have a metric to know whether the testing or coverage was reduced if someone modifies the test. I imagine it would be hard for someone to review all the corner cases if the integration test was modified. Having said that, I also agree that we can never achieve 100% code coverage, and with a good enough explanation as you provided, it seems logical to go ahead with this change. BTW, thanks for being patient with me and explaining this in such detail. I really appreciate that you didn't take the easy path of just removing the commit and fought for the right thing.
Changes

Turns example binaries in log-instrument into actual examples, and allows us to use `cargo run` to execute our helper tools.

Reason

I couldn't do `cargo run --bin cpu-template-helper` before + compile time improvements.

License Acceptance
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check CONTRIBUTING.md.

PR Checklist
- Any user-facing changes are mentioned in CHANGELOG.md.
- New TODOs link to an issue.
- This functionality cannot be added in rust-vmm.