Disable linux-x64 dev-innerloop #108581

Merged
merged 4 commits into from
Oct 13, 2024

Conversation

am11
Member

@am11 am11 commented Oct 7, 2024

Lately, this leg just times out with a pending status on GitHub:

##[error]
,##[error]Agent failed with exception: The machine running request aa3690cc-533c-40f0-8670-ebfaae75e388 restarted. Azure DevOps can't recover from restarts.
,##[warning]Received request to deprovision: The request was cancelled by the remote provider.

From dotnet/dnceng#3879, it is currently the only leg running into this issue.

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Oct 7, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Oct 7, 2024
@am11 am11 added area-Infrastructure-coreclr community-contribution Indicates that the PR has been added by a community member and removed community-contribution Indicates that the PR has been added by a community member needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Oct 7, 2024
Contributor

Tagging subscribers to this area: @hoyosjs
See info in area-owners.md if you want to be subscribed.

@lewing lewing requested a review from agocke October 7, 2024 23:06
@steveisok steveisok requested a review from a team October 9, 2024 16:26
@steveisok
Member

I'm fine with disabling until we either break the mega-build into pieces or bump the memory.

@am11
Member Author

am11 commented Oct 9, 2024

Fix the mega-build by breaking it into smaller pieces.

Cc: @ericstj. This is related to the OOM issue we encountered with the -allConfigurations leg running on Linux (see dotnet/dnceng#3879 (comment)).

Could we consider splitting the build into groups? For example, we could run multiple dotnet invocations and terminate the process after each one to manage memory usage more effectively.
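
A rough sketch of what that splitting could look like, purely as an illustration. It assumes each group is described by its own project file (the `groupA.proj` style names are hypothetical and do not exist in the repo), that plain `dotnet build` is used rather than the repo's own build scripts, and that passing `-nodeReuse:false` plus running `dotnet build-server shutdown` between groups is enough to let long-lived build processes exit. None of this has been tried against the failing leg.

```python
import subprocess
from typing import Callable, List, Sequence


def group_commands(group_projects: Sequence[str]) -> List[List[str]]:
    """Return the command lines for building each group in its own process.

    `group_projects` holds one (hypothetical) project file per group.
    """
    commands: List[List[str]] = []
    for project in group_projects:
        # Ask MSBuild not to keep worker nodes alive after this build.
        commands.append(["dotnet", "build", project, "-nodeReuse:false"])
        # Shut down any build servers (for example the compiler server)
        # that were started, so their memory is returned before the next group.
        commands.append(["dotnet", "build-server", "shutdown"])
    return commands


def run_groups(
    group_projects: Sequence[str],
    run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> int:
    """Run the groups one after another and stop at the first failing command.

    `run` is injectable so the sequencing can be checked without invoking dotnet.
    Returns 0 on success, otherwise the exit code of the failing command.
    """
    for command in group_commands(group_projects):
        result = run(command)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `run_groups(["groupA.proj", "groupB.proj"])` would build the two groups sequentially. Whether that actually lowers peak memory depends on where the memory is being held, which is still unknown at this point.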

@jkotas
Member

jkotas commented Oct 9, 2024

> This is related to the OOM issue we encountered with the -allConfigurations leg running

Do we know what consumes the memory and whether it is expected? We should check that this is not a symptom of a product issue.

@steveisok
Member

> > This is related to the OOM issue we encountered with the -allConfigurations leg running
>
> Do we know what consumes the memory and whether it is expected? We should check that this is not a symptom of a product issue.

@agocke suggested we may have broken something in main and to see how it behaves after #107772 went in.

@jkotas
Member

jkotas commented Oct 9, 2024

> behaves after #107772 went in.

linux-x64 dev-innerloop has been failing for about a month. It started long before #107772.

@am11
Member Author

am11 commented Oct 9, 2024

The LKG PR was merged 20 hours ago; since then I have seen multiple PRs running into the innerloop timeout issue. Not sure if that is related to the -allConfigurations leg (which is purely about the libs subset building all assemblies for every possible platform combination), but I might be missing something.

@agocke
Member

agocke commented Oct 10, 2024

Yeah, I guess that wasn't it. However, I notice that release/9.0 runs seem to be succeeding. So I'm still worried this is a product issue of some sort.

@ericstj
Member

ericstj commented Oct 10, 2024

Agree with what @jkotas says. There's no reason for us to think that the product shouldn't be capable of building the -allconfigurations leg. If we're hitting OOMs that's a symptom of a memory leak in the build processes.

@am11
Member Author

am11 commented Oct 10, 2024

> There's no reason for us to think that the product shouldn't be capable of building the -allconfigurations leg

The CI machine is running out of memory after building assemblies for 50-60 mins. Locally, the allConfigurations build succeeds because I have 32 GB on my Linux box (and not many running processes). I don't think anyone was suggesting that the entire product is incapable.

@jkotas
Member

jkotas commented Oct 10, 2024

> CI machine is running out of memory after building assemblies for 50-60 mins

Right, it suggests that there is a "leak" that accumulates until it is big enough to take the CI machine down during a long enough build. A single assembly takes no more than tens of seconds to build. All memory required to build a given assembly should be released once we are done building it if there is memory pressure.

@ericstj
Member

ericstj commented Oct 11, 2024

I pulled down the log files here and the thing that looks suspect here is some hotreload tasks. I made a mention in the issue.

Totally non-scientific but my sampling of logs from passing builds always showed these tasks running around the time we see 95% memory warnings. Could be relevant.

This build leg isn't just building allConfigurations, it's also building all the tests. Not sure if that's relevant but it is a bit unusual. Probably gives the system the biggest load of concurrent builds of managed code we have to offer.

> linux-x64 dev-innerloop has been failing for like a month.

Could it be correlated to us updating the SDK to RC1?

@am11
Member Author

am11 commented Oct 13, 2024

@jkotas should we merge this until dotnet/dnceng#3879 is resolved, since its failure rate is high these days? If you think we should keep it enabled, let's close this PR and track it in the issue.

cc @dotnet/dotnet-hotreload-utils-admin

@jkotas
Member

jkotas commented Oct 13, 2024

I think we should have this leg disabled against an issue in this repo that is specific to linux_x64_dev_innerloop.

dotnet/dnceng#3879 looks too generic, not very actionable.

@am11
Member Author

am11 commented Oct 13, 2024

Opened #108821.

@jkotas jkotas merged commit 7b69459 into dotnet:main Oct 13, 2024
144 of 149 checks passed
@am11 am11 deleted the patch-13 branch October 13, 2024 16:39
@steveisok
Member

Next steps are figuring out how we're going to go about investigating and who is going to do it.

ericstj added a commit to ericstj/runtime that referenced this pull request Oct 14, 2024
ericstj added a commit that referenced this pull request Oct 14, 2024
* Revert "Disable linux-x64 dev-innerloop (#108581)"

This reverts commit 7b69459.

* Don't build tests in linux_x64_dev_innerloop

Testing to determine if building tests were the source of OOMs in CI
@github-actions github-actions bot locked and limited conversation to collaborators Nov 13, 2024
Labels
area-Infrastructure-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
Status: Done

5 participants