[EBPF-601] gpu: add function to retrieve visible devices for a process #30510

Merged
14 commits merged into main from guillermo.julian/cuda-visible-devices
Nov 4, 2024

Conversation

Contributor

@gjulianm gjulianm commented Oct 25, 2024

What does this PR do?

This PR adds code to the pkg/gpu/cuda package that filters the GPU devices visible to a process, based on the value of its CUDA_VISIBLE_DEVICES environment variable. This is needed for correct multi-GPU support, because a process selects a device by its index in the list of GPUs visible to it, not by its index in the full list of GPUs on the system.
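As a rough illustration of the filtering described above, here is a minimal sketch in Go. It is not the code from this PR: the function name visibleDevices, its signature, and the use of plain strings for devices are all assumptions made for the example. It handles only integer indices (CUDA also accepts UUID entries, which are omitted here) and follows the documented CUDA behavior that an unset variable leaves every device visible, while an invalid entry hides itself and everything after it.

```go
package main

import (
	"strconv"
	"strings"
)

// visibleDevices is a hypothetical helper (not the PR's actual API).
// It returns the devices a process would see, in the order the process
// would index them, given the value of CUDA_VISIBLE_DEVICES.
//
// isSet distinguishes an unset variable (all devices visible) from a
// variable set to the empty string (no devices visible).
func visibleDevices(systemDevices []string, envValue string, isSet bool) []string {
	if !isSet {
		return systemDevices
	}

	visible := []string{}
	for _, field := range strings.Split(envValue, ",") {
		idx, err := strconv.Atoi(strings.TrimSpace(field))
		// Assumption: only integer indices are supported in this sketch.
		// Any unparsable or out-of-range entry stops the scan, so only
		// the devices listed before it remain visible.
		if err != nil || idx < 0 || idx >= len(systemDevices) {
			break
		}
		visible = append(visible, systemDevices[idx])
	}
	return visible
}
```

With three system devices and CUDA_VISIBLE_DEVICES=2,0, the process's device 0 is the system's device 2 and its device 1 is the system's device 0, which is why a monitoring tool has to apply the same remapping before attributing activity to a physical GPU.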

Motivation

Describe how to test/QA your changes

Possible Drawbacks / Trade-offs

Additional Notes


cit-pr-commenter bot commented Oct 25, 2024

Regression Detector

@gjulianm gjulianm added the changelog/no-changelog and qa/done labels Oct 28, 2024
@agent-platform-auto-pr
Contributor

agent-platform-auto-pr bot commented Oct 28, 2024

Test changes on VM

Run this command from test-infra-definitions to manually test this PR's changes on a VM:

inv create-vm --pipeline-id=47855509 --os-family=ubuntu

Note: This applies to commit b39574d

@gjulianm gjulianm marked this pull request as ready for review October 28, 2024 13:03
@gjulianm gjulianm requested a review from a team as a code owner October 28, 2024 13:03
Contributor

@val06 val06 left a comment


reviewed

Resolved review threads:
pkg/gpu/cuda/env.go (7 threads, 2 outdated)
pkg/util/kernel/proc.go (2 threads, 1 outdated)
pkg/gpu/cuda/env_test.go (1 thread)
@github-actions github-actions bot added the long review label Oct 29, 2024
@gjulianm
Contributor Author

gjulianm commented Nov 4, 2024

/merge

@dd-devflow

dd-devflow bot commented Nov 4, 2024

🚂 MergeQueue: pull request added to the queue

The median merge time in main is 22m.

Use /merge -c to cancel this operation!

@dd-mergequeue dd-mergequeue bot merged commit 3580727 into main Nov 4, 2024
291 checks passed
@dd-mergequeue dd-mergequeue bot deleted the guillermo.julian/cuda-visible-devices branch November 4, 2024 10:35
@github-actions github-actions bot added this to the 7.61.0 milestone Nov 4, 2024
Labels
changelog/no-changelog, component/system-probe, long review, qa/done, team/ebpf-platform
2 participants