After helm install of gpu-operator: no kata-qemu-nvidia-gpu RuntimeClass, only kata-nvidia-gpu #59
@zvonkok Hi, I am a colleague of @acblbtpccc. We are trying to reproduce the steps in the documentation provided by NVIDIA here: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-operator-kata.html Sorry for bumping an old issue; I think we could have done a better job introducing ourselves ^^; Would you have a few minutes to spare to give us some pointers on what we obviously did wrong? Your help would be greatly appreciated, thank you so much in advance!
@zvonkok I hope this message finds you well. I wanted to let you know that I've opened a related issue, kata-containers/kata-containers#10360, about running GPU passthrough directly with Kata Containers. I would greatly appreciate it if you could take a look when you have a moment. I look forward to your insights, and thank you in advance for your time and expertise. I also watched your interview videos on YouTube, which were very informative. If possible, would you be willing to share the environment configuration you used? It would be incredibly helpful as a reference when we try to reproduce the setup. Thank you again for your consideration and assistance.
Hi Christopher, I noticed your comments in this issue. Are these artifacts still not publicly available? Does this mean we are still unable to reproduce the results in the official docs? We are looking forward to your insights on some challenges we've encountered while using GPU-Operator with Kata. Your expertise would be greatly appreciated. Thank you in advance for your time and assistance. /cc @goutnet
OS: Ubuntu 20.04
CPU: AMD EPYC 9354
GPU: NVIDIA RTX A6000 * 8
I have already labeled the node (master and worker are on the same machine).
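To compare the two RuntimeClasses discussed in this issue side by side, a minimal test pod can be generated for each RuntimeClass name and applied with kubectl. This is only a sketch: the pod name pattern, the ubuntu:22.04 image and the sleep command are hypothetical placeholders, and no GPU resource request is included, because the correct passthrough resource name depends on what the cluster actually advertises. The only field that matters here is the standard Pod field runtimeClassName.

```python
import json
import sys


def gpu_test_pod(runtime_class: str, image: str = "ubuntu:22.04") -> dict:
    """Build a minimal Pod manifest that runs under the given RuntimeClass.

    The name pattern, image and command are hypothetical placeholders.
    A GPU resource request (not included) would have to be added to match
    the resource names the node actually advertises.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"gpu-test-{runtime_class}"},
        "spec": {
            # Standard Pod field selecting the RuntimeClass (e.g. a Kata handler).
            "runtimeClassName": runtime_class,
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "test",
                    "image": image,
                    "command": ["sleep", "3600"],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Usage (hypothetical file name):
    #   python3 make_pod.py kata-qemu-nvidia-gpu | kubectl apply -f -
    rc = sys.argv[1] if len(sys.argv) > 1 else "kata-nvidia-gpu"
    print(json.dumps(gpu_test_pod(rc), indent=2))
```

kubectl accepts JSON manifests as well as YAML, so the output can be piped straight into kubectl apply -f -.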
If I use the kata-qemu-nvidia-gpu RuntimeClass (which is the one in the docs for 24.3.0), the pod cannot start.
If I use the kata-nvidia-gpu RuntimeClass (which is not in the docs for 24.3.0), the output is as follows:
After comparing the Helm manifests, I suspect the difference may be due to the kata-manager version.
The Helm commands used are:
The results above seem to indicate that the docs were written for kata-manager v0.1.0 rather than kata-manager v0.2.0. Is there any documentation for kata-manager v0.2.0? Or can I downgrade to kata-manager v0.1.0?
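A quick way to confirm which RuntimeClasses the installed operator actually created, compared with the ones a given documentation version expects, is to parse the output of kubectl get runtimeclass -o name, which prints one runtimeclass.node.k8s.io/NAME line per object. The sketch below assumes that the expected list is just kata-qemu-nvidia-gpu (the name used in the 24.3.0 docs); that default is an assumption and can be replaced with whatever names a particular doc version specifies.

```python
# Assumed expectation: RuntimeClass names the documentation tells you to use.
EXPECTED = ("kata-qemu-nvidia-gpu",)


def runtime_class_names(kubectl_output: str) -> set[str]:
    """Parse `kubectl get runtimeclass -o name` output into bare names.

    Each non-empty line looks like `runtimeclass.node.k8s.io/<name>`;
    only the part after the last slash is kept.
    """
    names = set()
    for line in kubectl_output.splitlines():
        line = line.strip()
        if line:
            names.add(line.rsplit("/", 1)[-1])
    return names


def missing_runtime_classes(kubectl_output: str, expected=EXPECTED) -> list[str]:
    """Return the expected RuntimeClass names that are not present, sorted."""
    found = runtime_class_names(kubectl_output)
    return sorted(set(expected) - found)
```

One way to feed it is subprocess.run(["kubectl", "get", "runtimeclass", "-o", "name"], capture_output=True, text=True).stdout. A non-empty result means the RuntimeClass the docs reference was never created, which matches the symptom in this issue's title.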