after helm install gpu-operator, no kata-qemu-nvidia-gpu runtimeclass, only kata-nvidia-gpu #59

acblbtpccc · 2024-07-26T13:53:59Z

OS: Ubuntu 20.04
CPU: AMD EPYC 9354
GPU: NVIDIA RTX A6000 * 8

I have already labeled the node (master and worker are on the same machine).

If I use the kata-qemu-nvidia-gpu runtimeclass (which is the one in the docs for 24.3.0), the pod cannot start.

If I use the kata-nvidia-gpu runtimeclass (which is not in the docs for 24.3.0), the output is as follows:

After comparing the helm manifests, I guess that the difference may be due to the kata-manager version.

The helm command used is:

The results above seem to indicate that the docs are for kata-manager v0.1.0 rather than kata-manager v0.2.0. Is there any documentation for kata-manager v0.2.0? Or can I downgrade to kata-manager v0.1.0?
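For anyone who wants to double-check which of the two runtimeclasses exists on their own cluster, here is a minimal Python sketch. It assumes `kubectl` is on the PATH and pointed at the right cluster. The helper names `runtimeclass_presence` and `fetch_runtimeclasses_json` are illustrative only; they are not part of gpu-operator or kata-manager.

```python
import json
import subprocess

# The two RuntimeClass names discussed in this issue.
EXPECTED = ("kata-qemu-nvidia-gpu", "kata-nvidia-gpu")


def fetch_runtimeclasses_json() -> str:
    """Return the raw JSON from `kubectl get runtimeclass -o json`.

    Assumption: kubectl is installed and configured for the cluster.
    """
    result = subprocess.run(
        ["kubectl", "get", "runtimeclass", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def runtimeclass_presence(kubectl_json: str, expected=EXPECTED) -> dict:
    """Map each expected RuntimeClass name to True if it exists, else False."""
    items = json.loads(kubectl_json).get("items", [])
    found = {item["metadata"]["name"] for item in items}
    return {name: name in found for name in expected}


# Usage on a live cluster (hypothetical, not run here):
#   for name, present in runtimeclass_presence(fetch_runtimeclasses_json()).items():
#       print(f"{name}: {'present' if present else 'missing'}")
```

The parsing is kept separate from the `kubectl` call so it can be tried against saved JSON output without access to a cluster.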

acblbtpccc · 2024-08-18T18:43:27Z

Hi, I found that this problem is caused by the artifact image, which is needed by the k8s-kata-manager, no longer being accessible.

May I ask whether anyone has the rights to fix this?

goutnet · 2024-09-27T08:19:33Z

@zvonkok Hi, I am a colleague of @acblbtpccc. We are trying to reproduce the steps of the documentation provided by NVIDIA directly here:

https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-operator-kata.html

Sorry for the bump on an old issue, I think we could have done better introducing ourselves ^^;

Would you have a few minutes to spare to give us some pointers on what we obviously did wrong here?

@zvonkok your help would be greatly appreciated, thank you so much in advance!

acblbtpccc · 2024-09-29T10:38:22Z

@zvonkok
Hi Zvonkok,

I hope this message finds you well. I wanted to bring to your attention that I've opened a related issue, kata-containers/kata-containers#10360, after attempting to run GPU passthrough directly with Kata Containers. I would greatly appreciate it if you could take a look at this issue when you have a moment. I'm looking forward to your insights, and thank you in advance for your time and expertise.

Additionally, I watched your interview videos on YouTube, which were very informative. If possible, would you be willing to share the environment configuration you used? This would be incredibly helpful for us to reference when trying to reproduce the setup.

Thank you again for your consideration and assistance.

@goutnet

acblbtpccc · 2024-09-29T15:53:52Z

@cdesiniotis

Hi Christopher, I noticed your comments in this issue. Are these artifacts still not publicly available? Does this mean we are still unable to reproduce the results in the official docs?

We are looking forward to your insights regarding some challenges we've encountered while using GPU-Operator with Kata. Your expertise would be greatly appreciated.

Thank you in advance for your time and assistance.

/cc @goutnet
