GPU lane requirements - will x4 do? #122
-
I have been very excited by this project and have been following it for a few weeks. I am about to try a setup with a GTX 1070 on a headless unraid server I have; I will pass the GPU through to a docker container and play around.

The only problem: the only free slots on the server are a PCIe 3.0 x4 and a PCIe 2.0 x4. How much of the bus is saturated in the WIS use case? Will a GTX 1070 on an x4 connection still manage to cope, or is the workload chatty between the GPU and the rest of the system? I might move to a dedicated appliance once things start working, but for now I would prefer not to expand my hardware footprint if possible.
-
Bus utilization for the Willow and WIS use cases is extremely low. What you effectively end up doing is passing a few seconds' worth of audio data to the model on the GPU, and potentially a few seconds of audio back from the TTS model (if configured). We also aggressively cache TTS responses, so any invocations after the first matching text string are served by the nginx frontend proxy and don't even reach WIS itself. In practice these responses end up being nearly instant.

While all of my cards are on full x16 PCIe 3.0/4.0 links, I doubt even PCIe x1 of any version is going to make much of a difference - but we'd love to see the performance stats with this configuration to validate that theory!

Just to make sure: there also seems to be a persistent impression that WIS is GPU only. It's actually also among the fastest speech-to-text implementations on CPU around. The issue is that these kinds of use cases are simply far better suited to the fundamental architectural advantages a GPU provides. If you look at the benchmarks you'll see that your ~$100, six-year-old GTX 1070 stomps all over even recent high-end CPUs.

We emphasize the GPU approach because it is (currently) the only way to achieve response times competitive with commercial voice assistants. In fact, with GPU, WIS, and Willow hosted completely locally you will see response times significantly better than the commercial alternatives. With your GTX 1070 you should see a response time from end of speech to output delivered to your configured command endpoint well within our target goal of 500 ms. At the other end, I routinely use an RTX 3090 for development and testing (I have a GTX 1060 and 1070 as well) and I see response times, including Home Assistant completing the command, of under 250 ms - with roughly half of that being Home Assistant itself (WIS inference time for a typical speech command is roughly 50 ms).

That said, you can get started with WIS on CPU today, get your GPU configured, and WIS will automatically utilize the GPU on the next startup. Welcome to Willow!
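To put "extremely low" in perspective, here is a back-of-envelope sketch (my own illustrative numbers, not measured WIS figures): a few seconds of 16 kHz, 16-bit mono speech is on the order of 160 KB, which crosses even a single PCIe 2.0 lane in a fraction of a millisecond.

```python
# Back-of-envelope transfer-time estimate for a speech-command payload.
# All figures here are illustrative assumptions, not measured WIS numbers.
seconds_of_audio = 5
sample_rate = 16_000        # Hz; a common rate for speech models (assumption)
bytes_per_sample = 2        # 16-bit PCM

payload_bytes = seconds_of_audio * sample_rate * bytes_per_sample  # 160,000

pcie2_x1 = 500e6            # ~500 MB/s usable per PCIe 2.0 lane
pcie3_x4 = 4 * 985e6        # ~985 MB/s usable per PCIe 3.0 lane

print(f"PCIe 2.0 x1: {payload_bytes / pcie2_x1 * 1e3:.3f} ms")  # ~0.320 ms
print(f"PCIe 3.0 x4: {payload_bytes / pcie3_x4 * 1e3:.3f} ms")  # ~0.041 ms
```

Model weights take longer to move, of course, but that is a one-time cost when the server loads the model, not something in the per-request path.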
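For the curious, the TTS caching mentioned above sits in front of WIS at the proxy layer. A minimal sketch of the idea in nginx terms - the location path, cache parameters, and backend address below are illustrative placeholders, not our actual configuration:

```nginx
# Illustrative only: cache TTS responses at the nginx front end so that a
# repeated text string is served from cache without reaching WIS at all.
proxy_cache_path /var/cache/nginx/tts keys_zone=tts_cache:10m max_size=1g;

server {
    listen 443 ssl;

    location /api/tts {                    # placeholder path
        proxy_cache tts_cache;
        proxy_cache_key $request_uri;      # same text -> same cache entry
        proxy_cache_valid 200 24h;         # serve repeats for a day
        proxy_pass http://127.0.0.1:19000; # placeholder WIS backend address
    }
}
```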
-
Thanks for confirming
If I manage to run it on x4, I will report back. I only play on weekends and don't have the GPU at hand, so it might be some time.
Yes, the CPU benchmarks caught my eye, hence the question. My plan is exactly that - start now on CPU, then drop in a GPU a bit later. But if for some reason x4 had been a no-go, I would rather have started on a different CPU than the one I currently plan to use. Thanks for the amazing project!