GPU lane requirements - will x4 do? #122
-
I have been very excited by this project and have been following it for a few weeks. I am about to try a setup with a GTX 1070 on a headless unraid server I have; I will pass the GPU through to a docker container and play around.

The only problem: the only free slots on the server are a PCIe 3.0 x4 and a PCIe 2.0 x4. How much of the bus is saturated in the WIS use case? Will a GTX 1070 on an x4 connection still manage to cope, or is the workload chatty between the GPU and the rest of the system? I might move to a dedicated appliance once things start working, but for now I would prefer not to expand my hardware footprint if possible.
-
Bus utilization for the Willow and WIS use cases is extremely low. What you effectively end up doing is passing a few seconds' worth of audio data to the model on the GPU, and potentially a few seconds of audio back from the TTS model (if configured). We also aggressively cache TTS responses, so any invocations after the first matching text string are served by the nginx frontend proxy and don't even reach WIS itself. In practice these responses end up being nearly instant.

While all of my cards are on full x16 PCIe 3.0/4.0 links, I doubt even PCIe x1 of any version is going to make much of a difference - but we'd love to see the performance stats with this configuration to validate that theory!

Just to make sure: there also seems to be a persistent impression that WIS is GPU only. It's actually also among the fastest speech-to-text implementations on CPU around. The issue is that these kinds of use cases are simply far better suited to the fundamental architectural advantages a GPU provides. If you look at the benchmarks you'll see that your ~$100, six-year-old GTX 1070 stomps all over even recent high-end CPUs.

We emphasize the GPU approach because it is (currently) the only way to achieve response times competitive with commercial voice assistants. In fact, with GPU, WIS, and Willow hosted completely locally you will see response times significantly better than the commercial alternatives. With your GTX 1070 you should see a response time from end of speech to output delivered to your configured command endpoint well within our target goal of 500 ms. At the other end, I routinely use an RTX 3090 for development and testing (I have a GTX 1060 and 1070 as well) and I see response times, including Home Assistant completing the command, of under 250 ms - with roughly half of that being Home Assistant itself (WIS inference time for a typical speech command is roughly 50 ms).

That said, you can get started with WIS on CPU today, get your GPU configured, and WIS will automatically utilize the GPU on the next startup. Welcome to Willow!
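To put "extremely low" in perspective, here is a back-of-envelope sketch (my own illustrative numbers, not measured WIS figures): a few seconds of 16 kHz, 16-bit mono speech is on the order of 160 KB, which crosses even a single PCIe 2.0 lane in a fraction of a millisecond.

```python
# Back-of-envelope transfer-time estimate for a speech-command payload.
# All figures here are illustrative assumptions, not measured WIS numbers.
seconds_of_audio = 5
sample_rate = 16_000        # Hz; a common rate for speech models (assumption)
bytes_per_sample = 2        # 16-bit PCM

payload_bytes = seconds_of_audio * sample_rate * bytes_per_sample  # 160,000

pcie2_x1 = 500e6            # ~500 MB/s usable per PCIe 2.0 lane
pcie3_x4 = 4 * 985e6        # ~985 MB/s usable per PCIe 3.0 lane

print(f"PCIe 2.0 x1: {payload_bytes / pcie2_x1 * 1e3:.3f} ms")  # ~0.320 ms
print(f"PCIe 3.0 x4: {payload_bytes / pcie3_x4 * 1e3:.3f} ms")  # ~0.041 ms
```

Model weights take longer to move, of course, but that is a one-time cost when the server loads the model, not something in the per-request path.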
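For the curious, the TTS caching mentioned above sits in front of WIS at the proxy layer. A minimal sketch of the idea in nginx terms - the location path, cache parameters, and backend address below are illustrative placeholders, not our actual configuration:

```nginx
# Illustrative only: cache TTS responses at the nginx front end so that a
# repeated text string is served from cache without reaching WIS at all.
proxy_cache_path /var/cache/nginx/tts keys_zone=tts_cache:10m max_size=1g;

server {
    listen 443 ssl;

    location /api/tts {                    # placeholder path
        proxy_cache tts_cache;
        proxy_cache_key $request_uri;      # same text -> same cache entry
        proxy_cache_valid 200 24h;         # serve repeats for a day
        proxy_pass http://127.0.0.1:19000; # placeholder WIS backend address
    }
}
```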
-
Thanks for confirming
If I manage to run it on x4, I will report back. I only play on weekends and don't have the GPU at hand, so it might be some time.
Yes, the CPU benchmarks caught my eye, hence the question. My plan is exactly that - start now on CPU, then drop in a GPU a bit later. But if for some reason x4 had been a no-go, I would rather have started on a different CPU than the one I currently plan to use. Thanks for the amazing project!