-
Notifications
You must be signed in to change notification settings - Fork 573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
integration/soc/add_sdcard: switch to edge-triggered cmd-done irq #1787
Conversation
The current level-triggered command completion irq stays asserted until a new command is launched, requiring the software driver to disable and re-enable the cmd-done interrupt for each request. Switching to an edge-triggered cmd-done interrupt will allow for a more elegantly written software driver that can also easily use data/dma completion interrupts. Updates: e6765f8 Signed-off-by: Gabriel Somlo <[email protected]>
@paulusmack After talking to @Dolu1990 I realized the current Linux driver for LiteSDCard only uses interrupt support for command completion, and still polls for data/DMA completion. In my attempt to figure out how to also support DMA completion IRQs for LiteSDCard, I realized that the level-triggered nature of the current interrupt, along with the fact that it stays asserted, in effect, "forever" (from completion until the setup of the next command, however long that is), causes the current driver to have to do contorted stuff like this. Since there's no clear rule about which of a LiteSDCard command and its associated concurrent DMA data transfer would finish first, adding and removing "currently enabled" IRQs is very race-y, with lots of opportunity for TOCTOU bugs to happen. So IMHO the only reasonable way forward is to switch the command-completion IRQ to edge-triggered, which will then allow us to have all IRQs (command, dma-read, dma-write) enabled all the time, and rewrite the driver (see litex-hub/linux#15) to take advantage of all IRQs provided by the gateware. Please LMK what you think. Once the gateware updates (litesdcard, litex) are in place, I'm planning to submit litex-hub/linux#15 to LKML and push dma irq support into mainline. Any feedback much appreciated -- Thanks! |
[Replying here as well as in email]
I'll try to find some time to instrument the driver to find out how long we are typically spending doing the polling. (It may of course depend on the specific SD card being used.) I think what I remember from looking at traces of the SD bus is that the card usually didn't signal command completion until the data transfer was complete or almost complete, but I may be wrong (that was a couple of years ago).
Like I said in email, as far as I can see it's a single register write either way (either to clear the pending bit for edge-triggered, or to disable the interrupt if level-triggered), so I don't see how that is "contorted".
Sorry, I don't get this. There would be a possible race if the updates to the IRQ_ENABLE register were done in a read-modify-write fashion, but they aren't; we just write one of two values to it. There is also a potential race between disabling/enabling interrupts and the event occurrence for edge-triggered interrupts, but not for level-triggered interrupts. That's one of the reasons why I prefer level-triggered interrupts. With edge-triggered interrupts you always have to check after enabling interrupts whether the event occurred while you had interrupts disabled, whereas you don't have to check for level-triggered interrupts. And the check is generally a register read, which is usually slow. I also don't get the reference to TOCTOU. That's generally a security-related term, and I don't see any security implications here.
I would want to see performance measurements to check that we aren't actually slowing things down by always using interrupts. My great preference is to provide as much backwards and forwards compatibility as is reasonably possible without overly complicating things. As I said in email, I'm not at all a fan of the model where everything has to be kept in perfect sync by compiling the entire stack from top to bottom every time anything anywhere gets changed.
My experience has generally been that edge-triggered interrupts usually make drivers harder to get right, not easier. (And that experience does go back quite a way, over many different kinds of systems.) The alternative to level-triggered interrupts that does seem to work well and efficiently is to use a message-based system that has in-memory queues of commands and responses, with effectively edge-triggered interrupts on message delivery. That avoids races by having a record in memory of what events have occurred. But that kind of system tends to be quite a lot more complicated in hardware than litesdcard is. |
I ran some quick tests, using nexys-video, litex, and a 4-core rocket-chip running at 50MHz. I read a 32.4M file I had handy on the sdcard, and wrote it back, to get a rough idea of my read and write speeds:
for read speed, and
for write speed. The results (in KBytes/second):
If all you care about is command-completion IRQ support, then yes, turning it on and off is as easy as "enabling"
That's why we should prefer enabling all interrupts at the time the driver is loaded, and keep them enabled throughout. If we re-arm our completion before programming the sdcard gateware to initiate a command and/or data transfer, and wait for completion after doing that, we won't ever miss anything, and won't have to worry about what happens if we mix and match edge and level triggered interrupts: see patch 5/10 and patch 6/10 of the proposed series.
It's time-of-check vs. time-of-use (I was thinking about having to read whether the command IRQ is enabled or not before I write to the enable register to turn on other IRQs, such as dma-read or dma-write). If you want to turn something on or off, you need to know whether other things are themselves on or off, to avoid turning them off when they're supposed to be on or vice versa. That means reading the enable register, and-ing (or or-ing it) with something, then writing it back. Alternatively, we could maintain a variable that contains the current set of enabled IRQs and modify, then write it out whenever we want to change what IRQs are enabled, but once again, I humbly propose that my way is just cleaner... :)
So in my case (50MHz rv64gc rocket-chip) we're not slowing them down significantly, they're in the margin of error.
So as far as I'm concerned, the choices are:
I'm definitely NOT up for maintaining two LiteSDCard drivers, or something that tries to work with both "versions" of the gateware (It's F/OSS gateware, after all, and I expect the set of people both unwilling to update the gateware and adamant about running the latest kernel to be extremely small :)
I am definitely familiar with your significant history with Linux (have seen your email and name fly by numerous times as I was slowly working my way up to submit my first patch over the course of many years) :) But I think after 5/10 and 6/10 (as linked above) are applied (after litex is modified to generate edge irqs for commands), the driver remains rock solid by
Could I ask you for (yet another) huge favor, to actually have a look at the proposed Linux driver changes, and to LMK what you think I'm missing when I claim they're making it all cleaner than it was ? :) Many thanks, |
I forgot to include actual polling results (for 50MHz 4-core rocket on nexys video, results in KBytes/second):
With the current value of I've written a whole novel in my previous comment, so please allow me to re-iterate the TL;DR part of my point re. edge-triggered interrupts: If they're always on, and our driver operates like this:
Then there's no way to miss an interrupt, and the polling loop will consist of a single, successful check. If irq is off, the polling loop will spin until done. but there's no path through which there could be an unexpected IRQ, since we're ready for it (reinit_completion) even before the hardware is told to do anything that could generate one. Thanks, |
Hi @gsomlo Thanks ^^ |
On Fri, Oct 13, 2023 at 04:27:55AM -0700, Dolu1990 wrote:
50MHz 4-core rocket on nexys video
Do you know the configuration of that SoC ? (l1 i$ d$ size, internal L2 $size)
d-cache-block-size = <64>;
d-cache-sets = <64>;
d-cache-size = <16384>;
i-cache-block-size = <64>;
i-cache-sets = <64>;
i-cache-size = <16384>;
that's per each core. The L2 cache is currently not enabled.
|
@paulusmack I've re-worked the linux patches here: litex-hub/linux#16, with 5/10 and 6/10 being even more precise and clear in what they're trying to accomplish. I guess the other argument to be made is that all other interrupts generated by LiteX blocks (including the dma completion ones for LiteSDCard) are edge-triggered, so whether level-triggered IRQs are philosophically better in the big scheme of things, having just one "stick out" by bucking the trend is a complication we might be better off avoiding :) Once again, I value your opinion a lot, and would really like to be able to convince you... :D Thanks, |
I'll take a look, and I am also instrumenting the driver to let me look at the distribution of how long commands take and how long we are polling for data transfer/DMA completion. It will probably take a few days before I can comment in more detail, since I do have some other things occupying my time at the moment.
If the way it works is that there is an interrupt pending bit that is set by the occurrence of an event (such as command completion or transfer finished), and the device persists in interrupting until the pending bit is cleared, then I am happy with that. You can call it "edge triggered" on the event, and I can call it "level triggered" on the pending bit, and we're both happy. :) |
Yes, unless one acknowledges the specific "edge" interrupt by writing to the respective bit of the pending register In the end, that can be done with a single register write for all interrupts handled during one invocation of the handler, like this I very much appreciate your "pre-review" and advance feedback (before final decision to actually make the change in litex and then submit all this driver stuff to LKML) -- Thanks! |
Here are some observations from instrumenting litex_mmc_send_cmd(). I added MMIO writes to set and clear some GPIO bits at various points and observed the GPIO outputs, along with the interrupt request from the litesdcard interface, with an oscilloscope. I did the same tests that you mentioned above using a 32MiB file (32768 kiB), copying the file to and from an ext2 partition (12GiB in size) on a Sandisk 32GB "high endurance" micro-SD card. The CPU is a 100MHz microwatt CPU with 8kiB icache and 8kiB dcache (both 2-way set associative). The SD card clock is 25MHz. Reading the 32768kiB consistently takes 11 seconds (to the nearest second, which is the best one can do using the date command), i.e. 3000 kiB/s. Writing the 32768kiB takes about 25 seconds, i.e. 1300 kiB/s. Of the commands that don't use interrupts, i.e. those for which transfer == SD_CTL_DATA_XFER_NONE and response_len != SD_CTL_RESP_SHORT_BUSY, some are very quick, taking only a few microseconds, but most of them take between 75 and 90 microseconds. The commands that do use interrupts take between 400 and 700 microseconds when reading and between about 1.75 and 2 ms when writing. This is from when the command is issued to the point when the interrupt request is raised. The latency from when the interrupt request is raised to when litex_mmc_send_cmd gets to the point just after the first litex_mmc_sdcard_wait_done() call is mostly about 120 microseconds, but varies and can be as high as 500 microseconds. The second call to litex_mmc_sdcard_wait_done() (the one waiting on LITEX_CORE_DATAEVT) and the call to readx_poll_timeout() to wait for DMA completion always take 2 to 3 microseconds each. I never saw any case where they took a long time (i.e. 10 microseconds or more). Given the latencies that I'm seeing here, I think that using interrupts more aggressively is going to slow things down. That could be dependent on the particular SD card and CPU speed that I'm using, but with a slower CPU (such as you are using) I would think that adding more trips through an interrupt handler and the scheduler would have a bigger impact. Anyway, I'll try your new version of the driver and see what happens to the transfer rates. |
@paulusmack One question about the linux kernel configuration, CONFIG_SMP is off ? If i remember well, it was introducing quite some overhead to have it on when using network/filesystem, but i tested it a very long time ago XD Maybe the best would be to just add a dts configuration to tell if :
Then there would be no compromise <3 |
CONFIG_SMP is off in my setup, since I only have one core. Regarding using the DTS to configure the behaviour, we can do that if necessary, but the disadvantage is that it does increase the amount of testing needed. |
@gsomlo with your proposed change, how does the command-done irq pending bit work? In particular, does anything happen to it when the irq enable bit is cleared? It seems to me that if the behaviour was such that the pending bit is cleared when a zero is written to the enable bit, then the old driver will continue to work. (I mean that the pending bit is cleared once at the point where the write to the enable register is done, not that the pending bit is forced to zero while ever the enable bit in the register is zero.) |
The old linux driver does indeed keep working after the gateware is switched to edge-irq for sdcard command completion (for a very "tolerant" definition of "working", that is: doing e.g. an md5 on a large file will result in lots of 110 errors and retries on cmd17 & cmd18, but will get the right md5sum in the end :) So if you care about running older kernels on up-to-date gateware, that should be possible (with the above caveat). What's not going to work is using the new kernel driver on gateware that's still using "level" IRQ for sdcard command completion -- no amount of writing to the pending bit for command-done will "shut up" the asserted "level", the only way to do so is to disable the interrupt (i.e., write 0 to the enable register for that bit). I guess it would be possible to treat the command IRQ the "old" way (level) and use edge-mode on all the rest, by applying this final patch on top of my entire series: diff --git a/drivers/mmc/host/litex_mmc.c b/drivers/mmc/host/litex_mmc.c
index baa0f62ddc01..ed829067e90a 100644
--- a/drivers/mmc/host/litex_mmc.c
+++ b/drivers/mmc/host/litex_mmc.c
@@ -137,8 +137,12 @@ static int litex_mmc_send_cmd(struct litex_mmc_host *host,
cmd << 8 | transfer << 5 | response_len);
litex_write8(host->sdcore + LITEX_CORE_CMDSND, 1);
- if (host->irq > 0)
+ if (host->irq > 0) {
+ /* enable CMD_DONE irq *AFTER* setting up command,
+ * which clears the *previous* command's asserted LEVEL
+ */
+ litex_write32(host->sdirq + LITEX_IRQ_ENABLE,
+ SDIRQ_CARD_DETECT | SDIRQ_CMD_DONE |
+ SDIRQ_SD_TO_MEM_DONE | SDIRQ_MEM_TO_SD_DONE);
wait_for_completion(&host->cmd_done);
+ }
ret = litex_mmc_sdcard_wait_done(host->sdcore + LITEX_CORE_CMDEVT, dev);
if (ret) {
@@ -262,6 +266,9 @@ static irqreturn_t litex_mmc_interrupt(int irq, void *arg)
/* Check for command completed */
if (pending & SDIRQ_CMD_DONE) {
handled |= SDIRQ_CMD_DONE;
+ /* Disable (level-triggered) CMD_DONE IRQ to prevent it from firing */
+ litex_write32(host->sdirq + LITEX_IRQ_ENABLE,
+ SDIRQ_CARD_DETECT |
+ SDIRQ_SD_TO_MEM_DONE | SDIRQ_MEM_TO_SD_DONE);
complete(&host->cmd_done);
}
@@ -534,10 +541,10 @@ static int litex_mmc_irq_init(struct platform_device *pdev,
/* Clear & enable interrupts */
litex_write32(host->sdirq + LITEX_IRQ_PENDING,
- SDIRQ_CARD_DETECT | SDIRQ_CMD_DONE |
+ SDIRQ_CARD_DETECT |
SDIRQ_SD_TO_MEM_DONE | SDIRQ_MEM_TO_SD_DONE);
litex_write32(host->sdirq + LITEX_IRQ_ENABLE,
- SDIRQ_CARD_DETECT | SDIRQ_CMD_DONE |
+ SDIRQ_CARD_DETECT |
SDIRQ_SD_TO_MEM_DONE | SDIRQ_MEM_TO_SD_DONE);
init_completion(&host->cmd_done); We absolutely can't enable the command-done IRQ before setting up the command (because the previous command's irq level is still high right up until the new command is launched!!!). So with this patch we'd be bending over backwards to keep the command-done IRQ handling compatible with a self-inflicted earlier design decision... :) I'd much rather see us clean it up and have a nice, elegant, uniform way of handling interrupts... |
Closing w/o merging: this change was motivated by my belief that using DMA completion IRQs in addition to the existing command completion ones would improve performance. However, data transfers are usually complete before the command completes, so IRQ handling and gateware behavior can (and therefore probably should) be kept as-is :) |
The current level-triggered command completion irq stays asserted until a new command is launched, requiring the software driver to disable and re-enable the cmd-done interrupt for each request.
Switching to an edge-triggered cmd-done interrupt will allow for a more elegantly written software driver that can also easily use data/dma completion interrupts.
Updates: e6765f8
Depends on enjoy-digital/litesdcard#31