Tests hang: possible race condition #285
Comments
This happens on both the "0.3.2" release and the master branch.
I was actually able to observe it in the following simple MWE. With a small number of inputs (<1000) it will most probably be fine. With a larger number of inputs (>8000) it always hangs on my machine.

    // Imports assumed for chiseltest 0.3.x; they were not part of the original post.
    import chisel3._
    import chisel3.util._
    import chiseltest._
    import chiseltest.experimental.TestOptionBuilder._
    import org.scalatest._

    // A single Queue between a decoupled input and a decoupled output.
    class Mwe extends Module {
      val io = IO(new Bundle {
        val in = Flipped(Decoupled(UInt(16.W)))
        val out = Decoupled(UInt(16.W))
      })
      io.out <> Queue(io.in)
    }

    class MweSpec extends FlatSpec with ChiselScalatestTester {
      behavior.of("MWE Bug")

      val annos = Seq(
        // VerilatorBackendAnnotation,
        // WriteVcdAnnotation,
      )

      val inputs = Seq.tabulate(8000) { i => i.U }

      it should "conclude" in {
        test(new Mwe).withAnnotations(annos) { dut =>
          dut.io.in.initSource().setSourceClock(dut.clock)
          dut.io.out.initSink().setSinkClock(dut.clock)
          // Drive the input and check the output from two concurrent testbench threads.
          fork {
            dut.io.in.enqueueSeq(inputs)
          }.fork {
            dut.io.out.expectDequeueSeq(inputs)
          }.join()
        }
      }
    }
I guess it is also somewhat reproducible on Scastie: https://scastie.scala-lang.org/kammoh/7kDyWnXaSOGcYcT9Qgnf4A/2
With 8000 or more inputs it sometimes concludes (total time < 20 seconds) and sometimes times out.
With 18000 or more inputs it mostly times out.
Strange. I don't know what's going on but:
If you wanted to dig further:
The one thing that might be a concern is that the scheduler releases the next thread's semaphore before acquiring its own, and this isn't an atomic operation. It might be possible that, in between, all the other threads run and the original thread's semaphore is released (by another thread) before it is acquired, causing deadlock. That being said, I'm not sure what the recommended way to do an atomic thread switch is, since this generally isn't what synchronization structures are designed to do. A mildly hacky solution might be to check the currently active thread (held in a synchronized variable) before acquiring its own semaphore, so that the semaphore is more of an optimization than the primary way to schedule threads.
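To make the "currently active thread as a synchronized variable" idea concrete, here is a minimal sketch of a hand-off where the active thread id, not a semaphore, is the source of truth. It is not chiseltest's scheduler code: the names `Baton`, `awaitTurn`, `passTo` and `BatonDemo` are hypothetical, and it uses a plain JVM monitor (`wait`/`notifyAll`) instead of the semaphores mentioned above.

```scala
// Hypothetical sketch, not taken from chiseltest.
// The id of the active thread is guarded by this object's monitor.
final class Baton(initial: Int) {
  private var active: Int = initial

  // Block until `id` is the active thread. The loop re-checks the
  // condition, so a notification that arrives "early" is never lost.
  def awaitTurn(id: Int): Unit = synchronized {
    while (active != id) wait()
  }

  // Hand control to `next` and wake all waiters so they re-check.
  def passTo(next: Int): Unit = synchronized {
    active = next
    notifyAll()
  }
}

object BatonDemo {
  // Run `threads` workers round-robin for `rounds` turns each and
  // return the order in which they executed.
  def run(threads: Int, rounds: Int): Vector[Int] = {
    val baton = new Baton(0)
    // Only touched by whichever thread currently holds the baton.
    val log = Vector.newBuilder[Int]
    val workers = (0 until threads).map { id =>
      new Thread(new Runnable {
        def run(): Unit = {
          for (_ <- 0 until rounds) {
            baton.awaitTurn(id)
            log += id
            baton.passTo((id + 1) % threads)
          }
        }
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    log.result()
  }
}
```

Because the waiting condition is the shared variable itself, it does not matter whether the hand-off happens before or after the waiting thread starts to wait. A semaphore could still be layered on top as a wake-up optimization, as suggested above.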
I'm seeing this happen with both the Treadle and Verilator backends: when running long tests with ~10000 I/O elements, the test usually hangs (forever).
When running with ~100 I/O elements, the tests usually conclude with no issue.
My test harness looks like this:
I tried some printf debugging, and the tests seem to hang at a clock.step(1) call in DecoupledDriver/Monitor. I also looked at the threads in a JVM debugger, and they seem to deadlock at either a step() or a stepAndJoin() call.
On the Treadle backend, I see threads locked with a call stack something like this:
The behaviour is quite non-deterministic, which I guess makes the case for a race condition stronger.
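For anyone trying to capture the same information without attaching a debugger, a small helper along these lines can print every JVM thread's state and stack from inside the test process. This is only a sketch: `ThreadDump` and `render` are hypothetical names, and it relies solely on the standard `Thread.getAllStackTraces` call, not on any chiseltest API.

```scala
import scala.collection.JavaConverters._

// Hypothetical helper: renders the state and stack of all live JVM threads.
object ThreadDump {
  def render(): String =
    Thread.getAllStackTraces.asScala.toSeq
      .sortBy(_._1.getName) // stable ordering by thread name
      .map { case (thread, frames) =>
        val header = s"${thread.getName} [${thread.getState}]"
        (header +: frames.toSeq.map(f => s"    at $f")).mkString("\n")
      }
      .mkString("\n\n")
}
```

Calling `println(ThreadDump.render())` from a watchdog thread after a timeout would show which testbench threads are in the WAITING state and where they are blocked.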
I have not been able to come up with a "Minimum Working Example", and unfortunately the code has not been made public yet, but I'm pretty sure this is an issue with chiselTest and not related to my particular tests.
EDIT:
Minimum Working Example
The following test hangs for large number of inputs:
https://scastie.scala-lang.org/kammoh/7kDyWnXaSOGcYcT9Qgnf4A/2