
Tests hang: possible race condition #285

Open
kammoh opened this issue Mar 30, 2021 · 4 comments
kammoh (Contributor) commented Mar 30, 2021

I'm seeing this happen with both the Treadle and Verilator backends: when running long tests with ~10000 I/O elements, the test usually hangs (forever).
With ~100 I/O elements, the tests usually conclude without issue.

My test harness looks like this:

        fork {
          dut.io.in.enqueueSeq(inputs)
        }.fork {
          dut.io.cmd.enqueueSeq(instructions)
        }.fork {
          dut.io.out.expectDequeueSeq(expected)
        }.join()

I tried some printf debugging, and it seems the tests hang at a clock .step(1) call in DecoupledDriver/Monitor. I also looked at the threads in a JVM debugger, and they seem to deadlock at either a step() or joinAndStep() call.
On the Treadle backend, I see threads locked with a call stack something like this:

Unsafe.park(boolean,long)[native method] (myproject/.metals/readonly/java.base/jdk/internal/misc/Unsafe.java:-1)
LockSupport.park(Object) (myproject/.metals/readonly/java.base/java/util/concurrent/locks/LockSupport.java:211)
AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer$Node,int,boolean,boolean,boolean,long) (myproject/.metals/readonly/java.base/java/util/concurrent/locks/AbstractQueuedSynchronizer.java:714)
AbstractQueuedSynchronizer.acquireSharedInterruptibly(int) (myproject/.metals/readonly/java.base/java/util/concurrent/locks/AbstractQueuedSynchronizer.java:1046)
Semaphore.acquire() (myproject/.metals/readonly/java.base/java/util/concurrent/Semaphore.java:318)
ThreadedBackend.doJoin(Seq,Option) (myproject/.metals/readonly/chiseltest/internal/ThreadedBackend.scala:680)
ThreadedBackend.doJoin$(ThreadedBackend,Seq,Option) (myproject/.metals/readonly/chiseltest/internal/ThreadedBackend.scala:658)
TreadleBackend.doJoin(Seq,Option) (myproject/.metals/readonly/chiseltest/backends/treadle/TreadleBackend.scala:15)
TesterThreadList.joinAndStep(Clock) (myproject/.metals/readonly/chiseltest/internal/BackendInterface.scala:21)
DecoupledDriver.$anonfun$expectDequeue$1(DecoupledDriver,Data) (myproject/.metals/readonly/chiseltest/DecoupledDriver.scala:96)
DecoupledDriver$$Lambda$2931/0x00000008012f6520.apply$mcV$sp() (Unknown Source:-1)
TreadleBackend.doTimescope(Function0) (myproject/.metals/readonly/chiseltest/backends/treadle/TreadleBackend.scala:112)
package$.timescope(Function0) (myproject/.metals/readonly/chiseltest/package.scala:290)
DecoupledDriver.expectDequeue(Data) (myproject/.metals/readonly/chiseltest/DecoupledDriver.scala:87)
DecoupledDriver.$anonfun$expectDequeueSeq$2(DecoupledDriver,Data) (myproject/.metals/readonly/chiseltest/DecoupledDriver.scala:112)
DecoupledDriver.$anonfun$expectDequeueSeq$2$adapted(DecoupledDriver,Data) (myproject/.metals/readonly/chiseltest/DecoupledDriver.scala:111)
DecoupledDriver$$Lambda$2930/0x00000008012f6140.apply(Object) (Unknown Source:-1)
Stream.foreach(Function1) (myproject/.metals/readonly/scala/collection/immutable/Stream.scala:533)
DecoupledDriver.$anonfun$expectDequeueSeq$1(DecoupledDriver,Seq) (myproject/.metals/readonly/chiseltest/DecoupledDriver.scala:111)
DecoupledDriver$$Lambda$2929/0x00000008012f5e50.apply$mcV$sp() (Unknown Source:-1)

The behaviour is quite non-deterministic, which I guess makes the case for a race condition stronger.
I've not been able to come up with a "Minimum Working Example", and unfortunately the code has not been made public yet, but I'm pretty sure this is an issue with chiseltest and not related to my particular tests.

EDIT:

Minimum Working Example

The following test hangs for a large number of inputs:

import chisel3._
import chisel3.util._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

class Mwe extends Module {
  val io = IO(new Bundle {
    val in  = Flipped(Decoupled(UInt(16.W)))
    val out = Decoupled(UInt(16.W))
  })
  io.out <> Queue(io.in)
}

class MweSpec extends AnyFlatSpec with ChiselScalatestTester {

  behavior.of("MWE Bug")

  val inputs = Seq.tabulate(8000) { i => i.U }

  it should "conclude" in {
    test(new Mwe) { dut =>
      dut.io.in.initSource().setSourceClock(dut.clock)
      dut.io.out.initSink().setSinkClock(dut.clock)

      fork {
        dut.io.in.enqueueSeq(inputs)
      }.fork {
        dut.io.out.expectDequeueSeq(inputs)
      }.join()
    }
  }
}

https://scastie.scala-lang.org/kammoh/7kDyWnXaSOGcYcT9Qgnf4A/2

kammoh (Contributor, Author) commented Mar 30, 2021

This happens on both the "0.3.2" release and the master branch.

kammoh (Contributor, Author) commented Mar 31, 2021

I was actually able to observe it in the following simple MWE. If the number of inputs is small (<1000), it will most probably be fine. With a larger number of inputs (>8000), it always hangs on my machine.

import chisel3._
import chisel3.util._
import chiseltest._
// On chiseltest 0.3.x, .withAnnotations comes from the experimental TestOptionBuilder,
// and the backend annotations below live in chiseltest.internal
import chiseltest.experimental.TestOptionBuilder._
import chiseltest.internal.{VerilatorBackendAnnotation, WriteVcdAnnotation}
import org.scalatest.flatspec.AnyFlatSpec

class Mwe extends Module {
  val io = IO(new Bundle {
    val in  = Flipped(Decoupled(UInt(16.W)))
    val out = Decoupled(UInt(16.W))
  })
  io.out <> Queue(io.in)
}

class MweSpec extends AnyFlatSpec with ChiselScalatestTester {

  behavior.of("MWE Bug")

  val annos = Seq(
    // VerilatorBackendAnnotation,
    // WriteVcdAnnotation,
  )

  val inputs = Seq.tabulate(8000) { i => i.U }

  it should "conclude" in {
    test(new Mwe).withAnnotations(annos) { dut =>
      dut.io.in.initSource().setSourceClock(dut.clock)
      dut.io.out.initSink().setSinkClock(dut.clock)

      fork {
        dut.io.in.enqueueSeq(inputs)
      }.fork {
        dut.io.out.expectDequeueSeq(inputs)
      }.join()
    }
  }
}

kammoh (Contributor, Author) commented Mar 31, 2021

It also seems somewhat reproducible on Scastie: https://scastie.scala-lang.org/kammoh/7kDyWnXaSOGcYcT9Qgnf4A/2
With the number of inputs <= 7000, the test passes with no issue:

Playground$MweSpec:
MWE Bug
Elaborating design...
Done elaborating.
Starting Test
Test Finished!
test Mwe Success: 0 tests passed in 7003 cycles in 16.966041 seconds 412.77 Hz
- should conclude

Increasing to >=8000, the test sometimes concludes (in under 20 seconds total) but sometimes times out with:

timed out after 30 seconds when running code

With >=18000 inputs, it mostly times out.

ducky64 (Member) commented Mar 31, 2021

Strange. I don't know what's going on, but:

  • As architected, chiseltest shouldn't actually be running anything in parallel. Chiseltest spawns threads only to track execution context, but threads are scheduled to run one at a time.
  • enqueueSeq and expectDequeueSeq should both be fully serial - that is, each should not create more than one sub-thread at a time, and each blocks on the completion of its sub-thread before spawning another (see the sketch after this list).
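
For concreteness, a hedged sketch of what that serial behavior means in terms of the public API, reusing dut and inputs from the MWE above (behaviorally equivalent user-level code, not the library's implementation):

fork {
  inputs.foreach { elt => dut.io.in.enqueue(elt) }        // what enqueueSeq(inputs) amounts to
}.fork {
  inputs.foreach { elt => dut.io.out.expectDequeue(elt) } // what expectDequeueSeq(inputs) amounts to
}.join()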

If you wanted to dig further:

  • It's normal for threads to be blocked on a step or joinAndStep call - those are effectively scheduler invocations. But a scheduler invocation should mean that another thread is unblocked.
  • Do you know where inside step or joinAndStep the threads are blocked?
  • Are there any exceptions thrown from any of the threads?

The one thing that might be a concern is that the scheduler releases the next thread's semaphore before acquiring its own, and this isn't an atomic operation. It might be possible that in between, all the other threads run, and the original thread's semaphore is released (by another thread) before it is acquired, causing deadlock. The sketch below marks that window.
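
A minimal sketch of the handoff pattern described above (hypothetical names, not chiseltest's actual internals), with the non-atomic window marked:

import java.util.concurrent.Semaphore

object HandoffSketch {
  // Each simulated thread parks on its own semaphore; exactly one permit
  // should be in flight at a time, so only one thread ever runs.
  final class SimThread(val name: String) {
    val semaphore = new Semaphore(0) // zero permits: the thread starts parked
  }

  def switchTo(current: SimThread, next: SimThread): Unit = {
    next.semaphore.release()    // wake the next thread...
    // <-- window: the release above and the acquire below are two separate
    //     operations, so other threads can run here before `current` parks
    current.semaphore.acquire() // ...then park until someone wakes us again
  }
}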

That being said, I'm not sure what the recommended way to do an atomic thread switch is, since this generally isn't what synchronization structures are designed to do. A mildly hacky solution might be to check the currently active thread (held in a synchronized variable) before acquiring its own semaphore - thinking of the semaphore as more of an optimization than the primary way to schedule threads.
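
One hypothetical way to spell that idea out, extending the SimThread sketch above (an illustration, not a proposed patch): make a synchronized activeThread variable the source of truth, and treat a semaphore permit as a wake-up hint that must be re-checked.

import java.util.concurrent.Semaphore

object CheckedHandoffSketch {
  final class SimThread(val name: String) {
    val semaphore = new Semaphore(0)
  }

  private val lock = new Object
  private var activeThread: SimThread = null // guarded by lock

  def switchTo(current: SimThread, next: SimThread): Unit = {
    lock.synchronized { activeThread = next } // publish who should run next
    next.semaphore.release()
    // The permit alone is not proof of being scheduled: after each wake-up,
    // re-check the synchronized variable and park again if woken out of turn.
    do {
      current.semaphore.acquire()
    } while (lock.synchronized { activeThread ne current })
  }
}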
