How to port the SIMTight project to PYNQ-Z2 FPGA board? #16

Honourable-A · 2022-10-19T11:07:22Z

Hi, as you know, the DE10-Pro FPGA is a very costly board. I have a PYNQ-Z2 FPGA board (board details are here: https://www.tulembedded.com/FPGA/ProductsPYNQ-Z2.html). I want to do some research on SIMTight. Please tell me how I can port the project to this low-end board. I am asking for the exact steps because I am not an expert in FPGA design porting.

mn416 · 2022-10-20T08:24:02Z

Hi, unfortunately we don't currently have the resources to port or maintain a port to Xilinx devices. Perhaps the ability to run in simulation is still useful to you? I foresee three issues in porting:

1. Quad-port BRAMs available on Stratix 10 may have like-for-like replacements in modern Xilinx devices but possibly not PYNQ. This is not a synthesis issue (there are pure Verilog versions of these components available for any FPGA) but an efficiency one (the pure Verilog components may map down to registers rather than BRAMs).
2. The DRAM bandwidth on the PYNQ is lower, so one would probably halve the DRAM bus width and the number of vector lanes in Config.h.
3. We use an Intel clock-crossing primitive to put the CPU and SIMT cores in different clock domains, but this isn't really necessary and could simply be removed.
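
As a rough illustration of the Config.h change suggested above, the following Python sketch halves integer `#define` values in a header. The macro names `EXAMPLE_DRAM_BUS_BYTES` and `EXAMPLE_VECTOR_LANES` and their values are placeholders invented for this example, not the real names in SIMTight's Config.h; check the actual header before changing anything. If a parameter is stored as a log2 value, it would need to be decremented by one rather than halved.

```python
import re

def halve_define(header_text, name):
    """Return header_text with the integer value of `#define name N` halved."""
    pattern = re.compile(
        r"^([ \t]*#define[ \t]+" + re.escape(name) + r"[ \t]+)(\d+)\b",
        re.MULTILINE,
    )

    def repl(match):
        value = int(match.group(2))
        # Refuse odd or too-small values so the result stays a whole number.
        if value < 2 or value % 2 != 0:
            raise ValueError(f"{name}={value} cannot be halved cleanly")
        return match.group(1) + str(value // 2)

    new_text, count = pattern.subn(repl, header_text)
    if count != 1:
        raise KeyError(f"expected exactly one definition of {name}, found {count}")
    return new_text

# Placeholder header: these macro names and values are hypothetical.
EXAMPLE_HEADER = """\
#define EXAMPLE_DRAM_BUS_BYTES 64
#define EXAMPLE_VECTOR_LANES 32
"""

ported = halve_define(EXAMPLE_HEADER, "EXAMPLE_DRAM_BUS_BYTES")
ported = halve_define(ported, "EXAMPLE_VECTOR_LANES")
print(ported)
```

The helper raises an error instead of guessing when a macro is missing or defined more than once, so a mistyped name cannot silently leave the configuration unchanged.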

Honourable-A · 2022-10-20T09:56:24Z

Hi, thanks for your suggestions. Since you mentioned simulation, I have a question about it. I understand that there is a SIMTight simulator, but it probably simulates only the CPU and not the SIMT cores. Please correct me if my understanding is inaccurate.

mn416 · 2022-10-20T10:30:18Z

It simulates the entire SoC including CPU, SIMT core, memory subsystem, UART and DRAM.

The drawback is that simulation is (of course) slow compared to FPGA. Therefore, in simulation the benchmarks are run only for small data-set sizes. This can lead to underloading of the system, and a dip in IPC. So it may be desirable to increase the data-set sizes slightly in simulation until the point where the benchmarks are performing at an IPC level close to the following level obtained from FPGA:

Samples/VecAdd (build): ok
Samples/VecAdd (run): ok [IPC=29.26,Instrs=9126880,Cycles=311871,DRAMAccs=189100,Retries=23227,Susps=0]
Samples/Histogram (build): ok
Samples/Histogram (run): ok [IPC=31.14,Instrs=7153216,Cycles=229718,DRAMAccs=32994,Retries=4702,Susps=0]
Samples/Reduce (build): ok
Samples/Reduce (run): ok [IPC=31.56,Instrs=6358334,Cycles=201496,DRAMAccs=64101,Retries=733,Susps=0]
Samples/Scan (build): ok
Samples/Scan (run): ok [IPC=30.33,Instrs=222357876,Cycles=7330080,DRAMAccs=162304,Retries=45776,Susps=0]
Samples/Transpose (build): ok
Samples/Transpose (run): ok [IPC=31.28,Instrs=5648320,Cycles=180567,DRAMAccs=50240,Retries=2481,Susps=0]
Samples/MatVecMul (build): ok
Samples/MatVecMul (run): ok [IPC=28.88,Instrs=10864608,Cycles=376171,DRAMAccs=139968,Retries=5969,Susps=0]
Samples/MatMul (build): ok
Samples/MatMul (run): ok [IPC=31.40,Instrs=144054240,Cycles=4588073,DRAMAccs=89472,Retries=82750,Susps=0]
InHouse/BlockedStencil (build): ok
InHouse/BlockedStencil (run): ok [IPC=27.01,Instrs=48971680,Cycles=1812934,DRAMAccs=212416,Retries=10757,Susps=0]
InHouse/StripedStencil (build): ok
InHouse/StripedStencil (run): ok [IPC=31.45,Instrs=35541920,Cycles=1129937,DRAMAccs=175360,Retries=2345,Susps=0]
InHouse/VecGCD (build): ok
InHouse/VecGCD (run): ok [IPC=4.23,Instrs=10955517,Cycles=2591078,DRAMAccs=20350,Retries=892,Susps=0]
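
To compare a simulation run against the FPGA figures above, one could parse the `(run)` lines and check how close the IPC is. The sketch below assumes the log format shown in this comment; the reported IPC field is consistent with Instrs divided by Cycles. The simulated counters and the 10% tolerance are made-up values for illustration only.

```python
import re

# Matches lines such as "Samples/VecAdd (run): ok [IPC=...,Instrs=...,...]".
RUN_LINE = re.compile(r"^(?P<name>\S+) \(run\): ok \[(?P<stats>[^\]]*)\]$")

def parse_run_line(line):
    """Return (benchmark_name, stats_dict), or None if this is not a run line."""
    match = RUN_LINE.match(line.strip())
    if match is None:
        return None
    stats = {}
    for field in match.group("stats").split(","):
        key, value = field.split("=")
        stats[key] = float(value)
    return match.group("name"), stats

def ipc(stats):
    """Instructions per cycle, recomputed from the raw counters."""
    return stats["Instrs"] / stats["Cycles"]

def close_to_reference(sim_stats, fpga_stats, tolerance=0.10):
    """True when the simulated IPC is within `tolerance` of the FPGA IPC."""
    return abs(ipc(sim_stats) - ipc(fpga_stats)) <= tolerance * ipc(fpga_stats)

FPGA_LINE = ("Samples/VecAdd (run): ok [IPC=29.26,Instrs=9126880,"
             "Cycles=311871,DRAMAccs=189100,Retries=23227,Susps=0]")
name, fpga = parse_run_line(FPGA_LINE)

# Hypothetical counters from a small simulated data set (IPC of 20).
sim = {"Instrs": 120000.0, "Cycles": 6000.0}

print(name, round(ipc(fpga), 2), close_to_reference(sim, fpga))
```

If the check fails, the data-set size in simulation could be increased and the benchmark rerun until the IPC approaches the FPGA level.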

Honourable-A · 2022-10-20T11:51:15Z

Thanks again for the clarification. Is there any document or user guide to use this simulator? My intention is to develop an OS for SIMTight but I am not sure if I can use this simulator or how I can use it.

mn416 · 2022-10-20T13:53:35Z

These are the only docs at the moment:

The first one does explain how to use the simulator. The second one discusses software interfaces.

Honourable-A · 2022-11-17T16:26:35Z

Can you please tell me what the Mailbox and the ITCM are in the SoC diagram? Also, please tell me how the CPU and SIMT core are connected. Is it possible to run applications on the CPU and the SIMT core at the same time? There is a UART (USB) connection to the CPU. What is the purpose of this connection? Thanks.

mn416 · 2022-11-20T04:35:34Z

We hope to improve SIMTight's documentation over the next year. Hopefully, I will be able to address such questions as part of that process.

Honourable-A · 2022-11-21T10:56:29Z

Thanks for your answer. I have another question, about scalarisation. How do you implement dynamic scalarisation in hardware? Do you detect it in the simple (host) core or in the SIMT core? Is there any existing literature on dynamic scalarisation that you can direct me to? As per the description, the entire warp is executed on a single execution unit in a single cycle because of scalarisation. What does a single execution unit mean here? Is it a single hardware thread inside a block? Also, according to the description, it operates in parallel with the main vector pipeline. Please tell me how this is done, because currently only synchronous kernel invocation is available. That means the host can run only one kernel and waits until it finishes, so a scalar-optimised kernel must finish before any other kernel can run. Also, what is the main vector pipeline? I know that I have asked too many questions, but it would be very helpful if you could kindly shed some light on them.

mn416 · 2022-11-21T11:35:50Z

Again, I'll try to address these questions in the upcoming documentation process. Briefly:

Regarding existing work on scalarisation, there is lots. To mention two: this GPGPU architecture book and this ISCA'13 paper.
SIMTight's SIMT core contains a scalar pipeline and a vector pipeline, both independent of the host CPU which is not part of the SIMT core in any way.
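
For readers new to the term, the general idea behind dynamic scalarisation can be sketched in software as follows. This is a generic illustration of the concept from the literature, not a description of SIMTight's hardware: when every lane of a warp-wide operand holds the same value, the operation can be computed once instead of once per lane. The function names are made up for this example.

```python
def is_uniform(lanes):
    """True when every lane of a warp-wide operand holds the same value."""
    return all(value == lanes[0] for value in lanes)

def warp_add(a, b):
    """Add two warp-wide operands and report which path was taken.

    Returns (result_lanes, path), where path is "scalar" when both
    operands are uniform and a single addition suffices, else "vector".
    """
    if is_uniform(a) and is_uniform(b):
        scalar = a[0] + b[0]              # one addition for the whole warp
        return [scalar] * len(a), "scalar"
    return [x + y for x, y in zip(a, b)], "vector"

# Uniform operands (e.g. a loop bound shared by all threads).
print(warp_add([3, 3, 3, 3], [5, 5, 5, 5]))
# Non-uniform operand (e.g. per-thread indices).
print(warp_add([0, 1, 2, 3], [5, 5, 5, 5]))
```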

mn416 mentioned this issue Jul 6, 2023

SIMTight files for Vivado #24

Open
