Deep Inference In FPGA

Project Description

A deep neural network inference accelerator is implemented in hardware. The hardware code is written in SystemVerilog. The hardware module is interfaced with a NIOS computer system, so it acts as a peripheral to that system. The driver code that interfaces with the hardware is written in C and can be found in the sources_NIOS folder.

A simple neural network is also implemented in software, in C. This implementation is in the same C file as the driver code.

Usage

Compile the SystemVerilog code in Intel Quartus or any other tool. (This step is optional; skip it if you do not need to check for errors.)
Create a custom peripheral from the SystemVerilog code provided.
Add the peripheral to the computer system of your choice. The top level assumes a NIOS computer system with an Avalon interface. However, the same SystemVerilog code can be used to interface with ARM, since the compiler inserts the additional buses during compilation.
Use any suitable tool to program the computer system with the C file provided, and watch the output.
Compare the results and performance of the hardware and software implementations using the output shown on the terminal.
For a custom implementation, train your network and generate the weights and biases, then convert them to fixed-point representation. Using this implementation as a reference, update the corresponding locations in the SystemVerilog code.
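
The fixed-point conversion in the last step can be sketched as follows. This is a minimal illustration, not the author's unreleased script: it assumes the 9-bit signed format with 4 fractional bits described under Implementation Details, and the function names, the rounding mode, the saturation behavior, and the sample weights are all assumptions made for the example.

```python
FRAC_BITS = 4    # fractional bits (from Implementation Details)
TOTAL_BITS = 9   # total signed word width (from Implementation Details)

def to_fixed(value, frac_bits=FRAC_BITS, total_bits=TOTAL_BITS):
    """Quantize a float to a signed fixed-point integer.

    Values outside the representable range are saturated (an assumption;
    the original design may wrap or handle overflow differently).
    """
    scaled = round(value * (1 << frac_bits))
    lowest = -(1 << (total_bits - 1))        # -256 for 9 bits
    highest = (1 << (total_bits - 1)) - 1    # 255 for 9 bits
    return max(lowest, min(highest, scaled))

def to_twos_complement(fixed, total_bits=TOTAL_BITS):
    """Return the two's-complement bit string of a signed fixed-point integer."""
    return format(fixed & ((1 << total_bits) - 1), "0{}b".format(total_bits))

def from_fixed(fixed, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to the float it represents."""
    return fixed / (1 << frac_bits)

# Hypothetical trained weights, used only to exercise the functions.
for w in [0.73, -1.25, 20.0]:
    q = to_fixed(w)
    print(w, q, to_twos_complement(q), from_fixed(q))
```

With this format the resolution is 1/16 and the representable range is -16.0 to 15.9375, so 0.73 is stored as 0.75 and 20.0 saturates at 15.9375. The bit strings would still need to be written into the SystemVerilog sources in whatever literal form those files use.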

Problem Definition

The implementation solves a classification problem based on a complex mathematical function. (This function is, however, very simple compared to the real-world problems we face today; it serves only as an illustration.)

Two mathematical functions are used: y1 = sin(x^2+2*x+3) and y2 = cos(x^2+2*x+3). The domain of x for y1 is x1=[-7.00:7.00:0.01] and for y2 is x2=[-7.01:6.99:0.01], written as start:stop:step. The individual functions look as follows:

[Figures: y1 plotted against x1, and y2 plotted against x2]

When the two functions are added, the sum plotted against x1 looks as follows:

[Figure: y1 + y2 plotted against x1]

Taking a threshold of 0.0 on the sum turns this into a classification problem with the following labels:

Class Label 1 if f(x1,x2) = y1+y2 > 0.00
Class Label 0 if f(x1,x2) = y1+y2 <= 0.00
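
The labelling rule above can be reproduced with a short script. This is a sketch under assumptions: it pairs the i-th point of x1 with the i-th point of x2 and treats (x1, x2) as the network input, which the notation f(x1,x2) suggests but the text does not state outright. The names `poly` and `make_dataset` are made up for the example.

```python
import math

def poly(x):
    """Argument shared by both functions: x^2 + 2*x + 3."""
    return x * x + 2 * x + 3

def make_dataset(n=1401, step=0.01):
    """Build (x1, x2, label) samples over the two domains given in the text.

    x1 runs from -7.00 to 7.00 and x2 from -7.01 to 6.99, both in steps
    of 0.01, which gives 1401 points each.
    """
    samples = []
    for i in range(n):
        x1 = -7.00 + i * step
        x2 = -7.01 + i * step
        y1 = math.sin(poly(x1))
        y2 = math.cos(poly(x2))
        label = 1 if (y1 + y2) > 0.0 else 0   # threshold at 0.0
        samples.append((x1, x2, label))
    return samples

data = make_dataset()
ones = sum(label for _, _, label in data)
print(len(data), "samples,", ones, "in class 1,", len(data) - ones, "in class 0")
```

The resulting list can be split into inputs and labels and passed to any classifier for training.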

Implementation Details

ReLU is used as the activation function because it is simple.
Training uses floating-point weights and inputs and is done in scikit-learn using MLPClassifier.
For the hardware implementation, the weights and biases are converted to fixed-point representation. A 9-bit representation with 4 fractional bits is used because the DE2-115 platform has 9-bit multiplier blocks.
Python scripts are used to automatically generate the code for a single layer and for the full neural network. These scripts are not released; please let me know if you need them and I will share them personally.
The software implementation of the NN on NIOS uses floating-point representation.
The hardware implementation is fully combinational, which might not be optimal for resource usage. The whole implementation boils down to a multiplication routine, and a sequential multiplication routine is provided as well. Please feel free to modify the code to use the sequential multiplication routine so that resource utilization is optimized.
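
To make the arithmetic concrete, the sketch below models a single fixed-point neuron with ReLU, using a shift-and-add loop in place of a one-shot multiply. It is a behavioral illustration in Python, not the repository's SystemVerilog: the function names, the sign-magnitude handling, the accumulator width, and the bias alignment are assumptions, and the actual sequential routine may be organized differently.

```python
FRAC_BITS = 4   # fractional bits of inputs, weights and biases

def shift_add_multiply(a, b, width=9):
    """Multiply two signed integers one multiplier bit per step.

    Each loop iteration stands in for one clock cycle of a sequential
    multiplier: if the current multiplier bit is set, the shifted
    multiplicand is added to the running product.
    """
    negative = (a < 0) != (b < 0)
    multiplicand, multiplier = abs(a), abs(b)
    product = 0
    for step in range(width):
        if (multiplier >> step) & 1:
            product += multiplicand << step
    return -product if negative else product

def relu(x):
    """ReLU activation: negative values become zero."""
    return x if x > 0 else 0

def neuron(inputs, weights, bias):
    """Fixed-point neuron; all arguments are integers scaled by 2**FRAC_BITS."""
    acc = bias << FRAC_BITS                  # align bias with the products
    for x, w in zip(inputs, weights):
        acc += shift_add_multiply(x, w)      # each product has 2*FRAC_BITS fractional bits
    return relu(acc >> FRAC_BITS)            # rescale to FRAC_BITS fractional bits

# 1.5*0.5 + (-2.0)*0.25 + 1.0 in fixed point (values scaled by 16)
out = neuron([24, -32], [8, 4], 16)
print(out, out / (1 << FRAC_BITS))
```

A combinational design computes every product in parallel with dedicated multipliers, while a sequential one reuses a single adder over several cycles, which is the trade between speed and resource usage mentioned above.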

Results and Comparison

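The following shows the result obtained by running the implementation:

[Figure: terminal output of the hardware and software runs]

The hardware implementation achieves a speedup of 400 times over the software implementation.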
