This repository contains optimized code to perform inference of FANN-trained neural networks on microcontrollers. Currently supported platforms are ARM Cortex M-series and Parallel Ultra-Low Power Platforms (PULP).
This repository contains optimized code to perform inference of FANN-trained neural networks on the ARM Cortex M-series platform.
Given a data file and pre-trained network in FANN's format, all necessary files to run and test the network on the microcontroller are generated.
If this code is helpful for your research, please cite:
X. Wang, M. Magno, L. Cavigelli, L. Benini, "FANN-on-MCU: An Open-Source Toolkit for Energy-Efficient Neural Network Inference at the Edge of the Internet of Things", arXiv:1911.03314 [cs.LG], Nov. 2019
You should have data and a pre-trained network in the FANN format. The generated code uses optimized functions provided by CMSIS-DSP. To run the script, Python needs to be installed. This code has been tested with TI's MSP432 platform and ST's STM32L475VG.
First, you need to export your data in the FANN default format and train a neural network with FANN. How to do this is explained here.
You should end up with two files, a .data file and a .net file.
An example can be found in the sample-data folder.
To optimize memory access, the code generation script takes into account the RAM and Flash memory available in the selected microcontroller and stores the parameters of the trained model in the memory level closest to the processor that is still large enough to hold the model. To enable this, describe the memory configuration of your microcontroller in a file like mem_config.json and pass it to the code generation script.
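The placement rule described above can be sketched as follows. This is a minimal illustration and not the logic of generate.py itself: the mem_level_t struct, the function name, and the level names are hypothetical, and the actual layout of mem_config.json may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one memory level; NOT the mem_config.json schema. */
typedef struct {
    const char *name;   /* e.g. "RAM" or "Flash" */
    size_t size_bytes;  /* capacity available for the model parameters */
} mem_level_t;

/*
 * Return the index of the first level that can hold the model.
 * `levels` must be ordered from closest to the processor to farthest,
 * so the first fit is also the closest fit.
 * Returns -1 if the model fits in no level.
 */
int pick_mem_level(const mem_level_t *levels, size_t n_levels, size_t model_bytes)
{
    assert(levels != NULL);
    for (size_t i = 0; i < n_levels; ++i) {
        if (model_bytes <= levels[i].size_bytes) {
            return (int)i;
        }
    }
    return -1;
}
```

With two levels ordered as RAM then Flash, a model that fits in RAM yields index 0, a larger one falls through to index 1, and a model too large for both yields -1.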
Finally, you can use the generate.py script to generate the files to run the inference on the microcontroller, for example on ARM using fixed point:
python generate.py -i sample-data/myNetwork -m fixed -p arm --mem_config mem_config.json
For more details on how to use generate.py:
python generate.py -h
Now all the *.h and *.c files in the root folder (fann.h and fann_struct.h) and in the output folder can be copied to your project.
They include all the data and code to run the network.
To call it from your code, just include fann.h and call
fann_type *fann_run(fann_type * input);
where fann_type is int if you started with a fixed-point model and float otherwise. Don't forget to include the files in your build scripts/makefile/project.
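The call pattern can be sketched as below. So that the sketch compiles on a host without the generated files, it contains a stand-in for fann_run that merely echoes its first two inputs; in a real project that section is replaced by including fann.h. The classify helper, the N_OUT constant, and the choice of reading the result as the index of the largest output are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Stand-in section: in a real project use #include "fann.h" instead ---- */
typedef float fann_type;   /* would be int for a fixed-point model */
#define N_OUT 2            /* hypothetical number of network outputs */

/* Placeholder for the generated fann_run; it only echoes the first two inputs. */
fann_type *fann_run(fann_type *input)
{
    static fann_type output[N_OUT];
    output[0] = input[0];
    output[1] = input[1];
    return output;
}
/* ---- End of stand-in section ---- */

/* Run the network on one sample and return the index of the largest output. */
size_t classify(fann_type *input)
{
    assert(input != NULL);
    fann_type *out = fann_run(input);
    size_t best = 0;
    for (size_t i = 1; i < N_OUT; ++i) {
        if (out[i] > out[best]) {
            best = i;
        }
    }
    return best;
}
```

The stand-in also shows the one detail of the interface that the signature implies: fann_run takes a pointer to the input array and returns a pointer to the output array.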
The folder stm32l475-onDeviceTest-linux contains a demo project running test and benchmarking code on an STM32L475 discovery board.
This repository contains optimized code to perform inference of FANN-trained neural networks on PULP platforms.
Given a data file and pre-trained network in FANN's format, all necessary files to run and test the network on the microcontroller are generated.
You should have data and a pre-trained network in the FANN format. To run the script, Python needs to be installed. To use the PULP platform, the PULP SDK needs to be installed; you can find instructions here. The generated code uses optimized functions provided by PULP-DSP. Please follow the instructions on PULP-DSP to install the library. This code has been tested with PULP Mr. Wolf.
First, you need to export your data in the FANN default format and train a neural network with FANN. How to do this is explained here.
You should end up with two files, a .data file and a .net file.
An example can be found in the sample-data folder.
To optimize memory access, the code generation script takes into account the RAM and Flash memory available in the selected microcontroller and stores the parameters of the trained model in the memory level closest to the processor that is still large enough to hold the model. To enable this, describe the memory configuration of your microcontroller in a file like mem_config.json and pass it to the code generation script.
Finally, you can use the generate.py script to generate the files to run on the microcontroller, for example on PULP using fixed point (currently only fixed point is supported on PULP):
python generate.py -i sample-data/myNetwork -m fixed -p pulp --mem_config mem_config.json
For more details on how to use generate.py:
python generate.py -h
Now all the *.h and *.c files in the root folder (fann.h and fann_struct.h) and in the output folder can be copied to your project.
They include all the data and code to run the network.
To call it from your code, just include fann.h and call
fann_type *fann_run(fann_type * input);
where fann_type is int if you started with a fixed-point model and float otherwise.
The folder MrWolf-onBoardTest contains a demo project running test and benchmarking code on a PULP Mr. Wolf board. To run the demo, you need to install and configure the PULP SDK (instructions here). Remember to source sourceme.sh every time you open a new terminal to use the PULP SDK.
After installing the PULP SDK, run generate.py, copy the generated *.h and *.c files to the MrWolf-onBoardTest folder, and run
make clean all run
FANN makes it easy to train your model and export it in fixed-point format.
After training with fann_train_on_data and, optionally, saving the floating-point model with fann_save, just run
decimal_point = fann_save_to_fixed(ann, "myNetwork_fixed.net");
You can also convert your training or test data to fixed-point representation this way:
test_data = fann_read_train_from_file("./diabetes_test.data");
fann_save_train_to_fixed(test_data, "diabetes_test_fixed.data", decimal_point);
However, once you are running the code in-system, don't forget to convert the input data to fixed point by scaling it accordingly:
int x_fixed = x * (1 << DECIMAL_POINT);
The decimal point constant is provided through fann_conf.h.
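The input conversion can be wrapped in a small helper, sketched below. The value 13 for DECIMAL_POINT is only an assumed placeholder (the real constant comes from fann_conf.h), and the helper names are hypothetical. The from_fixed function additionally assumes that network outputs use the same scaling as the inputs, which you should verify for your model.

```c
#include <assert.h>

/* Assumed value for illustration only; the real constant is in fann_conf.h. */
#define DECIMAL_POINT 13

/* Convert a real-valued input to fixed point with DECIMAL_POINT fractional bits. */
int to_fixed(float x)
{
    /* Inputs are expected to be normalized to [-1, 1] beforehand. */
    assert(x >= -1.0f && x <= 1.0f);
    /* The cast truncates toward zero, like the one-line conversion in the text. */
    return (int)(x * (float)(1 << DECIMAL_POINT));
}

/* Convert a fixed-point value back to a real value.
 * Assumption: outputs are scaled by the same factor as the inputs. */
float from_fixed(int x_fixed)
{
    return (float)x_fixed / (float)(1 << DECIMAL_POINT);
}
```

With the assumed DECIMAL_POINT of 13, the scaling factor is 8192, so an input of 0.25 becomes 2048.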
Furthermore, make sure that the data on which you train your full-precision network is scaled to the [-1,1] interval, including a potential safety margin, and that the same scaling is applied during on-device data preparation. FANN's network quantization method assumes the data is normalized this way and quantizes using worst-case data scaling assumptions. Thus, training the network on non-normalized data, or feeding it such data, is likely to introduce overflows.
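One way to apply the same scaling on the device is a min-max mapping with a configurable margin, sketched below. The function name, its parameters, and the clamping of out-of-range values are assumptions for illustration; the range bounds must be the ones used when preparing the training data.

```c
#include <assert.h>

/*
 * Map x from the training range [lo, hi] to [-limit, limit].
 * A limit below 1 (e.g. 0.9) leaves a safety margin inside [-1, 1].
 */
float scale_input(float x, float lo, float hi, float limit)
{
    assert(hi > lo);
    assert(limit > 0.0f && limit <= 1.0f);
    float unit = (x - lo) / (hi - lo);        /* 0..1 over the training range */
    float y = (2.0f * unit - 1.0f) * limit;   /* -limit..limit */
    /* Clamp samples far outside the training range so they stay in [-1, 1]. */
    if (y > 1.0f) {
        y = 1.0f;
    }
    if (y < -1.0f) {
        y = -1.0f;
    }
    return y;
}
```

The margin absorbs samples slightly outside the range seen in training, and the clamp covers the rest, so the value handed to the fixed-point conversion never leaves [-1, 1].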
Experimental tests show that if activation functions with names containing "STEPWISE_" are already used during training, the loss in accuracy with fixed-point inference is almost zero.
Constant files:
generate.py: the script generating the network- and data-specific code files based on FANN-format data.
fann_structs.h and fann.c: contain the implementation of the NN building blocks.
fann.h: the header file to be included in your code, providing the fann_type *fann_run(fann_type * input); function declaration.
sample-data/{myNetwork.net, myNetwork.data}: sample data and network pre-trained with FANN.
fann_utils.h and fann_utils.c: contain utility functions.
test.c: contains a test iterating over the exported test data. Serves as an example for 2-class classification.
mem_config.json: contains memory configurations of the selected microcontroller.
arm and pulp: contain source code for ARM Cortex-M series and PULP-based MCUs, respectively. generate.py will copy the corresponding source code for the selected MCU to the output folder.
Generated files in output folder:
fann_net.h: contains the trained parameters and the network structure.
fann_conf.h: contains some more meta information on the network: number of layers, fixed-point parameters (if applicable), ...
test_data.h: contains the test input data and expected results.
fann.c and/or fann_utils.c and fann_utils.h: the corresponding source code for the selected MCU, copied to the output folder.
Please refer to the LICENSE file for the licensing of our code. We rely on the interfaces, specifications, and some code of the FANN project, which is released under the LGPL.