Poor performance on STM32 board? #15

Open
leechwort opened this issue Sep 22, 2023 · 6 comments

Comments

@leechwort

Hello everyone! I'm interested in running your framework on my Blackpill board (STM32F411CEU6). I've noticed that I'm experiencing poor performance, and I'm wondering if I might be using your framework incorrectly or if this is expected behavior. Since there are no examples provided for embedded applications, I'm concerned that I may have made a mistake somewhere.

To provide some context, I've connected a PCM5102 DAC and use DMA so that the MCU isn't overwhelmed by the I2S transfer itself. Below are the relevant portions of my main.c file, with comments for clarity:

#include "leaf.h"
I2S_HandleTypeDef hi2s1;

// Constants
#define SAMPLERATE 44000
#define LEAF_BUFFER_SIZE (2*44100)
#define BUFFER_SIZE 8192

// Buffers 
char mempool[10000]; // LEAF Memory pool
uint16_t samplebuffer[BUFFER_SIZE] = {0}; // Buffer, used for transmission to I2S codec with DMA

// Pointers, used for switching between buffers in DMA transfer
volatile uint16_t *current_buffer_element_ptr = samplebuffer;
volatile size_t current_buffer_element_cntr = 0;

// LEAF objects
LEAF leaf;
tCycle cycle;
tHermiteDelay delay;

// Utility functions
float rnd_func()
{
    return ((float)rand() / (float)(RAND_MAX));
}


// Callbacks used for DMA transfers. When the first half of the buffer has been sent (i2s_transfer_half_complited_callback is called),
// I point current_buffer_element_ptr at the beginning and let LEAF fill it while the other half of the buffer is being sent to the
// DAC, and vice versa.
void i2s_transfer_complited_callback(I2S_HandleTypeDef *hi2s)
{
	if (current_buffer_element_cntr >= BUFFER_SIZE / 2)
	{
		current_buffer_element_ptr = samplebuffer + BUFFER_SIZE / 2;
		current_buffer_element_cntr = 0;
	} else {
		printf("buffer overrun");
	}
}

void i2s_transfer_half_complited_callback(I2S_HandleTypeDef *hi2s)
{
	if (current_buffer_element_cntr >= BUFFER_SIZE / 2)
	{
		current_buffer_element_ptr = samplebuffer;
		current_buffer_element_cntr = 0;
	} else {
		printf("buffer overrun");
	}
}

int main(void)
{
  // CUBEMX stuff
  HAL_Init();
  SystemClock_Config();
  MX_GPIO_Init();
  MX_DMA_Init();
  MX_I2S1_Init();
  MX_NVIC_Init();

  // Callbacks for DMA transfer, where I switch buffers
  HAL_I2S_RegisterCallback(&hi2s1, HAL_I2S_TX_COMPLETE_CB_ID, &i2s_transfer_complited_callback);
  HAL_I2S_RegisterCallback(&hi2s1, HAL_I2S_TX_HALF_COMPLETE_CB_ID, &i2s_transfer_half_complited_callback);
  HAL_I2S_Transmit_DMA(&hi2s1, samplebuffer, sizeof(samplebuffer)/sizeof(samplebuffer[0]));

  // LEAF stuff init.
  LEAF_init(&leaf, SAMPLERATE, mempool, LEAF_BUFFER_SIZE, &rnd_func);
  tCycle_init(&cycle, &leaf);
  tCycle_setFreq(&cycle, 220);
  tHermiteDelay_init(&delay, 2000, 2500, &leaf);
  tHermiteDelay_setGain(&delay, 0.5f);

  uint64_t counter = 0;
  while (1)
  {
	// If the DMA controller has successfully finished a transfer to the audio codec, we can put new data there.
	// This part works fine when only simple processing is done here.
	if (current_buffer_element_cntr < BUFFER_SIZE / 2)
	{
		counter++;

		if ((counter % 100000) == 10000)
			tCycle_setFreq(&cycle, 220);
		else if ((counter % 100000) == 20000)
			tCycle_setFreq(&cycle, 330);
		else if ((counter % 100000) == 30000)
			tCycle_setFreq(&cycle, 220);
		else if ((counter % 100000) == 40000)
			tCycle_setFreq(&cycle, 0);

		float processed_value = tCycle_tick(&cycle);
		//float delayed_value = tHermiteDelay_tick(&delay, processed_value);  // <<<< LOOK HERE
		//processed_value = delayed_value; // <<<<< LOOK HERE
		*(current_buffer_element_ptr + current_buffer_element_cntr) = (uint16_t) (0x0fff * (1.0f + processed_value));
		current_buffer_element_cntr++;
	}
  }
}
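
The half/complete callback scheme in the listing above can also be arranged so that the interrupt handlers only record which half is free, and the main loop renders a whole half-buffer in one go. This is a minimal host-runnable sketch, not a drop-in fix: `render_sample`, `on_half_complete`, `on_complete` and `service_audio` are hypothetical names standing in for `tCycle_tick` and the two HAL callbacks, and on the target the final check-and-clear would probably need a brief interrupt disable around it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUFFER_SIZE 8192
#define HALF_SIZE   (BUFFER_SIZE / 2)

static uint16_t samplebuffer[BUFFER_SIZE];

/* Written by the (simulated) DMA callbacks, read by the main loop:
   -1 = nothing to do, 0 = first half is free, 1 = second half is free. */
static volatile int free_half = -1;
static volatile unsigned missed_deadlines = 0;

/* Hypothetical stand-in for the synthesis call (tCycle_tick and friends). */
static float render_sample(void)
{
    return 0.0f; /* silence */
}

/* Called from interrupt context: no printf, just bookkeeping. */
static void mark_free(int half)
{
    if (free_half != -1) {
        missed_deadlines++; /* previous half was never rendered in time */
    }
    free_half = half;
}

void on_half_complete(void) { mark_free(0); } /* DMA finished first half  */
void on_complete(void)      { mark_free(1); } /* DMA finished second half */

/* Called from the main loop. Returns 1 if a half-buffer was rendered. */
int service_audio(void)
{
    int half = free_half;
    if (half < 0) {
        return 0;
    }
    assert(half == 0 || half == 1);

    uint16_t *dst = samplebuffer + (size_t)half * HALF_SIZE;
    for (size_t i = 0; i < HALF_SIZE; i++) {
        dst[i] = (uint16_t)(0x0fff * (1.0f + render_sample()));
    }

    /* Clear the request only if no newer one arrived while rendering. */
    if (free_half == half) {
        free_half = -1;
    }
    return 1;
}
```

Counting missed deadlines instead of printing from the callback keeps the interrupt handlers short, and the counter can be inspected from a debugger.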

We can assume that the code is functioning correctly. I have provided a recording of the sound when the sequence is running as expected:
Record - sequence, works ok
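
A side note on the conversion line in the listing: `(uint16_t)(0x0fff * (1.0f + processed_value))` produces an unsigned value with a DC offset, and casting a negative float to an unsigned type is undefined if the signal ever drops below -1.0. I2S DACs such as the PCM5102 generally expect signed two's-complement samples, so a clamped signed conversion may be safer. The helper names below are hypothetical, and this is only a sketch of that idea:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a float in [-1.0, 1.0] to a signed 16-bit sample, clamping
   anything outside that range. */
int16_t float_to_i16(float x)
{
    if (x != x)    x = 0.0f;   /* NaN becomes silence */
    if (x > 1.0f)  x = 1.0f;
    if (x < -1.0f) x = -1.0f;
    return (int16_t)(x * 32767.0f);
}

/* Convert a block of float samples. */
void convert_block(int16_t *dst, const float *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = float_to_i16(src[i]);
    }
}
```

The resulting `int16_t` can be stored in the existing `uint16_t` buffer with a cast, since only the bit pattern matters to the DMA transfer.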

Next, I uncommented a section marked as "<<<< LOOK HERE." This enabled a delay for the audio, and as a result, the sound became severely distorted:
Record - sequence + delay, broken
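
One thing that may be worth ruling out before blaming CPU speed: the listing declares `mempool` as 10000 bytes but passes `LEAF_BUFFER_SIZE` (2*44100 = 88200) to `LEAF_init` as the pool size, and a delay line with a 2500-sample maximum needs about 2500 * 4 = 10000 bytes for its buffer alone. The sketch below just checks that arithmetic. It assumes the delay stores one 32-bit float per sample, which is a guess about LEAF's internals, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MEMPOOL_BYTES     10000u         /* char mempool[10000] in the listing */
#define LEAF_BUFFER_SIZE  (2u * 44100u)  /* size passed to LEAF_init           */
#define MAX_DELAY_SAMPLES 2500u          /* tHermiteDelay_init(..., 2500, ...) */

/* Assumption: the delay line stores one float per sample of maximum delay. */
size_t delay_line_bytes(size_t max_delay_samples)
{
    assert(max_delay_samples <= (size_t)-1 / sizeof(float)); /* no overflow */
    return max_delay_samples * sizeof(float);
}

/* 1 when the size reported to the allocator fits inside the real array. */
int pool_size_is_consistent(size_t declared_bytes, size_t real_bytes)
{
    return declared_bytes <= real_bytes;
}

/* Bytes left in the pool for everything else (LEAF state, oscillator, ...). */
long pool_headroom(size_t real_bytes, size_t max_delay_samples)
{
    return (long)real_bytes - (long)delay_line_bytes(max_delay_samples);
}
```

With the numbers from the listing, the declared size exceeds the real array and the delay buffer alone would use the whole pool. Declaring the array with the same constant and passing `sizeof(mempool)` keeps the two in sync, although 88200 bytes is a large share of the F411's 128 KB of RAM.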

I also tested similar code on a host machine (you can find it in my fork and example: https://github.com/leechwort/LEAF/blob/master/Examples/sawtooth-sequence.c) and it worked ok. This leads me to suspect that the issue might be related to performance limitations on the STM32 board.
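
To judge whether raw performance is a plausible suspect, it helps to work out the per-sample budget. The sketch below assumes a 100 MHz core clock (the F411's maximum; the actual value from `SystemClock_Config` is not shown in the post) and uses hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles available to produce one output value. */
uint32_t cycles_per_sample(uint32_t cpu_hz, uint32_t sample_rate_hz)
{
    assert(sample_rate_hz != 0);
    return cpu_hz / sample_rate_hz;
}

/* Microseconds the main loop has to refill n buffer slots. */
uint32_t refill_window_us(uint32_t n_slots, uint32_t slots_per_second)
{
    assert(slots_per_second != 0);
    return (uint32_t)(((uint64_t)n_slots * 1000000u) / slots_per_second);
}
```

At 100 MHz and 44100 values per second this gives about 2267 cycles per value and roughly 92.9 ms to refill a 4096-slot half-buffer. If the I2S peripheral is sending stereo frames, each sample period consumes two slots, so both figures halve (about 1133 cycles and 46.4 ms). Either way that is a fairly generous budget for one oscillator and one delay line.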

In summary, I have a few questions:

  • Can you suggest what might be causing this behavior on the STM32 board? Is there a specific way I should be using your framework for embedded systems that differs from using it on a host machine?
  • Do you have any example projects specifically designed for the STM32 platform that I could refer to for guidance?
  • It appears that the FPU (Floating-Point Unit) is not utilized in this framework. Do you have plans to implement FPU support in the future?
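
On the FPU question: whether float math runs on the FPU is mostly decided by the compiler flags rather than by the library. With GCC for a Cortex-M4F that usually means `-mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -mfloat-abi=hard`. The M4's FPU is single precision only, so any expression that gets promoted to `double` (for example through an unsuffixed literal like `0.5`) falls back to software routines. The sketch below uses the ACLE `__ARM_FP` macro to report what the build targets and shows the literal promotion; the function names are hypothetical.

```c
#include <assert.h>

/* 1 if this build targets hardware single-precision floating point. */
int built_with_hw_single_float(void)
{
#if defined(__ARM_FP)
    return (__ARM_FP & 0x04) ? 1 : 0;  /* ACLE: bit 2 = 32-bit float */
#else
    return 0;                           /* soft-float or non-ARM build */
#endif
}

/* 1 if this build also targets hardware double precision. */
int built_with_hw_double_float(void)
{
#if defined(__ARM_FP)
    return (__ARM_FP & 0x08) ? 1 : 0;  /* ACLE: bit 3 = 64-bit float */
#else
    return 0;
#endif
}

/* Width in bytes of the type an expression is evaluated in. */
#define EVAL_WIDTH(expr) (sizeof(expr))

/* Shows that a missing 'f' suffix silently promotes the math to double. */
void check_literal_promotion(void)
{
    float x = 0.25f;
    assert(EVAL_WIDTH(0.5f * x) == sizeof(float));
    assert(EVAL_WIDTH(0.5 * x) == sizeof(double));
    (void)x;
}
```

On a correctly configured Cortex-M4F build the first function should return 1 and the second 0.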
@spiricom
Owner

spiricom commented Sep 22, 2023 via email

@spiricom
Owner

spiricom commented Sep 22, 2023 via email

@tomerbe
Collaborator

tomerbe commented Sep 22, 2023 via email

@leechwort
Author

Thanks for your replies, guys! I've re-checked that the FPU is enabled on my board, and even benchmarked a piece of code:

  volatile uint32_t start = HAL_GetTick();
  for (int i = 0; i < 10000000; i++)
	  for (int j = 0; i < 10000000; i++)
  {
	  volatile float x = sinf((0.5f*i) * (0.4f*j));
  }
  volatile uint32_t end = HAL_GetTick() - start;

About 19 s without the FPU and about 6 s with it, but no change in LEAF performance. I still think I may be missing something. Sure, the F411 is much slower than an STM32F7, but is a delay really that computationally expensive?
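
For a sense of scale, an interpolated delay tick is typically one buffer write, four buffer reads and around fifteen floating-point operations, which is small next to a budget of a thousand or more cycles per sample. Note also that the timing loop as posted tests and increments `i` in the inner `for`, so it times 10^7 calls of `sinf(0)` rather than anything resembling the delay. The sketch below is a generic 4-point Hermite-interpolated delay, not LEAF's actual implementation; `SketchDelay` and its functions are hypothetical names, and something like this could be timed on the target instead.

```c
#include <assert.h>
#include <stddef.h>

#define DELAY_LEN  4096u              /* power of two: wrap with a mask */
#define DELAY_MASK (DELAY_LEN - 1u)

typedef struct {
    float    buf[DELAY_LEN];
    unsigned write_pos;
    float    delay;   /* in samples, may be fractional */
    float    gain;
} SketchDelay;

/* 4-point, 3rd-order Hermite interpolation between x0 (t=0) and x1 (t=1). */
static float hermite4(float t, float xm1, float x0, float x1, float x2)
{
    float c = (x1 - xm1) * 0.5f;
    float v = x0 - x1;
    float w = c + v;
    float a = w + v + (x2 - x0) * 0.5f;
    float b_neg = w + a;
    return (((a * t) - b_neg) * t + c) * t + x0;
}

void sketch_delay_init(SketchDelay *d, float delay, float gain)
{
    assert(d != NULL);
    assert(delay >= 1.0f && delay < (float)(DELAY_LEN - 3u));
    for (size_t i = 0; i < DELAY_LEN; i++) {
        d->buf[i] = 0.0f;
    }
    d->write_pos = 0;
    d->delay = delay;
    d->gain = gain;
}

/* Push one input sample and return the delayed, interpolated output. */
float sketch_delay_tick(SketchDelay *d, float in)
{
    unsigned whole = (unsigned)d->delay;
    float frac = d->delay - (float)whole;
    unsigned newer = d->write_pos - whole;  /* sample delayed by 'whole' */

    d->buf[d->write_pos] = in;

    float x2  = d->buf[(newer + 1u) & DELAY_MASK];
    float x1  = d->buf[newer & DELAY_MASK];
    float x0  = d->buf[(newer - 1u) & DELAY_MASK];
    float xm1 = d->buf[(newer - 2u) & DELAY_MASK];

    d->write_pos = (d->write_pos + 1u) & DELAY_MASK;

    /* The read point lies 'frac' samples before x1, i.e. at t = 1 - frac. */
    return d->gain * hermite4(1.0f - frac, xm1, x0, x1, x2);
}
```

With an integer delay the interpolation term drops out and the output is simply the input shifted by that many samples, which makes the sketch easy to sanity-check with an impulse.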

Also, my mistake on the last question: I meant the "CMSIS DSP" library.

@leechwort
Author

OK, it looks like things get better with larger buffers, but since RAM is limited on the Blackpill, maybe it's time to switch to something with additional onboard RAM :)

@tomerbe
Collaborator

tomerbe commented Sep 22, 2023 via email
