M0+ version #1

Open
martin31821 opened this issue Sep 27, 2019 · 3 comments

@martin31821

Is there an easy way to port this library to the Cortex-M0+ (or ARMv6-M in general)?

@Emill
Owner

Emill commented Sep 27, 2019

Hi @martin31821!

The amazing speed is mainly due to the UMAAL instruction available on Cortex-M4, which computes the 64-bit result of a*b+c+d in one cycle (all inputs are 32 bits wide). Unfortunately, Cortex-M0(+) only has the MULS instruction, which computes the low 32 bits of a*b (inputs are 32 bits wide), so you need four 16x16-bit multiplications to get a 64-bit result, plus the additions that combine the partial products. That's the main reason why Cortex-M0(+) is so much slower.
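To make the difference concrete, here is a minimal C sketch of the two operations described above. It is an illustration only, not code from this library: the function names `umaal_ref` and `mul32x32_64_via_muls` are hypothetical, and a real Cortex-M0(+) implementation would be written in assembly and scheduled quite differently.

```c
#include <stdint.h>

/* Reference semantics of UMAAL (hypothetical helper name):
 * a*b + c + d always fits in 64 bits, because
 * (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
uint64_t umaal_ref(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
    return (uint64_t)a * b + c + d;
}

/* 32x32 -> 64-bit product using only multiplications whose result
 * fits in 32 bits, which is what a MULS-only core has to do. */
uint64_t mul32x32_64_via_muls(uint32_t a, uint32_t b) {
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    /* Four 16x16 -> 32-bit partial products. */
    uint32_t p0 = a_lo * b_lo;   /* weight 2^0  */
    uint32_t p1 = a_lo * b_hi;   /* weight 2^16 */
    uint32_t p2 = a_hi * b_lo;   /* weight 2^16 */
    uint32_t p3 = a_hi * b_hi;   /* weight 2^32 */

    /* Sum the contributions to bits 16..31; the overflow beyond
     * 16 bits is the carry into the high word. */
    uint32_t mid = (p0 >> 16) + (p1 & 0xFFFFu) + (p2 & 0xFFFFu);

    uint32_t lo = (p0 & 0xFFFFu) | (mid << 16);
    uint32_t hi = p3 + (p1 >> 16) + (p2 >> 16) + (mid >> 16);

    return ((uint64_t)hi << 32) | lo;
}
```

On Cortex-M4 the first function corresponds to a single instruction, while the second needs four multiplications plus shifts, masks and additions for the product alone, before the two accumulate inputs are even added.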

You should be able to use https://munacl.cryptojedi.org/curve25519-cortexm0.shtml on Cortex-M0+ to get decent performance (3.6 Mcycles per operation). I'm working on a slightly optimized version, mainly focusing on reducing the code size, since the multiplication loop there is unrolled too much, but also using more assembly code to squeeze out more performance. It's similar to https://github.com/Emill/P256-cortex-ecdh/blob/master/P256-cortex-m0-ecdh-keil.s. Please tell me if you think I should polish it and put it up.

@martin31821
Author

Thanks for the explanation. I also found the curve25519 implementation you linked during my research, and I'll test it on my device.
Your code looks awesome, even though I'm not very experienced with ARM assembly, so if you are already working on it and it isn't much extra work, I'd really like to see the optimized version.

@xavieryin

xavieryin commented Aug 4, 2021

> I'm working on a slightly optimized version, mainly focusing on reducing the size of the code since the multiplication loop there is too much unrolled, but also use more assembly code to squeeze out more performance. It's similar to https://github.com/Emill/P256-cortex-ecdh/blob/master/P256-cortex-m0-ecdh-keil.s. Please tell me if you think I should polish it and put it up.

Thank you @Emill for sharing all this fine work. I'm just replying here, as another happy M4 version user, to express my interest in your M0(+) version. Is it already available somewhere?
