NEON version slower on Arm Cortex-A9 #403

Open
tsat-psv opened this issue Jun 27, 2024 · 0 comments

@tsat-psv

I've compiled the C example with and without NEON optimizations (setting the -DBLAKE3_USE_NEON=1 -O3 -mfpu=neon-vfpv4 compiler flags), and to my surprise the non-NEON variant seems to perform better. I've tested on a ~30MB file (both from RAM and from flash, to rule out I/O), and here are the results:

Without NEON:

time ./b3sum < /dev/mtd5ro
5420676b03e59d74cd44331c200ea841cd247374f307ce838dc6a0d367f73774
real	0m 2.11s
user	0m 0.67s
sys	0m 0.08s

With NEON:

time ./b3sum < /dev/mtd5ro
5420676b03e59d74cd44331c200ea841cd247374f307ce838dc6a0d367f73774
real	0m 2.17s
user	0m 0.76s
sys	0m 0.04s

I saw that there were some changes in the 1.5.1 release, so I also tried 1.5.0, but the results are the same.

Any suggestions on what might cause this?
