Power Developer https://powerdeveloper.org/forums/ |
Eigen port to ARM NEON! https://powerdeveloper.org/forums/viewtopic.php?f=60&t=1776 |
Page 1 of 2 |
Author: | markos [ Wed Mar 03, 2010 1:30 pm ] |
Post subject: | Eigen port to ARM NEON! |
http://bitbucket.org/eigen/eigen/change ... af7abc0af/
Here are some results from a matrix addition/multiplication benchmark (512x512 matrices) on the Efika MX:
Scalar:
$ ./bench_gemm.gcc4.4.1cs
eigen cpu 3.84s 0.0699051 GFLOPS (19.27s)
eigen real 3.8469s 0.0697796 GFLOPS (19.2648s)
NEON:
$ ./bench_gemm.gcc4.4.1cs.neon
eigen cpu 0.81s 0.331402 GFLOPS (4.07s)
eigen real 0.813919s 0.329806 GFLOPS (4.07218s)
~4.6x faster...
No comments, apart from one: if NEON is that good (and I think it is), I don't think I'll miss AltiVec and PowerPC.
UPDATE: The results have been fixed. Apparently the scalar results were built without the -mfpu=vfp option, which is needed to actually use the FPU on ARM. ~4.5x faster is more logical, but still very, very nice :) Sorry for the misunderstanding. |
Author: | PurpleAlien [ Wed Mar 03, 2010 1:52 pm ] |
Post subject: | |
Nice :-) Great work! Johan. |
Author: | Jerzy Guc (Drako) [ Wed Mar 03, 2010 2:00 pm ] |
Post subject: | |
Wow, did not expect this result. So NEON is not so bad :) |
Author: | takemehomegrandma [ Wed Mar 03, 2010 3:07 pm ] |
Post subject: | Re: Eigen2 port to ARM NEON! |
It would be really interesting to see a broad-spectrum benchmark comparison between a 7447 G4 PPC CPU (as used in Pegasos 2) running @ 800MHz (or results recalculated from 1GHz to 800MHz accordingly) and an i.MX515 CPU. I have no idea whatsoever about the ARM chip's performance (I have never seen or experienced one in action), but maybe it won't be too far behind? I think it would be interesting for more people than me here on Powerdeveloper.org to see a comparison with the Pegasos 2 G4 hardware, which most of us have experience with and can relate to! Not that raw performance is the key goal of the chip (power efficiency is), but anyway... |
Author: | markos [ Wed Mar 03, 2010 4:21 pm ] |
Post subject: | Re: Eigen2 port to ARM NEON! |
Quote: It would be really interesting to see a broad-spectrum benchmark comparison between a 7447 G4 PPC CPU (as used in Pegasos 2) running @ 800MHz (or results recalculated from 1GHz to 800MHz accordingly) and an i.MX515 CPU.
Tomorrow I will also provide Eigen results from the G4 for comparison. One thing is certain, though: NEON has some really good tricks up its sleeve that are not available in either SSE or AltiVec. For that alone it beats both, IMHO.
Quote: I have no idea whatsoever about the ARM chip's performance (I have never seen or experienced one in action), but maybe it won't be too far behind? I think it would be interesting for more people than me here on Powerdeveloper.org to see a comparison with the Pegasos 2 G4 hardware, which most of us have experience with and can relate to! Not that raw performance is the key goal of the chip (power efficiency is), but anyway... |
Author: | blu [ Wed Mar 03, 2010 8:55 pm ] |
Post subject: | |
Impressive results, Markos. What are your impressions of this SIMD ISA so far? PS: don't you miss the permute? ;) |
Author: | markos [ Thu Mar 04, 2010 3:44 am ] |
Post subject: | |
Quote: Impressive results, Markos. What are your impressions of this SIMD ISA so far?
The ISA is a very complete and orthogonal SIMD design. It can do many more things than AltiVec or SSE. I especially like the fact that I can split a 128-bit vector into two 64-bit vectors, perform an operation, and then combine them back into a 128-bit vector. It can also load/store 4x128-bit vectors at once.
Quote: PS: don't you miss the permute? ;)
PS: It has vtbl and vtbx, which serve the same purpose. I haven't played around with them yet, though :) |
Author: | corto [ Thu Mar 04, 2010 2:21 pm ] |
Post subject: | Re: Eigen port to ARM NEON! |
Quote: ~4.6x faster...
Results are impressive, but the comment makes me sad...
Quote: No comments, apart from one: if NEON is that good (and I think it is), I don't think I'll miss AltiVec and PowerPC.
I didn't expect NEON to be so good... |
Author: | markos [ Thu Mar 04, 2010 4:31 pm ] |
Post subject: | Re: Eigen port to ARM NEON! |
Quote: Quote: ~4.6x faster...
Results are impressive, but the comment makes me sad...
Quote: No comments, apart from one: if NEON is that good (and I think it is), I don't think I'll miss AltiVec and PowerPC.
I didn't expect NEON to be so good...
$ ./bench_gemm.gcc4.4.1cs.neon
eigen cpu 2.44s 0.880116 GFLOPS (12.29s)
eigen real 2.44403s 0.878666 GFLOPS (12.2967s)
(compiled with gcc 4.4.1 CodeSourcery)
$ ./bench_gemm.gcc4.5.neon
eigen cpu 2.36s 0.909951 GFLOPS (11.85s)
eigen real 2.36316s 0.908733 GFLOPS (11.8516s)
(compiled with gcc 4.5 experimental)
~12.9x faster (in GFLOPS) than the scalar build. Yes, this time it's real. According to the Eigen developers, the theoretical limit on the Efika MX is 1.6 GFLOPS, so we still have a bit of work to do :) |
Author: | jcmarcos [ Fri Mar 05, 2010 6:30 am ] |
Post subject: | Re: Eigen port to ARM NEON! |
Quote: Quote: ~4.6x faster...
I didn't expect NEON to be so good...
Quote: No comments, apart from one: if NEON is that good (and I think it is), I don't think I'll miss AltiVec and PowerPC. |
Author: | markos [ Fri Mar 05, 2010 7:53 am ] |
Post subject: | Re: Eigen port to ARM NEON! |
Quote: Quote: Quote: ~4.6x faster...
I didn't expect NEON to be so good...
Quote: No comments, apart from one: if NEON is that good (and I think it is), I don't think I'll miss AltiVec and PowerPC.
Scalar:
$ ./bench_gemm
eigen cpu 2.65264s 0.809565 GFLOPS (13.283s)
eigen real 2.6532s 0.809394 GFLOPS (13.2863s)
AltiVec:
$ ./bench_gemm
eigen cpu 1.17936s 1.82088 GFLOPS (5.90097s)
eigen real 1.17959s 1.82054 GFLOPS (5.90304s)
But keep in mind that PowerPC support is much better and more mature than ARM support (especially with respect to NEON), and that the PowerPC is slightly faster at 1 GHz. Theoretically, the G4 can do 4 GFLOPS in FP math and the i.MX515 can do 1.6 GFLOPS. |
Author: | jcmarcos [ Fri Mar 05, 2010 10:00 am ] |
Post subject: | Re: Eigen port to ARM NEON! |
Quote: PowerPC is slightly faster at 1 GHz
Yes, but ARM is smarter, because it always sucks fewer electrons. Or am I wrong? Have you seen that initiative to build new high-power ARM CPUs that are NOT targeted at mobile computers? What will happen when they free these processors from the power restrictions they've always had ("take off the handcuffs")? |
Author: | markos [ Fri Mar 05, 2010 10:29 am ] |
Post subject: | Re: Eigen port to ARM NEON! |
Quote: Yes, but ARM is smarter, because it always sucks fewer electrons. Or am I wrong?
I have remote access to a prototype quad-core ARM Cortex-A9 :-P
Quote: Have you seen that initiative to build new high-power ARM CPUs that are NOT targeted at mobile computers? What will happen when they free these processors from the power restrictions they've always had ("take off the handcuffs")? |
Author: | corto [ Sun Mar 07, 2010 3:46 pm ] |
Post subject: | Re: Eigen port to ARM NEON! |
Quote: Quote: PowerPC is slightly faster at 1 GHz
Yes, but ARM is smarter, because it always sucks fewer electrons. Or am I wrong? Have you seen that initiative to build new high-power ARM CPUs that are NOT targeted at mobile computers? What will happen when they free these processors from the power restrictions they've always had ("take off the handcuffs")?
jcmarcos: I really liked ARM because it was small, efficient, and easy to play with... But over the years they have added many things that were not planned, and in my opinion it is sometimes ugly. I am afraid it is going the same way x86 did. But some features are great, and it works well. I work on ARM every day and sometimes play with low-level things. |
Author: | slyd [ Tue Mar 09, 2010 11:35 am ] |
Post subject: | |
> the i.MX515 can do 1.6 GFLOPS
The NEON pipeline has 4 single-precision FP multiply units and 4 accumulators... it could handle 4 floats/cycle. So shouldn't this be 3.2 GFLOPS (4 floats/cycle at 800 MHz), or am I missing something? |
All times are UTC-06:00 |