All times are UTC-06:00




Posted: Thu Mar 10, 2005 10:38 pm

Joined: Wed Oct 13, 2004 7:26 am
Posts: 348
Here are benchmarks for memcpy().
I'll post a comparison with libmotovec right after.
Code:
$ ./altivectorize -v -s -g --norandom --loops 1000000
Altivec is supported
Verbose mode on
Will do both scalar and vector tests
Will also do glibc tests
loops: 1000000
output file:
will do tests: memcpy
#size arrays glibc altivec (Effective bandwidth)
7 599186 0.130 (51.4 MB/s) 0.090 (74.2 MB/s)
13 325000 0.140 (88.6 MB/s) 0.110 (112.7 MB/s)
16 262144 0.140 (109.0 MB/s) 0.150 (101.7 MB/s)
20 209715 0.140 (136.2 MB/s) 0.210 (90.8 MB/s)
27 155344 0.140 (183.9 MB/s) 0.150 (171.7 MB/s)
35 119837 0.150 (222.5 MB/s) 0.140 (238.4 MB/s)
43 97542 0.170 (241.2 MB/s) 0.160 (256.3 MB/s)
54 77672 0.180 (286.1 MB/s) 0.140 (367.8 MB/s)
64 65536 0.160 (381.5 MB/s) 0.130 (469.5 MB/s)
90 46603 0.190 (451.7 MB/s) 0.180 (476.8 MB/s)
128 32768 0.210 (581.3 MB/s) 0.140 (871.9 MB/s)
185 22672 0.250 (705.7 MB/s) 0.170 (1037.8 MB/s)
256 16384 0.320 (762.9 MB/s) 0.180 (1356.3 MB/s)
347 12087 0.370 (894.4 MB/s) 0.200 (1654.6 MB/s)
512 8192 0.520 (939.0 MB/s) 0.250 (1953.1 MB/s)
831 5047 0.770 (1029.2 MB/s) 0.350 (2264.3 MB/s)
2048 2048 1.940 (1006.8 MB/s) 0.760 (2569.9 MB/s)
3981 1053 3.580 (1060.5 MB/s) 0.980 (3874.1 MB/s)
8192 512 7.080 (1103.5 MB/s) 2.560 (3051.8 MB/s)
13488 311 11.450 (1123.4 MB/s) 4.220 (3048.1 MB/s)
16384 256 14.040 (1112.9 MB/s) 4.980 (3137.6 MB/s)
38893 108 34.160 (1085.8 MB/s) 16.700 (2221.0 MB/s)
65536 64 62.160 (1005.5 MB/s) 30.040 (2080.6 MB/s)
105001 40 101.610 (985.5 MB/s) 43.490 (2302.5 MB/s)
262144 16 259.710 (962.6 MB/s) 121.090 (2064.6 MB/s)
600000 7 1300.250 (440.1 MB/s) 939.560 (609.0 MB/s)
1134355 4 3007.870 (359.7 MB/s) 2913.140 (371.4 MB/s)
2097152 2 6011.970 (332.7 MB/s) 5841.110 (342.4 MB/s)
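The "Effective bandwidth" figures are consistent with taking the bytes per copy (the #size column) times the loop count, dividing by the time column read as seconds, and then dividing by 2^20. The tool's output does not say this, so treat it as an inference from the numbers: for example, 7 × 1,000,000 / 0.130 / 1,048,576 ≈ 51.4, which matches the first glibc entry. A hypothetical helper (the name `effective_mb_per_s` is mine, not altivectorize's) that does that arithmetic:

```c
#include <assert.h>

/* Hypothetical helper reproducing the "Effective bandwidth" column,
 * assuming bandwidth = size * loops / time / 2^20, with time in seconds.
 * This is inferred from the table, not taken from the altivectorize source. */
double effective_mb_per_s(double size_bytes, double loops, double seconds)
{
    assert(seconds > 0.0);  /* a zero time would divide by zero */
    return size_bytes * loops / seconds / (1024.0 * 1024.0);
}
```

With the last row's glibc time, `effective_mb_per_s(2097152, 1000000, 6011.970)` comes out at about 332.7, which again matches the table.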
The code is available in the same cvs repo as before:
Code:
$ cvs -z3 -d:pserver:anonymous@cvs.alioth.debian.org:/cvsroot/pegasos co altivectorize
As for the results, you'll notice that the AltiVec version holds up well even for small sizes (sometimes it is even faster than glibc there). Also, since alignment is handled by using the original memcpy() to copy the offset bytes, the speed of the routine stays pretty much constant regardless of alignment.
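The alignment handling described above (the original memcpy() copies the offset bytes, then the vector loop takes over) can be sketched as a head/body/tail split. This is a minimal, portable illustration, not the altivectorize source: `sketch_vec_memcpy` and `copy_block16` are hypothetical names, `copy_block16` stands in for an AltiVec 16-byte load/store (a real version would use vec_ld/vec_st, plus vec_perm when the source is not also 16-byte aligned), and only the destination is aligned here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 16-byte AltiVec load/store pair.
 * A fixed-size memcpy keeps the sketch portable. */
static void copy_block16(unsigned char *d, const unsigned char *s)
{
    memcpy(d, s, 16);
}

/* Head/body/tail copy: scalar memcpy() until the destination reaches a
 * 16-byte boundary, whole 16-byte blocks for the bulk, scalar memcpy()
 * for the leftover bytes. */
void *sketch_vec_memcpy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: bytes needed to bring d up to 16-byte alignment. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    if (head > len)
        head = len;
    memcpy(d, s, head);
    d += head;
    s += head;
    len -= head;

    /* Body: whole 16-byte blocks into an aligned destination. */
    while (len >= 16) {
        assert(((uintptr_t)d & 15) == 0);
        copy_block16(d, s);
        d += 16;
        s += 16;
        len -= 16;
    }

    /* Tail: fewer than 16 bytes remain. */
    memcpy(d, s, len);
    return dst;
}
```

Every call goes through the same three phases whatever the starting offset, which is consistent with the near-constant speed across alignments mentioned above.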


Posted: Fri Mar 11, 2005 8:07 am

Joined: Wed Oct 13, 2004 7:26 am
Posts: 348
Quote:
I'll post a comparison to libmotovec right after.
I linked the benchmark app against libmotovec so that libmotovec's memcpy() replaced the default one; the "glibc" column below therefore shows libmotovec's results.
Code:
$ ./altivectorize -v -s -g --norandom --loops 1000000
Altivec is supported
Verbose mode on
Will do both scalar and vector tests
Will also do glibc tests
loops: 1000000
output file:
will do tests: memcpy
#size arrays glibc altivec (Effective bandwidth)
7 599186 0.090 (74.2 MB/s) 0.080 (83.4 MB/s)
13 325000 0.110 (112.7 MB/s) 0.100 (124.0 MB/s)
16 262144 0.130 (117.4 MB/s) 0.100 (152.6 MB/s)
20 209715 0.110 (173.4 MB/s) 0.160 (119.2 MB/s)
27 155344 0.120 (214.6 MB/s) 0.130 (198.1 MB/s)
35 119837 0.110 (303.4 MB/s) 0.100 (333.8 MB/s)
43 97542 0.130 (315.4 MB/s) 0.130 (315.4 MB/s)
54 77672 0.120 (429.2 MB/s) 0.130 (396.1 MB/s)
64 65536 0.130 (469.5 MB/s) 0.120 (508.6 MB/s)
90 46603 0.160 (536.4 MB/s) 0.180 (476.8 MB/s)
128 32768 0.180 (678.2 MB/s) 0.160 (762.9 MB/s)
185 22672 0.190 (928.6 MB/s) 0.170 (1037.8 MB/s)
256 16384 0.230 (1061.5 MB/s) 0.190 (1285.0 MB/s)
347 12087 0.230 (1438.8 MB/s) 0.210 (1575.8 MB/s)
512 8192 0.300 (1627.6 MB/s) 0.280 (1743.9 MB/s)
831 5047 0.410 (1932.9 MB/s) 0.330 (2401.5 MB/s)
2048 2048 0.780 (2504.0 MB/s) 0.800 (2441.4 MB/s)
3981 1053 1.360 (2791.6 MB/s) 0.970 (3914.0 MB/s)
8192 512 2.440 (3201.8 MB/s) 2.790 (2800.2 MB/s)
13488 311 4.310 (2984.5 MB/s) 4.200 (3062.7 MB/s)
16384 256 5.280 (2959.3 MB/s) 5.110 (3057.7 MB/s)
38893 108 16.990 (2183.1 MB/s) 15.880 (2335.7 MB/s)
65536 64 28.670 (2180.0 MB/s) 27.190 (2298.6 MB/s)
105001 40 49.370 (2028.3 MB/s) 46.730 (2142.9 MB/s)
262144 16 138.550 (1804.4 MB/s) 129.870 (1925.0 MB/s)
600000 7 1040.730 (549.8 MB/s) 1045.950 (547.1 MB/s)
1134355 4 2742.170 (394.5 MB/s) 2868.980 (377.1 MB/s)
I think the numbers speak for themselves. Who said assembly is better than C? :-)


Posted: Wed May 17, 2006 10:34 am

Joined: Tue Nov 02, 2004 2:11 am
Posts: 161
Hmm, I'm getting somewhat different results.

Memorybench - copying a block of 80 MB from a -> b

glibc: 366.8876 MB/sec
Freevec: 381.2597 MB/sec
Motovec: 637.9746 MB/sec
FC64: 611.4247 MB/sec


Cachebench - copying a block of 8 KB from a -> b

glibc: 2637.2605 MB/sec
Freevec: 5510.4557 MB/sec
Motovec: 7355.0409 MB/sec
FC64: 4557.8185 MB/sec

*FC64 is a very simple copy loop using floating-point registers.
The loop is unrolled to copy 64 bytes (two cache lines) per iteration.
It gets its speed by using dcbt to prefetch the next two cache lines while copying the current ones.
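An FC64-style loop can be sketched in C as moving eight doubles (64 bytes) per iteration through floating-point temporaries while prefetching the next 64 bytes of the source. This is an illustration, not Gunnar's code (which the post does not show): the name `fc64_copy` is hypothetical, it works on arrays of doubles rather than arbitrary bytes, it uses GCC/Clang's `__builtin_prefetch` as a stand-in for an explicit dcbt (GCC typically emits dcbt for it on PowerPC), and it assumes 32-byte cache lines as on the G4.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of an FC64-style copy of `count` doubles.
 * Each iteration moves 8 doubles (64 bytes = two 32-byte cache lines)
 * and prefetches the following 64 bytes of the source. */
void fc64_copy(double *restrict dst, const double *restrict src, size_t count)
{
    size_t blocks = count / 8;   /* number of full 64-byte blocks */
    size_t i, j;

    assert(count == 0 || (dst != NULL && src != NULL));

    for (i = 0; i < blocks; i++) {
        /* Prefetch the next block (two cache lines) while this one is
         * copied; skipped on the last block so no pointer leaves the array. */
        if (i + 1 < blocks) {
            __builtin_prefetch(src + 8);
            __builtin_prefetch(src + 12);
        }

        /* Load all eight values into registers first, then store them. */
        double r0 = src[0], r1 = src[1], r2 = src[2], r3 = src[3];
        double r4 = src[4], r5 = src[5], r6 = src[6], r7 = src[7];
        dst[0] = r0; dst[1] = r1; dst[2] = r2; dst[3] = r3;
        dst[4] = r4; dst[5] = r5; dst[6] = r6; dst[7] = r7;

        src += 8;
        dst += 8;
    }

    /* Copy the remaining (count % 8) doubles one at a time. */
    for (j = 0; j < count % 8; j++)
        dst[j] = src[j];
}
```

Grouping the eight loads before the eight stores mirrors the unrolling described above; the prefetch is only a hint, so the loop stays correct even where it has no effect.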

I think improving memory throughput from about 360 to 630 MB/sec
would have a huge impact on many applications.

Freevec does not improve memory throughput as much as libmotovec does.
But maybe I'm doing something wrong with Freevec here.

For more PowerPC memory benchmarks, see:
glibc benchmarks

Cheers
Gunnar


Posted: Thu May 18, 2006 9:50 pm

Joined: Thu Apr 07, 2005 10:40 am
Posts: 35
What would be the easiest way to get your applications to use these AltiVec-optimized functions instead of the glibc-provided ones?


Posted: Fri May 19, 2006 2:08 am

Joined: Tue Nov 02, 2004 2:11 am
Posts: 161
Quote:
What would be the easiest way to get your applications to use these AltiVec-optimized functions instead of the glibc-provided ones?
The easiest option for all users, and the best for Linux, would be
for Linux to use the optimized libc functions that Mac OS X uses.
Apple has optimized functions for every PowerPC CPU (G3/G4/G5).
OS X benchmarks the functions on your system at startup and then uses the best one for you.

The Apple routines are up to 80% faster than the "simpleminded" ones that Linux uses. As the Apple source is free, there is actually no excuse for Linux not to use them.
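The startup selection described above can be sketched as "benchmark once, then dispatch through a function pointer". This is a hypothetical illustration of the idea, not Apple's implementation: the names `select_copy`, `best_copy` and `byte_copy`, the candidate list, the 64 KB test buffer and the repetition count are all assumptions, and `byte_copy` merely stands in for a CPU-specific routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical candidate: a plain byte-at-a-time copy. */
static void *byte_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Candidates to choose from; a real system would list per-CPU routines. */
static const copy_fn candidates[] = { memcpy, byte_copy };

static copy_fn best_copy;  /* set once by select_copy() */

/* Time each candidate on a test buffer and keep the fastest. */
static void select_copy(void)
{
    static unsigned char a[1 << 16], b[1 << 16];
    clock_t best_time = 0;
    size_t i;
    int rep;

    best_copy = NULL;
    for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        clock_t start = clock();
        for (rep = 0; rep < 200; rep++)
            candidates[i](b, a, sizeof a);
        clock_t elapsed = clock() - start;
        if (best_copy == NULL || elapsed < best_time) {
            best_copy = candidates[i];
            best_time = elapsed;
        }
    }
    assert(best_copy != NULL);
}
```

After `select_copy()` runs once at startup, callers go through `best_copy()`, so the choice costs one indirect call per copy instead of a check on every call.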


If you don't want to wait for Linux to adopt proper PPC functions and want to build your application with the AltiVec routines now, you just need to put the library on the linker command line before the compiler's libc.
Examples of how to do this for each compiler are in the motovec readme.


Cheers
Gunnar


PowerDeveloper.org: Copyright © 2004-2012, Genesi USA, Inc.