(reposted here, on Sven's advice)
Hi guys,
I've just completed my first altivec program, a very simple one indeed and I have to say I'm really impressed!
I did an optimized version of a common routine strfill(), quite common (and used in this form in MySQL), which fills a given string with a given char. Here are the benchmarks for both scalar and altivec versions, with different string sizes:
Code:
#size scalar altivec
13 5 2
16 6 1
27 12 3
64 23 2
90 29 3
128 43 3
185 59 5
256 81 6
347 111 8
512 164 11
831 277 18
2048 686 40
3981 1316 86
(Note: I used both 'random' sizes and powers of 2, but this didn't seem to have any impact.)
and here is the code that produced this output:
Code:
#include <altivec.h>
#include <stdio.h>
#include <time.h>
// This one was shamelessy stolen and adapted from Apple's Altivec tutorial
vector unsigned char inline vec_ldsplatchar(unsigned char splatchar) {
vector unsigned char splatmap = vec_lvsl(0, &splatchar);
vector unsigned char result = vec_lde(0, &splatchar);
splatmap = vec_splat(splatmap, 0);
return vec_perm(result, result, splatmap);
}
unsigned char *vec_strfill(unsigned char *s, int len, char p) {
int i;
vector unsigned char sm = vec_ldsplatchar(p);
vector unsigned char *v1 = (vector unsigned char *)s;
vector unsigned char vec_a;
for (i=0; i < len-1; i = i+16) {
vec_a = vec_ld(0, v1);
vec_a = vec_splat(sm, 0);
vec_st(vec_a, i, s);
}
return s;
}
unsigned char *strfill(unsigned char *s, int len,char fill)
{
while (len--) *s++ = fill;
*(s) = '\0';
return(s);
} /* strfill */
int main( void )
{
int i, j, k, max, loops = 100000000;
time_t dt1, dt2, t1, t0;
int sizes[] = {13,16,27,64,90,128,185,256,347,512,831,2048,3981,8192,10000};
printf("#size\tscalar\taltivec\n");
for (k = 0; k < 16; k++) {
max = sizes[k];
unsigned char __attribute__ ((aligned(16))) test[max];
unsigned char __attribute__ ((aligned(16))) splatchar = rand();
t0 = time(NULL);
for (j=0; j < loops; j++) {
test[max] = '\0';
strfill(test, max, splatchar);
}
t1 = time(NULL);
dt1 = t1 - t0;
t0 = time(NULL);
for (j=0; j < loops; j++) {
vec_strfill(test, max, splatchar);
}
t1 = time(NULL);
dt2 = t1 - t0;
printf("%ld\t%ld\t%ld\n", max, dt1, dt2);
}
return 0;
}
I have to say, this code has bugs and is not very optimized, I'm a newbie when it comes to Altivec. But it did convince me that even a newbie like me can write
fast Altivec code! I'm posting here for comments and maybe ideas to make it faster, safer etc, so if I've done something exceedingly stupid in this code, please tell me
Also, I used a simple time() function, as I couldn't get pmon to work for me on 2.6.8.
Thanks to Luca for his excellent comments and pointers! Thanks to Genesi for designing such a nice system as Pegasos 2, the more I use it, the more I like it
Regards
Konstantinos