All times are UTC-06:00




Posted: Mon Mar 14, 2005 7:08 am

Joined: Wed Oct 13, 2004 7:26 am
Posts: 348
I'd like to rewrite these two functions
toLower(char *a)
toUpper(char *a)

in AltiVec. The original functions actually take a single char as parameter, but where they are used I could easily call vectorized versions instead. My guess is that they would be considerably faster, but I would have to use a lookup table or something like that. Could anyone give me some pointers on lookup tables in AltiVec?

Thanks :-)

Konstantinos


Posted: Mon Mar 14, 2005 7:05 pm

Joined: Fri Sep 24, 2004 1:39 am
Posts: 103
Location: Gothenburg, Sweden
I wrote the following two functions for one of the CrabFire filters. They require no memory lookups so they will not pollute the data cache or hog the memory bus.
Code:
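#include <altivec.h>  /* for GCC with -maltivec */

vector unsigned char vec_tolower(vector unsigned char str)
{
    /* From Holger Bettag's table of constants */
    vector unsigned char A = vec_rl(vec_splat_u8(4), vec_splat_u8(4));  /* 0x40 = 'A' - 1 */
    /* vec_rl uses only the low 3 bits of the count: 0x0b rotated left by 3 is 0x58 */
    vector unsigned char Z = vec_or(vec_rl(vec_splat_u8(0xb), vec_splat_u8(0xb)), vec_splat_u8(0xb));  /* 0x5b = 'Z' + 1 */
    vector unsigned char diff = vec_rl(vec_splat_u8(1), vec_splat_u8(5));  /* 0x20 = 'a' - 'A' */

    vector bool char gt = vec_cmpgt(str, A);    /* str >= 'A' */
    vector bool char lt = vec_cmplt(str, Z);    /* str <= 'Z' */
    vector bool char mask = vec_and(gt, lt);    /* lanes holding an uppercase letter */
    vector unsigned char small = vec_add(str, diff);
    return vec_sel(str, small, mask);           /* take 'small' only where mask is set */
}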
Code:
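vector unsigned char vec_toupper(vector unsigned char str)
{
    /* From Holger Bettag's table of constants */
    vector unsigned char a = vec_rl(vec_splat_u8(3), vec_splat_u8(5));  /* 0x60 = 'a' - 1 */
    /* vec_avg rounds up: (0 + 0xf5 + 1) >> 1 = 0x7b = 'z' + 1, so 'z' itself passes the strict compare */
    vector unsigned char z = vec_avg(vec_splat_u8(0), vec_splat_u8(-11));
    vector unsigned char diff = vec_rl(vec_splat_u8(1), vec_splat_u8(5));  /* 0x20 = 'a' - 'A' */

    vector bool char gt = vec_cmpgt(str, a);    /* str >= 'a' */
    vector bool char lt = vec_cmplt(str, z);    /* str <= 'z' */
    vector bool char mask = vec_and(gt, lt);    /* lanes holding a lowercase letter */
    vector unsigned char upper = vec_sub(str, diff);
    return vec_sel(str, upper, mask);           /* take 'upper' only where mask is set */
}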
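
For checking routines like these at the range boundaries, the compare-and-select pattern can be modeled one byte lane at a time in plain C. This is only a sketch under my own assumptions: the names lane_tolower and lane_toupper are made up for illustration, and the bounds are written as the exclusive limits '@' and '[' (respectively '`' and '{') so that 'A'..'Z' and 'a'..'z' are covered in full.

```c
/* Scalar model of a single byte lane; this is not AltiVec code.
   lane_tolower and lane_toupper are hypothetical names for illustration. */

unsigned char lane_tolower(unsigned char c)
{
    unsigned char gt   = (c > 0x40) ? 0xFF : 0x00;   /* like vec_cmpgt: c >= 'A' */
    unsigned char lt   = (c < 0x5B) ? 0xFF : 0x00;   /* like vec_cmplt: c <= 'Z' */
    unsigned char mask = gt & lt;                    /* like vec_and */
    unsigned char low  = (unsigned char)(c + 0x20);  /* like vec_add, wraps mod 256 */
    return (unsigned char)((c & ~mask) | (low & mask));  /* like vec_sel */
}

unsigned char lane_toupper(unsigned char c)
{
    unsigned char gt   = (c > 0x60) ? 0xFF : 0x00;   /* c >= 'a' */
    unsigned char lt   = (c < 0x7B) ? 0xFF : 0x00;   /* c <= 'z' */
    unsigned char mask = gt & lt;
    unsigned char up   = (unsigned char)(c - 0x20);  /* like vec_sub, wraps mod 256 */
    return (unsigned char)((c & ~mask) | (up & mask));
}
```

Running all 256 byte values through such a model and comparing against the vector routine's output is a cheap way to catch an off-by-one in either bound.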


Posted: Tue Mar 15, 2005 2:36 am

Joined: Mon Oct 11, 2004 12:49 am
Posts: 35
The most basic table lookup in AltiVec is a single vector permute (vec_perm): the 32 table entries reside in the 'left' and 'right' data vectors, and 16 indices of 5 bits each reside in the 'permute control' vector.
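
As a rough illustration of what that single permute computes, here is a scalar model in plain C (not AltiVec code, so it runs anywhere). The function name perm_model is made up; on real hardware the whole loop corresponds to one vec_perm(left, right, control).

```c
/* Scalar model of vec_perm used as a 32-entry table lookup.
   perm_model is a hypothetical name for illustration. */
void perm_model(const unsigned char left[16],
                const unsigned char right[16],
                const unsigned char control[16],
                unsigned char out[16])
{
    for (int i = 0; i < 16; i++) {
        /* Only the low 5 bits of each control byte are used. */
        unsigned idx = control[i] & 0x1F;

        /* Indices 0..15 read from 'left', 16..31 from 'right'. */
        out[i] = (idx < 16) ? left[idx] : right[idx - 16];
    }
}
```

Because the upper three bits of each control byte are ignored, an index of 35 reads the same entry as an index of 3. That is why larger tables need the extra compare-and-select steps.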

To extend this to larger lookup tables, you need to compare and select based on the higher index bits. For example, for a table with 64 entries, you do two 32-entry lookups as explained above, one per half of the table. Then you isolate bit number 5 of each index (the one valued as 32) with a mask, compare the result to zero, and use the result of the comparison to select between the initial two lookups.

In practice you'd compute the boolean mask in parallel with the lookups (they happen in independent execution units).
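
The 64-entry case can be sketched in plain C as well. This is a scalar model rather than AltiVec code, and lookup64_model is a made-up name; the comments name the vector operation each step would presumably map to.

```c
/* Scalar model of a 64-entry lookup on 16 byte lanes; not AltiVec code.
   lookup64_model is a hypothetical name for illustration. */
void lookup64_model(const unsigned char table[64],
                    const unsigned char index[16],
                    unsigned char out[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned low5 = index[i] & 0x1F;

        /* Two 32-entry lookups (two vec_perm operations on real hardware). */
        unsigned char from_lo = table[low5];        /* entries  0..31 */
        unsigned char from_hi = table[32 + low5];   /* entries 32..63 */

        /* Isolate bit 5 and compare with zero (vec_and, then vec_cmpeq). */
        unsigned char bit5_clear = ((index[i] & 0x20) == 0) ? 0xFF : 0x00;

        /* Pick one of the two lookups per lane (vec_sel). */
        out[i] = (unsigned char)((from_lo & bit5_clear) | (from_hi & ~bit5_clear));
    }
}
```

Index bits 6 and 7 are ignored in this model, so indices wrap modulo 64.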

As you can see, the decision tree can be recursively extended to include more significant bits, up to the limit of a full 8-bit lookup table. It looks inelegant, but it is almost always a win over scalar code indexing an actual char array in memory. The biggest problem is that the lookup table and the invariant values occupy a lot of vector registers: a full 256-entry table alone fills 16 of the 32 vector registers. So there is not much headroom to do more calculation right before or after the lookup (you wouldn't want the compiler to spill and refill registers to/from memory).
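
To make the decision tree concrete, here is a plain C sketch of one byte lane of a full 8-bit lookup. It is a scalar model, not AltiVec code, and lookup256_lane is a made-up name. In a vector version each step would work on 16 lanes at once: eight permutes over the 32-entry slices of the table, then seven selects driven by three masks derived from index bits 5, 6 and 7.

```c
/* Scalar model of one lane of a 256-entry lookup built from 32-entry slices.
   lookup256_lane is a hypothetical name for illustration. */
unsigned char lookup256_lane(const unsigned char table[256], unsigned char idx)
{
    unsigned char r[8];
    unsigned n = 8;

    /* Eight 32-entry lookups, one per slice, all using the low 5 index bits. */
    for (unsigned k = 0; k < 8; k++)
        r[k] = table[32 * k + (idx & 0x1F)];

    /* Fold the candidates pairwise, one index bit per level: 8 -> 4 -> 2 -> 1. */
    for (unsigned bit = 5; bit < 8; bit++) {
        unsigned char mask = (idx & (1u << bit)) ? 0xFF : 0x00;
        n /= 2;
        for (unsigned k = 0; k < n; k++)
            r[k] = (unsigned char)((r[2 * k] & ~mask) | (r[2 * k + 1] & mask));
    }
    return r[0];
}
```

Filling the table with a lowercase mapping (entry i holds i + 32 for 'A'..'Z' and i otherwise) would give a lookup-based counterpart to the toLower function from the first post.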


PowerDeveloper.org: Copyright © 2004-2012, Genesi USA, Inc. The Power Architecture and Power.org wordmarks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org.