A few days ago I implemented CAST-256 in C and was unsatisfied with how GCC used (or rather didn't use) the PowerPC ISA's less common instructions.
Code:
Type 1: I = ((Kmi + D) <<< Kri)
O = ((S1[Ia] ^ S2[Ib]) - S3[Ic]) + S4[Id]
This excerpt from RFC 2612 describes the first of the three F functions used in CAST-256. In C, I used the following macro to implement it:
Code:
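/* Expects an unsigned 32-bit variable I in scope and a rol() helper
   (defined elsewhere) that takes the rotation count first. */
#define F1(D, R, M) \
( \
I = ( (M) + (D) ), \
I = rol( (R), I ), \
( ( ( SBox1[I >> 24] ^ SBox2[(I >> 16) & 0xFF] ) - SBox3[(I >> 8) & 0xFF] ) \
+ SBox4[I & 0xFF] ) \
)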
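For experimenting on other platforms, here is a self-contained function version of the macro. It is a sketch under some assumptions: `rotl32` stands in for the original `rol()` helper, `cast256_f1` and `cast256_f1_selfcheck` are hypothetical names, and the four S-box arrays are placeholders that need the S1 to S4 tables from RFC 2612. The self-check overwrites them with toy values only to exercise the byte indexing, the rotation and the wrap-around subtraction.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder tables: fill these with S1..S4 from RFC 2612
   (256 32-bit entries each). Globals start out zeroed. */
uint32_t SBox1[256], SBox2[256], SBox3[256], SBox4[256];

/* Stand-in for rol(): rotate x left by n bits (n mod 32). The "& 31" on
   the right shift keeps n == 0 from shifting by 32, which is undefined. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Type 1 function: I = (Km + D) <<< Kr, Ia = most significant byte of I.
   uint32_t arithmetic wraps mod 2^32, as the RFC's + and - require. */
uint32_t cast256_f1(uint32_t D, unsigned Kr, uint32_t Km)
{
    uint32_t I = rotl32(Km + D, Kr);
    return ((SBox1[I >> 24] ^ SBox2[(I >> 16) & 0xFF])
            - SBox3[(I >> 8) & 0xFF]) + SBox4[I & 0xFF];
}

/* Toy check: overwrites the tables with S1[i] = S3[i] = i, S2 = S4 = 0. */
void cast256_f1_selfcheck(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        SBox1[i] = i; SBox2[i] = 0; SBox3[i] = i; SBox4[i] = 0;
    }
    assert(cast256_f1(0x12000000u, 0, 0) == 0x12u);       /* Ia = 0x12 */
    assert(cast256_f1(0x00000012u, 8, 0) == 0xFFFFFFEEu); /* 0 - S3[0x12] wraps */
    assert(cast256_f1(1u, 0, 0x11FFFFFFu) == 0x12u);      /* Km + D = 0x12000000 */
}
```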
No matter what I tried, GCC always used at least three instructions to calculate an S-box index whenever more than a simple right shift or mask was needed. For (I >> 16) & 0xFF, GCC emitted srwi, andi. and slwi (the final shift scales the index by 4, because every S-box entry is a 32-bit word). These are three dependent operations, whereas a single rlwinm $1, $4, 18, 22, 29 would do the job.
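A small C model of rlwinm makes it easy to confirm that the single instruction computes the same offset as the srwi/andi./slwi chain. rlwinm rotates left by SH and then ANDs with a mask of ones from bit MB to bit ME, in IBM bit numbering where bit 0 is the most significant bit; with MB = 22 and ME = 29 the mask is 0x3FC. The helper names below are made up for illustration. The same value can also be written in C as the single shift-and-mask `(I >> 14) & 0x3FC`, a pre-scaled byte offset; whether GCC turns that form into one rlwinm is not verified here.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate x left by n bits (n mod 32), like rotlw/rlwinm. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* MASK(mb, me) in IBM bit numbering (bit 0 = most significant bit).
   When mb > me the run of ones wraps around, as the ISA defines it. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;        /* bits mb..31 set */
    uint32_t to_me   = 0xFFFFFFFFu << (31 - me); /* bits 0..me set  */
    return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* C model of rlwinm rA, rS, sh, mb, me: rotate left, then AND with mask. */
uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}

/* The value GCC built with srwi, andi. and slwi: byte Ib of I, times 4. */
uint32_t offset_three_ops(uint32_t I)
{
    return ((I >> 16) & 0xFF) << 2;
}

/* The same value as one shift-and-mask (pre-scaled byte offset). */
uint32_t offset_one_mask(uint32_t I)
{
    return (I >> 14) & 0x3FC;
}

/* Compare all three forms on pseudo-random inputs (simple LCG). */
void check_rlwinm_equivalence(void)
{
    assert(ppc_mask(22, 29) == 0x3FCu);
    uint32_t x = 0x12345678u;
    for (int k = 0; k < 100000; k++) {
        assert(rlwinm(x, 18, 22, 29) == offset_three_ops(x));
        assert(offset_one_mask(x) == offset_three_ops(x));
        x = x * 1664525u + 1013904223u;
    }
}
```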
Is anyone here interested in the source once it works?
Current version of F1 in assembler:
Code:
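; Dst0, Dst1, Dst2, Dst3, Src
; DstK = byte K of Src (byte 0 = most significant) * 4, i.e. a byte offset
; into a table of 32-bit words. Dst3 is written last, so it may be Src.
.macro Split_Word_Mul_4
rlwinm $0, $4, 10, 22, 29
rlwinm $1, $4, 18, 22, 29
rlwinm $2, $4, 26, 22, 29
rlwinm $3, $4, 2, 22, 29
.endmacro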
; Dst0 & Idx0, .. , Dst3 & Idx3 (each register holds a byte offset on entry and the loaded S-box entry on exit)
.macro Load_SBoxes
lwzx $0, SBox1, $0
lwzx $1, SBox2, $1
lwzx $2, SBox3, $2
lwzx $3, SBox4, $3
.endmacro
; Dst0, Dst1, Dst2, Dst3, Src
.macro SBoxes
Split_Word_Mul_4 $0, $1, $2, $3, $4
Load_SBoxes $0, $1, $2, $3
.endmacro
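; Tmp0 = F1(Data), with masking key Mask and rotation key Rota
.macro F1
; I = (Mask + Data) <<< Rota
add Tmp3, Mask, Data
rotlw Tmp3, Tmp3, Rota
; Tmp0..Tmp3 = S1[Ia], S2[Ib], S3[Ic], S4[Id]
SBoxes Tmp0, Tmp1, Tmp2, Tmp3, Tmp3
; Tmp0 = ((S1[Ia] ^ S2[Ib]) - S3[Ic]) + S4[Id]
xor Tmp0, Tmp0, Tmp1
sub Tmp0, Tmp0, Tmp2
add Tmp0, Tmp0, Tmp3
.endmacro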
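To check the assembler against the RFC formula without running it on a PowerPC, the macro chain can be modelled in C one instruction at a time and compared with a plain C version on pseudo-random inputs. This is a sketch under a few assumptions: the function names are hypothetical, `lwzx` is modelled as a word load from byte address base + offset (so the SBox1 to SBox4 base registers must not be r0, which lwzx reads as zero), and the S-boxes are random placeholder tables rather than the RFC 2612 ones. The model also spells out that `sub rD, rA, rB` is the extended mnemonic for `subf rD, rB, rA` and computes rA - rB.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate x left by n bits (n mod 32), like rotlw/rlwinm. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* lwzx rD, rA, rB: load the 32-bit word at byte address rA + rB. */
static uint32_t lwzx(const uint32_t *base, uint32_t byte_offset)
{
    uint32_t w;
    memcpy(&w, (const unsigned char *)base + byte_offset, sizeof w);
    return w;
}

/* Instruction-by-instruction model of the F1 macro; result in Tmp0. */
uint32_t f1_asm_model(uint32_t Data, uint32_t Mask, uint32_t Rota,
                      const uint32_t *s1, const uint32_t *s2,
                      const uint32_t *s3, const uint32_t *s4)
{
    uint32_t Tmp0, Tmp1, Tmp2, Tmp3;
    Tmp3 = Mask + Data;              /* add   Tmp3, Mask, Data */
    Tmp3 = rotl32(Tmp3, Rota & 31);  /* rotlw: low 5 bits of Rota */
    Tmp0 = rotl32(Tmp3, 10) & 0x3FC; /* Split_Word_Mul_4: Ia * 4 */
    Tmp1 = rotl32(Tmp3, 18) & 0x3FC; /* Ib * 4 */
    Tmp2 = rotl32(Tmp3, 26) & 0x3FC; /* Ic * 4 */
    Tmp3 = rotl32(Tmp3, 2) & 0x3FC;  /* Id * 4; Src is overwritten last */
    Tmp0 = lwzx(s1, Tmp0);           /* Load_SBoxes */
    Tmp1 = lwzx(s2, Tmp1);
    Tmp2 = lwzx(s3, Tmp2);
    Tmp3 = lwzx(s4, Tmp3);
    Tmp0 = Tmp0 ^ Tmp1;              /* xor Tmp0, Tmp0, Tmp1 */
    Tmp0 = Tmp0 - Tmp2;              /* sub Tmp0, Tmp0, Tmp2 (Tmp0 - Tmp2) */
    Tmp0 = Tmp0 + Tmp3;              /* add Tmp0, Tmp0, Tmp3 */
    return Tmp0;
}

/* RFC 2612 Type 1 formula with plain byte indices, for comparison. */
uint32_t f1_reference(uint32_t D, uint32_t Km, uint32_t Kr,
                      const uint32_t *s1, const uint32_t *s2,
                      const uint32_t *s3, const uint32_t *s4)
{
    uint32_t I = rotl32(Km + D, Kr & 31);
    return ((s1[I >> 24] ^ s2[(I >> 16) & 0xFF]) - s3[(I >> 8) & 0xFF])
           + s4[I & 0xFF];
}

/* Compare both on pseudo-random tables and inputs (simple LCG). */
void f1_compare(void)
{
    static uint32_t t1[256], t2[256], t3[256], t4[256];
    uint32_t s = 1;
    for (int i = 0; i < 256; i++) {
        t1[i] = s = s * 1664525u + 1013904223u;
        t2[i] = s = s * 1664525u + 1013904223u;
        t3[i] = s = s * 1664525u + 1013904223u;
        t4[i] = s = s * 1664525u + 1013904223u;
    }
    for (int k = 0; k < 10000; k++) {
        uint32_t d = s = s * 1664525u + 1013904223u;
        uint32_t m = s = s * 1664525u + 1013904223u;
        uint32_t r = s = s * 1664525u + 1013904223u;
        assert(f1_asm_model(d, m, r, t1, t2, t3, t4)
               == f1_reference(d, m, r, t1, t2, t3, t4));
    }
}
```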
P.S.: I didn't reformat the code sections; they are just copied and pasted from Xcode.