Power Developer
https://powerdeveloper.org/forums/

sse3 vs altvec
https://powerdeveloper.org/forums/viewtopic.php?f=23&t=643

Author:  DaBlitz [ Fri Jun 16, 2006 2:23 am ]
Post subject:  sse3 vs altvec

I have seen some reports stating that SSE3 has finally allowed Intel to catch up with the AltiVec instruction set in terms of speed. Would anyone knowledgeable care to comment on this?

Author:  Grzegorz Kraszewski [ Fri Jun 16, 2006 8:37 am ]
Post subject:  Re: sse3 vs altvec

Quote:
I have seen some reports stating that SSE3 has finally allowed Intel to catch up with the AltiVec instruction set in terms of speed. Would anyone knowledgeable care to comment on this?
I don't think so. Look at the comparison of CPU performance in the distributed.net contests. I think it is a good comparison because the dnetc cores are heavily optimized for each architecture and, of course, use all available SIMD units. Then look at the results:

RC5-72

The fastest single-core x86 is an Athlon 64 at about 2.6 GHz, with 11.7 Mkeys/s. My poor 7447 at 1.0 GHz (Pegasos II) does 10.7 Mkeys/s. A more reasonably clocked PPC, like a 7447 at 1.7 GHz, does 17.5 Mkeys/s.

OGR-25

The fastest single-core x86 is again an Athlon 64, with 30 Mnodes/s at 2.6 GHz. That is significantly better than my Pegasos (22 Mnodes/s). But then again, a 7447 at 1.7 GHz does 39.5 Mnodes/s...

My conclusion is that they are still trying to catch up with PowerPC, but they haven't done so yet. And I do not even want to guess the dnetc throughput of the Cell processor...
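To put the figures quoted in this post on a per-clock footing, here is a minimal Python sketch that divides each result by the clock speed. The numbers are the ones given above; the names RESULTS, per_ghz and normalized are made up for illustration, and throughput per GHz is only a rough yardstick, not something dnetc itself reports.

```python
# Figures quoted in the post: (CPU, clock in GHz, RC5-72 Mkeys/s, OGR-25 Mnodes/s).
RESULTS = [
    ("Athlon 64", 2.6, 11.7, 30.0),
    ("7447 (Pegasos II)", 1.0, 10.7, 22.0),
    ("7447", 1.7, 17.5, 39.5),
]


def per_ghz(rate, ghz):
    """Throughput divided by clock frequency."""
    return rate / ghz


def normalized(results):
    """Return (cpu, RC5 Mkeys/s per GHz, OGR Mnodes/s per GHz) rows."""
    return [(cpu, per_ghz(rc5, ghz), per_ghz(ogr, ghz))
            for cpu, ghz, rc5, ogr in results]


if __name__ == "__main__":
    for cpu, rc5, ogr in normalized(RESULTS):
        print(f"{cpu:18s} {rc5:6.2f} Mkeys/s/GHz  {ogr:6.2f} Mnodes/s/GHz")
```

With these inputs the Athlon 64 works out to 4.5 Mkeys/s per GHz on RC5-72, against roughly 10.3 to 10.7 for the two 7447 entries.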

Author:  DaBlitz [ Sat Jun 17, 2006 12:15 am ]
Post subject: 

Wow, after looking at that, it makes me wonder what the OSW will get.

What I see seems to confirm what you say. However, I am not sure what types of optimisations the program uses, but I do know that the Power architecture is "cleaner". In my opinion, x86 does some things in weird ways.

Author:  lu_zero [ Sat Jun 17, 2006 7:42 am ]
Post subject: 

The main issue with AltiVec is that too few people work on it and too few applications are getting optimized for it.

You may have fun with programs like John the Ripper and see the difference between the altivectorized pieces and the non-altivectorized ones.

AltiVec is an impressive tool, quite a bit easier to use than any other SIMD extension, and it gives quite nice results.

Please consider that G4s aren't exactly the "latest technology", and still some things run on them with reasonable performance.

(That said, I should go back to profiling H.264 in FFmpeg to improve Romain's code...)

Author:  Crest [ Sat Jun 17, 2006 12:19 pm ]
Post subject: 

Intel's most obvious problem is that they use only 64-bit buses for SSE3, which limits their throughput to half of AltiVec's. But this is only the most obvious problem. To mention just a few others:
- the lack of a permutation unit.
- high register pressure.
- the 2-operand instruction format destroys the first source.

SSE3's only advantage is its support for 64-bit floats.
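As a rough illustration of what a permutation unit provides, the sketch below models in plain Python the byte-select behaviour of AltiVec's vperm instruction (vec_perm in C): every result byte is chosen from the 32-byte concatenation of two source registers by the low 5 bits of the matching control byte. This is a byte-level model only, not real intrinsics code; the function name vperm and the sample control vectors are assumptions of this sketch.

```python
def vperm(a, b, control):
    """Model of vperm: result[i] = (a + b)[control[i] & 0x1F]."""
    if not (len(a) == len(b) == len(control) == 16):
        raise ValueError("all three operands must be 16 bytes long")
    table = bytes(a) + bytes(b)  # 32-byte lookup table built from both sources
    return bytes(table[c & 0x1F] for c in control)


if __name__ == "__main__":
    a = b"ABCDEFGHIJKLMNOP"
    b = b"abcdefghijklmnop"

    # Byte-reverse the first source.
    reverse = bytes(range(15, -1, -1))
    print(vperm(a, b, reverse))

    # Interleave the first eight bytes of each source.
    merge = bytes(x for i in range(8) for x in (i, 16 + i))
    print(vperm(a, b, merge))
```

Because the result goes to a separate register, neither source is overwritten, which ties in with the point about the 2-operand format above.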

Nevertheless, AltiVec will be beaten by SSE3 in the near future, because Freescale is the only major CPU developer using AltiVec. AltiVec has been unchanged for over 7 years now, while SSE has been updated twice since its introduction. In order to secure AltiVec's speed advantage, I would recommend the following steps:
- increasing the register width to 512 bits.
- improving the workflow for non-DSP and non-multimedia algorithms by adding instructions such as loads and stores that take their offset from an AltiVec register, or by removing the operand size limitations of multiplications.
- support for 64-bit data types (ints and floats).
- for FFTs: support for complex numbers.
- 3 vFPU modes: DAZ, Java, IEEE 754r (with sticky bits in status registers instead of exceptions).
- as in some TI DSPs, one bit in each AltiVec instruction indicating whether it has to be processed in a new cycle or in the current cycle.
- optionally: constant registers for bitmasks, rotation counts, etc., which could only be used as the second source operand (selectable by an SPR like VRSAVE).
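
The vector-indexed load in the second suggestion is a proposal, not something AltiVec offers. The sketch below is a purely hypothetical Python model of what such a "gather" might do. The name gather_words, the four-lane layout, and the big-endian 32-bit word size are all assumptions made for illustration.

```python
def gather_words(memory, base, offsets):
    """Hypothetical vector-indexed load: read one big-endian 32-bit word
    per lane from memory[base + offset]."""
    lanes = []
    for off in offsets:
        start = base + off
        if start < 0 or start + 4 > len(memory):
            raise IndexError(f"lane address {start} is out of range")
        lanes.append(int.from_bytes(memory[start:start + 4], "big"))
    return lanes


if __name__ == "__main__":
    memory = bytes(range(64))
    # Four lanes, each with its own byte offset taken from an "index vector".
    print([hex(w) for w in gather_words(memory, 0, [0, 4, 8, 60])])
```

Without such an instruction, each lane's address has to go through the integer registers, so a gather costs one load per lane plus the work of merging the results.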

[Sorry posted in wrong language]

Intel's problem is that they have only 64-bit buses inside the CPU, which already cuts their throughput in half. But this is only the most conspicuous problem. To name just a few of the others:
- no permutation unit.
- too few registers.
- the 2-operand format destroys the first source.

Their only advantage, in my view, is that SSE has handled 64-bit doubles since SSE2.

Nevertheless, in the long run AltiVec will be beaten by SSE, because only Motorola still relies on AltiVec, whereas SSE is already on the market in its third version. Nothing has changed in the AltiVec instruction set for at least 7 years. To secure AltiVec's performance lead for the coming years, I would suggest the following:
- increasing the register width to 512 bits.
- improvements for processing non-multimedia and non-DSP algorithms, e.g. memory accesses with AltiVec registers as offsets, and multiplications wider than 16 bit x 16 bit -> 32 bit.
- support for 64-bit data types.
- for FFTs: support for complex numbers.
- 3 vFPU modes: DAZ, Java, IEEE 754r (with sticky bits in status registers, no exceptions).
- as in some TI DSPs, a bit in the instruction that indicates whether it must be executed in a new cycle or still in parallel with the preceding ones.
- perhaps constant registers for masks, rotation counts, etc., which appear only as the second operand (selectable via an SPR similar to VRSAVE).

What other suggestions do you have for developing AltiVec further?

Author:  lu_zero [ Sat Jun 17, 2006 6:38 pm ]
Post subject: 

Quote:
Nevertheless, AltiVec will be beaten by SSE3 in the near future, because Freescale is the only major CPU developer using AltiVec. AltiVec has been unchanged for over 7 years now, while SSE has been updated twice since its introduction. In order to secure AltiVec's speed advantage, I would recommend the following steps:
- increasing the register width to 512 bits.
Too much as int, IMHO, but you are probably already thinking about 4 long doubles.
Quote:
- improving the workflow for non-DSP and non-multimedia algorithms by adding instructions such as loads and stores that take their offset from an AltiVec register, or by removing the operand size limitations of multiplications.
A bit hard to achieve without introducing other limitations, again IMHO.
Quote:
- support for 64-bit data types (ints and floats).
4 64-bit ints could be interesting...
Quote:
- for FFTs: support for complex numbers.
NO, PLEASE NO, too "complex"
Quote:
- 3 vFPU modes: DAZ, Java, IEEE 754r (with sticky bits in status registers instead of exceptions).
=_= Doesn't look sane. OK, I'm not so fond of floats and Java.
Quote:
- as in some TI DSPs, one bit in each AltiVec instruction indicating whether it has to be processed in a new cycle or in the current cycle.
- optionally: constant registers for bitmasks, rotation counts, etc., which could only be used as the second source operand (selectable by an SPR like VRSAVE).
Having too much means paying high prices for high complexity.

AltiVec is also nice because it is easy to code with.

Surely adding a MIMD extension with 32 x 512-bit registers could be interesting.

Still, it's a matter of tradeoffs. x86 has had loads of exotic features: few use them, so there is lots of silicon fat in your CPU that is good only for wasting energy.

An AltiVec II extension should be as simple and to the point as the previous one. Having support for wider vectors and just a few more operators would be more useful and wouldn't turn our CPUs into pink hydrogen-propelled elephants...

Still, just a few applications are enjoying AltiVec. I'd try to get more from what we have before looking for something else.

Author:  popper [ Sat Jun 17, 2006 10:00 pm ]
Post subject: 

Quote:
The main issue with AltiVec is that too few people work on it and too few applications are getting optimized for it.

You may have fun with programs like John the Ripper and see the difference between the altivectorized pieces and the non-altivectorized ones.

AltiVec is an impressive tool, quite a bit easier to use than any other SIMD extension, and it gives quite nice results.

Please consider that G4s aren't exactly the "latest technology", and still some things run on them with reasonable performance.

(That said, I should go back to profiling H.264 in FFmpeg to improve Romain's code...)
I note that you place H.264 and FFmpeg as an afterthought, and that's a shame.

I really hope that you, and indeed all the AltiVec people here (are there that many these days?), will really go to town on the whole open AVC/H.264 code base, so that we can use our G4/G5-based machines for DVB encoding/decoding at a reasonable rate at the very least.

There seems to be a pure lack of will to improve all the open audio/video code, and that's a shame.

Perhaps one day soon that will change. I hope so...

For instance, nothing in the open code base comes anywhere close to the CoreAVC decoder (written by optimising open source demo writers of old, gone commercial for AVC), and that's a big problem even for x86 end users. PPC users could benefit massively if the PPC AVC code base were reworked to maximise its potential, and in turn that might spur the x86 code base to take notice and enable improvements all round :)

Author:  DaBlitz [ Sun Jun 18, 2006 7:02 am ]
Post subject: 

Some interesting points there.

I do feel, however, that x86 is becoming a bit too popular: all they have to do is say they are the best and everyone believes them. Personally I am an ARM fan, but I am really starting to like these PPC (pun intended) chips due to the huge multimedia performance they have.

Increasing the register width sounds great, but I think that's the point where you are turning it into a graphics/CPU hybrid. I think that's great, but I can see that keeping the AltiVec processor fed would get harder and harder (I guess that's why the bus width to RAM is so huge). I can't wait to see these OSWs in action: if they are the chip I think they are, then there should be some crazy memory bus transfer capability.

Has anyone got some good AltiVec tutorials they could point me to?

Author:  DaBlitz [ Sun Jun 18, 2006 7:05 am ]
Post subject: 

I just remembered that there is a retargetable library for mathematics that has optimisations for everything (AltiVec, SSE, MMX) and many math and media libraries, providing a generic acceleration framework. It won't give you the best performance, but it will give you a good boost. I will see if I can track it down again, as it could help people integrate SIMD into their code and help make AltiVec'orising multimedia apps easier.

Author:  Crest [ Sun Jun 18, 2006 8:50 am ]
Post subject: 

Quote:
Some interesting points there.
Increasing the register width sounds great, but I think that's the point where you are turning it into a graphics/CPU hybrid. I think that's great, but I can see that keeping the AltiVec processor fed would get harder and harder (I guess that's why the bus width to RAM is so huge). I can't wait to see these OSWs in action: if they are the chip I think they are, then there should be some crazy memory bus transfer capability.
Feeding AltiVec on a G4 can get really hard. The best way to solve this is, IMHO, local memory like in the Cell SPEs.

Author:  lu_zero [ Sun Jun 18, 2006 11:26 am ]
Post subject: 

Quote:
There seems to be a pure lack of will to improve all the open audio/video code, and that's a shame.
You should first check the source before making such complaints...

H.264 is one of the codecs that is quite well covered by AltiVec optimizations. I'm still not happy with it, but it isn't unoptimized at all!

It isn't lack of will but lack of time.

Keep in mind that people working on ffmpeg are using their free time.

Thank you for spitting in my face.

Author:  lu_zero [ Sun Jun 18, 2006 11:28 am ]
Post subject: 

Quote:
Has anyone got some good AltiVec tutorials they could point me to?
http://www.simdtech.org/altivec/documents/

Could be a good start.

Author:  popper [ Sun Jun 18, 2006 12:12 pm ]
Post subject: 

Quote:
Quote:
There seems to be a pure lack of will to improve all the open audio/video code, and that's a shame.
You should first check the source before making such complaints...

H.264 is one of the codecs that is quite well covered by AltiVec optimizations. I'm still not happy with it, but it isn't unoptimized at all!

It isn't lack of will but lack of time.

Keep in mind that people working on ffmpeg are using their free time.

Thank you for spitting in my face.
lu_zero, that was NOT my intent, and I'm sorry that you believe it was. Thanks again for your good work. I merely wanted to draw attention to these matters so that they become far better known outside a small group of people, and perhaps have far more people look and help where they are able. You have now clarified that, and I'm hopeful now that in time all things will be great.

When I say 'lack of will', it does not mean a slagging off or a slap in the face, if you will, NOT by ANY stretch of the imagination. It means there's far too much other important stuff going on to expand on the work already done.

You know, like the garden needs the grass and hedges cutting, but work and family take too much of your will to get around to doing it just yet, nothing more.

Author:  tarbos [ Sun Jun 18, 2006 12:30 pm ]
Post subject: 

Quote:
Nevertheless, AltiVec will be beaten by SSE3 in the near future, because Freescale is the only major CPU developer using AltiVec.
Absolutely not!

-IBM PPC970
-IBM/MS Waternoose Xbox 360 CPU
-STI Cell
-IBM POWER6
-P.A. Semi PWRficient

Author:  Donar [ Sun Jun 18, 2006 1:47 pm ]
Post subject: 

Quote:
Intel's most obvious problem is that they use only 64-bit buses for SSE3, which limits their throughput to half of AltiVec's...
That's right, and I know the headline is about SSE3... but I just wanted to say that it seems like Intel will introduce SSE4 with their new Core 2 processor line, widening their bus to 128 bits and adding some instructions. This will surely increase (maybe in some cases double) throughput. So they seem to have learned a bit from PPC...

All times are UTC-06:00