All times are UTC-06:00




PostPosted: Fri Aug 18, 2006 1:15 pm 

Joined: Tue Aug 15, 2006 5:08 am
Posts: 50
Location: France
Hi,

I ran this simple little program on a PowerPC and a Sparc machine:

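#include <stdio.h>

int main(void)
{
    float a;
    float b;
    int i;

    /* First run: divide by 3 ten times, then multiply back ten times. */
    a = 42222;
    b = 3;

    for (i = 0; i < 10; i++)
    {
        a = a / b;
    }
    for (i = 0; i < 10; i++)
    {
        a = a * b;
    }
    printf("%f\n", a);

    /* Second run: the same round trip with 100 iterations each way. */
    a = 42222;
    b = 3;

    for (i = 0; i < 100; i++)
    {
        a = a / b;
    }

    for (i = 0; i < 100; i++)
    {
        a = a * b;
    }

    printf("%f\n", a);

    return 0;
}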

The original goal was to compare C and Java. I ran the Java version on an x86, a Sparc and a PowerPC: they all give the same result, and it differs from the C result. Nothing wrong with that, Java does not compute floats the way C does.

Java gives this on all of those machines:
42222.008
41887.47

But what about x86 and its strange FPU? A friend ran the C version on an x86, and the result differs both from C on PowerPC and Sparc and from Java!

The PowerPC and Sparc machines give these results:
42222.007812
41887.468750

The x86 gave less precise results. So does anyone have access to an x86 with GCC, to help me test how much less precise x86 is?


PostPosted: Fri Aug 18, 2006 2:26 pm

Joined: Fri Aug 18, 2006 12:58 pm
Posts: 4
Don't know much about x86 and Sparc. It's also pretty hard to speculate without the actual assembly outputs, as I don't know what kind of optimizations the compiler produces.

In any case, if you are testing on Freescale's MPC74[4,5]x family of processors, be aware that all single precision instructions except division are done in double precision format behind the instruction format facade, and then converted back to single precision to fit the operand size. Notice that conversion may involve round-to-nearest, which gives the performing architecture an accuracy advantage.

Thus, I speculate that you take for granted your PowerPC multiply code's better accuracy, expecting the same from other architectures that may not do such conversion.


PostPosted: Fri Aug 18, 2006 2:44 pm

Joined: Tue Aug 15, 2006 5:08 am
Posts: 50
Location: France
I didn't use any special optimisations in the compiler. I'm not benchmarking anything related to speed. I just wanted the FPU to run "as is".

But I had never heard about single to/from double precision conversion on PowerPC. And how do you explain that Sparc produces the same results? (Sparc can compute in quad precision, so it could work from/to single via quad precision and give better results than PowerPC.)

But this test can be done with double instead of float, if you prefer.


PostPosted: Fri Aug 18, 2006 4:36 pm

Joined: Fri Aug 18, 2006 12:58 pm
Posts: 4
What's your model of comparison: single precision accuracy of PowerPC vs Solaris vs x86?

You can't just do this by comparing the ISAs. You need the actual processor names and their FPU characteristics. You also need to know what assembly is being generated by your compiler:

IEEE754 defines most properties, but issues like intermediate representation are left to hardware implementers to worry about. Also, it is imperative to know if the compiler treats your "float" declarations as "double" ones (for each particular architecture). Ultimately, you do need to look at the disassembly.

Actually, I don't even know what the purpose of your experiment is. If you're trying to find a hardware bug or just playing, you need to learn about the above topics and how they pertain to what you are trying to find out.


PostPosted: Fri Aug 18, 2006 4:40 pm

Joined: Fri Aug 18, 2006 12:58 pm
Posts: 4
Correction to my previous post: change "for each particular architecture" to "for each particular platform".


PostPosted: Sat Aug 19, 2006 1:58 am

Joined: Tue Aug 15, 2006 5:08 am
Posts: 50
Location: France
I'm using gcc with no options on both platforms (Mac OS X and Solaris).

The goal of this test is to find someone who could run it with gcc on an x86 machine, in order to see how different the results are on that particular processor, given that Sparc and PowerPC produce the same result, which can be considered the reference.

My goal is to show that if you do numerical computing, you can switch to or from PowerPC and Sparc (I hope to find a Mips too), because those processors are good citizens in their FPU implementation. But you cannot use x86 "transparently" to replace well-running Risc machines.


PostPosted: Mon Aug 21, 2006 10:25 am
Site Admin

Joined: Fri Sep 24, 2004 1:39 am
Posts: 1589
Location: Austin, TX
Quote:
My goal is to show that if you do numerical computing, you can switch to or from PowerPC and Sparc (I hope to find a Mips too), because those processors are good citizens in their FPU implementation. But you cannot use x86 "transparently" to replace well-running Risc machines.
If you use x86 like Apple uses x86 then you will get an 80-bit intermediate floating point precision if you use double-precision numbers.

Therefore you could do BETTER if you use x86 over a well-running RISC box, if you needed double precision numbers, with higher precision.

An alternative on PowerPC or SPARC might be to use the vector unit (there are lots of generic multiprecision math libraries around), however this is also available on x86.

So I don't think you can prove that one way is better than the other.

Also, you should try switching the PowerPC CPU in and out of Java FPU mode and see if it makes any difference for you. You may need a more comprehensive test to see the actual difference, however.

_________________
Matt Sealey


PostPosted: Mon Aug 21, 2006 11:34 am

Joined: Tue Aug 15, 2006 5:08 am
Posts: 50
Location: France
Quote:
Therefore you could do BETTER if you use x86 over a well-running RISC box, if you needed double precision numbers, with higher precision.
It SHOULD be better, since x86 uses pseudo-registers with an 80-bit format. But my early tests (on an x86 lent by a friend) show that it wasn't. I think that the 80-bit -> 32-bit -> 80-bit -> 32-bit conversions, and so on, give poor precision compared to PowerPC (and Sparc...). Double 64-bit registers should give other results, but let's test.

So, as I don't have an x86 machine with GCC (I'm lucky), I'm looking for someone here who could help me carry out the tests.


PostPosted: Tue Aug 22, 2006 2:03 am

Joined: Mon Feb 06, 2006 5:51 pm
Posts: 3
The 80-bit extended double precision format on the Intel/AMD machines is typically for the storage of intermediate results. As far as precision is concerned, however, you must be sure that the rounding mode used by the FPU is the same on both processors. This issue can be compounded if you store your intermediate results as 32-bit values and then load them back into the FPU: rounding takes effect at each store, so it may take place several times over the course of a block of calculations. The more stores and loads, the greater the loss of precision may turn out to be.
Performance-wise, if you are running PowerPC vs. x86, then PowerPC will outperform the x87, due in no small part to the hideous stack-based programming model of that processor (unless you manage to do something horribly wrong).
Comprehensive testing probably requires you to implement your tests in assembler, simply because differences in the implementation of a high-level language could transparently skew the results.


PostPosted: Thu Aug 24, 2006 1:54 pm

Joined: Tue Aug 15, 2006 5:08 am
Posts: 50
Location: France
Thanks for the clear information.

But the goal was to test the better precision of PowerPC (and Sparc) using GCC, as many programs (including scientific programs!! :cry: ) are built with GCC. I'm thinking of Folding@home or SETI@home...

Those guys had better pay attention to this...


Post subject: ... and x86 precision
PostPosted: Mon Aug 28, 2006 11:45 am

Joined: Tue Aug 15, 2006 5:08 am
Posts: 50
Location: France
So, I've got results for x86 from a friend of mine.

42222.007813 : for 10 iterations
41887.468750 : for 100 iterations

So x86 produces strange results. After 10 iterations the error is larger than on PowerPC; after 100 iterations it gives the same result.

I'm still looking for someone who can help me test this program with different compilation options on x86.

I've tested it on a Power machine, using AltiVec for the float computation, and it gives the same results. That's quite logical: on PowerPC we are on a robust, clear and coherent architecture.

But, as we can see, people can mix architectures in a cluster. You can use Risc processors (PowerPC and Sparc) together, but you can't mix x86 with anything else...


PostPosted: Tue Aug 29, 2006 5:54 am
Genesi

Joined: Fri Sep 24, 2004 1:39 am
Posts: 1422
Hi Christian, can you make a detailed statement here on this thread exactly what you need to know/want to be supported on? We will direct some people here to have a look. Thanks.

R&B :)

_________________
http://bbrv.blogspot.com


PostPosted: Wed Aug 30, 2006 1:32 pm

Joined: Mon Feb 06, 2006 5:51 pm
Posts: 3
Hi there
OK, the discrepancy you see is probably down to one of a few possible factors.
First off, it is necessary to ensure that the rounding used by the AltiVec unit on Power and by the FPU on x86 has the same setting (usually round-to-nearest, but it can typically be set to round up, round down or round toward zero).
The next variable is whether interim values are being stored in main memory, rather than kept in the FPU or vector unit for the duration. If this is the case, then the least significant bits of your float may be nuked every time the float is stored, as the rounding methods are operational here too, and this could vary from language to language, and across implementations of the same language on different processors.
Finally, there may actually be a nominal error caused by the routine which decodes your float and prints it on screen (in fact I can guarantee this, because I tested various methods some months back to determine just what kind of error I might expect from different float-to-text conversion methods).
I would recommend that you write your own routine to perform these calculations and save the results to a data file. Then write a program which decodes this data file using a routine that you have written yourself, to ensure that the method for decoding the float is the same across languages and processors (this may require assembly). Also ensure that the rounding method is identical across processors. Otherwise any results you get are skewed by default.
In a nutshell, eliminate all but one variable at a time.
Hope this helps.


PowerDeveloper.org: Copyright © 2004-2012, Genesi USA, Inc. The Power Architecture and Power.org wordmarks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org.