Here is a comparison of all the systems I could get my hands on.
 
The first benchmark performs the compares needed to sort an array of 10,000 elements.
The array contains the sort key and some additional selected columns (128 bytes).
Good memory bandwidth is key to scoring well here.
 
The second benchmark performs the compares needed to sort an array of 1,000 elements.
The smaller number of elements makes better use of the cache, so most systems improve here.
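
To make the setup of the two benchmarks a bit more concrete, here is a rough sketch in C. It is not the code that produced the scores. The record layout (a 16-byte key, which is my assumption, plus the 128 bytes of extra columns), the names Row, sort_and_count and working_set_bytes, and the use of qsort() are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record layout: a sort key plus 128 bytes of extra columns.
   The 16-byte key length is an assumption, not taken from the benchmark. */
enum { KEY_LEN = 16, PAYLOAD_LEN = 128 };

typedef struct {
    char key[KEY_LEN];
    char payload[PAYLOAD_LEN];
} Row;

/* Number of compares performed by the most recent sort. */
static unsigned long compare_count;

/* qsort() callback: compares only the keys, but every call touches a
   whole Row in memory, which is what stresses the memory system. */
static int compare_rows(const void *a, const void *b)
{
    const Row *ra = (const Row *)a;
    const Row *rb = (const Row *)b;
    compare_count++;
    return memcmp(ra->key, rb->key, KEY_LEN);
}

/* Size of the data the sort has to walk over, in bytes. */
size_t working_set_bytes(size_t n)
{
    return n * sizeof(Row);
}

/* Fill n rows with pseudo-random keys, sort them, and return how many
   compares the sort needed. */
unsigned long sort_and_count(size_t n)
{
    Row *rows = malloc(n * sizeof *rows);
    uint32_t state = 12345u; /* small LCG, so runs are repeatable */

    assert(rows != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < KEY_LEN; j++) {
            state = state * 1664525u + 1013904223u;
            rows[i].key[j] = (char)('a' + (state >> 24) % 26);
        }
        memset(rows[i].payload, 0, PAYLOAD_LEN);
    }

    compare_count = 0;
    qsort(rows, n, sizeof *rows, compare_rows);

    /* Sanity check: the keys must now be in ascending order. */
    for (size_t i = 1; i < n; i++)
        assert(memcmp(rows[i - 1].key, rows[i].key, KEY_LEN) <= 0);

    free(rows);
    return compare_count;
}
```

With this assumed layout, sort_and_count(10000) walks over about 1.4 MB of data and sort_and_count(1000) over about 144 KB, which is the kind of difference that decides whether the working set fits into the cache. To get scores rather than compare counts, one would wrap the qsort() call in a timer such as clock().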
 
A few things to keep in mind when looking at the benchmark:
a) This is just an isolated test, not a comprehensive CPU benchmark.
b) GCC was used for all tests.
GCC does not generate equally good code for all platforms.
I think the code quality for x86 is quite good,
but the rather disappointing scores of the Itanium
and the SPARC indicate that GCC is not the best choice for these systems.
c) The PPC systems benefit greatly from the hand-optimized code,
so the test is biased by design!
The purpose of the test was to show how effective even simple optimization of time-critical routines can be.
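
To illustrate what a simple optimization of a compare routine could look like, here is a small sketch in portable C. This is not the hand-optimized PPC code used in the test. It is only an assumed example of the general idea, namely skipping over equal data in 8-byte chunks instead of looking at every byte. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Baseline: compare two keys one byte at a time. */
int compare_bytewise(const unsigned char *a, const unsigned char *b, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}

/* Illustrative optimization (an assumption, not the benchmark's code):
   skip equal 8-byte chunks, then let the byte loop settle the order.
   memcpy() keeps the loads legal for unaligned data, and the byte-wise
   finish makes the result independent of the machine's endianness. */
int compare_wordwise(const unsigned char *a, const unsigned char *b, size_t len)
{
    size_t i = 0;

    while (len - i >= 8) {
        uint64_t wa, wb;
        memcpy(&wa, a + i, 8);
        memcpy(&wb, b + i, 8);
        if (wa != wb)
            break; /* the first difference lies inside this chunk */
        i += 8;
    }
    return compare_bytewise(a + i, b + i, len - i);
}

/* Small self-check: both routines must agree on the ordering. */
void check_compare_agreement(void)
{
    const unsigned char x[] = "benchmark-key-0001";
    const unsigned char y[] = "benchmark-key-0002";
    size_t len = sizeof x - 1;

    assert(compare_wordwise(x, y, len) == compare_bytewise(x, y, len));
    assert(compare_wordwise(x, x, len) == 0);
    assert(compare_wordwise(y, x, len) > 0);
}
```

How much a change like this helps depends on the key length, the compiler and the CPU, so treat it as an illustration of the principle rather than a measured result.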

Some analysis:
The G3 is somewhat limited by its small cache: it cannot improve its throughput when working on longer strings, as they do not fit in the cache. Nevertheless, for its low clock rate the G3 scores quite well.
The G5 has very good memory bandwidth. This is clearly visible in the green (memory-bound) benchmark.
When working on arrays that fit better into the cache, the G4 and the G5 achieve very similar results.
Cheers
Gunnar