All times are UTC-06:00
 Post subject: MySQL benchmarks
PostPosted: Sat Oct 01, 2005 9:06 pm 

Joined: Tue Nov 02, 2004 2:11 am
Posts: 161
I've benchmarked a PPC-optimized MySQL char compare function.
The char compare function is used for sorting database results (ORDER BY).

The current PPC function is scalar only, as the MySQL sort often works with short, misaligned strings.
I'll now try to use Markos' AltiVec code for longer strings.
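
For readers who have not seen such a routine: the following is only a minimal portable C sketch of what a scalar, byte-wise compare for short strings can look like. It is not MySQL's actual compare function and not the PPC assembly benchmarked here; the name `short_str_cmp` and its signature are made up for illustration. It works one byte at a time, so misaligned inputs need no special handling.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only; NOT the MySQL routine.
 * Compares two byte strings as unsigned bytes and returns -1, 0 or 1. */
static int short_str_cmp(const unsigned char *a, size_t a_len,
                         const unsigned char *b, size_t b_len)
{
    size_t n = a_len < b_len ? a_len : b_len;   /* length of common prefix */

    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;        /* first differing byte decides */
    }
    if (a_len == b_len)
        return 0;                               /* identical strings */
    return a_len < b_len ? -1 : 1;              /* shorter string sorts first */
}
```

A hand-optimized version would typically compare several bytes per step; this sketch only shows the semantics that such a routine has to preserve.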

Here is the difference between the non-PPC-optimized function and the PPC-optimized one:
Image


And here are some different systems compared:
Image



Cheers
Gunnar


 Post subject: Re: MySQL benchmarks
PostPosted: Sat Oct 01, 2005 10:54 pm 

Joined: Wed Oct 13, 2004 7:26 am
Posts: 348
Quote:
I've benchmarked a PPC-optimized MySQL char compare function.
The char compare function is used for sorting database results (ORDER BY).

The current PPC function is scalar only, as the MySQL sort often works with short, misaligned strings.
I'll now try to use Markos' AltiVec code for longer strings.
Congratulations, the results are MOST impressive!!!

You might want to wait one day, I'm working on integrating Hobold's code into memcmp/etc. as we speak :-)

Later, we might even look at the integrated sort+cmp function.

Konstantinos


 Post subject: Re: MySQL benchmarks
PostPosted: Sun Oct 02, 2005 2:50 am 

Joined: Tue Nov 02, 2004 2:11 am
Posts: 161
Seems I've read your post too late.
I've tried your libvec memcmp routines.
They are quite fast. Faster than the normal MySQL C code.

For strings < 256 bytes, the scalar PPC-asm code that I use is only a few percent faster.
For strings longer than 256 chars, AltiVec seems to get faster.
But usually the strings that will be compared in MySQL are between 1 and 255 chars, so this case is too rare to be very useful.

Another problem that I see is that when we work with bigger arrays of strings longer than 256 chars, we quickly run into a situation where the 2nd-level cache hit rate degrades. At least on my 7447 with 512KB of cache.
While in theory AltiVec could be useful for strings longer than 256 chars, it cannot increase the throughput anymore, as we are limited by the memory bus.
Code:
#2st run - CACHE performance
#comparing 1000 string elements
#size  mysql-C    PPC-ASM     libvec-scalar  libvec-Altivec
32     =341 MB/s  =1052 MB/s  =722 MB/s      =569 MB/s
64     =355 MB/s  =1273 MB/s  =869 MB/s      =1075 MB/s
120    =364 MB/s  =1454 MB/s  =967 MB/s      =902 MB/s
256    =365 MB/s  =1551 MB/s  =1024 MB/s     =1585 MB/s
512    =309 MB/s  =875 MB/s   =689 MB/s      =895 MB/s
1024   =223 MB/s  =400 MB/s   =364 MB/s      =390 MB/s
Maybe the new memcmp can beat these numbers. :-)
I'm looking forward to testing it.
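
For anyone who wants to reproduce numbers of this kind, here is a rough sketch of how such an MB/s figure could be measured in plain C. The actual benchmark harness is not shown in this thread, so everything here is an assumption: the function names are hypothetical, the elements are assumed to be fixed-size and stored contiguously, each compare is counted as `elem_size` bytes, and 1 MB is taken as 1,000,000 bytes. The standard `memcmp` stands in for the routine under test.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Bytes touched by one pass over `count` fixed-size elements (assumed layout). */
static size_t working_set_bytes(size_t count, size_t elem_size)
{
    return count * elem_size;
}

/* Compare every element with its neighbour, `passes` times, and return MB/s.
 * Returns 0.0 if there is nothing to compare or the run was too short for
 * clock() to measure. `sink` receives a value derived from the results so the
 * compiler cannot discard the loop. */
static double compare_throughput_mbs(const unsigned char *elems, size_t count,
                                     size_t elem_size, unsigned passes,
                                     unsigned long *sink)
{
    if (count < 2 || passes == 0)
        return 0.0;

    unsigned long acc = 0;
    clock_t start = clock();
    for (unsigned p = 0; p < passes; p++) {
        for (size_t i = 0; i + 1 < count; i++) {
            acc += (unsigned long)memcmp(elems + i * elem_size,
                                         elems + (i + 1) * elem_size,
                                         elem_size);
        }
    }
    clock_t stop = clock();
    *sink = acc;

    double secs  = (double)(stop - start) / CLOCKS_PER_SEC;
    double bytes = (double)passes * (double)(count - 1) * (double)elem_size;
    return secs > 0.0 ? bytes / secs / 1e6 : 0.0;
}
```

Under these assumptions, `working_set_bytes(1000, 512)` is 512,000 bytes, which is close to the whole 512KB L2 of the 7447. That would be consistent with the drop between the 256 and 512 rows in the table above, though the real harness may lay out its data differently.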

Cheers
Gunnar


 Post subject: Re: MySQL benchmarks
PostPosted: Sun Oct 02, 2005 3:06 am 

Joined: Tue Nov 02, 2004 2:11 am
Posts: 161
For those who like bar charts, I've run the benchmark on a few more platforms.

First the results for G3 (750cxe) and G5.

The G3 performance improves but it is limited by the smaller cache of the G3.
Image


The G5 performance is noticeably better on longer strings in huge arrays (10,000 elements, third bar). This can be explained by the good memory bandwidth of the G5.
Image


Cheers
Gunnar


 Post subject: Re: MySQL benchmarks
PostPosted: Sun Oct 02, 2005 3:10 am 

Joined: Wed Oct 13, 2004 7:26 am
Posts: 348
Quote:
Seems I've read your post too late.
I've tried your libvec memcmp routines.
They are quite fast. Faster than the normal MySQL C code.
Heh, nice to know that!
Quote:
Another problem that I see is that when we work with bigger arrays of strings longer than 256 chars, we quickly run into a situation where the 2nd-level cache hit rate degrades. At least on my 7447 with 512KB of cache.

While in theory AltiVec could be useful for strings longer than 256 chars, it cannot increase the throughput anymore, as we are limited by the memory bus.
Well, the nice thing you'll notice with bigger sizes is that the AltiVec routines are almost always faster, because they consistently use cache prefetching code, which is there for precisely that reason.
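
To illustrate the prefetching idea only: the libvec source is not shown in this thread, so the following is a hedged sketch and not the real routine. It uses the GCC builtin `__builtin_prefetch` as a stand-in for whatever cache hints the AltiVec code issues, and plain `memcmp` on fixed chunks as a stand-in for the vector compare. The name `prefetching_memcmp`, the chunk size and the prefetch distance are assumptions picked for the example.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK          64   /* assumed chunk size, not taken from libvec */
#define PREFETCH_AHEAD 256  /* assumed prefetch distance in bytes */

/* Hypothetical sketch: compare in chunks and hint the cache about data that
 * will be needed a few chunks later. Result has the same sign as memcmp(). */
static int prefetching_memcmp(const unsigned char *a, const unsigned char *b,
                              size_t len)
{
    size_t off = 0;

    while (len - off >= CHUNK) {
        if (off + PREFETCH_AHEAD < len) {
            /* Only a hint; the addresses stay inside both buffers. */
            __builtin_prefetch(a + off + PREFETCH_AHEAD);
            __builtin_prefetch(b + off + PREFETCH_AHEAD);
        }
        int r = memcmp(a + off, b + off, CHUNK);
        if (r != 0)
            return r;
        off += CHUNK;
    }
    /* Remaining tail of fewer than CHUNK bytes (possibly zero). */
    return memcmp(a + off, b + off, len - off);
}
```

Whether a hint like this helps depends on how far ahead it is issued and on the memory system, so the two constants would need tuning per CPU.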
Quote:
Maybe the new memcmp can beat these numbers. :-)
I'm looking forward to testing it.
As soon as I figure out Hobold's algorithm, I'll send you a modified routine :-)

Konstantinos

PS. I have thought about the integrated sort+memcmp routine and I have figured out an algorithm for that. I'll try to work on that afterwards, and if my idea works, I think we might even have a world record for AltiVec :-)


 Post subject: Re: MySQL benchmarks
PostPosted: Sun Oct 02, 2005 3:33 am 
Genesi

Joined: Fri Sep 24, 2004 1:39 am
Posts: 1422
We cannot emphasize enough how important this sort of collaboration and development is! WOW! Hey hobold! Where are you?! :-)

Great work Gunnar and Konstantinos! :-D

R&B


 Post subject: Re: MySQL benchmarks
PostPosted: Sun Oct 02, 2005 3:44 am 

Joined: Tue Nov 02, 2004 2:11 am
Posts: 161
Here is a comparison of all the systems I could get my hands on. ;-)

First, a benchmark of doing the compares needed to sort an array of 10,000 elements.
The array contains the sort key and some additional selected columns (of 128 bytes).
Good memory bandwidth is key to scoring well here.
Image


The second benchmark is doing the compares needed to sort an array of 1,000 elements.
The smaller number of elements helps to make better use of the cache. Because of this, most systems improve here.
Image
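
For reference, the two benchmarks above can be pictured roughly like the following C sketch. This is not the benchmark source: it assumes fixed-size rows, a hypothetical 32-byte key (the key size is not stated in this thread), the 128 bytes of extra columns mentioned above, and the standard `qsort` as the sort driver. The names `sort_row`, `row_cmp` and `sort_rows` are invented for the example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define KEY_LEN     32   /* assumed key size; not stated in the thread */
#define PAYLOAD_LEN 128  /* the additional selected columns */

/* One array element: sort key followed by the extra columns. */
struct sort_row {
    unsigned char key[KEY_LEN];
    unsigned char payload[PAYLOAD_LEN];
};

static unsigned long compare_calls;  /* number of compares in the last sort */

/* qsort callback: only the key takes part in the compare. */
static int row_cmp(const void *lhs, const void *rhs)
{
    const struct sort_row *a = lhs;
    const struct sort_row *b = rhs;

    compare_calls++;
    return memcmp(a->key, b->key, KEY_LEN);
}

/* Sort the rows in place and return how many compares were needed. */
static unsigned long sort_rows(struct sort_row *rows, size_t count)
{
    compare_calls = 0;
    qsort(rows, count, sizeof rows[0], row_cmp);
    return compare_calls;
}
```

With these assumed sizes a row is 160 bytes, so 10,000 rows take about 1.6 MB and 1,000 rows about 160 KB. Only the key bytes are compared, but the payload spreads the keys out in memory, which is one plausible reason why the larger array is dominated by memory bandwidth.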

A few things to keep in mind when looking at the benchmark:

a) This is just an isolated test, not a comprehensive CPU benchmark.

b) GCC was used for all tests.
GCC does not generate equally good code for all platforms.
I think the code quality for x86 is quite good,
but the rather disappointing scores of the Itanium
and the Sparc indicate that GCC is not the best choice for these systems.

c) The PPC systems benefit greatly from the hand-optimized code.
So the test is biased by design!

The purpose of the test was to show how effective even simple optimization of time-critical routines can be.

Some analysis:
The G3 is a bit limited by its small cache: it cannot improve the throughput when working on longer strings, as they do not fit in the cache. Nevertheless, for its low clock rate the G3 scores quite well.

The G5 has very good memory bandwidth. This is clearly visible in the green, memory-bound benchmark.

When working on arrays that fit better into the cache, the G4 and the G5 score very similarly.

Cheers
Gunnar

