All times are UTC-06:00




Post new topic  Reply to topic  [ 6 posts ] 
Author Message
PostPosted: Sat Sep 24, 2005 10:25 pm 
Offline

Joined: Tue Nov 02, 2004 2:11 am
Posts: 161
I would like to change some routines in MySQL to get better performance.
It would be great if you could read the below and give me your feedback.

SHORT:
Today I'm looking for the optimal strncmp function to compare strings of 120 chars.


LONG
I'm looking for a the best sort algorythm for strings of size of about 120 chars.

WHY - A big part of MySQL involves sorting.
Often the database server fetches a number of records which need then to be sorted
The sorting function in MySQL is called "filesort".
Filesort it is a sort which normally operates on a memory buffer
but can operate with limited memory by storing sorted buffers to disk.
Filesort uses quicksort or radixsort depending on the number of rows to sort.

The majority of the sorts are done using the quicksort algorythm with a memory buffer. So my hope is to optimize this case.
I think that optimizing this common case will be worthwhile as e.g.:
the Sysbench database benchmark spents 20% of its time sorting arrays
of 100 element containing strings of 120 char length.


BUT HOW CAN WE OPTIMIZE THIS EASELY?

The mysql quicksort currently uses a scaler strncmp function which works on bytes.
I feel that this functions could be easely improved.

The function looks like this:

#define cmp(N) if (first[N] != last[N]) return (int) first[N] - (int) last[N]

static int ptr_compare_0(uint *compare_length,uchar **a, uchar **b)
{
reg3 int length= *compare_length;
reg1 uchar *first,*last;

first= *a; last= *b;
loop:
cmp(0);
cmp(1);
cmp(2);
cmp(3);
if ((length-=4))
{
first+=4;
last+=4;
goto loop;
}
return (0);
}

The main work loop runs over 4 bytes.
If the strings are not divideable by 4 then single cmp are done before the loop begins
Example:

static int ptr_compare_1(uint *compare_length,uchar **a, uchar **b)
{
reg3 int length= *compare_length-1;
reg1 uchar *first,*last;

first= *a+1; last= *b+1;
cmp(-1); /// COMPARE BEFORE THE LOOP
loop:
cmp(0);
cmp(1);
cmp(2);
cmp(3);
if ((length-=4))
{
first+=4;
last+=4;
goto loop;
}
return (0);
}

You can find the complete source in the mysql source archive in "/mysys/mf_sort.c" and "/mysys/ptr_cmp.c"


Working on single bytes is probably not the most effective solution. :-)
I assume that running on longwords will be a lot faster.
At which strlength will the usage of altivec give an advantage?


Cheers
Gunnar


Top
   
PostPosted: Sat Sep 24, 2005 10:45 pm 
Offline

Joined: Wed Oct 13, 2004 7:26 am
Posts: 348
Hi Gunnar,

Sorry I did not reply to your email, I've been sleeping at 5 am every day these past weeks, and not because of night clubbing unfortunately :-(

But that's not important, what is importantis WHY :-)

I'm almost finished with libfreevec and you should expect an announcement soon!

Please, give me a couple of days for that...

And again sorry for not replying, I certainly did not ignore your email :-)

Konstantinos

PS. This function seems quite easy to optimize, I could do it unless someone else beats me first :-)
PS2. I've already done strcmp/strncmp which works quite well on this sizes.
UPDATE: altivec strncmp is /up to/ 3.5x faster, though I'm pretty sure I can optimize it even more...


Top
   
PostPosted: Sun Sep 25, 2005 1:35 am 
Offline

Joined: Tue Nov 02, 2004 2:11 am
Posts: 161
Hi Markos,

Don't worry about the email. I kow what you mean I'm often very busy too. :-)

My hope was that this function is easy to improve.
The Sysbench benchmark spends about 6% of the time in the small strncmp function. If we could improve the performance of this functions, then this would be an easy win.

BTW we could use different strncmp functions for different str length if the altivec is better at certain length. For scoring well in the benchmark it would help to find the best strncmp for strings with 120 chars.

I'm looking forward to any proposals.

Cheers

Gunnar


Top
   
PostPosted: Sun Sep 25, 2005 3:11 am 
Offline

Joined: Wed Oct 13, 2004 7:26 am
Posts: 348
Quote:
BTW we could use different strncmp functions for different str length if the altivec is better at certain length. For scoring well in the benchmark it would help to find the best strncmp for strings with 120 chars.
Ok, do you have control as to the alignment of the data?
If we can force the data to be 16-byte aligned, then we can certainly have *great* speed up, Eg, a strncmp would need just 8 vector loads and checks for ~128 byte buffers.

For now at these sizes, my altivec version gives just 20-30% speedup. For bigger sizes (eg. 2k or 16k) though the difference reaches 200-400% (yeah I redesigned it a bit :-), even for unaligned data. Though, again I'm pretty sure I can do it faster yet.

But a little patience, I'm going to release the lib soon... :-)

Konstantinos


Top
   
PostPosted: Sun Sep 25, 2005 8:11 pm 
Offline

Joined: Tue Nov 02, 2004 2:11 am
Posts: 161
Quote:
Ok, do you have control as to the alignment of the data?
If we can force the data to be 16-byte aligned, then we can certainly have *great* speed up, Eg, a strncmp would need just 8 vector loads and checks for ~128 byte buffers.
We have control over the alignment.
Aligning to longword (4byte) should be no problem.

But aligning the data on 16 byte boundary could give problems.
Patting long arrays of small strings to 16 byte boundary
will increase the memory requirement a lot.
If we need more memory than we have buffers then we will need swap to disk.

Please mind that the max string size for sorting is typically 255 byte.
So we should look for algorythms giving best performance at max 255 byte siz.e


Cheers
Gunnar


Top
   
PostPosted: Sun Sep 25, 2005 9:25 pm 
Offline

Joined: Wed Oct 13, 2004 7:26 am
Posts: 348
Quote:
We have control over the alignment.
Aligning to longword (4byte) should be no problem.

But aligning the data on 16 byte boundary could give problems.
Patting long arrays of small strings to 16 byte boundary
will increase the memory requirement a lot.
If we need more memory than we have buffers then we will need swap to disk.

Please mind that the max string size for sorting is typically 255 byte.
So we should look for algorythms giving best performance at max 255 byte siz.e
Ok, I will send you today a strncmp function that assumes word alignment.

Konstantinos


Top
   
Display posts from previous:  Sort by  
Post new topic  Reply to topic  [ 6 posts ] 

All times are UTC-06:00


Who is online

Users browsing this forum: No registered users and 12 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
Jump to:  
PowerDeveloper.org: Copyright © 2004-2012, Genesi USA, Inc. The Power Architecture and Power.org wordmarks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org.
All other names and trademarks used are property of their respective owners. Privacy Policy
Powered by phpBB® Forum Software © phpBB Group