Power Developer https://powerdeveloper.org/forums/ |
|
GCC compile brainstorming https://powerdeveloper.org/forums/viewtopic.php?f=61&t=1533 |
Page 1 of 1 |
Author: | gunnar [ Mon Apr 14, 2008 8:42 am ] |
Post subject: | GCC compile brainstorming |
Do you think it makes sense to post here some C-code snippets together with the ASM instructions compiled by GCC as examples of how GCC operates? The examples could be used for brainstorming and to identify patterns in the behavior of GCC. Cheers Gunnar |
Author: | markos [ Mon Apr 14, 2008 8:55 am ] |
Post subject: | Re: GCC compile brainstorming |
Quote: Do you think it makes sense to post here some C-code snippets together with the ASM instructions compiled by GCC as examples of how GCC operates? The examples could be used for brainstorming and to identify patterns in the behavior of GCC. Cheers Gunnar
Hi Gunnar, however important I think the discussion here is, it is much more important to be done in the proper place, i.e. in the gcc bugtracker and gcc mailing lists. It's much more probable to be fixed there by the right persons, and even the Freescale/CodeSourcery guys are probably following these. Of course if you don't mind, it would be interesting to read these here as well :) Regards Konstantinos |
Author: | Neko [ Mon Apr 14, 2008 7:39 pm ] |
Post subject: | Re: GCC compile brainstorming |
Quote: however important I think the discussion here is, it is much more important to be done in the proper place, i.e. in the gcc bugtracker and gcc mailing lists. It's much more probable to be fixed there by the right persons, and even the Freescale/CodeSourcery guys are probably following these. Of course if you don't mind, it would be interesting to read these here as well :)
I think the discussion is very relevant here :) However, I do think compiler performance shouldn't be Gunnar's goal here. We're talking about mimicking a 68k processor on ColdFire for a specific application. At this point a 200MB/s bus bandwidth for read is about 3x more than he would expect from a 68060 with EDO SDRAM with a 60ns access time. Certain versions of the GCC compiler generate adequate - if not performant - code (later versions, seemingly, do not). We know CodeWarrior and DIAB and GreenHills do better. In the end, mimicking the m68k does not rely on the compiler but on the technique used, which - as he has very competently explained in his project and elsewhere - could be one of 3 or 4 or 5 different ways (perhaps a QEMU/UAE style virtual machine, or an instruction trap mechanism as with the 68000 fpsp or 68040/68060.library mechanisms on AmigaOS, or something like ShapeShifter/Sheepshaver MacOS emulation on the Amiga, where 90% of the instructions are run native but important differences are emulated for the purpose of separation of operating systems). This is where the important work lies. Redefining the operation of GCC4 is a waste of time. You can code the emulator now, find the best method, and fix GCC later so that it enables compilation and linking of the best method with the least amount of manual hacking. But compiler reworks are the last resort - until, that is, you hit a compiler bug that refuses to generate working code or causes exceptions or doesn't even compile, THEN it is something worth fixing :) |
Author: | markos [ Mon Apr 14, 2008 11:52 pm ] |
Post subject: | Re: GCC compile brainstorming |
I agree that the discussion about Gunnar's project and any details around it is relevant. My point was about GCC bugs that would get too technical. It's not that they *shouldn't* be here, it's that they should be on the GCC bugtracker *too*! :) After all, for any project most bugs are found elsewhere rather than on the bugtracker/mailing lists and then filed as bug reports upstream. In any case, don't mind me, please continue, it was an interesting read anyway :) Konstantinos |
Author: | gunnar [ Tue Apr 15, 2008 2:22 am ] |
Post subject: | Re: GCC compile brainstorming |
Hi Matt, Quote: At this point a 200MB/s bus bandwidth for read is about 3x more than he would expect from a 68060 with EDO SDRAM with a 60ns access time.
So far I have only measured 120 MB/sec read for the V4m. You can find a comparison of 680x0 and V4m results here: http://www.powerdeveloper.org/forums/vi ... 0621#10621 I hope this helps you. |
Author: | gunnar [ Tue Apr 15, 2008 3:25 am ] |
Post subject: | |
One GCC example: C-source Code:
void * copy_32x4a(void *destparam, const void *srcparam, size_t size)
{
    int *dest = destparam;
    const int *src = srcparam;
    int size32;

    size32 = size / 16;
    for (; size32; size32--) {
        *dest++ = *src++;
        *dest++ = *src++;
        *dest++ = *src++;
        *dest++ = *src++;
    }
}
Compile option: m68k-linux-gnu-gcc -mcpu=54455 -msoft-float -o example -Os -fomit-frame-pointer example.c
We use -Os to focus on compact code. Generated code: Code:
04: 202f 000c        movel %sp@(12),%d0
08: 226f 0004        moveal %sp@(4),%a1
0c: 206f 0008        moveal %sp@(8),%a0
10: e888             lsrl #4,%d0
12: 6022             bras 36
14: 2290             movel %a0@,%a1@
16: 2368 0004 0004   movel %a0@(4),%a1@(4)
1c: 2368 0008 0008   movel %a0@(8),%a1@(8)
22: 2368 000c 000c   movel %a0@(12),%a1@(12)
28: d3fc 0000 0010   addal #16,%a1
2e: d1fc 0000 0010   addal #16,%a0
34: 5380             subql #1,%d0
36: 4a80             tstl %d0
38: 66da             bnes 14
3a: 4e75             rts
Code length produced by GCC = 56 bytes
Length of workloop = 9 instructions, 38 bytes
Expected code: Code:
04: 202f 000c        movel %sp@(12),%d0
08: 226f 0004        moveal %sp@(4),%a1
0c: 206f 0008        moveal %sp@(8),%a0
10: e888             lsrl #4,%d0
12: 670c             beqs 20
14: 22d8             movel %a0@+,%a1@+
16: 22d8             movel %a0@+,%a1@+
18: 22d8             movel %a0@+,%a1@+
1a: 22d8             movel %a0@+,%a1@+
1c: 5380             subql #1,%d0
1e: 66f4             bnes 14
20: 4e75             rts
Expected code length = 30 bytes
Length of workloop = 6 instructions, 12 bytes
Issue 1: Why does GCC not use the condition codes already set by the preceding 68k instruction (subq.l) and instead generate an unneeded tst.l?
Issue 2: Why does GCC not use the much more efficient (An)+ addressing mode and instead use d(An) mode plus an extra add instruction? The move.l (An)+,(Am)+ instruction is 2 bytes instead of 6 bytes, and it does not need the two extra instructions to increment the pointers.
Issue 3: Assuming that GCC decided to increment a pointer manually, why does GCC use adda.l with a 32-bit immediate to do it? LEA should be the better choice for this, as it is 2 bytes shorter than the adda.l. |
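One way to experiment with Issues 1 and 2 from the post above is to write the loop so that the C source already has the shape of the expected code: guard once, then count down with the test at the bottom of the loop. This is only a sketch. The names copy_32x4_dowhile and check_copy_32x4 are hypothetical and not from the thread, the function returns destparam (the posted version returns nothing), and nothing here guarantees that a particular GCC version will pick (An)+ addressing or drop the tst.l for it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of copy_32x4a: same copy, but with the guard
   hoisted and a count-down do/while, mirroring the "expected code". */
void *copy_32x4_dowhile(void *destparam, const void *srcparam, size_t size)
{
    int *dest = destparam;
    const int *src = srcparam;
    size_t blocks = size / 16;      /* assumes 4-byte int, as in the post */

    if (blocks != 0) {              /* corresponds to the single beq */
        do {
            *dest++ = *src++;
            *dest++ = *src++;
            *dest++ = *src++;
            *dest++ = *src++;
        } while (--blocks != 0);    /* corresponds to subq.l + bne */
    }
    return destparam;
}

/* Returns 1 if exactly the whole 16-byte blocks were copied
   and the trailing elements were left untouched. */
int check_copy_32x4(void)
{
    int src[10], dst[10];

    assert(sizeof(int) == 4);
    for (int i = 0; i < 10; i++) {
        src[i] = i + 1;
        dst[i] = -1;
    }
    copy_32x4_dowhile(dst, src, sizeof src);    /* 40 bytes: 2 blocks of 16 */
    return memcmp(dst, src, 32) == 0 && dst[8] == -1 && dst[9] == -1;
}
```

Compiling it with the options from the post plus -S (m68k-linux-gnu-gcc -mcpu=54455 -msoft-float -Os -fomit-frame-pointer -S example.c) writes the assembly to example.s, so the loop can be compared against the two listings above.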
Author: | gunnar [ Tue Apr 15, 2008 4:04 am ] |
Post subject: | |
Second example: C-Code: Code:
void * write_32x4(void *destparam, const void *srcparam, size_t size)
{
    int value=1;
    int *dst = destparam;

    size = size / 16;
    for (; size; size--) {
        *dst++=value;
        *dst++=value;
        *dst++=value;
        *dst++=value;
    }
}
Generated output Code:
<write_32x4>:
0a: 202f 000c        movel %sp@(12),%d0
0e: 206f 0004        moveal %sp@(4),%a0
12: e888             lsrl #4,%d0
14: 601c             bras 32
16: 20bc 0000 0001   movel #1,%a0@
1c: 7201             moveq #1,%d1
1e: 2141 0004        movel %d1,%a0@(4)
22: 2141 0008        movel %d1,%a0@(8)
26: 2141 000c        movel %d1,%a0@(12)
2a: d1fc 0000 0010   addal #16,%a0
30: 5380             subql #1,%d0
32: 4a80             tstl %d0
34: 66e0             bnes 16
36: 4e75             rts
Generated code length = 46 bytes
Length of workloop: 9 instructions, 32 bytes
The expected result would be Code:
<write_32x4>:
0a: 202f 000c        movel %sp@(12),%d0
0e: 206f 0004        moveal %sp@(4),%a0
12: 7201             moveq #1,%d1
14: e888             lsrl #4,%d0
16: 670c             beqs 24
18: 20c1             movel %d1,%a0@+
1a: 20c1             movel %d1,%a0@+
1c: 20c1             movel %d1,%a0@+
1e: 20c1             movel %d1,%a0@+
20: 5380             subql #1,%d0
22: 66f4             bnes 18
24: 4e75             rts
Expected code length = 28 bytes
Length of workloop: 6 instructions, 12 bytes
Issue 4: We see the unneeded TST instruction again.
Issue 5: The compiler again uses a much bigger and slower addressing mode.
Issue 6: The preload of the work value into register D1 is done inside the work loop. It should be done outside of the main work loop.
Issue 7: The compiler decides to put the literal work value #1 into the work register D1, but it does not always use this register: one time it uses a literal move.l #1 instead, needlessly increasing the code size by 4 bytes.
Both GCC 4 examples have 9 instructions inside the work loop. Older GCC would solve the same task using a work loop of only 6 instructions. Generally the new GCC 4 code is bigger and a lot slower than before. |
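The byte counts quoted in both examples can be rechecked from the listing addresses alone: a size is the address just past the last instruction minus the address of the first one. A small sketch of that arithmetic follows. The struct and function names are hypothetical, and the "end" addresses assume that the final rts and bnes are 2 bytes each, as their opcodes in the listings show.

```c
#include <assert.h>
#include <stdio.h>

/* One row per listing: first instruction, address just past the final rts,
   first loop instruction, address just past the loop's closing bnes.
   All addresses are read off the listings in the posts above. */
struct listing {
    const char *name;
    unsigned start, end, loop_start, loop_end;
};

static const struct listing listings[] = {
    { "copy_32x4a generated", 0x04, 0x3c, 0x14, 0x3a },
    { "copy_32x4a expected",  0x04, 0x22, 0x14, 0x20 },
    { "write_32x4 generated", 0x0a, 0x38, 0x16, 0x36 },
    { "write_32x4 expected",  0x0a, 0x26, 0x18, 0x24 },
};

unsigned total_bytes(const struct listing *l) { return l->end - l->start; }
unsigned loop_bytes(const struct listing *l)  { return l->loop_end - l->loop_start; }

/* Compares the computed sizes with the figures quoted in the posts. */
void verify_posted_sizes(void)
{
    assert(total_bytes(&listings[0]) == 56 && loop_bytes(&listings[0]) == 38);
    assert(total_bytes(&listings[1]) == 30 && loop_bytes(&listings[1]) == 12);
    assert(total_bytes(&listings[2]) == 46 && loop_bytes(&listings[2]) == 32);
    assert(total_bytes(&listings[3]) == 28 && loop_bytes(&listings[3]) == 12);
}

/* Prints one line per listing with its total and loop size. */
void print_sizes(void)
{
    for (size_t i = 0; i < sizeof listings / sizeof listings[0]; i++)
        printf("%-22s total %2u bytes, loop %2u bytes\n",
               listings[i].name,
               total_bytes(&listings[i]),
               loop_bytes(&listings[i]));
}
```

Calling verify_posted_sizes() followed by print_sizes() from a main() is enough to run it. In both examples the work loop accounts for the whole difference between generated and expected size (26 bytes in the copy loop, 20 in the write loop, while the total of the write example shrinks by only 18 because the expected code adds a 2-byte moveq before the loop).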
Author: | gunnar [ Tue May 06, 2008 1:25 am ] |
Post subject: | |
To help improve the code generated by GCC for ColdFire/68K, I've filed bugs 36133, 36134, 36135, and 36136 in the GCC bug tracker. |
Author: | kaltst [ Tue May 06, 2008 2:24 am ] |
Post subject: | gcc 3.4? |
Hi Gunnar, do you know, by chance, whether the code generated by gcc 3.4 suffers from the same problems? |
All times are UTC-06:00 |