All times are UTC-06:00




Author Message
 Post subject:
PostPosted: Fri Mar 14, 2008 2:22 am 
Offline

Joined: Mon Jan 08, 2007 3:40 am
Posts: 195
Location: Pinto, Madrid, Spain
I have further responses from my brother. I'm aware that you may think I'm hijacking this thread with this discussion, but he has been investigating a lot and deserves a further post here for his valuable time. I'm also aware that it might be edited out. Here it goes:
Quote:
Quote:
Also remember the DIU is *not* like typical PC AGP/PCI Express graphics
engines; it does not rely on a PCI arbiter to grant it time to access the graphics RAM, it is just a DAC
which spins over some memory areas, and an interface to the same RAM the CPU uses.
In fact, all my assumptions are based on the fact of an INTERNAL LCD controller unit, NOT a PCI/AGP-based one. I'm not talking about PCs but SoCs.
The point is precisely that the LCD controller uses the SAME memory and the same bus the main core uses. That means the LCD controller is "stealing" clock cycles for each pixel drawn on screen.
Quote:
well, imagine you did a non-displaying benchmark of decode performance of a 1080p video on a command line. The speed will be the same.
If the internal LCD controller works as I guess (I have no access to technical datasheets; I'm guessing an internal pixel FIFO retrieving data at the DIU clock), I'm afraid not, as I have seen on several other SoCs. Try the same benchmark with the LCD controller DISABLED (no video) and check the results - you should see a difference, based on the extra clock cycles available to the core, not used by the LCD controller.
Keep in mind that even if no PCI bus arbitration exists, there still exists an internal bus arbitration - several internal SoC devices trying to access memory.
Quote:
Are you saying a 533MHz 64-bit bus can't handle the bandwidth required?
Absolutely NOT! Of course not! The bandwidth to retrieve a full HD picture is obviously there. For instance, it is absolutely possible to render a user interface in full HD. My point is that decoding MPEG4 video (more on this later) is a really hard task for ANY processor, aggravated by the need to do the colorspace conversion in code (easy, but not "free"), re-scaling (intensive), etc.
Quite possibly, using well optimized code, it could be possible to decode MPEG4 HD video into a memory buffer, without any other overhead. Add to that the colorspace conversion (granted, on the fly, but definitely NOT free) and I guess you are pushing the core to its limits. And, if it is enabled under that stress, you lose a considerable amount of bandwidth to the internal LCD controller. For instance, color conversion of a full HD video at 30 fps involves processing 62 million pixels per second. At two or three clock ticks per pixel, that is roughly 150 MIPS devoted to colorspace conversion...
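The arithmetic behind that estimate can be checked with a short sketch. The function names are made up for illustration, and the "two or three ticks per pixel" figure is the rough guess from the paragraph above, not a measured cost for any real codec:

```python
def pixels_per_second(width, height, fps):
    """Pixels that must be processed each second for a given video mode."""
    return width * height * fps


def conversion_cycles_per_second(width, height, fps, cycles_per_pixel):
    """Clock cycles per second spent on colorspace conversion,
    assuming a fixed (guessed) cost in cycles for each pixel."""
    return pixels_per_second(width, height, fps) * cycles_per_pixel


if __name__ == "__main__":
    # Full HD at 30 fps, as in the post.
    rate = pixels_per_second(1920, 1080, 30)
    print(f"{rate / 1e6:.1f} million pixels per second")
    for ticks in (2, 3):
        cycles = conversion_cycles_per_second(1920, 1080, 30, ticks)
        print(f"{ticks} ticks/pixel: {cycles / 1e6:.0f} million cycles per second")
```

The range between two and three ticks per pixel brackets the "150 MIPS" figure quoted above.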
Quote:
Windows DirectShow, the VMR9 renderer in RGB mode is an incredibly resource intensive thing with a lot of overhead.
Probably because of the colorspace conversion and the fact that it is architected for non-overlay-enabled VGAs, so it uses plain bitblts to transfer the decoded video into video memory. The preferred method of video rendering is using the overlay filter. It's there.
Quote:
to play 1280x720p H.264 content because my 1.7GHz Pentium M, Radeon 9200 laptop manages it just as well - using the ffdshow codec and not a commercially optimized one.
To play 720p of course not. And keep in mind that ffdshow is hand-optimized (to my knowledge, up to SSE2 extensions).

In "http://en.wikipedia.org/wiki/YUV" you can find YUV to RGB conversion formulas. And yes,
DivX and XviD are MPEG4-ASP (Advanced Simple Profile) implementations, while h.264 is MPEG4 AVC (Advanced Video Coding), and are computationally equivalent DEPENDING ON THE CHOSEN PROFILE. Obviously, if I unleash a H.264 compressor, ANY CPU has it really hard, but there's no need.
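As an illustration of why the conversion is "easy, but not free", here is a minimal sketch of one per-pixel YUV to RGB conversion. It assumes the full-range BT.601 (JFIF-style) YCbCr coefficients, which is only one of several variants; the post does not say which formula the demo used, and a real decoder would use fixed-point or SIMD code rather than floating point:

```python
def ycbcr_to_rgb(y, cb, cr):
    """Convert one 8-bit full-range YCbCr pixel to 8-bit RGB.

    Coefficients are the full-range BT.601 (JFIF) ones; this choice is an
    assumption made for illustration.
    """
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    # Round to the nearest integer and clamp into the 0..255 range.
    return tuple(max(0, min(255, round(v))) for v in (r, g, b))


if __name__ == "__main__":
    # Neutral chroma (Cb = Cr = 128) should give a grey of the same level.
    print(ycbcr_to_rgb(128, 128, 128))
```

Each pixel costs several multiplies, adds and clamps, which is where the per-pixel cycle estimate comes from.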
Quote:
How many years have Microsoft been supporting video decoding on PowerPC? Not even one!
Can I pass you a CD with Windows NT 4.0 for PowerPC? It included NetMeeting, with the world famous MP43 codec that got hacked and made DivX possible (and the rest).
It's almost eight in the evening... I've been reading information since half past five... And my head feels like a drum.


Top
   
 Post subject:
PostPosted: Fri Mar 14, 2008 2:23 am 
Offline
Genesi

Joined: Fri Sep 24, 2004 1:39 am
Posts: 1422
We will see if we can't get a sample of the development board sent to a developer gathering soon.

R&B :)

_________________
http://bbrv.blogspot.com


Top
   
 Post subject:
PostPosted: Fri Mar 14, 2008 8:33 am 
Offline
Site Admin

Joined: Fri Sep 24, 2004 1:39 am
Posts: 1589
Location: Austin, TX
Quote:
In fact, all my assumptions are based on the fact of an INTERNAL LCD controller unit, NOT a PCI/AGP-based one. I'm not talking about PCs but SoCs. The point is precisely that the LCD controller uses the SAME memory and the same bus the main core uses. That means the LCD controller is "stealing" clock cycles for each pixel drawn on screen.
It does not steal clock cycles, it is allowed them by the MPX bus. Even with a relatively high resolution screen it can only take up a maximum of a quarter of the bandwidth of the entire MPX bus if it was requesting 64-bit pixels at the maximum rate possible; there is no display resolution supported which actually requires this kind of bandwidth. In fact it barely needs 1/8th the bandwidth of the MPX bus to do it, which I think you will find is fairly normal utilization by any part of the chip.

The OCeaN bus the rest of the chip is sitting on, is designed for high levels of transaction concurrency, has a non-blocking crossbar and has significant bandwidth per-port on each connected unit.

The MPX coherency module has plenty of buffer space (8 cache lines each for read and write), can read data from the CPU L2 cache back into the buffers, and is very low-latency.

It is actually very hard to imagine that you would be flooding the system bus with MPEG4 data. 1920x1080@60Hz with 4-byte RGB pixels requires 497MB/s write speed to memory, but remember the MPEG was only 30 frames per second, so it's actually half that. Add probably 10MBit/s of bandwidth for reading and maintaining the data at that resolution, plus decoding overhead, disk drive and network activity, and audio, and it is still hard to see how there would not be enough left over on a bus capable of some 5GB/s when bursting.
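Those numbers are easy to reproduce. The sketch below uses made-up function names and treats "5GB/s" as 5 x 10^9 bytes per second, which is an assumption since the post does not say whether decimal or binary units are meant:

```python
def framebuffer_write_rate(width, height, bytes_per_pixel, fps):
    """Bytes per second written when every frame is stored in full."""
    return width * height * bytes_per_pixel * fps


def bus_share(rate, bus_bytes_per_second):
    """Fraction of the peak bus bandwidth consumed by a given data rate."""
    return rate / bus_bytes_per_second


if __name__ == "__main__":
    BUS_PEAK = 5e9  # assumed reading of "some 5GB/s when bursting"
    for fps in (60, 30):
        rate = framebuffer_write_rate(1920, 1080, 4, fps)
        share = bus_share(rate, BUS_PEAK)
        print(f"{fps} fps: {rate / 1e6:.1f} MB/s, {share:.1%} of peak bus bandwidth")
```

Under these assumptions, writing the decoded 30 fps frames takes only a small percentage of the peak burst bandwidth quoted above.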

The DIU requires less than 64MB/s - less than 46MB/s in 24-bit RGB mode - of actual bandwidth to display that data. How is it "stealing" so much bandwidth as to make it an impossible challenge? You would have to be running something SO incredibly bandwidth intensive, and the chip would actually have had to be designed by a moron with far less bandwidth than required for each unit alone, let alone more than one of them working in tandem.
Quote:
you should see a difference, based on the
extra clock cycles available to the core, not used by the LCD controller.
I don't think you would. Unless you are already saturating the bus (practically impossible given the architecture of the bus), you would never notice it.

There would be a problem if doing extremely heavy DMA operations and running out of bus bandwidth; we see that on the Efika if many BestComm tasks run at once, and CPU load is extremely high, the bus does get contended and prioritisation is key for performance here. We are going to see the same problem with the MPC5121E and the DIU where performance may actually suffer for using internal graphics; but only in extremely high resolutions. Of course the MPC5121E cannot support those resolutions :)

A 1GHz Pentium M chip can decode 720p H.264 video with modest hardware assistance from an nVidia GeForce card, in the Apple TV. It can downscale 1080p video to 720p. It has an external memory bus and a northbridge, not a low-latency internal one. The graphics have to go over the PCI Express bus to the graphics chip - including all the high latencies involved. Yet the Apple TV manages it very very easily.

I am fairly sure, given the performance of the Pentium M in the laptop I have here, that a slightly faster chip could decode 1080p video; because with power management on (running at 600-1000MHz), I can decode MPEG2 and MPEG4 1080p video (and scale it down to 720p, my screen isn't that big) without skips, on Windows, with an *open source codec*.

The MPEG4 demonstration here is running *without* the overhead of an OS like Linux. It plays MPEG4 files, and it's a specially optimized codec for commercial use. It is ABSOLUTELY possible. The DIU does not make a dent in the bandwidth required and the chip itself is up to the task. With some clever management of the system resources, it could even be possible inside an OS like Linux.
Quote:
DivX and XviD are MPEG4-ASP (Advanced Simple Profile) implementations, while h.264 is MPEG4 AVC (Advanced Video Coding), and are computationally equivalent DEPENDING ON THE CHOSEN PROFILE.
Except they are not. Not at all.
Quote:
Can I pass you a CD with Windows NT 4.0 for PowerPC? It included NetMeeting, with the world famous MP43 codec that got hacked and made DivX possible (and the rest).
I already have that CD, and let me just ask: do you think Microsoft have been painstakingly maintaining their PowerPC codebase since 1996, adding AltiVec support, simultaneous multithreading, 64-bit awareness, DirectShow support, in case they ever made a high-def games console?

The answer is: No. Most of the codec support for XBox360 has been written for the XBox360, for best performance on the XBox360 - ostensibly from scratch and not for a Video For Windows codec from 1996. The more likely code path they took is to take the C reference of their WMV codec, and recompile it for POWER, then add in the optimisations where needed.

They still have some way to go. It is not in any way some kind of "proof" that you need a 3-core, 6-way multithreading, 3.2GHz 64-bit chip with improved AltiVec to decode MPEG4 video. The G4 is a significantly different - actually far more efficient - design, and there is plenty of scope in the chip to do the demo.

If you still think it's fake, then feel free to bash your head against the wall some more with your inaccurate assumptions. I can assure you, it is not fake, it is possible, and we'll have to just go so far as proving it.

_________________
Matt Sealey


Top
   
 Post subject:
PostPosted: Fri Mar 21, 2008 6:17 pm 
Offline
Genesi

Joined: Fri Sep 24, 2004 1:39 am
Posts: 1422
The new boards will be here next week. We will keep you posted.

R&B :)

_________________
http://bbrv.blogspot.com


Top
   
 Post subject: 8610 missing parts
PostPosted: Sun Mar 23, 2008 3:30 pm 
Offline

Joined: Sun Mar 23, 2008 2:39 pm
Posts: 4
Location: Sweden
Did a bit of googling for a few chips to connect to the 8610..

USB2 controller for PCI interface:
http://www.via.com.tw/en/products/perip ... b/vt6210l/

Network controller for PCIE interface:
http://www.realtek.com.tw/products/prod ... &ProdID=12

SATA controller for the PCIE interface:
http://www.siliconimage.com/products/product.aspx?id=32

AUDIO codec for the serial interface:
http://www.realtek.com.tw/products/prod ... l=5&Conn=4

Did not see any multi function chips, but maybe there are some..


Top
   
 Post subject: Re: 8610 missing parts
PostPosted: Sun Mar 23, 2008 5:26 pm 
Offline

Joined: Sun Mar 23, 2008 2:39 pm
Posts: 4
Location: Sweden
More PCIE network controllers:
http://www.intel.com/design/network/pro ... ard_ec.htm
http://www.broadcom.com/products/Small- ... ontrollers


Top
   
 Post subject: Re: 8610 missing parts
PostPosted: Mon Mar 24, 2008 7:29 am 
Offline
Site Admin

Joined: Fri Sep 24, 2004 1:39 am
Posts: 1589
Location: Austin, TX
Quote:
Did a bit of googling for a few chips to connect to the 8610..
Don't worry we can use Google too :)
Quote:
Did not see any multi function chips, but maybe there are some..
Well, there's this..

_________________
Matt Sealey


Top
   
 Post subject: Re: 8610 missing parts
PostPosted: Mon Mar 24, 2008 8:34 am 
Offline

Joined: Sun Mar 23, 2008 2:39 pm
Posts: 4
Location: Sweden
Quote:
Quote:
Did not see any multi function chips, but maybe there are some..
Well, there's this..
Hmm.. so this "A link Xpress II" interface is just a PCIE interface
with a marketing name then. Nice.

Needs Ethernet though :)


Top
   
 Post subject: Re: 8610 missing parts
PostPosted: Tue Mar 25, 2008 7:32 am 
Offline
Site Admin

Joined: Fri Sep 24, 2004 1:39 am
Posts: 1589
Location: Austin, TX
Quote:
Quote:
Hmm.. so this "A link Xpress II" interface is just a PCIE interface with a marketing name then. Nice.
Yep.
Quote:
Needs Ethernet though :)
But it has at its core everything else you would need. Ethernet is one of those things that isn't worth building into certain kinds of chips - if 10/100 ethernet was built in to the AMD SB600, then board designers who wanted gigabit ethernet would still need to source it separately. If it had gigabit ethernet, it would need a different, more expensive kind of PHY. What if you just want a board that has USB, SATA and Wireless, but not fixed ethernet? All these choices mean they can cut a couple dollars off the price of the chip and give designers more flexibility with their target market.

The same kind of decision is why there is no ethernet in the MPC8610 in the first place; the target market (image processing in the MPC8610 case, or desktop PC boards in the SB600 case) dictates that it is not an appreciated integration.

Luckily we have a wealth of ethernet chips to choose from, it's not as difficult as finding a full-featured southbridge :D

_________________
Matt Sealey


Top
   
 Post subject:
PostPosted: Fri Oct 24, 2008 12:07 pm 
Offline

Joined: Sat Jul 29, 2006 8:29 am
Posts: 7
Location: Japan
Freescale posted a datasheet of the MPC8610 which includes the power consumption spec.

http://www.freescale.com/files/32bit/do ... umentation


---------------
-lonelywild
---------------


Top
   
 Post subject:
PostPosted: Fri Nov 07, 2008 11:19 am 
Offline

Joined: Thu Nov 11, 2004 7:34 am
Posts: 130
Location: Bielefeld, FRG
Quote:
Quote:
The MPC8610 evaluation board playing a high definition (1080p) DivX video?
It HAS to be a fake. Technical reasons upon request. I've received lots of them, but would need translation.
Last weekend in Bad Bramstedt a h264 1080p video was played back smoothly on a 1.42 GHz Mac Mini running MPlayer/MorphOS. Kind of amazing what AltiVec is able to do! Since I wasn't at that event myself, I don't know the exact parameters of that demonstration, but it is impressive I'd say...


Top
   
 Post subject:
PostPosted: Fri Nov 07, 2008 2:15 pm 
Offline
Site Admin

Joined: Fri Sep 24, 2004 1:39 am
Posts: 1589
Location: Austin, TX
Quote:
Last weekend in Bad Bramstedt a h264 1080p video was played back smoothly on a 1.42 GHz Mac Mini running MPlayer/MorphOS. Kind of amazing what AltiVec is able to do! Since I wasn't at that event myself, I don't know the exact parameters of that demonstration, but it is impressive I'd say...
Well, it's definitely, definitely possible. You do need a ~1.42GHz G4 to do it, but this is your basic minimum specification for getting the data decoded and passing it to a video card overlay.

The world is changing though; ATI graphics cards after a certain point do not really support video overlay and nVidia cards have not for a longer time. To get YUV output you need to render to a texture and this incurs something of a performance penalty (since texture setup is a pain in the backside compared to copying data to a YUV buffer and changing a pointer). You do, however, gain several advantages in that this texture can be whipped around on a couple of triangles, or even a complicated mesh, and have shader operations posted on it.

When you're dealing with a high-end desktop operating system most of the time you will not have the performance available to the task doing the decoding to get that kind of output. Linux is by far the worst, Windows and MacOS X kind of hit a level playing field here both being as clunky and having too much going on in the background. This is why Apple and Microsoft's minimum requirements are somewhat artificially high.

(sometimes audio playback has an impact too, but as long as you are not doing any mixing or enhancement, it's absolutely negligible compared to the overhead of H.264 decoding and inline deblocking)

Since MorphOS has no MMU context switch to perform, an extremely lean kernel and almost direct graphics card access (you can write to the screen from any process without asking for it, although this would get you hit with a large blunt object if you told anyone this was how your app worked :), plus the rather unique architecture of MPlayer* (it just hooks things together on a kind of fast path, it is not as complex as a filter graph in Quicktime, DirectShow, GStreamer or even Reggae for a direct comparison) this all becomes very possible indeed using those demo specs.

However it is just as contrived a demo, and just as contrived a comparison with the mainstream operating systems, as it bears no comparison at all to the more complicated - and much better - multimedia frameworks available. It only goes to show that MorphOS is lighter on resource usage by a good double digit percentage figure than other operating systems, such that it can do what would previously have needed a demo running directly on the CPU with no complicated OS behind it (where the only task running is the decoder, as in the MPC8610 demo on YouTube).

This is probably one thing where MorphOS could be used to actively sell some chips, though, and show the real performance potential rather than just what Linux lets shine through a dull window.

I would love to see how MorphOS performs doing H.264 rendering through Reggae, though, that would be a sight to see, and something to be really proud of.

By the way I do wonder if PCI video memory is mapped cacheable by X.org on PowerPC. This is done on x86 (using an MTRR to mark it write-combined) to get speed boosts and it's the first thing to do when your video performance is "slow" (since the processor has to write to the graphics memory to copy the YUV data to the card for overlay or mapping to a texture)... in theory PPC processors will "store gather" to cacheable memory. I don't know if that's true of anything over the PCI bus.. but I'm going to check a couple systems in the lab right now and find out :D
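On the x86 side, the write-combined ranges can be read from /proc/mtrr on Linux. The following is only a sketch: the regular expression is written against the line format as I remember it from the kernel's MTRR documentation (older kernels print the count after the type, newer ones before it), the sample text is invented, and none of this applies to PowerPC, which has no MTRRs:

```python
import re

# Assumed /proc/mtrr line format, e.g.
# "reg01: base=0xe8000000 (3712MB), size= 128MB: write-combining, count=1"
MTRR_LINE = re.compile(
    r"reg(\d+): base=(0x[0-9a-fA-F]+) \(\s*\d+MB\), size=\s*(\d+)([KMG])B"
    r"(?:, count=\d+)?: ([a-z-]+)(?:, count=\d+)?"
)

UNIT = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def write_combining_ranges(mtrr_text):
    """Return (base_address, size_in_bytes) for each write-combining entry."""
    ranges = []
    for line in mtrr_text.splitlines():
        m = MTRR_LINE.match(line.strip())
        if m and m.group(5) == "write-combining":
            base = int(m.group(2), 16)
            size = int(m.group(3)) * UNIT[m.group(4)]
            ranges.append((base, size))
    return ranges


if __name__ == "__main__":
    # Invented sample; on a real x86 Linux box, read the text from /proc/mtrr.
    sample = (
        "reg00: base=0x00000000 (   0MB), size= 256MB: write-back, count=1\n"
        "reg01: base=0xe8000000 (3712MB), size= 128MB: write-combining, count=1\n"
    )
    for base, size in write_combining_ranges(sample):
        print(f"write-combining: base={base:#x}, size={size // UNIT['M']}MB")
```

If the graphics card's framebuffer aperture shows up in that list, the driver has marked it write-combined.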

_________________
Matt Sealey


Top
   
 Post subject:
PostPosted: Fri Nov 07, 2008 4:18 pm 
Offline

Joined: Wed Oct 25, 2006 7:18 pm
Posts: 42
Quote:
Quote:
Quote:
The MPC8610 evaluation board playing a high definition (1080p) DivX video?
It HAS to be a fake. Technical reasons upon request. I've received lots of them, but would need translation.
Last weekend in Bad Bramstedt a h264 1080p video was played back smoothly on a 1.42 GHz Mac Mini running MPlayer/MorphOS. Kind of amazing what AltiVec is able to do! Since I wasn't at that event myself, I don't know the exact parameters of that demonstration, but it is impressive I'd say...
About that 8610 above. It's not a fake. We tried it at our powerdev meeting without hardware acceleration (CPU scaling), in frame buffer mode (onboard graphics without a specific driver) using Linux, and the video was just a little bit too slow, but under those circumstances quite impressive. So with a fast OS like MorphOS, using AltiVec, overlay and a real gfx driver, it's more than just possible.

And yes, the demo in BadBramstedt was amazing :D

Geit


Top
   
PowerDeveloper.org: Copyright © 2004-2012, Genesi USA, Inc. The Power Architecture and Power.org wordmarks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org.
All other names and trademarks used are property of their respective owners. Privacy Policy