Quote:
Nice video. I'm by no means saying the i.MX515 is underpowered, far from it. My point was that where possible it's always a good idea to use H/W accelerators. The main CPU may be quite capable of doing all the work, but if you have the H/W then you can free the CPU to do other things or clock it down to save power.
Ideally the optimization should be to reduce the CPU cycles needed to do it in the first place - using NEON or VFP if it's relevant - and then move those parts that can be further accelerated to the hardware accelerators.
What I'd hate to see is people throwing an MPEG4 decoder up onto the video decoder core, and thinking that is enough, while the rest of the video pipeline is languishing in old scalar code. Once people see it has "hardware accelerated MPEG4" nobody bothers to look at the pipeline.
We got the same effect with Mesa - now that people are spending inordinate amounts of time trying to get it working on graphics cards using every acceleration method they can, they may be missing out on significant optimization opportunities for certain rendering paths which are handled using simple scalar code.
As an example, a lot of binary drivers on Windows run small performance benchmarks on boot, which basically determine whether they can produce certain results faster on the SSE2/SSE3 unit than by passing the work to the graphics card. In those cases, a highly optimized software fallback is used rather than offloading to the graphics card, in the interests of performance. This comes into its own when the latest drivers take advantage of CPU multithreading and multicore. You generally don't get to drive a graphics card from two threads and get better performance - the command pipeline is sequential and a lot of sitting around happens during processing.
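To make the boot-time benchmark idea concrete, here is a minimal sketch in Python. It is not how any real driver does it; `pick_fastest`, `software_path` and `offload_path` are hypothetical names, and the "offload" path just sleeps to stand in for the submission and readback latency of an accelerator.

```python
import time

def pick_fastest(candidates, sample, repeats=5):
    """Time each candidate on the sample and return (winner_name, timings)."""
    timings = {}
    for name, fn in candidates.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(sample)
            # Keep the best of several runs to reduce scheduling noise.
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return min(timings, key=timings.get), timings

def software_path(pixels):
    # Hypothetical CPU fallback: cheap per item, no setup cost.
    return [p * 2 for p in pixels]

def offload_path(pixels):
    # Hypothetical accelerator path: the sleep is an assumed fixed
    # submission/readback latency, not a measurement of real hardware.
    time.sleep(0.002)
    return [p * 2 for p in pixels]

candidates = {"software": software_path, "offload": offload_path}
chosen, timings = pick_fastest(candidates, list(range(256)))
print("using:", chosen)
```

For a small job like this the fixed latency dominates, so the software path gets picked; with a much larger workload or a cheaper submission cost the answer could easily flip, which is the whole point of measuring instead of assuming.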
With multi-core CPUs, multi-core GPU shader/CUDA modules and dedicated acceleration features all available together, it is usually a mistake to use just one of them alone, simply because it comes for free and handles a single use case extremely well.
Quote:
With H/W accelerators you get stuff done "for free"
Here's another good example; the guys at Tro^H^H^H QtSoftware have been implementing "raster" and "opengl" rendering modes for the Qt backend, on the basis that a full software fallback or full 3D hardware acceleration is better than using the X Render protocol.
The standard "for free" hardware accelerated compositing engine, which is ironically used by Cairo too (X.org and Cairo share a software fallback library, libpixman), is actually slower than a well designed software pipeline.
The technique for gaining extra battery life that works best these days seems to be the race to idle: get it done on the CPU as fast as you can, so you can sit idle for longer, later. This includes optimizing all the setup and preparation of data before submitting it to the hardware accelerator :)
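A rough back-of-the-envelope sketch of the race-to-idle arithmetic, in Python. All the wattages and timings here are made-up illustrative numbers, not measurements of the i.MX515 or any other chip, and `energy_joules` is a hypothetical helper.

```python
def energy_joules(active_watts, active_seconds, idle_watts, window_seconds):
    """Energy over a fixed window: busy for a while, then idle for the rest."""
    idle_seconds = window_seconds - active_seconds
    return active_watts * active_seconds + idle_watts * idle_seconds

WINDOW = 1.0       # assumed: one second between jobs
IDLE_WATTS = 0.05  # assumed idle draw

# Assumed: full clock finishes in 0.2 s at 1.0 W,
# a lower clock takes 0.4 s at 0.6 W for the same job.
fast = energy_joules(1.0, 0.2, IDLE_WATTS, WINDOW)
slow = energy_joules(0.6, 0.4, IDLE_WATTS, WINDOW)

print(f"fast then idle: {fast:.3f} J")
print(f"slow and steady: {slow:.3f} J")
```

With these particular numbers, finishing quickly and idling comes out ahead (about 0.24 J against 0.27 J). That depends entirely on how deep the idle state is and how power scales with clock speed, so on a different part the comparison may go the other way.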