I'm glad you expect this to happen at some point, cause that was going to be my next question :) - why a hard-float blob hasn't been released already. When Genesi buys hardware that uses the GPU in question and is in need of this library, you'd think the GPU manufacturer would be more eager to give you this library if it can improve performance on hardware that uses their GPU.
But again, I don't know much about the issue; releasing this library in a hard-float version could be non-trivial.
This is a historical issue. Before the advent of ARMv6 (e.g. the ARM11 core in the first iPhones), there was a profusion of different and incompatible FPU implementations for ARM. Nowadays, almost all ARM implementations have some kind of VFPv3 variant in them, at least a VFPv3-D16 (which implements only 16 double-precision registers, instead of 32).
The toolchains were designed to pass all FP arguments in integer registers for the plain reason that a CPU lacking VFP (or another compatible FPU) has no FP registers to pass them in. FP arithmetic can be emulated in integer math, but the FPU's registers cannot. Hence, the softfp ABI was a trade-off: code can still use an FPU if one exists, while keeping the same calling convention as code built for CPUs without one.
The world has moved on. Now we have a VFPv3 implementation in all the chips that are relevant to Desktop (or Notebook) Linux in any way, so it's about time to build toolchains for hard float (passing params in FPU registers).
The biggest perf hit comes from having to move each set of FP arguments from integer registers into FP registers in the initial setup code of a function. Just look at an 'objdump --disassemble' output of any program that uses the FPU and the softfp ABI.
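If you want to see this for yourself, here is a minimal sketch of a test file. The file name and the function names (fpabi.c, scale_add, caller) are made up for illustration, and the noinline attribute assumes GCC or Clang; it is only there so the call stays visible in the disassembly.

```c
/* fpabi.c (hypothetical file name): a tiny function with floating-point
 * arguments, for comparing the softfp and hard-float calling conventions. */

/* noinline is a GCC/Clang attribute; it stops the compiler from folding
 * the call away, so the argument passing shows up in the disassembly. */
__attribute__((noinline))
double scale_add(double x, double factor, double offset)
{
    /* With softfp the arguments arrive in integer registers (and on the
     * stack) and must be moved into VFP registers before this multiply. */
    return x * factor + offset;
}

double caller(void)
{
    /* Three double arguments: enough to overflow r0-r3 under softfp. */
    return scale_add(1.5, 2.0, 0.25);
}
```

Assuming a GCC cross toolchain (the exact compiler name depends on your distribution), compile it twice with -c, once with "-O2 -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp" and once with "-mfloat-abi=hard" instead, then run 'objdump --disassemble' on both object files. In the softfp build I'd expect to see vmov (or a load from the stack) at the top of scale_add to get the values out of r0-r3 and into VFP registers; in the hard-float build the arguments should already be sitting in d0-d2.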