Hi Martin, a good starting point to learn is
http://www.l4hq.org. It has the kernel API documentation for many of the CPU families that L4 and its predecessor L3 ran on, such as MIPS and Alpha.
From there the way leads to the University of Karlsruhe
http://l4ka.org/. They do a lot of work on L4 as a hypervisor and on virtualizing Linux. There was also a dissertation project that tried to host the Darwin kernel on top of L4:
http://ertos.nicta.com.au/software/darbat/ And not to forget the never-ending HURD, which was carried on with L4 for a short time.
On the other hand there is the Technical University of Dresden
http://os.inf.tu-dresden.de/L4/. There you can find a lot about the API internals and the theory behind the implementation. They have also initiated the design of a mathematically proven, highly secure microkernel based on L4 (called L4.sec,
http://os.inf.tu-dresden.de/L4/L4.Sec/), which as far as I understand is the basis of an appliance used by the German embassy to secure its communication channels (SINA-BOX). Dresden is also the only member still researching the real-time capabilities of L4. All the others (particularly OKL) have dropped these, because they make the IPC path a lot more complicated.
Most of the kernel sources are GPL or BSD licensed, so you can have a look at them. In the beginning they were a mixture of ASM and C/C++. Today's kernels are pure C/C++, which makes them very portable. I haven't found much about QNX and its design principles, so I couldn't compare. Maybe you can give me some information here.
Lx is really easy. The basic concept is synchronous IPC as the communication mechanism, plus a special handling of memory. Both have changed in their details, but the main concept has stayed constant since L3.
IPC is synchronous, which means that the receiving thread doesn't have a buffer. A thread can only receive a msg while it is waiting for one. If it is not waiting, the sending thread may block or fail. The older APIs also had something like a watchdog (a timeout) on sending/waiting: wait x seconds and then come back with a failure if you are not able to send/receive. This was used for the real-time capabilities in DROPS. L3 (?) then started optimizing the IPC path with ASM, and combined send and receive into a fast-path IPC call that can send a msg and stay waiting for an answer from the receiver or another task in ONE operation.
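To make the rendezvous idea a bit more concrete, here is a minimal toy model in Python. It is only a sketch and not the L4 API: the names Thread, wait_for_message, send and call are made up for illustration, and a failed send simply returns False where a real kernel might block the sender until a timeout expires.

```python
class Thread:
    """Toy thread: it has no message queue, only a single delivery slot."""

    def __init__(self, name):
        self.name = name
        self.waiting = False   # True while the thread sits in a receive
        self.inbox = None      # written directly by the sender at rendezvous


def wait_for_message(thread):
    """The thread enters the receive state (hypothetical helper)."""
    thread.waiting = True


def send(sender, receiver, msg):
    """Deliver msg only if the receiver is waiting right now.

    Assumption: this toy fails immediately instead of blocking the sender.
    """
    if not receiver.waiting:
        return False
    receiver.inbox = (sender.name, msg)
    receiver.waiting = False
    return True


def call(sender, receiver, msg):
    """Send and then wait for the reply, modelled as ONE operation."""
    if not send(sender, receiver, msg):
        return False
    sender.waiting = True      # the caller is immediately ready for the answer
    return True
```

A typical round trip would be: the server calls wait_for_message, the client uses call, and the server answers with a plain send, which succeeds because the client is already waiting.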
There were/are 3 kinds of msg.
1.) The newer APIs use up to 32 virtual registers to transfer a msg from one thread to another; some of these are identical to CPU registers. This makes IPC of short msgs really fast, because the sending and the receiving thread use the same registers without transferring anything to/from memory (synchronous IPC!!!).
2.) The next kind used strings of up to a few MB, which were copied into the receiver's address space. These were abandoned a few years ago.
3.) The last kind is the classical mapping of memory pages and is used to put memory underneath an address space. (I will explain it later.)
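The three kinds of msg can be sketched as a small data model. This is a toy in Python, not a real kernel interface: the class names (RegisterMsg, StringMsg, MapMsg, Receiver) and the deliver function are assumptions made for illustration, and only the limit of 32 registers is taken from the description above.

```python
from dataclasses import dataclass, field

MAX_VIRTUAL_REGISTERS = 32   # limit taken from the text above


@dataclass
class RegisterMsg:           # kind 1: short msg held in virtual registers
    words: list


@dataclass
class StringMsg:             # kind 2: longer data copied to the receiver
    data: bytes


@dataclass
class MapMsg:                # kind 3: a memory page mapped into the receiver
    page: int                # physical page number (hypothetical field)
    vaddr: int               # virtual address where it should appear


@dataclass
class Receiver:
    regs: list = field(default_factory=list)
    buffers: list = field(default_factory=list)
    page_table: dict = field(default_factory=dict)


def deliver(msg, receiver):
    """Toy 'kernel' transfer for each of the three msg kinds."""
    if isinstance(msg, RegisterMsg):
        if len(msg.words) > MAX_VIRTUAL_REGISTERS:
            raise ValueError("too many words for a register msg")
        receiver.regs = list(msg.words)            # no memory buffer involved
    elif isinstance(msg, StringMsg):
        receiver.buffers.append(bytes(msg.data))   # copy into receiver space
    elif isinstance(msg, MapMsg):
        receiver.page_table[msg.vaddr] = msg.page  # install a mapping
    else:
        raise TypeError("unknown msg kind")
```

The point of the sketch is only that the three kinds differ in what the kernel has to do: move a few words, copy a buffer, or change a page table.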
Memory is a special kind of resource in L4. All the memory (not used by the kernel) is given to the root task (Sigma0). Every other task is nothing more than a virtual address space of x GB that contains no memory pages in the beginning. When a thread starts running it will access a memory address and raise an MMU exception, because there is no memory underneath. The kernel transforms this exception into an IPC msg and sends it to the task's memory handler (its pager, mainly Sigma0), which answers with an IPC msg of type 3 to grant memory at this address. There is also an unmap operation to steal pages back and a map operation for sharing memory regions. By the way, all exceptions are likewise transformed into IPC msgs to special exception handler tasks.
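The page fault path described above can be sketched as a toy model. Again this is plain Python and not L4 code: Sigma0, Task, access, handle_fault and unmap are illustrative names, the 4096-byte page size is an assumption, and the fault "IPC" is modelled as an ordinary method call to the pager.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class Sigma0:
    """Toy root pager that initially owns all free physical pages."""

    def __init__(self, num_pages):
        self.free = list(range(num_pages))
        self.handed_out = {}          # (task name, virtual page) -> page

    def handle_fault(self, task, vpage):
        """Answer a fault msg by handing a page to the faulting task."""
        page = self.free.pop()
        self.handed_out[(task.name, vpage)] = page
        return page                   # stands in for the map/grant reply

    def unmap(self, task, vpage):
        """Take a page back from the task."""
        page = self.handed_out.pop((task.name, vpage))
        del task.page_table[vpage]
        self.free.append(page)


class Task:
    """An address space that starts with no pages at all."""

    def __init__(self, name, pager):
        self.name = name
        self.pager = pager
        self.page_table = {}          # virtual page -> physical page
        self.faults = 0


def access(task, addr):
    """Touch an address; an unmapped page becomes a msg to the pager."""
    vpage = addr // PAGE_SIZE
    if vpage not in task.page_table:
        task.faults += 1              # the 'MMU exception'
        task.page_table[vpage] = task.pager.handle_fault(task, vpage)
    return task.page_table[vpage]
```

In this model the first access to a page costs one fault, later accesses to the same page cost none, and after an unmap the next access faults again.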
All the other kernel operations (somewhere between 7 and 15 in total) are specialized ones: creating address spaces/threads, performing operations on CPU registers, and so on.
The last big change came from the EROS kernel and is called a capability. This is a security concept that I didn't understand completely. As far as I figured out, a task holds an access/operation right (rwx) on an object (like a memory page). Let's say task A has the right to write msgs to task B; then every IPC started by task A will use this write capability. And as the capabilities are kernel-controlled objects, task A can't forge this right. But I'll be honest with you: I didn't get how to handle this in real life.
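As far as I understand the idea, it could be sketched like this. It is a toy model in Python under my own assumptions, not how any real L4 kernel implements it: the Kernel class, grant, send and the per-task slot numbers are all hypothetical. The point is only that a task names a slot number, while the right itself lives in a table that only the kernel can change.

```python
class Kernel:
    """Toy kernel that keeps every capability table to itself."""

    def __init__(self):
        self._cap_tables = {}   # task name -> list of (target, rights)
        self.mailboxes = {}     # target name -> list of (sender, msg)

    def grant(self, task, target, rights):
        """Install a capability for task and return its slot number."""
        table = self._cap_tables.setdefault(task, [])
        table.append((target, frozenset(rights)))
        return len(table) - 1

    def send(self, task, slot, msg):
        """IPC send: the task only names a slot, the kernel checks it."""
        table = self._cap_tables.get(task, [])
        if not 0 <= slot < len(table):
            raise PermissionError("no such capability")
        target, rights = table[slot]
        if "w" not in rights:
            raise PermissionError("capability lacks the write right")
        self.mailboxes.setdefault(target, []).append((task, msg))
```

So if task A was granted a "w" capability on task B, A can send through that slot; a slot with only "r", or a slot number A never received, is rejected by the kernel.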
Greetings, Kay-Uwe