Well, the next thing works.
I finally implemented 'protected non-copying message passing', after a pretty lengthy effort. Phew.
Once I started putting it together I realised I needed a 'virtual address range' allocator, since I want to have a fixed global address range for messages. After a bit of mucking around I settled on an AVL tree based implementation, using first-fit. Although I'd written an AVL tree before (one perfect for the purpose), I'm not sure which version works anymore or where it is, so I just took the parent-based libavl implementation and 'tweaked' it to be more suitable: tree nodes are 'embedded' in objects (no data pointer) and never allocated, and there is no tree object or traversers (comparison functions are passed as arguments if needed). So basically it now has the same API as the one I wrote, but I know it's already debugged and working.
The nodes keep track of the unallocated memory ranges, and are sorted by address. This makes it easy to coalesce blocks if adjacent ones are freed, and so on. Looking up an empty block is O(n) but the empty block list should be pretty short. It shouldn't end up with much fragmentation since it's allocating groups of pages at a time. In the past I did a lot of research on memory allocation algorithms and it turns out that first fit has some desirable characteristics that best fit doesn't - which is nice since it's simpler too.
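To make the first-fit and coalescing idea concrete, here is a minimal sketch. It is not the real allocator: it swaps the AVL tree for a small sorted array of free ranges, and every name in it (range_pool, pool_alloc, pool_free, MAX_FREE) is made up for illustration. It also assumes the managed range never starts at address 0, so 0 can double as the failure value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FREE 64  /* hypothetical cap on tracked free ranges */

struct range { uintptr_t base; size_t size; };

/* Free ranges, kept sorted by base address and never adjacent. */
struct range_pool {
    struct range free[MAX_FREE];
    size_t count;
};

static void pool_init(struct range_pool *p, uintptr_t base, size_t size)
{
    assert(base != 0 && size > 0);
    p->free[0].base = base;
    p->free[0].size = size;
    p->count = 1;
}

/* Drop entry i, shifting the later entries down. */
static void pool_remove(struct range_pool *p, size_t i)
{
    for (size_t j = i + 1; j < p->count; j++)
        p->free[j - 1] = p->free[j];
    p->count--;
}

/* First fit: take the lowest-addressed free range that is big enough.
 * Returns 0 if nothing fits. This is the O(n) scan. */
static uintptr_t pool_alloc(struct range_pool *p, size_t size)
{
    for (size_t i = 0; i < p->count; i++) {
        struct range *r = &p->free[i];
        if (r->size < size)
            continue;
        uintptr_t addr = r->base;
        r->base += size;          /* carve from the front of the range */
        r->size -= size;
        if (r->size == 0)
            pool_remove(p, i);
        return addr;
    }
    return 0;
}

/* Return a range, merging with the free neighbours on either side.
 * Returns 0 on success, -1 if the table is full. */
static int pool_free(struct range_pool *p, uintptr_t base, size_t size)
{
    size_t i = 0;
    while (i < p->count && p->free[i].base < base)
        i++;                      /* i = first free range above 'base' */

    int prev = i > 0 && p->free[i - 1].base + p->free[i - 1].size == base;
    int next = i < p->count && base + size == p->free[i].base;

    if (prev && next) {           /* fills the gap between two ranges */
        p->free[i - 1].size += size + p->free[i].size;
        pool_remove(p, i);
    } else if (prev) {
        p->free[i - 1].size += size;
    } else if (next) {
        p->free[i].base = base;
        p->free[i].size += size;
    } else {                      /* isolated: insert in address order */
        if (p->count == MAX_FREE)
            return -1;
        for (size_t j = p->count; j > i; j--)
            p->free[j] = p->free[j - 1];
        p->free[i].base = base;
        p->free[i].size = size;
        p->count++;
    }
    return 0;
}
```

Because the ranges are sorted by address, the two neighbours of a freed block sit right next to its insertion point, which is what makes the merge cheap. A tree gives the same ordering with faster insert and remove.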
Anyway, once I had a range of memory, it was simply a matter of mapping that to the calling process when it allocates a message, and re-mapping it to the destination process when it is sent. Well, almost. I was going to have the kernel take ownership of the memory, but since the kernel just uses the last process's page table it can't, at least not without globally sharing the memory, and that loses any protection from other processes. So instead a PutMsg maps the memory into the target process's page table immediately, and tracks the object separately (unfortunately requiring dynamic memory allocation). Page tables are only changed when the thread changes - so the page table update shouldn't involve any unexpected overheads. Once the target process invokes GetMsg, the kernel just returns its pointer. It's something that needs to run fast (although 'fast' is relative when the cpu is so fast), so it would've been better to avoid the AllocMem/FreeMem overhead required to queue the message, but maybe I can find some other mechanism if it becomes an issue.
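The hand-off can be modelled in plain user-space C, roughly as below. This is only a toy model under heavy assumptions: an 'owner' field stands in for which process's page table currently maps the message, malloc/free stand in for AllocMem/FreeMem, and all the names (model_put_msg, model_get_msg, struct port and so on) are invented for the sketch rather than taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* 'owner' models which process currently has the pages mapped. */
struct msg { int owner; size_t size; };

/* Separately allocated tracking node: the message memory itself is
 * no longer visible to the sender, so the queue link can't live in it. */
struct qnode { struct qnode *next; struct msg *m; };

struct port { int pid; struct qnode *head, *tail; };

/* Send: move the mapping to the target straight away, then queue.
 * Returns 0 on success, -1 if the sender doesn't own the message
 * or the tracking node can't be allocated. */
static int model_put_msg(struct port *dst, struct msg *m, int sender)
{
    assert(dst != NULL && m != NULL);
    if (m->owner != sender)
        return -1;
    struct qnode *n = malloc(sizeof *n);   /* stands in for AllocMem */
    if (n == NULL)
        return -1;
    m->owner = dst->pid;   /* stands in for unmap-from-sender, map-to-target */
    n->m = m;
    n->next = NULL;
    if (dst->tail != NULL)
        dst->tail->next = n;
    else
        dst->head = n;
    dst->tail = n;
    return 0;
}

/* Receive: the pages are already mapped, so just hand back the pointer. */
static struct msg *model_get_msg(struct port *p)
{
    struct qnode *n = p->head;
    if (n == NULL)
        return NULL;
    p->head = n->next;
    if (p->head == NULL)
        p->tail = NULL;
    struct msg *m = n->m;
    free(n);               /* stands in for FreeMem */
    return m;
}
```

The malloc/free pair per message is the overhead mentioned above. One possible way around it, purely as a guess, would be a fixed pool of tracking nodes per port.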
I also ran into an optimisation issue that interfered with the nasty hack I'm using to implement 'tag lists' - which basically treats a varargs list as an array of pairs of ints. Anyway, I had a small inline wrapper to call the real function (that takes an array) and all the vararg arguments were getting optimised out. __attribute__ ((noinline, unused)) works for now.
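For comparison, here is a sketch of the same wrapper shape done the portable way. This is not the hack described above: instead of treating the arguments in place as an array, it copies them out with va_arg into a local array of pairs before calling the array version. The names (struct tag, TAG_END, MAX_TAGS, sum_tags, sum_tag_values) are all hypothetical, and the attribute is GCC/Clang syntax kept only to mirror the workaround. With va_arg it shouldn't be needed for correctness.

```c
#include <assert.h>
#include <stdarg.h>

#define TAG_END  0    /* hypothetical terminator tag */
#define MAX_TAGS 16   /* hypothetical cap on pairs per call */

struct tag { int id; int value; };

/* The 'real' function: takes an array of pairs ending in TAG_END.
 * Here it just sums the values so there is something to observe. */
static int sum_tag_values(const struct tag *tags)
{
    int sum = 0;
    for (; tags->id != TAG_END; tags++)
        sum += tags->value;
    return sum;
}

/* Varargs wrapper: sum_tags(id, value, id, value, ..., TAG_END).
 * Copies the pairs into a local array rather than aliasing the
 * argument area, so the optimiser can't discard them. */
__attribute__((noinline, unused))
static int sum_tags(int first_id, ...)
{
    struct tag tags[MAX_TAGS + 1];
    int n = 0;
    va_list ap;

    va_start(ap, first_id);
    for (int id = first_id; id != TAG_END && n < MAX_TAGS;
         id = va_arg(ap, int)) {
        tags[n].id = id;
        tags[n].value = va_arg(ap, int);
        n++;
    }
    va_end(ap);

    tags[n].id = TAG_END;     /* terminate the copied list */
    tags[n].value = 0;
    return sum_tag_values(tags);
}
```

The copy costs a few stores per call, but it doesn't depend on how the compiler lays out arguments. The in-place trick only works where varargs happen to sit contiguously in memory.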
Now that stuff is sorted, I guess I can start looking at 'devices' next (funny, I thought I was at that point a couple of weeks ago). Although I think I need to rest this weekend - I've gone and bloody gotten sick again. I usually don't get sick too often, so 3 times in a month is a bit of a shock. Just a sore throat so far this time, but it's enough to get in the way of things.