Quite a while back I talked about optimizing select/epoll/whatever using mmap'd memory shared between user space and the kernel for the event table.
http://www.openldap.org/lists/openldap-devel/200411/msg00088.html
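To make the shared-event-table idea a bit more concrete, here's a minimal single-process sketch in C. It assumes a simple single-producer/single-consumer ring. A producer function stands in for the kernel and posts ready events into an mmap'd table. The application drains those events by reading head/tail indices instead of making a system call. The names (`event_ring`, `ring_post`, `ring_poll`) and the layout are hypothetical. This isn't the design from the linked message, and it isn't any real kernel interface.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define RING_SLOTS 64  /* must be a power of two */
_Static_assert((RING_SLOTS & (RING_SLOTS - 1)) == 0, "RING_SLOTS must be a power of two");

/* Hypothetical event record: which fd became ready, and for what. */
struct ring_event {
    int fd;
    uint32_t events;
};

/* Hypothetical shared event table. The producer (standing in for the
 * kernel) only advances head; the consumer (the app) only advances tail. */
struct event_ring {
    _Atomic uint32_t head;
    _Atomic uint32_t tail;
    struct ring_event slots[RING_SLOTS];
};

/* Map the table. A MAP_SHARED|MAP_ANONYMOUS region is also visible to a
 * forked child; a real kernel interface would hand out its own mapping. */
struct event_ring *ring_create(void) {
    void *p = mmap(NULL, sizeof(struct event_ring), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    struct event_ring *r = p;
    atomic_init(&r->head, 0);  /* head == tail means empty */
    atomic_init(&r->tail, 0);
    return r;
}

void ring_destroy(struct event_ring *r) {
    munmap(r, sizeof *r);
}

/* Producer side: publish one ready event. Returns 0 if the table is full. */
int ring_post(struct event_ring *r, int fd, uint32_t events) {
    assert(r != NULL);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS)  /* unsigned wraparound is intended */
        return 0;
    r->slots[head & (RING_SLOTS - 1)] = (struct ring_event){ fd, events };
    /* Release: slot contents become visible before the new head does. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer side: copy out up to max ready events, with no syscall at all. */
size_t ring_poll(struct event_ring *r, struct ring_event *out, size_t max) {
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t n = 0;
    while (tail != head && n < max) {
        out[n++] = r->slots[tail & (RING_SLOTS - 1)];
        tail++;
    }
    /* Release: tell the producer these slots may be reused. */
    atomic_store_explicit(&r->tail, tail, memory_order_release);
    return n;
}
```

In a real design the kernel would own the producer side and expose the table through an mmap() on some device or descriptor. The release/acquire pairing on head and tail is what lets the two sides share the table without locks or per-event system calls.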
These guys have gone even further and use mapped memory to move the entire network stack up into user space:
http://www.openonload.org/openonload-google-talk.pdf
Their paper talks about the work they did with two particular 10Gbit NICs. I find this interesting because in these benchmarks I ran
http://connexitor.com/blog/pivot/entry.php?id=191
the NIC interrupt overhead consumed 100% of a CPU core, and that was only on 1Gbit Ethernet. Clearly, heavy-load deployments need a better networking approach, and it sounds like the OpenOnload guys may have one.