I did try a dummy prototype a while back and it didn't perform very well. You end up incurring too much overhead, and it doesn't pay off even when the underlying FS data is 100% cached. Plus, you can never truly control what happens with the FS cache: you can size and influence it in some ways, but you cannot guarantee that an operation will hit cached data, which makes it difficult to deliver predictable response times. In other words, you are going to have to accept I/O hits and widen your response window to the worst-case scenario for at least some percentage of operations. This can be optimized and made more predictable on a black box where you control the entire machine, but it is moot otherwise. The FS was ZFS, and just for the record, the perf didn't suck per se, but it didn't quite match traditional DB backend perf (especially with entry caches) either. Unfortunately, I don't have the SLAMD comparison data anymore to show you.
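To make the predictability point a bit more concrete, here is a rough Python sketch of how one might look at the spread between typical and worst-case read latency for file-per-entry storage. This is not the prototype mentioned above (that code and its numbers are not available); the helper names `make_sample_files`, `percentile` and `read_latency_profile` are made up for illustration, and the sample files are tiny, so real results will depend heavily on the FS, cache state and hardware.

```python
import math
import tempfile
import time
from pathlib import Path
from typing import Dict, Iterable, List


def make_sample_files(count: int) -> List[Path]:
    """Create `count` small files in a fresh temp directory (illustrative data only)."""
    root = Path(tempfile.mkdtemp())
    paths = []
    for i in range(count):
        path = root / f"entry-{i}.ldif"
        path.write_text(f"dn: uid=user{i},dc=example,dc=com\n", encoding="utf-8")
        paths.append(path)
    return paths


def percentile(sorted_samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_samples)))
    return sorted_samples[rank - 1]


def read_latency_profile(paths: Iterable[Path]) -> Dict[str, float]:
    """Time one full read per file and summarize the latency distribution in seconds."""
    samples = []
    for path in paths:
        start = time.perf_counter()
        Path(path).read_bytes()
        samples.append(time.perf_counter() - start)
    if not samples:
        raise ValueError("no files to measure")
    samples.sort()
    return {
        "p50": percentile(samples, 50),
        "p99": percentile(samples, 99),
        "max": samples[-1],
    }
```

Calling `read_latency_profile(make_sample_files(1000))` once with a warm cache and once after the cache has been evicted would show the gap between the median and the tail, which is the response window you end up having to budget for.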
Emmanuel Lecharny wrote:
That sounds interesting. Now, you may consider another idea to be totally insane, but instead of writing your own DB engine implementation, what about relying on the FS? We discussed this idea recently in the Apache Directory community (we have pretty much the same concern: three levels of cache is just overkill). So if you take Window$ out of the picture (and even if you keep it in the picture), many existing Linux/Unix filesystems are already implemented using a B-tree (ext3/4, Btrfs, even NTFS!). What about using the underlying FS to store entries directly, instead of building a special file that acts as an intermediate layer? The main issue will be managing indexes, but that should not be a real problem. So every entry would be stored as a single file (could be in LDIF format :)
So far, this is just a discussion we are having, but it might be worth a try at some point...
Does it sound insane ?
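
For illustration only, the one-file-per-entry idea could be sketched in Python roughly as below. Nothing here comes from Apache Directory or any existing server: the names `entry_path`, `put_entry` and `get_entry`, the hashed two-level directory layout, and the very simplified LDIF handling (no base64 values, no line folding, no schema-aware DN normalization, no indexes) are all assumptions made to keep the sketch short.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def entry_path(root: Path, dn: str) -> Path:
    """Map a DN to a file path using a hypothetical two-level hashed layout."""
    # Crude normalization (trim + lowercase); real DN matching is schema-aware.
    key = hashlib.sha1(dn.strip().lower().encode("utf-8")).hexdigest()
    return root / key[:2] / f"{key}.ldif"


def put_entry(root: Path, dn: str, attrs: Dict[str, List[str]]) -> Path:
    """Write one entry as one simplified LDIF file, replacing any previous version."""
    path = entry_path(root, dn)
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"dn: {dn}"]
    for name, values in attrs.items():
        for value in values:
            lines.append(f"{name}: {value}")
    # Write to a temp file first, then rename over the target.
    tmp = path.with_suffix(".tmp")
    tmp.write_text("\n".join(lines) + "\n", encoding="utf-8")
    tmp.replace(path)
    return path


def get_entry(root: Path, dn: str) -> Optional[Dict[str, List[str]]]:
    """Read an entry back as {attribute: [values]}, or None if it does not exist."""
    path = entry_path(root, dn)
    if not path.exists():
        return None
    attrs: Dict[str, List[str]] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        name, _, value = line.partition(": ")
        if name == "dn":
            continue
        attrs.setdefault(name, []).append(value)
    return attrs


def demo() -> Optional[Dict[str, List[str]]]:
    """Round-trip a single made-up entry through a temporary directory."""
    root = Path(tempfile.mkdtemp())
    dn = "uid=jdoe,ou=people,dc=example,dc=com"
    put_entry(root, dn, {"cn": ["John Doe"], "mail": ["jdoe@example.com"]})
    return get_entry(root, dn)
```

A lookup by DN is then a single path computation plus one file read, so the filesystem's own directory structures do the key lookup. Searches on anything other than the DN would still need separate indexes, which is the part described above as the main issue.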