https://bugs.openldap.org/show_bug.cgi?id=9920
--- Comment #9 from NikoPLP info@parlepeuple.fr --- Hello Howard and Kero,
Sorry for not answering your last message, Howard. This issue is 2 years old. I saw your message from 2023, but I have been too busy until now, so I couldn't dedicate any time to it.
The fact is that I am not using LMDB anymore for NextGraph. Two years ago, after I found the issue with AEAD, and after another issue popped up when compiling on OpenBSD (an issue with semaphores that I couldn't resolve; the data was corrupted in a mysterious way), I switched to RocksDB for the storage backend of NextGraph.
It is not that RocksDB is better in terms of performance (it isn't), but it is just more suitable for me, as it compiles on more platforms and the encryption plugin was working (even though I had to implement it myself, and it doesn't offer AEAD). I also had another problem with LMDB: it relies on memory-mapped pages (with the mapping handled by the OS), and this is clearly not something that will work with WASM.
As our storage backend has to work in web browsers too, I didn't want to invest more time in LMDB. RocksDB isn't ready for WASM either, but the fact that it relies on simple reads and writes to static files on disk makes it more suitable for adaptation to IndexedDB or the newest File API in the browser, even with encryption at rest.
Anyway, I love LMDB for its simplicity, performance, and elegance. But the code is a mess. I am sorry to say it, but the fact that it is written in C is no excuse for very poor inline documentation and obscure variable names.
I tried to debug the encryption code several times. For the issue at hand, it gets very complex, as it apparently involves a race condition (or at least a case that isn't covered by your test suite).
All in all, it seems that the master.3 branch is not used (I am happily surprised that Meilisearch wants to have AEAD with LMDB), because the branch is very difficult to find (I think there is a mention of it in an old tweet, and that's pretty much all there is; there is no documentation either).
Finally, coming back to this issue:
I was also using LMDB via a Rust binding (the one from Firefox), but I don't think the problem is related to the Rust binding.
I couldn't easily extract a reproducible test: first, because the code was complex and I didn't have time to extract a list of C API calls to give to Howard; second, because I stopped using LMDB; and third, because, as Kero just said, the bug is only triggered once a fair amount of data has been entered in the database. I cannot say how much data is needed. In my original issue I tried to describe everything I knew about the zeros I found in my data. The LMDB code is putting them there after the page arrives from the OS because, as I said, the data on disk doesn't have the zeros. It smells like a buffer overrun somewhere, or a buffer index that is shifted.
The code in question is not Howard's, but that of someone else who worked on this part several years ago. I think it would be worthwhile to find that person and ask them what they did and what they think about the issue we describe.
Given the lack of documentation and inline comments, and the obscure coding, if Howard is not fully aware of what is happening in this code, maybe it would be worth just throwing it away and starting the encryption part anew.
It also depends on whether there is any interest or not.
The only thing I can advise Kero is to avoid AEAD for now, until it is fixed (if Howard can find the bug, but he will also need a reproducible test case. If Kero cannot produce one, I might find some time in October to extract one, just for the sake of benevolence towards LMDB).
But yes Kero, I confirm that you are hitting the same bug as I did back then. It is not a problem with your code.