Greg Hudson wrote:
On 08/10/2017 11:55 AM, Howard Chu wrote:
Thoughts? Hardcode 1 algorithm, or leave it pluggable?
Some thoughts, without advocating for either option:
- If support isn't built-in, then generic LMDB tools (including mdb_copy/dump/load/stat) can't operate on encrypted databases, if they need plaintext pages to work.
Yeah, already thought about that. We can add an option to the generic tools to dynamically load a user-supplied module for such cases. I've always wanted this for BerkeleyDB as well, to safely operate on DBs with custom comparators.
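A minimal sketch of what such a loader might look like. It assumes a hypothetical plugin convention: the module is a shared object exporting a data symbol named lmdb_crypto_module that points at a table of encrypt/decrypt functions. None of these names (crypto_module, load_crypto_module, the symbol name, the function signature) exist in LMDB. Only dlopen, dlsym, dlclose and dlerror are real POSIX calls; link with -ldl on glibc older than 2.34.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical page-cipher signature; not part of LMDB's API. */
typedef int (*page_cipher_fn)(const void *key, size_t keylen,
                              const void *iv, size_t ivlen,
                              const void *in, void *out, size_t len);

/* Hypothetical function table a crypto module would export. */
typedef struct {
    const char *name;
    page_cipher_fn encrypt;
    page_cipher_fn decrypt;
} crypto_module;

/* Hypothetical name of the exported data symbol. */
#define CRYPTO_MODULE_SYMBOL "lmdb_crypto_module"

/* Load a user-supplied shared object and fetch its function table.
 * On success stores the dlopen handle in *handle_out and returns the table.
 * On any failure returns NULL, leaves *handle_out untouched, and closes
 * the handle if one was opened. */
static const crypto_module *load_crypto_module(const char *path,
                                               void **handle_out)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s): %s\n", path, dlerror());
        return NULL;
    }
    dlerror(); /* clear any stale error state before dlsym */
    const crypto_module *mod =
        (const crypto_module *)dlsym(handle, CRYPTO_MODULE_SYMBOL);
    if (mod == NULL || mod->encrypt == NULL || mod->decrypt == NULL) {
        dlclose(handle);
        return NULL;
    }
    *handle_out = handle;
    return mod;
}
```

A tool would call this once at startup, for example behind a new, hypothetical command-line option naming the module path, and then hand mod->decrypt to whatever code reads pages. Exporting one struct rather than individual functions means a single dlsym per module, and the table could later grow extra entries (such as a comparator, as in the BerkeleyDB case) without changing the lookup.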
- Built-in support doesn't necessarily mean hardcoding an algorithm for all time, if the meta pages can include an algorithm selector. One of the selector values could even mean "use application callbacks".
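To illustrate the selector idea, here is a sketch assuming a hypothetical 32-bit enc_algo field in the meta page. LMDB's actual meta page has no such field, and the enum values and type names are made up. The property that matters is that an unknown selector makes the open fail instead of silently misreading pages, which is what leaves room to add algorithms later.

```c
#include <stdint.h>

/* Hypothetical selector values stored in the meta page. */
enum enc_algo {
    ENC_NONE      = 0, /* plaintext database */
    ENC_CALLBACKS = 1, /* "use application callbacks" */
    ENC_BUILTIN_1 = 2  /* placeholder for a first built-in cipher */
};

/* Hypothetical meta-page header carrying the selector. */
typedef struct {
    uint32_t magic;
    uint32_t version;
    uint32_t enc_algo;
} meta_hdr;

typedef enum { OPEN_OK, OPEN_NEED_CALLBACKS, OPEN_UNSUPPORTED } open_status;

/* Decide how to proceed at open time, given the selector read from the
 * meta page and whether the application registered its own callbacks. */
static open_status check_enc_algo(const meta_hdr *m, int have_callbacks)
{
    switch (m->enc_algo) {
    case ENC_NONE:
    case ENC_BUILTIN_1:
        return OPEN_OK;
    case ENC_CALLBACKS:
        return have_callbacks ? OPEN_OK : OPEN_NEED_CALLBACKS;
    default:
        /* Written by a newer version: refuse rather than guess. */
        return OPEN_UNSUPPORTED;
    }
}
```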
- Is the page size guaranteed to be a multiple of 16 bytes? 32 bytes?
I would assume yes to both; documenting that would make it easier to use block ciphers since ciphertext expansion isn't allowed.
Yes, page sizes are always large powers of 2. 4096 bytes is typical (but on the small side). SPARC uses 8192; some MIPS systems use 32768 or 65536.
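Every power of two that is at least 32 is divisible by both 16 and 32, so a power-of-two page size answers the question directly. A small check of that property (the helper names are illustrative only):

```c
#include <stddef.h>

/* Nonzero if n is a power of two. */
static int is_pow2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Nonzero if a page of pagesize bytes splits evenly into block-byte units.
 * For powers of two, pagesize >= block already implies divisibility;
 * the modulo test just makes that explicit. */
static int page_fits_blocks(size_t pagesize, size_t block)
{
    return is_pow2(pagesize) && is_pow2(block) && pagesize >= block
        && pagesize % block == 0;
}
```

With the sizes mentioned (4096, 8192, 32768, 65536) this holds for both 16- and 32-byte units, so a block cipher never needs padding to fill a page.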
- Application writers are more likely to get encryption callbacks wrong than Howard is. They could ignore the IV (making it easy to detect pages that begin with the same blocks) or even do pure ECB encryption (making it easy to detect duplicate blocks anywhere). Less egregiously, applications might not make the ideal choice of cipher mode. I would personally have to think about the best choice to use. If I were using a block cipher, CBC with the provided ivec seems like it should be okay, but assuming 128-bit cipher blocks, after around 2^64 blocks one would expect to experience a block collision, which reveals the XOR of the two colliding plaintext blocks[1]. Deriving a key with HKDF(key, ivec) and using counter mode might be safer, unless I'm missing something, which I easily could be. If I were using a stream cipher, I would have to do research to figure out how to incorporate the ivec.
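The CBC collision concern can be written out explicitly. Here E_K is the block cipher under key K, blocks are b bits, and ciphertext blocks are modeled as uniformly random (the usual birthday-bound assumption):

```latex
\begin{aligned}
C_0 &= \mathit{IV}, \qquad C_i = E_K(P_i \oplus C_{i-1}) \\
C_i = C_j,\; i \neq j &\;\Rightarrow\; P_i \oplus C_{i-1} = P_j \oplus C_{j-1}
  \qquad (E_K \text{ is a permutation}) \\
&\;\Rightarrow\; P_i \oplus P_j = C_{i-1} \oplus C_{j-1} \\
\mathbb{E}[\text{collisions among } n \text{ blocks}]
  &\approx \binom{n}{2} 2^{-b} = \frac{n(n-1)}{2^{b+1}},
  \qquad b = 128,\; n = 2^{64} \;\Rightarrow\; \approx \tfrac{1}{2}
\end{aligned}
```

So a collision exposes P_i XOR P_j, and for i, j >= 2 the right-hand side consists only of ciphertext blocks an observer already has. The count covers every block encrypted under one key, not just the blocks within a single page.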
The user-supplied IV is really just a seed; it will be hashed with some other uniquifiers (pageID, txnID) before being passed to the cipher. I suppose we could make some recommendations on ciphers and modes, but really I think it's up to the user to determine what kind of strength/speed tradeoffs they'll accept.
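Here is a sketch of the seed-plus-uniquifiers step. It assumes a 16-byte IV and uses the SplitMix64 finalizer as a stand-in mixer. That mixer is NOT cryptographic and only keeps the example self-contained; a real implementation would use a cryptographic hash or KDF. derive_page_iv and its parameters are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* SplitMix64 finalizer: a fast, NON-cryptographic bijective bit mixer,
 * used here only as a placeholder for a real hash/KDF. */
static uint64_t mix64(uint64_t z)
{
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Hypothetical: derive a 16-byte per-page IV from the user-supplied seed,
 * the page number, and the ID of the transaction that wrote the page.
 * Byte order is the host's; an on-disk format would need to fix it. */
static void derive_page_iv(const uint8_t seed[16], uint64_t pgno,
                           uint64_t txnid, uint8_t iv_out[16])
{
    uint64_t s0, s1;
    memcpy(&s0, seed, 8);
    memcpy(&s1, seed + 8, 8);
    uint64_t a = mix64(s0 ^ mix64(pgno));      /* bijective in pgno */
    uint64_t b = mix64(s1 ^ mix64(txnid ^ a)); /* bijective in txnid, given a */
    memcpy(iv_out, &a, 8);
    memcpy(iv_out + 8, &b, 8);
}
```

Because each mixing step is a bijection, a fixed seed maps distinct (pgno, txnid) pairs to distinct IVs. The txnid matters because a page number on its own repeats whenever a freed page is reused by a later transaction.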
I would expect stream ciphers to be used, in general.
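With a stream cipher, encrypting a page in place is just XOR with a keystream derived from the key and the per-page IV, so the output is exactly one page and the same call decrypts. The generator below is a TOY (a SplitMix64 sequence) standing in for a real stream cipher such as ChaCha20. It offers no security, and the 64-bit key and IV are simplifications; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* TOY keystream generator (SplitMix64 sequence). NOT a cipher: it only
 * stands in for a real stream cipher to show the data flow. */
static uint64_t toy_next(uint64_t *state)
{
    uint64_t z = (*state += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* XOR a keystream derived from (key, page_iv) into the page in place.
 * Output length equals input length, so no ciphertext expansion, and
 * calling this again with the same key and IV restores the plaintext. */
static void page_xor_crypt(uint64_t key, uint64_t page_iv,
                           uint8_t *page, size_t len)
{
    uint64_t state = key ^ page_iv;
    uint64_t ks = 0;
    for (size_t i = 0; i < len; i++) {
        if (i % 8 == 0)
            ks = toy_next(&state); /* fresh 8 keystream bytes */
        page[i] ^= (uint8_t)(ks >> (8 * (i % 8)));
    }
}
```

A block cipher in counter mode has the same XOR shape, which is why it also avoids ciphertext expansion. The shape also shows why the IV must change whenever a page's contents change: XOR-ing two ciphertexts produced under the same (key, IV) yields the XOR of the two plaintexts.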
- Not wanting to depend on crypto libraries seems like a valid concern. Teaching the LMDB code how to dynamically load encryption plugins doesn't necessarily seem attractive either.
We'll probably do the dynamic loading anyway, as noted above.