Hi Everyone,
I've been talking to Howard about this and he suggested to post it to
this mailing list. There are two things that I recently noticed about
how LMDB works with various encodings and I think it's worth to
discuss.
1. Database names
mdb_dbi_open treats its name parameter as a C string. This means UTF-8 on
unixes and ANSI on Windows, which is problematic for cross-platform
applications.
My suggestion is to create a variant of this function that also
accepts a length parameter (or just use MDB_val) so that instead of
treating it as a C string, it would treat it like a series of bytes,
allowing the user to use the encoding of their choice.
2. Path names
Functions like mdb_env_open, mdb_env_get_path, mdb_env_copy and the
likes accept a char* for path names. This is fine on most unixes where
char* is an UTF-8 string, but unfortunately, these functions call the
ANSI variants of the Windows API functions, making it impossible to
use Unicode path names with them.
I think we should switch to the widechar APIs instead, but that would
also mean changing the LMDB API to accept a wchar_t* parameter on
Windows instead of char*.
What do you guys think about all this?
Best regards,
Timur Kristóf