Re: LMDB and text encoding

29 Jan 2015


I've had a brief chat with Hallvard on IRC. We came up with several
possible solutions, although each of them has its drawbacks. Writing
cross-platform code that supports Unicode is always a messy business.
I vote for option 4, but would like to hear everyone's opinions before
starting work on any of them.
1) Separate widechar functions
Add functions such as mdb_env_open_w that would call the widechar
APIs. The drawback of this approach is that it would require a lot of
duplicated code, which is hard to maintain. It would also clutter the
LMDB header file.
2) New flag
Introduce a new flag (such as MDB_USE_WCHAR) that would tell
mdb_env_open to cast the path parameter to wchar_t* under the hood and
call the widechar variant of the Windows API.
Advantage: only the string concatenation code would need to be duplicated.
Drawback: it is really ugly, since a const char* parameter would
silently carry wchar_t data.
3) Require UTF-16 on Windows
Since Microsoft discourages the use of their ANSI APIs, we could say
that we require UTF-16 on Windows. We could define a type such as
mdb_uchar_t, typedef'd to char on Unix and wchar_t on Windows, and
change the function signatures to use this type.
Drawback: users who want to write cross-platform code would need to
#ifdef their calls to mdb_env_open.
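
A rough sketch of what option 3 might look like, to make the drawback
concrete. Only the name mdb_uchar_t comes from the discussion above;
the MDB_T literal macro and the path_units function are hypothetical
stand-ins (path_units merely takes the place of a function with the
changed signature), and none of this is existing LMDB code.

```c
#include <stddef.h>

#ifdef _WIN32
typedef wchar_t mdb_uchar_t;   /* UTF-16 code units on Windows */
#define MDB_T(s) L##s          /* hypothetical helper for path literals */
#else
typedef char mdb_uchar_t;      /* plain bytes everywhere else */
#define MDB_T(s) s
#endif

/* Hypothetical stand-in for a function taking the new path type.
 * It counts the code units in the path, whatever their width. */
static size_t path_units(const mdb_uchar_t *path)
{
    size_t n = 0;
    while (path[n] != 0)
        n++;
    return n;
}
```

A caller would have to write path_units(MDB_T("data.mdb")) or wrap the
call in its own #ifdef, and that burden is what makes this option
awkward for cross-platform users.
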
4) Require UTF-8 on Windows
Let's say we require the path parameter to be encoded in UTF-8, even
on Windows. Then under the hood we can convert it to UTF-16 and call
the widechar APIs. This should not cost any performance, because
Windows itself converts to UTF-16 anyway if you use their ANSI
functions.
This is the least ugly solution we found. UTF-8 is easy to produce
(most libraries can emit it, or the user could use u8"..." from
C++11, etc.)
Advantage: this is the easiest to implement; code that worked before
(with ASCII paths) will work without modification, and we don't need
to duplicate any code.
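
To illustrate the conversion step in option 4, here is a minimal,
portable sketch of a UTF-8 to UTF-16 decoder. The function name
utf8_to_utf16 and its signature are my own assumptions, not part of
LMDB. On Windows the same job could be done by calling
MultiByteToWideChar with CP_UTF8 and passing the result to the
widechar file APIs; the hand-written version below is only meant to
show what happens to the bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a NUL-terminated UTF-8 string into
 * UTF-16 code units. cap is the capacity of dst in code units,
 * including room for the terminating 0. Returns the number of units
 * written (excluding the terminator), or (size_t)-1 on malformed
 * input or insufficient space. This is a sketch: it does not reject
 * overlong encodings or encoded surrogate code points. */
static size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    if (cap == 0)
        return (size_t)-1;
    while (*s) {
        uint32_t cp;
        int extra;   /* number of continuation bytes expected */

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;
        s++;
        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80)   /* also catches a premature NUL */
                return (size_t)-1;
            cp = (cp << 6) | (*s++ & 0x3F);
        }
        if (cp >= 0x10000) {
            /* Outside the BMP: emit a surrogate pair. */
            if (cp > 0x10FFFF || n + 2 >= cap)
                return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= cap)
                return (size_t)-1;
            dst[n++] = (uint16_t)cp;
        }
    }
    dst[n] = 0;
    return n;
}
```

Pure ASCII paths pass through unchanged, one code unit per byte, which
is why existing callers would keep working under this option.
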
