https://bugs.openldap.org/show_bug.cgi?id=10260
Issue ID: 10260
Summary: Document alignment of MDB_val.mv_data
Product: LMDB
Version: unspecified
Hardware: All
OS: All
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs@openldap.org
Reporter: sascha@brawer.ch
Target Milestone: ---
In lmdb.h, could the documentation for MDB_val talk about alignment of mv_data?
For example, is the key guaranteed to be aligned to an 8-byte boundary if a table got created with MDB_INTEGERKEY? What about values in MDB_INTEGERDUP tables? Can database values be directly loaded into SIMD registers (of what width?) without first copying the data to an aligned location?
On some processor architectures, unaligned reads lead to bus errors; therefore, it would help programmers to know whether LMDB makes any alignment guarantees. Even if clients cannot assume anything, it would be good if LMDB’s public API documentation could state so.
Many thanks for documenting this! Just adding 1 or 2 sentences to the MDB_val section in lmdb.h would be enough.
— Sascha Brawer, sascha@brawer.ch
Howard Chu hyc@openldap.org changed:
      What|Removed     |Added
----------------------------------------------------------------------------
    Status|UNCONFIRMED |RESOLVED
Resolution|---         |INVALID
--- Comment #1 from Howard Chu hyc@openldap.org ---
> In lmdb.h, could the documentation for MDB_val talk about alignment of mv_data?
No.
All guarantees are already documented. e.g. http://www.lmdb.tech/doc/group__internal.html#ga8c8e3aac03984bb37d2b5adf7c4e...
In LMDB 0.9 keys are guaranteed to be 2-byte aligned.
In LMDB 1.0 values will be guaranteed to be 2-byte aligned (keys will be padded with an extra byte if necessary).
Anything else is entirely up to how you use it.
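To make the stated guarantee concrete: a pointer that is only known to be 2-byte aligned may or may not satisfy the stricter alignment of a wider type. The following is a minimal sketch of a runtime check, not part of LMDB; the helper name is_aligned is hypothetical, and the pointer-to-integer conversion assumes a flat address space, as on mainstream platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an LMDB function.
 * Returns 1 if the address p is a multiple of `alignment` bytes, else 0.
 * `alignment` must be a nonzero power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* Assumes uintptr_t reflects the numeric address (flat address space). */
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

A caller could test is_aligned(key.mv_data, sizeof(uint64_t)) before dereferencing mv_data as a uint64_t pointer, and fall back to copying the bytes when the check fails.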
Sascha Brawer sascha@brawer.ch changed:
      What|Removed     |Added
----------------------------------------------------------------------------
    Status|RESOLVED    |UNCONFIRMED
Resolution|INVALID     |---
--- Comment #2 from Sascha Brawer sascha@brawer.ch ---
Would you be open to documenting this in the public documentation, not only in internals? See below for a proposed patch.
Apologies for the nuisance, but from reading the public LMDB documentation, it's really not obvious that LMDB keys and values have different memory alignment than malloc(), which aligns to the platform’s largest primitive type (typically 8 bytes these days). Looking at some applications that use LMDB in the wild, they're happily reading mv_data pointers as int32_t*, uint64_t*, or as pointers to custom structs with i32/i64/float members. If LMDB’s mv_data is aligned to a 2-byte boundary, these applications will either crash or be very slow on non-Intel CPUs.
Just to clarify, I'm not asking LMDB to change its implementation. But I think it would be good to document the unusual alignment in the public lmdb.h header, not just in internals.
From c35b5e80d19e649a90ab23437ce69cff176c02c3 Mon Sep 17 00:00:00 2001
From: Sascha Brawer sascha@brawer.ch
Date: Mon, 30 Sep 2024 10:39:10 +0200
Subject: [PATCH] Add a note about (lack of) alignment to public API documentations

https://bugs.openldap.org/show_bug.cgi?id=10260
---
 libraries/liblmdb/lmdb.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/libraries/liblmdb/lmdb.h b/libraries/liblmdb/lmdb.h
index 199382a14c..04ceca13f8 100644
--- a/libraries/liblmdb/lmdb.h
+++ b/libraries/liblmdb/lmdb.h
@@ -1272,6 +1272,10 @@ int mdb_set_relctx(MDB_txn *txn, MDB_dbi dbi, void *ctx);
  * database. The caller need not dispose of the memory, and may not
  * modify it in any way. For values returned in a read-only transaction
  * any modification attempts will cause a SIGSEGV.
+ * @note Starting with LMDB 1.0, the memory address of returned
+ * values is aligned to a 2-byte boundary; earlier versions make
+ * no alignment guarantees. On some processor architectures,
+ * such as ARM or PowerPC, misaligned reads will cause a SIGBUS.
  * @note Values returned from the database are valid only until a
  * subsequent update operation, or the end of the transaction.
  * @param[in] txn A transaction handle returned by #mdb_txn_begin()
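For the access pattern described in comment #2 (reading mv_data through an int32_t* or uint64_t* cast), one alignment-independent alternative is to copy the bytes into a properly aligned local variable with memcpy. The sketch below is an illustration under stated assumptions, not code from LMDB: struct example_val is a hypothetical stand-in that mirrors the two fields of MDB_val so the snippet compiles without lmdb.h, load_u64 is a hypothetical helper, and the value is assumed to have been stored in the host's native byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in with the same two fields as MDB_val,
 * so that this sketch builds without lmdb.h. */
struct example_val {
    size_t mv_size;   /* size of the data item in bytes */
    void  *mv_data;   /* address of the data item */
};

/* Hypothetical helper: copies a uint64_t out of v without assuming
 * anything about the alignment of v->mv_data.
 * Returns 0 on success, -1 if v is unusable or the stored size
 * is not exactly sizeof(uint64_t). */
static int load_u64(const struct example_val *v, uint64_t *out)
{
    if (v == NULL || out == NULL || v->mv_data == NULL ||
        v->mv_size != sizeof *out)
        return -1;
    /* memcpy works byte by byte, so it is valid for any source address. */
    memcpy(out, v->mv_data, sizeof *out);
    return 0;
}
```

With a real database, the same memcpy call would be applied to the MDB_val filled in by mdb_get() or a cursor operation. Compilers commonly turn a fixed-size memcpy like this into a single load on targets that permit unaligned access, so the copy is usually cheap.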
Howard Chu hyc@openldap.org changed:
      What|Removed     |Added
----------------------------------------------------------------------------
Resolution|---         |INVALID
    Status|UNCONFIRMED |RESOLVED
--- Comment #3 from Howard Chu hyc@openldap.org ---
(In reply to Sascha Brawer from comment #2)
> Would you be open to documenting this in the public documentation, not only in internals? See below for a proposed patch.
>
> Apologies for the nuisance, but from reading the public LMDB documentation, it's really not obvious that LMDB keys and values have different memory alignment than malloc(), which aligns to the platform’s largest primitive type (typically 8 bytes these days). Looking at some applications that use LMDB in the wild, they're happily reading mv_data pointers as int32_t*, uint64_t*, or as pointers to custom structs with i32/i64/float members. If LMDB’s mv_data is aligned to a 2-byte boundary, these applications will either crash or be very slow on non-Intel CPUs.
>
> Just to clarify, I'm not asking LMDB to change its implementation. But I think it would be good to document the unusual alignment in the public lmdb.h header, not just in internals.
General DB users don't need to worry about such things, it is an internal implementation detail. People who need to worry about such things must read up on the DB internals.
Quanah Gibson-Mount quanah@openldap.org changed:
      What|Removed      |Added
----------------------------------------------------------------------------
    Status|RESOLVED     |VERIFIED
  Keywords|needs_review |
Ondřej Kuzník ondra@mistotebe.net changed:
      What|Removed |Added
----------------------------------------------------------------------------
  See Also|        |https://bugs.openldap.org/show_bug.cgi?id=10262