https://bugs.openldap.org/show_bug.cgi?id=8988
--- Comment #25 from jhaberman@gmail.com ---
I ran into this UBSAN error today:
third_party/liblmdb/mdb.c:7559:26: runtime error: member access within misaligned address 0x32577fcf7fd3 for type 'MDB_page' (aka 'struct MDB_page'), which requires 8 byte alignment
0x32577fcf7fd3: note: pointer points here 00 66 6f 6f 02 00 00 00 00 00 00 00 00 00 52 00 10 00 2c 00 00 00 00 00 00 00 00 00 62 61 72 00
This corresponds to this line: https://github.com/LMDB/lmdb/blob/da9aeda08c3ff710a0d47d61a079f5a905b0a10a/l...
mx->mx_db.md_entries = NUMKEYS(fp);
From the bug log I see that there is disagreement over the interpretation of UB. Language lawyering aside, this issue makes it difficult to use LMDB in environments where UBSAN is in use. I also worry about the potential for miscompiles if the compiler is using its own interpretation of UB, and optimizing based on the assumption that UB cannot happen.
Is there any way to make LMDB pack its structures less aggressively, so that it will satisfy UBSAN's alignment expectations?
Alternatively, there are ways to perform the accesses that make UBSAN happy, but they make the code uglier: https://godbolt.org/z/8EP6cW77s
The code in question is accessing an unsigned short on a 2-byte boundary, i.e., its alignment is correct. UBSAN is incorrect here.
I cannot speak for the UBSAN authors, but I believe the issue is that, since p->x is equivalent to (*p).x, the dereference of p must be valid, even if x's alignment requirements are looser.
To avoid this, you could write *(uint16_t*)((char*)p + offsetof(S, x)), since this avoids actually dereferencing p. But this is obviously much less readable.
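For illustration, here is a minimal sketch of that workaround. The struct S and its fields are hypothetical stand-ins, not LMDB's actual MDB_page layout, and the helper names are mine. I am assuming a typical ABI where the struct requires 8-byte alignment while x needs only 2. The memcpy variant is a different technique from the cast above; it goes further and makes no alignment assumption at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a page header: the struct as a whole wants
 * 8-byte alignment (because of `a`), but `x` itself only needs 2. */
typedef struct S {
    uint64_t a;
    uint16_t x;
} S;

/* Reads x by address arithmetic instead of p->x, so no lvalue of type S
 * is ever formed. The field address must still be 2-byte aligned. */
static uint16_t read_x(const void *p) {
    return *(const uint16_t *)((const char *)p + offsetof(S, x));
}

/* Alternative: copy the bytes out. No alignment requirement at all;
 * compilers typically turn this into a single load. */
static uint16_t read_x_memcpy(const void *p) {
    uint16_t v;
    memcpy(&v, (const char *)p + offsetof(S, x), sizeof v);
    return v;
}

/* Usage sketch: lay out an S-shaped record at an address that is
 * 2-byte aligned but not 8-byte aligned, then read x back. */
static uint16_t demo(int use_memcpy) {
    unsigned char *buf = malloc(32);
    if (buf == NULL) return 0;
    unsigned char *p = buf + 2;  /* malloc memory is suitably aligned, so +2 breaks 8-byte alignment */
    uint16_t v = 0x1234;
    memcpy(p + offsetof(S, x), &v, sizeof v);  /* store x at its field offset */
    uint16_t got = use_memcpy ? read_x_memcpy(p) : read_x(p);
    free(buf);
    return got;
}
```

Neither helper evaluates p->x, which, as far as I can tell, is the expression the alignment check objects to. The cost is the readability hit mentioned above, though it could be hidden behind a macro or inline accessor.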