openldap-bugs October 2019

openldap-bugs@openldap.org

(ITS#9098) assert fails in meta_back_search in some cases after reconnect
by maxime.besson＠worteks.com 16 Oct '19

Full_Name: Maxime Besson
Version: 2.4.47
OS: Debian Jessie
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (2a01:cb00:802:8400:2cbe:3c60:fca6:e50b)


I am running a meta-directory with the following DB configuration
(version 2.4.47, LTB build on Ubuntu 16.04):

dn: olcDatabase={1}meta,cn=config
objectClass: olcMetaConfig
objectClass: olcDatabaseConfig
objectClass: olcConfig
objectClass: top
olcDatabase: {1}meta
olcSuffix: dc=com
olcAccess: {0}to * by * read
olcRootDN: cn=admin,dc=com

dn: olcMetaSub={0}uri,olcDatabase={1}meta,cn=config
objectClass: olcMetaTargetConfig
objectClass: olcConfig
objectClass: top
olcMetaSub: {0}uri
olcDbURI: ldap://1.2.3.4/dc=example,dc=com
olcDbIDAssertBind: mode=legacy flags=non-prescriptive,proxy-authz-non-critical bindmethod=simple binddn="cn=admin,dc=example,dc=com" credentials="XXXXX"
olcDbTimeout: 5
olcDbNetworkTimeout: 3
olcDbNretries: never
olcDbRebindAsUser: true

...

(There are 8 backends in total.)

I added the timeouts so that OpenLDAP would not block completely when one
server becomes unavailable. However, since I added them, the slapd process
has started crashing every now and then (anywhere from a couple of hours to
a couple of days apart), usually during short network interruptions that
affect all backends: I see plenty of reconnect logs shortly before the
crashes.

The crash is always immediately preceded by the following log message:

meta_search_dobind_init[{i}]: retrying URI="{url}" DN="{DN}"

{i} differs from one crash to the next, and {url} and {DN} are the correct
settings for backend {i}.

The crash itself is an ABRT at the following assert in back-meta/search.c
(lines 1957-1958):

assert( candidates[ i ].sr_msgid >= 0
	|| candidates[ i ].sr_msgid == META_MSGID_CONNECTING );

I have analyzed several core dumps and found that every single time slapd
crashes, sr_msgid has a value of -1 (META_MSGID_IGNORE), which indeed causes
the assert to fail.
I found that candidates[i]->sr_flags has a value of 3 (META_CANDIDATE +
META_BINDING), and that the msc_mscflags in mc->mc_conns[ i ] are:

* 0x100081 for all connections before the one that triggers the crash
* 0x100010 for the candidate that crashes the server
* 0x100080 for all connections after it

I am having trouble reproducing this in a test environment, but it happens
regularly in production. I have tried changing the timeouts, adding a
non-default bind timeout, and disabling retries (they were originally
allowed), but the crashes keep happening. Note that disabling retries
(olcDbNretries: never) still seems to lead to retries in
meta_search_dobind_init, since the log message is still there.

I cannot share the core dumps due to the sensitive information inside them.
However, I would gladly extract more information from them if it helps
solve this.
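To make the failing condition concrete, here is a minimal C sketch that restates the assert predicate and decodes the reported sr_flags value. It is not back-meta code: msgid_passes_assert, is_candidate_and_binding and check_candidate_msgid are hypothetical helpers, and only META_MSGID_IGNORE = -1 comes from the report. The values META_MSGID_CONNECTING = -3, META_CANDIDATE = 0x1 and META_BINDING = 0x2 are assumptions (the latter two are consistent with sr_flags = 3 meaning META_CANDIDATE + META_BINDING) and should be checked against back-meta.h.

```c
#include <assert.h>

/* Constant values named after back-meta's. Only META_MSGID_IGNORE = -1 is
 * stated in the report; the others are assumptions, verify them against
 * servers/slapd/back-meta/back-meta.h. */
#define META_MSGID_IGNORE      (-1)
#define META_MSGID_CONNECTING  (-3)   /* assumption */
#define META_CANDIDATE         0x1u   /* assumption */
#define META_BINDING           0x2u   /* assumption */

/* Hypothetical helper: the same predicate as the assert in search.c. */
static int msgid_passes_assert(int sr_msgid)
{
    return sr_msgid >= 0 || sr_msgid == META_MSGID_CONNECTING;
}

/* Hypothetical helper: nonzero when sr_flags has both the candidate bit and
 * the binding bit set, as in the reported value 3. */
static int is_candidate_and_binding(unsigned sr_flags)
{
    return (sr_flags & META_CANDIDATE) != 0 && (sr_flags & META_BINDING) != 0;
}

/* Hypothetical wrapper performing the same check as slapd: raises SIGABRT
 * when the predicate is false, e.g. for sr_msgid == META_MSGID_IGNORE. */
static void check_candidate_msgid(int sr_msgid)
{
    assert(msgid_passes_assert(sr_msgid));
}
```

With the values found in the core dumps, msgid_passes_assert(-1) returns 0 because -1 matches neither arm, so check_candidate_msgid(-1) would abort exactly as slapd does.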

Re: (ITS#9097) lmdb: premature free of env->me_txn0
by hyc＠symas.com 16 Oct '19

Christopher Zimmermann wrote:
> Hi again,
>
> I did some further debugging and now I'm afraid I am indeed finishing a
> transaction twice. I do call txn_commit() and after that I call
> txn_abort(). But the MDB_MAP_FULL event happens during mdb_txn_commit():
>
> (gdb) frame 0
> #0  mdb_page_alloc (mc=0x7f7ffffbd580, num=1, mp=0x7f7ffffbd228) at mdb.c:2286
> 2286            rc = MDB_MAP_FULL;
> (gdb) bt
> #0  mdb_page_alloc (mc=0x7f7ffffbd580, num=1, mp=0x7f7ffffbd228) at mdb.c:2286
> #1  0x00000fb6fb6aa239 in mdb_page_touch (mc=0x7f7ffffbd580) at mdb.c:2428
> #2  0x00000fb6fb6a9db3 in mdb_page_search (mc=0x7f7ffffbd580, key=0x0, flags=5) at mdb.c:5627
> #3  0x00000fb6fb69e585 in mdb_freelist_save (txn=0xfb6bfb3ae00) at mdb.c:3087
> #4  0x00000fb6fb69c2a4 in mdb_txn_commit (txn=0xfb6bfb3ae00) at mdb.c:3612
>
> mdb_txn_commit() will return MDB_MAP_FULL.
>
> It may be worth adding MDB_MAP_FULL and MDB_BAD_TXN to the possible
> error codes of mdb_txn_commit(), or in general provide some guidance on
> how to clean up a txn after some failure.

The doc for commit and abort is explicit. A txn must not be used again after
calling commit or abort. There is nothing to clean up; the txn is always gone
when commit or abort returns.

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
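That rule implies a simple discipline for error paths: once a handle has been passed to commit, treat it as gone even if commit returned an error such as MDB_MAP_FULL. The sketch below models this with hypothetical stand-ins (struct mock_txn, mock_txn_begin, mock_txn_commit, mock_txn_abort, write_once) instead of linking against LMDB; in real code the commit and abort calls would be mdb_txn_commit() and mdb_txn_abort(). Clearing the pointer immediately after the commit call is one common way to keep a shared cleanup label from ending the same transaction a second time.

```c
#include <assert.h>
#include <stdlib.h>

#define MOCK_SUCCESS  0
#define MOCK_MAP_FULL 1    /* hypothetical stand-in for MDB_MAP_FULL */
#define MOCK_PUT_FAIL 2    /* hypothetical error raised before commit */

/* Hypothetical stand-in for MDB_txn. */
struct mock_txn { int unused; };

static int end_calls;      /* number of commit/abort calls made */

static struct mock_txn *mock_txn_begin(void)
{
    return calloc(1, sizeof(struct mock_txn));
}

/* Stand-in for mdb_txn_commit(): the txn is gone afterwards,
 * whether or not the commit succeeded. */
static int mock_txn_commit(struct mock_txn *txn, int simulate_map_full)
{
    assert(txn != NULL);
    end_calls++;
    free(txn);
    return simulate_map_full ? MOCK_MAP_FULL : MOCK_SUCCESS;
}

/* Stand-in for mdb_txn_abort(): the txn is gone afterwards. */
static void mock_txn_abort(struct mock_txn *txn)
{
    assert(txn != NULL);
    end_calls++;
    free(txn);
}

/* Write path with a single cleanup label. Clearing txn right after the
 * commit call ensures the cleanup never ends the same txn twice. */
static int write_once(int fail_before_commit, int simulate_map_full)
{
    int rc;
    struct mock_txn *txn = mock_txn_begin();
    if (txn == NULL)
        return -1;

    if (fail_before_commit) {      /* e.g. a put failed: txn is still live */
        rc = MOCK_PUT_FAIL;
        goto cleanup;
    }

    rc = mock_txn_commit(txn, simulate_map_full);
    txn = NULL;                    /* committed or failed, it no longer exists */

cleanup:
    if (txn != NULL)
        mock_txn_abort(txn);       /* only reached for a txn never committed */
    return rc;
}
```

In every path exactly one end call is made: a failed commit is not followed by an abort, and an error before commit is cleaned up with a single abort.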

Re: (ITS#9097) lmdb: premature free of env->me_txn0
by madroach＠gmerlin.de 16 Oct '19

Hi again,

I did some further debugging and now I'm afraid I am indeed finishing a
transaction twice. I do call txn_commit() and after that I call
txn_abort(). But the MDB_MAP_FULL event happens during mdb_txn_commit():

(gdb) frame 0
#0  mdb_page_alloc (mc=0x7f7ffffbd580, num=1, mp=0x7f7ffffbd228) at mdb.c:2286
2286            rc = MDB_MAP_FULL;
(gdb) bt
#0  mdb_page_alloc (mc=0x7f7ffffbd580, num=1, mp=0x7f7ffffbd228) at mdb.c:2286
#1  0x00000fb6fb6aa239 in mdb_page_touch (mc=0x7f7ffffbd580) at mdb.c:2428
#2  0x00000fb6fb6a9db3 in mdb_page_search (mc=0x7f7ffffbd580, key=0x0, flags=5) at mdb.c:5627
#3  0x00000fb6fb69e585 in mdb_freelist_save (txn=0xfb6bfb3ae00) at mdb.c:3087
#4  0x00000fb6fb69c2a4 in mdb_txn_commit (txn=0xfb6bfb3ae00) at mdb.c:3612

mdb_txn_commit() will return MDB_MAP_FULL.

It may be worth adding MDB_MAP_FULL and MDB_BAD_TXN to the possible
error codes of mdb_txn_commit(), or in general provide some guidance on
how to clean up a txn after some failure.

Christopher

-- 
http://gmerlin.de
OpenPGP: http://gmerlin.de/christopher.pub
CB07 DA40 B0B6 571D 35E2 0DEF 87E2 92A7 13E5 DEE1

Re: (ITS#9097) lmdb: premature free of env->me_txn0
by madroach＠gmerlin.de 15 Oct '19

On Wed, Oct 16, 2019 at 01:59:51AM +0100, Howard Chu wrote:
> christopher(a)gmerlin.de wrote:
> > Full_Name: Christopher Zimmermann
> > Version: lmdb 0.9.24
> > OS: OpenBSD
> > URL: ftp://ftp.openldap.org/incoming/
> > Submission from: (NULL) (85.212.180.240)
> >
> >
> > Hi,
> >
> > I can reliably hit a Bus error on OpenBSD.
> > This is triggered by OpenBSDs malloc/free junking [1] and a use-after-free bug
> > in lmdb.
> >
> > Steps to reproduce:
> > - begin a read/write transaction (getting env->me_txn0)
> > - fill the environment
> >   -> returns MDB_MAP_FULL
> >   -> sets txn->mt_flags |= MDB_TXN_ERROR; (This is also env->me_txn0 !)
> >   -> calls mdb_txn_abort
> ...
> > - abort the transaction (again) with mdb_abort()
>
> This is a bug in your code, you can't call txn_abort twice. This is
> already documented. Closing this ITS.

Hi,

thanks for having a look. I could not find any documentation about when
one must / must not call mdb_txn_abort.

But in any case I do _not_ call txn_abort() twice. The problem rather
seems to be that lmdb _implicitly_ frees invalid transactions, and my code
then aborts them again _explicitly_. So mdb_abort was indeed called twice:
once internally by lmdb in the MDB_TXN_ERROR case, and once from my code.

On second thought, my fix won't fix the same problem for errored child
transactions. What seems to be necessary is either documenting on which
error conditions the user needs to call txn_abort(), and on which
transactions (child vs parent, too), or (what I would prefer) letting the
user clean up the transaction.

In case this is indeed already documented, I would appreciate a pointer to
the location of this documentation.

Thanks,
Christopher

Re: (ITS#9097) lmdb: premature free of env->me_txn0
by hyc＠symas.com 15 Oct '19

christopher(a)gmerlin.de wrote:
> Full_Name: Christopher Zimmermann
> Version: lmdb 0.9.24
> OS: OpenBSD
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (85.212.180.240)
>
>
> Hi,
>
> I can reliably hit a Bus error on OpenBSD.
> This is triggered by OpenBSDs malloc/free junking [1] and a use-after-free bug
> in lmdb.
>
> Steps to reproduce:
> - begin a read/write transaction (getting env->me_txn0)
> - fill the environment
>   -> returns MDB_MAP_FULL
>   -> sets txn->mt_flags |= MDB_TXN_ERROR; (This is also env->me_txn0 !)
>   -> calls mdb_txn_abort
...
> - abort the transaction (again) with mdb_abort()

This is a bug in your code, you can't call txn_abort twice. This is already
documented. Closing this ITS.

(ITS#9097) lmdb: premature free of env->me_txn0
by christopher＠gmerlin.de 15 Oct '19

Full_Name: Christopher Zimmermann
Version: lmdb 0.9.24
OS: OpenBSD
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (85.212.180.240)


Hi,

I can reliably hit a Bus error on OpenBSD.
This is triggered by OpenBSD's malloc/free junking [1] and a use-after-free bug
in lmdb.

Steps to reproduce:
- begin a read/write transaction (getting env->me_txn0)
- fill the environment
  -> returns MDB_MAP_FULL
  -> sets txn->mt_flags |= MDB_TXN_ERROR; (This is also env->me_txn0 !)
  -> calls mdb_txn_abort
  -> calls mdb_txn_end(txn, MDB_END_ABORT|MDB_END_SLOT|MDB_END_FREE):

     mdb_txn_end tries not to free env->me_txn0:

     } else if (!F_ISSET(txn->mt_flags, MDB_TXN_FINISHED)) {
         [...]
         txn->mt_flags = MDB_TXN_FINISHED;
         if (!txn->mt_parent) {
             [...]
             mode = 0; /* txn == env->me_txn0, do not free() it */  (mdb.c:3020)
             [...]
         }
         [...]
     }
     if (mode & MDB_END_FREE)
         free(txn);

     This prevents the free only for unfinished transactions, and the
     transaction is now finished.
- abort the transaction (again) with mdb_abort()
  -> calls mdb_txn_end(txn, MDB_END_ABORT|MDB_END_SLOT|MDB_END_FREE):
     since the env->me_txn0 detection is skipped for MDB_TXN_FINISHED
     transactions, the transaction is freed and its memory gets "junked".
- begin a read/write transaction (getting the now invalid env->me_txn0)
  -> calls mdb_txn_renew0(env->me_txn0)
     MDB_env *env = txn->mt_env;      /* txn->mt_env is now invalid */
     MDB_txninfo *ti = env->me_txns;  /* triggers bus error */

Please make the protection against freeing of me_txn0 more robust.

Thanks,
Christopher Zimmermann

[1] https://man.openbsd.org/malloc#j

Re: (ITS#9091) mdb attribute get mixed up in slapadd continue (-c) mode
by maxime.besson＠worteks.com 14 Oct '19

On 10/14/19 8:47 PM, Howard Chu wrote:
>
> I'd also note that using -c has always been pretty risky. You would be better off using -j
> in situations like this, as shown below:
>

Thanks for the fix and tip!

Re: (ITS#9096)
by priit＠ww.ee 14 Oct '19

[deemon@Zen ~]$ pkg-config --modversion lmdb
0.9.24

Could this be the version number you are looking for?

- Priit

On Mon, 14 Oct 2019 at 20:27, Priit Oorn <priit(a)ww.ee> wrote:
> Don't know the version or how to find out, but I zipped it for you so you
> can maybe find out yourself? :-)
> Or give me some hints how to find out with what command?
>
> - Priit

Re: (ITS#9091) mdb attribute get mixed up in slapadd continue (-c) mode
by hyc＠symas.com 14 Oct '19

Howard Chu wrote:
> maxime.besson(a)worteks.com wrote:
>> Full_Name: Maxime Besson
>> Version: 2.4.48
>> OS: Linux
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (77.193.139.162)
>>
>>
>> I am attempting to implement the following disaster recovery process:
>>
>> * rm all previous data
>> * run a configuration script (puppet) to recreate a bare-bones LDAP server and
>> DIT
>> * restore a backed-up slapcat dump on top of the freshly installed OpenLDAP
>> server, ignoring duplicates already inserted by Puppet
>>
>> But it seems that when skipping over the existing objects, something goes wrong
>> and causes attributes to get mixed up. But only when certain attributes are
>> present in the existing objects. Here is how to reproduce:
>
> Thanks for the report and simple test case. This is now fixed in git master.

I'd also note that using -c has always been pretty risky. You would be better off using -j
in situations like this, as shown below:

>>
>> puppet_init.ldif
>> ===
>> dn: dc=example,dc=com
>> objectClass: domain
>> dc: example
>> ===
>>
>> backup.ldif
>> ===
>> dn: dc=example,dc=com
>> objectClass: domain
>> dc: example
>> contextCSN: 20190909094705.796552Z#000000#001#000000
>>
>> dn: uid=ttully,dc=example,dc=com
>> objectClass: inetOrgPerson
>> uid: ttully
>> userPassword:: c2Nob29uZXI=
>> facsimileTelephoneNumber: +1 408 555 0111
>> givenName: Torrey
>> cn: Torrey Tully
>> telephoneNumber: +1 408 555 2274
>> sn: Tully
>> roomNumber: 3924
>> mail: ttully(a)example.com
>> l: Sunnyvale
>> ou: Human Resources
>> ou: People
>> ===
>>
>> When running the following commands:
>>
>> ===
>> rm -f /tmp/*.mdb
>> slapadd -f slapd.conf puppet_init.ldif
>> slapadd -f slapd.conf -c backup.ldif
>> ===

slapadd -f slapd.conf -l puppet_init.ldif
slapadd -f slapd.conf -l backup.ldif -j 5

Re: (ITS#9091) mdb attribute get mixed up in slapadd continue (-c) mode
by hyc＠symas.com 14 Oct '19

maxime.besson(a)worteks.com wrote:
> Full_Name: Maxime Besson
> Version: 2.4.48
> OS: Linux
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (77.193.139.162)
>
>
> I am attempting to implement the following disaster recovery process:
>
> * rm all previous data
> * run a configuration script (puppet) to recreate a bare-bones LDAP server and
> DIT
> * restore a backed-up slapcat dump on top of the freshly installed OpenLDAP
> server, ignoring duplicates already inserted by Puppet
>
> But it seems that when skipping over the existing objects, something goes wrong
> and causes attributes to get mixed up. But only when certain attributes are
> present in the existing objects. Here is how to reproduce:

Thanks for the report and simple test case. This is now fixed in git master.

>
>
> slapd.conf:
> ===
> include /etc/ldap/schema/core.schema
> include /etc/ldap/schema/cosine.schema
> include /etc/ldap/schema/inetorgperson.schema
> moduleload back_mdb
> database mdb
> maxsize 1073741824
> suffix "dc=example,dc=com"
> directory /tmp
> ===
>
>
> puppet_init.ldif
> ===
> dn: dc=example,dc=com
> objectClass: domain
> dc: example
> ===
>
> backup.ldif
> ===
> dn: dc=example,dc=com
> objectClass: domain
> dc: example
> contextCSN: 20190909094705.796552Z#000000#001#000000
>
> dn: uid=ttully,dc=example,dc=com
> objectClass: inetOrgPerson
> uid: ttully
> userPassword:: c2Nob29uZXI=
> facsimileTelephoneNumber: +1 408 555 0111
> givenName: Torrey
> cn: Torrey Tully
> telephoneNumber: +1 408 555 2274
> sn: Tully
> roomNumber: 3924
> mail: ttully(a)example.com
> l: Sunnyvale
> ou: Human Resources
> ou: People
> ===
>
> When running the following commands:
>
> ===
> rm -f /tmp/*.mdb
> slapadd -f slapd.conf puppet_init.ldif
> slapadd -f slapd.conf -c backup.ldif
> ===
>
> The redundant root object in backup.ldif gets skipped as -c should do, but the
> attributes from my "ttully" user (and every following object in the ldif dump)
> end up all mixed up:
>
>
> ===
> slapcat -f slapd.conf
> ...
> dn: uid=ttully,dc=example,dc=com
> objectClass: inetOrgPerson
> userPassword:: dHR1bGx5
> facsimileTelephoneNumber: schooner
> givenName: +1 408 555 0111
> cn: Torrey
> telephoneNumber: Torrey Tully
> sn: +1 408 555 2274
> roomNumber: Tully
> mail: 3924
> l: ttully(a)example.com
> ou: Sunnyvale
> ou: Human Resources
> ou: People
> ...
> ===
>
>
> This mixup does NOT happen if:
>
> * I import backup.ldif into an empty database
> * OR I remove "contextCSN" from the backup ldif
> * OR I use BDB instead of MDB
>
> So it seems that my issue is caused by a combination of MDB, skipping existing
> entries, and having special attributes (contextCSN) in the skipped objects.
>
> I was able to reproduce this behavior:
> * On Debian Buster (OpenLDAP 2.4.47)
> * On RHEL7 + LTB project RPMs 2.4.48
> * Using a git snapshot (3be82f40d5cd4ca050e10859ecb961f28c807c41) with no
> particular config options
> * Using cn=config rather than slapd.conf
> * On a real production system, rather than the simplified version presented
> here.
> * Using another "special" attribute such as pwdAccountLockedTime instead of
> "contextCSN"

1 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

openldap-bugs October 2019