(ITS#9098) assert fails in meta_back_search in some cases after reconnect
by maxime.besson@worteks.com
Full_Name: Maxime Besson
Version: 2.4.47
OS: Debian Jessie
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (2a01:cb00:802:8400:2cbe:3c60:fca6:e50b)
I am running a meta-directory with the following DB configuration (version
2.4.47, LTB build on Ubuntu 16.04):
dn: olcDatabase={1}meta,cn=config
objectClass: olcMetaConfig
objectClass: olcDatabaseConfig
objectClass: olcConfig
objectClass: top
olcDatabase: {1}meta
olcSuffix: dc=com
olcAccess: {0}to * by * read
olcRootDN: cn=admin,dc=com
dn: olcMetaSub={0}uri,olcDatabase={1}meta,cn=config
objectClass: olcMetaTargetConfig
objectClass: olcConfig
objectClass: top
olcMetaSub: {0}uri
olcDbURI: ldap://1.2.3.4/dc=example,dc=com
olcDbIDAssertBind: mode=legacy flags=non-prescriptive,proxy-authz-non-critical
bindmethod=simple binddn="cn=admin,dc=example,dc=com" credentials="XXXXX"
olcDbTimeout: 5
olcDbNetworkTimeout: 3
olcDbNretries: never
olcDbRebindAsUser: true
...
(There are 8 backends in total)
Timeouts were added in order to avoid blocking OpenLDAP completely when one
server becomes completely unavailable. However, since I added them, the slapd
process has started crashing every now and then (at intervals ranging from a
couple of hours to a couple of days), usually during small network
interruptions that affect all backends: I see plenty of reconnect logs shortly
before the crashes.
The crash is always immediately preceded by the following log message:
meta_search_dobind_init[{i}]: retrying URI="{url}" DN="{DN}"
{i} is never the same, and {url} and {DN} are the correct settings for backend
i.
The crash itself is an ABRT at the following assert in back-meta/search.c:
1957 assert( candidates[ i ].sr_msgid >= 0
1958 || candidates[ i ].sr_msgid == META_MSGID_CONNECTING );
I have analyzed several core dumps, and found that every single time slapd
crashes, sr_msgid has a value of -1 (META_MSGID_IGNORE), which indeed causes the
assert to fail.
I found that candidates[i]->sr_flags has a value of 3 (META_CANDIDATE +
META_BINDING)
And the msc_mscflags in mc->mc_conns[ i ] are
* 0x100081 for all connections before the one that triggers the crash
* 0x100010 for the candidate that crashes the server
* 0x100080 for all connections after it
I am having trouble reproducing this in a test environment, but it happens
regularly in production. I have tried changing the timeouts, adding a
non-default bind timeout, and disabling retries (they were originally allowed),
but the crashes keep happening. Note that disabling retries (olcDbNretries:
never) still seems to lead to retries in meta_search_dobind_init, since the log
message is still there.
I cannot share the core dumps due to the sensitive information inside them.
However, I would gladly extract more information from them if it can help solve
this.
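The failing condition and the flag values above can be sketched in a few lines
of C. This is only an illustration, not code from back-meta: META_MSGID_IGNORE
is -1 as observed in the core dumps, but the value given here for
META_MSGID_CONNECTING is an assumption, and the helper names
msgid_passes_assert() and flag_diff() are hypothetical.

```c
#include <stdbool.h>

/* -1 is the value seen in the core dumps. */
#define META_MSGID_IGNORE      (-1)
/* Assumed value, for illustration only; check back-meta.h for the real one. */
#define META_MSGID_CONNECTING  (-3)

/* Hypothetical helper: evaluates the same condition as the assert
 * in meta_back_search(), without aborting. */
bool msgid_passes_assert(int sr_msgid)
{
    return sr_msgid >= 0 || sr_msgid == META_MSGID_CONNECTING;
}

/* Hypothetical helper: returns the bits that differ between two
 * msc_mscflags values, to compare the crashing target with its peers. */
unsigned long flag_diff(unsigned long a, unsigned long b)
{
    return a ^ b;
}
```

Calling msgid_passes_assert(META_MSGID_IGNORE) returns false, which matches the
abort. flag_diff(0x100081, 0x100010) shows which bits separate the targets
before the crashing one from the crashing one, without assuming what each bit
means.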
1 year, 3 months
Re: (ITS#9097) lmdb: premature free of env->me_txn0
by hyc@symas.com
Christopher Zimmermann wrote:
> Hi again,
>
> I did some further debugging and now I'm afraid I am indeed finishing a
> transaction twice. I do call txn_commit() and after that I call
> txn_abort(). But the MDB_MAP_FULL event happens during mdb_txn_commit():
>
> (gdb) frame 0
> #0 mdb_page_alloc (mc=0x7f7ffffbd580, num=1, mp=0x7f7ffffbd228) at mdb.c:2286
> 2286 rc = MDB_MAP_FULL;
> (gdb) bt
> #0 mdb_page_alloc (mc=0x7f7ffffbd580, num=1, mp=0x7f7ffffbd228) at mdb.c:2286
> #1 0x00000fb6fb6aa239 in mdb_page_touch (mc=0x7f7ffffbd580) at mdb.c:2428
> #2 0x00000fb6fb6a9db3 in mdb_page_search (mc=0x7f7ffffbd580, key=0x0, flags=5) at mdb.c:5627
> #3 0x00000fb6fb69e585 in mdb_freelist_save (txn=0xfb6bfb3ae00) at mdb.c:3087
> #4 0x00000fb6fb69c2a4 in mdb_txn_commit (txn=0xfb6bfb3ae00) at mdb.c:3612
>
> mdb_txn_commit() will return MDB_MAP_FULL.
>
> It may be worth adding MDB_MAP_FULL and MDB_BAD_TXN to the possible
> error codes of mdb_txn_commit(), or in general provide some guidance on
> how to clean up a txn after some failure.
The doc for commit and abort is explicit. A txn must not be used again after calling commit or abort.
There is nothing to clean up, the txn is always gone when commit or abort returns.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
1 year, 3 months
Re: (ITS#9097) lmdb: premature free of env->me_txn0
by madroach@gmerlin.de
Hi again,
I did some further debugging and now I'm afraid I am indeed finishing a
transaction twice. I do call txn_commit() and after that I call
txn_abort(). But the MDB_MAP_FULL event happens during mdb_txn_commit():
(gdb) frame 0
#0 mdb_page_alloc (mc=0x7f7ffffbd580, num=1, mp=0x7f7ffffbd228) at mdb.c:2286
2286 rc = MDB_MAP_FULL;
(gdb) bt
#0 mdb_page_alloc (mc=0x7f7ffffbd580, num=1, mp=0x7f7ffffbd228) at mdb.c:2286
#1 0x00000fb6fb6aa239 in mdb_page_touch (mc=0x7f7ffffbd580) at mdb.c:2428
#2 0x00000fb6fb6a9db3 in mdb_page_search (mc=0x7f7ffffbd580, key=0x0, flags=5) at mdb.c:5627
#3 0x00000fb6fb69e585 in mdb_freelist_save (txn=0xfb6bfb3ae00) at mdb.c:3087
#4 0x00000fb6fb69c2a4 in mdb_txn_commit (txn=0xfb6bfb3ae00) at mdb.c:3612
mdb_txn_commit() will return MDB_MAP_FULL.
It may be worth adding MDB_MAP_FULL and MDB_BAD_TXN to the possible
error codes of mdb_txn_commit(), or in general provide some guidance on
how to clean up a txn after some failure.
Christopher
--
http://gmerlin.de
OpenPGP: http://gmerlin.de/christopher.pub
CB07 DA40 B0B6 571D 35E2 0DEF 87E2 92A7 13E5 DEE1
1 year, 3 months
Re: (ITS#9097) lmdb: premature free of env->me_txn0
by madroach@gmerlin.de
On Wed, Oct 16, 2019 at 01:59:51AM +0100, Howard Chu wrote:
> christopher(a)gmerlin.de wrote:
> > Full_Name: Christopher Zimmermann
> > Version: lmdb 0.9.24
> > OS: OpenBSD
> > URL: ftp://ftp.openldap.org/incoming/
> > Submission from: (NULL) (85.212.180.240)
> >
> >
> > Hi,
> >
> > I can reliably hit a Bus error on OpenBSD.
> > This is triggered by OpenBSDs malloc/free junking [1] and a use-after-free bug
> > in lmdb.
> >
> > Steps to reproduce:
> > - begin a read/write transaction (getting env->me_txn0)
> > - fill the environment
> > -> returns MDB_MAP_FULL
> > -> sets txn->mt_flags |= MDB_TXN_ERROR; (This is also env->me_txn0 !)
> > -> calls mdb_txn_abort
> ...
> > - abort the transaction (again) with mdb_abort()
>
> This is a bug in your code, you can't call txn_abort twice. This is
> already documented. Closing this ITS.
Hi,
thanks for having a look. I could not find any documentation about
when one must / must not call mdb_txn_abort.
But in any case I do _not_ call txn_abort() twice. The problem rather
seems to be that lmdb _implicitly_ frees invalid transactions, and my
code then aborts them again _explicitly_.
So mdb_abort was indeed called twice: once internally in lmdb, in the
MDB_TXN_ERROR case, and once from my code.
On second thought, my fix won't fix the same problem for errored child
transactions.
What seems necessary is either to document on which error conditions
the user needs to call txn_abort(), and on which transactions (child vs
parent, too), or (what I would prefer) to let the user clean up the
transaction.
In case this is indeed already documented I would appreciate a pointer
to the location of this documentation.
Thanks,
Christopher
--
http://gmerlin.de
OpenPGP: http://gmerlin.de/christopher.pub
CB07 DA40 B0B6 571D 35E2 0DEF 87E2 92A7 13E5 DEE1
1 year, 3 months
Re: (ITS#9097) lmdb: premature free of env->me_txn0
by hyc@symas.com
christopher(a)gmerlin.de wrote:
> Full_Name: Christopher Zimmermann
> Version: lmdb 0.9.24
> OS: OpenBSD
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (85.212.180.240)
>
>
> Hi,
>
> I can reliably hit a Bus error on OpenBSD.
> This is triggered by OpenBSDs malloc/free junking [1] and a use-after-free bug
> in lmdb.
>
> Steps to reproduce:
> - begin a read/write transaction (getting env->me_txn0)
> - fill the environment
> -> returns MDB_MAP_FULL
> -> sets txn->mt_flags |= MDB_TXN_ERROR; (This is also env->me_txn0 !)
> -> calls mdb_txn_abort
...
> - abort the transaction (again) with mdb_abort()
This is a bug in your code, you can't call txn_abort twice. This is
already documented. Closing this ITS.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
1 year, 3 months
(ITS#9097) lmdb: premature free of env->me_txn0
by christopher@gmerlin.de
Full_Name: Christopher Zimmermann
Version: lmdb 0.9.24
OS: OpenBSD
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (85.212.180.240)
Hi,
I can reliably hit a Bus error on OpenBSD.
This is triggered by OpenBSDs malloc/free junking [1] and a use-after-free bug
in lmdb.
Steps to reproduce:
- begin a read/write transaction (getting env->me_txn0)
- fill the environment
  -> returns MDB_MAP_FULL
  -> sets txn->mt_flags |= MDB_TXN_ERROR; (This is also env->me_txn0 !)
  -> calls mdb_txn_abort
  -> calls mdb_txn_end(txn, MDB_END_ABORT|MDB_END_SLOT|MDB_END_FREE):
     mdb_txn_end tries not to free env->me_txn0:
        } else if (!F_ISSET(txn->mt_flags, MDB_TXN_FINISHED)) {
            [...]
            txn->mt_flags = MDB_TXN_FINISHED;
            if (!txn->mt_parent) {
                [...]
                mdb.c:3020 mode = 0; /* txn == env->me_txn0, do not free() it */
                [...]
            }
            [...]
        }
        if (mode & MDB_END_FREE)
            free(txn);
     this prevents the free only for unfinished transactions.
     Unfinished transactions are now finished.
- abort the transaction (again) with mdb_abort()
  -> calls mdb_txn_end(txn, MDB_END_ABORT|MDB_END_SLOT|MDB_END_FREE):
     since the env->me_txn0 detection is skipped on MDB_TXN_FINISHED
     transactions
     the transaction is freed, the memory will get "junked".
- begin a read/write transaction (getting now invalid env->me_txn0)
  -> calls mdb_txn_renew0(env->me_txn0)
        MDB_env *env = txn->mt_env; /* txn->mt_env is now invalid */
        MDB_txninfo *ti = env->me_txns; /* triggers bus error */
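The control flow described in these steps can be modelled with a small toy
program. This is a sketch only: toy_txn, toy_txn_end() and the flag values are
hypothetical stand-ins, not LMDB code, and the model records a "freed" marker
instead of calling free(). It follows only the excerpt of mdb_txn_end quoted
above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the flags named above; the values are arbitrary. */
enum { TOY_TXN_FINISHED = 0x01, TOY_TXN_ERROR = 0x02 };
enum { TOY_END_FREE = 0x20 };

struct toy_txn {
    unsigned flags;
    struct toy_txn *parent;  /* NULL for a top-level txn such as me_txn0 */
    bool freed;              /* recorded instead of calling free() */
};

/* Follows the branch structure of the quoted mdb_txn_end() excerpt. */
void toy_txn_end(struct toy_txn *txn, unsigned mode)
{
    if (!(txn->flags & TOY_TXN_FINISHED)) {
        txn->flags = TOY_TXN_FINISHED;
        if (!txn->parent)
            mode = 0;        /* top-level write txn: do not free it */
    }
    if (mode & TOY_END_FREE)
        txn->freed = true;   /* the real code would free(txn) here */
}
```

With a top-level txn, the first toy_txn_end(&t, TOY_END_FREE) leaves freed ==
false, and a second identical call sets it to true, because the FINISHED flag
makes the call skip the branch that clears the mode.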
Please make the protection against freeing of me_txn0 more robust.
Thanks,
Christopher Zimmermann
[1] https://man.openbsd.org/malloc#j
1 year, 3 months
Re: (ITS#9096)
by priit@ww.ee
[deemon@Zen ~]$ pkg-config --modversion lmdb
0.9.24
could this be the version number you are looking for?
- Priit
On Mon, 14 Oct 2019 at 20:27, Priit Oorn <priit(a)ww.ee> wrote:
> Don't know the version or how to find out, but I zipped it for you so you
> can maybe find out yourself? :-)
> Or give me some hints how to find out with what command?
>
> - Priit
>
1 year, 3 months
Re: (ITS#9091) mdb attribute get mixed up in slapadd continue (-c) mode
by hyc@symas.com
Howard Chu wrote:
> maxime.besson(a)worteks.com wrote:
>> Full_Name: Maxime Besson
>> Version: 2.4.48
>> OS: Linux
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (77.193.139.162)
>>
>>
>> I am attempting to implement the following disaster recovery process:
>>
>> * rm all previous data
>> * run a configuration script (puppet) to recreate a bare-bones LDAP server and
>> DIT
>> * restore a backed-up slapcat dump on top of the freshly installed OpenLDAP
>> server, ignoring duplicates already inserted by Puppet
>>
>> But it seems that when skipping over the existing objects, something goes wrong
>> and causes attributes to get mixed up. But only when certain attributes are
>> present in the existing objects. Here is how to reproduce:
>
> Thanks for the report and simple test case. This is now fixed in git master.
I'd also note that using -c has always been pretty risky. You would be better off using -j
in situations like this, as shown below:
>>
>> puppet_init.ldif
>> ===
>> dn: dc=example,dc=com
>> objectClass: domain
>> dc: example
>> ===
>>
>> backup.ldif
>> ===
>> dn: dc=example,dc=com
>> objectClass: domain
>> dc: example
>> contextCSN: 20190909094705.796552Z#000000#001#000000
>>
>> dn: uid=ttully,dc=example,dc=com
>> objectClass: inetOrgPerson
>> uid: ttully
>> userPassword:: c2Nob29uZXI=
>> facsimileTelephoneNumber: +1 408 555 0111
>> givenName: Torrey
>> cn: Torrey Tully
>> telephoneNumber: +1 408 555 2274
>> sn: Tully
>> roomNumber: 3924
>> mail: ttully(a)example.com
>> l: Sunnyvale
>> ou: Human Resources
>> ou: People
>> ===
>>
>> When running the following commands:
>>
>> ===
>> rm -f /tmp/*.mdb
>> slapadd -f slapd.conf puppet_init.ldif
>> slapadd -f slapd.conf -c backup.ldif
>> ===
slapadd -f slapd.conf -l puppet_init.ldif
slapadd -f slapd.conf -l backup.ldif -j 5
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
1 year, 3 months
Re: (ITS#9091) mdb attribute get mixed up in slapadd continue (-c) mode
by hyc@symas.com
maxime.besson(a)worteks.com wrote:
> Full_Name: Maxime Besson
> Version: 2.4.48
> OS: Linux
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (77.193.139.162)
>
>
> I am attempting to implement the following disaster recovery process:
>
> * rm all previous data
> * run a configuration script (puppet) to recreate a bare-bones LDAP server and
> DIT
> * restore a backed-up slapcat dump on top of the freshly installed OpenLDAP
> server, ignoring duplicates already inserted by Puppet
>
> But it seems that when skipping over the existing objects, something goes wrong
> and causes attributes to get mixed up. But only when certain attributes are
> present in the existing objects. Here is how to reproduce:
Thanks for the report and simple test case. This is now fixed in git master.
>
>
> slapd.conf:
> ===
> include /etc/ldap/schema/core.schema
> include /etc/ldap/schema/cosine.schema
> include /etc/ldap/schema/inetorgperson.schema
> moduleload back_mdb
> database mdb
> maxsize 1073741824
> suffix "dc=example,dc=com"
> directory /tmp
> ===
>
>
> puppet_init.ldif
> ===
> dn: dc=example,dc=com
> objectClass: domain
> dc: example
> ===
>
> backup.ldif
> ===
> dn: dc=example,dc=com
> objectClass: domain
> dc: example
> contextCSN: 20190909094705.796552Z#000000#001#000000
>
> dn: uid=ttully,dc=example,dc=com
> objectClass: inetOrgPerson
> uid: ttully
> userPassword:: c2Nob29uZXI=
> facsimileTelephoneNumber: +1 408 555 0111
> givenName: Torrey
> cn: Torrey Tully
> telephoneNumber: +1 408 555 2274
> sn: Tully
> roomNumber: 3924
> mail: ttully(a)example.com
> l: Sunnyvale
> ou: Human Resources
> ou: People
> ===
>
> When running the following commands:
>
> ===
> rm -f /tmp/*.mdb
> slapadd -f slapd.conf puppet_init.ldif
> slapadd -f slapd.conf -c backup.ldif
> ===
>
> The redundant root object in backup.ldif gets skipped as -c should do, but the
> attributes from my "ttully" user (and every following object in the ldif dump)
> end up all mixed up:
>
>
> ===
> slapcat -f slapd.conf
> ...
> dn: uid=ttully,dc=example,dc=com
> objectClass: inetOrgPerson
> userPassword:: dHR1bGx5
> facsimileTelephoneNumber: schooner
> givenName: +1 408 555 0111
> cn: Torrey
> telephoneNumber: Torrey Tully
> sn: +1 408 555 2274
> roomNumber: Tully
> mail: 3924
> l: ttully(a)example.com
> ou: Sunnyvale
> ou: Human Resources
> ou: People
> ...
> ===
>
>
> This mixup does NOT happen if:
>
> * I import backup.ldif into an empty database
> * OR I remove "contextCSN" from the backup ldif
> * OR I use BDB instead of MDB
>
> So it seems that my issue is caused by a combination of MDB, skipping existing
> entries, and having special attributes (contextCSN) in the skipped objects.
>
> I was able to reproduce this behavior:
> * On Debian Buster (OpenLDAP 2.4.47)
> * On RHEL7 + LTB project RPMs 2.4.48
> * Using a git snapshot (3be82f40d5cd4ca050e10859ecb961f28c807c41) with no
> particular config options
> * Using cn=config rather than slapd.conf
> * On a real production system, rather than the simplified version presented
> here.
> * Using another "special" attribute such as pwdAccountLockedTime instead of
> "contextCSN"
>
>
>
>
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
1 year, 3 months