amoneger@cisco.com wrote:
Full_Name: Alex
Version: 2.4.35
OS: Centos 6.3 (2.6.32-279.el6.x86_64 #1 SMP)
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (64.103.25.102)
When using the pcache overlay over a translucent proxy, the slapd daemon crashes after the second LDAP request that misses the cache. The important part is to miss the cache. For example, assuming nothing is cached for the uids aaaa and bbbb, the following requests trigger the issue (172.16.206.156 being the OpenLDAP server):
Thanks for the report, this is now fixed in git master.
ldapsearch -x -H ldap://172.16.206.156 -b "ou=yyyy,o=xxxx" -LLL "uid=aaaa" uid st
ldapsearch -x -H ldap://172.16.206.156 -b "ou=yyyy,o=xxxx" -LLL "uid=bbbb" uid st
Whether aaaa and bbbb exist or not does not matter.
The following config is used:
include /usr/local/etc/openldap/schema/core.schema
include /usr/local/etc/openldap/schema/cosine.schema
include /usr/local/etc/openldap/schema/inetorgperson.schema
include /usr/local/etc/openldap/schema/misc.schema
include /usr/local/etc/openldap/schema/nis.schema
moduleload pcache.la
moduleload translucent.la
database bdb
suffix "o=xxxx"
#checkpoint 1024 15
rootdn "uid=amoneger,ou=yyyy,o=xxxx"
overlay translucent
translucent_local uidNumber,gidNumber,homeDirectory,loginShell
translucent_strict
rootdn "uid=amoneger,ou=yyyy,o=xxxx"
uri ldap://zzzz/
#tls ldaps tls_reqcert=demand tls_cacert=/usr/local/etc/openldap/certs/Cisco_ca_chain
overlay pcache
pcache bdb 10000 1 50 100
pcacheAttrset 0 *
pcacheTemplate (uid=) 0 3600
pcacheBind (uid=) 0 1800 sub ou=yyyy,o=xxxx
pcacheOffline TRUE
pcachePersist TRUE
pcacheValidate FALSE
directory /var/cache/ldap
cachesize 1000
index pcacheQueryid eq
The crash seems to be caused by a SIGABRT which is raised by libc free() due to a double free. Here is the traceback:
Breakpoint 2, 0x000000312aa33f10 in abort () from /lib64/libc.so.6
(gdb) c
Continuing.
Program received signal SIGABRT, Aborted.
0x000000312aa328a5 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x000000312aa328a5 in raise () from /lib64/libc.so.6
#1 0x000000312aa34085 in abort () from /lib64/libc.so.6
#2 0x000000312aa707b7 in __libc_message () from /lib64/libc.so.6
#3 0x000000312aa760e6 in malloc_printerr () from /lib64/libc.so.6
#4 0x000000000042391e in do_search (op=0x7f135c000b80, rs=0x7f1366764930) at search.c:263
#5 0x0000000000421449 in connection_operation (ctx=0x7f1366764a90, arg_v=0x7f135c000b80) at connection.c:1155
#6 0x0000000000421c25 in connection_read_thread (ctx=0x7f1366764a90, argv=<value optimized out>) at connection.c:1291
#7 0x00000000005601f0 in ldap_int_thread_pool_wrapper (xpool=0x1f42770) at tpool.c:688
#8 0x000000312b207851 in start_thread () from /lib64/libpthread.so.0
#9 0x000000312aae890d in clone () from /lib64/libc.so.6
I was unable to track down the particular piece of code triggering the double free, but the same pointer p is freed twice by ber_memfree_x() in memory.c:
(gdb) delete 3
(gdb) break search.c:263
Breakpoint 4 at 0x5645df: file search.c, line 263.
(gdb) c
Continuing.
Breakpoint 4, do_search (op=0x7fa554002930, rs=0x7fa5623a4930) at search.c:263
263 op->o_tmpfree( op->ors_attrs, op->o_tmpmemctx );
(gdb) s
slap_sl_free (ptr=0x1d64a10, ctx=0x7fa554002780) at sl_malloc.c:493
493 {
(gdb) s
498 if (!ptr)
(gdb) s
501 if (No_sl_malloc || !sh || ptr < sh->sh_base || ptr >= sh->sh_end) {
(gdb) s
502 ber_memfree_x(ptr, NULL);
(gdb) s
649 }
(gdb) s
502 ber_memfree_x(ptr, NULL);
(gdb) s
ber_memfree_x (p=0x1d64a10, ctx=0x0) at memory.c:127
127 {
(gdb) s
128 if( p == NULL ) {
(gdb) s
134 if( ber_int_memory_fns == NULL || ctx == NULL ) {
(gdb) s
160 }
(gdb) s
152 free( p );
(gdb) print p
$1 = (void *) 0x1d64a10
(gdb) x/10x p
0x1d64a10: 0x00000001 0x00000000 0x005ffd31 0x00000000
0x1d64a20: 0x00000000 0x00000000 0x00000000 0x00000000
0x1d64a30: 0x00000000 0x00000000
(gdb) c
Continuing.
[New Thread 0x7fa561994700 (LWP 5823)]
Breakpoint 4, do_search (op=0x7fa554002930, rs=0x7fa5623a4930) at search.c:263
263 op->o_tmpfree( op->ors_attrs, op->o_tmpmemctx );
(gdb) s
slap_sl_free (ptr=0x1d64a10, ctx=0x7fa554002780) at sl_malloc.c:493
493 {
(gdb) s
498 if (!ptr)
(gdb) s
501 if (No_sl_malloc || !sh || ptr < sh->sh_base || ptr >= sh->sh_end) {
(gdb) s
502 ber_memfree_x(ptr, NULL);
(gdb) s
649 }
(gdb) s
502 ber_memfree_x(ptr, NULL);
(gdb) s
ber_memfree_x (p=0x1d64a10, ctx=0x0) at memory.c:127
127 {
(gdb) s
128 if( p == NULL ) {
(gdb) s
134 if( ber_int_memory_fns == NULL || ctx == NULL ) {
(gdb) s
160 }
(gdb) s
152 free( p );
(gdb) print P
No symbol "P" in current context.
(gdb) print p
$2 = (void *) 0x1d64a10
(gdb) x/10x p
0x1d64a10: 0x00000000 0x00000000 0x005ffd31 0x00000000
0x1d64a20: 0x00000000 0x00000000 0x00000000 0x00000000
0x1d64a30: 0x00000000 0x00000000
So the same pointer is being freed twice by the 2 connections which miss the cache. I'm unable to figure out who is responsible for that call though, but the same op->ors_attrs is freed by do_search():
if ( op->ors_attrs != NULL ) {
    op->o_tmpfree( op->ors_attrs, op->o_tmpmemctx );
}
Parameters seem correct in both cases:
(gdb) print op->o_hdr->oh_tmpmfuncs->bmf_free
$12 = (BER_MEMFREE_FN *) 0x4733f0 <slap_sl_free>
(gdb) print op->o_request.oq_search.rs_attrs
$15 = (AttributeName *) 0x1f85a10
The call is done via connection_operation(), but that code part is a bit above my head, so I'm unable to track this further.
I thought this could be due to a threading problem, but building slapd with --with-threads=no does not make a difference.
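A double free that survives a build without threads is consistent with an ownership problem rather than a race: two sequential operations can end up holding the same attribute-list pointer, and each one releases it during cleanup. The sketch below only illustrates that pattern. It is not the pcache or translucent code, and every name in it (attr_list, operation, tracked_free, finish_op, dup_attrs, run_two_misses) is hypothetical. To stay well-defined, the release is simulated with a flag instead of calling free() twice.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT slapd's real types. */
typedef struct attr_list {
    int freed;              /* set by tracked_free() in place of a real free() */
} attr_list;

typedef struct operation {
    attr_list *attrs;       /* plays the role of op->ors_attrs */
} operation;

static int double_frees;    /* repeated releases detected in the current run */

/* Marks the list as released; a second call on the same list is counted. */
static void tracked_free(attr_list *al)
{
    if (al == NULL)
        return;
    if (al->freed) {
        double_frees++;
        return;
    }
    al->freed = 1;
}

/* Mirrors the cleanup step quoted from do_search(). */
static void finish_op(operation *op)
{
    if (op->attrs != NULL)
        tracked_free(op->attrs);
}

/* Gives an operation its own private copy of the list. */
static attr_list *dup_attrs(const attr_list *src)
{
    attr_list *copy = malloc(sizeof *copy);
    assert(copy != NULL);
    *copy = *src;
    return copy;
}

/* Runs two sequential "cache miss" operations, no threads involved.
 * copy_per_op == 0: both operations borrow the same list pointer.
 * copy_per_op != 0: each operation gets its own copy.
 * Returns the number of repeated releases detected. */
int run_two_misses(int copy_per_op)
{
    attr_list *shared = calloc(1, sizeof *shared);
    assert(shared != NULL);
    double_frees = 0;

    for (int i = 0; i < 2; i++) {
        operation op;
        op.attrs = copy_per_op ? dup_attrs(shared) : shared;
        finish_op(&op);
        if (copy_per_op)
            free(op.attrs);   /* really release the private copy */
    }

    free(shared);             /* the single real release of the shared list */
    return double_frees;
}
```

With these assumptions, run_two_misses(0) reports one repeated release on the second operation, matching the "crash on the second miss" symptom, while run_two_misses(1) reports none. Whether the real bug follows this shape is a guess based on the traces above.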
I tried uploading the core dump to your ftp server, but it seems there is an issue with ftp.openldap.org:
[root@centos63 tmp]# ftp ftp.openldap.org
Trying 204.152.186.57...
Connected to ftp.openldap.org (204.152.186.57).
220- OpenLDAP FTP Service
220 boole.openldap.org FTP server (Version 6.00LS) ready.
Name (ftp.openldap.org:cisco): anonymous
331 Guest login ok, send your email address as password.
Password:
230- Copyright 1998-2010, The OpenLDAP Foundation, All Rights Reserved.
230- COPYING RESTRICTIONS APPLY, see:
230- ftp://ftp.openldap.org/COPYRIGHT
230- ftp://ftp.openldap.org/LICENSE
230 Guest login ok, access restrictions apply.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd incoming
250 CWD command successful.
ftp> binary
200 Type set to I.
ftp> put core-slapd-6-55-55-26548-1368063267
local: core-slapd-6-55-55-26548-1368063267 remote: core-slapd-6-55-55-26548-1368063267
227 Entering Passive Mode (204,152,186,57,242,33)
553 core-slapd-6-55-55-26548-1368063267: No space left on device.
Let me know if you need anything. I can provide further debug output or cores. I'm also happy to try things out.
Cheers,
Alex