(ITS#5665) slapd crashing with slapo-pcache when using attrset "*"
by toby@inf.ed.ac.uk
Full_Name: Toby Blake
Version: 2.4.11
OS: Scientific Linux 5.1
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (129.215.24.127)
Hi there,
I have been seeing problems when using slapo-pcache with
openldap-2.4.11, specifically when using an attrset of "*".
- openldap-2.4.11 on Scientific Linux 5.1
- We build our own RPMs. I have built them with no optimisation (-O0)
for the purposes of debugging.
Relevant part of slapd.conf:
overlay pcache
proxycache bdb 5000 1 500 60
proxycachequeries 10000
proxyattrset 0 "*"
proxytemplate (uid=) 0 60 60
What seems to happen is that a matching query will get answered and
added to the cache - all is fine until that entry expires and is then
deleted from the cache. The next matching query will then cause slapd
to crash, either with an abort or a segfault. This is repeatable.
I have been testing with the above configuration and the following
queries:
ldapsearch -x "uid=toby"
ldapsearch -x "uid=blah"
(the first for a positive reply, the second for a negative)
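The sequence described above (query, wait for the cached entry to expire and be removed, query again) could be scripted roughly as follows. This is only a sketch: the two timing constants are copied from the configuration fragment in this report, and the assumption that waiting for the TTL plus one further cache check period is long enough for the entry to be deleted is mine, not something the report states. The helper names are hypothetical.

```python
import subprocess
import time

# Values copied from the slapd.conf fragment in the report. How slapd combines
# them to decide when an entry is actually removed is an assumption here.
TTL_SECONDS = 60        # third field of "proxytemplate (uid=) 0 60 60"
CC_PERIOD_SECONDS = 60  # last field of "proxycache bdb 5000 1 500 60"


def build_search_cmd(ldap_filter):
    """Return the ldapsearch command line used in the report."""
    return ["ldapsearch", "-x", ldap_filter]


def expiry_wait_seconds(ttl=TTL_SECONDS, cc_period=CC_PERIOD_SECONDS, margin=10):
    """Seconds to wait so the TTL has passed and one more cache check has run."""
    return ttl + cc_period + margin


def reproduce(ldap_filter, run=subprocess.run, sleep=time.sleep):
    """Run the same search twice, with an expiry-sized pause in between."""
    cmd = build_search_cmd(ldap_filter)
    first = run(cmd)              # answered by the backend and cached
    sleep(expiry_wait_seconds())  # let the cached entry expire and be removed
    second = run(cmd)             # the query reported to crash slapd
    return first.returncode, second.returncode
```

Calling `reproduce("uid=toby")` exercises the positive case and `reproduce("uid=blah")` the negative one; the `run` and `sleep` parameters exist only so the function can be exercised without a live server.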
I have seen three different types of crash, all at the same point
(i.e. directly triggered by the query following the entry being
deleted from the cache).
So, here are the 3 different backtraces:
backtrace 1:
Thread 1 (process 13771):
#0 0x081c39fb in ber_put_string (ber=0x9839c00,
str=0x79626f74 <Address 0x79626f74 out of bounds>, tag=4294967295)
at encode.c:396
#1 0x081c488a in ber_printf (ber=0x9839c00, fmt=0x8227d5d "v}N}")
at encode.c:828
#2 0x08198957 in ldap_build_search_req (ld=0x9827920,
base=0xb56051a4 "dc=inf,dc=ed,dc=ac,dc=uk", scope=2,
filter=0xb5605234 "(uid=toby)", attrs=0x9831838, attrsonly=0, sctrls=0x0,
cctrls=0x0, timelimit=3600, sizelimit=24576, idp=0xb5f05d78)
at search.c:328
#3 0x081982fa in ldap_search_ext (ld=0x9827920,
base=0xb56051a4 "dc=inf,dc=ed,dc=ac,dc=uk", scope=2,
filter=0xb5605234 "(uid=toby)", attrs=0x9831838, attrsonly=0, sctrls=0x0,
cctrls=0x0, timeout=0xb5f05e28, sizelimit=24576, msgidp=0xb5f05e3c)
at search.c:100
#4 0x08116466 in ldap_back_search (op=0x9811140, rs=0xb5f07110)
at search.c:216
#5 0x080eb88e in overlay_op_walk (op=0x9811140, rs=0xb5f07110,
which=op_search, oi=0x97a6da8, on=0x0) at backover.c:646
#6 0x080eba96 in over_op_func (op=0x9811140, rs=0xb5f07110, which=op_search)
at backover.c:698
#7 0x080ebb3a in over_op_search (op=0x9811140, rs=0xb5f07110)
at backover.c:720
#8 0x08070e83 in fe_op_search (op=0x9811140, rs=0xb5f07110) at search.c:366
#9 0x080707e1 in do_search (op=0x9811140, rs=0xb5f07110) at search.c:217
#10 0x0806d530 in connection_operation (ctx=0xb5f07200, arg_v=0x9811140)
at connection.c:1084
#11 0x0806da1d in connection_read_thread (ctx=0xb5f07200, argv=0x18)
at connection.c:1211
#12 0x08192de9 in ldap_int_thread_pool_wrapper (xpool=0x9785880) at tpool.c:663
#13 0x0076046b in start_thread () from /lib/libpthread.so.0
#14 0x006b7dbe in clone () from /lib/libc.so.6
(gdb)
backtrace 2:
Thread 1 (process 27627):
#0 0x0065305a in free () from /lib/libc.so.6
#1 0x081c69ca in ber_memfree_x (p=0x9c8a1a0, ctx=0x0) at memory.c:152
#2 0x080d4020 in slap_sl_free (ptr=0x9c8a1a0, ctx=0x9c87c40)
at sl_malloc.c:456
#3 0x080708de in do_search (op=0x9c89d78, rs=0xb5b8d110) at search.c:233
#4 0x0806d530 in connection_operation (ctx=0xb5b8d200, arg_v=0x9c89d78)
at connection.c:1084
#5 0x0806da1d in connection_read_thread (ctx=0xb5b8d200, argv=0x10)
at connection.c:1211
#6 0x08192de9 in ldap_int_thread_pool_wrapper (xpool=0x9bfe880) at tpool.c:663
#7 0x0076046b in start_thread () from /lib/libpthread.so.0
#8 0x006b7dbe in clone () from /lib/libc.so.6
(gdb)
backtrace 3:
Thread 1 (process 10333):
#0 0x080bc4b2 in ad_inlist (desc=0x8efa9c8, attrs=0x8f8c488) at ad.c:586
#1 0x08080641 in fe_aux_operational (op=0x8f8bce0, rs=0xb5b8b110)
at backend.c:1885
#2 0x08080809 in backend_operational (op=0x8f8bce0, rs=0xb5b8b110)
at backend.c:1933
#3 0x080829f6 in slap_send_search_entry (op=0x8f8bce0, rs=0xb5b8b110)
at result.c:778
#4 0x0811684c in ldap_back_search (op=0x8f8bce0, rs=0xb5b8b110)
at search.c:338
#5 0x080eb88e in overlay_op_walk (op=0x8f8bce0, rs=0xb5b8b110,
which=op_search, oi=0x8f21da8, on=0x0) at backover.c:646
#6 0x080eba96 in over_op_func (op=0x8f8bce0, rs=0xb5b8b110, which=op_search)
at backover.c:698
#7 0x080ebb3a in over_op_search (op=0x8f8bce0, rs=0xb5b8b110)
at backover.c:720
#8 0x08070e83 in fe_op_search (op=0x8f8bce0, rs=0xb5b8b110) at search.c:366
#9 0x080707e1 in do_search (op=0x8f8bce0, rs=0xb5b8b110) at search.c:217
#10 0x0806d530 in connection_operation (ctx=0xb5b8b200, arg_v=0x8f8bce0)
at connection.c:1084
#11 0x0806da1d in connection_read_thread (ctx=0xb5b8b200, argv=0x10)
at connection.c:1211
#12 0x08192de9 in ldap_int_thread_pool_wrapper (xpool=0x8f00880) at tpool.c:663
#13 0x0076046b in start_thread () from /lib/libpthread.so.0
#14 0x006b7dbe in clone () from /lib/libc.so.6
(gdb)
In an hour of testing (with a positive query) yesterday, nine of the
crashes were with backtrace 3, two were with backtrace 1, and one was
with backtrace 2.
In an hour of testing with a negative query, all of the crashes were
essentially backtrace 2, but with a longer stack:
Thread 1 (process 18684):
#0 0x00220402 in __kernel_vsyscall ()
#1 0x0060fd20 in raise () from /lib/libc.so.6
#2 0x00611631 in abort () from /lib/libc.so.6
#3 0x00647e6b in __libc_message () from /lib/libc.so.6
#4 0x0064fb16 in _int_free () from /lib/libc.so.6
#5 0x00653070 in free () from /lib/libc.so.6
#6 0x081c69ca in ber_memfree_x (p=0x9bf1488, ctx=0x0) at memory.c:152
#7 0x080d4020 in slap_sl_free (ptr=0x9bf1488, ctx=0x9bee420)
at sl_malloc.c:456
#8 0x080708de in do_search (op=0x9bf1110, rs=0xb5f27110) at search.c:233
#9 0x0806d530 in connection_operation (ctx=0xb5f27200, arg_v=0x9bf1110)
at connection.c:1084
#10 0x0806da1d in connection_read_thread (ctx=0xb5f27200, argv=0x10)
at connection.c:1211
#11 0x08192de9 in ldap_int_thread_pool_wrapper (xpool=0x9b65880) at tpool.c:663
#12 0x0076046b in start_thread () from /lib/libpthread.so.0
#13 0x006b7dbe in clone () from /lib/libc.so.6
(gdb)
Please let me know if there is any additional information I can
provide.
Cheers
Toby Blake
School of Informatics
University of Edinburgh
14 years, 9 months
Re: (ITS#5661) contextCSN gets corrupted on the stand by mirror
by ali.pouya@free.fr
Hi Pierangelo,
>> contextCSN: 20080727021429.070493Z#000000#000#000000
>> contextCSN:: +HYDCTA4MDIwMzM3MTguMzAwMTExWiMwMDAwMDAjMDAxIzAwMDAwMA==
>
> which looks like
>
> 4 bytes of garbage + "0802033718.300111Z#000000#001#000000"
>
Yes, but I would like to add a clarification:
in vi the 4 bytes are displayed as only 2 characters. In fact, each time
the problem occurs I repair my database using a BDB C program which reads
the first key from id2entry.bdb and writes it to disk.
Then I use vi to fix the contextCSN, before writing the key back to the
database.
In vi I do not delete any characters. I only replace them with "20", then
I fix the rest of the fields.
Another clarification: when the first two characters get corrupted, the rest
of the contextCSN gets stuck and does not follow write operations.
> I note that, according to the sid values you assigned to servers A and
> B, the first contextCSN should not appear, since it has sid == 0,
> while the second one, apart from the corruption, is plausible (as
> you're writing to server A, with sid == 1).
>
Yes.
The contextCSN with sid=0 is there because at the beginning I initialized
my directory without a SID (it defaults to 0), then I set two different SIDs
for A and B.
Best Regards
Ali
14 years, 9 months
Re: (ITS#5661) contextCSN gets corrupted on the stand by mirror
by ando@sys-net.it
ali.pouya(a)free.fr wrote:
> Full_Name: Ali Pouya
> Version: 2.4.11
> OS: Linux 2.6
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (145.242.11.4)
>
>
> I think there is a documentation issue for OpenLDAP 2.4.11:
> Chapter 17.4.4 of the Admin Guide recommends configuring TWO syncrepl
> directives for each mirror side. If I do so, the contextCSN of the stand by
> mirror gets corrupted very easily. But if I configure the mirrors with only ONE
> syncrepl directive it's OK.
>
> The test environment :
> I have a test directory with two mirrors A (sid=1) and B (sid=2) configured as
> recommended in the Admin's Guide, and a replica C connected to A.
> The directory contains 10 million objects, and I use the server A for writing
> 500 000 new ones.
>
> Very often and without any apparent reason the contextCSN in the memory of B
> gets suddenly corrupted while those of A and C are OK.
> In this situation the contextCSN of B gets stuck but B continues to receive data
> from A.
>
> The value of contextCSN in base 64 is :
>
> contextCSN: 20080727021429.070493Z#000000#000#000000
> contextCSN:: +HYDCTA4MDIwMzM3MTguMzAwMTExWiMwMDAwMDAjMDAxIzAwMDAwMA==
which looks like
4 bytes of garbage + "0802033718.300111Z#000000#001#000000"
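For anyone wanting to check this decoding themselves, a small sketch using only the Python standard library is shown here. The base64 string is the value quoted above; the 4-byte split point is taken from the description above, and the function name is purely illustrative.

```python
import base64

# The base64 contextCSN value quoted in the report.
ENCODED = "+HYDCTA4MDIwMzM3MTguMzAwMTExWiMwMDAwMDAjMDAxIzAwMDAwMA=="


def split_csn(encoded, prefix_len=4):
    """Decode the value and split it into the leading bytes and the remainder."""
    raw = base64.b64decode(encoded)
    return raw[:prefix_len], raw[prefix_len:]


prefix, rest = split_csn(ENCODED)
print(prefix.hex())          # f8760309
print(rest.decode("ascii"))  # 0802033718.300111Z#000000#001#000000
```

The decoded value is 40 bytes long, the same length as the intact contextCSN shown next to it, which is consistent with the four leading bytes having overwritten the year digits rather than having been inserted in front of them.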
I note that, according to the sid values you assigned to servers A and
B, the first contextCSN should not appear, since it has sid == 0, while
the second one, apart from the corruption, is plausible (as you're
writing to server A, with sid == 1).
> I note that only the part indicating the year (2008) is garbled. Maybe this
> part is handled differently?
No.
> At service shutdown B writes the corrupt contextCSN to the disk.
> At service startup B reads the corrupt contextCSN from the disk and begins to
> scan ALL of the data base.
>
> Also it sends a sync request to A (a persistent search containing the corrupt
> contextCSN in the control field) causing A to scan the WHOLE data base.
> The replica C remains safe.
The fact that the two servers scan the whole database is a side effect
of the incorrect contextCSN; I wouldn't worry about it, since it should
go away once the corruption is tracked down and fixed.
> If I reverse the roles of A and B the corruption occurs on A (always on the
> stand by mirror).
>
> I have already encountered the contextCSN corruption problem in OpenLdap 2.3 and
> this was one of my reasons to migrate to 2.4.11.
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team
SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
-----------------------------------
Office: +39 02 23998309
Mobile: +39 333 4963172
Fax: +39 0382 476497
Email: ando(a)sys-net.it
-----------------------------------
14 years, 9 months
Re: (ITS#5652) configure errors
by ando@sys-net.it
edpena(a)cisco.com wrote:
> Full_Name: Ed Pena
> Version: openldap-2.3.39
> OS: hpux 11.11
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (64.102.254.33)
Please provide config.log resulting from the execution of configure.
p.
14 years, 9 months
Re: (ITS#5653) Segmentation Fault running slapd with mysql back-end
by ando@sys-net.it
A fix is in HEAD code; it checks args before trying to parse them.
Please test. p.
14 years, 9 months
Re: (ITS#5662) Comments in schema declarations separated by semicolon
by Kurt@OpenLDAP.org
On Aug 21, 2008, at 9:57 AM, michael(a)stroeder.com wrote:
> Hallvard B Furuseth wrote:
>> michael(a)stroeder.com writes:
>>> hyc(a)symas.com wrote:
>>>> Who benefits from this feature?
>>> An admin copying & pasting a schema from a standard document which uses
>>> this format. I'm currently looking at such a document with ~500
>>> occurrences of OIDs used in declarations instead of NAMEs.
>>
>> Which one? It's not RFC 4512 format. RFC 4512 uses ';' for comments
>> _about_ the syntax of schema elements, not _in_ their syntax.
>
> http://tools.ietf.org/draft/draft-dally-acp133-and-ldap/
There is a lot of crap in I-Ds, and even some crap in RFCs.
-- Kurt
14 years, 9 months
Re: (ITS#5664) Deadlocks when writing in parallell (two processes)
by quanah@zimbra.com
--On Thursday, August 21, 2008 6:29 PM +0000 hyc(a)symas.com wrote:
> stelios.xx.grigoriadis(a)ericsson.com wrote:
>> tom.bjorkholm(a)aastra.com wrote:
>>> Full_Name: Stelios Grigoriadis & Tom Björkholm
>>> Version: 2.3.39
>>> OS: Novell SLES 10
>>> URL: ftp://ftp.openldap.org/incoming/
>>> Submission from: (NULL) (194.237.142.7)
>>>
>>>
>>> We get a lot of DB_LOCK_DEADLOCK errors when using client programs that
>>> write to OpenLDAP continuously for a period of time.
>>> Version is 2.3.39.
>>>
>>> The information added is of the form:
>>> ebcmdCustomer=0+ebcmdDir=220xx,ou=AuthCodes,ebcmdVersion=0,ebcmdProduct=ebcmd,dc=example,dc=com
>>> where xx varies.
>>>
>>> Snippet of the output:
>>> Mar 27 13:03:21 ldapt1 slapd[7589]: => bdb_dn2id_add: subtree
>>> (ebcmdCustomer=0+ebcmdDir=22037,ou=authcodes,ebcmdVersion=0,ebcmdProduct=ebcmd,dc=example,dc=com)
>>> put failed: -30995
>>> Mar 27 13:03:26 ldapt1 slapd[7589]: => bdb_idl_insert_key: c_put id
>>> failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30995)
>>> Mar 27 13:03:26 ldapt1 slapd[7589]: => bdb_dn2id_add: parent
>>> (ou=authcodes,ebcmdVersion=0,ebcmdProduct=ebcmd,dc=example,dc=com)
>>> insert failed: -30995
>>> Mar 27 13:03:28 ldapt1 slapd[7589]: => bdb_idl_insert_key: c_put id
>>> failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30995)
>>> Mar 27 13:03:28 ldapt1 slapd[7589]: => bdb_dn2id_add: parent
>>> (ou=authcodes,ebcmdVersion=0,ebcmdProduct=ebcmd,dc=example,dc=com)
>>> insert failed: -30995
>>> Mar 27 13:03:36 ldapt1 slapd[7589]: => bdb_idl_insert_key: c_put id
>>> failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30995)
>>> Mar 27 13:03:36 ldapt1 slapd[7589]: => bdb_dn2id_add: parent
>>> (ou=authcodes,ebcmdVersion=0,ebcmdProduct=ebcmd,dc=example,dc=com)
>>> insert failed: -30995
>>> Mar 27 13:03:38 ldapt1 slapd[7589]: => bdb_idl_insert_key: c_put id
>>> failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30995)
>>>
>>>
>>>
>>
>> We've temporarily fixed the problem by introducing a static mutex before
>> any add/update operation.
>
> There's no problem to fix. Deadlocks are normal in these scenarios, and
> the code automatically retries. This ITS will be closed.
I will note that testing on 2.3 has shown time and again that serialized
updates perform better, regardless. Using accesslog with delta-syncrepl
replication essentially enforces this. The more you can serialize updates,
particularly with batch provisioning, the smoother your system will
operate. This may not apply to 2.4.
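As an illustration of what serializing updates on the client side can look like, here is a minimal sketch. It assumes a single provisioning process with several worker threads; `write_fn` is a hypothetical stand-in for whatever call actually sends the add or modify, and nothing here is OpenLDAP code. Two separate processes, as in the original report, would need a shared mechanism instead (for example a single writer process).

```python
import threading


class SerializedWriter:
    """Funnel all updates through one lock so only one is in flight at a time."""

    def __init__(self, write_fn):
        self._write_fn = write_fn      # hypothetical: performs one LDAP update
        self._lock = threading.Lock()

    def write(self, entry):
        # Only the thread holding the lock may issue an update.
        with self._lock:
            return self._write_fn(entry)
```

Worker threads would all call `writer.write(entry)` on one shared instance instead of writing directly.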
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
14 years, 9 months
Re: (ITS#5664) Deadlocks when writing in parallell (two processes)
by hyc@symas.com
stelios.xx.grigoriadis(a)ericsson.com wrote:
> tom.bjorkholm(a)aastra.com wrote:
>> Full_Name: Stelios Grigoriadis& Tom Björkholm
>> Version: 2.3.39
>> OS: Novell SLES 10
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (194.237.142.7)
>>
>>
>> We get a lot of DB_LOCK_DEADLOCK errors when using client programs that
>> write to OpenLDAP continuously for a period of time.
>> Version is 2.3.39.
>>
>> The information added is of the form:
>> ebcmdCustomer=0+ebcmdDir=220xx,ou=AuthCodes,ebcmdVersion=0,ebcmdProduct=ebcmd,dc=example,dc=com
>> where xx varies.
>>
>> Snippet of the output:
>> Mar 27 13:03:21 ldapt1 slapd[7589]: => bdb_dn2id_add: subtree
>> (ebcmdCustomer=0+ebcmdDir=22037,ou=authcodes,ebcmdVersion=0,ebcmdProduct=ebcmd,dc=example,dc=com)
>> put failed: -30995
>> Mar 27 13:03:26 ldapt1 slapd[7589]: => bdb_idl_insert_key: c_put id failed:
>> DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30995)
>> Mar 27 13:03:26 ldapt1 slapd[7589]: => bdb_dn2id_add: parent
>> (ou=authcodes,ebcmdVersion=0,ebcmdProduct=ebcmd,dc=example,dc=com) insert
>> failed: -30995
>> Mar 27 13:03:28 ldapt1 slapd[7589]: => bdb_idl_insert_key: c_put id failed:
>> DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30995)
>> Mar 27 13:03:28 ldapt1 slapd[7589]: => bdb_dn2id_add: parent
>> (ou=authcodes,ebcmdVersion=0,ebcmdProduct=ebcmd,dc=example,dc=com) insert
>> failed: -30995
>> Mar 27 13:03:36 ldapt1 slapd[7589]: => bdb_idl_insert_key: c_put id failed:
>> DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30995)
>> Mar 27 13:03:36 ldapt1 slapd[7589]: => bdb_dn2id_add: parent
>> (ou=authcodes,ebcmdVersion=0,ebcmdProduct=ebcmd,dc=example,dc=com) insert
>> failed: -30995
>> Mar 27 13:03:38 ldapt1 slapd[7589]: => bdb_idl_insert_key: c_put id failed:
>> DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30995)
>>
>>
>>
>
> We've temporarily fixed the problem by introducing a static mutex before
> any add/update operation.
There's no problem to fix. Deadlocks are normal in these scenarios, and the
code automatically retries. This ITS will be closed.
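For readers unfamiliar with the pattern, a generic retry-on-deadlock loop looks roughly like the sketch below. It is not the slapd or back-bdb implementation: `DeadlockError`, `run_with_retry`, the attempt limit and the randomized delay are all assumptions made for illustration only.

```python
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a DB_LOCK_DEADLOCK (-30995) result."""


def run_with_retry(txn, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call txn() until it succeeds, retrying when it is chosen as deadlock victim."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and report the failure to the caller
            # Short randomized pause so competing writers do not collide again
            # in lockstep.
            sleep(base_delay * attempt * random.random())
```

With a loop of this kind, a deadlock message in the log means one transaction was aborted and tried again, not that the update was lost.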
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
14 years, 9 months
Re: (ITS#5662) Comments in schema declarations separated by semicolon
by michael@stroeder.com
Hallvard B Furuseth wrote:
> michael(a)stroeder.com writes:
>> hyc(a)symas.com wrote:
>>> Who benefits from this feature?
>> An admin copying & pasting a schema from a standard document which uses
>> this format. I'm currently looking at such a document with ~500
>> occurrences of OIDs used in declarations instead of NAMEs.
>
> Which one? It's not RFC 4512 format. RFC 4512 uses ';' for comments
> _about_ the syntax of schema elements, not _in_ their syntax.
http://tools.ietf.org/draft/draft-dally-acp133-and-ldap/
Not for my professional work.
I was just looking for really complex schemas for testing web2ldap.
Ciao, Michael.
14 years, 9 months
Re: (ITS#5653) Segmentation Fault running slapd with mysql back-end
by ando@sys-net.it
ollieeillo(a)yahoo.co.uk wrote:
> backsql_oc_get_attr_mapping(): executing at_query
> "SELECT name,sel_expr,from_tbls,join_where,add_proc,delete_proc,param_order,expect_return,sel_expr_u
> FROM ldap_attr_mappings WHERE oc_map_id=?"
> for objectClass "document"
> with param oc_id="2"
> attributeType:
> name="(null)"
> sel_expr="(null)"
> from="(null)"
> join_where="(null)"
> add_proc="(null)"
> delete_proc="(null)"
> sel_expr_u=""
> Segmentation fault
This log seems to indicate that the ldap_attr_mappings table contains
NULLs for required fields. A check will be added to the code, but yours
definitely looks like a user error... fix your table and retry.
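A quick way to look for such rows is a query along these lines. The sketch uses Python's built-in sqlite3 with an in-memory table purely so it can run on its own; the real table lives in MySQL. Treating `name`, `sel_expr` and `from_tbls` as the required columns, and the presence of an `id` column, are assumptions based on the log above and on the usual back-sql sample schemas, so adjust them to your actual table definition.

```python
import sqlite3

# Assumption: these are the columns that must not be NULL.
REQUIRED_COLUMNS = ("name", "sel_expr", "from_tbls")


def find_incomplete_mappings(conn, required=REQUIRED_COLUMNS):
    """Return (id, oc_map_id) for rows where any required column is NULL."""
    where = " OR ".join(f"{col} IS NULL" for col in required)
    query = (
        "SELECT id, oc_map_id FROM ldap_attr_mappings "
        f"WHERE {where} ORDER BY id"
    )
    return conn.execute(query).fetchall()


def demo_connection():
    """Build a tiny in-memory table with one good row and one all-NULL row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ldap_attr_mappings ("
        "id INTEGER PRIMARY KEY, oc_map_id INTEGER, "
        "name TEXT, sel_expr TEXT, from_tbls TEXT)"
    )
    conn.executemany(
        "INSERT INTO ldap_attr_mappings VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, "cn", "persons.name", "persons"),
            (2, 2, None, None, None),  # mimics the row shown in the log
        ],
    )
    return conn
```

The same `WHERE ... IS NULL` condition can be run directly in the mysql client against the real table.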
p.
14 years, 9 months