Full_Name: Mark
Version: 2.3.38
OS: Suse Linux 10.1
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (84.128.87.178)
** Problem
test001-slapadd fails with a segmentation fault.
** Environment
SUSE Linux 10.1
Berkeley DB 4.6.21
OpenLDAP 2.3.38
** Configuration of ldap
configure --prefix=/usr --enable-debug
** Shared Library dependency
libdb-4.6.so => /usr/lib/libdb-4.6.so (0x40028000)
libsasl2.so.2 => /usr/lib/libsasl2.so.2 (0x40169000)
libdl.so.2 => /lib/libdl.so.2 (0x40182000)
libresolv.so.2 => /lib/libresolv.so.2 (0x40185000)
libpthread.so.0 => /lib/i686/libpthread.so.0 (0x40197000)
libc.so.6 => /lib/i686/libc.so.6 (0x401e9000)
** StackTrace
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 32771 (LWP 28247)]
0x400bd3bd in __lock_get_internal (lt=0x8215ae8, sh_locker=0x7, flags=0,
obj=0x82168d4, lock_mode=DB_LOCK_READ, timeout=0, lock=0x40d8e50c) at
../lock/lock.c:740
740 no_dd = sh_locker->master_locker == INVALID_ROFF &&
(gdb) bt
#0 0x400bd3bd in __lock_get_internal (lt=0x8215ae8, sh_locker=0x7, flags=0,
obj=0x82168d4, lock_mode=DB_LOCK_READ, timeout=0, lock=0x40d8e50c) at
../lock/lock.c:740
#1 0x400bcc85 in __lock_get (dbenv=0x82154e8, locker=0x7, flags=0,
obj=0x82168d4, lock_mode=DB_LOCK_READ, lock=0x40d8e50c) at ../lock/lock.c:447
#2 0x400ec2f7 in __db_lget (dbc=0x8216858, action=0, pgno=1, mode=DB_LOCK_READ,
lkflags=0, lockp=0x40d8e50c) at ../db/db_meta.c:1012
#3 0x40054d7d in __bam_get_root (dbc=0x8216858, pg=1, slevel=1, flags=1409,
stack=0x40d8e614) at ../btree/bt_search.c:94
#4 0x400551a4 in __bam_search (dbc=0x8216858, root_pgno=1, key=0x40d8e97c,
flags=1409, slevel=1, recnop=0x0, exactp=0x40d8e818) at
../btree/bt_search.c:200
#5 0x40045160 in __bamc_search (dbc=0x8216858, root_pgno=0, key=0x40d8e97c,
flags=26, exactp=0x40d8e818) at ../btree/bt_cursor.c:2486
#6 0x400411bc in __bamc_get (dbc=0x8216858, key=0x40d8e97c, data=0x40d8e95c,
flags=26, pgnop=0x40d8e8ac) at ../btree/bt_cursor.c:961
#7 0x400da81d in __dbc_get (dbc_arg=0x8217158, key=0x40d8e97c, data=0x40d8e95c,
flags=26) at ../db/db_cam.c:697
#8 0x400e7dfb in __dbc_get_pp (dbc=0x8217158, key=0x40d8e97c, data=0x40d8e95c,
flags=26) at ../db/db_iface.c:2022
#9 0x080d455f in bdb_id2entry (be=0x7, tid=0x0, locker=7, id=1, e=0x40d8e9f8)
at id2entry.c:125
#10 0x080cdabb in bdb_cache_find_id (op=0x822a638, tid=0x0, id=1,
eip=0x40d8ea84, islocked=0, locker=7, lock=0x40d8eb1c) at cache.c:760
#11 0x080d12cd in bdb_dn2entry (op=0x822a638, tid=0x0, dn=0x0, e=0x40d8eb14,
matched=1, locker=7, lock=0x40d8eb1c) at dn2entry.c:68
#12 0x080b4ae9 in bdb_search (op=0x822a638, rs=0x40e4fc9c) at search.c:374
#13 0x0805ec5b in fe_op_search (op=0x822a638, rs=0x40e4fc9c) at search.c:355
#14 0x0805e43f in do_search (op=0x822a638, rs=0x40e4fc9c) at search.c:217
#15 0x0805cae2 in connection_operation (ctx=0x7, arg_v=0x822a638) at
connection.c:1133
#16 0x080fd514 in ldap_int_thread_pool_wrapper (xpool=0x81c0a88) at tpool.c:478
#17 0x4019cf60 in pthread_start_thread () from /lib/i686/libpthread.so.0
#18 0x4019d0fe in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#19 0x402c5327 in clone () from /lib/i686/libc.so.6
** Description
We added some printf's and found that sh_locker has the value 0x7,
which seems to be an index rather than a valid lock for the database.
Note: With the following modification the tests run:
File: servers/slapd/back-bdb/id2entry.c
Line: 120
#if 0
	/* Use our own locker if needed */
	if ( !tid && locker )
		cursor->locker = locker;
#endif
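
To make the symptom above concrete: frame #0 shows sh_locker=0x7, which matches locker=0x7 in frame #1 and locker=7 in frame #9, so a small integer appears to be reaching code that dereferences it as a pointer. The sketch below is illustrative only. The names demo_locker, demo_cursor, demo_set_locker_from_id and looks_like_small_id are hypothetical and are not Berkeley DB or slapd identifiers, and the 4096 threshold is an assumption about the first page of the address space never being mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real Berkeley DB types. */
struct demo_locker { uint32_t id; };
struct demo_cursor { struct demo_locker *locker; };

/*
 * Heuristic only: report whether a pointer value is non-NULL but so small
 * that it is far more likely a numeric ID than an address.  The 4096 limit
 * assumes the first page of the address space is never mapped.
 */
static int looks_like_small_id(const void *p)
{
    return p != NULL && (uintptr_t)p < (uintptr_t)4096;
}

/* Store a numeric locker ID in a pointer-typed field, as the trace suggests. */
static void demo_set_locker_from_id(struct demo_cursor *c, uint32_t id)
{
    assert(c != NULL);
    c->locker = (struct demo_locker *)(uintptr_t)id;
}
```

A check of this kind never dereferences the suspect value, so it can be dropped into a debugging printf without triggering the same fault.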
Any help is appreciated.
Regards
Mark
> It's still rather suspicious that slave4 and slave6 both had identical log
> status for base1 (1/188113) but different requested locations (1/8730339 vs
> 1/8730401). If they're identically configured slaves then they ought to be in
> lock-step. Then again, obviously they're not identical since slave6 doesn't
> show base4 in your log.
Identical is relative. They've got the same OpenLDAP and supporting
binaries running on the same patches of Solaris 9 running identical
turn-up scripts with identical configuration files. But this is
production, so we've got data changes over time. For instance, the slaves
bootstrap with a slapadd -q, and the underlying slapcat could easily be
different from slave4 vs. slave6 (the most recent one is automatically
used). I'd imagine this would look different at the db layer, even once
syncrepl eventually converged the logical data?
> Do you have the db_stat output from an uncorrupted slave? What about the
> master?
Sure... https://www.nbcs.rutgers.edu/~richton/its5171_dbstatl2
Aaron Richton wrote:
>> itself. Again, we can't really tell without single-stepping thru the BDB
>> library code. It may not be worth the effort, but that's your call.
>
> The lock was
>
> env_region.c:290 MUTEX_LOCK(dbenv, &renv->mutex);
>
> but that wasn't making much sense....and after a couple minutes in dbx I
> realized that I've been killing myself with the attempts at db_stat.
> Yesterday's attempts were running db_* binaries with a wrong (but
> compatible) ABI. It'd be nice if Sleepycat had some more/earlier checks
> for that, but oh well...
Kinda figured that that's what happened.
> So anyway, I corrupted base2/slave4 by running the wrong db_stat, but that
> left three other bases on slave4 and all three bases on slave6. I ran
> db_stat -l on them, the output is:
>
> https://www.nbcs.rutgers.edu/~richton/its5171_dbstatl
> BTW, this ABI screwup shouldn't be the root cause of the failures...I
> haven't tried any db tools until the course of debugging this. These are
> AUTOREMOVE, so db_archive is unlikely, for instance.
It's still rather suspicious that slave4 and slave6 both had identical log
status for base1 (1/188113) but different requested locations (1/8730339 vs
1/8730401). If they're identically configured slaves then they ought to be in
lock-step. Then again, obviously they're not identical since slave6 doesn't
show base4 in your log.
Do you have the db_stat output from an uncorrupted slave? What about the master?
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
> itself. Again, we can't really tell without single-stepping thru the BDB
> library code. It may not be worth the effort, but that's your call.
The lock was
env_region.c:290 MUTEX_LOCK(dbenv, &renv->mutex);
but that wasn't making much sense....and after a couple minutes in dbx I
realized that I've been killing myself with the attempts at db_stat.
Yesterday's attempts were running db_* binaries with a wrong (but
compatible) ABI. It'd be nice if Sleepycat had some more/earlier checks
for that, but oh well...
So anyway, I corrupted base2/slave4 by running the wrong db_stat, but that
left three other bases on slave4 and all three bases on slave6. I ran
db_stat -l on them, the output is:
https://www.nbcs.rutgers.edu/~richton/its5171_dbstatl
BTW, this ABI screwup shouldn't be the root cause of the failures...I
haven't tried any db tools until the course of debugging this. These are
AUTOREMOVE, so db_archive is unlikely, for instance.
h.b.furuseth(a)usit.uio.no wrote:
> Full_Name: Hallvard B Furuseth
> Version: HEAD, RE23
> OS: Linux
> URL:
> Submission from: (NULL) (129.240.202.105)
> Submitted by: hallvard
>
>
> syncprov + back-ldap, and presumably + back-meta if that were used,
> give an array bounds violation in test045-syncreplication-proxied:
>
> syncprov_db_open() uses connection_fake_init(), which sets op->o_tag=0.
> It passes op to back-ldap.
>
> ldap_back_op_result() assumes op is a known LDAP request: It calls
> slap_req2op() and gets SLAP_OP_LAST (for unknown tag). That is used as
> an index into ldapinfo_t.li_timeout[], which has size SLAP_OP_LAST.
>
> back-meta/bind.c does the same in meta_back_bind_op_result() and
> meta_back_op_result().
I fixed back-ldap but I believe the same fix is still needed in back-meta.
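
The out-of-bounds index described in the quoted report can be sketched as follows. This is not the change that was committed to back-ldap. The names demo_op, demo_info and demo_timeout_for are hypothetical stand-ins, and returning 0 ("no timeout") for an unknown operation is an assumption made for the example. Only the idea that the "unknown" index equals the array size comes from the report.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical operation indices; DEMO_OP_LAST plays the role of SLAP_OP_LAST. */
enum demo_op { DEMO_OP_BIND, DEMO_OP_SEARCH, DEMO_OP_LAST };

/* Hypothetical stand-in for a per-operation timeout table of size "LAST". */
struct demo_info {
    time_t timeout[DEMO_OP_LAST];
};

/*
 * Return the configured timeout for op, or 0 when op is not a known
 * operation.  Valid indices are 0 .. DEMO_OP_LAST - 1, so DEMO_OP_LAST
 * itself (the "unknown tag" result) must be rejected before indexing.
 */
static time_t demo_timeout_for(const struct demo_info *li, int op)
{
    assert(li != NULL);
    if (op < 0 || op >= DEMO_OP_LAST)
        return 0;
    return li->timeout[op];
}
```

Checking the index at the point of use keeps the guard next to the array access, so an internal operation with an unset tag is handled the same way whichever caller created it.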
--
-- Howard Chu
Full_Name: HR Pattanaik
Version: 2.2.29
OS: Windows XP Professional
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (203.126.136.220)
When I sign in through my authentication page, authentication succeeds 5-6
times and then starts failing. I have searched many forums but have not found
a solution, so I am raising the issue here. I hope someone can suggest a fix;
please reply.
The exception I get is shown below.
javax.naming.ServiceUnavailableException: localhost:389; socket closed
Not sure if ordering of optional sequence members is required by RFC
4234, but the change you suggest sounds harmless. OpenLDAP software, in
this sense, is usually permissive in what is accepted and strict in what
is emitted.
Thanks, p.
Ing. Pierangelo Masarati
OpenLDAP Core Team
SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
---------------------------------------
Office: +39 02 23998309
Mobile: +39 333 4963172
Email: pierangelo.masarati(a)sys-net.it
---------------------------------------
On Sun, 2007-10-07 at 16:29 -0700, Howard Chu wrote:
> m.d.t.evans(a)qmul.ac.uk wrote:
> > Full_Name: Martin Evans
> > Version: 2.4.5beta
> > OS: Linux
> > URL: ftp://ftp.openldap.org/incoming/
> > Submission from: (NULL) (138.37.8.140)
> >
> >
> > This is mentioned in #4611 but marked as fixed there.
>
> Now fixed in HEAD.
>
Thanks! I spotted your fixes to slapi/plugin.c, back-monitor/conn.c and
back-monitor/database.c, applied them to my version of 2.4.5beta, and
they seem to work fine.
Martin.
--
-- Dr MDT Evans, Computing Services, Queen Mary, University of London
Full_Name: Brian Hanafee
Version: 1.24.2.2
OS: Mac OS/X
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (24.4.251.75)
The final definition in /servers/slapd/schema/openldap.schema is not valid per
RFC 4512.
It reads:
objectClass ( OpenLDAPobjectClass:6
	NAME 'OpenLDAPdisplayableObject'
	DESC 'OpenLDAP Displayable Object'
	MAY displayName AUXILIARY )
Per RFC 4512, section 4.1.1, the 'kind' AUXILIARY comes before any MUST or MAY
entries. The corrected entry should read:
objectClass ( OpenLDAPobjectClass:6
	NAME 'OpenLDAPdisplayableObject'
	DESC 'OpenLDAP Displayable Object'
	AUXILIARY
	MAY displayName )
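
The ordering rule cited above can be checked mechanically. The following is a minimal sketch, not slapd's schema parser. The function kind_precedes_attrs is a hypothetical name, and the code assumes whitespace-separated tokens and single-quoted strings without escapes. It reports whether a kind keyword (ABSTRACT, STRUCTURAL or AUXILIARY) appears after a MUST or MAY keyword.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the token of length len starting at p equals the keyword kw. */
static int token_is(const char *p, size_t len, const char *kw)
{
    return strlen(kw) == len && strncmp(p, kw, len) == 0;
}

/*
 * Return 1 if no kind keyword follows MUST or MAY in def, 0 otherwise.
 * Text inside single quotes (NAME and DESC values) is skipped.
 */
static int kind_precedes_attrs(const char *def)
{
    static const char *kinds[] = { "ABSTRACT", "STRUCTURAL", "AUXILIARY" };
    int seen_attrs = 0;  /* set once MUST or MAY has been seen */
    int in_quote = 0;    /* inside a single-quoted string */
    const char *p = def;

    assert(def != NULL);
    while (*p != '\0') {
        size_t len, i;

        if (*p == '\'') { in_quote = !in_quote; p++; continue; }
        if (in_quote || isspace((unsigned char)*p)) { p++; continue; }

        /* p is at the start of an unquoted token; find its length. */
        len = strcspn(p, " \t\r\n'");
        if (token_is(p, len, "MUST") || token_is(p, len, "MAY"))
            seen_attrs = 1;
        for (i = 0; i < sizeof(kinds) / sizeof(kinds[0]); i++) {
            if (token_is(p, len, kinds[i]) && seen_attrs)
                return 0;  /* kind found after MUST/MAY */
        }
        p += len;
    }
    return 1;
}
```

Applied to the two definitions above, the check would reject the first form (AUXILIARY after MAY) and accept the corrected one.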
richton(a)nbcs.rutgers.edu wrote:
>> If this is happening even with slapd cleanly shut down then it should also
>> prevent slapd from restarting, since slapd first attempts to join an existing
>> environment before trying to create a new one. And that really implies that
>> the rest of the environment is shot.
>
> Agreed, but that's a pretty awful condition to have in a long-running
> slapd process. Without db_stat (easily) working, is there any hope at
> finding clues as to how this might have happened, or is it just time to
> rm/slapadd and hope it doesn't happen again?
It doesn't seem like we can get much more info out of this. One more thing to
try would be a full-debug build of libdb, so we can see exactly where it hangs
when trying to join the environment. Looking thru the code, I only see one
mutex to acquire the environment, and looking at your stack trace it's already
past that location, but the trace could be lying.
Also the mutex used to lock the environment is a regular mutex, not a
persistent lock. So when all processes have closed the environment, there
shouldn't be anything left to conflict with here. So most likely the
environment data structures are hosed, and the thread is locking against
itself. Again, we can't really tell without single-stepping thru the BDB
library code. It may not be worth the effort, but that's your call.
--
-- Howard Chu