(ITS#6276) paused pool can deadlock if writers are waiting
by hyc@OpenLDAP.org
Full_Name: Howard Chu
Version: RE24/HEAD
OS: Solaris 10
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (76.91.220.157)
Submitted by: hyc
test050 hung on me after some number of iterations. Unfortunately I didn't save
the stack traces, but basically there was one thread waiting in send_ldap_ber()
on the write2 cv, and another thread in config_back_add() waiting for a pool
pause to succeed. netstat showed that no connections had queued data, so there
should have been no reason for the writer to still be waiting.
I believe what happened here is that while the writer was waiting (it was a
syncprov qtask replaying events for a psearch) the psearch connection got
closed. Solaris is using select, and select() doesn't specially distinguish
socket close events - they're reported as read events. The deadlock is because
we queue read events into the thread pool, and we don't discover they're
actually closed sockets until the read thread gets to run and tries to read from
the socket (and gets zero bytes back). But since the pool is entering a pause,
the reader thread cannot run, so it can't detect the hangup and dispose of the
waiting writer.
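Here is a minimal sketch of the select() behavior described above. It assumes a POSIX system and uses an AF_UNIX socketpair() as a stand-in for a client connection. The function name is hypothetical and this is not slapd code. After the peer closes, select() only marks the descriptor readable. The read() that follows is what returns 0 and reveals the hangup.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo: close one end of a socketpair, wait in select()
 * on the other end, then read. Returns the byte count from read()
 * (0 means EOF, i.e. the peer hung up), or -1 on a setup failure.
 */
static int select_then_read_after_peer_close(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    close(sv[1]);                       /* the "client" goes away */

    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(sv[0], &rfds);
    struct timeval tv = { 1, 0 };       /* 1 second timeout */

    /* select() can only say "readable"; it has no "hung up" result. */
    int n = select(sv[0] + 1, &rfds, NULL, NULL, &tv);
    if (n != 1 || !FD_ISSET(sv[0], &rfds)) {
        close(sv[0]);
        return -1;
    }

    /* Only an actual read distinguishes a hangup (0 bytes) from data. */
    char buf[64];
    ssize_t got = read(sv[0], buf, sizeof buf);
    close(sv[0]);
    return (int)got;
}
```

In slapd terms, that read() happens in a pool thread, which is why a paused pool never gets to see the 0-byte result.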
The ideal fix for this is to process hangup events inline in the listener thread
instead of pushing them into the thread pool. But that requires being able to
cheaply determine that a hangup actually occurred, and select() doesn't give us
this information.
We could get this info using poll() instead. Since nowadays any POSIX platform
that implements select() also implements poll(), we can probably just switch to
poll() and drop select(). One exception is Windows: Winsock only provides poll()
(as WSAPoll()) on Windows Vista and newer.
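For comparison, here is a sketch of what poll() can report for the same closed socketpair. The enum and function names are hypothetical. POLLHUP reporting is not uniform across platforms and socket types. On Linux, for example, a TCP peer that has only sent a FIN typically shows up as POLLIN (plus the Linux-specific POLLRDHUP if requested), and POLLHUP is set only once both directions are shut down. A real listener would therefore likely still need a fallback for the plain-POLLIN case.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical classification a listener thread could act on. */
enum sock_event { EV_NONE, EV_READ, EV_HANGUP, EV_ERROR };

static enum sock_event classify_revents(short revents)
{
    if (revents & (POLLERR | POLLNVAL))
        return EV_ERROR;
    if (revents & POLLHUP)              /* hangup reported directly */
        return EV_HANGUP;               /* (pending data, if any, still needs draining) */
    if (revents & POLLIN)
        return EV_READ;                 /* data, or an EOF poll() did not flag */
    return EV_NONE;
}

/* Close one end of a socketpair and see what poll() says about the other. */
static enum sock_event poll_after_peer_close(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return EV_ERROR;
    close(sv[1]);

    struct pollfd pfd = { .fd = sv[0], .events = POLLIN };
    int n = poll(&pfd, 1, 1000);        /* 1 second timeout */
    close(sv[0]);
    if (n != 1)
        return EV_ERROR;
    return classify_revents(pfd.revents);
}
```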
(Note, we had a patch that added a connection_hangup() handler for Linux epoll()
at one point, but I dropped it later because it seemed to have strange
interactions with Samba. Should look into resurrecting it again.)
I don't think we can really fix this issue without knowing for certain when
hangup events occur. If we're forced to keep using select, that implies that the
main listener thread must attempt a read on the socket before deciding how to
dispatch the connection. Any thoughts?
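If select() has to stay, one reading of this suggestion is for the listener to peek at the socket before deciding where to send it. This is a sketch under stated assumptions. The names are hypothetical, and it assumes MSG_DONTWAIT is available. That flag is not on every platform; on a socket already set to O_NONBLOCK, MSG_PEEK alone behaves the same way. MSG_PEEK leaves any pending request bytes in place for the pool thread that eventually reads them.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical dispatch decisions for the listener thread. */
enum dispatch { DISPATCH_TO_POOL, HANDLE_HANGUP_INLINE, IGNORE_SPURIOUS };

/*
 * Called after select() marked fd readable. Peeks one byte without
 * consuming it, so a hangup can be handled inline even while the
 * thread pool is paused.
 */
static enum dispatch decide_dispatch(int fd)
{
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);

    if (n > 0)
        return DISPATCH_TO_POOL;        /* real request data is waiting */
    if (n == 0)
        return HANDLE_HANGUP_INLINE;    /* orderly close by the peer */
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return IGNORE_SPURIOUS;         /* nothing to read after all */
    return HANDLE_HANGUP_INLINE;        /* e.g. ECONNRESET: treat as hangup */
}
```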
14 years, 3 months
Re: (ITS#6274) nssov: makefile suffix rules too greedy
by hyc@symas.com
jonathan(a)phillipoux.net wrote:
> Full_Name: Jonathan Clarke
> Version: RE24
> OS: Linux
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (82.67.204.30)
>
>
> When running "make" in the nssov module from contrib/slapd-modules, I get this:
> 8<-------------------
> [...]/contrib/slapd-modules/nssov$ make
> ../../../libtool --mode=compile gcc -g -O2 -I../../../include
> -I../../../include -I../../../servers/slapd -Inss-ldapd -c alias.c nssov.h
> libtool: compile: cannot determine name of library object from `nssov.h'
> 8<-------------------
>
> This trivial patch to Makefile corrects this:
> 8<-------------------
> - $(LIBTOOL) --mode=compile $(CC) $(OPT) $(DEFS) $(INCS) -c $?
> + $(LIBTOOL) --mode=compile $(CC) $(OPT) $(DEFS) $(INCS) -c $<
> 8<-------------------
>
>
Thanks, fixed in HEAD.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
(ITS#6275) syncrepl taking long(not sync) when consumer not connect for a moment
by rlvcosta@yahoo.com
Full_Name: Rodrigo Luiz Vargas Costa
Version: 2.4.17
OS: CentOS release 5.2 (Final)
URL: ftp://ftp.openldap.org/incoming/<TBD>
Submission from: (NULL) (135.245.8.5)
OpenLDAP developers,
I have been exchanging some information on the OpenLDAP lists, where it looks like some
improvements are being made to replication for release 2.4.18.
The architecture I'm running has 2 machines in MirrorMode on the same subnet (on
the same switch). These systems are part of an HA system sharing a VIP: both
machines run slapd simultaneously (bound to any local interface), and only the
VIP is moved for HA purposes.
From a general user's point of view, the issue appears when I stop the
secondary provider, Provider 2 (master 2), to take a backup with slapcat.
Provider 1 (master 1) continues to provide LDAP service, and some entries may
be created while the backup is running (with no consumer from Provider 2).
Even if only a small number of entries differ, when the consumer on Provider 2
reconnects to Provider 1, syncrepl enters a full DB search, as expected.
Because of memory limitations, I need to limit dncachesize to around 80% of
the DB entries.
From a user perspective, I see that after the cache is filled the system enters a
state where synchronization no longer happens. For full reference (config,
gdb, etc.), please see the file attached on the FTP site.
I see 2 issues:
1) The consumer on Provider 2 never finishes syncrepl, even after days with only
a small number of test differences and no traffic, so there is no replication
(Provider 1 stays continuously at 100% CPU);
2) Even if I stop Provider 2 (and therefore its consumer), I do not see any change
in Provider 1's activity. The CPU stays at 100% even after days, which suggests
a hang in a thread or in the logic.
I compiled OpenLDAP with GDB symbols and then took some traces of the threads
during state 2 reported above. It looks like it stays looping forever, stuck on
some thread lock.
I also noticed that in this situation the monitor shows the DN cache changing
very slowly, by a single entry at a time. More specifically:
dn: cn=Database 1,cn=Databases,cn=Monitor
structuralObjectClass: monitoredObject
creatorsName:
modifiersName:
createTimestamp: 20090821145848Z
modifyTimestamp: 20090821145848Z
monitoredInfo: bdb
monitorIsShadow: TRUE
namingContexts: ou=CONTENT,o=domain,c=fr
readOnly: FALSE
monitorOverlay: syncprov
olmBDBEntryCache: 19920
olmBDBDNCache: 3896287
olmBDBIDLCache: 2
olmDbDirectory: /var/openldap-data/bdb1/
entryDN: cn=Database 1,cn=Databases,cn=Monitor
subschemaSubentry: cn=Subschema
hasSubordinates: TRUE
It keeps toggling between the values 3896287 and 3896288. It looks like memory
reuse in the cache is too tight, causing lock waits that take a long time and
prevent synchronization.
I made several GDB traces for different conditions. Please see ftp attachment
file for details.
Thanks,
Rodrigo.
PS: I could not put the file on the OpenLDAP FTP site; it says the device is full.
Please let me know how I can send this file.
Re: (ITS#6257) libldap: getopt flag to return the SASL username
by michael@stroeder.com
masarati(a)aero.polimi.it wrote:
>> masarati(a)aero.polimi.it wrote:
>
>> I'd appreciate it very much if it behaved exactly the same way as
>> all other string-valued options.
>
> On a somewhat related issue, I note that LDAP_OPT_X_SASL_MECHLIST returns
> a pointer to an array of chars that apparently cannot be mucked with.
>
> Assuming my understanding is correct, I wonder if this behavior is
> desirable or not, given the fact that if another mech is added, e.g. by
> adding a dynamic module, I expect this list to change.
These are SASL mechs with the plugin modules. Right?
From an operational standpoint: if a SASL plugin module for a mech is added, I
think it's acceptable that software querying this option has to be restarted
before it knows about the new SASL mech. One probably has to add additional
configuration for that SASL mech anyway.
Now the question is what happens if a SASL plugin module is removed and the
software tries to use the removed SASL mech. Clearly, removing plugin modules in
a running system is asking for trouble anyway...
Having said this I would not care too much about this list going to change...
Ciao, Michael.
(ITS#6274) nssov: makefile suffix rules too greedy
by jonathan@phillipoux.net
Full_Name: Jonathan Clarke
Version: RE24
OS: Linux
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (82.67.204.30)
When running "make" in the nssov module from contrib/slapd-modules, I get this:
8<-------------------
[...]/contrib/slapd-modules/nssov$ make
../../../libtool --mode=compile gcc -g -O2 -I../../../include
-I../../../include -I../../../servers/slapd -Inss-ldapd -c alias.c nssov.h
libtool: compile: cannot determine name of library object from `nssov.h'
8<-------------------
This trivial patch to Makefile corrects this:
8<-------------------
- $(LIBTOOL) --mode=compile $(CC) $(OPT) $(DEFS) $(INCS) -c $?
+ $(LIBTOOL) --mode=compile $(CC) $(OPT) $(DEFS) $(INCS) -c $<
8<-------------------
Re: (ITS#6200) slapd crashes under load w/ syncrepl
by richton@rci.rutgers.edu
What does this look like in top and/or dmesg over the run time? Is it a
simple out-of-memory? Definitely a bit of a gross method, but what size does
ls -lh core show?
(If so, is it warranted given your load, or is there a leak, etc. etc.? And
of course make sure you're up to date on the surrounding packages;
OpenLDAP isn't the only thing that can leak.)
Re: (ITS#6270) Conflict between ppolicy (pwdReset flag) and unique overlays
by michael@stroeder.com
michael(a)stroeder.com wrote:
> clem.oudot(a)gmail.com wrote:
>> I have a rootdn. An extract of my slapd.conf is :
>>
>> ---
>> database bdb
>> suffix dc=3Dexample,dc=3Dcom
>> rootdn cn=3Dmanager,dc=3Dexample,dc=3Dcom
>> rootpw secret
>> directory /var/lib/ldap
>>
>> overlay ppolicy
>> ppolicy_use_lockout
>> ppolicy_hash_cleartext
>>
>> overlay unique
>> unique_uri ldap:///ou=3Dusers,dc=3Dexample,dc=3Dcom?uid?sub?(objectClass=3D=
>> inetOrgPerson)
>
> Could you please repost this without the broken quoted printables? Especially
> the filter part of the value for 'unique_uri'.
I suspect the ITS software messed this up, because the messages Cc:-ed directly
to me did not contain the broken quoted printables. So here's what Clément
originally sent as an excerpt of his config:
---
database bdb
suffix dc=example,dc=com
rootdn cn=manager,dc=example,dc=com
rootpw secret
directory /var/lib/ldap
overlay ppolicy
ppolicy_use_lockout
ppolicy_hash_cleartext
overlay unique
unique_uri ldap:///ou=users,dc=example,dc=com?uid?sub?(objectClass=inetOrgPerson)
---
Ciao, Michael.
Re: (ITS#6270) Conflict between ppolicy (pwdReset flag) and unique overlays
by michael@stroeder.com
clem.oudot(a)gmail.com wrote:
> I have a rootdn. An extract of my slapd.conf is :
>
> ---
> database bdb
> suffix dc=3Dexample,dc=3Dcom
> rootdn cn=3Dmanager,dc=3Dexample,dc=3Dcom
> rootpw secret
> directory /var/lib/ldap
>
> overlay ppolicy
> ppolicy_use_lockout
> ppolicy_hash_cleartext
>
> overlay unique
> unique_uri ldap:///ou=3Dusers,dc=3Dexample,dc=3Dcom?uid?sub?(objectClass=3D=
> inetOrgPerson)
Could you please repost this without the broken quoted printables? Especially
the filter part of the value for 'unique_uri'.
Ciao, Michael.
(ITS#6273) NSS overlay (nssov) fails to load
by battery@writeme.com
Full_Name: Matt Kassawara
Version: 2.4.17
OS: Ubuntu 9.10 (Karmic)
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (8.7.94.151)
Loading the nssov module using 'ldapadd' reports an olcModuleLoad handler
error...
# ldapadd -H ldapi:/// -Y external
SASL/EXTERNAL authentication started
SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
SASL SSF: 0
dn: cn=module{2},cn=config
objectclass: olcmodulelist
cn: module{2}
olcmoduleload: {0}nssov
olcmodulepath: /usr/lib/ldap
adding new entry "cn=module{2},cn=config"
ldap_add: Other (e.g., implementation specific) error (80)
additional info: <olcModuleLoad> handler exited with 1
Output from 'slapd' in debug level 7 (incorrectly) reports file not found...
oc_check_required entry (cn=module{2},cn=config), objectClass "olcModuleList"
oc_check_allowed type "objectClass"
oc_check_allowed type "cn"
oc_check_allowed type "olcModuleLoad"
oc_check_allowed type "olcModulePath"
oc_check_allowed type "structuralObjectClass"
lt_dlopenext failed: (nssov) file not found
Output from 'LD_DEBUG' reveals undefined symbol 'ber_bvmatch' in nssov.so.0...
18016: /usr/lib/ldap/nssov.so.0: error: symbol lookup error: undefined
symbol: ber_bvmatch (fatal)
18016:
18016: file=/usr/lib/ldap/nssov.so.0 [0]; destroying link map
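lt_dlopenext only reported "file not found". One way to see the loader's real complaint is to call dlopen() on the module directly and print dlerror(). This is a minimal sketch that assumes a POSIX dlopen(); older glibc needs -ldl at link time. The helper name is hypothetical. With RTLD_NOW, every undefined symbol must resolve at load time, so a missing symbol such as ber_bvmatch should appear in the error text instead of being masked.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/*
 * Hypothetical diagnostic: load a module with dlopen() and print the
 * dynamic loader's own error message on failure.
 * Returns 0 if the module loads, -1 otherwise.
 */
static int try_load_module(const char *path)
{
    /* RTLD_NOW: resolve all undefined symbols before returning. */
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *why = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n", path,
                why ? why : "unknown error");
        return -1;
    }
    dlclose(handle);
    return 0;
}
```

On the system in the report, this would be called as try_load_module("/usr/lib/ldap/nssov.so.0").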
(ITS#6272) test045 freed memory access
by richton@nbcs.rutgers.edu
Full_Name: Aaron Richton
Version: RE24
OS: Solaris 9
URL: https://www.nbcs.rutgers.edu/~richton/richton-bt-200908221228.txt
Submission from: (NULL) (128.6.31.135)
t@5 (l@5) terminated by signal SEGV (no mapping at the fault address)
Current function is connection_abandon
729 op.orn_msgid = o->o_msgid;
(dbx) where
current thread: t@5
=>[1] connection_abandon(c = 0x106df088), line 729 in "connection.c"
[2] connection_closing(c = 0x106df088, why = 0x2859e0 "connection lost"), line
777 in "connection.c"
[3] connection_read(s = 11, cri = 0xfd3ffd64), line 1427 in "connection.c"
[4] connection_read_thread(ctx = 0xfd3ffe0c, argv = 0xb), line 1245 in
"connection.c"
[5] ldap_int_thread_pool_wrapper(xpool = 0x10491f08), line 685 in "tpool.c"
(dbx) print o->o_hdr
o->o_hdr = 0xdeadbeef
Full backtrace in ITS link. testrun directory:
https://www.nbcs.rutgers.edu/~richton/richton-testrun-200908221228.tar.bz2
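Debug allocators commonly fill freed memory with the value 0xdeadbeef. Finding that value in o->o_hdr is therefore what points to a freed-memory access here. Below is a minimal sketch of the poison-on-free technique with hypothetical names. It illustrates the idea and does not claim to be what produced this particular value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FREED_PATTERN 0xdeadbeefU

/* Fill a block with the freed-memory pattern, as a debug free() might. */
static void poison(void *p, size_t len)
{
    uint32_t pat = FREED_PATTERN;
    unsigned char *b = p;
    for (size_t i = 0; i + sizeof pat <= len; i += sizeof pat)
        memcpy(b + i, &pat, sizeof pat);
}

/* True if the first 32-bit word of p still holds the poison pattern. */
static int looks_freed(const void *p)
{
    uint32_t word;
    memcpy(&word, p, sizeof word);
    return word == FREED_PATTERN;
}
```

On a 32-bit build, a pointer field in a poisoned struct reads back as exactly 0xdeadbeef, as o->o_hdr does in the dbx output. On 64-bit it would read as 0xdeadbeefdeadbeef.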