Full_Name: Miki Manó
Version: 2.4.18
OS: debian linux
URL:
Submission from: (NULL) (91.120.147.134)
hello
I'm running a slapd server with 100 clients over SSL. I see that the server
doesn't always close its file descriptors, so the limit of open files is
exceeded. How could I help you find this bug?
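One way to collect numbers for a report like this is to sample how many descriptors the slapd process holds over time. The following is only a minimal sketch, assuming a Linux system with /proc mounted and enough privileges to read the entries of the slapd process; the helper names count_open_fds and soft_nofile_limit are made up for illustration and are not part of OpenLDAP.

```python
import os


def count_open_fds(pid):
    """Count entries in /proc/<pid>/fd (Linux only).

    When a process inspects itself, the count also includes the
    descriptor briefly opened to list the directory.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_nofile_limit(pid):
    """Return the soft 'Max open files' limit of a process as a string.

    The value comes from /proc/<pid>/limits and may be 'unlimited'.
    Returns None if the line is not found.
    """
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                # Columns: Max open files <soft> <hard> <units>
                return line.split()[3]
    return None
```

Calling count_open_fds with the slapd PID every few seconds, next to the value from soft_nofile_limit, would show whether the count climbs steadily towards the limit or only jumps under particular client activity.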
karavelov(a)spnet.net wrote:
> Hello,
>
> I have recompiled slapd (2.4.18-release) on another machine (i386) in
> order to see if the bug is architecture dependent - the other servers
> are on amd64 architecture. It shows the same bug.
>
> The log file could be found here:
> http://purgatory.spnet.net/~karavelov/d2
>
> for the failing record, search for:
> filter="(&(objectClass=mailDomain)(dc=justillusion.net))"
>
> It does not fail with "scope not ok" error, but with "bdb_search: no
> candidates". I have seen the same error appear on the other servers
> (amd64) but I had trouble to isolate the whole history of the query. On
> this test server I have seen the other error too ("scope not ok") so the
> failing mode is not architecture dependent.
This is now fixed in HEAD overlays/pcache.c. The bug occurred because the
cache was seeing a child entry first, and so the dc=justillusion.net object
got created as a glue entry. Later when the actual dc=justillusion.net entry
was received, the modify to store its true values in the cache DB failed
because it needed the manageDSAit control.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Hello,
I have recompiled slapd (2.4.18-release) on another machine (i386) in
order to see if the bug is architecture dependent - the other servers
are on amd64 architecture. It shows the same bug.
The log file could be found here:
http://purgatory.spnet.net/~karavelov/d2
for the failing record, search for:
filter="(&(objectClass=mailDomain)(dc=justillusion.net))"
It does not fail with "scope not ok" error, but with "bdb_search: no
candidates". I have seen the same error appear on the other servers
(amd64) but I had trouble isolating the whole history of the query. On
this test server I have seen the other error too ("scope not ok") so the
failing mode is not architecture dependent.
I have also noticed that there are some messages like:
bdb_add: dn2id_add failed: DB_LOCK_DEADLOCK: Locker killed to resolve a
deadlock (-30995)
Maybe this is the source of the errors I see?
The used config could be found here:
http://purgatory.spnet.net/~karavelov/slapd.conf.h2
Thanks in advance for help and suggestions
Luben
Hello,
I have got the same errors, plus an additional abort on assertion, using a
different configuration.
I have decided to split the slapd functions into 2 daemons:
1st instance of slapd is configured with back-sql and listens on a unix
domain socket
2nd instance is a proxy to the first instance with the pcache overlay - it
serves clients.
The config of the slapd with sql backend could be found here:
http://purgatory.spnet.net/~karavelov/proxy/slapd.conf
It works without any error.
The config of the proxy could be found here:
http://purgatory.spnet.net/~karavelov/proxy/slapd-proxy.conf
It continues to corrupt random records in the cache. A full log of the
slapd could be found here:
http://purgatory.spnet.net/~karavelov/proxy/debug-rubella.bg
For the failing record look for:
filter="(&(objectClass=mailDomain)(dc=rubella.bg))"
Additionally, now it aborts on assertion at random times. The output of
the server could be found here:
http://purgatory.spnet.net/~karavelov/proxy/slapd-abort
Started in the background, it emits a lot of ld warnings like the one at the
beginning of that file.
Thanks in advance for any help and suggestions
Luben
jzeleny(a)redhat.com wrote:
> Full_Name: Jan Zeleny
> Version: 2.4.18
> OS: Fedora 11
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (62.40.79.66)
> Following bug report is a good introduction to the issue:
> https://bugzilla.redhat.com/show_bug.cgi?id=509230
>
> I managed to reproduce it simply by turning on TLS and setting TLSVerifyClient
> allow. In that configuration local connections to ldaps still work, but
> connections from remote machines don't work in about 80-90% of cases.
>
> I tried to trace the bug; so far I found that when using this option, slapd
> sends its certificate to the TCP socket and gets EAGAIN in the middle of
> writing. After that it goes to epoll_wait and there it waits indefinitely. I
> suspect the EAGAIN happens because the TCP socket is full or something like that.
> Notice that when you turn on debugging information about packet handling, this
> issue disappears - maybe socket has time to get empty?
>
> I tried and confirmed the bug in several versions of openldap (incl. 2.4.18) and
> several Linux distributions to eliminate the possibility this issue is caused by
> some other component or it was solved already.
I'm unable to reproduce this using slapd on a debian x86-64 system, whether on
the local LAN or from 13 hops away. I've also used the tcp-buffer option to
set a minimum sized socket buffer and still could not duplicate the problem.
You will need to provide more explicit information on how to reproduce this
issue. Perhaps providing a set of CA/server certs will also be necessary.
Please note that the bug report you reference (509230) gives inconsistent
information; it says that no hang occurs with -d2, but that hangs occur with
no diagnostics, even with -d -1. Obviously -d -1 includes -d 2, so: does it
hang, or not, with -d -1?
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Full_Name: Dagobert Michelsen
Version: 2.4.18
OS: Solaris
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (86.103.225.254)
There is a problem compiling the current OpenLDAP (2.4.18) in 64 bit
on Solaris. The problem occurs when building with modules and
enabling 64 bit in CFLAGS rather than setting CC to the compiler
including flags. The source of the problem is the old libtool
version 1.5.x included in OpenLDAP. The libtool maintainers
recommended upgrading libtool to at least 2.2 where the
problem was fixed.
The corresponding libtool thread is available at
<http://lists.gnu.org/archive/html/bug-libtool/2009-09/msg00017.html>
This is the excerpt of my tests:
This works fine:
./configure CC='/opt/studio/SOS11/SUNWspro/bin/cc -xarch=v9'
CPPFLAGS='-I/opt/csw/include' LDFLAGS='-L/opt/csw/lib/64 -R/opt/csw/lib/64' &&
gmake
And it also works with modules:
./configure CC='/opt/studio/SOS11/SUNWspro/bin/cc -xarch=v9'
CPPFLAGS='-I/opt/csw/include' LDFLAGS='-L/opt/csw/lib/64 -R/opt/csw/lib/64'
--enable-modules && gmake
Without modules this works also:
./configure CC='/opt/studio/SOS11/SUNWspro/bin/cc' CFLAGS='-xarch=v9'
CPPFLAGS='-I/opt/csw/include' LDFLAGS='-L/opt/csw/lib/64 -R/opt/csw/lib/64' &&
gmake
This does not work:
./configure CC='/opt/studio/SOS11/SUNWspro/bin/cc' CFLAGS='-xarch=v9'
CPPFLAGS='-I/opt/csw/include' LDFLAGS='-L/opt/csw/lib/64 -R/opt/csw/lib/64'
--enable-modules && gmake
For now this tiny patch fixed my problem, but it needs to be worked on for
general usage:
diff -Naur openldap-2.4.17.orig/build/ltmain.sh openldap-2.4.17.patched/build/ltmain.sh
--- openldap-2.4.17.orig/build/ltmain.sh 2009-01-22 01:00:41.000000000 +0100
+++ openldap-2.4.17.patched/build/ltmain.sh 2009-09-11 14:26:06.136891084 +0200
@@ -4745,7 +4745,10 @@
case "$compile_command " in
*" -static "*) ;;
*) pic_flag_for_symtable=" $pic_flag";;
- esac
+ esac;;
+ *-*-solaris*)
+ LTCFLAGS="$compiler_flags"
+ ;;
esac
# Now compile the dynamic symbol file.
----- michael(a)stroeder.com wrote:
> Full_Name: Michael Ströder
> Version: HEAD
> OS:
> URL:
> Submission from: (NULL) (84.163.127.85)
>
>
> In slapo-chain(5) there is no descriptive text for configuration
> directive
> 'chain-rebind-as-user' yet like for the other directives.
Thanks,
Will take a look.
--
Kind Regards,
Gavin Henry.
OpenLDAP Engineering Team.
E ghenry(a)OpenLDAP.org
Community developed LDAP software.
http://www.openldap.org/project/
Full_Name: Michael Ströder
Version: HEAD
OS:
URL:
Submission from: (NULL) (84.163.127.85)
In slapo-chain(5) there is no descriptive text for configuration directive
'chain-rebind-as-user' yet like for the other directives.
jzeleny(a)redhat.com writes:
> I guess new version of glibc has some kind of mechanism which is
> checking boundaries of structures and isn't allowing write out of
> those boundaries.
Could you test if this works instead?
http://folk.uio.no/hbf/ol-struct-hack-1.patch
If that doesn't work, similar code elsewhere may be in danger.
Not that it's important in this case since the back-ldif code
isn't run often. It just avoids one malloc, one check for whether
that succeeded, and one free. Your patch forgot the last two.
Actually the boundary check you mention is exactly the problem the
BVL_NAME macro avoids, though I'm not sure why I didn't just use
the standard "struct hack". Maybe the problem is with padding
bytes after fname. Anyway, I suppose this means the old "struct
hack" is now definitely getting dangerous to use.
Whatever is going on, I'd like to find out. Which versions of gcc
and glibc, and which architecture is this? (32-bit i683, 64-bit
amd, etc). And if it doesn't take much time, could you try if
these variants fix the problem too?
http://folk.uio.no/hbf/ol-struct-hack-2.patch
http://folk.uio.no/hbf/ol-struct-hack-3.patch
I don't plan to use them, we can use your variant if my first
patch doesn't work. I'm just curious what's going on.
--
Hallvard
Full_Name: Jan Zeleny
Version: 2.4.18
OS: Fedora 11
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (62.40.79.66)
Following bug report is a good introduction to the issue:
https://bugzilla.redhat.com/show_bug.cgi?id=509230
I managed to reproduce it simply by turning on TLS and setting TLSVerifyClient
allow. In that configuration local connections to ldaps still work, but
connections from remote machines don't work in about 80-90% of cases.
I tried to trace the bug; so far I found that when using this option, slapd
sends its certificate to the TCP socket and gets EAGAIN in the middle of
writing. After that it goes to epoll_wait and there it waits indefinitely. I
suspect the EAGAIN happens because the TCP socket is full or something like that.
Notice that when you turn on debugging information about packet handling, this
issue disappears - maybe socket has time to get empty?
I tried and confirmed the bug in several versions of openldap (incl. 2.4.18) and
several Linux distributions to eliminate the possibility this issue is caused by
some other component or it was solved already.
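For readers unfamiliar with the pattern described above, here is a minimal sketch of how a partial write on a non-blocking socket is usually completed: after EAGAIN, the writer waits until the socket is reported writable and then sends the rest. This is not slapd's code and not a diagnosis of the hang. It uses plain Python sockets with select instead of epoll and TLS, and send_all_nonblocking is a hypothetical helper name.

```python
import select
import socket


def send_all_nonblocking(sock, data, timeout=5.0):
    """Send all of data on a non-blocking socket.

    On EAGAIN/EWOULDBLOCK (BlockingIOError in Python) the send buffer is
    full, so wait until the socket is writable again before retrying.
    Raises TimeoutError if it does not become writable within timeout.
    """
    view = memoryview(data)
    sent_total = 0
    while sent_total < len(view):
        try:
            # send() may accept only part of the remaining data
            sent_total += sock.send(view[sent_total:])
        except BlockingIOError:
            # Wait for write readiness, not read readiness
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                raise TimeoutError("socket did not become writable")
    return sent_total
```

In an epoll-based loop the equivalent step would be registering interest in EPOLLOUT for that descriptor after the short write. Whether that is related to the behaviour reported here is an open question that the traces would have to answer.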