(ITS#5510) Does GSSAPI failure kill slapd?
by aej@wpi.edu
Full_Name: Allan E. Johannesen
Version: 2.4.9
OS: RHEL4 i686
URL:
Submission from: (NULL) (130.215.24.208)
slapd appeared to exit after this:
May 14 13:05:42 ALUM slapd[28252]: SASL [conn=1] Failure: GSSAPI Error: The context has expired (No error)
May 14 13:05:42 ALUM slapd[28252]: send_search_entry: conn 1 ber write failed.
Is termination expected or should I look for something else?
Thanks.
If you want me to post anything about the environment, please let me know.
Re: (ITS#5504) ldapsearch hangs retrieving info
by quanah@zimbra.com
--On Wednesday, May 14, 2008 9:56 AM +0000 Javier.Fernandez(a)cern.ch wrote:
> Unfortunately there's no update for OpenLDAP under SL4; that's why we
> are using this version. In fact I see no newer versions except for Fedora
> and Mandriva
<http://staff.telkomsa.net/packages/>
or
<http://www.symas.com>
> Other sites are running fine with that version or older ones.
> In any case, I have compiled and built the latest stable version from the
> OpenLDAP project webpage (2.3.39) and I get the same problem. I'm not
> saying this is a bug in LDAP; it may be something in the local area network
> configuration.
>
> What I'm actually asking for is some help debugging this problem.
If the bug is not specifically in the OpenLDAP software, I suggest you
peruse:
<http://www.openldap.org/support/>
I would note I don't see anything in particular in what you've provided indicating
the problem is with ldapsearch. What version of OpenLDAP is the server in
question running (I can see it is OpenLDAP by querying its rootDSE)?
I'll note that a *limited* ldapsearch works just fine:
[quanah@freelancer ~]$ ldapsearch -x -H ldap://exp-bdii.cern.ch:2170 -b "" -s base +
# extended LDIF
#
# LDAPv3
# base <> with scope baseObject
# filter: (objectclass=*)
# requesting: +
#
#
dn:
structuralObjectClass: OpenLDAProotDSE
namingContexts: o=grid
supportedControl: 2.16.840.1.113730.3.4.18
supportedControl: 2.16.840.1.113730.3.4.2
supportedControl: 1.3.6.1.4.1.4203.1.10.1
supportedControl: 1.2.840.113556.1.4.1413
supportedControl: 1.2.840.113556.1.4.1339
supportedControl: 1.2.840.113556.1.4.319
supportedControl: 1.2.826.0.1.334810.2.3
supportedExtension: 1.3.6.1.4.1.1466.20037
supportedExtension: 1.3.6.1.4.1.4203.1.11.1
supportedExtension: 1.3.6.1.4.1.4203.1.11.3
supportedFeatures: 1.3.6.1.4.1.4203.1.5.1
supportedFeatures: 1.3.6.1.4.1.4203.1.5.2
supportedFeatures: 1.3.6.1.4.1.4203.1.5.3
supportedFeatures: 1.3.6.1.4.1.4203.1.5.4
supportedFeatures: 1.3.6.1.4.1.4203.1.5.5
supportedLDAPVersion: 2
supportedLDAPVersion: 3
supportedSASLMechanisms: DIGEST-MD5
supportedSASLMechanisms: CRAM-MD5
subschemaSubentry: cn=Subschema
# search result
search: 2
result: 0 Success
# numResponses: 2
# numEntries: 1
[quanah@freelancer ~]$
I would also note that for me, doing a dump of the entire server works just
fine:
ldapsearch -x -H ldap://exp-bdii.cern.ch:2170 -b "o=grid"
results in:
# search result
search: 2
result: 0 Success
# numResponses: 43962
# numEntries: 43961
Adding -d -1 to the query, I eventually see the same thing you do:
ber_get_next failed.
wait4msg continue ld 0x233f0e0 msgid -1 all 0
** ld 0x233f0e0 Connections:
* host: exp-bdii.cern.ch port: 2170 (default)
refcnt: 2 status: Connected
last used: Wed May 14 09:34:06 2008
** ld 0x233f0e0 Outstanding Requests:
* msgid 2, origid 2, status InProgress
outstanding referrals 0, parent count 0
** ld 0x233f0e0 Response Queue:
Empty
ldap_chkResponseList ld 0x233f0e0 msgid -1 all 0
ldap_chkResponseList returns ld 0x233f0e0 NULL
ldap_int_select
read1msg: ld 0x233f0e0 msgid -1 all 0
ber_get_next
ldap_read: want=1142, got=0
ber_get_next failed.
ldap_perror
ldap_result: Can't contact LDAP server (-1)
ldap_free_request (origid 2, msgid 2)
ldap_free_connection 1 1
ldap_send_unbind
ber_flush: 7 bytes to sd 3
0000: 30 05 02 01 03 42 00 0....B.
ldap_write: want=7, written=7
0000: 30 05 02 01 03 42 00 0....B.
ldap_free_connection: actually freed
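The hex dumps in this trace are raw BER-encoded LDAPv3 messages (RFC 4511). For instance, `30 05 02 01 03 42 00` is a SEQUENCE of length 5 holding messageID 3 followed by an [APPLICATION 2] UnbindRequest, and `ldap_read: want=1142, got=0` appears to mean the client asked for 1142 more bytes of a PDU but read() returned 0, i.e. the server closed the connection mid-message. Below is a minimal C sketch (not libldap's own decoder) of peeling the envelope off such a message; read_ber_length(), decode_ldap_envelope() and op_name() are hypothetical names, and the sketch handles only definite lengths and messageIDs of up to four octets.

```c
#include <stddef.h>

/* Read a BER definite length starting at buf[*pos] into *out.
   Returns 0 on success, -1 on malformed or truncated input. */
static int read_ber_length(const unsigned char *buf, size_t len,
                           size_t *pos, size_t *out)
{
    if (*pos >= len)
        return -1;
    unsigned char b = buf[(*pos)++];
    if (b < 0x80) {                 /* short form: length fits in 7 bits */
        *out = b;
        return 0;
    }
    size_t n = b & 0x7F;            /* long form: n length octets follow */
    if (n == 0 || n > 4 || *pos + n > len)
        return -1;                  /* indefinite form is not allowed in LDAP */
    size_t v = 0;
    while (n--)
        v = (v << 8) | buf[(*pos)++];
    *out = v;
    return 0;
}

/* Decode the envelope of an LDAPMessage (RFC 4511):
     LDAPMessage ::= SEQUENCE { messageID INTEGER, protocolOp CHOICE {...}, ... }
   On success stores the messageID and the protocolOp tag byte. */
static int decode_ldap_envelope(const unsigned char *buf, size_t len,
                                unsigned long *msgid, unsigned char *op_tag)
{
    size_t pos = 0, seqlen, idlen;

    if (len < 2 || buf[pos++] != 0x30)                  /* SEQUENCE */
        return -1;
    if (read_ber_length(buf, len, &pos, &seqlen) < 0 || pos + seqlen > len)
        return -1;
    if (pos >= len || buf[pos++] != 0x02)               /* INTEGER messageID */
        return -1;
    if (read_ber_length(buf, len, &pos, &idlen) < 0 ||
        idlen == 0 || idlen > 4 || pos + idlen > len)
        return -1;
    unsigned long id = 0;
    for (size_t i = 0; i < idlen; i++)
        id = (id << 8) | buf[pos++];
    if (pos >= len)
        return -1;
    *msgid = id;
    *op_tag = buf[pos];                                 /* protocolOp tag */
    return 0;
}

/* Names for a few protocolOp tags: [APPLICATION n] is 0x40|n when
   primitive and 0x60|n when constructed. */
static const char *op_name(unsigned char tag)
{
    switch (tag) {
    case 0x60: return "BindRequest";
    case 0x61: return "BindResponse";
    case 0x42: return "UnbindRequest";
    case 0x63: return "SearchRequest";
    case 0x64: return "SearchResultEntry";
    case 0x65: return "SearchResultDone";
    default:   return "other";
    }
}
```

Applied to the `30 05 02 01 03 42 00` dump, decode_ldap_envelope() yields messageID 3 and tag 0x42 (UnbindRequest); the `30 0d 02 02 3c da 65 ...` dump in the ITS#5465 trace decodes to messageID 0x3cda with tag 0x65 (SearchResultDone).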
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
(ITS#5504) ldapsearch hangs retrieving info
by Javier.Fernandez@cern.ch
Hi Howard,
Unfortunately there's no update for OpenLDAP under SL4; that's why we
are using this version. In fact I see no newer versions except for Fedora
and Mandriva:
http://www.rpmfind.net/linux/rpm2html/search.php?query=openldap
Other sites are running fine with that version or older ones.
In any case, I have compiled and built the latest stable version from the
OpenLDAP project webpage (2.3.39) and I get the same problem. I'm not
saying this is a bug in LDAP; it may be something in the local area network
configuration.
What I'm actually asking for is some help debugging this problem.
Javi
--
+--------------------------------------------------------------+
Javier Fernandez Menendez
Grupo de Fisica de AAEE
Universidad de Oviedo
C/ Calvo Sotelo, s/n
33005 Oviedo
Phone: +34 985106252
mailto:Javier.Fernandez@cern.ch
+---------------------------------------------------------------+
Re: (ITS#5507) Ldap clients leak file descriptors
by h.b.furuseth@usit.uio.no
I've no idea how SELinux works, so I'm not up to testing any of what
I'm saying here myself, but some notes follow anyway. Please clarify:
> On a system protected by SELinux, when an application with an active LDAP
> connection tries to exec() a binary with a different security context,
> SELinux inspects all open file descriptors, including the LDAP one,
> and denies access to the ones that do not conform to the active policy (the
> executed binary is not authorized to contact LDAP servers). Users are
> then annoyed by security warnings in the logs.
More to the point, some of those security warnings may indicate real
security problems: if an application binds with a password and then execs
some other program, that program inherits the bound user's access to LDAP.
The message could be indicating a bug in the app: it should release its
resources (such as descriptors) before exec(). However, I presume this
happens with system() as well? It'd be unfortunate if LDAP apps could not
use system() safely.
> +#ifdef _GNU_SOURCE
> + fcntl(s, F_SETFD, FD_CLOEXEC);
> +#endif
_GNU_SOURCE depends on the compiler flags. Please try
#ifdef FD_CLOEXEC
instead.
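A minimal sketch of the suggested change, assuming a POSIX system: the guard tests for FD_CLOEXEC itself, and the code reads the existing descriptor flags with F_GETFD before setting FD_CLOEXEC so any other flags are preserved. set_cloexec() and open_cloexec_socket() are hypothetical helpers, not existing libldap functions.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Mark descriptor s close-on-exec so it is not inherited by programs
   started via exec() (and therefore via system()). Guarded on FD_CLOEXEC
   rather than _GNU_SOURCE. Returns 0 on success, -1 on failure. */
static int set_cloexec(int s)
{
#ifdef FD_CLOEXEC
    int flags = fcntl(s, F_GETFD);
    if (flags == -1)
        return -1;
    /* Keep any other descriptor flags while adding FD_CLOEXEC. */
    return fcntl(s, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
#else
    (void)s;    /* no close-on-exec flag on this platform; nothing to do */
    return 0;
#endif
}

/* Hypothetical wrapper: create a socket and mark it close-on-exec, the
   way a socket-opening routine might right after calling socket(). */
static int open_cloexec_socket(int domain, int type)
{
    int s = socket(domain, type, 0);
    if (s < 0)
        return -1;
    if (set_cloexec(s) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

Where the kernel supports it (Linux 2.6.27 and later), passing SOCK_CLOEXEC to socket() sets the flag atomically, which closes the small window in which another thread could fork and exec between socket() and fcntl(); that variant would need its own #ifdef SOCK_CLOEXEC guard.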
Also, I expect the same should be done in os-local.c:ldap_pvt_socket(),
which opens ldapi:// sockets. Can you check whether the same problem
occurs if you run
slapd ... -h ldapi://
and a client which connects to that URL instead of ldap:// ?
ldapi:// creates a unix-domain socket file somewhere, typically
/usr/local/var/run/ldapi
If you want to use some other filename you can use
ldapi://<URL-escaped filename>/
e.g.
ldapi://%2Fhome%2Fjsafrane%2Fldapi/
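Building such a URL by hand is error-prone for paths with unusual characters. The sketch below percent-encodes every byte outside the RFC 3986 unreserved set, so '/' becomes %2F as in the example above. make_ldapi_url() is a hypothetical helper, not a libldap API.

```c
#include <string.h>

/* Hypothetical helper: build "ldapi://<escaped path>/" for a Unix-domain
   socket path. Bytes outside A-Z a-z 0-9 - . _ ~ are written as %XX.
   Writes into out (size outlen); returns 0 on success, -1 if too small. */
static int make_ldapi_url(const char *path, char *out, size_t outlen)
{
    static const char hex[] = "0123456789ABCDEF";
    const char *prefix = "ldapi://";
    size_t n = strlen(prefix);

    if (outlen < n + 2)             /* prefix, trailing '/', and NUL */
        return -1;
    memcpy(out, prefix, n);

    for (const unsigned char *p = (const unsigned char *)path; *p; p++) {
        int unreserved = (*p >= 'A' && *p <= 'Z') || (*p >= 'a' && *p <= 'z') ||
                         (*p >= '0' && *p <= '9') ||
                         *p == '-' || *p == '.' || *p == '_' || *p == '~';
        if (unreserved) {
            if (n + 1 >= outlen)
                return -1;
            out[n++] = (char)*p;
        } else {
            if (n + 3 >= outlen)
                return -1;
            out[n++] = '%';
            out[n++] = hex[*p >> 4];
            out[n++] = hex[*p & 0x0F];
        }
    }
    if (n + 2 > outlen)
        return -1;
    out[n++] = '/';
    out[n] = '\0';
    return 0;
}
```

For "/home/jsafrane/ldapi" this produces ldapi://%2Fhome%2Fjsafrane%2Fldapi/.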
--
Hallvard
Re: (ITS#5465) Delta-Syncrepl cookie problems
by quanah@zimbra.com
More data on another occurrence of this problem, this time with some packet level logging.
15:56:55 is when the replicas disconnect:
May 7 15:56:55 neo-ldap-1 slapd[7732]: conn=1463287 op=1 MOD dn="uid=xxxxx,ou=people,dc=xxx,dc=xxxx,dc=xxx"
May 7 15:56:55 neo-ldap-1 slapd[7732]: conn=1463287 op=1 MOD attr=zimbraPasswordLockoutFailureTime zimbraPasswordLockoutFailureTime
May 7 15:56:55 neo-ldap-1 slapd[7732]: conn=1463286 op=1 RESULT tag=103 err=0 text=
May 7 15:56:55 neo-ldap-1 slapd[7732]: conn=1463286 op=2 SRCH base="uid=yyyyyy,ou=people,dc=xxx,dc=xxxx,dc=xxxx" scope=0 deref=3 filter="(objectClass=*)"
May 7 15:56:55 neo-ldap-1 slapd[7732]: conn=1463286 op=2 SEARCH RESULT tag=101 err=0 nentries=1 text=
May 7 15:56:55 neo-ldap-1 slapd[7732]: conn=1463286 op=3 UNBIND
May 7 15:56:55 neo-ldap-1 slapd[7732]: conn=1463286 fd=27 closed
May 7 15:56:55 neo-ldap-1 slapd[7732]: conn=1463287 op=1 RESULT tag=103 err=0 text=
May 7 15:56:55 neo-ldap-1 slapd[7732]: conn=1463287 op=2 SRCH base="uid=xxxxx,ou=people,dc=xxx,dc=xxxx,dc=xxx" scope=0 deref=3 filter="(objectClass=*)"
May 7 15:56:55 neo-ldap-1 slapd[7732]: conn=1463287 op=2 SEARCH RESULT tag=101 err=0 nentries=1 text=
May 7 15:56:55 neo-ldap-1 slapd[7732]: conn=1463287 op=3 UNBIND
May 7 15:56:55 neo-ldap-1 slapd[7732]: conn=1463287 fd=28 closed
May 7 15:56:55 neo-ldap-1 slapd[7732]: conn=1443587 op=3 UNBIND
May 7 15:56:55 neo-ldap-1 slapd[7732]: conn=1443587 fd=25 closed
May 7 15:56:55 neo-ldap-1 slapd[7732]: conn=1443455 op=3 UNBIND
May 7 15:56:55 neo-ldap-1 slapd[7732]: conn=1443455 fd=40 closed
May 7 15:56:55 neo-ldap-1 slapd[7732]: conn=1367867 op=3 UNBIND
May 7 15:56:55 neo-ldap-1 slapd[7732]: conn=1367867 fd=24 closed
1443587, 1443455 are the connections from the two replicas.
In the packet log for the replica, I see:
0000: 30 0d 02 02 3c da 65 07 0a 01 00 04 00 04 00 0...<.e........
ldap_write: want=15, written=15
0000: 30 0d 02 02 3c da 65 07 0a 01 00 04 00 04 00 0...<.e........
ldap_read: want=8, got=0
ldap_read: want=8, got=0
ldap_read: want=8, got=8
0000: 30 7a 02 01 03 64 30 04 0z...d0.
ldap_read: want=116, got=116
0000: 2c 72 65 71 53 74 61 72 74 3d 32 30 30 38 30 34 ,reqStart=200804
0010: 32 39 32 30 35 36 32 30 2e 30 30 30 30 30 31 5a 29205620.000001Z
0020: 2c 63 6e 3d 61 63 63 65 73 73 6c 6f 67 30 00 a0 ,cn=accesslog0..
0030: 43 30 41 04 18 31 2e 33 2e 36 2e 31 2e 34 2e 31 C0A..1.3.6.1.4.1
0040: 2e 34 32 30 33 2e 31 2e 39 2e 31 2e 32 04 25 30 .4203.1.9.1.2.%0
0050: 23 0a 01 03 04 10 76 89 f9 82 aa 7a 10 2c 8e 1b #.....v....z.,..
0060: 05 5f 81 eb fa ec 04 0c 63 73 6e 3d 2c 72 69 64 ._......csn=,rid
0070: 3d 31 30 30 =100
0000: 30 05 02 01 04 42 00 0....B.
ldap_write: want=7, written=7
0000: 30 05 02 01 04 42 00 0....B.
do_syncrepl: rid 100 retrying
This looks to be due to expiry, as that timestamp is approximately 7 days before the current time.
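The arithmetic behind that estimate can be checked directly: the cookie points at reqStart=20080429205620.000001Z (UTC), while the disconnect was logged at May 7 15:56:55. Syslog timestamps carry no zone, so the gap is about 7.8 days if the server logs in UTC and about 8.1 days if it logs in US Pacific daylight time (an assumption either way). A sketch in C that converts the generalizedTime without depending on the local time zone; gentime_to_epoch() and older_than() are hypothetical names, and the 7-day threshold stands in for an accesslog logpurge age that is not shown in the report.

```c
#include <stdio.h>

/* Days since 1970-01-01 for a Gregorian date (Howard Hinnant's
   days_from_civil algorithm); no dependence on the local TZ. */
static long days_from_civil(int y, int m, int d)
{
    y -= m <= 2;                              /* Jan/Feb count toward the previous year */
    long era = (y >= 0 ? y : y - 399) / 400;
    long yoe = y - era * 400;                 /* year of era, [0, 399] */
    long doy = (153L * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  /* March-based day of year */
    long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;             /* day of era */
    return era * 146097 + doe - 719468;
}

/* Convert the YYYYMMDDHHMMSS prefix of a UTC generalizedTime such as
   "20080429205620.000001Z" to seconds since the epoch; fractional
   seconds are ignored. Returns 0 on success, -1 on parse failure. */
static int gentime_to_epoch(const char *s, long long *out)
{
    int y, mo, d, h, mi, sec;
    if (sscanf(s, "%4d%2d%2d%2d%2d%2d", &y, &mo, &d, &h, &mi, &sec) != 6)
        return -1;
    if (mo < 1 || mo > 12 || d < 1 || d > 31 ||
        h < 0 || h > 23 || mi < 0 || mi > 59 || sec < 0 || sec > 60)
        return -1;
    *out = (long long)days_from_civil(y, mo, d) * 86400 + h * 3600 + mi * 60 + sec;
    return 0;
}

/* Hypothetical purge test: nonzero if an entry stamped `entry` is more than
   max_age seconds old at time `now` (e.g. 7 * 86400 for a 7-day age). */
static int older_than(long long entry, long long now, long long max_age)
{
    return now - entry > max_age;
}
```

Under either time-zone reading the reqStart entry is more than 7 days old, so with a 7-day purge age it would already be eligible for removal, which fits the expiry explanation.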
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
Re: (ITS#5508) slapd process consumes all of CPU
by whm@stanford.edu
--On Tuesday, May 13, 2008 02:45:06 AM -0700 Howard Chu <hyc(a)symas.com>
wrote:
> Bill MacAllister wrote:
>> Attached is the output of db4.2_stat -CA of the database.
>>
>> Thanks for looking at this.
>>
> So far it just looks like a very busy server. Can you turn off the
> network access to it and see if it settles down when the query traffic
> stops?
Last night the server tried to do a log rotation. When I look at the log
now, it is zero length and nothing is getting written to it. An ldapsearch
on the server just hangs.
I logged into the console, shut the network interface down, and the CPU
is still pinned.
> It's a bit odd that a single transaction has so many pages of the
> suPrivilegeGroup index locked.
>
> The backtrace is somewhat suspicious; there are several <value optimized
> out> items in the trace. In thread 8, frames 5 and 6 the locker value is
> odd; usually in BDB the locker ID associated with a transaction has bit
> 31 set, yielding a very large 32 bit number. Also there is no locker with
> that ID in the db_stat output you provided.
>
> It looks like you'll have to try this again with a non-optimized binary
> to get a reliable backtrace.
Yes, we were afraid of that. I will build a debug version of bdb. The
real rub is that we don't seem to be able to make this happen on demand. I
tried taking the log from the pinned server, turned the log into a shell
script of ldapsearch commands, and pointed it at another server. I could
not make the second server go CPU bound. So, we will just have to deploy
the debug bdb support on our test servers and wait.
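The replay approach described above can be automated. Below is a rough C sketch that turns one slapd stats-log SRCH line (in the format shown in the ITS#5465 excerpt) into an ldapsearch command. srch_to_ldapsearch() is hypothetical; it assumes anonymous simple binds (-x) and does not handle an empty base or values containing quote characters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: convert a line containing
     SRCH base="..." scope=N deref=N filter="..."
   into an ldapsearch command against uri. Returns 0 on success, -1 if the
   line is not in that format or the output buffer is too small. */
static int srch_to_ldapsearch(const char *line, const char *uri,
                              char *out, size_t outlen)
{
    /* LDAP scope and derefAliases values map onto ldapsearch -s / -a. */
    static const char *scopes[] = { "base", "one", "sub", "children" };
    static const char *derefs[] = { "never", "search", "find", "always" };
    char base[512], filter[512];
    int scope, deref;

    const char *p = strstr(line, " SRCH base=\"");
    if (p == NULL)
        return -1;
    if (sscanf(p, " SRCH base=\"%511[^\"]\" scope=%d deref=%d filter=\"%511[^\"]\"",
               base, &scope, &deref, filter) != 4)
        return -1;                  /* also rejects an empty base="" */
    if (scope < 0 || scope > 3 || deref < 0 || deref > 3)
        return -1;

    int n = snprintf(out, outlen, "ldapsearch -x -H %s -b '%s' -s %s -a %s '%s'",
                     uri, base, scopes[scope], derefs[deref], filter);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Running each extracted command in order gives the same search mix, though without the original concurrency or timing, which may be one reason a replay does not reproduce a load-dependent problem.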
Bill
>> Bill
>>
>> --On Tuesday, May 13, 2008 01:20:49 AM -0700 Howard Chu<hyc(a)symas.com>
>> wrote:
>>
>>> whm(a)stanford.edu wrote:
>>>> Full_Name: Bill MacAllister
>>>> Version: 2.3.41-1su2
>>>> OS: debian etch kernel 2.6.18-4-amd64
>>>> URL: http://www.stanford.edu/~whm/ldap-test1-bt.txt
>>>> Submission from: (NULL) (171.64.19.165)
>>>>
>>>>
>>>> The slapd process will sometimes consume all available CPU. We
>>>> observed this when we upgraded our production servers from 2.3.35-2su2
>>>> to 2.3.41-1su2. The problem was bad enough that we downgraded the
>>>> production servers to 2.3.35-2su2. We have been trying to provoke the
>>>> problem in our test environment and have not been successful in
>>>> making it happen on demand. Today, we noticed that one of our test
>>>> servers went completely CPU bound. I took a backtrace. It is
>>>> available at the URL below. The interesting thing about the problem
>>>> is that although top shows a pinned CPU and a high load, the server is
>>>> still responsive and continues to answer LDAP searches. The test
>>>> server that exhibits the problem is still CPU bound and has been for
>>>> 2-3 hours now. We will leave this server in this state in case there
>>>> is other information that we should harvest in resolving the problem.
>>> Please also provide the output from db_stat -CA on the database in
>>> question, thanks.
--
Bill MacAllister <whm(a)stanford.edu>
Systems Programmer, ITS Unix Systems, Stanford University
Re: (ITS#5508) slapd process consumes all of CPU
by hyc@symas.com
Bill MacAllister wrote:
> Attached is the output of db4.2_stat -CA of the database.
>
> Thanks for looking at this.
>
So far it just looks like a very busy server. Can you turn off the network
access to it and see if it settles down when the query traffic stops?
It's a bit odd that a single transaction has so many pages of the
suPrivilegeGroup index locked.
The backtrace is somewhat suspicious; there are several <value optimized out>
items in the trace. In thread 8, frames 5 and 6 the locker value is odd;
usually in BDB the locker ID associated with a transaction has bit 31 set,
yielding a very large 32 bit number. Also there is no locker with that ID in
the db_stat output you provided.
It looks like you'll have to try this again with a non-optimized binary to get
a reliable backtrace.
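The check on the locker value reduces to testing bit 31. A small C sketch, assuming (as described above) that transaction lockers have bit 31 set, and assuming db_stat prints locker IDs in hexadecimal; TXN_LOCKER_BIT, is_txn_locker() and locker_from_hex() are hypothetical names, not Berkeley DB macros or APIs.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical name for the bit that marks a transaction's locker ID. */
#define TXN_LOCKER_BIT UINT32_C(0x80000000)

/* Nonzero if locker_id looks like a transaction locker (bit 31 set). */
static int is_txn_locker(uint32_t locker_id)
{
    return (locker_id & TXN_LOCKER_BIT) != 0;
}

/* Parse a locker ID printed in hexadecimal (assumed db_stat format).
   Returns 0 on success, -1 if s is not a valid 32-bit hex number. */
static int locker_from_hex(const char *s, uint32_t *out)
{
    char *end;
    unsigned long v = strtoul(s, &end, 16);
    if (end == s || *end != '\0' || v > 0xFFFFFFFFUL)
        return -1;
    *out = (uint32_t)v;
    return 0;
}
```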
> Bill
>
> --On Tuesday, May 13, 2008 01:20:49 AM -0700 Howard Chu<hyc(a)symas.com>
> wrote:
>
>> whm(a)stanford.edu wrote:
>>> Full_Name: Bill MacAllister
>>> Version: 2.3.41-1su2
>>> OS: debian etch kernel 2.6.18-4-amd64
>>> URL: http://www.stanford.edu/~whm/ldap-test1-bt.txt
>>> Submission from: (NULL) (171.64.19.165)
>>>
>>>
>>> The slapd process will sometimes consume all available CPU. We
>>> observed this when we upgraded our production servers from 2.3.35-2su2
>>> to 2.3.41-1su2. The problem was bad enough that we downgraded the
>>> production servers to 2.3.35-2su2. We have been trying to provoke the
>>> problem in our test environment and have not been successful in making
>>> it happen on demand. Today, we noticed that one of our test servers
>>> went completely CPU bound. I took a backtrace. It is available at the
>>> URL below. The interesting thing about the problem is that although top
>>> shows a pinned CPU and a high load, the server is still responsive and
>>> continues to answer LDAP searches. The test server that exhibits the
>>> problem is still CPU bound and has been for 2-3 hours now. We will
>>> leave this server in this state in case there is other information that
>>> we should harvest in resolving the problem.
>> Please also provide the output from db_stat -CA on the database in
>> question, thanks.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/