Dear openldap-bugs,
I've had a reasonable trawl through the archives, and not found anything that matches the issue I'm seeing, but please point me in the right direction if I've missed anything.
(Apologies to anyone receiving this more than once, I tried sending this report to to openldap-bugs and openldap-its over the last few days seen no sign of it in the archives, and not received any message about pending moderation, so thought I would try openldap-technical which might be more appropriate anyway.)
We use OpenLDAP for a few directories, one of which is in the process of being migrated to newer hardware, with OS upgrade thrown in, and I've noticed an issue with ldapmodify that I thought was worth reporting. The directory in question has some scripted tooling around it to manage updates from a number of sources, which are staged in a Postgresql database before having some LDIF generated to update the directory itself.
During the course of my testing (we've not seen this in production, thankfully) I've noticed that, with reasonably lengthy updates for an entry, ldapmodify dies with an error like the following:
#---8<--- Command Output ---------------------------------------------- modifying entry "<DN-FOR-FAILED-ENTRY>" ldap_result: Can't contact LDAP server (-1) #---8<-----------------------------------------------------------------
There are matching log-entries in the system's syslog (timestamp, hostname and PID trimmed off to save some linewrapping) and slapd logs (we run slapd under daemontools supervision, and capture it's stdout/stderr):
#---8<---- SysLog Output ---------------------------------------------- local4.debug slapd: conn=1002 op=3158 MOD dn="<DN-OF-LAST-GOOD-ENTRY>" local4.debug slapd: conn=1002 op=3158 MOD attr=member local4.debug slapd: conn=1002 op=3158 RESULT tag=103 err=0 text= local4.debug slapd: conn=1002 fd=13 closed (connection lost) #---8<-----------------------------------------------------------------
#---8<---- Slapd Output ----------------------------------------------- 54a2e47f conn=1002 op=3158 MOD dn="<DN-OF-LAST-GOOD-ENTRY>" 54a2e47f conn=1002 op=3158 MOD attr=member 54a2e47f conn=1002 op=3158 RESULT tag=103 err=0 text= sb_sasl_cyrus_decode: failed to decode packet: generic failure sb_sasl_generic_read: failed to decode packet 54a2e47f conn=1002 fd=13 closed (connection lost) #---8<-----------------------------------------------------------------
The LDIF for the failed entry consists of:
#---8<----------------------------------------------------------------- dn: <DN-FOR-FAILED-ENTRY> changetype: modify replace: member member: <DN-FOR-MEMBER> ... member: <DN-FOR-ANOTHER-MEMBER> #---8<-----------------------------------------------------------------
where the list of members was, in this case, 9799 long. The LDIF itself is 30097 lines long, and was happy for the first ~15000 lines.
If I prune out the modifications for the troublesome DN, the remainder of the file also goes through happily.
As a work-around I can manually split up the list into several blocks (tested with roughly 1000 member updates per block) with "replace: member" for the first, to match the current behaviour, and "add: member" for the rest. In this format, ldapmodify is happy to process the LDIF (all in one connection, but as discreet operations). (Note: the authenticated user has "unlimited" limits in the config)
Given that, it sounds like it could be a bug in ldapmodify (I don't think it's on the slapd end). I've tested on both the debian stable package (version 2.4.31-1+nmu2), and with a locally compiled build direct from the project's download pages (also 2.4.31 to start with, to see if it was an introduced bug).
Has anyone else on the list seen anything like this before?
Thanks for your time and any hints and tips.
Cheers.
Dameon.
Dameon Wagner wrote:
Dear openldap-bugs,
I've had a reasonable trawl through the archives, and not found anything that matches the issue I'm seeing, but please point me in the right direction if I've missed anything.
(Apologies to anyone receiving this more than once, I tried sending this report to to openldap-bugs and openldap-its over the last few days seen no sign of it in the archives, and not received any message about pending moderation, so thought I would try openldap-technical which might be more appropriate anyway.)
openldap-its is not a discussion list at all. Both lists require you to actually submit a ticket and reference the ticket number.
We use OpenLDAP for a few directories, one of which is in the process of being migrated to newer hardware, with OS upgrade thrown in, and I've noticed an issue with ldapmodify that I thought was worth reporting. The directory in question has some scripted tooling around it to manage updates from a number of sources, which are staged in a Postgresql database before having some LDIF generated to update the directory itself.
During the course of my testing (we've not seen this in production, thankfully) I've noticed that, with reasonably lengthy updates for an entry, ldapmodify dies with an error like the following:
Define lengthy - how many bytes?
#---8<--- Command Output ---------------------------------------------- modifying entry "<DN-FOR-FAILED-ENTRY>" ldap_result: Can't contact LDAP server (-1) #---8<-----------------------------------------------------------------
Define crash - that doesn't look like a crash.
There are matching log-entries in the system's syslog (timestamp, hostname and PID trimmed off to save some linewrapping) and slapd logs (we run slapd under daemontools supervision, and capture it's stdout/stderr):
#---8<---- SysLog Output ---------------------------------------------- local4.debug slapd: conn=1002 op=3158 MOD dn="<DN-OF-LAST-GOOD-ENTRY>" local4.debug slapd: conn=1002 op=3158 MOD attr=member local4.debug slapd: conn=1002 op=3158 RESULT tag=103 err=0 text= local4.debug slapd: conn=1002 fd=13 closed (connection lost) #---8<-----------------------------------------------------------------
#---8<---- Slapd Output ----------------------------------------------- 54a2e47f conn=1002 op=3158 MOD dn="<DN-OF-LAST-GOOD-ENTRY>" 54a2e47f conn=1002 op=3158 MOD attr=member 54a2e47f conn=1002 op=3158 RESULT tag=103 err=0 text= sb_sasl_cyrus_decode: failed to decode packet: generic failure sb_sasl_generic_read: failed to decode packet 54a2e47f conn=1002 fd=13 closed (connection lost) #---8<-----------------------------------------------------------------
That looks like a bug in Cyrus SASL. What SASL mechanism are you using?
The LDIF for the failed entry consists of:
#---8<----------------------------------------------------------------- dn: <DN-FOR-FAILED-ENTRY> changetype: modify replace: member member: <DN-FOR-MEMBER> ... member: <DN-FOR-ANOTHER-MEMBER> #---8<-----------------------------------------------------------------
where the list of members was, in this case, 9799 long. The LDIF itself is 30097 lines long, and was happy for the first ~15000 lines.
If I prune out the modifications for the troublesome DN, the remainder of the file also goes through happily.
As a work-around I can manually split up the list into several blocks (tested with roughly 1000 member updates per block) with "replace: member" for the first, to match the current behaviour, and "add: member" for the rest. In this format, ldapmodify is happy to process the LDIF (all in one connection, but as discreet operations). (Note: the authenticated user has "unlimited" limits in the config)
Given that, it sounds like it could be a bug in ldapmodify (I don't think it's on the slapd end). I've tested on both the debian stable package (version 2.4.31-1+nmu2), and with a locally compiled build direct from the project's download pages (also 2.4.31 to start with, to see if it was an introduced bug).
Has anyone else on the list seen anything like this before?
2.4.31 is over 2 years old. 2.4.41 is about to be released. Debian is known to break their packages from time to time with ill-advised "security" patches. You should try with freshly built source.
Thanks for your time and any hints and tips.
Cheers.
Dameon.
On Sat, Jan 03 2015 at 01:31:10 +0000, Howard Chu scribbled in "Re: ldapmodify crashes with long-ish input for an entry": <SNIP>
Hi Howard,
openldap-its is not a discussion list at all. Both lists require you to actually submit a ticket and reference the ticket number.
Thanks for the clarification.
We use OpenLDAP for a few directories, one of which is in the process of being migrated to newer hardware, with OS upgrade thrown in, and I've noticed an issue with ldapmodify that I thought was worth reporting. The directory in question has some scripted tooling around it to manage updates from a number of sources, which are staged in a Postgresql database before having some LDIF generated to update the directory itself.
During the course of my testing (we've not seen this in production, thankfully) I've noticed that, with reasonably lengthy updates for an entry, ldapmodify dies with an error like the following:
Define lengthy - how many bytes?
For the complete LDIF of the entry update in question, it is 672808 bytes (658K) -- I haven't done an exhaustive "trial and error" analysis to find the boundary between OK and too-big, but at 68233 bytes (67K, or 1000 lines of updates for the entry) the update completes OK, but at 205105 bytes (201K or 3000 lines) the update fails as described.
#---8<--- Command Output ------------------------------------------ modifying entry "<DN-FOR-FAILED-ENTRY>" ldap_result: Can't contact LDAP server (-1) #---8<-------------------------------------------------------------
Define crash - that doesn't look like a crash.
Sorry, you're right. I should rather have said that ldapmodify behaves in an unexpected manner and fails to complete the update when the list of changes for a DN is reasonably lengthy.
There are matching log-entries in the system's syslog (timestamp, hostname and PID trimmed off to save some linewrapping) and slapd logs (we run slapd under daemontools supervision, and capture it's stdout/stderr):
#---8<---- SysLog Output ------------------------------------------ local4.debug slapd: conn=1002 op=3158 MOD dn="<DN-OF-LAST-GOOD-ENTRY>" local4.debug slapd: conn=1002 op=3158 MOD attr=member local4.debug slapd: conn=1002 op=3158 RESULT tag=103 err=0 text= local4.debug slapd: conn=1002 fd=13 closed (connection lost) #---8<-------------------------------------------------------------
#---8<---- Slapd Output ------------------------------------------- 54a2e47f conn=1002 op=3158 MOD dn="<DN-OF-LAST-GOOD-ENTRY>" 54a2e47f conn=1002 op=3158 MOD attr=member 54a2e47f conn=1002 op=3158 RESULT tag=103 err=0 text= sb_sasl_cyrus_decode: failed to decode packet: generic failure sb_sasl_generic_read: failed to decode packet 54a2e47f conn=1002 fd=13 closed (connection lost) #---8<-------------------------------------------------------------
That looks like a bug in Cyrus SASL. What SASL mechanism are you using?
GSSAPI -- we use Kerberos for nearly everything, and the task in question uses a credential cache created from a keytab using k5start, though as an independent process (we're not using k5start to run the task's commands itself).
/etc/sasl/slapd.conf for the system in question contains only "mech_list: gssapi".
While I must admit that C isn't my forte (I'm more on the admin side of sysadmin than on the development side), I'm comfortable reading code and had a look through the code-base for those sb_sasl* messages and didn't see anything specific enough for my untrained (or unfamiliar) eye to use as a clue for further sleuthing.
While it is quite probably an incorrect assumption, I assumed at the time that the messages may have been caused by the client side closing or dropping the connection unexpectedly.
If it is of any interest, I see the same issue with both libsasl2-modules-gssapi-heimdal and libsasl2-modules-gssapi-mit Debian packages.
The LDIF for the failed entry consists of:
#---8<------------------------------------------------------------- dn: <DN-FOR-FAILED-ENTRY> changetype: modify replace: member member: <DN-FOR-MEMBER> ... member: <DN-FOR-ANOTHER-MEMBER> #---8<-------------------------------------------------------------
where the list of members was, in this case, 9799 long. The LDIF itself is 30097 lines long, and was happy for the first ~15000 lines.
If I prune out the modifications for the troublesome DN, the remainder of the file also goes through happily.
As a work-around I can manually split up the list into several blocks (tested with roughly 1000 member updates per block) with "replace: member" for the first, to match the current behaviour, and "add: member" for the rest. In this format, ldapmodify is happy to process the LDIF (all in one connection, but as discreet operations). (Note: the authenticated user has "unlimited" limits in the config)
Given that, it sounds like it could be a bug in ldapmodify (I don't think it's on the slapd end). I've tested on both the debian stable package (version 2.4.31-1+nmu2), and with a locally compiled build direct from the project's download pages (also 2.4.31 to start with, to see if it was an introduced bug).
Has anyone else on the list seen anything like this before?
2.4.31 is over 2 years old. 2.4.41 is about to be released. Debian is known to break their packages from time to time with ill-advised "security" patches. You should try with freshly built source.
Yeah, we've hit that sort of thing before, and is one reason why we have compiled our own packages in the past (not just for openldap), but it's something we try to avoid if possible, and attempt to re-converge on Debian packages where patches and updates we we're applying are eventually applied in Debian.
Yesterday I complied 2.4.40 and switched the managed configuration to use that package for the test system in question. Unfortunately it had the same issue.
Since writing originally, I've had a look (with my previously admitted limited C skills) to compare any differences between ldapmodify for the (working) 2.4.24, the (not working) 2.4.31 and the (also not working) 2.4.40. There's little change to speak of from what I can see, and nothing that looked to my eye like it might be the cause (though I've only really looked through clients/tools/ldapmodify.c, and very briefly through libraries/libldap/modify.c)
After that, I'm less sure that ldapmodify is where the problem lies (rather it's just where I'm seeing it manifest).
Thanks for your help Howard -- if you're at FlossUK again this year I should remember to buy you a pint by way of thanks in general (not just as a bribe for helping me out now :) and do bring the violin again, it's a great way to start a presentation! (Re: your talk about mdb in Edinburgh 2012)
-- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Cheers.
Dameon.
Dameon Wagner wrote:
On Sat, Jan 03 2015 at 01:31:10 +0000, Howard Chu scribbled
Define lengthy - how many bytes?
For the complete LDIF of the entry update in question, it is 672808 bytes (658K) -- I haven't done an exhaustive "trial and error" analysis to find the boundary between OK and too-big, but at 68233 bytes (67K, or 1000 lines of updates for the entry) the update completes OK, but at 205105 bytes (201K or 3000 lines) the update fails as described.
That looks like a bug in Cyrus SASL. What SASL mechanism are you using?
GSSAPI -- we use Kerberos for nearly everything, and the task in question uses a credential cache created from a keytab using k5start, though as an independent process (we're not using k5start to run the task's commands itself).
/etc/sasl/slapd.conf for the system in question contains only "mech_list: gssapi".
While I must admit that C isn't my forte (I'm more on the admin side of sysadmin than on the development side), I'm comfortable reading code and had a look through the code-base for those sb_sasl* messages and didn't see anything specific enough for my untrained (or unfamiliar) eye to use as a clue for further sleuthing.
While it is quite probably an incorrect assumption, I assumed at the time that the messages may have been caused by the client side closing or dropping the connection unexpectedly.
If it is of any interest, I see the same issue with both libsasl2-modules-gssapi-heimdal and libsasl2-modules-gssapi-mit Debian packages.
That's good information. Unfortunately I can't reproduce the problem here, ldapmodifying a group with 4640 members (237KB) works fine for me thru SASL.
It might be enlightening to run slapd with -d2 to show the packet decoding activity. Also run ldapmodify with -d2.
The LDIF for the failed entry consists of:
#---8<------------------------------------------------------------- dn: <DN-FOR-FAILED-ENTRY> changetype: modify replace: member member: <DN-FOR-MEMBER> ... member: <DN-FOR-ANOTHER-MEMBER> #---8<-------------------------------------------------------------
where the list of members was, in this case, 9799 long. The LDIF itself is 30097 lines long, and was happy for the first ~15000 lines.
If I prune out the modifications for the troublesome DN, the remainder of the file also goes through happily.
As a work-around I can manually split up the list into several blocks (tested with roughly 1000 member updates per block) with "replace: member" for the first, to match the current behaviour, and "add: member" for the rest. In this format, ldapmodify is happy to process the LDIF (all in one connection, but as discreet operations). (Note: the authenticated user has "unlimited" limits in the config)
Given that, it sounds like it could be a bug in ldapmodify (I don't think it's on the slapd end). I've tested on both the debian stable package (version 2.4.31-1+nmu2), and with a locally compiled build direct from the project's download pages (also 2.4.31 to start with, to see if it was an introduced bug).
Has anyone else on the list seen anything like this before?
2.4.31 is over 2 years old. 2.4.41 is about to be released. Debian is known to break their packages from time to time with ill-advised "security" patches. You should try with freshly built source.
Yeah, we've hit that sort of thing before, and is one reason why we have compiled our own packages in the past (not just for openldap), but it's something we try to avoid if possible, and attempt to re-converge on Debian packages where patches and updates we we're applying are eventually applied in Debian.
Yesterday I complied 2.4.40 and switched the managed configuration to use that package for the test system in question. Unfortunately it had the same issue.
Since writing originally, I've had a look (with my previously admitted limited C skills) to compare any differences between ldapmodify for the (working) 2.4.24, the (not working) 2.4.31 and the (also not working) 2.4.40. There's little change to speak of from what I can see, and nothing that looked to my eye like it might be the cause (though I've only really looked through clients/tools/ldapmodify.c, and very briefly through libraries/libldap/modify.c)
I also see no relevant changes from 2.4.24 to the current RE for ldapmodify.c or libldap/cyrus.c (where encoding/decoding is done).
If your tools are dynamically linked, you might try mixing the tools and libraries. E.g., try the 2.4.24 ldapmodify against the newer libldap. If that reproduces the bug, then we know to look closer at libldap.
After that, I'm less sure that ldapmodify is where the problem lies (rather it's just where I'm seeing it manifest).
Thanks for your help Howard -- if you're at FlossUK again this year I should remember to buy you a pint by way of thanks in general (not just as a bribe for helping me out now :) and do bring the violin again, it's a great way to start a presentation! (Re: your talk about mdb in Edinburgh 2012)
Ah I hadn't submitted a talk for FlossUK this year. Will see if I get sufficiently motivated to come up with one...
On Sat, Jan 03 2015 at 15:39:38 +0000, Howard Chu scribbled in "Re: ldapmodify crashes with long-ish input for an entry":
Dameon Wagner wrote:
On Sat, Jan 03 2015 at 01:31:10 +0000, Howard Chu scribbled
Define lengthy - how many bytes?
For the complete LDIF of the entry update in question, it is 672808 bytes (658K) -- I haven't done an exhaustive "trial and error" analysis to find the boundary between OK and too-big, but at 68233 bytes (67K, or 1000 lines of updates for the entry) the update completes OK, but at 205105 bytes (201K or 3000 lines) the update fails as described.
That looks like a bug in Cyrus SASL. What SASL mechanism are you using?
GSSAPI -- we use Kerberos for nearly everything, and the task in question uses a credential cache created from a keytab using k5start, though as an independent process (we're not using k5start to run the task's commands itself).
/etc/sasl/slapd.conf for the system in question contains only "mech_list: gssapi".
While I must admit that C isn't my forte (I'm more on the admin side of sysadmin than on the development side), I'm comfortable reading code and had a look through the code-base for those sb_sasl* messages and didn't see anything specific enough for my untrained (or unfamiliar) eye to use as a clue for further sleuthing.
While it is quite probably an incorrect assumption, I assumed at the time that the messages may have been caused by the client side closing or dropping the connection unexpectedly.
If it is of any interest, I see the same issue with both libsasl2-modules-gssapi-heimdal and libsasl2-modules-gssapi-mit Debian packages.
That's good information. Unfortunately I can't reproduce the problem here, ldapmodifying a group with 4640 members (237KB) works fine for me thru SASL.
It might be enlightening to run slapd with -d2 to show the packet decoding activity. Also run ldapmodify with -d2.
There's quite a bit out output, which is to be expected I guess, and comes to just over 5MB when bzip2 compressed -- would you mind if I sent the files to you directly rather than spam the maillist with the whole files?
The output of both ldapmodify and slapd with -d2, for a single attempted ldapmodify update appears to have matching data on both sides up until the list of modifications is prepared to be sent -- at this point the ldapmodify output lists "ldap_write: want=714986, written=714986" followed by the DN, some encoded data, and the list of members to add (the full list from what I can tell).
On the slapd side, it reports the now familiar sasl message, and an ldap_read error:
#---8<----------------------------------------------------------------- sb_sasl_cyrus_decode: failed to decode packet: generic failure sb_sasl_generic_read: failed to decode packet ldap_read: want=8 error=Input/output error #---8<-----------------------------------------------------------------
After that, both sides of the debug output have the same 229 bytes, which I'm guessing is closing the connection. Slapd has nothing else to say other than in response to me shutting down the daemon process. On the client side, ldapmodify has little more to say other than:
#---8<----------------------------------------------------------------- ldap_result: Can't contact LDAP server (-1) tls_write: want=69, written=69 0000: 15 03 03 00 40 3a 27 c1 34 5c 5f ef 7e 6c c7 68 ....@:'.4_.~l.h 0010: 92 f3 5f bc 8b a7 72 78 57 27 ed 72 9a 99 fa 6d .._...rxW'.r...m 0020: 1d 42 66 07 49 04 77 86 57 4f 06 b4 8c 28 58 7e .Bf.I.w.WO...(X~ 0030: 94 f4 22 ab df 7d 8f 43 a5 3b 99 84 18 92 ca 73 .."..}.C.;.....s 0040: a4 41 f8 ab 3f .A..? tls_write: want=277 error=Bad file descriptor #---8<-----------------------------------------------------------------
None of the above may be useful in isolation though, requiring additional context from the debug output to make complete sense to someone more familiar with what's supposed to be going on.
<SNIP>
Since writing originally, I've had a look (with my previously admitted limited C skills) to compare any differences between ldapmodify for the (working) 2.4.24, the (not working) 2.4.31 and the (also not working) 2.4.40. There's little change to speak of from what I can see, and nothing that looked to my eye like it might be the cause (though I've only really looked through clients/tools/ldapmodify.c, and very briefly through libraries/libldap/modify.c)
I also see no relevant changes from 2.4.24 to the current RE for ldapmodify.c or libldap/cyrus.c (where encoding/decoding is done).
If your tools are dynamically linked, you might try mixing the tools and libraries. E.g., try the 2.4.24 ldapmodify against the newer libldap. If that reproduces the bug, then we know to look closer at libldap.
(I meant to type 2.4.23, apologies for the typo)
I've had a brief try, but I'm getting library load errors relating to wanting libssl.so.0.9.8, where as wheezy now has libssl.so.1.0.0, and a couple of quick attempts to build 2.4.23 for wheezy are failing when running the configure script with:
#---8<----------------------------------------------------------------- checking for db.h... yes checking for Berkeley DB major version in db.h... 5 checking for Berkeley DB minor version in db.h... 1 checking if Berkeley DB version supported by BDB/HDB backends... yes checking for Berkeley DB link (default)... no configure: error: BDB/HDB: BerkeleyDB not available make: *** [configure] Error 1 #---8<-----------------------------------------------------------------
I'll have another go tomorrow though to see if I can tweak that enough to build -- google has a few hits, but most that I've looked at so far related to making sure libdb-dev is installed, which it is.
After that, I'm less sure that ldapmodify is where the problem lies (rather it's just where I'm seeing it manifest).
Thanks for your help Howard -- if you're at FlossUK again this year I should remember to buy you a pint by way of thanks in general (not just as a bribe for helping me out now :) and do bring the violin again, it's a great way to start a presentation! (Re: your talk about mdb in Edinburgh 2012)
Ah I hadn't submitted a talk for FlossUK this year. Will see if I get sufficiently motivated to come up with one...
Should be pretty good this year as it's in York, which other than being a lovely town is, I believe, the hometown of the main sponsors.
I missed last year (I was at CERN for the OpenAFS/Kerberos conference the week before, which was great too), but a few colleagues went to the OpenLDAP workshop prior to the FlossUK conference itself, and by all accounts found it very useful.
Thanks again for all your help (on a weekend too), it really is appreciated.
Cheers.
Dameon.
Dameon Wagner wrote:
There's quite a bit out output, which is to be expected I guess, and comes to just over 5MB when bzip2 compressed -- would you mind if I sent the files to you directly rather than spam the maillist with the whole files?
The standard way of providing bigger files is uploading them to write-only location
ftp://ftp.openldap.org/incoming/
Ciao, Michael.
On Sun, Jan 04 2015 at 22:30:26 +0100, Michael Ströder scribbled in "Re: ldapmodify crashes with long-ish input for an entry":
Dameon Wagner wrote:
There's quite a bit out output, which is to be expected I guess, and comes to just over 5MB when bzip2 compressed -- would you mind if I sent the files to you directly rather than spam the maillist with the whole files?
The standard way of providing bigger files is uploading them to write-only location
ftp://ftp.openldap.org/incoming/
Thanks for the info Michael, I'll remember that next time. I've already sent an URL to Howard, but for anyone else who might be able to help, the files will be here for a couple of days:
http://users.ox.ac.uk/~oucs0146/
Cheers.
Dameon.
openldap-technical@openldap.org