I don't know whether 2.3.43 is new enough to NOT be told to go to hell, but it's the latest of the 2.3.x series and I can't get migrated to 2.4 until I get slurpd gone... and oddly enough, I think turning off slurpd caused some of my problems.
This morning our two slaves and master server began experiencing bad sync lag... the slaves were close to two or more hours behind the master. I discovered that a large automated job had touched over thirty thousand entries and altered the values of a lot of attributes (it was a course enrollment update, as it happens).
I *suspect* that the huge number of updates overflowed the syncprov session log and the slaves moved from small updates to whole-entry updates. My syncprov session log was set to 500... which I think was hideously undersized.
Am I correct in my assumption?
Stemming from that, I've noticed that trying to use LDAP to alter anything in the cn=config tree - whether it happens to be to change the session log size, or to add a new index - causes slapd to freeze. Not a true hang, as it continues to accept connections, but all operations are deferred and pending, even though slapd's CPU usage remains low. I can kill and restart slapd and I'm okay. Also, altering cn=config by editing the on-disk ldif files while slapd is dead causes no problem.
And a third thing: does ~3h to add 250k entries to a new database, using 'slapcat -q' sound ridiculously long?
Brandon Hume wrote:
I don't know whether 2.3.43 is new enough to NOT be told to go to hell,
Nobody would ever tell you that. But 2.3.43 is over a year old and 2.4 has been the stable release for quite a long time. Insisting on using it is the same as you telling us to go to hell with our bug fixes.
but it's the latest of the 2.3.x series and I can't get migrated to 2.4 until I get slurpd gone... and oddly enough, I think turning off slurpd caused some of my problems.
This morning our two slaves and master server began experiencing bad sync lag... the slaves were close to two or more hours behind the master. I discovered that a large automated job had touched over thirty thousand entries and altered the values of a lot of attributes (it was a course enrollment update, as it happens).
I *suspect* that the huge number of updates overflowed the syncprov session log and the slaves moved from small updates to whole-entry updates. My syncprov session log was set to 500... which I think was hideously undersized.
Am I correct in my assumption?
No. Standard syncrepl always uses whole-entry updates.
Stemming from that, I've noticed that trying to use LDAP to alter anything in the cn=config tree - whether it happens to be to change the session log size, or to add a new index - causes slapd to freeze. Not a true hang, as it continues to accept connections, but all operations are deferred and pending, even though slapd's CPU usage remains low. I can kill and restart slapd and I'm okay. Also, altering cn=config by editing the on-disk ldif files while slapd is dead causes no problem.
That's a side-effect of cn=config, which requires no other threads to be running before it makes a change. The scheduling mechanism has been fixed in 2.4 so this freeze no longer occurs.
And a third thing: does ~3h to add 250k entries to a new database, using 'slapcat -q' sound ridiculously long?
slapcat should be able to read the contents of a database at a rate of several thousand entries per second. But slapcat won't add any number of entries to a database, no matter how much time you give it.
Howard Chu wrote:
Nobody would ever tell you that. But 2.3.43 is over a year old and 2.4 has been the stable release for quite a long time. Insisting on using it is the same as you telling us to go to hell with our bug fixes.
Moving to 2.4 is very, very much a priority for me. But I was under the impression that slurpd is actually physically gone from 2.4, and I've got other systems dependent on it. I'm working on changing/ridding myself of them, but I have to be careful (especially with internal pressure from above to switch to AD). Alas, disasters don't care for schedules.
I'm only looking to dig out of the hole I'm in right at this very moment so that I can concentrate on moving everything to 2.4.
No. Standard syncrepl always uses whole-entry updates.
I thought I was using partial updates, but was wrong. Your reply has given me enough keywords to find the proper configuration in the docs. I can fix that today, hopefully.
That's a side-effect of cn=config, which requires no other threads to be running before it makes a change. The scheduling mechanism has been fixed in 2.4 so this freeze no longer occurs.
Okay... and when I was running slurpd for replication, I was slipping in between updates to make successful changes. Now that I've got the secondary servers running refreshAndPersist there's always a thread running. It seems to make sense.
And a third thing: does ~3h to add 250k entries to a new database, using 'slapcat -q' sound ridiculously long?
slapcat should be able to read the contents of a database at a rate of several thousand entries per second. But slapcat won't add any number of entries to a database, no matter how much time you give it.
Oops. s/slapcat/slapadd/ Sorry the mistake wasn't more obvious.
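For scale, the load rate implied by the figures above can be worked out directly. This is only arithmetic on the numbers quoted in the thread (250k entries, roughly 3 hours), and the variable names are illustrative. Note that the "several thousand entries per second" figure was given for slapcat reads, not slapadd loads, so it is not a like-for-like benchmark.

```python
# Figures taken from the thread; both are approximate.
entries = 250_000          # entries loaded with 'slapadd -q'
elapsed_hours = 3          # "~3h"

elapsed_seconds = elapsed_hours * 3600
rate = entries / elapsed_seconds   # entries per second

print(f"{rate:.1f} entries/second")
```

That works out to roughly 23 entries per second for the load described.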
On 9/18/09 3:47 AM, Howard Chu wrote:
Brandon Hume wrote:
I don't know whether 2.3.43 is new enough to NOT be told to go to hell,
Nobody would ever tell you that. But 2.3.43 is over a year old and 2.4 has been the stable release for quite a long time. Insisting on using it is the same as you telling us to go to hell with our bug fixes.
This is getting ridiculous from my perspective. We've had a rash of people reporting problems against older releases and being effectively told to go to hell (which is what we hear when the development team or some proxy for them tells us to upgrade to 2.4).
2.4 is not "stable" by any definition other than the OpenLDAP project has designated it so.
I am still seeing people complaining about syncrepl problems. So, how about you developers stop adding all the new wiz-bang bells and whistles and concentrate on stability and performance?
Have you fixed the fact that 2.4 is so much slower than 2.3 as reported by Quanah two months ago yet?
On Fri, 2009-09-18 at 07:33 -0400, Francis Swasey wrote:
This is getting ridiculous from my perspective. We've had a rash of people reporting problems against older releases and being effectively told to go to hell (which is what we hear when the development team or some proxy for them tells us to upgrade to 2.4).
"The fix is in 2.4 but can't be backported" is certainly a valid answer, and one I can live with. I'm not married to 2.3, and I think syncrepl is the best thing since copulation.
However, part of the reason I'm so slow moving to 2.4 is because I'm pretty much the only person here running this large LDAP directory. And part of the reason I'm the only person running the show, in addition to my other tasks, is because my coworkers and technical friends look at the openldap-software mailing list and say "I don't want to deal with those people". (Some of these people have worked with Theo de Raadt...)
I realize that users ask stupid questions and run ancient versions, but I also realize that sometimes those users are experiencing a catastrophe and have eighty thousand users banging on the door demanding explanation (ie: me). In that kind of situation people miss parts as they review docs and conflate symptoms and frequently make things worse before they make it better... and, yes, ask stupid questions.
Part of the reason I'm slow moving to 2.4 is because I actually had to work myself up to asking my syncrepl-client question. I braced myself for one-word answers, "RTFM"-type answers, and variations on "why in the world are you doing something like that?" I got lucky, my question was apparently worthwhile, and I got useful information along with tangible relief. But what does that say about the environment?
When I come across technical posts, when someone decides to share their knowledge, it's a delight. But there's never any doubt when some of those people think you're wasting their time.
I can handle being told that my version is too old and is unsupported. I just wish we could scale back a bit on the contempt while being told.
Brandon Hume wrote:
On Fri, 2009-09-18 at 07:33 -0400, Francis Swasey wrote:
This is getting ridiculous from my perspective. We've had a rash of people reporting problems against older releases and being effectively told to go to hell (which is what we hear when the development team or some proxy for them tells us to upgrade to 2.4).
"The fix is in 2.4 but can't be backported" is certainly a valid answer, and one I can live with. I'm not married to 2.3, and I think syncrepl is the best thing since copulation.
Agreed - syncrepl is fantastic.
However, part of the reason I'm so slow moving to 2.4 is because I'm pretty much the only person here running this large LDAP directory. And part of the reason I'm the only person running the show, in addition to my other tasks, is because my coworkers and technical friends look at the openldap-software mailing list and say "I don't want to deal with those people". (Some of these people have worked with Theo de Raadt...)
It is unfortunate that such reports are not isolated amongst a few individuals.
I realize that users ask stupid questions and run ancient versions, but I also realize that sometimes those users are experiencing a catastrophe and have eighty thousand users banging on the door demanding explanation (ie: me). In that kind of situation people miss parts as they review docs and conflate symptoms and frequently make things worse before they make it better... and, yes, ask stupid questions.
Part of the reason I'm slow moving to 2.4 is because I actually had to work myself up to asking my syncrepl-client question. I braced myself for one-word answers, "RTFM"-type answers, and variations on "why in the world are you doing something like that?" I got lucky, my question was apparently worthwhile, and I got useful information along with tangible relief. But what does that say about the environment?
Regrettably, this has become the accepted nature of the list and IRC channel. You either say nothing, accept it, and hope to get some useful morsels peppered in between the chastising, or you complain about it and risk alienation by those in the know. I have been more vocal than most on the topic (although, more in IRC than the mailing lists), and it's certainly reflected in some of the answers I've received in my recent mailing list postings.
I'm not saying that anybody is "deserving of a response" because clearly the list is volunteer-only, but in my case (http://www.mail-archive.com/openldap-software@openldap.org/msg15769.html) a request for clarification of a technical statement got twisted into an accusation claiming I'd misrepresented another individual's response, instead of an answer to an earnest question - a side effect of being vocal about the tone you describe. But, things aren't likely to change unless more people are willing to sacrifice help in return for questioning the aforementioned resentful nature.
When I come across technical posts, when someone decides to share their knowledge, it's a delight. But there's never any doubt when some of those people think you're wasting their time.
I can handle being told that my version is too old and is unsupported. I just wish we could scale back a bit on the contempt while being told.
Yes, it's less than an uncommon request...
Respectfully, Ryan
Ryan Steele wrote:
Brandon Hume wrote:
I realize that users ask stupid questions and run ancient versions, but I also realize that sometimes those users are experiencing a catastrophe and have eighty thousand users banging on the door demanding explanation (ie: me). In that kind of situation people miss parts as they review docs and conflate symptoms and frequently make things worse before they make it better... and, yes, ask stupid questions.
Part of the reason I'm slow moving to 2.4 is because I actually had to work myself up to asking my syncrepl-client question. I braced myself for one-word answers, "RTFM"-type answers, and variations on "why in the world are you doing something like that?" I got lucky, my question was apparently worthwhile, and I got useful information along with tangible relief. But what does that say about the environment?
Regrettably, this has become the accepted nature of the list and IRC channel. You either say nothing, accept it, and hope to get some useful morsels peppered in between the chastising, or you complain about it and risk alienation by those in the know. I have been more vocal than most on the topic (although, more in IRC than the mailing lists), and it's certainly reflected in some of the answers I've received in my recent mailing list postings.
http://catb.org/~esr/faqs/smart-questions.html
I'm not saying that anybody is "deserving of a response" because clearly the list is volunteer-only, but in my case (http://www.mail-archive.com/openldap-software@openldap.org/msg15769.html) a request for clarification of a technical statement got twisted into an accusation claiming I'd misrepresented another individual's response, instead of an answer to an earnest question - a side effect of being vocal about the tone you describe. But, things aren't likely to change unless more people are willing to sacrifice help in return for questioning the aforementioned resentful nature.
This is the way of the world. If you want warm'n'fuzzy "customer support" that first requires you to be someone's customer.
When I come across technical posts, when someone decides to share their knowledge, it's a delight. But there's never any doubt when some of those people think you're wasting their time.
I can handle being told that my version is too old and is unsupported. I just wish we could scale back a bit on the contempt while being told.
Yes, it's less than an uncommon request...
It's so common that someone already wrote a lengthy article about how to deal with it. Learn.
--On Friday, September 18, 2009 7:33 AM -0400 Francis Swasey Frank.Swasey@uvm.edu wrote:
On 9/18/09 3:47 AM, Howard Chu wrote:
Brandon Hume wrote:
I don't know whether 2.3.43 is new enough to NOT be told to go to hell,
Nobody would ever tell you that. But 2.3.43 is over a year old and 2.4 has been the stable release for quite a long time. Insisting on using it is the same as you telling us to go to hell with our bug fixes.
This is getting ridiculous from my perspective. We've had a rash of people reporting problems against older releases and being effectively told to go to hell (which is what we hear when the development team or some proxy for them tells us to upgrade to 2.4).
Once a release is no longer supported, that's the end of life for it.
2.4 is not "stable" by any definition other than the OpenLDAP project has designated it so.
I've found the last few releases to be stable. Which is why I'm using 2.4.18 now.
I am still seeing people complaining about syncrepl problems. So, how about you developers stop adding all the new wiz-bang bells and whistles and concentrate on stability and performance?
The problem in this post is one common with syncrepl and is what delta-syncrepl alleviates. I think it is also worth noting that the syncrepl in 2.4 is actually substantially better than the syncrepl in 2.3.
The problems with syncrepl I see people reporting right now have to do with MMR, which is a feature I personally am avoiding until it stabilizes a bit more. But the old single-master, many-replica method is definitely better in 2.4 than it was in 2.3, for both normal syncrepl *and* delta-syncrepl.
I.e., if you ignore the wiz-bang features, 2.4 is at this point better than 2.3.
Have you fixed the fact that 2.4 is so much slower than 2.3 as reported by Quanah two months ago yet?
Fixing the slowness in 2.4 requires writing an entirely new backend. See the discussions on -devel. However, there are things you can do to mitigate the slowness, such as using BDB 4.8.
I'll note that the 2.4 connection manager is actually *faster* than 2.3. The problem is that because of bugs that showed up in the 2.3 release, more locking mechanisms were added to back-hdb/bdb to fix them, and that slowed things down.
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
Zimbra :: the leader in open source messaging and collaboration
On Fri, 18 Sep 2009, Francis Swasey wrote:
2.4 is not "stable" by any definition other than the OpenLDAP project has designated it so.
I would disagree with this. I'm not at all involved in the official project designations, and I can say that I gave a talk at Rutgers in March 2009 (2.4.15 at the time) saying that RE24 is suitable for production use. It's on all of my replicas in production, and has been for months. One of the large reasons it's not on my master is because I've been spending my "OpenLDAP time" on RE24 testing (for the entire community) rather than my own development.
I am still seeing people complaining about syncrepl problems. So, how about you developers stop adding all the new wiz-bang bells and whistles and concentrate on stability and performance?
Honestly, I think this sort of answers the whole point: we're talking about a fairly "small" project, and I'd rather see those limited resources applied to stability and performance moving forward. Backporting to RE23 would distract from time needed towards making a "perfect" RE24 (or RE25 or...).
I don't know how it would be taken if somebody wanted to open up "openldap-legacy" to attempt backports. My guess is it'd all be fine until there's some seriously incompatible change, at which point you'd either need a good dev team (to do a fork) or to say "this effort is over, go to RE++." The former is extremely tough to come by, and the latter is essentially where we are today, so from a pragmatic standpoint I might even argue we're as good as we can be with the current model.
Have you fixed the fact that 2.4 is so much slower than 2.3 as reported by Quanah two months ago yet?
Quanah already pointed this out to some extent: I find the performance comparisons fallacious. Part of RE24 development has revealed some fairly serious issues in back-bdb, and there's a lot more locking in the code now to account for that. The perceived speed of RE23 really doesn't matter if it can't produce proper results. After all, if you want *real* performance, just define mutex_lock/unlock to be noops. I guarantee you you'll get good ops/second, during the limited time it runs at least...
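The trade-off described above (less locking is faster, but the results can be wrong) can be shown with a toy model. This is a minimal sketch, not OpenLDAP code: the names `increment`, `run_without_lock`, and `run_with_lock` are hypothetical, and thread scheduling is simulated deterministically with generators rather than real threads, assuming the worst-case interleaving in which every writer reads before any writer stores.

```python
def increment(store):
    """One read-modify-write update, split at the point where a
    thread switch could happen if nothing guards the update."""
    current = store["count"]      # read the shared value
    yield                         # another writer may run here
    store["count"] = current + 1  # write back a possibly stale value


def run_without_lock(writers):
    """Simulate no-op locking: every writer reads, then every writer writes."""
    store = {"count": 0}
    pending = [increment(store) for _ in range(writers)]
    for op in pending:
        next(op)                  # all writers read count == 0
    for op in pending:
        next(op, None)            # each one stores 0 + 1
    return store["count"]


def run_with_lock(writers):
    """Simulate a mutex: each update finishes before the next one starts."""
    store = {"count": 0}
    for _ in range(writers):
        for _ in increment(store):
            pass                  # run the whole update without interruption
    return store["count"]


print(run_without_lock(1000))     # lost updates
print(run_with_lock(1000))        # every update counted
```

With 1000 simulated writers, the unguarded run ends at 1 while the serialized run ends at 1000: the unguarded version never waits, but it loses 999 updates.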