First off: I know it's old, and we ARE going to upgrade at the next service interval in a few weeks!
But in the meantime, is there any way to find out whether the master and its slave(s) are in sync?
One idea that came up is a special object that is written to every x minutes and then checked for consistency. Of course it's not a nice solution, but it is A solution...
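The special-object idea could be sketched roughly as follows. This is only an illustration under assumptions that are not in the post: the canary DN, the choice of the description attribute (the entry's objectClass must allow it), and the function names heartbeat_ldif and heartbeat_in_sync are all hypothetical.

```shell
#!/bin/sh
# heartbeat_ldif DN STAMP: print an LDIF modify record that stores STAMP in
# the description attribute of the canary entry DN (hypothetical entry).
heartbeat_ldif() {
    printf 'dn: %s\nchangetype: modify\nreplace: description\ndescription: %s\n' "$1" "$2"
}

# heartbeat_in_sync EXPECTED VALUE...: succeed only if every VALUE that was
# read back from a slave equals the EXPECTED stamp written on the master.
heartbeat_in_sync() {
    expected=$1
    shift
    for value in "$@"; do
        [ "$value" = "$expected" ] || return 1
    done
    return 0
}
```

The output of heartbeat_ldif would be piped into ldapmodify against the master, and the values handed to heartbeat_in_sync would come from a base-scope ldapsearch for the same entry on each slave a little later.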
The reason for asking is that yesterday our secondary LDAP server (the primary read server) stopped answering queries (it might be a file lock or open filehandles problem; the exact reason is unknown for the moment). And for some reason, the primary LDAP server (the one we use for writes, the sync master) had an old version of the database. We THINK it happened at the last power failure in that server room: it brought up an old version (possibly bdb problems).
So the failover to the master worked, but its data was too old. And I didn't manage to recover the failed bdb database. Luckily we had a SECOND replica in another city (which was in sync with the changes we had made just a few hours earlier), so I dumped that database and loaded it into the primary replica (the one that failed/hung/crashed).
But the fact that the replication master and the slaves were out of sync worries us a little. This will be fixed properly with an upgrade, but until then we would like to have at least _some_ way of checking the status of the sync...
Any ideas for a quick hack?
--
... but you know as soon as Oracle starts waving its wallet at a Company it's time to run - fast. /illumos mailing list
On 24/5/2012 12:13 PM, Turbo Fredriksson wrote:
But in the meantime, is there any way to know/figure out if the master and its slave(s) are in sync?
This was discussed only yesterday!
Supposing you are replicating the full DIT: slapcat both ends, use the ldifsort utility to sort the outputs, then use diff to check for any differences.
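A rough sketch of that comparison is below. It does not use ldifsort; as a stand-in, awk and sort flatten each dump into sorted "dn, attribute value" lines so that neither entry order nor attribute order matters. The function names ldif_flatten and ldif_same are hypothetical, and the dumps are assumed to be plain slapcat output files.

```shell
#!/bin/sh
# ldif_flatten FILE: print one "dn<TAB>attribute: value" line per attribute
# value, sorted, so that entry order and attribute order no longer matter.
ldif_flatten() {
    awk 'BEGIN { RS = ""; FS = "\n" }
    {
        gsub(/\n /, "")                  # unfold LDIF continuation lines
        n = split($0, line, "\n")
        for (i = 2; i <= n; i++)
            print line[1] "\t" line[i]   # line[1] is the "dn: ..." line
        if (n == 1)
            print line[1] "\t"           # entry with no attributes
    }' "$1" | LC_ALL=C sort
}

# ldif_same A B: show the differences between two dumps; the exit status
# is 0 when they hold the same data and non-zero otherwise.
ldif_same() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    ldif_flatten "$1" > "$tmp_a"
    ldif_flatten "$2" > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    rc=$?
    rm -f "$tmp_a" "$tmp_b"
    return $rc
}
```

With one slapcat dump per server copied to the same machine, ldif_same master.ldif slave.ldif (hypothetical file names) prints nothing when the two databases hold the same data.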
Nick
On Thu, May 24, 2012 at 12:44:04PM +0300, Nick Milas wrote:
Supposing you are replicating the full DIT: slapcat both ends, use the ldifsort utility to sort the outputs, then use diff to check for any differences.
You only need to do that if you are worried about a replication protocol failure or a database failure. In normal operation it should be enough to read the contextCSN attribute from the root of the replicated subtree on each server:
#!/bin/sh
#
# check-replication

PATH=/usr/local/bin:/usr/bin:/bin
export PATH

for host in ermine.example.org trude.example.org
do
    echo $host
    ldapsearch -LLL -x -H ldap://$host/ -b 'dc=example,dc=org' -s base contextCSN
done
If the servers are in sync then the values you see will be identical on all servers. If any of the values differ you can parse them to work out how far out-of-date each server is.
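One way to do that parsing, as a sketch only: the functions below take the bare contextCSN value (without the "contextCSN: " prefix), assume it starts with a UTC timestamp of the form YYYYmmddHHMMSS, and ignore everything after those 14 digits. The names csn_epoch and csn_lag are hypothetical; the date arithmetic is done by hand so that no non-portable date options are needed.

```shell
#!/bin/sh
# csn_epoch CSN: print the CSN timestamp (its first 14 digits,
# YYYYmmddHHMMSS in UTC) as seconds since 1970-01-01.
csn_epoch() {
    ts=$(printf '%s' "$1" | cut -c1-14)
    y=$(printf '%s' "$ts" | cut -c1-4)
    # Strip one leading zero so that 08 and 09 are not read as octal.
    m=$(printf '%s' "$ts" | cut -c5-6);   m=${m#0}
    d=$(printf '%s' "$ts" | cut -c7-8);   d=${d#0}
    H=$(printf '%s' "$ts" | cut -c9-10);  H=${H#0}
    M=$(printf '%s' "$ts" | cut -c11-12); M=${M#0}
    S=$(printf '%s' "$ts" | cut -c13-14); S=${S#0}
    # Count the year from March so the leap day falls at the end of it.
    if [ "$m" -le 2 ]; then
        y=$((y - 1)); m=$((m + 9))
    else
        m=$((m - 3))
    fi
    era=$((y / 400))
    yoe=$((y - era * 400))
    doy=$(( (153 * m + 2) / 5 + d - 1 ))
    days=$((era * 146097 + yoe * 365 + yoe / 4 - yoe / 100 + doy - 719468))
    echo $((days * 86400 + H * 3600 + M * 60 + S))
}

# csn_lag NEWER OLDER: print how many whole seconds OLDER is behind NEWER.
csn_lag() {
    echo $(( $(csn_epoch "$1") - $(csn_epoch "$2") ))
}
```

Called as csn_lag with the value from the provider first and the value from a consumer second, the result is the number of seconds between the last change each server has seen, not the number of changes that are missing.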
Andrew
2012/5/25 Andrew Findlay andrew.findlay@skills-1st.co.uk:
If the servers are in sync then the values you see will be identical on all servers. If any of the values differ you can parse them to work out how far out-of-date each server is.
If you are looking for a Nagios script, you can find one here: http://ltb-project.org/wiki/documentation/nagios-plugins/check_ldap_syncrepl...
Clément.
On Fri, 25 May 2012 12:46:50 +0100, Andrew Findlay wrote:
In normal operation it should be enough to read the contextCSN attribute from the root of the replicated subtree on each server:
Ok, most of the servers are now upgraded, but unfortunately there are two that (for various reasons) can't be upgraded at this time.
The sync seems to work quite nicely between the 2.4.23 and the 2.3.43 servers.
However, the contextCSN values mismatch, and after examining them, it's the password policy object that won't sync...
paragon: 20120616082046.474977
leonis: 20120616082046.474977
leporis: 20120616082046.474977
kelvin: 20120616081559.003550
inbgdxrambo: 20120616081559.003550
The first three are 2.4 (paragon is the provider) and kelvin and rambo are 2.3...
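A listing like this can be checked mechanically. The sketch below assumes the "host: value" layout shown above, with the provider on the first line; the function name csn_outliers is hypothetical.

```shell
#!/bin/sh
# csn_outliers: read "host: contextCSN" lines on stdin, treat the first line
# as the provider, and print every host whose value differs from it.
csn_outliers() {
    awk -F': *' 'NR == 1 { ref = $2; next } $2 != ref { print $1 }'
}
```

An empty result means every consumer reports the same contextCSN as the provider.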
I've included ppolicy.schema on all the servers and set schemachecking=off on the consumers, but still no policy object...
I do however, now that I look closer, get an error/warning in the log:
Jun 16 15:29:21 rambo slapd[28729]: syncrepl_message_to_entry: rid 444 mods check (pwdAttribute: value #0 invalid per syntax)
Jun 16 15:29:21 rambo slapd[28729]: do_syncrepl: rid 444 retrying
I tried to take ppolicy.schema from paragon (the original one was version 1.2.2.5 and paragon's is 1.7.2.5) but that didn't help.
Turbo Fredriksson wrote:
I've included the ppolicy.schema in all the servers, schemachecking=off on the consumers but still no policy object...
ppolicy.schema only defines the user attributes. The operational attributes are implemented in the ppolicy overlay. The overlay must be configured on every server for the operational attributes to be replicated.
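A quick way to check this on each server, sketched under the assumption that the server is configured through a slapd.conf-style file rather than cn=config; the function name has_ppolicy_overlay and any file path passed to it are hypothetical.

```shell
#!/bin/sh
# has_ppolicy_overlay FILE: succeed if FILE contains an active (uncommented)
# "overlay ppolicy" directive. The directive must start in the first column,
# because a line with leading whitespace continues the previous line.
has_ppolicy_overlay() {
    grep -Eiq '^overlay[[:space:]]+ppolicy([[:space:]]|$)' "$1"
}
```

Running it against the configuration file of every provider and consumer shows which servers still lack the overlay. It only inspects the file; it does not tell whether the installed slapd was built with the overlay.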
On Sat, 16 Jun 2012 07:41:19 -0700, Howard Chu wrote:
The overlay must be configured on every server for the operational attributes to be replicated.
Ah, ok. Thanx. That's a problem then, because CentOS 5.x doesn't seem to have that compiled in...
Ah, well. We're not going to use ppolicy on those servers/sites anyway.
On Sat, 16 Jun 2012 07:41:19 -0700, Howard Chu wrote:
Turbo Fredriksson wrote:
paragon: 20120616082046.474977 kelvin: 20120616081559.003550
The overlay must be configured on every server for the operational attributes to be replicated.
After monitoring a colleague's batch modifications, I see that the contextCSN values now match:
paragon: 20120616153755.474331 kelvin: 20120616153755.474331
This, DESPITE the fact that the actual ppolicy object does NOT exist on kelvin!
Isn't this somewhat of a problem? If these values match but the DBs don't, can contextCSN actually be used for this (monitoring the sync)?
On Sat, 16 Jun 2012 17:49:07 +0200, Turbo Fredriksson wrote:
After monitoring a colleague's batch modifications, I see that the contextCSN values now match:
paragon: 20120616153755.474331 kelvin: 20120616153755.474331
This, DESPITE the fact that the actual ppolicy object does NOT exist on kelvin!
And even worse: some modifications do not propagate to the slaves! Sometimes. After running his script again, the changes seem to be there...
That makes me wonder what his script is doing... Sounds like the script is handling some part of the replication.
Also: Cent6.2 has been out for quite a while, with OpenLDAP 2.4.23 binaries. They are considered old and 'unsupported' by the OpenLDAP crew, but still work flawlessly in our simple setup (used just for system auth for ssh, sudo, svn, etc.). We are using ppolicy, but syncing password failures AND having a user's actual password checked (which makes the password failure sync useless) turned out not to be doable (I won't say impossible, as I'm open to our configs being off, but I haven't heard any suggestions). In short, I understand not wanting to compile and support your own binaries, but Cent6.2 is a pretty easy upgrade (opt for sssd over pam_ldap).
- chris
Chris Jacobs Systems Administrator, Technology Services Group
----- Original Message -----
From: openldap-technical-bounces@OpenLDAP.org <openldap-technical-bounces@OpenLDAP.org>
To: openldap-technical@openldap.org <openldap-technical@openldap.org>
Sent: Sat Jun 16 10:02:10 2012
Subject: Re: Monitoring 2.3.43?
And even worse: some modifications do not propagate to the slaves! Sometimes. After running his script again, the changes seem to be there...
On Sat, 16 Jun 2012 10:27:55 -0700, Chris Jacobs wrote:
That makes me wonder what his script is doing... Sounds like the script is handling some part of the replication.
Nope, nothing fancy. 15-20 lines of bash code. I helped him write it :). Just an ldapmodify with some error checking before and after in a loop.
I've since been tuning the syncrepl-* and retry options and can't duplicate the issue. He's going to do another big batch update tomorrow, and then I'll be monitoring more closely...
Cent6.2 has been out for quite a while, with OpenLDAP 2.4.23 binaries. Considered old and 'unsupported' by the OpenLDAP crew
Yeah, I mentioned that fact when we were planning the upgrades, but I never pushed the issue (I'm leaving at the end of the month) because no one has the knowledge/time to maintain a homebuilt package.
openldap-technical@openldap.org