Hello list.
I'm using delta-syncrepl in refreshAndPersist mode between my slaves and my master server, and a Nagios plugin that checks sync status based on the value of the contextCSN attribute. But I often get sync alerts for unknown reasons.
First issue: is it expected to have a higher contextCSN on the slave side? As I understand it, this attribute is updated each time a write operation is performed on the server. As the slave server is not supposed to perform any write operations, this should never happen. However, it does:
[root@etoile ~]# /usr/share/nagios/plugins/check_syncrepl.py ldap://ldap1.msr-inria.inria.fr ldap://ldap2.msr-inria.inria.fr -b dc=msr-inria,dc=inria,dc=fr -v
[..]
2009-05-25 13:36:49,740 - check_syncrepl.py - DEBUG - Retrieving Provider contextCSN
2009-05-25 13:36:49,741 - check_syncrepl.py - DEBUG - contextCSN = 20090520141922.274229Z#000000#000#000000
2009-05-25 13:36:49,742 - check_syncrepl.py - DEBUG - Retrieving Consumer contextCSN
2009-05-25 13:36:49,742 - check_syncrepl.py - DEBUG - contextCSN = 20090525095027.118111Z#000000#000#000000
2009-05-25 13:36:49,752 - check_syncrepl.py - INFO - Consumer NOT in SYNCH
2009-05-25 13:36:49,753 - check_syncrepl.py - INFO - Delta is -5 days, 4:28:55
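The delta in this output can be reproduced from the two contextCSN values alone. What follows is a minimal sketch, not the plugin's actual code: the helper names `parse_csn` and `csn_delta` are made up for illustration, and it assumes the CSN timestamp always carries a six-digit fractional part, as in the values shown.

```python
from datetime import datetime, timedelta


def parse_csn(csn: str) -> datetime:
    """Return the timestamp embedded in a contextCSN value.

    Only the first '#'-separated field (the time) is used; the
    remaining fields are ignored in this sketch.
    """
    stamp = csn.split("#", 1)[0]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ")


def csn_delta(provider_csn: str, consumer_csn: str) -> timedelta:
    """Provider time minus consumer time (negative: consumer is newer)."""
    return parse_csn(provider_csn) - parse_csn(consumer_csn)


# Values copied from the plugin output above.
provider = "20090520141922.274229Z#000000#000#000000"
consumer = "20090525095027.118111Z#000000#000#000000"

delta = csn_delta(provider, consumer)
print(delta)       # negative: the consumer's CSN is the newer one
print(abs(delta))  # actual size of the gap
```

Python normalizes a negative timedelta so that only the days field carries the sign, so "-5 days, 4:28:55" does not mean "more than five days": the consumer's CSN is ahead of the provider's by roughly 4 days, 19 hours and 31 minutes.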
Second issue: how does syncrepl sync operational attributes? When using ppolicy, for instance, each failed bind operation results in a pwdFailureTime attribute being added to the user entry. From my own attempts, the slave and the master maintain their own lists separately.
As synchronisation is performed from master to slave only, it seems logical that failed authentications on the slaves don't affect the user entry on the master. However, according to the slapd.conf man page, syncrepl is supposed to synchronise operational attributes too by default: The attrs list defaults to "*,+" to return all user and operational attributes, and attrsonly is unset by default.
Also, the logs on the slave clearly show that something happens when a failed authentication occurs on the master.
Start state:
Provider contextCSN = 20090525122053.257812Z#000000#000#000000
Consumer contextCSN = 20090525122053.257812Z#000000#000#000000
Logs:
May 25 14:18:34 nation slapd[28717]: do_syncrep2: cookie=rid=123,csn=20090525121834.036489Z#000000#000#000000
May 25 14:18:34 nation slapd[28717]: slap_queue_csn: queing 0x93d55d8 20090525121834.036489Z#000000#000#000000
May 25 14:18:34 nation slapd[28717]: slap_graduate_commit_csn: removing 0x9450550 20090525121834.036489Z#000000#000#000000
May 25 14:18:34 nation slapd[28717]: syncrepl_message_to_op: rid=123 be_modify uid=rousse,ou=users,dc=msr-inria,dc=inria,dc=fr (0)
May 25 14:18:34 nation slapd[28717]: slap_queue_csn: queing 0x9461d18 20090525121834.036489Z#000000#000#000000
May 25 14:18:34 nation slapd[28717]: slap_graduate_commit_csn: removing 0x945e640 20090525121834.036489Z#000000#000#000000
End state:
Provider contextCSN = 20090525122053.257812Z#000000#000#000000
Consumer contextCSN = 20090525122122.287486Z#000000#000#000000
The provider didn't increase its contextCSN value even though it performed a change, and the consumer did increase its own even though it didn't perform the change :(
Here is my syncrepl configuration:
syncrepl rid=123
    provider=ldaps://ldap1.msr-inria.inria.fr
    type=refreshAndPersist
    retry="60 +"
    logbase="cn=log"
    logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
    syncdata=accesslog
    searchbase="dc=msr-inria,dc=inria,dc=fr"
    scope=sub
    schemachecking=off
    bindmethod=simple
    binddn="cn=syncrepl,ou=roles,dc=msr-inria,dc=inria,dc=fr"
    credentials=XXXXXX
Guillaume Rousse wrote:
First issue: is it expected to have a higher contextCSN on the slave side? As I understand it, this attribute is updated each time a write operation is performed on the server. As the slave server is not supposed to perform any write operations, this should never happen. However, it does:
Ordinarily, a slave cannot initiate any write operations. However, you appear to be using ppolicy. The ppolicy overlay writes Bind status updates to the local server, regardless of master or slave status. Thus, it can cause the slave's contextCSN to be newer than the master's.
Howard Chu wrote:
Ordinarily, a slave cannot initiate any write operations. However, you appear to be using ppolicy. The ppolicy overlay writes Bind status updates to the local server, regardless of master or slave status. Thus, it can cause the slave's contextCSN to be newer than the master's.
OK, I guess this means that reports of a higher CSN on the slave side can be disregarded.
I suggest adding your answer to section 18.1.1.2. (Syncrepl Details) of the Admin guide, with a few modifications.
Ordinarily, a consumer cannot initiate any write operations. However, some specific overlays may bring exceptions to this rule. For instance, the ppolicy overlay writes Bind status updates to the local server, regardless of its master or slave status. Thus, it can cause the consumer's contextCSN to be newer than the provider's.
Also, you didn't answer my second question: syncrepl is also supposed to sync operational attributes. Does ppolicy constitute an exception here as well?
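A monitoring check could treat "consumer ahead" as its own case rather than as a sync failure. The sketch below is only one possible reading of the discussion above: the function names, the state labels and the 60-second tolerance are all invented for illustration. Note that a consumer CSN pushed forward by local ppolicy writes arguably no longer proves that the consumer has received every provider change, so that case is reported separately instead of being folded into "in sync".

```python
from datetime import datetime, timedelta


def csn_time(csn: str) -> datetime:
    """Timestamp part of a contextCSN value (first '#'-separated field)."""
    return datetime.strptime(csn.split("#", 1)[0], "%Y%m%d%H%M%S.%fZ")


def sync_status(provider_csn: str, consumer_csn: str,
                max_lag: timedelta = timedelta(seconds=60)) -> str:
    """Classify a provider/consumer contextCSN pair.

    The labels and the default tolerance are assumptions of this
    sketch, not anything defined by OpenLDAP or the Nagios plugin.
    """
    lag = csn_time(provider_csn) - csn_time(consumer_csn)
    if lag < timedelta(0):
        # Consumer is newer, e.g. because of local ppolicy writes.
        return "CONSUMER_AHEAD"
    if lag <= max_lag:
        return "IN_SYNC"
    return "LAGGING"


# "End state" values from the original report.
end_provider = "20090525122053.257812Z#000000#000#000000"
end_consumer = "20090525122122.287486Z#000000#000#000000"
print(sync_status(end_provider, end_consumer))
```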
Guillaume Rousse wrote:
Also, you didn't answer my second question: syncrepl is also supposed to sync operational attributes. Does ppolicy constitute an exception here as well?
For the moment, yes - it writes directly to the underlying database, bypassing syncprov. The question of how policy state should behave in a replicated environment is complicated, and the ppolicy spec is silent in this area.
On Wednesday 27 May 2009 13:12:03 Howard Chu wrote:
For the moment, yes - it writes directly to the underlying database, bypassing syncprov. The question of how policy state should behave in a replicated environment is complicated, and the ppolicy spec is silent in this area.
There are some complications with ppolicy in a replicated environment that cannot be elegantly solved (IMHO) without policy state on replicas being propagated to the other servers.
Specifically, in an environment with multiple slaves, I found it necessary to search all servers for locked out accounts, and if an account was locked out on a replica but not on the master, I would have to first lock out (by adding pwdAccountLockedTime) the account on the master, then unlock it / reset it.
However, this would still leave the password failure attributes (pwdFailureTime) stored on all the servers for this account, and (AFAICR) they could not be removed by any means, meaning that these accounts could then be locked out by one additional failed attempt. (Since our contractors are no longer on-site, I haven't needed the script I used for this much, so the details may not be 100% correct.)
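The comparison step of that procedure (accounts locked on a replica but not on the master) can be sketched as follows. This is not the script mentioned above; it is a hypothetical illustration that does not talk to LDAP at all. It assumes the set of locked-out DNs per server has already been collected, for instance by searching each server for entries that carry pwdAccountLockedTime. The function name and the second DN in the sample data are made up.

```python
def locked_only_on_replicas(locked_by_server, master):
    """Map each replica to the DNs locked there but not on the master.

    locked_by_server: dict of server name -> set of locked-out DNs
    master: key of the master server in that dict
    Replicas with nothing to report are left out of the result.
    """
    master_locked = locked_by_server[master]
    result = {}
    for server, locked in locked_by_server.items():
        if server == master:
            continue
        extra = locked - master_locked
        if extra:
            result[server] = extra
    return result


# Sample data: host names from this thread, second DN is invented.
locked = {
    "ldap1.msr-inria.inria.fr": {
        "uid=example,ou=users,dc=msr-inria,dc=inria,dc=fr",
    },
    "ldap2.msr-inria.inria.fr": {
        "uid=example,ou=users,dc=msr-inria,dc=inria,dc=fr",
        "uid=rousse,ou=users,dc=msr-inria,dc=inria,dc=fr",
    },
}

to_fix = locked_only_on_replicas(locked, "ldap1.msr-inria.inria.fr")
print(to_fix)
```

Each DN reported this way would then need the lock-then-reset treatment on the master, which this sketch deliberately leaves out.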
IMHO, password policy state must be propagated to all servers (e.g., chained to the master).
Regards, Buchan
openldap-software@openldap.org