I would be very grateful if you could help me with the issue below.
We have three servers running RHEL 6.3, each with 16 vCPUs and 32 GB RAM, and OpenLDAP 2.4.33 with an MDB database of 200 GB.
We are facing a replication issue on our servers; otherwise the servers work fine for logins and user registration from the website (done on one server only as of now).
We imported the data on one server, mmam01, and copied it to the other two servers. After some time we saw a big difference between the db size on mmam01 and on the other two servers, so we exported the data and restored it on the other two servers.
I tried adding a user and it got replicated to the other two servers, but after some time new users stopped getting replicated to the other servers.
The initial replication status after about 30 minutes is shown below.
Even when I tried with a blank db, it initially started and then stopped.
I got errors like:
dn_callback : entries have identical CSN
syncrepl_entry: rid=111 entry unchanged, ignored
Sat Jan 12 12:40:41 EST 2013
DR-SJ
contextCSN: 20130101132757.303803Z#000000#000#000000
contextCSN: 20130111144013.926562Z#000000#001#000000
contextCSN: 20130112174023.266193Z#000000#002#000000

DC-mmam01
contextCSN: 20130101132757.303803Z#000000#000#000000
contextCSN: 20130112174006.314483Z#000000#001#000000
contextCSN: 20130112174023.266193Z#000000#002#000000

DC-mmam04
contextCSN: 20130101132757.303803Z#000000#000#000000
contextCSN: 20130111144013.926562Z#000000#001#000000
contextCSN: 20130112174023.266193Z#000000#002#000000
After 2 hours
DR-SJ
contextCSN: 20130101132757.303803Z#000000#000#000000
contextCSN: 20130111144013.926562Z#000000#001#000000
contextCSN: 20130112174023.266193Z#000000#002#000000
contextCSN: 20130112175710.938307Z#000000#003#000000

DC-mmam01
contextCSN: 20130101132757.303803Z#000000#000#000000
contextCSN: 20130112193219.242546Z#000000#001#000000
contextCSN: 20130112174023.266193Z#000000#002#000000
contextCSN: 20130112175710.938307Z#000000#003#000000

DC-mmam04
contextCSN: 20130101132757.303803Z#000000#000#000000
contextCSN: 20130111144013.926562Z#000000#001#000000
contextCSN: 20130112174023.266193Z#000000#002#000000
contextCSN: 20130112175710.938307Z#000000#003#000000
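For anyone comparing output like this by eye: each contextCSN value carries a server ID in its third `#`-separated field, so the values can be lined up per server ID to see which one is behind. A minimal Python sketch, assuming the values are pasted in as plain strings; the function names are made up for illustration and are not part of any OpenLDAP tooling.

```python
from typing import Dict, List, Tuple


def parse_csn(csn: str) -> Tuple[str, int, int, int]:
    """Split a CSN into (timestamp, change count, server ID, modification number)."""
    timestamp, count, sid, mod = csn.split("#")
    # The three fields after the timestamp are hexadecimal.
    return timestamp, int(count, 16), int(sid, 16), int(mod, 16)


def csn_by_sid(context_csns: List[str]) -> Dict[int, str]:
    """Index one server's contextCSN values by the server ID embedded in each."""
    return {parse_csn(csn)[2]: csn for csn in context_csns}


def lagging_sids(reference: List[str], other: List[str]) -> List[int]:
    """Server IDs for which `other` holds an older CSN than `reference`, or none at all."""
    ref, oth = csn_by_sid(reference), csn_by_sid(other)
    # CSNs are fixed-width, so plain string comparison orders them by time.
    return sorted(sid for sid, csn in ref.items() if oth.get(sid, "") < csn)


# contextCSN values copied from the "After 2 hours" output above.
MMAM01 = [
    "20130101132757.303803Z#000000#000#000000",
    "20130112193219.242546Z#000000#001#000000",
    "20130112174023.266193Z#000000#002#000000",
    "20130112175710.938307Z#000000#003#000000",
]
MMAM04 = [
    "20130101132757.303803Z#000000#000#000000",
    "20130111144013.926562Z#000000#001#000000",
    "20130112174023.266193Z#000000#002#000000",
    "20130112175710.938307Z#000000#003#000000",
]

print(lagging_sids(MMAM01, MMAM04))  # prints [1]
```

With the posted values, only server ID 1 (the `serverid 1` host, mmam01) has changes that mmam04 has not picked up; the DR-SJ values are identical to mmam04's.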
My ldap.conf file (same on all servers) is below; we have host-to-IP mappings in the /etc/hosts file.
BASE dc=example, dc=com
URI ldap://mmam01.com ldaps://mmam01.com ldap://mmam04.com ldaps:// mmam04.com ldap://sjam01.com ldaps://sjam01.com
TLS_REQCERT demand
TLS_CACERT /etc/openldap/cacerts/cacert.pem
slapd.conf file (from mmam01):
include /etc/openldap/schema/core.schema
include /etc/openldap/schema/cosine.schema
include /etc/openldap/schema/nis.schema
include /etc/openldap/schema/inetorgperson.schema
include /etc/openldap/schema/openldap.schema
include /etc/openldap/schema/dyngroup.schema
include /etc/openldap/schema/ppolicy.schema
include /etc/openldap/schema2/channelIdentifier.schema
include /etc/openldap/schema2/platform.schema
include /etc/openldap/schema2/extendedProfileKey.schema
include /etc/openldap/schema2/extendedProfileValue.schema
include /etc/openldap/schema2/behaviorKey.schema
include /etc/openldap/schema2/behaviorValue.schema
include /etc/openldap/schema2/questionAnswer.schema
include /etc/openldap/schema2/extendedTop.schema
include /etc/openldap/schema2/counter.schema

serverid 1

TLSCipherSuite HIGH:MEDIUM:+SSLv3
TLSCACertificateFile /etc/openldap/cacerts/cacert.pem
TLSCertificateFile /etc/openldap/cacerts/mmam01.crt
TLSCertificateKeyFile /etc/openldap/cacerts/mmam01.key
TLSVerifyClient never

pidfile /var/symas/run/slapd.pid
argsfile /var/symas/run/slapd.args
loglevel sync stats
idletimeout 30
writetimeout 30

modulepath /etc/openldap/lib64/openldap
moduleload back_mdb.la
moduleload ppolicy.la
moduleload unique.la
moduleload syncprov.la

database mdb
suffix "dc=example,dc=com"
directory /openldap/var/data

access to attrs=userPassword
    by self write
    by anonymous auth
    by * break
access to *
    by group/groupOfUniqueNames/uniqueMember.exact="cn=PWrite,ou=bGroup,dc=example,dc=com" manage
    by group/groupOfUniqueNames/uniqueMember.exact="cn=PRead,ou=bGroup,dc=example,dc=com" read
    by * break
access to *
    by self write
    by anonymous auth
    by * read

rootdn "cn=Manager,dc=example,dc=com"
rootpw {SSHA}dXDESQeFjSoa/A1HfJ2TAzYf4DrSYWY

index mail,uid,postalCode,smail,channelType,channelValue,answer,behavName,objectclass,type eq
index givenName,sn,city,cn,extName sub
index displayName approx
index entryCSN,entryUUID eq

checkpoint 128 15
maxsize 274877906944

syncrepl rid=111
    provider=ldap://sjam01.com
    binddn="cn=Manager,dc=example,dc=com"
    bindmethod=simple
    credentials=0m2013
    tls_cacert=/etc/openldap/cacerts/cacert.pem
    searchbase="dc=example,dc=com"
    type=refreshAndPersist
    retry="5 5 60 +"
    network-timeout=10
    timeout=10
syncrepl rid=222
    provider=ldap://mmam04.com
    binddn="cn=Manager,dc=example,dc=com"
    bindmethod=simple
    credentials=0m2013
    tls_cacert=/etc/openldap/cacerts/cacert.pem
    searchbase="dc=example,dc=com"
    type=refreshAndPersist
    retry="5 5 60 +"
    network-timeout=10
    timeout=10

overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100
mirrormode true

overlay unique
unique_attributes mail

overlay ppolicy
ppolicy_default "cn=default,ou=pwdPolicy,dc=example,dc=com"
ppolicy_use_lockout
logs
DR-sj
Jan 12 14:51:28 sjprodam01 slapd[25165]: do_syncrep2: rid=111 LDAP_RES_INTERMEDIATE - SYNC_ID_SET
Jan 12 14:51:28 sjprodam01 slapd[25165]: do_syncrep2: rid=111 LDAP_RES_INTERMEDIATE - SYNC_ID_SET
Jan 12 14:51:28 sjprodam01 slapd[25165]: do_syncrep2: rid=111 LDAP_RES_INTERMEDIATE - SYNC_ID_SET
Jan 12 14:51:28 sjprodam01 slapd[25165]: do_syncrep2: rid=111 LDAP_RES_INTERMEDIATE - SYNC_ID_SET
Jan 12 14:51:28 sjprodam01 slapd[25165]: do_syncrep2: rid=111 LDAP_RES_INTERMEDIATE - SYNC_ID_SET
mmam04
Jan 12 14:53:24 mmprodam04 slapd[14108]: do_syncrep2: rid=222 LDAP_RES_INTERMEDIATE - SYNC_ID_SET
Jan 12 14:53:24 mmprodam04 slapd[14108]: do_syncrep2: rid=222 LDAP_RES_INTERMEDIATE - SYNC_ID_SET
Jan 12 14:53:25 mmprodam04 slapd[14108]: do_syncrep2: rid=222 LDAP_RES_INTERMEDIATE - SYNC_ID_SET
Jan 12 14:53:25 mmprodam04 slapd[14108]: do_syncrep2: rid=222 LDAP_RES_INTERMEDIATE - SYNC_ID_SET
mmam01 (users are added on this server)
Jan 12 14:53:26 mmprodam01 slapd[24380]: slap_queue_csn: queing 0x7f55dd4fa120 20130112195326.941804Z#000000#001#000000
Jan 12 14:53:27 mmprodam01 slapd[24380]: conn=8516 op=291 RESULT tag=105 err=0 text=
Jan 12 14:53:27 mmprodam01 slapd[24380]: slap_graduate_commit_csn: removing 0x7f55d010ee30 20130112195326.941804Z#000000#001#000000
Jan 12 14:53:27 mmprodam01 slapd[24380]: conn=8516 op=292 MOD dn="extName=PIT,cn=entitlements,cn=extendedProfile,uid=6a9ddf85-1072-48b4-9f09-10f032c8f05e,ou=endUsers,dc=example,dc=com"
Jan 12 14:53:27 mmprodam01 slapd[24380]: conn=8516 op=292 MOD attr=extValue
Jan 12 14:53:27 mmprodam01 slapd[24380]: slap_queue_csn: queing 0x7f55deffc210 20130112195327.100182Z#000000#001#000000
Jan 12 14:53:27 mmprodam01 slapd[24380]: conn=8516 op=292 RESULT tag=103 err=0 text=
Jan 12 14:53:27 mmprodam01 slapd[24380]: slap_graduate_commit_csn: removing 0x7f55d8392770 20130112195327.100182Z#000000#001#000000
Jan 12 14:53:27 mmprodam01 slapd[24380]: conn=8516 op=293 MOD dn="extName=RVW,cn=entitlements,cn=extendedProfile,uid=6a9ddf85-1072-48b4-9f09-10f032c8f05e,ou=endUsers,dc=example,dc=com"
Jan 12 14:53:27 mmprodam01 slapd[24380]: conn=8516 op=293 MOD attr=extValue
Jan 12 14:53:27 mmprodam01 slapd[24380]: slap_queue_csn: queing 0x7f560a22e210 20130112195327.103686Z#000000#001#000000
Jan 12 14:53:27 mmprodam01 slapd[24380]: conn=8516 op=293 RESULT tag=103 err=0 text=
Jan 12 14:53:27 mmprodam01 slapd[24380]: slap_graduate_commit_csn: removing 0x7f55fc6f35e0 20130112195327.103686Z#000000#001#000000
Jan 12 14:53:27 mmprodam01 slapd[24380]: conn=8516 op=294 MOD dn="extName=ICA,cn=entitlements,cn=extendedProfile,uid=6a9ddf85-1072-48b4-9f09-10f032c8f05e,ou=endUsers,dc=example,dc=com"
Jan 12 14:53:27 mmprodam01 slapd[24380]: conn=8516 op=294 MOD attr=extValue
Jan 12 14:53:27 mmprodam01 slapd[24380]: slap_queue_csn: queing 0x7f55f3ffd210 20130112195327.107815Z#000000#001#000000
Jan 12 14:53:27 mmprodam01 slapd[24380]: conn=8516 op=294 MOD attr=extValue
Jan 12 14:53:27 mmprodam01 slapd[24380]: slap_queue_csn: queing 0x7f55f3ffd210 20130112195327.107815Z#000000#001#000000
Jan 12 14:53:27 mmprodam01 slapd[24380]: conn=8516 op=294 RESULT tag=103 err=0 text=
Jan 12 14:53:27 mmprodam01 slapd[24380]: slap_graduate_commit_csn: removing 0x7f55e87ae550 20130112195327.107815Z#000000#001#000000
Jan 12 14:53:27 mmprodam01 slapd[24380]: conn=8516 op=295 MOD dn="extName=RVP,cn=entitlements,cn=extendedProfile,uid=6a9ddf85-1072-48b4-9f09-10f032c8f05e,ou=endUsers,dc=example,dc=com"
Jan 12 14:53:27 mmprodam01 slapd[24380]: conn=8516 op=295 MOD attr=extValue
Jan 12 14:53:27 mmprodam01 slapd[24380]: slap_queue_csn: queing 0x7f55f0cf8210 20130112195327.112994Z#000000#001#000000
Jan 12 14:53:27 mmprodam01 slapd[24380]: conn=8516 op=295 RESULT tag=103 err=0 text=
Jan 12 14:53:27 mmprodam01 slapd[24380]: slap_graduate_commit_csn: removing 0x7f55e4618780 20130112195327.112994Z#000000#001#000000
Jan 12 14:53:27 mmprodam01 slapd[24380]: conn=8516 op=296 MOD dn="extName=RAD,cn=entitlements,cn=extendedProfile,uid=6a9ddf85-1072-48b4-9f09-10f032c8f05e,ou=endUsers,dc=example,dc=com"
Jan 12 14:53:27 mmprodam01 slapd[24380]: conn=8516 op=296 MOD attr=extValue
Jan 12 14:53:27 mmprodam01 slapd[24380]: slap_queue_csn: queing 0x7f55dd4f9210 20130112195327.117321Z#000000#001#000000
Jan 12 14:53:27 mmprodam01 slapd[24380]: conn=8516 op=296 RESULT tag=103 err=0 text=
Jan 12 14:53:27 mmprodam01 slapd[24380]: slap_graduate_commit_csn: removing 0x7f55d0001490 20130112195327.117321Z#000000#001#000000
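One way to relate a write log like this to the contextCSN output is to pull the queued CSNs out of the log text and compare the newest one with what a consumer reports for the same server ID. The Python sketch below is only illustrative: the regular expression is written against the `slap_queue_csn` lines as they appear in this post (including the "queing" spelling), and the function names are hypothetical.

```python
import re
from typing import List, Optional

# Matches the CSN at the end of lines such as
#   "... slap_queue_csn: queing 0x7f55dd4fa120 20130112195326.941804Z#000000#001#000000"
QUEUED_CSN = re.compile(
    r"slap_queue_csn: queing \S+ (\d{14}\.\d{6}Z#[0-9a-f]{6}#[0-9a-f]{3}#[0-9a-f]{6})"
)


def queued_csns(log_text: str) -> List[str]:
    """Return every CSN queued in the given slapd log text, in log order."""
    return QUEUED_CSN.findall(log_text)


def newest_queued_csn(log_text: str) -> Optional[str]:
    """Return the most recent queued CSN, or None if the log has none."""
    csns = queued_csns(log_text)
    # CSNs are fixed-width, so the lexicographic maximum is the newest timestamp.
    return max(csns) if csns else None


# Three lines taken from the mmam01 log above.
SAMPLE_LOG = """\
Jan 12 14:53:26 mmprodam01 slapd[24380]: slap_queue_csn: queing 0x7f55dd4fa120 20130112195326.941804Z#000000#001#000000
Jan 12 14:53:27 mmprodam01 slapd[24380]: conn=8516 op=291 RESULT tag=105 err=0 text=
Jan 12 14:53:27 mmprodam01 slapd[24380]: slap_queue_csn: queing 0x7f55dd4f9210 20130112195327.117321Z#000000#001#000000
"""

newest = newest_queued_csn(SAMPLE_LOG)
# contextCSN for server ID 001 as reported by mmam04 and DR-SJ in the output above.
consumer_csn = "20130111144013.926562Z#000000#001#000000"

print(newest)
print(newest is not None and consumer_csn < newest)  # True: the consumer is behind
```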
I have tried adding users and changing passwords one by one, but the changes still do not replicate. I can search for old entries on one server from another server.
--On Tuesday, January 15, 2013 7:56 PM +0530 anil beniwal beni.anil@gmail.com wrote:
Even when I tried with a blank db, it initially started and then stopped.
I got errors like:
dn_callback : entries have identical CSN
syncrepl_entry: rid=111 entry unchanged, ignored
If you continue to ignore my advice to use delta-syncrepl instead of standard syncrepl, then you can expect to continue to have problems. Also, since you are using MDB, grab the latest OpenLDAP code from RE24.
--Quanah
--
Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
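For readers who have not set up delta-syncrepl: on the consumer side it is the same `syncrepl` directive with three extra settings (`syncdata`, `logbase`, `logfilter`) that make it read changes from the provider's accesslog database. The Python sketch below just renders such a directive as slapd.conf text. The `cn=accesslog` suffix and the log filter are the conventional values from the OpenLDAP Admin Guide and are assumptions as far as this deployment goes; the function name and the `secret` credential are placeholders.

```python
def delta_syncrepl_directive(rid: int, provider: str, binddn: str, credentials: str,
                             searchbase: str, logbase: str = "cn=accesslog") -> str:
    """Render a slapd.conf syncrepl directive that reads changes from an accesslog db."""
    fields = [
        f"syncrepl rid={rid:03d}",
        f"provider={provider}",
        f'binddn="{binddn}"',
        "bindmethod=simple",
        f"credentials={credentials}",
        f'searchbase="{searchbase}"',
        "type=refreshAndPersist",
        'retry="5 5 60 +"',
        # The three settings below are what turn plain syncrepl into delta-syncrepl.
        "syncdata=accesslog",
        f'logbase="{logbase}"',
        'logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"',
    ]
    # slapd.conf treats lines that start with whitespace as continuations.
    return "\n    ".join(fields)


print(delta_syncrepl_directive(
    rid=111,
    provider="ldap://sjam01.com",
    binddn="cn=Manager,dc=example,dc=com",
    credentials="secret",  # placeholder, not a real password
    searchbase="dc=example,dc=com",
))
```

This covers the consumer half only. The provider also needs the accesslog overlay on the main database, logging writes to a separate `cn=accesslog` database that has its own syncprov overlay.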
--On Tuesday, January 15, 2013 9:13 AM -0800 Quanah Gibson-Mount quanah@zimbra.com wrote:
--On Tuesday, January 15, 2013 7:56 PM +0530 anil beniwal beni.anil@gmail.com wrote:
Even when I tried with a blank db, it initially started and then stopped.
I got errors like:
dn_callback : entries have identical CSN
syncrepl_entry: rid=111 entry unchanged, ignored
And just to be clear, this particular bit is not an error message.
--Quanah
I think using delta-syncrepl is good advice; I've tried it out and don't see any of the problems I've seen with plain syncrepl. However, I am having problems upgrading my production system from plain syncrepl to delta-syncrepl. We can't afford downtime on the production system, so I've just been testing the upgrade in a dev environment, and so far I've not managed a clean upgrade.

The production system has 4 LDAP servers set up for MMR syncrepl, though in reality only one of the servers is actively used; the others are currently only for failover in case the active server fails (this may well change in the future). The upgrade has to create a cn=accesslog db and syncprov overlays, and modify the syncrepl attributes of the main db to use the access log (cn=config still uses plain syncrepl).

The first problem I get is that when the creation of the cn=accesslog db and syncprov overlays is replicated, the changes are applied out of order, so that syncrepl tries to create the syncprov overlay before the accesslog db to which it refers has been created, and this breaks replication of the changes. I can work round this by creating the accesslog db, waiting for that to replicate and then creating the syncprov overlay, but this is still annoying and complicates the upgrade process unnecessarily.

I see the next problem when modifying the syncrepl attributes to refer to accesslog, and so far I can't work round it consistently and confidently enough to try it out in production. Because it is an MMR setup, there is one syncrepl attribute for each server. I can modify one of these attributes, and it replicates with no problem. But as soon as I change the next attribute, one of the servers starts continually logging an error message to the slapd log, indicating that another server requires a refresh. The only way I have been able to cure this is by deleting the main db and the accesslog db and letting replication regenerate them. But this doesn't always seem to work and in any case is not really practical in the production environment, where the main db has 5.5 million DNs and takes up nearly 20 GB.

This second problem doesn't happen consistently, but because I don't understand why it happens or how to fix it consistently, I can't go ahead with the production upgrade to delta-syncrepl, which is very frustrating. We are currently running openldap 2.4.31, but I do plan to see if 2.4.33 or RE24 behaves better. However, looking at the openldap sources I haven't spotted any fixes which look likely to help. Any ideas?
Chris
Date: Tue, 15 Jan 2013 09:13:02 -0800
From: quanah@zimbra.com
To: beni.anil@gmail.com
Subject: Re: Replication not working
CC: openldap-technical@openldap.org
--On Tuesday, January 15, 2013 7:56 PM +0530 anil beniwal beni.anil@gmail.com wrote:
Even when I tried with a blank db, it initially started and then stopped.
I got errors like:
dn_callback : entries have identical CSN
syncrepl_entry: rid=111 entry unchanged, ignored
If you continue to ignore my advice to use delta-syncrepl instead of standard syncrepl, then you can expect to continue to have problems. Also, since you are using MDB, grab the latest OpenLDAP code from RE24.
--Quanah
--On Tuesday, January 15, 2013 6:53 PM +0000 Chris Card ctcard@hotmail.com wrote:
This second problem doesn't happen consistently, but because I don't understand why it happens or how to fix it consistently, I can't go ahead with the production upgrade to delta-syncrepl, which is very frustrating. We are currently running openldap 2.4.31, but I do plan to see if 2.4.33 or RE24 behaves better. However, looking at the openldap sources I haven't spotted any fixes which look likely to help. Any ideas?
I take it you are replicating cn=config then? I never do that, so hard for me to comment on issues that may arise by doing that.
--Quanah
Yes, though that's not the fundamental issue, since I can work round cn=config replication strangeness.

I've done some more investigation and I can now see what is causing the second problem. When I change the olcSyncrepl values for the main database to use cn=accesslog (i.e. to use delta-syncrepl rather than syncrepl), the slapd logs are flooded with messages like this:

"do_syncrep2: rid=xxx (4096) Content Sync Refresh Required"

I have worked out why this is: the slapd server corresponding to rid=xxx is trying to search for entries in its cn=accesslog database with entryCSN <= the contextCSN of its main database, but failing to find any matches, because the cn=accesslog database was created after the last change to the main database that happened through this server.

I think I can work round this by making sure an LDAP modify is done to the main database against this server after the accesslog database is created, but this does seem like a bug to me.

Chris
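The condition Chris describes can be written down as a toy model, which also shows why the proposed workaround should help. This Python sketch is a simplification of the description above, not slapd's actual code; the function name and all CSN values are hypothetical.

```python
from typing import Iterable


def refresh_required(accesslog_csns: Iterable[str], context_csn: str) -> bool:
    """Model of the check described above: a refresh is needed when the accesslog
    holds no entry whose CSN is <= the main database's contextCSN."""
    # CSNs are fixed-width strings, so string comparison orders them by time.
    return not any(csn <= context_csn for csn in accesslog_csns)


# Hypothetical state right after the upgrade: the last write made through this
# server predates the creation of the accesslog database.
context_csn = "20130115100000.000000Z#000000#001#000000"
accesslog = ["20130122090000.000000Z#000000#001#000000"]
print(refresh_required(accesslog, context_csn))  # True

# Workaround: one modify through this server after the accesslog db exists.
# The write is logged, and the main database's contextCSN advances to its CSN.
new_csn = "20130122091500.000000Z#000000#001#000000"
accesslog.append(new_csn)
context_csn = new_csn
print(refresh_required(accesslog, context_csn))  # False
```

In this model a single logged write is enough, because it puts an entry into the accesslog whose CSN equals the new contextCSN.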
--On Tuesday, January 22, 2013 2:43 PM +0000 Chris Card ctcard@hotmail.com wrote:
I think I can work round this by making sure an LDAP modify is done to the main database against this server after the accesslog database is created, but this does seem like a bug to me.
Feel free to file it. ;)
--Quanah