--On Wednesday, November 04, 2009 3:03 PM -0500 Edward Capriolo edlinuxguru@gmail.com wrote:
All, I had an event happen, and I would like to understand how OpenLDAP handled it and why.
We have two OpenLDAP nodes doing sync replication (configuration is below). One node locked up this weekend: no response to ping, etc. This in itself is a problem (since this server is not under very high load); however, I do not have any diagnostics on it, so I will move on.
ldap1 crashed. It stayed off for a good part of the weekend. When we returned to work, ldap1 was rebooted and OpenLDAP was started. At this point changes made to ldap2 were propagating to ldap1. However, changes to ldap1 were not replicating to ldap2.
I restarted ldap1 with only replication debugging on and saw this (don't mind the header at the start of each line; we are using daemontools):
[root@nyldap1 ~]# tail -f /service/openldap/log/main/current
@400000004af1d1582b3c9c74 bdb_db_open: warning - no DB_CONFIG file found in directory /usr/local/openldap/var/openldap-data: (2).
@400000004af1d1582b3caffc Expect poor performance for suffix "o=ec,c=US".
@400000004af1d1582b88090c bdb_monitor_db_open: monitoring disabled; configure monitor database to enable
@400000004af1d1582b8bfcc4 slapd starting
@400000004af1d1592bf58c34 slap_client_connect: URI=ldap://nyldap1.ops.ec.com DN="cn=root,o=ec,c=us" ldap_sasl_bind_s failed (-5)
@400000004af1d15a00006f54 do_syncrepl: rid=003 rc -5 retrying (4 retries left)
@400000004af1d15a00022ca4 do_syncrep2: rid=004 LDAP_RES_INTERMEDIATE - REFRESH_DELETE
@400000004af1d15a2aac52f4 TLS: can't accept: (null).
@400000004af1d15f2ab0f674 TLS: can't accept: (null).
@400000004af1d15f2ab8f16c do_syncrep2: rid=003 LDAP_RES_INTERMEDIATE - REFRESH_DELETE
@400000004af1d1642aafae54 TLS: can't accept: (null).
@400000004af1d1692ab2fa14 TLS: can't accept: (null).
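For readers lining these log entries up against the outage, the leading "@..." field on each line is a TAI64N label as written by daemontools: 16 hex digits of seconds (offset by 2^62) followed by 8 hex digits of nanoseconds. A minimal Python sketch for decoding one is below. The function names are made up for illustration, and the conversion deliberately ignores the small TAI-to-UTC offset, so the result is only accurate to within some tens of seconds.

```python
from datetime import datetime, timedelta, timezone

# TAI64 labels store "seconds since 1970" plus 2**62.
TAI64_EPOCH_OFFSET = 2 ** 62


def parse_tai64n(label):
    """Split a label like '@400000004af1d1582b3c9c74' into
    (seconds since 1970, nanoseconds)."""
    hex_digits = label.lstrip("@")
    if len(hex_digits) != 24:
        raise ValueError("expected 24 hex digits after '@'")
    seconds = int(hex_digits[:16], 16) - TAI64_EPOCH_OFFSET
    nanoseconds = int(hex_digits[16:], 16)
    return seconds, nanoseconds


def approx_utc(label):
    """Approximate UTC time for a label.

    Assumption: the TAI-UTC (leap second) offset is ignored, so the
    result can be off by some tens of seconds.
    """
    seconds, nanoseconds = parse_tai64n(label)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(seconds=seconds, microseconds=nanoseconds // 1000)


if __name__ == "__main__":
    # First label from the log above.
    print(approx_utc("@400000004af1d1582b3c9c74"))
```

The daemontools utility tai64nlocal does the same job on a live log stream; the sketch is only useful when working from a pasted excerpt like this one.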
Changes from nyldap1 were still not propagating to nyldap2.
Then I restarted nyldap2, and replication was again working in both directions.
I based my setup on these notes: http://www.linuxtopia.org/online_books//network_administration_guides/ldap_administration/replication_LDAP_Sync_Replication.html
syncrepl rid=004
        provider=ldap://nyldap2.ops.ec.com
        binddn="cn=root,o=ec,c=US"
        bindmethod=simple
        credentials=XXXXXXXX
        searchbase="o=ec,c=US"
        type=refreshAndPersist
        starttls=no
        tls_reqcert=never
        interval=00:00:00:10
        retry="5 5 300 5"
        timeout=1

mirrormode true
overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100
Our database has very little write/update activity. As I understand it, syncprov-checkpoint timed out, so I am going to change this to:
syncprov-checkpoint 10000 9000
syncprov-sessionlog 10000
As I said, our database is very low-write: we add users periodically and people's passwords expire. Those are the only writes/updates that happen.
My OpenLDAP is 2.4.16, built from source.
Latest stable is 2.4.19. There have been replication fixes since 2.4.16.
So, the important questions:
- Why did two-way replication not restart? Is this the correct behavior for my configuration, or a bug? Can I configure OpenLDAP to always start refresh and persist?
Sounds like a bug. But again, there have been fixes since the release you are running.
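Separately from any bug, it may be worth checking how long the retry="5 5 300 5" setting in the syncrepl stanza keeps a consumer trying. Per slapd.conf(5), the value is a list of <interval in seconds> <number of retries> pairs, and the consumer stops retrying once the list is exhausted unless the last count is "+". The sketch below (the function name is made up for illustration) adds up that window. Whether this is what actually happened here is an assumption, since the thread does not show the configuration or logs from nyldap2.

```python
def retry_window_seconds(retry_spec):
    """Total seconds a syncrepl consumer keeps retrying for a given
    retry spec, or None if the spec ends in '+' (retry forever)."""
    fields = retry_spec.split()
    if not fields or len(fields) % 2 != 0:
        raise ValueError("retry spec must be <interval> <count> pairs")
    total = 0
    # Walk the spec as (interval, count) pairs.
    for interval, count in zip(fields[0::2], fields[1::2]):
        if count == "+":
            return None  # indefinite retries from this point on
        total += int(interval) * int(count)
    return total


if __name__ == "__main__":
    print(retry_window_seconds("5 5 300 5"))   # value from the config above
    print(retry_window_seconds("5 5 300 +"))   # same spec, retrying forever
```

For "5 5 300 5" this comes to 5*5 + 300*5 = 1525 seconds, a little over 25 minutes, which is far shorter than a weekend-long outage. If the other node's consumer used the same setting, it would have given up long before ldap1 came back, and a restart would be needed to start it again.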
- Could some entries be out of sync now? If yes:
2a) Can I use a tool to confirm these systems are in sync? "slapcat & diff" (or a better option?)
I'd suggest doing such a diff, yes.
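A plain diff of two slapcat dumps can report differences that are only a matter of entry or attribute ordering. As a rough sketch, assuming each node has been dumped to a file (for example with slapcat -l; the file names ldap1.ldif and ldap2.ldif are hypothetical), the following Python script compares the two dumps entry by entry, keyed on the dn line. It is not a full LDIF parser: it only unfolds wrapped lines and skips comments.

```python
import sys


def parse_ldif(text):
    """Map each entry's 'dn:' line to a frozenset of its other lines."""
    # LDIF wraps long lines; a continuation line starts with one space.
    unfolded = text.replace("\r\n", "\n").replace("\n ", "")
    entries = {}
    for block in unfolded.split("\n\n"):
        lines = [ln for ln in block.split("\n") if ln and not ln.startswith("#")]
        if not lines or not lines[0].lower().startswith("dn:"):
            continue  # not an entry (e.g. a version header or empty block)
        entries[lines[0]] = frozenset(lines[1:])
    return entries


def compare_ldif(text_a, text_b):
    """Return (only_in_a, only_in_b, differing) lists of dn lines."""
    a, b = parse_ldif(text_a), parse_ldif(text_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differing = sorted(dn for dn in set(a) & set(b) if a[dn] != b[dn])
    return only_a, only_b, differing


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_ldif.py ldap1.ldif ldap2.ldif
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        only_a, only_b, differing = compare_ldif(fa.read(), fb.read())
    for title, dns in (("only in first", only_a),
                       ("only in second", only_b),
                       ("attributes differ", differing)):
        print(f"{title}: {len(dns)}")
        for dn in dns:
            print("  " + dn)
```

Because the comparison covers every line of each entry, operational attributes such as entryCSN are included, so an entry that was modified on one side only should show up under "attributes differ".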
2b) If one side is out of sync, can I force one side to replicate over the other?
man slapd, look at the -c option.
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration