sync replication questions - openldap-software

4 Nov 2009


      All,
An event happened here, and I would like to understand how OpenLDAP
handled it and why.
We have two OpenLDAP nodes doing sync replication (configuration is
below). One node locked up this weekend, with no response to ping, etc.
This in itself is a problem (since this server is not under very high
load); however, I do not have any diagnostics on it, so I will move on.
ldap1 crashed and stayed off for a good part of the weekend. When we
returned to work, ldap1 was rebooted and OpenLDAP was started.
At this point changes made on ldap2 were propagating to ldap1. However,
changes made on ldap1 were not replicating to ldap2.
I restarted ldap1 with only replication debugging on, and saw this
(don't mind the header up front; we are using daemontools):
[root@nyldap1 ~]# tail -f /service/openldap/log/main/current
@400000004af1d1582b3c9c74 bdb_db_open: warning - no DB_CONFIG file found in directory /usr/local/openldap/var/openldap-data: (2).
@400000004af1d1582b3caffc Expect poor performance for suffix "o=ec,c=US".
@400000004af1d1582b88090c bdb_monitor_db_open: monitoring disabled; configure monitor database to enable
@400000004af1d1582b8bfcc4 slapd starting
@400000004af1d1592bf58c34 slap_client_connect: URI=ldap://nyldap1.ops.ec.com DN="cn=root,o=ec,c=us" ldap_sasl_bind_s failed (-5)
@400000004af1d15a00006f54 do_syncrepl: rid=003 rc -5 retrying (4 retries left)
@400000004af1d15a00022ca4 do_syncrep2: rid=004 LDAP_RES_INTERMEDIATE - REFRESH_DELETE
@400000004af1d15a2aac52f4 TLS: can't accept: (null).
@400000004af1d15f2ab0f674 TLS: can't accept: (null).
@400000004af1d15f2ab8f16c do_syncrep2: rid=003 LDAP_RES_INTERMEDIATE - REFRESH_DELETE
@400000004af1d1642aafae54 TLS: can't accept: (null).
@400000004af1d1692ab2fa14 TLS: can't accept: (null).
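For anyone reading the log above: the @4000... prefix on each line is
the TAI64N timestamp that daemontools' multilog adds. A minimal sketch
for decoding it follows (Python; the function names are my own). It
assumes the 10-second TAI-UTC offset that daemontools applies when no
leap-second table is used. That offset is an assumption and only
affects absolute times, not the gaps between lines.

```python
from datetime import datetime, timedelta, timezone

TAI64_BIAS = 1 << 62  # TAI64 labels store seconds offset by 2**62


def decode_tai64n(label, tai_minus_utc=10):
    """Convert an '@'-prefixed TAI64N label to an aware UTC datetime.

    tai_minus_utc is an assumption (see the note above). Resolution is
    truncated from nanoseconds to microseconds.
    """
    hexdigits = label.lstrip("@")
    if len(hexdigits) != 24:
        raise ValueError("TAI64N label must be 24 hex digits: %r" % label)
    # First 16 hex digits: biased seconds. Last 8: nanoseconds.
    seconds = int(hexdigits[:16], 16) - TAI64_BIAS - tai_minus_utc
    nanos = int(hexdigits[16:], 16)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(microseconds=nanos // 1000)


def gap_seconds(first_label, second_label):
    """Elapsed seconds between two TAI64N labels."""
    delta = decode_tai64n(second_label) - decode_tai64n(first_label)
    return delta.total_seconds()
```

Applied to consecutive "TLS: can't accept" lines in the log, for
example gap_seconds("@400000004af1d15a2aac52f4",
"@400000004af1d15f2ab0f674"), the gap comes out at about 5 seconds.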
Changes from nyldap1 were still not propagating to nyldap2.
Then I restarted nyldap2, and replication was again working in both directions.
I based my setup on these notes:
http://www.linuxtopia.org/online_books//network_administration_guides/ldap_a...
syncrepl rid=004
       provider=ldap://nyldap2.ops.ec.com
       binddn="cn=root,o=ec,c=US"
       bindmethod=simple
       credentials=XXXXXXXX
       searchbase="o=ec,c=US"
       type=refreshAndPersist
       starttls=no
       tls_reqcert=never
       interval=00:00:00:10
       retry="5 5 300 5"
       timeout=1
mirrormode      true
overlay         syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100
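A note on the retry setting in this configuration, since it may bear on
why one direction stopped. As I read slapd.conf(5), retry takes pairs of
<interval in seconds> <number of retries>, and only a "+" as the count
means retry indefinitely. The sketch below (Python; the function names
are my own, not part of OpenLDAP) expands such a string and adds up how
long a consumer would keep trying before it gives up.

```python
def expand_retry(spec):
    """Parse a syncrepl retry string into (interval, count) pairs.

    count is None when the spec uses '+' (retry indefinitely).
    """
    tokens = spec.split()
    if not tokens or len(tokens) % 2 != 0:
        raise ValueError("retry must be pairs of <interval> <count>: %r" % spec)
    pairs = []
    for interval, count in zip(tokens[0::2], tokens[1::2]):
        pairs.append((int(interval), None if count == "+" else int(count)))
    return pairs


def total_retry_seconds(spec):
    """Seconds of retrying before giving up, or None if it never gives up."""
    total = 0
    for interval, count in expand_retry(spec):
        if count is None:
            return None
        total += interval * count
    return total
```

For retry="5 5 300 5" this gives 5*5 + 300*5 = 1525 seconds, roughly 25
minutes. If that reading is right, a consumer whose provider stays down
longer than that would stop retrying until slapd is restarted, while a
list ending in "+" (for example "5 5 300 +") would keep retrying.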
Our database has very low write/update activity. I think I understand
that syncprov-checkpoint timed out, so I am going to change these
settings to:
syncprov-checkpoint 10000 9000
syncprov-sessionlog 10000
As I said, our database sees very few writes: we add users periodically
and people's passwords expire. Those are all the writes/updates that
happen.
My OpenLDAP is 2.4.16, built from source.
So, the important questions:
1) Why did two-way replication not restart? Is this the correct
behavior for my configuration, or a bug? Can I configure OpenLDAP to
always start refreshAndPersist?
2) Could some entries be out of sync now? If yes:
2a) Can I use a tool to confirm these systems are in sync? (e.g.
"slapcat & diff", or a better option)
2b) If one side is out of sync, can I force one side to replicate over
the other?
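
To make the "slapcat & diff" idea in 2a concrete, here is a minimal
sketch (Python; all names are my own) that compares two LDIF dumps entry
by entry instead of line by line, so differences in entry or attribute
order between the two servers do not show up as noise. It assumes plain
slapcat output, lowercases DNs for matching, and compares attribute
lines as raw text, so it is a rough check and not a schema-aware
comparison.

```python
def parse_ldif(text):
    """Map each entry's DN to a frozenset of its 'attr: value' lines."""
    entries = {}
    for block in text.replace("\r\n", "\n").split("\n\n"):
        lines = []
        for raw in block.split("\n"):
            if not raw or raw.startswith("#"):
                continue  # skip blank lines and comments
            if raw.startswith(" ") and lines:
                lines[-1] += raw[1:]  # unfold LDIF continuation line
            else:
                lines.append(raw)
        dn_line = next((l for l in lines if l.lower().startswith("dn:")), None)
        if dn_line is None:
            continue
        if dn_line[3:4] == ":":
            key = dn_line[3:].strip()  # base64 DN: keep case as-is
        else:
            key = dn_line[3:].strip().lower()
        entries[key] = frozenset(l for l in lines if l is not dn_line)
    return entries


def compare_ldif(text_a, text_b):
    """Return (DNs only in A, DNs only in B, DNs whose attributes differ)."""
    a, b = parse_ldif(text_a), parse_ldif(text_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differing = sorted(dn for dn in set(a) & set(b) if a[dn] != b[dn])
    return only_a, only_b, differing
```

The inputs would be the contents of a dump taken on each node, for
example with slapcat -l ldap1.ldif on one host and slapcat -l ldap2.ldif
on the other. Three empty lists would mean the dumps match under the
assumptions above.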