How to debug out-of-sync servers? - openldap-technical

12 Oct 2022


Hello,
I have a rather annoying problem with two OpenLDAP servers.
It is a simple two-server delta-syncrepl multi-master setup, although modifications are in practice always made on the same server. The version is 2.5.7 (I know 2.6 is out, but I can’t work as fast as the OpenLDAP people).
Several times a day the de facto client (the server that only receives replicated changes) is out of sync for several minutes.
For example, right now the delta is about 50 minutes (usually it is less), yet the sync logs are very active. Here are some lines from the current operations:
Oct 12 16:03:24 ldap-renater3 slapd[1343]: do_syncrep2: rid=901 cookie=rid=901,sid=029,csn=20221012140303.663754Z#000000#029#000000
Oct 12 16:03:24 ldap-renater3 slapd[1343]: slap_queue_csn: queueing 0x7fa7278dd0e0 20221012140303.663754Z#000000#029#000000
Oct 12 16:03:24 ldap-renater3 slapd[1343]: syncrepl_message_to_op: rid=901 tid 0x7fa826062700
Oct 12 16:03:25 ldap-renater3 slapd[1343]: conn=-1 op=0 syncprov_matchops: recording uuid for dn=cn=elafont,ou=bordeaux-inp.fr,ou=mailboxes,dc=ipb,dc=fr on opc=0x7fa81002d008
Oct 12 16:03:25 ldap-renater3 slapd[1343]: conn=-1 op=0 syncprov_add_slog: adding csn=20221012140303.663754Z#000000#029#000000 to sessionlog, uuid=d4eaa120-c1e0-103b-88a5-0dec75f30866
Oct 12 16:03:25 ldap-renater3 slapd[1343]: conn=-1 op=0 syncprov_add_slog: expiring csn=20221007192836.067967Z#000000#029#000000 from sessionlog (sessionlog size=10000001)
Oct 12 16:03:25 ldap-renater3 slapd[1343]: conn=-1 op=0 syncprov_add_slog: updating mincsn for sid=41 csn=20221007192836.058680Z#000000#029#000000 to 20221007192836.067967Z#000000#029#000000
Oct 12 16:03:25 ldap-renater3 slapd[1343]: slap_queue_csn: queueing 0x7fa7278d9d00 20221012140303.663754Z#000000#029#000000
Oct 12 16:03:25 ldap-renater3 slapd[1343]: conn=-1 op=0 syncprov_matchops: recording uuid for dn=reqStart=20221012140324.000068Z,cn=accesslog on opc=0x7fa81002dbf8
Oct 12 16:03:25 ldap-renater3 slapd[1343]: conn=39500 op=1 syncprov_matchops: skipping original sid 029
Oct 12 16:03:25 ldap-renater3 slapd[1343]: slap_graduate_commit_csn: removing 0x7fa7278d9d00 20221012140303.663754Z#000000#029#000000
Oct 12 16:03:25 ldap-renater3 slapd[1343]: slap_graduate_commit_csn: removing 0x7fa7278dd0e0 20221012140303.663754Z#000000#029#000000
Oct 12 16:03:25 ldap-renater3 slapd[1343]: syncrepl_message_to_op: rid=901 be_modify cn=elafont,ou=bordeaux-inp.fr,ou=mailboxes,dc=ipb,dc=fr (0)
Oct 12 16:03:25 ldap-renater3 slapd[1343]: slap_queue_csn: queueing 0x7fa7278de470 20221012140303.663754Z#000000#029#000000
Oct 12 16:03:25 ldap-renater3 slapd[1343]: slap_graduate_commit_csn: removing 0x7fa7278de470 20221012140303.663754Z#000000#029#000000
What seems to happen is that the logging sometimes stops (apparently there is nothing more to do), and when it stops the servers are in sync. So it seems the sync operations are very slow. Of course, I do not enable this logging in normal operation.
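To put a number on the delta, the CSNs of the two servers can be compared directly. Below is a minimal Python sketch, assuming the usual CSN layout seen in the log lines above (timestamp#count#sid#mod, with the last three fields in hexadecimal). The function names parse_csn and csn_lag are made up for illustration, and the second CSN in the example is hypothetical, not taken from my servers.

```python
from datetime import datetime, timedelta, timezone


def parse_csn(csn: str):
    """Split a CSN into (UTC timestamp, change count, server ID, mod count)."""
    stamp, count, sid, mod = csn.split("#")
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    # The three trailing fields are assumed to be hexadecimal.
    return when, int(count, 16), int(sid, 16), int(mod, 16)


def csn_lag(provider_csn: str, consumer_csn: str) -> timedelta:
    """Return how far the consumer CSN is behind the provider CSN."""
    return parse_csn(provider_csn)[0] - parse_csn(consumer_csn)[0]


if __name__ == "__main__":
    provider = "20221012140303.663754Z#000000#029#000000"  # from the log above
    consumer = "20221012131303.663754Z#000000#029#000000"  # hypothetical value
    print(csn_lag(provider, consumer))  # 0:50:00
```

The values to feed in would be the contextCSN attribute of the suffix entry on each server, for example from a base-scope ldapsearch on dc=ipb,dc=fr requesting contextCSN. In a multimaster setup there is one contextCSN value per server ID, so only values with the same sid should be compared.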
The directory has about 70k entries (180 MB for the mdb file).
The servers have plenty of memory:
free
              total        used        free      shared  buff/cache   available
Mem:        8148668     3445380      125972         500     4577316     4390168
Swap:       2009084      183552     1825532
The servers run on a VMware cluster; from the VMware point of view, CPU usage is very low. The OS is Ubuntu 20.04.
I don’t know where to dig...
Thanks in advance
-- 
Frédéric Goudal
Ingénieur Système, DSI Bordeaux-INP
+33 556 84 23 11