Hello,
I have a kind of annoying problem with two openldap servers.
It is a simple pair setup in delta-sync and multimaster althought modifications are actually always done on the same one. Version is 2.5.7 (well I know 2.6 is out but I can’t work as fast as openldap people).
Several times a day the de-facto client is out of sync for several minutes. For example now I have a delta of about 50 minutes (usually it’s less), but the sync logs are very active here is some lines of the current operations
Oct 12 16:03:24 ldap-renater3 slapd[1343]: do_syncrep2: rid=901 cookie=rid=901,sid=029,csn=20221012140303.663754Z#000000#029#000000 Oct 12 16:03:24 ldap-renater3 slapd[1343]: slap_queue_csn: queueing 0x7fa7278dd0e0 20221012140303.663754Z#000000#029#000000 Oct 12 16:03:24 ldap-renater3 slapd[1343]: syncrepl_message_to_op: rid=901 tid 0x7fa826062700 Oct 12 16:03:25 ldap-renater3 slapd[1343]: conn=-1 op=0 syncprov_matchops: recording uuid for dn=cn=elafont,ou=bordeaux-inp.fr,ou=mailboxes,dc=ipb,dc=fr on opc=0x7fa81002d008 Oct 12 16:03:25 ldap-renater3 slapd[1343]: conn=-1 op=0 syncprov_add_slog: adding csn=20221012140303.663754Z#000000#029#000000 to sessionlog, uuid=d4eaa120-c1e0-103b-88a5-0dec75f30866 Oct 12 16:03:25 ldap-renater3 slapd[1343]: conn=-1 op=0 syncprov_add_slog: expiring csn=20221007192836.067967Z#000000#029#000000 from sessionlog (sessionlog size=10000001) Oct 12 16:03:25 ldap-renater3 slapd[1343]: conn=-1 op=0 syncprov_add_slog: updating mincsn for sid=41 csn=20221007192836.058680Z#000000#029#000000 to 20221007192836.067967Z#000000#029#000000 Oct 12 16:03:25 ldap-renater3 slapd[1343]: slap_queue_csn: queueing 0x7fa7278d9d00 20221012140303.663754Z#000000#029#000000 Oct 12 16:03:25 ldap-renater3 slapd[1343]: conn=-1 op=0 syncprov_matchops: recording uuid for dn=reqStart=20221012140324.000068Z,cn=accesslog on opc=0x7fa81002dbf8 Oct 12 16:03:25 ldap-renater3 slapd[1343]: conn=39500 op=1 syncprov_matchops: skipping original sid 029 Oct 12 16:03:25 ldap-renater3 slapd[1343]: slap_graduate_commit_csn: removing 0x7fa7278d9d00 20221012140303.663754Z#000000#029#000000 Oct 12 16:03:25 ldap-renater3 slapd[1343]: slap_graduate_commit_csn: removing 0x7fa7278dd0e0 20221012140303.663754Z#000000#029#000000 Oct 12 16:03:25 ldap-renater3 slapd[1343]: syncrepl_message_to_op: rid=901 be_modify cn=elafont,ou=bordeaux-inp.fr,ou=mailboxes,dc=ipb,dc=fr (0) Oct 12 16:03:25 ldap-renater3 slapd[1343]: slap_queue_csn: queueing 0x7fa7278de470 20221012140303.663754Z#000000#029#000000 Oct 12 16:03:25 ldap-renater3 slapd[1343]: slap_graduate_commit_csn: removing 0x7fa7278de470 20221012140303.663754Z#000000#029#000000
The behaviour that seems to occurs is that the logging sometimes stop (nothing more to do it seems) and when it stops the servers are in sync. So it seems the sync operations are very slow. Of course I do not log on normal operation.
The directory as something like 70k entries (180Mb for the mdb file). Servers have plenty of memory : free total used free shared buff/cache available Mem: 8148668 3445380 125972 500 4577316 4390168 Swap: 2009084 183552 1825532
Servers are on an vmware cluster, from the vmware point of view cpu usage is very low. OS is ubuntu 20.04
I don’t know where to dig...
Thanks in advance — Frédéric Goudal Ingénieur Système, DSI Bordeaux-INP +33 556 84 23 11
On Oct 12, 2022, at 9:25 AM, Frédéric Goudal frederic.goudal@bordeaux-inp.fr wrote:
The behaviour that seems to occurs is that the logging sometimes stop (nothing more to do it seems) and when it stops the servers are in sync. So it seems the sync operations are very slow. Of course I do not log on normal operation.
The directory as something like 70k entries (180Mb for the mdb file). Servers have plenty of memory : free total used free shared buff/cache available Mem: 8148668 3445380 125972 500 4577316 4390168 Swap: 2009084 183552 1825532
Servers are on an vmware cluster, from the vmware point of view cpu usage is very low. OS is ubuntu 20.04
I don’t know where to dig…
Hi,
Where I’d start is to quantify how far out of sync they’re getting. A good tool for that is the slapd watcher. It calculates based on contextCSN’s and also provides stats from the monitor database.
``` ldap://m02 Entries Bind Unbind Search Compare Modify ModDN Add Delete Abandon Extended Num 578764 502 1 44492 1 1 0 1805517 1515081 0 102 Num/s 0.00 0.00 0.00 0.19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 contextCSN: 20221011141123.463727Z#000000#001#000000 actv@2022-10-10 13:08:15, idle@2022-10-11 19:16:45 contextCSN: 20221011141123.851236Z#000000#002#000000 actv@2022-10-10 13:08:15, ahead 16:17:51, max delta 16:17:51
ldap://m01, late@2022-10-10 21:53:43 Entries Bind Unbind Search Compare Modify ModDN Add Delete Abandon Extended Num 944921 250 1 6340 1 1 0 1010062 537473 0 102 Num/s -0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 contextCSN: 20221010215333.012374Z#000000#001#000000 actv@2022-10-10 13:08:15, behind 16:17:50, max delta 16:17:50 contextCSN: 20221010215333.015837Z#000000#002#000000 actv@2022-10-10 13:08:15 ```
# Install the SLAPD Replication Watcher
## This is work-in-progress
1. Perform the following:
``` yum install gcc -y git clone https://github.com/openldap/openldap.git cd openldap ./configure make depend cd libraries make -j2 cd ../tests/progs/ make slapd-watcher ```
2. Run from openldap/tests/progs
``` ./slapd-watcher -b “dc=example,dc=com" ldap://host1 ldap://host2 ```
— Shawn
Hello,
Thanks for the idea and the practical informations
Of course just after I posted the message, I have found that a bug in the feeding program did something like 30k modifications. So the sync process was just running it’s speed, which, in this case, was far slower than the actions on the master.
But I have a second problem on a more complex setup (replication in push mode) and I will try to see what slapd watcher can do for me.
Thanks.
Fred.
Le 13 oct. 2022 à 14:27, Shawn McKinney smckinney@symas.com a écrit :
On Oct 12, 2022, at 9:25 AM, Frédéric Goudal frederic.goudal@bordeaux-inp.fr wrote:
The behaviour that seems to occurs is that the logging sometimes stop (nothing more to do it seems) and when it stops the servers are in sync. So it seems the sync operations are very slow. Of course I do not log on normal operation.
The directory as something like 70k entries (180Mb for the mdb file). Servers have plenty of memory : free total used free shared buff/cache available Mem: 8148668 3445380 125972 500 4577316 4390168 Swap: 2009084 183552 1825532
Servers are on an vmware cluster, from the vmware point of view cpu usage is very low. OS is ubuntu 20.04
I don’t know where to dig…
Hi,
Where I’d start is to quantify how far out of sync they’re getting. A good tool for that is the slapd watcher. It calculates based on contextCSN’s and also provides stats from the monitor database.
ldap://m02 Entries Bind Unbind Search Compare Modify ModDN Add Delete Abandon Extended Num 578764 502 1 44492 1 1 0 1805517 1515081 0 102 Num/s 0.00 0.00 0.00 0.19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 contextCSN: 20221011141123.463727Z#000000#001#000000 actv@2022-10-10 13:08:15, idle@2022-10-11 19:16:45 contextCSN: 20221011141123.851236Z#000000#002#000000 actv@2022-10-10 13:08:15, ahead 16:17:51, max delta 16:17:51 ldap://m01, late@2022-10-10 21:53:43 Entries Bind Unbind Search Compare Modify ModDN Add Delete Abandon Extended Num 944921 250 1 6340 1 1 0 1010062 537473 0 102 Num/s -0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 contextCSN: 20221010215333.012374Z#000000#001#000000 actv@2022-10-10 13:08:15, behind 16:17:50, max delta 16:17:50 contextCSN: 20221010215333.015837Z#000000#002#000000 actv@2022-10-10 13:08:15
# Install the SLAPD Replication Watcher
## This is work-in-progress
- Perform the following:
yum install gcc -y git clone https://github.com/openldap/openldap.git cd openldap ./configure make depend cd libraries make -j2 cd ../tests/progs/ make slapd-watcher
- Run from openldap/tests/progs
./slapd-watcher -b “dc=example,dc=com" ldap://host1 ldap://host2
— Shawn
— Frédéric Goudal Ingénieur Système, DSI Bordeaux-INP +33 556 84 23 11
On Thu, Oct 13, 2022 at 07:27:24AM -0500, Shawn McKinney wrote:
On Oct 12, 2022, at 9:25 AM, Frédéric Goudal frederic.goudal@bordeaux-inp.fr wrote:
The behaviour that seems to occurs is that the logging sometimes stop (nothing more to do it seems) and when it stops the servers are in sync. So it seems the sync operations are very slow. Of course I do not log on normal operation.
Where I’d start is to quantify how far out of sync they’re getting. A good tool for that is the slapd watcher. It calculates based on contextCSN’s and also provides stats from the monitor database.
ldap://m02 Entries Bind Unbind Search Compare Modify ModDN Add Delete Abandon Extended Num 578764 502 1 44492 1 1 0 1805517 1515081 0 102 Num/s 0.00 0.00 0.00 0.19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 contextCSN: 20221011141123.463727Z#000000#001#000000 actv@2022-10-10 13:08:15, idle@2022-10-11 19:16:45 contextCSN: 20221011141123.851236Z#000000#002#000000 actv@2022-10-10 13:08:15, ahead 16:17:51, max delta 16:17:51
I would note that slapd-watcher's 'max delta' is *not* the same as replication delay, treating is as such would be committing the same mistake as most replication monitoring tools out there are doing. Watcher's output is for manual realtime monitoring and needs to be interpreted with caution.
To measure replication delays one needs to take out a clock and measure how long it actually took, never calculate the difference of CSNs.
Regards,
openldap-technical@openldap.org