In another instance, restarting the replica crashed the master.
Logs from the master:
Mar 8 12:55:51 aaa-prod-master-1 slapd[209555]: conn=1081 op=1 syncprov_op_search: got a persistent search with a cookie=rid=512,csn=20250307175925.825777Z#000000#000#000000
Mar 8 12:56:39 aaa-prod-master-1 slapd[209555]: conn=1081 op=1 syncprov_search_response: cookie=rid=512,csn=20250308175550.008761Z#000000#000#000000
Mar 8 12:56:39 aaa-prod-master-1 slapd[209555]: conn=1081 op=1 syncprov_sendinfo: refreshPresent cookie=rid=512,csn=20250308175550.008761Z#000000#000#000000
Mar 8 12:56:39 aaa-prod-master-1 slapd[209555]: conn=1081 op=1 syncprov_sendresp: cookie=rid=512,csn=20250308175615.323137Z#000000#000#000000
Mar 8 12:56:39 aaa-prod-master-1 slapd[209555]: conn=1081 op=1 syncprov_sendresp: cookie=rid=512,csn=20250308175625.704652Z#000000#000#000000
Mar 8 12:56:39 aaa-prod-master-1 slapd[209555]: conn=1081 op=1 syncprov_sendresp: cookie=rid=512,csn=20250308175633.255673Z#000000#000#0000
Mar 8 12:55:20 aaa-prod-master-1 kernel: slapd[7650]: segfault at 7f5875e68420 ip 0000000000445cfa sp 00007f5f71dfb560 error 4 in slapd[419000+1b9000] likely on CPU 1 (core 1, socket 0)
Mar 8 12:55:25 aaa-prod-master-1 systemd-coredump[209442]: Process 5036 (slapd) of user 3003 dumped core.#012#012Stack trace of thread 7650:#012#0 0x0000000000445cfa connection_abandon (/var/services/openldap/libexec/slapd + 0x45cfa)#012#1 0x00000000004460d5 connection_closing (/var/services/openldap/libexec/slapd + 0x460d5)#012#2 0x0000000000447d18 connection_read (/var/services/openldap/libexec/slapd + 0x47d18)#012#3 0x000000000044741b connection_read_thread (/var/services/openldap/libexec/slapd + 0x4741b)#012#4 0x00007f792e2f0bed n/a (n/a + 0x0)#012#5 0x00007f792d889d22 n/a (n/a + 0x0)#012ELF object binary architecture: AMD x86-64
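The `#012` sequences in the systemd-coredump line above look like syslog's octal escape for a newline (this is an assumption based on how syslog daemons commonly escape control characters; the thread does not say so). A minimal Python sketch that expands them so the stack trace reads one frame per line; the function name `expand_syslog_newlines` is hypothetical and the sample is shortened from the log line above:

```python
def expand_syslog_newlines(line: str) -> str:
    """Replace the literal '#012' escape (octal 012 = LF) with a real newline.

    Only '#012' is touched, so frame markers such as '#0' and '#1' and
    CSN fields such as '#000000' are left alone.
    """
    return line.replace("#012", "\n")


# Shortened copy of the coredump message from the master's log.
sample = (
    "Process 5036 (slapd) of user 3003 dumped core.#012#012"
    "Stack trace of thread 7650:#012"
    "#0 0x0000000000445cfa connection_abandon "
    "(/var/services/openldap/libexec/slapd + 0x45cfa)#012"
    "#1 0x00000000004460d5 connection_closing "
    "(/var/services/openldap/libexec/slapd + 0x460d5)"
)

expanded = expand_syslog_newlines(sample)
print(expanded)
```

Run against the full line, this should put `connection_abandon`, `connection_closing`, `connection_read` and `connection_read_thread` on separate lines, which is easier to paste into a bug report.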
Regards, Suresh
On Wed, Mar 5, 2025 at 8:09 AM Suresh Veliveli <Suresh.Veliveli@georgetown.edu> wrote:
Yes. New changes have been sent to other replicas. Only rid=129 is stuck until restart; after that, it catches up. Limits are set to unlimited for the replication dn.
limits dn.exact="uid=syncrepladmin,ou=RepAdmin,dc=georgetown,dc=edu" time.soft=unlimited time.hard=unlimited size.soft=unlimited size.hard=unlimited
Thanks, Suresh
On Wed, Mar 5, 2025 at 5:41 AM Ondřej Kuzník <ondra@mistotebe.net> wrote:
On Tue, Mar 04, 2025 at 11:27:21AM -0500, Suresh Veliveli wrote:
Not for rid=129.
Feb 27 11:59:53 aaa-prod-master-1 slapd[155194]: conn=3685651 op=1
syncprov_sendresp: cookie=rid=129,csn=20250227165948.741563Z#000000#000#000000
Feb 27 11:59:53 aaa-prod-master-1 slapd[155194]: conn=3685651 op=1
syncprov_sendresp: cookie=rid=129,csn=20250227165948.748550Z#000000#000#000000
Feb 27 23:07:07 aaa-prod-master-1 slapd[693294]: conn=8478 op=1
syncprov_sendresp: cookie=rid=129,csn=20250228040634.365834Z#000000#000#000000
For other replicas:
Feb 27 12:00:00 aaa-prod-master-1 slapd[155194]: conn=1129 op=1
syncprov_sendresp: cookie=rid=245,csn=20250227165924.192557Z#000000#000#000000
Feb 27 12:00:00 aaa-prod-master-1 slapd[155194]: conn=1129 op=1
syncprov_sendresp: sending LDAP_SYNC_MODIFY, dn=uid=mbw86,ou=people,dc=georgetown,dc=edu
Feb 27 12:00:00 aaa-prod-master-1 slapd[155194]: conn=3554242 op=1
syncprov_sendresp: cookie=rid=143,csn=20250227165918.446243Z#000000#000#000000
Feb 27 12:00:00 aaa-prod-master-1 slapd[155194]: conn=3554242 op=1
syncprov_sendresp: sending LDAP_SYNC_MODIFY, dn=uid=res128,ou=people,dc=georgetown,dc=edu
Feb 27 12:00:00 aaa-prod-master-1 slapd[155194]: conn=1155 op=1
syncprov_sendresp: cookie=rid=247,csn=20250227165921.534510Z#000000#000#000000
Feb 27 12:00:00 aaa-prod-master-1 slapd[155194]: conn=1155 op=1
syncprov_sendresp: sending LDAP_SYNC_MODIFY, dn=uid=jh2526,ou=people,dc=georgetown,dc=edu
Feb 27 12:00:00 aaa-prod-master-1 slapd[155194]: conn=1127 op=1
syncprov_sendresp: cookie=rid=644,csn=20250227165917.751851Z#000000#000#000000
Feb 27 12:00:00 aaa-prod-master-1 slapd[155194]: conn=1127 op=1
syncprov_sendresp: sending LDAP_SYNC_MODIFY, dn=uid=mpssim,ou=people,dc=georgetown,dc=edu
I can't see any errors or anything interesting in here. Was anything newer than 20250227165948.748550Z#000000#000#000000 sent to the other replicas before you restarted the provider, i.e. is rid=129 the only one that got stuck? I can't find the relevant bits in the configuration you've posted so far: are you sure the replication identity has its limits (time and size) set to unlimited?
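One way to answer the "anything newer than this CSN" question from the provider's log is to pull the last cookie sent to each rid and compare it with the reference CSN. The Python below is a rough sketch, not a tested tool: the function names are made up for illustration, the regular expression only covers the `cookie=rid=...,csn=...` form seen in the excerpts in this thread, and it assumes CSNs of this fixed-width form can be ordered by plain string comparison.

```python
import re

# Matches cookies as they appear in the syncprov log lines quoted in this thread.
COOKIE_RE = re.compile(
    r"cookie=rid=(\d+),"
    r"csn=(\d{14}\.\d{6}Z#[0-9a-fA-F]{6}#[0-9a-fA-F]{3}#[0-9a-fA-F]{6})"
)


def latest_csn_per_rid(log_text: str) -> dict:
    """Return {rid: highest CSN seen in a cookie for that rid}."""
    latest = {}
    for rid, csn in COOKIE_RE.findall(log_text):
        # Assumption: fixed-width CSNs sort correctly as strings.
        if rid not in latest or csn > latest[rid]:
            latest[rid] = csn
    return latest


def rids_newer_than(latest: dict, reference_csn: str) -> list:
    """List the rids whose latest cookie is strictly newer than reference_csn."""
    return sorted(rid for rid, csn in latest.items() if csn > reference_csn)


# Sample taken from the log lines quoted earlier (wrapping as in the mail).
sample_log = """\
Feb 27 11:59:53 aaa-prod-master-1 slapd[155194]: conn=3685651 op=1
syncprov_sendresp: cookie=rid=129,csn=20250227165948.741563Z#000000#000#000000
Feb 27 11:59:53 aaa-prod-master-1 slapd[155194]: conn=3685651 op=1
syncprov_sendresp: cookie=rid=129,csn=20250227165948.748550Z#000000#000#000000
Feb 27 12:00:00 aaa-prod-master-1 slapd[155194]: conn=1129 op=1
syncprov_sendresp: cookie=rid=245,csn=20250227165924.192557Z#000000#000#000000
"""

latest = latest_csn_per_rid(sample_log)
reference = "20250227165948.748550Z#000000#000#000000"
newer = rids_newer_than(latest, reference)
print(latest)
print(newer)
```

Feeding it the whole provider log for the period before the restart, rather than this short sample, would show whether any rid other than 129 received a cookie past the reference CSN.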
Thanks,
--
Ondřej Kuzník
Senior Software Engineer
Symas Corporation http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP

--
Suresh Veliveli
Sr. UNIX Systems Engineer
Georgetown University
University Information Services | Security Infrastructure and Policy-Identity and Collaboration
202-262-6676 (cell) | 202-687-3108 (work)