In another instance, restarting the replica crashed the master.
Logs from the master:
Mar 8 12:55:51 aaa-prod-master-1 slapd[209555]: conn=1081 op=1 syncprov_op_search: got a persistent search with a cookie=rid=512,csn=20250307175925.825777Z#000000#000#000000
Mar 8 12:56:39 aaa-prod-master-1 slapd[209555]: conn=1081 op=1 syncprov_search_response: cookie=rid=512,csn=20250308175550.008761Z#000000#000#000000
Mar 8 12:56:39 aaa-prod-master-1 slapd[209555]: conn=1081 op=1 syncprov_sendinfo: refreshPresent cookie=rid=512,csn=20250308175550.008761Z#000000#000#000000
Mar 8 12:56:39 aaa-prod-master-1 slapd[209555]: conn=1081 op=1 syncprov_sendresp: cookie=rid=512,csn=20250308175615.323137Z#000000#000#000000
Mar 8 12:56:39 aaa-prod-master-1 slapd[209555]: conn=1081 op=1 syncprov_sendresp: cookie=rid=512,csn=20250308175625.704652Z#000000#000#000000
Mar 8 12:56:39 aaa-prod-master-1 slapd[209555]: conn=1081 op=1 syncprov_sendresp: cookie=rid=512,csn=20250308175633.255673Z#000000#000#0000
Mar 8 12:55:20 aaa-prod-master-1 kernel: slapd[7650]: segfault at 7f5875e68420 ip 0000000000445cfa sp 00007f5f71dfb560 error 4 in slapd[419000+1b9000] likely on CPU 1 (core 1, socket 0)
Mar 8 12:55:25 aaa-prod-master-1 systemd-coredump[209442]: Process 5036 (slapd) of user 3003 dumped core.

Stack trace of thread 7650:
#0 0x0000000000445cfa connection_abandon (/var/services/openldap/libexec/slapd + 0x45cfa)
#1 0x00000000004460d5 connection_closing (/var/services/openldap/libexec/slapd + 0x460d5)
#2 0x0000000000447d18 connection_read (/var/services/openldap/libexec/slapd + 0x47d18)
#3 0x000000000044741b connection_read_thread (/var/services/openldap/libexec/slapd + 0x4741b)
#4 0x00007f792e2f0bed n/a (n/a + 0x0)
#5 0x00007f792d889d22 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
Regards,
Suresh
Yes, newer changes were sent to the other replicas. Only rid=129 stays stuck until the provider is restarted; after that, it catches up. Limits are set to unlimited for the replication DN:
limits dn.exact="uid=syncrepladmin,ou=RepAdmin,dc=georgetown,dc=edu" time.soft=unlimited time.hard=unlimited size.soft=unlimited size.hard=unlimited
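A minimal sketch for double-checking a limits line like the one above: it splits the directive with shlex (so the quoted DN stays a single token) and reports which of the four time/size, soft/hard settings are missing or not set to unlimited. The function name limits_unlimited is hypothetical, and it only recognizes the explicit .soft/.hard forms used in this line; other forms of the limits syntax are not handled.

```python
import shlex

# The four settings in question: time and size limits, soft and hard.
REQUIRED = ("time.soft", "time.hard", "size.soft", "size.hard")


def limits_unlimited(directive):
    """Return the required limit keys that are missing or not 'unlimited'.

    `directive` is a single slapd.conf 'limits' line (hypothetical helper).
    An empty list means all four settings are 'unlimited'.
    Only the explicit key.soft=/key.hard= forms are recognized.
    """
    # shlex strips the quotes around the DN and keeps it as one token.
    tokens = shlex.split(directive)
    if not tokens or tokens[0] != "limits":
        raise ValueError("not a limits directive")
    settings = {}
    # tokens[1] is the selector (e.g. dn.exact=...); the rest are limits.
    for tok in tokens[2:]:
        key, sep, value = tok.partition("=")
        if sep:
            settings[key] = value
    return [k for k in REQUIRED if settings.get(k) != "unlimited"]
```

For the line above this returns an empty list, i.e. all four limits are unlimited for that identity.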
Thanks,
Suresh
On Tue, Mar 04, 2025 at 11:27:21AM -0500, Suresh Veliveli wrote:
> Not for rid=129.
> Feb 27 11:59:53 aaa-prod-master-1 slapd[155194]: conn=3685651 op=1 syncprov_sendresp: cookie=rid=129,csn=20250227165948.741563Z#000000#000#000000
> Feb 27 11:59:53 aaa-prod-master-1 slapd[155194]: conn=3685651 op=1 syncprov_sendresp: cookie=rid=129,csn=20250227165948.748550Z#000000#000#000000
>
> Feb 27 23:07:07 aaa-prod-master-1 slapd[693294]: conn=8478 op=1 syncprov_sendresp: cookie=rid=129,csn=20250228040634.365834Z#000000#000#000000
>
> For other replicas:
> Feb 27 12:00:00 aaa-prod-master-1 slapd[155194]: conn=1129 op=1 syncprov_sendresp: cookie=rid=245,csn=20250227165924.192557Z#000000#000#000000
> Feb 27 12:00:00 aaa-prod-master-1 slapd[155194]: conn=1129 op=1 syncprov_sendresp: sending LDAP_SYNC_MODIFY, dn=uid=mbw86,ou=people,dc=georgetown,dc=edu
> Feb 27 12:00:00 aaa-prod-master-1 slapd[155194]: conn=3554242 op=1 syncprov_sendresp: cookie=rid=143,csn=20250227165918.446243Z#000000#000#000000
> Feb 27 12:00:00 aaa-prod-master-1 slapd[155194]: conn=3554242 op=1 syncprov_sendresp: sending LDAP_SYNC_MODIFY, dn=uid=res128,ou=people,dc=georgetown,dc=edu
> Feb 27 12:00:00 aaa-prod-master-1 slapd[155194]: conn=1155 op=1 syncprov_sendresp: cookie=rid=247,csn=20250227165921.534510Z#000000#000#000000
> Feb 27 12:00:00 aaa-prod-master-1 slapd[155194]: conn=1155 op=1 syncprov_sendresp: sending LDAP_SYNC_MODIFY, dn=uid=jh2526,ou=people,dc=georgetown,dc=edu
> Feb 27 12:00:00 aaa-prod-master-1 slapd[155194]: conn=1127 op=1 syncprov_sendresp: cookie=rid=644,csn=20250227165917.751851Z#000000#000#000000
> Feb 27 12:00:00 aaa-prod-master-1 slapd[155194]: conn=1127 op=1 syncprov_sendresp: sending LDAP_SYNC_MODIFY, dn=uid=mpssim,ou=people,dc=georgetown,dc=edu
I can't see any errors or anything interesting in here, but was anything
newer than 20250227165948.748550Z#000000#000#000000 sent to the other
replicas before you restarted the provider? Is rid=129 the only one
that got stuck? I can't find the relevant bits in the configuration
you've posted so far; are you sure the replication identity has its
limits (time and size limits) set to unlimited?
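A minimal sketch for answering the first question from the provider's log: it pulls the rid and CSN out of each syncprov_sendresp cookie line, keeps the newest CSN per rid, and lists the rids that were sent something newer than the stuck one. The function names are hypothetical, and the sketch assumes fixed-width CSNs (YYYYmmddHHMMSS.ffffffZ#cccccc#sid#mmmmmm), so that plain string comparison matches chronological order.

```python
import re

# Matches the rid and CSN in cookie lines such as
#   ... syncprov_sendresp: cookie=rid=129,csn=20250227165948.748550Z#000000#000#000000
COOKIE_RE = re.compile(r"syncprov_sendresp: cookie=rid=(\d+),csn=(\S+)")


def newest_csn_per_rid(lines):
    """Return {rid: newest CSN seen in a syncprov_sendresp cookie line}."""
    newest = {}
    for line in lines:
        m = COOKIE_RE.search(line)
        if not m:
            continue  # e.g. "sending LDAP_SYNC_MODIFY" lines
        rid, csn = int(m.group(1)), m.group(2)
        # Fixed-width CSNs sort chronologically as strings.
        if rid not in newest or csn > newest[rid]:
            newest[rid] = csn
    return newest


def replicas_ahead_of(lines, stuck_rid):
    """Return the rids whose newest CSN is later than stuck_rid's newest CSN."""
    newest = newest_csn_per_rid(lines)
    baseline = newest[stuck_rid]
    return sorted(rid for rid, csn in newest.items() if csn > baseline)
```

It should be fed only log lines from before the provider restart. On just the excerpt quoted above, every other replica's newest CSN (rid=245 at 20250227165924.192557Z, for example) is older than rid=129's 20250227165948.748550Z, so the result is empty and the excerpt alone does not settle the question.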
Thanks,
--
Ondřej Kuzník
Senior Software Engineer
Symas Corporation http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP
--
Suresh Veliveli
Sr. UNIX Systems Engineer
Georgetown University
University Information Services | Security Infrastructure and Policy-Identity and Collaboration
202-262-6676 (cell) | 202-687-3108 (work)