Hi Ondřej,
Thanks for getting back to me. I do have the logs from a previous replication stall, and I'll capture them again the next time it happens. I checked the logs and I don't see any abandoned connections.
aaa-prod-aws-12:1636
# requesting: contextCSN
contextCSN: 20250102015911.702871Z#000000#000#000000
All the relevant logs and info:
dn: cn=Consumer 152,cn=Database 1,cn=Databases,cn=Monitor
structuralObjectClass: olmSyncReplInstance
creatorsName:
modifiersName:
createTimestamp: 20241209130653Z
modifyTimestamp: 20241209130653Z
olmSRProviderURIList: ldaps://aaa-master-1.uis.georgetown.edu:636/
olmSRConnection: IP=172.20.86.12:49880
olmSRSyncPhase: Persist
olmSRNextConnect: 00000101000000Z
olmSRLastConnect: 20241229203510Z
olmSRLastContact: 20250102015934Z
olmSRLastCookieRcvd: rid=152,csn=20250102015911.702871Z#000000#000#000000
olmSRLastCookieSent: rid=152,csn=20241229202835.459483Z#000000#000#000000
entryDN: cn=Consumer 152,cn=Database 1,cn=Databases,cn=Monitor
subschemaSubentry: cn=Subschema
hasSubordinates: FALSE
*Consumer:*
netstat -an | grep 49880
tcp 0 0 172.20.86.12:49880 172.17.21.52:636 ESTABLISHED
*Master:*
netstat -an | grep 172.20.86.12
tcp 0 0 172.17.21.52:636 172.20.86.12:49880 ESTABLISHED
*Master logs:*
Jan 1 20:59:18 aaa-prod-master-1 slapd[3281130]: conn=1035 op=1 syncprov_sendresp: cookie=rid=152,csn=20250102015911.686467Z#000000#000#000000
Jan 1 20:59:18 aaa-prod-master-1 slapd[3281130]: conn=1035 op=1 syncprov_sendresp: cookie=rid=152,csn=20250102015911.702871Z#000000#000#000000
*Nothing about rid=152 is logged after the above.*
*Consumer logs:*
Jan 1 20:59:34 aaa-prod-aws-12 slapd[1229307]: do_syncrep2: rid=152 cookie=rid=152,csn=20250102015911.702871Z#000000#000#000000
Jan 1 20:59:34 aaa-prod-aws-12 slapd[1229307]: syncrepl_entry: rid=152 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_MODIFY) csn=20250102015911.702871Z#000000#000#000000 tid 0x7f7a753fc640
Jan 1 20:59:34 aaa-prod-aws-12 slapd[1229307]: slap_queue_csn: queueing 0x7f7a687c6190 20250102015911.702871Z#000000#000#000000
Jan 1 20:59:34 aaa-prod-aws-12 slapd[1229307]: slap_graduate_commit_csn: removing 0x7f7a687c6190 20250102015911.702871Z#000000#000#000000
Jan 1 20:59:34 aaa-prod-aws-12 slapd[1229307]: slap_queue_csn: queueing 0x7f7a6877d9b0 20250102015911.702871Z#000000#000#000000
Jan 1 20:59:34 aaa-prod-aws-12 slapd[1229307]: slap_graduate_commit_csn: removing 0x7f7a6877d9b0 20250102015911.702871Z#000000#000#000000
*Nothing about replication is logged after the above.*
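To line up the CSNs above side by side, a small Python sketch can parse them. It assumes the usual OpenLDAP CSN layout (UTC timestamp with microseconds, change count, server ID and modifier number, separated by "#") and assumes the last three fields are hexadecimal; `CSN` and `parse_csn` are hypothetical helper names. With the values from this report, the consumer's contextCSN, its olmSRLastCookieRcvd and the last cookie the provider logged sending on conn=1035 all parse to the same CSN, which suggests the consumer applied everything the provider sent before the session went quiet.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    """Fields of an OpenLDAP CSN string: timestamp#count#sid#mod."""
    time: datetime
    count: int
    sid: int
    mod: int


def parse_csn(text: str) -> CSN:
    """Parse a CSN; count, sid and mod are assumed to be hexadecimal."""
    stamp, count, sid, mod = text.split("#")
    # The timestamp is UTC generalized time with microseconds, e.g. 20250102015911.702871Z
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    return CSN(when, int(count, 16), int(sid, 16), int(mod, 16))


# CSNs copied from the contextCSN search, the cn=Monitor entry and the provider log.
observed = {
    "consumer contextCSN": "20250102015911.702871Z#000000#000#000000",
    "consumer olmSRLastCookieRcvd": "20250102015911.702871Z#000000#000#000000",
    "provider last syncprov_sendresp": "20250102015911.702871Z#000000#000#000000",
}

csns = {name: parse_csn(value) for name, value in observed.items()}
# Tuple ordering (time, count, sid, mod) gives the same order as comparing
# the fixed-width CSN strings directly.
newest = max(csns.values())
for name, csn in csns.items():
    lag = "up to date" if csn == newest else f"behind by {newest.time - csn.time}"
    print(f"{name}: {csn.time.isoformat()} sid={csn.sid} -> {lag}")
```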
From the last coredump:
Thread 1 (Thread 0x7f85243fa640 (LWP 192314)):
#0  connection_abandon (c=0x7f9eb4ad0078) at connection.c:714
#1  0x00000000004460d5 in connection_closing (c=0x7f9eb4ad0078, why=0x5db380 <conn_lost_str> "connection lost") at connection.c:785
#2  0x0000000000447d18 in connection_read (s=31, cri=0x7f85243f99a0) at connection.c:1453
#3  0x000000000044741b in connection_read_thread (ctx=0x7f85243f99f0, argv=0x1f) at connection.c:1260
#4  0x00007f9ecd406bed in ldap_int_thread_pool_wrapper (xpool=0xac8080) at tpool.c:1059
#5  0x00007f9ecca89c02 in start_thread () from /lib64/libc.so.6
#6  0x00007f9eccb0ec40 in clone3 () from /lib64/libc.so.6
No core file now.
Thanks,
Suresh
On Tue, Mar 4, 2025 at 6:12 AM Ondřej Kuzník <ondra@mistotebe.net> wrote:
On Mon, Jan 13, 2025 at 10:42:58AM -0500, Suresh Veliveli wrote:
Hi Ondřej,
Attached is the "thread apply all bt full" output from the last crash.
I built it from source (openldap.org); the installation prefix is /var/services/openldap. I do have the "stats sync" log level enabled.
Our logs are huge; I can pull out the necessary information if you can tell me what to look for.
Hi Suresh,
As I mentioned, you want to see what the provider was doing with the session and the decisions it took along the way. To see that, find where the session starts (the "cookie=rid=..." message) and *then* use the "conn=xxx op=yyy" from that message to isolate the messages that belong to it. That's the first thing you'll need in order to track down what eventually happened to the session.
If it's related to the crash in any way, with some luck it might also show us what went wrong.
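The two-step search described above can be sketched in Python, assuming syslog lines in which slapd writes "conn=N op=M" ahead of the message text, as in the master log lines quoted earlier. The first pass collects every (conn, op) pair whose line mentions the replica's cookie; the second keeps the lines of those operations plus connection-level lines (such as accept/close) of the same connections, which carry no op= field. `find_sessions` and `session_lines` are hypothetical names, and the file is read twice so a very large log never has to fit in memory.

```python
import re
from typing import Iterable, Iterator

# "conn=1035 op=1" as written by slapd in front of per-operation messages.
CONN_OP = re.compile(r"\bconn=(\d+) op=(\d+)\b")
# Connection-level messages (e.g. accept/close) have a conn= field but no op=.
CONN_ONLY = re.compile(r"\bconn=(\d+) ")


def find_sessions(lines: Iterable[str], rid: int) -> set[tuple[str, str]]:
    """First pass: (conn, op) pairs of lines that carry this replica's cookie."""
    cookie = re.compile(rf"cookie=rid=0*{rid}\b")
    pairs = set()
    for line in lines:
        m = CONN_OP.search(line)
        if m and cookie.search(line):
            pairs.add(m.groups())
    return pairs


def session_lines(lines: Iterable[str], pairs: set[tuple[str, str]]) -> Iterator[str]:
    """Second pass: every line of those operations, plus connection-level
    lines of the same connections."""
    conns = {conn for conn, _ in pairs}
    for line in lines:
        m = CONN_OP.search(line)
        if m:
            if m.groups() in pairs:
                yield line
        elif (c := CONN_ONLY.search(line)) and c.group(1) in conns:
            yield line


# The two provider lines quoted in this thread, as a small demonstration.
sample = [
    "Jan 1 20:59:18 aaa-prod-master-1 slapd[3281130]: conn=1035 op=1 syncprov_sendresp: "
    "cookie=rid=152,csn=20250102015911.686467Z#000000#000#000000",
    "Jan 1 20:59:18 aaa-prod-master-1 slapd[3281130]: conn=1035 op=1 syncprov_sendresp: "
    "cookie=rid=152,csn=20250102015911.702871Z#000000#000#000000",
]
pairs = find_sessions(sample, 152)
print(pairs)  # {('1035', '1')}
for line in session_lines(sample, pairs):
    print(line)
```

Against a real log, run `find_sessions` over one `open()` of the file, keep the pair for the session you care about (usually the most recent), and pass it to `session_lines` over a second `open()`, printing with `end=""` since file lines keep their newlines.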
Also just out of interest, are there any Abandon/Cancel requests in the logs?
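The Abandon/Cancel question can be answered with a quick scan rather than by reading the whole log. This Python sketch assumes slapd's stats-level formats "conn=N op=M ABANDON msg=K" for Abandon requests and "conn=N op=M EXT oid=1.3.6.1.1.8" for the LDAP Cancel extended operation (RFC 3909); check those formats against your own logs before trusting an empty result. `count_abandon_cancel` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed stats-level formats (verify against your own logs):
#   Abandon: "conn=N op=M ABANDON msg=K"
#   Cancel:  "conn=N op=M EXT oid=1.3.6.1.1.8"  (Cancel extended operation, RFC 3909)
ABANDON = re.compile(r"\bconn=(\d+) op=\d+ ABANDON msg=\d+")
# (?!\S) stops the OID match from accepting longer OIDs such as 1.3.6.1.1.8.1
CANCEL = re.compile(r"\bconn=(\d+) op=\d+ EXT oid=1\.3\.6\.1\.1\.8(?!\S)")


def count_abandon_cancel(lines: Iterable[str]) -> Counter:
    """Count Abandon and Cancel requests, keyed by (kind, connection number)."""
    counts: Counter = Counter()
    for line in lines:
        if m := ABANDON.search(line):
            counts[("ABANDON", m.group(1))] += 1
        elif m := CANCEL.search(line):
            counts[("CANCEL", m.group(1))] += 1
    return counts
```

For example, open the provider log, pass the file object to `count_abandon_cancel`, and look for an entry keyed on the replication connection (conn=1035 in the logs above).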
Thanks,
-- 
Ondřej Kuzník
Senior Software Engineer
Symas Corporation http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP