On Sun, Dec 29, 2024 at 01:28:42PM -0500, Suresh Veliveli wrote:
Another instance where replication is stuck and not recovering.
# requesting: contextCSN
contextCSN: 20241229135907.725117Z#000000#000#000000

aaa-prod-aws-10:2636
# requesting: contextCSN
contextCSN: 20241228185913.665451Z#000000#000#000000
Log info:
Dec 28 13:59:21 aaa-prod-aws-10 slapd[1161864]: do_syncrep2: rid=650 cookie=rid=650,csn=20241228185913.665451Z#000000#000#000000
Dec 28 13:59:21 aaa-prod-aws-10 slapd[1161864]: syncrepl_entry: rid=650 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_MODIFY) csn=20241228185913.665451Z#000000#000#000000 tid 0x7f26ee5fd640
Dec 28 13:59:21 aaa-prod-aws-10 slapd[1161864]: slap_queue_csn: queueing 0x7f26e0dcee50 20241228185913.665451Z#000000#000#000000
Dec 28 13:59:21 aaa-prod-aws-10 slapd[1161864]: slap_graduate_commit_csn: removing 0x7f26e0dcee50 20241228185913.665451Z#000000#000#000000
Dec 28 13:59:21 aaa-prod-aws-10 slapd[1161864]: slap_queue_csn: queueing 0x7f26e0f34360 20241228185913.665451Z#000000#000#000000
Dec 28 13:59:21 aaa-prod-aws-10 slapd[1161864]: slap_graduate_commit_csn: removing 0x7f26e0f34360 20241228185913.665451Z#000000#000#000000
Nothing gets logged about replication after the above.
Am I missing something?
Hi Suresh,

Is there anything in the provider logs around that time? All of the consumer's messages there will be tagged with a specific "conn=xxx op=yyy", which you can discover e.g. by looking for the cookie it sends at the beginning of the session.
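
If it helps, here is a rough Python sketch of that lookup. It assumes the provider log is a plain text file, that the consumer's cookie shows up in it as a substring such as "cookie=rid=650" (this depends on your log level), and that the matching lines carry the usual "conn=N op=N" tag. The function names are made up for illustration.

```python
import re
import sys

# Matches the "conn=<n> op=<n>" tag slapd puts on per-operation log lines.
CONN_OP = re.compile(r"\bconn=(\d+) op=(\d+)\b")


def find_sessions(lines, cookie_fragment):
    """Return the set of (conn, op) pairs found on lines mentioning the cookie."""
    sessions = set()
    for line in lines:
        if cookie_fragment in line:
            match = CONN_OP.search(line)
            if match:
                sessions.add((int(match.group(1)), int(match.group(2))))
    return sessions


def lines_for(lines, conn, op):
    """Return every log line tagged with exactly this conn/op pair."""
    tag = re.compile(rf"\bconn={conn} op={op}\b")
    return [line for line in lines if tag.search(line)]


if __name__ == "__main__":
    # Usage: script.py <provider-log-path> <cookie-fragment>
    log_path, fragment = sys.argv[1], sys.argv[2]
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        log_lines = handle.read().splitlines()
    for conn, op in sorted(find_sessions(log_lines, fragment)):
        print(f"--- conn={conn} op={op} ---")
        for line in lines_for(log_lines, conn, op):
            print(line)
```

Run it against the provider's log with something like "cookie=rid=650" as the second argument; if nothing matches, the provider is probably not logging at a level that records the cookie.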
A couple of other questions:
- Is the TCP connection alive as far as the OS is concerned? (I see in the thread you've confirmed TCP keepalive is enabled, correct?)
- Could you post the cn=monitor info for the consumer? The objectClass to look for is olmSyncReplInstance.
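
For the cn=monitor query, something along these lines should do. It is only a sketch: it assumes the monitor backend is enabled on the consumer, that ldapsearch is on the PATH, and that a simple anonymous bind ("-x") is allowed to read cn=monitor, which is often not the case, so substitute whatever bind options you normally use. The helper name is invented.

```python
import subprocess
import sys


def monitor_search_argv(uri, auth_args=("-x",)):
    """Build an ldapsearch command listing syncrepl monitor entries."""
    return [
        "ldapsearch", "-LLL",
        "-H", uri,
        *auth_args,                      # assumption: replace with your bind options
        "-b", "cn=monitor",
        "-s", "sub",
        "(objectClass=olmSyncReplInstance)",
        "*", "+",                        # user and operational attributes
    ]


if __name__ == "__main__":
    # Usage: script.py <consumer-ldap-uri>
    result = subprocess.run(
        monitor_search_argv(sys.argv[1]),
        capture_output=True, text=True, check=False,
    )
    print(result.stdout or result.stderr)
```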
Thanks,