Hi!
I had just asked https://serverfault.com/q/1177576/407952, but I'll summarize here: I think I configured delta-syncrepl for an MMR setup correctly using OpenLDAP >= 2.5.18 (cn=config is also synced). However, when I took one node offline and updated the main DIT via slapadd, the other node would not update its DIT after the offlined node came back online. I wonder whether I have to empty or delete the corresponding accesslog, or whether there is some other step to perform. Is delta-syncrepl looking only at the accesslog to detect changes?
The syncrepl configs look like this for the DIT:
olcSyncrepl: {0}rid=115
  provider="ldap://server5/"
  searchbase="dc=..."
  type=refreshAndPersist
  retry="60 5 300 5 1800 +"
  logbase="cn=changelog-1"
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
  schemachecking=on
  syncdata=accesslog
  starttls=critical
  tls_reqcert=demand
  bindmethod=sasl
  saslmech=external
  tls_cert="/etc/ssl/servercerts/syncrepl.pem"
  tls_key="/etc/ssl/serverkeys/syncrepl.key"
  tls_cacert="/etc/ssl/servercerts/CA-bundle.pem"
olcSyncrepl: {1}rid=116
  provider="ldap://server6/"
  searchbase="dc=..."
  type=refreshAndPersist
  retry="60 5 300 5 1800 +"
  logbase="cn=changelog-1"
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
  schemachecking=on
  syncdata=accesslog
  starttls=critical
  tls_reqcert=demand
  bindmethod=sasl
  saslmech=external
  tls_cert="/etc/ssl/servercerts/syncrepl.pem"
  tls_key="/etc/ssl/serverkeys/syncrepl.key"
  tls_cacert="/etc/ssl/servercerts/CA-bundle.pem"
Kind regards, Ulrich Windl
On Tue, Mar 25, 2025 at 09:25:38AM +0000, Windl, Ulrich wrote:
Hi!
I had just asked https://serverfault.com/q/1177576/407952, but I'll summarize here: I think I configured delta-syncrepl for an MMR setup correctly using OpenLDAP >= 2.5.18 (cn=config is also synced). However, when I took one node offline and updated the main DIT via slapadd, the other node would not update its DIT after the offlined node came back online.
Hi Ulrich, nothing there indicates a replication issue, on the contrary. Not that there's much information you've actually given, e.g. you mention specific CSNs but they are missing in the logs you refer to, etc.
I wonder whether I have to empty or delete the corresponding accesslog, or is there some other step to perform?
Yes, in deltasync the provider's accesslog is tightly linked to its main DB, so any time you make offline changes to the main DB, you either wipe accesslog (preferably) or restore it in lockstep.
Is delta syncrepl looking at the accesslog only to detect changes?
It replicates from the accesslog whenever it can. In a multi-writer environment you can get conflicting writes; when that happens, OpenLDAP has to fall back to plain syncrepl to resolve them, and then it switches back to pulling changes from the accesslog.
Regards,
Ondřej,
I still don't quite understand: I had stopped the outdated node, deleted its accesslog, and restarted it, but it still would not sync. Then I stopped the updated node, removed its accesslog, and started it again. Still the nodes would not sync. As the original node from which the data had been exported does not use delta syncrepl, I cannot import the accesslog database. So I wonder what I have to do to trigger a sync. To me it looks like a bug or even a conceptual problem.
Currently the updated node logs nothing, while the outdated node periodically logs messages like:
Mar 27 10:16:41 v06 slapd[30699]: do_syncrep1: rid=115 starting refresh (sending cookie=rid=115,sid=006,csn=20130719093756.074776Z#000000#000#000000;20250217105250.345944Z#000000#001#000000;20250218171739.629994Z#000000#002#000000;20250217065706.238392Z#000000#003#000000;20250227092327.859231Z#000000#005#000000;20250320153500.286773Z#000000#006#000000)
Mar 27 10:16:41 v06 slapd[30699]: do_syncrep2: rid=115 LDAP_RES_SEARCH_RESULT
Mar 27 10:16:41 v06 slapd[30699]: do_syncrep2: rid=115 LDAP_RES_SEARCH_RESULT (32) No such object
Mar 27 10:16:41 v06 slapd[30699]: do_syncrep2: rid=115 (32) No such object
Mar 27 10:16:41 v06 slapd[30699]: do_syncrepl: rid=115 rc -101 retrying
Mar 27 10:16:41 v06 slapd[30699]: do_syncrep1: rid=105 starting refresh (sending cookie=rid=105,sid=006,csn=20250320000000.000000Z#000000#000#000000;20250321000000.000000Z#000000#001#000000;20200721123717.002866Z#000000#002#000000;20181031083258.073732Z#000000#003#000000;20250325081318.563987Z#000000#005#000000;20250227092006.790591Z#000000#006#000000)
Mar 27 10:16:41 v06 slapd[30699]: do_syncrep2: rid=105 LDAP_RES_SEARCH_RESULT
Mar 27 10:16:41 v06 slapd[30699]: do_syncrep2: rid=105 LDAP_RES_SEARCH_RESULT (32) No such object
Mar 27 10:16:41 v06 slapd[30699]: do_syncrep2: rid=105 (32) No such object
Mar 27 10:16:41 v06 slapd[30699]: do_syncrepl: rid=105 rc -101 retrying
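As a reading aid for the cookies in the log above, here is a minimal Python sketch that splits a syncrepl cookie into its rid, sid and per-server CSNs. It assumes the usual CSN layout (timestamp#count#sid#mod) and that CSNs with fixed-width timestamps can be compared as plain strings; the function names parse_sync_cookie and newer_sids are made up for this illustration and are not part of OpenLDAP.

```python
def parse_sync_cookie(cookie: str) -> dict:
    """Split a syncrepl cookie into rid, sid and a {sid: csn} map."""
    result = {"rid": None, "sid": None, "csns": {}}
    for field in cookie.split(","):
        key, _, value = field.partition("=")
        if key == "csn":
            for csn in value.split(";"):
                # Assumed CSN layout: timestamp#count#sid#mod (third field = SID)
                sid = csn.split("#")[2]
                result["csns"][sid] = csn
        elif key in ("rid", "sid"):
            result[key] = value
    return result


def newer_sids(local: dict, remote: dict) -> list:
    """Return SIDs for which `remote` has a later CSN than `local`, or one `local` lacks."""
    return sorted(
        sid for sid, csn in remote.items()
        if sid not in local or csn > local[sid]
    )


if __name__ == "__main__":
    # Shortened version of the rid=115 cookie from the log (three of six CSNs).
    cookie = (
        "rid=115,sid=006,"
        "csn=20130719093756.074776Z#000000#000#000000;"
        "20250227092327.859231Z#000000#005#000000;"
        "20250320153500.286773Z#000000#006#000000"
    )
    for sid, csn in parse_sync_cookie(cookie)["csns"].items():
        print(sid, csn)
```

Feeding newer_sids the CSN map from a consumer's cookie and the contextCSN values read from a provider would show for which server IDs the provider claims newer changes; how those values are collected is left open here.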
Kind regards, Ulrich Windl
-----Original Message-----
From: Ondřej Kuzník ondra@mistotebe.net
Sent: Tuesday, March 25, 2025 2:34 PM
To: Windl, Ulrich u.windl@ukr.de
Cc: openldap-technical@openldap.org
Subject: [EXT] Re: accesslog with delta syncrepl: Why wouldn't contebnt be synced?
--
Ondřej Kuzník
Senior Software Engineer
Symas Corporation http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP
On Thu, Mar 27, 2025 at 09:31:37AM +0000, Windl, Ulrich wrote:
Ondřej,
I still don't quite understand: I had stopped the outdated node, deleted its accesslog, and restarted it, but it still would not sync. Then I stopped the updated node, removed its accesslog, and started it again. Still the nodes would not sync. As the original node from which the data had been exported does not use delta syncrepl, I cannot import the accesslog database. So I wonder what I have to do to trigger a sync. To me it looks like a bug or even a conceptual problem.
Hi Ulrich, I'm lost as to what you're actually doing and what symptoms you're seeing. Please outline the starting point, what you did, and the relevant logs from both the provider and the consumer in the equation.
And make sure your ACLs for both main and accesslog DBs allow unrestricted read access for the replicator user (the Admin Guide will be updated to highlight just this with the next release).
Thanks,
Hi!
I think I found some bugs in my version of OpenLDAP 2.5: First I had core-dumps when a sync was expected to happen. Like this:
Mar 27 12:45:32 v06 slapd[31077]: conn=-1 op=0 syncprov_matchops: recording uuid for dn=olcDatabase={4}mdb,cn=config on opc=0x7fa7d0001018
Mar 27 12:45:32 v06 slapd[31077]: conn=1001 op=2 syncprov_matchops: skipping relayed sid 005
Mar 27 12:45:32 v06 slapd[31077]: conn=-1 op=0 syncprov_add_slog: adding csn=20250321000000.000000Z#000000#001#000000 to sessionlog, uuid=8f32d7d8-9a95-103f-866e-d9067b62d79b
Mar 27 12:45:32 v06 slapd[31077]: slap_queue_csn: queueing 0x7fa7d014e090 20250321000000.000000Z#000000#001#000000
Mar 27 12:45:32 v06 slapd[31077]: slap_graduate_commit_csn: removing 0x7fa7d014e090 20250321000000.000000Z#000000#001#000000
Mar 27 12:45:32 v06 slapd[31077]: conn=-1 op=0 accesslog_response: got result 0x44 adding log entry reqStart=20250327114532.000002Z,cn=audit
Mar 27 12:45:32 v06 slapd[31077]: slap_sl_malloc of 93893629956635 bytes failed
Mar 27 12:45:32 v06 systemd[1]: Created slice Slice /system/systemd-coredump.
Mar 27 12:45:32 v06 systemd[1]: Started Process Core Dump (PID 31217/UID 0).
Mar 27 12:45:32 v06 systemd-coredump[31218]: [🡕] Process 31077 (slapd) of user 76 dumped core.
Stack trace of thread 31081:
#0 0x00007fa7fb8a941c __pthread_kill_implementation (libc.so.6 + 0xa941c)
#1 0x00007fa7fb857842 raise (libc.so.6 + 0x57842)
#2 0x00007fa7fb83f5cf abort (libc.so.6 + 0x3f5cf)
#3 0x00007fa7fb83f4e7 __assert_fail_base.cold (libc.so.6 + 0x3f4e7)
#4 0x00007fa7fb84fb32 __assert_fail (libc.so.6 + 0x4fb32)
#5 0x0000555fefe6caca n/a (slapd + 0x9caca)
#6 0x0000555fefe6d24e slap_sl_calloc (slapd + 0x9d24e)
#7 0x0000555fefe296f4 build_new_dn (slapd + 0x596f4)
#8 0x00007fa7fa8287c7 n/a (accesslog.so + 0x67c7)
#9 0x00007fa7fa829182 n/a (accesslog.so + 0x7182)
#10 0x0000555fefe23158 n/a (slapd + 0x53158)
#11 0x0000555fefe2373c n/a (slapd + 0x5373c)
#12 0x0000555fefe24294 slap_send_ldap_result (slapd + 0x54294)
#13 0x0000555fefdfb823 n/a (slapd + 0x2b823)
#14 0x0000555fefe87523 overlay_op_walk (slapd + 0xb7523)
#15 0x0000555fefe876ae n/a (slapd + 0xb76ae)
#16 0x0000555fefe76ffa n/a (slapd + 0xa6ffa)
#17 0x0000555fefe7fd7d n/a (slapd + 0xafd7d)
#18 0x0000555fefe13d30 n/a (slapd + 0x43d30)
#19 0x00007fa7fbb10da0 n/a (libldap-2.5.releng.so.0 + 0x48da0)
#20 0x00007fa7fb8a758c start_thread (libc.so.6 + 0xa758c)
#21 0x00007fa7fb92ea28 __clone3 (libc.so.6 + 0x12ea28)
I'm not really surprised that "malloc of 93893629956635 bytes failed". The changelog before the error was:
dn: reqStart=20250327114532.000002Z,cn=audit
objectClass: auditModify
reqStart: 20250327114532.000002Z
reqEnd: 20250327114532.000003Z
reqType: modify
reqSession: 105
reqAuthzID: cn=config
reqDN: olcOverlay={0}syncprov,olcDatabase={1}mdb,cn=config
reqResult: 0
reqMod: olcSpNoPresent:= TRUE
reqMod: olcSpReloadHint:= TRUE
reqMod: entryCSN:= 20250327114348.616974Z#000000#005#000000
reqMod: modifiersName:= cn=config
reqMod: modifyTimestamp:= 20250327114348Z
reqOld: olcSpNoPresent: TRUE
reqOld: olcSpReloadHint: TRUE
reqOld: entryCSN: 20250325081318.563987Z#000000#005#000000
reqOld: modifiersName: cn=config
reqOld: modifyTimestamp: 20250325081318Z
reqEntryUUID: db401792-7c0e-1032-81cf-d54356bd918f
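To see at a glance what a logged modify like the one above actually changed, the reqMod and reqOld values can be compared. The following is only a sketch in Python: it assumes the reqMod value syntax "attribute:<op> value" with op being one of + (add), - (delete), = (replace) or # (increment), and the helper names parse_reqmod and changed_attributes are invented for this example.

```python
MOD_OPS = {"+": "add", "-": "delete", "=": "replace", "#": "increment"}


def parse_reqmod(value: str):
    """Split one reqMod value into (attribute, operation, value or None)."""
    attr, sep, rest = value.partition(":")
    if not sep or not rest or rest[0] not in MOD_OPS:
        raise ValueError(f"unrecognized reqMod value: {value!r}")
    # A single space separates the operator from the value, if there is one.
    val = rest[2:] if rest[1:2] == " " else None
    return attr, MOD_OPS[rest[0]], val


def changed_attributes(req_mods, req_olds):
    """List attributes whose replaced values differ from the logged old values.

    Non-replace operations are always reported as changes.
    """
    old = {}
    for line in req_olds:
        attr, _, val = line.partition(": ")
        old.setdefault(attr, set()).add(val)

    changed, replaced = [], {}
    for line in req_mods:
        attr, op, val = parse_reqmod(line)
        if op == "replace":
            replaced.setdefault(attr, set()).add(val)
        elif attr not in changed:
            changed.append(attr)
    for attr, values in replaced.items():
        if values != old.get(attr, set()) and attr not in changed:
            changed.append(attr)
    return changed
```

Applied to the entry above, this reports only entryCSN and modifyTimestamp as changed; the two olcSp* settings and modifiersName were replaced with the values they already had.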
Noticing that the objectClass is auditModify, I wonder whether the recommended filter logfilter="(&(objectClass=auditWriteObject)(reqResult=0))" is correct.
The other bug I found is this: The reason delta syncrepl did not start was my manual editing of the config.ldif used for slapadd. It seems that a somehow non-matching contextCSN prevents delta syncrepl from starting; or, put the other way round: after I had deleted contextCSN from the config.ldif, the servers started to sync! However, I had used the slapadd option -w to load the data.
I don't know whether " conn=-1 op=0 syncprov_findcsn: mode=FIND_MAXCSN csn=" are related to that effect.
Kind regards, Ulrich Windl
-----Original Message-----
From: Ondřej Kuzník ondra@mistotebe.net
Sent: Thursday, March 27, 2025 11:42 AM
To: Windl, Ulrich u.windl@ukr.de
Cc: openldap-technical@openldap.org
Subject: [EXT] Re: Re: accesslog with delta syncrepl: Why wouldn't contebnt be synced?
On Thu, Mar 27, 2025 at 12:55:05PM +0000, Windl, Ulrich wrote:
Mar 27 12:45:32 v06 slapd[31077]: slap_sl_malloc of 93893629956635 bytes failed
Mar 27 12:45:32 v06 systemd[1]: Created slice Slice /system/systemd-coredump.
Mar 27 12:45:32 v06 systemd[1]: Started Process Core Dump (PID 31217/UID 0).
Mar 27 12:45:32 v06 systemd-coredump[31218]: [🡕] Process 31077 (slapd) of user 76 dumped core.
Stack trace of thread 31081:
#0 0x00007fa7fb8a941c __pthread_kill_implementation (libc.so.6 + 0xa941c)
#1 0x00007fa7fb857842 raise (libc.so.6 + 0x57842)
#2 0x00007fa7fb83f5cf abort (libc.so.6 + 0x3f5cf)
#3 0x00007fa7fb83f4e7 __assert_fail_base.cold (libc.so.6 + 0x3f4e7)
#4 0x00007fa7fb84fb32 __assert_fail (libc.so.6 + 0x4fb32)
#5 0x0000555fefe6caca n/a (slapd + 0x9caca)
#6 0x0000555fefe6d24e slap_sl_calloc (slapd + 0x9d24e)
#7 0x0000555fefe296f4 build_new_dn (slapd + 0x596f4)
#8 0x00007fa7fa8287c7 n/a (accesslog.so + 0x67c7)
#9 0x00007fa7fa829182 n/a (accesslog.so + 0x7182)
#10 0x0000555fefe23158 n/a (slapd + 0x53158)
#11 0x0000555fefe2373c n/a (slapd + 0x5373c)
#12 0x0000555fefe24294 slap_send_ldap_result (slapd + 0x54294)
#13 0x0000555fefdfb823 n/a (slapd + 0x2b823)
#14 0x0000555fefe87523 overlay_op_walk (slapd + 0xb7523)
#15 0x0000555fefe876ae n/a (slapd + 0xb76ae)
#16 0x0000555fefe76ffa n/a (slapd + 0xa6ffa)
#17 0x0000555fefe7fd7d n/a (slapd + 0xafd7d)
#18 0x0000555fefe13d30 n/a (slapd + 0x43d30)
#19 0x00007fa7fbb10da0 n/a (libldap-2.5.releng.so.0 + 0x48da0)
#20 0x00007fa7fb8a758c start_thread (libc.so.6 + 0xa758c)
#21 0x00007fa7fb92ea28 __clone3 (libc.so.6 + 0x12ea28)
I'm not really surprised that "malloc of 93893629956635 bytes failed". The changelog before the error was:
Hard to tell, can you rebuild OpenLDAP without stripping the debug symbols out? And then open the core in gdb and issue a 'thread apply all bt full' to get a useable backtrace?
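The gdb step requested above can also be run non-interactively. As a rough sketch, this Python helper only assembles the command line; it assumes gdb's -batch and -ex options, and both the function name gdb_backtrace_command and the file paths in the usage part are placeholders, not values taken from this thread.

```python
import shlex


def gdb_backtrace_command(slapd_binary: str, core_file: str) -> list:
    """Build a gdb invocation that prints a full backtrace of every thread."""
    return [
        "gdb", "-batch",
        "-ex", "thread apply all bt full",
        slapd_binary, core_file,
    ]


if __name__ == "__main__":
    # Hypothetical paths; substitute the unstripped slapd and the actual core file.
    print(shlex.join(gdb_backtrace_command("/usr/sbin/slapd", "core.31077")))
```

The resulting list can be passed to subprocess.run, with the output redirected to a file that is then attached to the bug report.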
Noticing that the objectClass is auditModify, I wonder whether the recommended filter logfilter="(&(objectClass=auditWriteObject)(reqResult=0))" is correct.
Yes, auditModify is a subclass of auditWriteObject so it matches.
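The subclass argument can be illustrated with a small Python sketch. The table below is an assumed excerpt of the accesslog schema covering only the write-related classes, and matches_objectclass is a made-up helper that mimics how an objectClass filter also matches entries whose class derives from the named one; it is not how slapd implements the check.

```python
# Assumed excerpt of the accesslog object class hierarchy (write operations only).
SUPERCLASS = {
    "auditobject": None,
    "auditwriteobject": "auditobject",
    "auditadd": "auditwriteobject",
    "auditdelete": "auditwriteobject",
    "auditmodify": "auditwriteobject",
    "auditmodrdn": "auditwriteobject",
}


def matches_objectclass(entry_class: str, filter_class: str) -> bool:
    """True if entry_class equals filter_class or inherits from it."""
    wanted = filter_class.lower()
    current = entry_class.lower()
    while current is not None:
        if current == wanted:
            return True
        # Walk one step up; unknown classes end the walk.
        current = SUPERCLASS.get(current)
    return False


if __name__ == "__main__":
    print(matches_objectclass("auditModify", "auditWriteObject"))
    print(matches_objectclass("auditWriteObject", "auditModify"))
```

The first call returns True and the second False: the filter on auditWriteObject catches auditModify entries, but not the other way round.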
The other bug I found is this: The reason delta syncrepl did not start was my manual editing of the config.ldif used for slapadd. It seems that a somehow non-matching contextCSN prevents delta syncrepl from starting; or, put the other way round: after I had deleted contextCSN from the config.ldif, the servers started to sync! However, I had used the slapadd option -w to load the data.
If you're modifying configuration by hand you *must* slapcat the cn=config DB, edit the LDIF and then load it back with slapadd -n0.
Since you're also replicating it, you *must* also load the hand-modified configuration with "-w". Then if you're replacing it on other servers, slapcat the resulting server configuration and load this on the other servers *without* "-w" as the replication metadata (entryCSN, contextCSN) are already populated and have to stay consistent.
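The order of operations described above can be written down as a command plan. This Python sketch only builds the slapcat/slapadd command lines and does not run anything; the configuration directory and LDIF file names are hypothetical, and it assumes slapd is stopped and the target configuration directory has been prepared for loading on each server.

```python
def config_reload_plan(conf_dir="/etc/openldap/slapd.d",
                       original="config-original.ldif",
                       edited="config-edited.ldif",
                       export="config-export.ldif"):
    """Return the command lines for hand-editing a replicated cn=config."""
    return {
        "edited_server": [
            # 1. Export the current cn=config (database 0) before editing.
            ["slapcat", "-F", conf_dir, "-n", "0", "-l", original],
            # 2. (Edit a copy of the LDIF by hand, producing `edited`.)
            # 3. Load the hand-modified LDIF with -w so replication metadata is written.
            ["slapadd", "-F", conf_dir, "-n", "0", "-w", "-l", edited],
            # 4. Export the resulting configuration, now carrying entryCSN/contextCSN.
            ["slapcat", "-F", conf_dir, "-n", "0", "-l", export],
        ],
        "other_servers": [
            # 5. Load the export WITHOUT -w so the metadata stays consistent.
            ["slapadd", "-F", conf_dir, "-n", "0", "-l", export],
        ],
    }


if __name__ == "__main__":
    plan = config_reload_plan()
    for role, commands in plan.items():
        for argv in commands:
            print(f"{role}: {' '.join(argv)}")
```

The only difference between the two slapadd invocations is the "-w" flag, which is the point of the advice: it is used once, on the server where the hand-edited LDIF is loaded, and nowhere else.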
I would assume the crash stems from the above not being followed but if you can still reproduce it, please file a bug with the backtrace and logs and any other information that will help us reproduce and fix the problem.
Thanks,