https://bugs.openldap.org/show_bug.cgi?id=9538
Issue ID: 9538 Summary: Accesslog entryCSN ordering is not always monotonous Product: OpenLDAP Version: unspecified Hardware: All OS: All Status: UNCONFIRMED Severity: normal Priority: --- Component: overlays Assignee: bugs@openldap.org Reporter: ondra@mistotebe.net Target Milestone: ---
For delta-sync replication, we rely on CSN order being preserved at all times.
When logops includes anything more than "writes", we don't use li_op_rmutex to maintain serialisation. A concurrent write and another operation (like bind) can have the write have its csn assigned before the bind. But before the write finishes on the main DB and is logged, the bind has already hit accesslog.
This entry out of order doesn't match the usual filter, so non-persist sessions will not notice, however running persist sessions could get a new cookie sent, depending on how things are ordered when they hit syncprov.
A sample: ---- 8< ---- Apr 26 19:56:04 localhost slapd[43930]: conn=1003 op=41 ADD dn="uid=dm01-R1H2-41660,ou=People,dc=example,dc=com" Apr 26 19:56:04 localhost slapd[43930]: conn=1003 op=41 syncprov_matchops: recording uuid for dn=reqStart=20210426195604.000350Z,cn=accesslog on opc=0x7d3d2800dc50 Apr 26 19:56:04 localhost slapd[43930]: slap_get_csn: conn=1003 op=42 generated new csn=20210426195604.556053Z#000000#002#000000 manage=1 Apr 26 19:56:04 localhost slapd[43930]: slap_queue_csn: queueing 0x7d3d38148f60 20210426195604.556053Z#000000#002#000000 Apr 26 19:56:04 localhost slapd[43930]: conn=1005 op=1 BIND dn="dc=example,dc=com" method=128 Apr 26 19:56:04 localhost slapd[43930]: conn=1005 op=1 BIND dn="dc=example,dc=com" mech=SIMPLE bind_ssf=0 ssf=256 Apr 26 19:56:04 localhost slapd[43930]: conn=1003 op=41 RESULT tag=105 err=0 qtime=0.005874 etime=0.015182 text= Apr 26 19:56:04 localhost slapd[43930]: slap_get_csn: conn=1005 op=1 generated new csn=20210426195604.558683Z#000000#002#000000 manage=1 Apr 26 19:56:04 localhost slapd[43930]: slap_queue_csn: queueing 0x561f09113f80 20210426195604.558683Z#000000#002#000000 Apr 26 19:56:04 localhost slapd[43930]: conn=1005 op=1 syncprov_matchops: recording uuid for dn=reqStart=20210426195604.000364Z,cn=accesslog on opc=0x561f0900fcf0 Apr 26 19:56:04 localhost slapd[43930]: conn=1002 op=2 syncprov_qresp: set up a new syncres mode=4 csn=20210426195604.558683Z#000000#002#000000 Apr 26 19:56:04 localhost slapd[43930]: conn=1001 op=2 syncprov_qresp: set up a new syncres mode=4 csn=20210426195604.558683Z#000000#002#000000 Apr 26 19:56:04 localhost slapd[43930]: conn=1000 op=2 syncprov_qresp: set up a new syncres mode=4 csn=20210426195604.558683Z#000000#002#000000 Apr 26 19:56:04 localhost slapd[43930]: slap_graduate_commit_csn: removing 0x561f09113f80 20210426195604.558683Z#000000#002#000000 Apr 26 19:56:04 localhost slapd[43930]: conn=1005 op=1 RESULT tag=97 err=0 qtime=0.000020 etime=0.004297 text= Apr 26 19:56:04 localhost slapd[43930]: slap_queue_csn: queueing 0x7d3d38148250 20210426195604.556053Z#000000#002#000000 Apr 26 19:56:04 localhost slapd[43930]: slap_graduate_commit_csn: removing 0x7d3d38148250 20210426195604.556053Z#000000#002#000000 Apr 26 19:56:04 localhost slapd[43930]: slap_graduate_commit_csn: removing 0x7d3d38148f60 20210426195604.556053Z#000000#002#000000 ---- 8< ----
Example entries in the order slapcat sees them: ---- 8< ---- dn: reqStart=20210426195604.000364Z,cn=accesslog objectClass: auditBind reqStart: 20210426195604.000364Z reqEnd: 20210426195604.000365Z reqType: bind entryCSN: 20210426195604.558683Z#000000#002#000000
dn: reqStart=20210426195604.000351Z,cn=accesslog objectClass: auditAdd reqStart: 20210426195604.000351Z reqEnd: 20210426195604.000366Z reqType: add entryCSN: 20210426195604.556053Z#000000#002#000000 ---- 8< ----
https://bugs.openldap.org/show_bug.cgi?id=9538
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Keywords| |needs_review
https://bugs.openldap.org/show_bug.cgi?id=9538
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Summary|Accesslog entryCSN ordering |Accesslog entryCSN ordering |is not always monotonous |is not always monotonically | |increasing
https://bugs.openldap.org/show_bug.cgi?id=9538
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Keywords|needs_review | Target Milestone|--- |2.5.5
https://bugs.openldap.org/show_bug.cgi?id=9538
--- Comment #1 from Ondřej Kuzník ondra@mistotebe.net --- This is an example of what can happen, accesslog is set up to log all ops:
1. op A, a write, comes in and locks accesslog 2. op B, like a Bind, comes in, it's not a write can proceed even though accesslog is locked 3. op B finishes, we decide to log it, create the accesslog entry and call be_add() (there's still no CSN assigned for it) 4. op A is now scheduled gets a CSN assigned (csnA) and gets committed into main DB 5. be_add() adding the log entry for op B gets a CSN assigned (csnB) and gets stored in the log DB 6. accesslog_response() is called for op A and unlocks accesslog, logs the write, which is now out of CSN order (csnA is before csnB already committed)
Right now that is what looks the most challenging sequence of events to sort.
https://bugs.openldap.org/show_bug.cgi?id=9538
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Target Milestone|2.5.5 |2.5.6
https://bugs.openldap.org/show_bug.cgi?id=9538
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Keywords| |replication
https://bugs.openldap.org/show_bug.cgi?id=9538
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Target Milestone|2.5.6 |2.5.7
https://bugs.openldap.org/show_bug.cgi?id=9538
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Target Milestone|2.5.7 |2.6.0
https://bugs.openldap.org/show_bug.cgi?id=9538
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs@openldap.org |ondra@mistotebe.net
https://bugs.openldap.org/show_bug.cgi?id=9538
--- Comment #2 from Ondřej Kuzník ondra@mistotebe.net --- Turns out we can't serialise all operations in a 'logops all' scenario. If we did, it would make the server useless as a provider for refreshAndPersist (that syncrepl op would have to finish before anything else could be started, making it somewhat pointless).
We may need to either: - find a way to log entries that will not touch contextCSN (and will also be ignored by syncprov so the fact they're out of CSN order doesn't affect anyone) - prevent a CSN from being allocated to a non-write operation (even by overlays?) so we can safely do that ourselves
Unless we can do the latter, noone interested in more than successful writes is ever safe replicating accesslog, which so far has been suggested as a way to monitor the live DB traffic by external tools for auditing, etc.
https://bugs.openldap.org/show_bug.cgi?id=9538
--- Comment #3 from Ondřej Kuzník ondra@mistotebe.net --- Discussing this with Howard, the intention is that non-write ops never get a CSN associated with them. If they cause a write, e.g. account lockouts from ppolicy, that should be self-contained in its own operation. We should be able to proceed on that basis, looking to fix any other code that violates this if issues come up.
I'll see if that gets us any further.
https://bugs.openldap.org/show_bug.cgi?id=9538
Ondřej Kuzník ondra@mistotebe.net changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|UNCONFIRMED |IN_PROGRESS Ever confirmed|0 |1
--- Comment #4 from Ondřej Kuzník ondra@mistotebe.net --- https://git.openldap.org/openldap/openldap/-/merge_requests/377
https://bugs.openldap.org/show_bug.cgi?id=9538
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|IN_PROGRESS |RESOLVED Resolution|--- |TEST
--- Comment #5 from Quanah Gibson-Mount quanah@openldap.org --- • c4ac5b31 by Ondřej Kuzník at 2021-08-12T20:48:09+00:00 ITS#9538 Serialise CSN assignment in slapo-accesslog
https://bugs.openldap.org/show_bug.cgi?id=9538
--- Comment #6 from Ondřej Kuzník ondra@mistotebe.net --- This caused a regression in 2.6: https://git.openldap.org/openldap/openldap/-/merge_requests/385
https://bugs.openldap.org/show_bug.cgi?id=9538
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CONFIRMED Resolution|TEST |---
https://bugs.openldap.org/show_bug.cgi?id=9538
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|CONFIRMED |IN_PROGRESS
https://bugs.openldap.org/show_bug.cgi?id=9538
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|IN_PROGRESS |RESOLVED Resolution|--- |FIXED
--- Comment #7 from Quanah Gibson-Mount quanah@openldap.org --- Commits: • 8f8674fc by Ondřej Kuzník at 2021-09-05T14:33:59+00:00 ITS#9538 Populate op2.o_hdr earlier
• 781a6c25 by Ondřej Kuzník at 2021-09-05T14:33:59+00:00 ITS#9538 Only queue csns for ops that don't get skipped
https://bugs.openldap.org/show_bug.cgi?id=9538
--- Comment #8 from Quanah Gibson-Mount quanah@openldap.org --- RE26:
Commits: • 093acc3a by Ondřej Kuzník at 2021-09-06T15:24:04+00:00 ITS#9538 Populate op2.o_hdr earlier
• 7477e1eb by Ondřej Kuzník at 2021-09-06T15:24:11+00:00 ITS#9538 Only queue csns for ops that don't get skipped
https://bugs.openldap.org/show_bug.cgi?id=9538
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |VERIFIED
https://bugs.openldap.org/show_bug.cgi?id=9538
--- Comment #9 from Quanah Gibson-Mount quanah@openldap.org --- head:
• 95df37d1 by Ondřej Kuzník at 2022-01-10T22:25:33+00:00 ITS#9538 Populate op2.o_hdr earlier
• 87b78f20 by Ondřej Kuzník at 2022-01-10T22:25:33+00:00 ITS#9538 Do not attach csn on (internal) searches
• bc28e795 by Ondřej Kuzník at 2022-01-10T22:25:33+00:00 ITS#9538 Actually skip internal searches
RE26 (2.6.1)
• 507f64ff by Ondřej Kuzník at 2022-01-12T22:43:50+00:00 ITS#9538 Populate op2.o_hdr earlier
• c09db6d6 by Ondřej Kuzník at 2022-01-12T22:43:56+00:00 ITS#9538 Do not attach csn on (internal) searches
• c226e178 by Ondřej Kuzník at 2022-01-12T22:44:01+00:00 ITS#9538 Actually skip internal searches