https://bugs.openldap.org/show_bug.cgi?id=9652
Issue ID: 9652
Summary: Add "tee" capability to load balancer
Product: OpenLDAP
Version: unspecified
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: ---
Component: lloadd
Assignee: bugs@openldap.org
Reporter: mhardin@symas.com
Target Milestone: ---
This is a request for an enhancement that would add a "tee" or "fan-out" capability to the load balancer, where received operations are sent to two or more destinations simultaneously.
The primary goal of the enhancement is to make it possible to keep multiple independent and likely dissimilar directory systems in lock-step with each other over hours, days, or possibly even weeks.
The enhancement would not necessarily need to include a mechanism for converging the target systems should they become out of sync.
This is not intended to be a replication solution; rather, it is viewed more as a "copy" solution intended to be used for specific short-term tasks that need multiple directory systems to be exactly synchronized but where replication is not desirable or even possible.
At least two uses come to mind:
1. Test harnesses, evaluating side-by-side operation of separate directory systems over time
2. Directory system transition validation harnesses
3. (maybe) Part of a test harness to record or replay LDAP workloads
* Other uses?
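[Editorial illustration] A sketch of what such a fan-out group might look like in an lloadd-style configuration. The "tier fanout" type is hypothetical (it does not exist in lloadd today), and the backend-server options are only modeled loosely on the balancer's existing syntax; the first backend listed is assumed to be the primary whose responses are returned to clients.

    # hypothetical: duplicate every operation to all backends in this tier;
    # only the first (primary) backend's responses go back to the client
    tier fanout
    backend-server uri=ldap://primary.example.com numconns=4 max-pending-ops=64
    backend-server uri=ldap://shadow1.example.com numconns=4 max-pending-ops=64
    backend-server uri=ldap://shadow2.example.com numconns=4 max-pending-ops=64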
Quanah Gibson-Mount <quanah@openldap.org> changed:

What             |Removed            |Added
--------------------------------------------------------
Keywords         |                   |needs_review
Quanah Gibson-Mount <quanah@openldap.org> changed:

What             |Removed            |Added
--------------------------------------------------------
Keywords         |needs_review       |
Target Milestone |---                |2.7.0
Assignee         |bugs@openldap.org  |ondra@mistotebe.net
--- Comment #1 from Ondřej Kuzník <ondra@mistotebe.net> ---
On Thu, Aug 26, 2021 at 05:19:35PM +0000, openldap-its@openldap.org wrote:
> This is a request for an enhancement that would add a "tee" or "fan-out"
> capability to the load balancer, where received operations are sent to two
> or more destinations simultaneously.
> The primary goal of the enhancement is to make it possible to keep multiple
> independent and likely dissimilar directory systems in lock-step with each
> other over hours, days, or possibly even weeks.
> The enhancement would not necessarily need to include a mechanism for
> converging the target systems should they become out of sync.
> This is not intended to be a replication solution; rather, it is viewed more
> as a "copy" solution intended to be used for specific short-term tasks that
> need multiple directory systems to be exactly synchronized but where
> replication is not desirable or even possible.
> At least two uses come to mind:
> - Test harnesses, evaluating side-by-side operation of separate directory
>   systems over time
> - Directory system transition validation harnesses
> - (maybe) Part of a test harness to record or replay LDAP workloads
First thoughts:
Assuming all backends will react identically (do we need to serialise *all* operations to do that?), there are two approaches to this:
- we send the operation to the first backend, when it's processed, second, etc. (what happens if the client drops dead or abandons the request?)
- we send the operation to all backends in parallel
Some of this could be implemented as a new "tier" implementation (invalidating the name "tier", but never mind, maybe we can rename that to "group" or something eventually). AFAIK both options would still require some changes to operation handling on the main path. In the former, we need to hook into response processing to redirect the operation to the next backend in the list; in the latter, we need to duplicate the operation received before sending it out and be able to record all the duplicates on the client for Abandon processing etc.
Irrespective of these, when we send the response back can vary too, assuming the first configured backend is the one we care about:
- forward the response as soon as we have the response from the first configured backend
- wait until all backends have finished with the request
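[Editorial illustration] A minimal C sketch of the parallel approach combined with the "wait until all backends have finished" policy. All names (fanout_op, fanout_msg, the callbacks) are hypothetical and are not lloadd's internal API; the sketch only shows the bookkeeping described above: duplicate the decoded request once per backend, record the duplicates on the client operation for Abandon processing, and forward the primary backend's buffered response once the last duplicate has completed.

    #include <stdlib.h>

    #define MAX_BACKENDS 8

    /* hypothetical stand-ins for lloadd's real structures */
    typedef struct fanout_msg { void *ber; size_t len; } fanout_msg;

    typedef struct fanout_op {
        int msgid;                           /* client-side message ID */
        int pending;                         /* backends still working on it */
        fanout_msg *primary_response;        /* buffered result from backend 0 */
        struct fanout_op *dup[MAX_BACKENDS]; /* duplicates, kept for Abandon */
        int ndup;
    } fanout_op;

    /* Duplicate the decoded request once per backend and dispatch in parallel.
     * The duplicates are recorded on the client op so an Abandon can reach them. */
    static void
    fanout_dispatch( fanout_op *op, int nbackends,
            void (*send_to_backend)( int backend, fanout_op *dup ) )
    {
        op->pending = nbackends;
        op->ndup = nbackends;
        for ( int i = 0; i < nbackends; i++ ) {
            fanout_op *dup = calloc( 1, sizeof(*dup) );
            dup->msgid = op->msgid;
            op->dup[i] = dup;
            send_to_backend( i, dup );
        }
    }

    /* Called when backend i has finished with its duplicate.  Backend 0 is
     * "the one we care about"; its response is buffered and only forwarded
     * upstream once every backend has reported back.  Real code would need
     * locking (or atomics) around `pending`, and would compare or discard
     * the responses from the other backends. */
    static void
    fanout_response_cb( fanout_op *op, int i, fanout_msg *response,
            void (*forward_to_client)( fanout_msg *msg ) )
    {
        if ( i == 0 )
            op->primary_response = response;
        if ( --op->pending == 0 && op->primary_response != NULL )
            forward_to_client( op->primary_response );
    }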
Obvious limitations:
- operations that change the state of the connection, especially where the server is in charge of how that happens, are not going to work (SASL multi-step binds, TXNs, paging, etc. come to mind)
- different server capacity could mean we get to send the request to one server but the others will be over their configured limits/unavailable, we need to decide whether we (even can) go ahead with sending the request
--- Comment #2 from Matthew Hardin <mhardin@symas.com> ---
I'll take a stab at this.
On Aug 31, 2021, at 9:25 AM, openldap-its@openldap.org wrote:
> --- Comment #1 from Ondřej Kuzník <ondra@mistotebe.net> ---
> On Thu, Aug 26, 2021 at 05:19:35PM +0000, openldap-its@openldap.org wrote:
>> This is a request for an enhancement that would add a "tee" or "fan-out"
>> capability to the load balancer, where received operations are sent to two
>> or more destinations simultaneously.
>> The primary goal of the enhancement is to make it possible to keep multiple
>> independent and likely dissimilar directory systems in lock-step with each
>> other over hours, days, or possibly even weeks.
>> The enhancement would not necessarily need to include a mechanism for
>> converging the target systems should they become out of sync.
>> This is not intended to be a replication solution; rather, it is viewed
>> more as a "copy" solution intended to be used for specific short-term
>> tasks that need multiple directory systems to be exactly synchronized but
>> where replication is not desirable or even possible.
>> At least two uses come to mind:
>> - Test harnesses, evaluating side-by-side operation of separate directory
>>   systems over time
>> - Directory system transition validation harnesses
>> - (maybe) Part of a test harness to record or replay LDAP workloads
> First thoughts:
> Assuming all backends will react identically (do we need to serialise *all*
> operations to do that?), there are two approaches to this:
> - we send the operation to the first backend, when it's processed, second,
>   etc.
That would mean that each incoming (upstream) operation takes as long to complete as the sum of all of the downstream operations, adding in timeouts for unreachable/failing downstream nodes. I don't see how that could be very useful.
> (what happens if the client drops dead or abandons the request?)
I think it goes without saying that if the client sends an abandon request, it's forwarded to each of the downstream servers, and whatever, if anything, comes back from the primary server is sent back to the client.
If the client silently abandons a request, the behavior should be the same as it would be in any other case: the client must be prepared to discard any outstanding data that might be returned by the request.
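[Editorial illustration] A minimal sketch of that behaviour, with hypothetical types rather than lloadd's real ones: the balancer walks the per-backend duplicates it recorded for the abandoned operation and issues an Abandon on each backend connection.

    /* hypothetical bookkeeping for one duplicate of a fanned-out operation */
    typedef struct tee_pending {
        int backend;   /* index of the backend the duplicate was sent to */
        int msgid;     /* message ID used on that backend's connection */
    } tee_pending;

    /* Forward a client Abandon to every backend that received a duplicate.
     * Anything the primary backend still returns is handled exactly as for a
     * single-backend operation (forwarded or dropped once the Abandon lands). */
    static void
    tee_abandon( const tee_pending *dups, int ndups,
            void (*send_abandon)( int backend, int msgid ) )
    {
        for ( int i = 0; i < ndups; i++ )
            send_abandon( dups[i].backend, dups[i].msgid );
    }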
> - we send the operation to all backends in parallel
This seems to be more useful because an upstream operation only takes as long as the longest downstream operation takes to complete.
> Some of this could be implemented as a new "tier" implementation
> (invalidating the name "tier", but never mind, maybe we can rename that to
> "group" or something eventually).
Makes sense.
> AFAIK both options would still require some changes to operation handling
> on the main path. In the former, we need to hook into response processing
> to redirect the operation to the next backend in the list,
Unless someone comes up with a real need for the serial approach I think we can dispense with it completely. Any takers?
> in the latter, we need to duplicate the operation received before sending
> it out and be able to record all the duplicates on the client for Abandon
> processing etc.
I understand.
> Irrespective of these, when we send the response back can vary too,
> assuming the first configured backend is the one we care about:
> - forward the response as soon as we have the response from the first
>   configured backend
> - wait until all backends have finished with the request
I think that it's simpler to hold any responses until all backends have finished with the request. If the other case becomes interesting in practice, add it as a feature.
> Obvious limitations:
> - operations that change the state of the connection, especially where the
>   server is in charge of how that happens, are not going to work (SASL
>   multi-step binds, TXNs, paging, etc. come to mind)
I don't see why that should be the case. Explain?
> - different server capacity could mean we get to send the request to one
>   server but the others will be over their configured limits/unavailable,
>   we need to decide whether we (even can) go ahead with sending the request
I think that, at least for the first release, it's an exercise for the person setting things up to make sure things are reasonably well matched, and we shouldn't worry about fixing that situation; perhaps just detecting and flagging it in the log.
-Matt
--- Comment #3 from Ondřej Kuzník <ondra@mistotebe.net> ---
On Tue, Aug 31, 2021 at 11:17:17PM +0000, openldap-its@openldap.org wrote:
> I'll take a stab at this.
> On Aug 31, 2021, at 9:25 AM, openldap-its@openldap.org wrote:
>> First thoughts:
>> Assuming all backends will react identically (do we need to serialise
>> *all* operations to do that?), there are two approaches to this:
>> - we send the operation to the first backend, when it's processed, second,
>>   etc.
> That would mean that each incoming (upstream) operation takes as long to
> complete as the sum of all of the downstream operations, adding in timeouts
> for unreachable/failing downstream nodes. I don't see how that could be
> very useful.
Fair enough. We might still need to serialise operations to make sure conflicting (add/add conflicts, ...) requests are run in the same order everywhere?
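[Editorial illustration] A minimal hypothetical C sketch of that serialisation point (names invented, not lloadd code): write operations are dispatched to all backends under a single lock so every backend receives potentially conflicting requests in the same global order. Note this only pins the order of dispatch; making outcomes truly identical may still require waiting for each write to complete everywhere before releasing the next one.

    #include <pthread.h>

    /* hypothetical: serialise the dispatch of write operations so that every
     * backend sees conflicting requests (add/add, add/delete, ...) in the
     * same order; reads could still be fanned out without taking the lock */
    static pthread_mutex_t tee_write_order = PTHREAD_MUTEX_INITIALIZER;

    static void
    tee_dispatch_write( void *op, int nbackends,
            void (*send_to_backend)( int backend, void *op ) )
    {
        pthread_mutex_lock( &tee_write_order );
        for ( int i = 0; i < nbackends; i++ )
            send_to_backend( i, op );
        pthread_mutex_unlock( &tee_write_order );
        /* stricter variant: also wait here for all backends to answer before
         * letting the next write through, if identical ordering of execution
         * (not just submission) is required */
    }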
>> (what happens if the client drops dead or abandons the request?)
> I think it goes without saying that if the client sends an abandon request
> it's forwarded to each of the downstream servers and whatever, if anything,
> comes back from the primary server is sent back to the client.
Yes, this was mostly about the sequential case where the other operations have not been issued yet, not the simultaneous one.
>> - we send the operation to all backends in parallel
> This seems to be more useful because an upstream operation only takes as
> long as the longest downstream operation takes to complete.
>> Irrespective of these, when we send the response back can vary too,
>> assuming the first configured backend is the one we care about:
>> - forward the response as soon as we have the response from the first
>>   configured backend
>> - wait until all backends have finished with the request
> I think that it's simpler to hold any responses until all backends have
> finished with the request. If the other case becomes interesting in
> practice, add it as a feature.
I agree that from the client's perspective that makes the most sense; however, we'd need to buffer the response at the junction point. That might get tricky, but we'll have to reshape things so they cope, I guess.
>> Obvious limitations:
>> - operations that change the state of the connection, especially where
>>   the server is in charge of how that happens, are not going to work (SASL
>>   multi-step binds, TXNs, paging, etc. come to mind)
> I don't see why that should be the case. Explain?
- in SASL binds, each server is likely to issue a different challenge, and we have no way of transforming the client's response to suit
- similarly in other cases, the backends generate their own cookies; this is high enough in the LDAP layer that we stand no chance of massaging those with just PDU reframing (lloadd's approach)
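[Editorial illustration of the first point; values invented] In a DIGEST-MD5 bind fanned out to two servers, each server issues its own challenge in the first bind response, for example:

    server A: nonce="h7Jq9f...", realm="example.com"
    server B: nonce="p2Xw41...", realm="example.com"

The client computes its digest over whichever challenge was forwarded to it (say server A's), and that digest covers the nonce itself, so the resulting response can only verify on server A; the balancer cannot rewrite it for server B by reframing PDUs.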
>> - different server capacity could mean we get to send the request to one
>>   server but the others will be over their configured limits/unavailable,
>>   we need to decide whether we (even can) go ahead with sending the request
> I think that, at least for the first release, it's an exercise for the
> person setting things up to make sure things are reasonably well matched,
> and we shouldn't worry about fixing that situation; perhaps just detecting
> and flagging it in the log.
Maybe, but this *will* come up in practice, so we should keep it in mind and explain what it means for deployments. Just like the fact that we have to buffer upstream responses until clients are ready to receive them (leading to an unbounded memory footprint) since there is no way to do per-operation pacing in LDAP, something that's proven really hard to explain without people understanding the overall design first.
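[Editorial illustration] A minimal sketch of the "detect and flag" option discussed above (all names hypothetical, not lloadd's structures or logging): before fanning an operation out, check each backend against its configured limit and log a warning that the targets may diverge.

    #include <stdio.h>

    /* hypothetical backend descriptor */
    typedef struct tee_backend {
        const char *uri;
        int pending_ops;       /* operations currently outstanding */
        int max_pending_ops;   /* configured limit */
    } tee_backend;

    /* Returns 1 if every backend can accept the operation.  For a first
     * release a mismatch is only logged (fprintf stands in for the server's
     * logging), leaving it to the operator to keep capacities matched. */
    static int
    tee_check_capacity( const tee_backend *backends, int n )
    {
        int all_ok = 1;
        for ( int i = 0; i < n; i++ ) {
            if ( backends[i].pending_ops >= backends[i].max_pending_ops ) {
                fprintf( stderr, "fanout: backend %s over its limit, "
                        "targets may diverge\n", backends[i].uri );
                all_ok = 0;
            }
        }
        return all_ok;
    }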
Ondřej Kuzník <ondra@mistotebe.net> changed:

What             |Removed            |Added
--------------------------------------------------------
Severity         |normal             |enhancement
Quanah Gibson-Mount <quanah@openldap.org> changed:

What             |Removed                    |Added
------------------------------------------------------------------------
Summary          |Add "tee" capability to    |Fanout frontend overlay for
                 |load balancer              |slapd
Component        |lloadd                     |overlays