So, while we can fully replicate cn=config for the case where all syncrepl participants are masters/peers, things are still a bit sticky if we only want the replicas to remain in slave mode.
Instead of going through complicated mapping/virtual directory redirections, it seems to me that all we need is to tag certain config objects with the serverIDs to which they apply. As such, I'm considering adding an olcServerMatch attribute to the olcDatabaseConfig and olcOverlayConfig objectclasses. This would take a regexp to match against the current server ID; if the pattern matches, the config entry is processed; otherwise it is ignored. This attribute would be absent/empty by default, making the entry always enabled.
Likewise it may be useful to add a boolean olcDisabled attribute to these classes, to allow databases and overlays to be switched on and off without needing to delete them. Again, it would default to absent/FALSE...
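As a rough LDIF sketch of the proposed olcDisabled attribute (hypothetical: neither the attribute nor the database DN below exists as shown; they only illustrate the idea):

```ldif
# Hypothetical: olcDisabled is the proposed attribute, not an existing one.
# Turning a database off in place, without deleting its config entry:
dn: olcDatabase={2}hdb,cn=config
changetype: modify
add: olcDisabled
olcDisabled: TRUE
```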
We'd also want these controls for syncrepl stanzas. (Too bad the patch to turn syncrepl into an overlay was never committed....)
For example, we may have a cluster of servers in MMR with a pool of other servers operating as slaves. We'd want the syncprov overlay active on all of the masters, the syncrepl consumer active on all of the servers, and the chain overlay active on all of the slaves. Setting olcServerMatch on the syncprov and chain overlays would allow things to behave as desired, without needing to create a parallel config tree just for the slaves.
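A sketch of how that example might look in LDIF, assuming the proposed olcServerMatch attribute, serverIDs 1-2 for the masters and 11-12 for the slaves (all DNs, IDs, and regexps here are illustrative only):

```ldif
# Hypothetical: olcServerMatch is the proposed attribute; DNs, serverIDs
# and regexps are made up for illustration.

# syncprov active only on the masters (serverID 1 or 2):
dn: olcOverlay={0}syncprov,olcDatabase={1}hdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: {0}syncprov
olcServerMatch: ^[12]$

# chain active only on the slaves (serverID 11 or 12):
dn: olcOverlay={0}chain,olcDatabase={-1}frontend,cn=config
objectClass: olcOverlayConfig
objectClass: olcChainConfig
olcOverlay: {0}chain
olcServerMatch: ^1[12]$
```

The syncrepl consumer would carry no olcServerMatch at all, so it stays active everywhere.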
Comments?
--On Tuesday, May 26, 2009 8:35 PM -0700 Howard Chu hyc@symas.com wrote:
For example, we may have a cluster of servers in MMR with a pool of other servers operating as slaves. We'd want the syncprov overlay active on all of the masters, the syncrepl consumer active on all of the servers, and the chain overlay active on all of the slaves. Setting olcServerMatch on the syncprov and chain overlays would allow things to behave as desired, without needing to create a parallel config tree just for the slaves.
Comments?
I like it. One thing we've needed to do in the past is drop the replication configuration portions of a master (taking it to single-server mode). This would allow that. I would note it may be common (I certainly do so) to run the syncprov overlay on the replica as well, at least in a glued environment.
It sounds like it gracefully solves the problem of keeping both master and replica configurations around, for the most part. What remains sticky is ACLs. There are plenty of valid reasons for the master to have very different ACLs than the replicas do.
--Quanah
--
Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc -------------------- Zimbra :: the leader in open source messaging and collaboration
----- "Quanah Gibson-Mount" quanah@zimbra.com wrote:
--On Tuesday, May 26, 2009 8:35 PM -0700 Howard Chu hyc@symas.com wrote:
For example, we may have a cluster of servers in MMR with a pool of other servers operating as slaves. We'd want the syncprov overlay active on all of the masters, the syncrepl consumer active on all of the servers, and the chain overlay active on all of the slaves. Setting olcServerMatch on the syncprov and chain overlays would allow things to behave as desired, without needing to create a parallel config tree just for the slaves.
Comments?
I like it. One thing we've needed to do in the past is drop the replication configuration portions of a master (taking it to single-server mode). This would allow that. I would note it may be common (I certainly do so) to run the syncprov overlay on the replica as well, at least in a glued environment.
It sounds like it gracefully solves the problem of keeping both master and replica configurations around, for the most part. What remains sticky is ACLs. There are plenty of valid reasons for the master to have very different ACLs than the replicas do.
I agree with Quanah too. Would these be 2.5 features?
Gavin Henry wrote:
----- "Quanah Gibson-Mount" quanah@zimbra.com wrote:
--On Tuesday, May 26, 2009 8:35 PM -0700 Howard Chu hyc@symas.com wrote:
For example, we may have a cluster of servers in MMR with a pool of other servers operating as slaves. We'd want the syncprov overlay active on all of the masters, the syncrepl consumer active on all of the servers, and the chain overlay active on all of the slaves. Setting olcServerMatch on the syncprov and chain overlays would allow things to behave as desired, without needing to create a parallel config tree just for the slaves.
Comments?
I like it. One thing we've needed to do in the past is drop the replication configuration portions of a master (taking it to single-server mode). This would allow that. I would note it may be common (I certainly do so) to run the syncprov overlay on the replica as well, at least in a glued environment.
It sounds like it gracefully solves the problem of keeping both master and replica configurations around, for the most part. What remains sticky is ACLs. There are plenty of valid reasons for the master to have very different ACLs than the replicas do.
Agreed, it's still not a perfect solution for ACLs and such. Although you could use servermatch on the database itself, and have two separate olcDatabase entries, one for the master and one for the slave. That would also allow you to use separate indexing on the master vs the slaves.
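For instance, the two-databases variant might be sketched like this (again assuming the proposed olcServerMatch attribute; suffix, serverIDs, and index choices are all made up):

```ldif
# Hypothetical: the same suffix configured twice; each server activates
# only the entry whose olcServerMatch pattern matches its serverID.

# Master variant (serverID 1), indexed for provider-side lookups:
dn: olcDatabase={2}hdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcHdbConfig
olcDatabase: {2}hdb
olcSuffix: dc=example,dc=com
olcServerMatch: ^1$
olcDbIndex: entryCSN,entryUUID eq

# Slave variant (serverID 2 or 3), with consumer-oriented indexing:
dn: olcDatabase={3}hdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcHdbConfig
olcDatabase: {3}hdb
olcSuffix: dc=example,dc=com
olcServerMatch: ^[23]$
olcDbIndex: uid,cn eq,sub
```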
I agree with Quanah too. Would these be 2.5 features?
If we can get it working right away, no, I'd release it into 2.4 ASAP since we have an immediate need for this ability. Of course, it may not be a candidate for 2.4.17 - we need to focus on stability there still. (Moving the syncrepl consumer into an overlay would not happen in 2.4...)
I ought to try syncrepl before talking too much, but anyway:
Quanah Gibson-Mount writes:
It sounds like it gracefully solves the problem of keeping both master and replica configurations around, for the most part. What remains sticky is ACLs. There are plenty of valid reasons for the master to have very different ACLs than the replicas do.
And things like <authz-policy> and <allow>. Security settings, if you run a master inside a well protected subnet and partial slaves on more open ones.
Master and slave can share some but not all databases, and you might want to replicate config of those they share - but this way the database numbers will differ. Might not share all related schema either.
<threads>, cache settings, <argsfile>, etc. if your servers run on different OSes. Which can be useful so that if an OS-specific problem hits one server, others are in no danger.
I'm sure there are good reasons for plenty of other things to differ, while config-replication could still be useful. Depends on how flexible partial config-replication is intended to be.
Hallvard B Furuseth wrote:
I ought to try syncrepl before talking too much, but anyway:
Quanah Gibson-Mount writes:
It sounds like it gracefully solves the problem of keeping both master and replica configurations around, for the most part. What remains sticky is ACLs. There are plenty of valid reasons for the master to have very different ACLs than the replicas do.
And things like <authz-policy> and <allow>. Security settings, if you run a master inside a well protected subnet and partial slaves on more open ones.
It would be a mistake to run any servers with lesser security settings than any other server, if all of them are sharing data.
Master and slave can share some but not all databases, and you might want to replicate config of those they share - but this way the database numbers will differ. Might not share all related schema either.
It would most likely be a mistake not to share schema. Of course, there's nothing preventing us from putting ServerMatch on schema entries too. And no, the database numbers would be the same - all of the database configs would be replicated, but some of them would be inactive on various servers.
<threads>, cache settings, <argsfile>, etc. if your servers run on different OSes. Which can be useful so that if an OS-specific problem hits one server, others are in no danger.
Threads, maybe; cache settings, perhaps. argsfile is just a command line parameter, so not relevant. Most of the sites we work with deploy identical hardware for their pools of servers. The most common form of load balancing is round-robin DNS, which is only "fair" if all of the servers are equivalent in their load handling capability.
I'm sure there are good reasons for plenty of other things to differ, while config-replication could still be useful. Depends on how flexible partial config-replication is intended to be.
Since there is currently no support at all, I think it's important to get something usable first, and worry about those other cases later.
Howard Chu wrote:
Since there is currently no support at all, I think it's important to get something usable first, and worry about those other cases later.
The other alternative, which is much simpler to implement, is just to add a suffixmassage/rewrite keyword to the syncrepl config, allowing it to pull from a particular remote base and map it to the local base. Then it's up to the sysadmin to create a complete cn=config hierarchy somewhere else on the master server and let the slaves pick it up. That would address all of the issues of differentiation, at the cost of a little bit of redundancy on the master.
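A sketch of what such a consumer stanza might look like, assuming the hypothetical suffixmassage keyword and an out-of-band config tree on the master (all DNs, hostnames, and credentials are illustrative, not real syntax):

```ldif
# Hypothetical: "suffixmassage" is the proposed keyword. The master keeps
# a complete slave configuration under cn=slave-config,dc=example,dc=com;
# the consumer pulls it and remaps it onto its own cn=config.
dn: olcDatabase={0}config,cn=config
changetype: modify
add: olcSyncRepl
olcSyncRepl: rid=001 provider=ldap://master.example.com
  searchbase="cn=slave-config,dc=example,dc=com"
  suffixmassage="cn=config"
  bindmethod=simple binddn="cn=config" credentials=secret
  type=refreshAndPersist retry="5 5 300 +"
```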
Howard Chu wrote:
Howard Chu wrote:
Since there is currently no support at all, I think it's important to get something usable first, and worry about those other cases later.
The other alternative, which is much simpler to implement, is just to add a suffixmassage/rewrite keyword to the syncrepl config, allowing it to pull from a particular remote base and map it to the local base. Then it's up to the sysadmin to create a complete cn=config hierarchy somewhere else on the master server and let the slaves pick it up. That would address all of the issues of differentiation, at the cost of a little bit of redundancy on the master.
I'm not too fond of the proposed olcServerMatch; it appears to me to create a cluttered config database that requires you to match these attribute values to see the currently active configuration. Should it be added, though, I would prefer it to be defined as range(s) of serverIDs rather than a pattern to match. Regexp matching of integer ranges is always awkward.
So, I would definitely prefer this syncrepl solution, especially if syncrepl were also extended with a (list of?) locally evaluated LDAP URLs that an entry must match (or not match?) for the syncrepl stanza to accept it. Entries received from the producer, or found in the local database upon startup, should be ignored by the syncrepl stanza unless they do (not) match one of these URLs.
This would allow a syncrepl stanza to maintain only parts of a local database, and one stanza could be used to pull in the specialized database configuration from one location, others to fetch common parts like the acl and schema configurations from other locations.
Hm, it might be sufficient to have a syncrepl option that makes it match entries against its configured searchbase and filter, and ignore entries that fail to match. But not equally flexible.
Rein
Rein Tollevik writes:
(...) Regexp matching of integer ranges is always awkward.
As is regexp matching of any complexity.
Hm, it might be sufficient to have a syncrepl option that makes it match entries against its configured searchbase and filter, and ignore entries that fail to match. But not equally flexible.
olcSyncrepl already has options searchbase, filter, scope, attrs.
Hallvard B Furuseth wrote:
Rein Tollevik writes:
Hm, it might be sufficient to have a syncrepl option that makes it match entries against its configured searchbase and filter, and ignore entries that fail to match. But not equally flexible.
olcSyncrepl already has options searchbase, filter, scope, attrs.
Yes, but they are evaluated on the producer, not on the syncrepl consumer.
Rein
Rein Tollevik wrote:
Howard Chu wrote:
Howard Chu wrote:
Since there is currently no support at all, I think it's important to get something usable first, and worry about those other cases later.
The other alternative, which is much simpler to implement, is just to add a suffixmassage/rewrite keyword to the syncrepl config, allowing it to pull from a particular remote base and map it to the local base. Then it's up to the sysadmin to create a complete cn=config hierarchy somewhere else on the master server and let the slaves pick it up. That would address all of the issues of differentiation, at the cost of a little bit of redundancy on the master.
I'm not too fond of the proposed olcServerMatch; it appears to me to create a cluttered config database that requires you to match these attribute values to see the currently active configuration. Should it be added, though, I would prefer it to be defined as range(s) of serverIDs rather than a pattern to match. Regexp matching of integer ranges is always awkward.
Yes, I agree, it would make things cluttered and the complexity could easily get out of hand. In retrospect I'm not so fond of the idea.
So, I would definitely prefer this syncrepl solution, especially if syncrepl were also extended with a (list of?) locally evaluated LDAP URLs that an entry must match (or not match?) for the syncrepl stanza to accept it. Entries received from the producer, or found in the local database upon startup, should be ignored by the syncrepl stanza unless they do (not) match one of these URLs.
This would allow a syncrepl stanza to maintain only parts of a local database, and one stanza could be used to pull in the specialized database configuration from one location, others to fetch common parts like the acl and schema configurations from other locations.
That sounds even more complex. I think you're also touching on ITS#5990, allowing a single entry to receive portions of its attributes from multiple providers. I'd like to add that feature at some point, but I don't think now is the right time. (Unless someone shows us a working patch already...)
Hm, it might be sufficient to have a syncrepl option that makes it match entries against its configured searchbase and filter, and ignore entries that fail to match. But not equally flexible.
Howard Chu writes:
The other alternative, which is much simpler to implement, is just to add a suffixmassage/rewrite keyword to the syncrepl config, allowing it to pull from a particular remote base and map it to the local base. Then it's up to the sysadmin to create a complete cn=config hierarchy somewhere else on the master server and let the slaves pick it up. That would address all of the issues of differentiation, at the cost of a little bit of redundancy on the master.
Can translucent be made to combine with back-relay and rwm? Then the other database need only maintain differences from the main config, and there's less problem with keeping the two config databases in sync.
All this reminds me of another wish/gripe of mine, though not one to address in RE24: cn=config modifies input data and adds default values to the user-provided data. I'd much prefer it to only do ordinary attribute normalization.
I've suggested ;x-original attribute options or something to show what was really written, but a cleaner alternative would be to have two config databases: One database with just the data stored by the admin, and one read-only in-memory which back-config builds from the stored data. Changes to inherited data would be iffy though, since they'll apply to several databases and may need to be reverted in early databases if they fail in late ones.
Back to this discussion, that would also allow for variables and simple conditionals to be stored in the read-write database, which would be expanded when copied to the read-only config database. Also if back-config builds the real configuration from several entries through inheritance anyway, that might be expanded to build the config from multiple config trees - the main config + the server-specific config as above. No, definitely not for RE24:-)
Hallvard B Furuseth wrote:
Howard Chu writes:
The other alternative, which is much simpler to implement, is just to add a suffixmassage/rewrite keyword to the syncrepl config, allowing it to pull from a particular remote base and map it to the local base. Then it's up to the sysadmin to create a complete cn=config hierarchy somewhere else on the master server and let the slaves pick it up. That would address all of the issues of differentiation, at the cost of a little bit of redundancy on the master.
Can translucent be made to combine with back-relay and rwm? Then the other database need only maintain differences from the main config, and there's less problem with keeping the two config databases in sync.
Translucent is currently a bit too limited here; it only allows customizing of entries that exist in the master. It doesn't allow creating new entries that only exist in the local/translucent DB.
All this reminds me of another wish/gripe of mine, though not one to address in RE24: cn=config modifies input data and adds default values to the user-provided data. I'd much prefer it to only do ordinary attribute normalization.
I would have preferred not to generate the default values either, but as Ando frequently reminds me, relying on unstated defaults is error-prone...
I've suggested ;x-original attribute options or something to show what was really written, but a cleaner alternative would be to have two config databases: One database with just the data stored by the admin, and one read-only in-memory which back-config builds from the stored data. Changes to inherited data would be iffy though, since they'll apply to several databases and may need to be reverted in early databases if they fail in late ones.
Back to this discussion, that would also allow for variables and simple conditionals to be stored in the read-write database, which would be expanded when copied to the read-only config database. Also if back-config builds the real configuration from several entries through inheritance anyway, that might be expanded to build the config from multiple config trees - the main config + the server-specific config as above. No, definitely not for RE24:-)
Now that sounds like overkill... We also want something that we can easily document and explain to people....
Howard Chu writes:
Hallvard B Furuseth wrote:
Can translucent be made to combine with back-relay and rwm? Then the other database need only maintain differences from the main config, and there's less problem with keeping the two config databases in sync.
Translucent is currently a bit too limited here; it only allows customizing of entries that exist in the master. It doesn't allow creating new entries that only exist in the local/translucent DB.
Oh well. One could set up a cron job which merges data "by hand" then.
All this reminds me of another wish/gripe of mine, though not one to address in RE24: cn=config modifies input data and adds default values to the user-provided data. I'd much prefer it to only do ordinary attribute normalization.
I would have preferred not to generate the default values either, but as Ando frequently reminds me, relying on unstated defaults is error-prone...
Unstated in which regard? We've done fine with defaults in slapd.conf, except some defaults needed to be better documented.
Hmm... X-DEFAULT 'value' and X-INHERIT 'source database' extensions to config attribute descriptions would be nice. Spells out defaults more explicitly, instead of having them somewhere in C code.
Though it's a nice feature to be able to read the defaults from cn=config, which is why I suggested the dynamic read-only database with defaults and inherited attributes included. (A control to ask for them might be yet another way, with the X-* options above.)
One of the problems with _storing_ defaults is that the sensible value changes over time, like 'threads'. back-config can end up storing defaults that made sense last decade.
I've suggested ;x-original attribute options or something to show what was really written, but a cleaner alternative would be to have two config databases: One database with just the data stored by the admin, and one read-only in-memory which back-config builds from the stored data. Changes to inherited data would be iffy though, since they'll apply to several databases and may need to be reverted in early databases if they fail in late ones.
Back to this discussion, that would also allow for variables and simple conditionals to be stored in the read-write database, which would be expanded when copied to the read-only config database. Also if back-config builds the real configuration from several entries through inheritance anyway, that might be expanded to build the config from multiple config trees - the main config + the server-specific config as above. No, definitely not for RE24:-)
Now that sounds like overkill... We also want something that we can easily document and explain to people....
Seems simple enough to me except the "you don't need to know/use this" parts. Simpler than a "show defaults" control. And simpler than to figure out where a value comes from - the sysadmin, back-config default, or back-config cleverness like slapd's removing inherited defaults from stored limits.
Howard Chu writes:
Hallvard B Furuseth wrote:
[Rearranging a little]
I'm sure there are good reasons for plenty of other things to differ, while config-replication could still be useful. Depends on how flexible partial config-replication is intended to be.
Since there is currently no support at all, I think it's important to get something usable first, and worry about those other cases later.
Yes, I'm not suggesting to delay everything until something wonderfully flexible can be implemented all at once. Though if something flexible can be done just as easily, that's of course nice. Otherwise, just consider these cases to keep in mind. An inflexible design might be cumbersome to extend, except by making a new and independent feature. (Such as the cn=config + suffixmassage you just suggested:-)
Mostly I'm also not talking about scenarios which are relevant for our site today, though some might become relevant someday.
And things like <authz-policy> and <allow>. Security settings, if you run a master inside a well protected subnet and partial slaves on more open ones.
It would be a mistake to run any servers with lesser security settings than any other server, if all of them are sharing data.
Which they might not do.
But when they do: Machines can still have different physical protection, or may be set up specially, thus you may be able to trust entities on some machines you'd rather not trust on others. In particular trust with write/admin access on the master. E.g. authz-regexp mapping an ldapi:// uid/gid to an admin DN. Or, I expect, (a particular user on) a particular network address, when both peers are inside a safe subnet.
Master and slave can share some but not all databases, and you might want to replicate config of those they share - but this way the database numbers will differ. Might not share all related schema either.
It would most likely be a mistake not to share schema. Of course, there's nothing preventing us from putting ServerMatch on schema entries too. And no, the database numbers would be the same - all of the database configs would be replicated, but some of them would be inactive on various servers.
Unless you have different servers for different purposes, which share _some_ data - e.g. a database with user/group info. Though then it may be about time to give up the idea of replicating config. It might only be feasible to replicate the config of the shared database anyway.
But a simple case is slave = master + schema/data under development. An inactive database in the master solves the numbering, but I'd likely prefer to keep schema under development out of the master.
<threads>, cache settings, <argsfile>, etc. if your servers run on different OSes. Which can be useful so that if an OS-specific problem hits one server, others are in no danger.
Threads, maybe; cache settings, perhaps. argsfile is just a command line parameter, so not relevant.
Sorry, I meant pidfile. It can be OS-specific where to write it - e.g. /var/run/(openldap/)slapd.pid for RedHat Linux's /etc/rc scripts.
Most of the sites we work with deploy identical hardware for their pools of servers. The most common form of load balancing is round-robin DNS, which is only "fair" if all of the servers are equivalent in their load handling capability.
Yes, then the poorest server needs to be good enough for its task. Which might or might not be a problem.
In our case, failover is the primary reason for multiple servers, since I think a single server is currently good enough as far as load is concerned. We'd solve load problems by throwing more hardware at the problem. But then, this is not relevant for us currently anyway. We've considered multiple platforms and not (yet) bothered; it's just on the nice-to-have list.
Hallvard B Furuseth wrote:
Unless you have different servers for different purposes, which share _some_ data - e.g. a database with user/group info. Though then it may be about time to give up the idea of replicating config. It might only be feasible to replicate the config of the shared database anyway.
Indeed. Most of the cases you're talking about are cases where it makes no sense to talk about shared config. Let's acknowledge that those cases exist, and are not the cases of interest here, and ignore them. If you want to have distinct settings on each server, then go manage them distinctly; there's nothing else to talk about there.
But if you want to have a number of servers with nearly identical layout, as I already described in the example in my first post, then replication of cn=config is attractive because we can leverage the uniformity and reduce administrative overhead. If there is no uniformity to leverage, then too bad.
Howard Chu writes:
Hallvard B Furuseth wrote:
Unless you have different servers for different purposes, which share _some_ data - e.g. a database with user/group info. Though then it may be about time to give up the idea of replicating config. It might only be feasible to replicate the config of the shared database anyway.
Indeed. Most of the cases you're talking about are cases where it makes no sense to talk about shared config.
I disagree with that; the configs can still be mostly identical. However:
Let's acknowledge that those cases exist, and are not the cases of interest here, and ignore them. If you want to have distinct settings on each server, then go manage them distinctly; there's nothing else to talk about there.
Absolutely. In particular since I'm not volunteering to implement it.
BTW, I can think of one other use of replicated config: Support. The site admin could do at least some config updates without having to log in on each server host. Your suffixmassage suggestion should be perfect for that.
Howard Chu wrote:
We'd also want these controls for syncrepl stanzas. (Too bad the patch to turn syncrepl into an overlay was never committed....)
That was probably a good thing, since the patch created a lot of overhead to support slurpd logging as well (its original purpose was abstracting from the replication mechanism). It could easily be revitalized with only syncrepl in mind (and would probably be beneficial to slapd anyway).
p.
Pierangelo Masarati wrote:
Howard Chu wrote:
We'd also want these controls for syncrepl stanzas. (Too bad the patch to turn syncrepl into an overlay was never committed....)
That was probably a good thing, since the patch created a lot of overhead to support slurpd logging as well (its original purpose was abstracting from the replication mechanism). It could easily be revitalized with only syncrepl in mind (and would probably be beneficial to slapd anyway).
We should consider this for 2.5, I think.
On 27.05.2009 05:35, Howard Chu wrote:
So, while we can fully replicate cn=config for the case where all syncrepl participants are masters/peers, things are still a bit sticky if we only want the replicas to remain in slave mode.
Instead of going thru complicated mapping/virtual directory redirections, it seems to me that all we need is to tag certain config objects with the serverIDs to which they apply. As such, I'm considering adding an olcServerMatch attribute to the olcDatabaseConfig and olcOverlayConfig objectclasses. This would take a regexp to match against the current server ID; if the pattern matches then the config entry is processed otherwise it is ignored. This attribute would be absent/empty by default, making the entry always enabled.
I like the sound of this :)
Just one comment: consider automating configuration updates (via an administration interface or a script). Automation would be easier if the olcServerMatch attribute accepted multiple values, so we could just add one value per serverID we want to match, instead of munging a single regexp.
Clearly, this could lead to a long list of values for setups with many different serverIDs, but lots of setups don't go that far.
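As a sketch, a multi-valued olcServerMatch (still hypothetical, like everything above) would let a script enable a server with one atomic modify instead of rewriting a regexp:

```ldif
# Hypothetical: one olcServerMatch value per serverID; adding or deleting
# a single value enables/disables the overlay for that server.
dn: olcOverlay={0}syncprov,olcDatabase={1}hdb,cn=config
changetype: modify
add: olcServerMatch
olcServerMatch: 1
olcServerMatch: 2
```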
My $0.02.
Regards, Jonathan