Re: (ITS#6641) Syncrepl failure with 'overlay unique' - openldap-bugs

7 Sep 2010


On Tue, Sep 07, 2010 at 05:09:07AM -0700, Howard Chu wrote:
...
...
> We've talked about doing this isolation in the first refresh upon slapd
> startup. That might still be a good idea.
It would certainly help to keep the apparent promises made by things
like the uniqueness overlay. Alternatively you could take the view
that the data will converge eventually and that is all that the LDAP
standards promise.
...
> But that reminds me of the joys of being a Sun sysadmin back in the 1980s,
> when Sun's boot scripts always started their NFS client before starting
> their NFS server. If two machines cross-mounted each other's filesystems
> and both were booting at the same time they would hang, each waiting for
> the other's NFS server to respond to their mount request.
I remember that - one of the many reasons for switching to
automounters (along with their own set of problems)... The alternative
was 'soft' mounts, which may be a better model for solving the mirrormode
problem.
...
> Mirrormode and multimaster bootstrapping become a lot harder if you
> implement this type of isolation during startup refresh.
I was originally going to suggest that servers should not listen for
connections until the first refresh completes, but that would indeed
cause the deadlock you describe. How about having master servers
listen on an extra port which is used purely for replication and *is*
available immediately? The main LDAP port would thus remain closed
until the server is synced-up, making it much easier for
load-balancers to do the right thing. [This should really be a
separate discussion as it is not directly related to the bug.]
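
To make the extra-port idea concrete, here is a rough sketch in Python
rather than anything resembling slapd code. The class GatedServer and
its method names are made up for illustration, and port 0 just asks the
OS for any free port. The point is only the ordering: the replication
listener exists from the start, the client listener appears after the
first refresh.

```python
import socket
import threading


class GatedServer:
    """Hypothetical model: replication port opens at once,
    client port opens only after the first refresh completes."""

    def __init__(self, host="127.0.0.1"):
        self.host = host
        self.synced = threading.Event()
        # Bound immediately, so a peer that is also starting up can
        # always reach us for replication.
        self.repl_sock = self._listen(0)
        # The main (client-facing) port stays closed until we are synced.
        self.client_sock = None

    def _listen(self, port):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind((self.host, port))
        s.listen()
        return s

    def first_refresh_done(self):
        """Call once the initial refresh has finished."""
        if self.client_sock is None:
            self.client_sock = self._listen(0)
        self.synced.set()

    def close(self):
        for s in (self.repl_sock, self.client_sock):
            if s is not None:
                s.close()
```

Two such servers starting together can each connect to the other's
replication port straight away, so neither waits on the other, while a
load-balancer probing the client port sees it closed until
first_refresh_done() has been called.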
...
...
> Doing it on every refresh seems far more problematic, because without some
> type of multi-version concurrency control, that means making the server
> non-responsive until the refresh completes.
That may not be a problem with refresh-and-persist, as in normal
circumstances I would expect updates to arrive at the consumer in the
same order they hit the supplier (so this bug could not trigger). More
difficult for scheduled refresh mode though. Could the consumer server
simply write-lock every entry involved in the refresh while it processes
the list, and then commit the whole lot in one DB transaction?
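
A toy illustration of what I mean, in Python with an in-memory dict
standing in for the database. Everything here (Store, apply_one,
apply_refresh, the "uid" attribute) is invented for the example, and a
single coarse lock stands in for the per-entry write locks. It assumes
the failure comes from a refresh delivering entries in a different
order from the one in which the supplier changed them.

```python
import threading


class UniquenessError(Exception):
    pass


class Store:
    """Hypothetical consumer store with a uniqueness rule on one attribute."""

    def __init__(self, unique_attr="uid"):
        self.unique_attr = unique_attr
        self.entries = {}              # dn -> dict of attributes
        self.lock = threading.Lock()   # simplification: one lock, not per-entry

    def _check_unique(self, entries):
        seen = {}
        for dn, attrs in entries.items():
            value = attrs.get(self.unique_attr)
            if value is None:
                continue
            if value in seen:
                raise UniquenessError(
                    f"{value!r} used by both {seen[value]} and {dn}")
            seen[value] = dn

    def apply_one(self, dn, attrs):
        """Apply a single change and check uniqueness immediately."""
        with self.lock:
            candidate = dict(self.entries)
            candidate[dn] = attrs
            self._check_unique(candidate)
            self.entries = candidate

    def apply_refresh(self, changes):
        """Apply a whole refresh, checking uniqueness only on the end state."""
        with self.lock:
            candidate = dict(self.entries)
            for dn, attrs in changes:
                if attrs is None:
                    candidate.pop(dn, None)   # None marks a deletion
                else:
                    candidate[dn] = attrs
            self._check_unique(candidate)
            self.entries = candidate          # the single "commit"
```

Suppose the consumer holds cn=a with uid 1, and on the supplier cn=a
was changed to uid 2 before cn=b was created with uid 1. If the refresh
delivers cn=b first, apply_one() rejects it, whereas apply_refresh()
with both changes succeeds because only the final state is checked.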
Andrew
-- 
-----------------------------------------------------------------------
|                 From Andrew Findlay, Skills 1st Ltd                 |
| Consultant in large-scale systems, networks, and directory services |
|     http://www.skills-1st.co.uk/                +44 1628 782565     |
-----------------------------------------------------------------------