Full_Name: Quanah Gibson-Mount
Version: 2.3.37
OS: Linux 2.6
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (38.104.138.6)
Several customers have found that delta-syncrepl locks up after a time. Today it
occurred again, and this time we were able to gather some logging data. The
last operation logged was a MOD op. This matches past lockups, in which the last
operation was also either a MOD or an ADD.
The following files will be uploaded to the ftp site, where # will be the
assigned ITS number.
#-dbstat.delta.out.2007-10-01
which is the db_stat information for the accesslog DB
#-db_stat.out.2007-10-01
which is the db_stat information for the main DB
#-pstak.out.2007-10-01
which is the pstack information for the slapd process
Unfortunately no GDB info was retrieved this time, and reportedly gcore hung.
The most interesting part is I see no WRITE locks held in either DB, and all the
client threads are hung in a mutex.
jclarke(a)linagora.com wrote:
> I have tested this on both 2.3.38 and HEAD (same version on all 3 servers), and
> behaviour is quite different, though the end result is the same.
Due to multiple limitations, this is not expected to work in 2.3 at all.
> On HEAD, things are quite different:
> request done: ld 0x82c2228 msgid 2
> do_syncrep2: rid=7 LDAP_RES_SEARCH_RESULT
> nonpresent_callback: rid=7 got UUID b5797e8c-0486-102c-83e0-79137da6179f, dn
> dc=ossa,dc=linagora,dc=org
> nonpresent_callback: rid=7 got UUID b5a2f834-0486-102c-83e1-79137da6179f, dn
> uid=root,dc=ossa,dc=linagora,dc=org
> nonpresent_callback: rid=7 got UUID b5a31526-0486-102c-83e2-79137da6179f, dn
> uid=replicator,dc=ossa,dc=linagora,dc=org
> adresse de be_modify : 80c8840
> null_callback : error code 0x32
> syncrepl_updateCookie: rid=7 be_modify failed (50)
This is Insufficient Access. You have not configured the glue databases with
identical rootDNs, which is clearly documented as a requirement for glued databases.
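For anyone reading the log excerpt, the 0x32 printed by null_callback and the
(50) printed by syncrepl_updateCookie are the same result code. A minimal
sketch of the lookup follows; the table is a small subset of the result codes
in RFC 4511, and the helper name describe_result is only illustrative, not
part of OpenLDAP.

```python
# Small subset of the LDAP result codes defined in RFC 4511, Appendix A.
LDAP_RESULT_CODES = {
    0: "success",
    32: "noSuchObject",
    50: "insufficientAccessRights",
}


def describe_result(code: int) -> str:
    """Render a result code in decimal and hex with its RFC 4511 name."""
    name = LDAP_RESULT_CODES.get(code, "unknown (not in this subset)")
    return f"{code} (0x{code:02x}): {name}"


# "null_callback : error code 0x32" and "be_modify failed (50)" agree:
print(describe_result(0x32))  # 50 (0x32): insufficientAccessRights
```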
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Full_Name: Jonathan Clarke
Version: 2.3.38 and HEAD (but with slightly different results)
OS: Linux
URL: ftp://ftp.openldap.org/incoming/unwanted-deletes-syncrepl-glue.tar.gz
Submission from: (NULL) (213.41.243.192)
Hi folks,
I've come across an issue with a server using the glue overlay with one of its
subordinate databases syncrepl'd. There are two problems:
1) updating the root database's contextCSN
2) when replicating this whole server with syncrepl (to a 3rd server), certain
updates cause many entries to be deleted from the consumer.
The following "schema" should describe this setup more clearly (names TOP,
MIDDLE and BOTTOM are for easy reference):
TOP:
|----------------------------|
| One bdb backend: |
| dc=ossa,dc=linagora,dc=org |
|----------------------------|
|
MIDDLE: |
|-------------------------------|
| Two bdb backends + glue: |
| 1) dc=ossa,dc=linagora,dc=org |
| subordinate |
| syncrepl from above server |
| 2) dc=linagora,dc=org |
| 'master' for this branch |
| |
| syncprov overlay |
|-------------------------------|
|
BOTTOM: |
|-------------------------------|
| One bdb backend: |
| dc=linagora,dc=org |
| syncrepl from above server |
|-------------------------------|
All config files, and some sample data sets, are in the archive at the URL
above.
I have tested this on both 2.3.38 and HEAD (same version on all 3 servers), and
behaviour is quite different, though the end result is the same.
On 2.3.38:
1) Set up all three servers, make sure they're sync'ed.
2) Modify some attribute on the TOP server (I add a description to the root DN,
dc=ossa,dc=linagora,dc=org)
3) Watch this modification propagate to the MIDDLE server. The contextCSN in
dc=linagora,dc=org is not updated, but the one in dc=ossa,dc=linagora,dc=org is
(it equals the entryCSN of the entry I modified). The output is the following
with loglevel=stats+sync:
8>---------------------------------------------------------------
request done: ld 0x822ec20 msgid 1
do_syncrep2: rid 007 LDAP_RES_INTERMEDIATE - SYNC_ID_SET
syncrepl_entry: rid 007 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
syncrepl_entry: rid 007 be_search (0)
syncrepl_entry: rid 007 dc=ossa,dc=linagora,dc=org
syncrepl_entry: rid 007 be_modify (0)
request done: ld 0x822ec20 msgid 2
do_syncrep2: rid 007 LDAP_RES_SEARCH_RESULT
8>---------------------------------------------------------------
4) Watch the BOTTOM server (see schema above) do its syncrepl and delete some
entries below the glued database (glued on the MIDDLE server, not on this one).
The output is the following with loglevel=stats+sync:
8>---------------------------------------------------------------
request done: ld 0x822a320 msgid 1
request done: ld 0x822a320 msgid 2
do_syncrep2: rid 888 LDAP_RES_SEARCH_RESULT
request done: ld 0x822a320 msgid 1
do_syncrep2: rid 888 LDAP_RES_INTERMEDIATE - SYNC_ID_SET
syncrepl_entry: rid 888 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
syncrepl_entry: rid 888 be_search (0)
syncrepl_entry: rid 888 dc=linagora,dc=org
syncrepl_entry: rid 888 be_modify (0)
syncrepl_entry: rid 888 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
syncrepl_entry: rid 888 be_search (0)
syncrepl_entry: rid 888 dc=ossa,dc=linagora,dc=org
syncrepl_entry: rid 888 be_modify (0)
request done: ld 0x822a320 msgid 2
do_syncrep2: rid 888 LDAP_RES_SEARCH_RESULT
syncrepl_del_nonpresent: rid 888 be_delete
uid=replicator,dc=ossa,dc=linagora,dc=org (0)
syncrepl_del_nonpresent: rid 888 be_delete uid=root,dc=ossa,dc=linagora,dc=org
(0)
8>---------------------------------------------------------------
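To make the unwanted deletes easier to spot in a longer log, something like
the following sketch could be used to list them. It assumes the log has been
saved as text with the same line wrapping as above, and that DNs contain no
spaces (true for this data set, not in general); the function name
deleted_entries is hypothetical.

```python
import re

# Matches one syncrepl_del_nonpresent record once wrapped lines are rejoined.
# \S+ for the DN is an assumption: it only works for DNs without spaces.
DELETE_RE = re.compile(
    r"syncrepl_del_nonpresent: rid (\d+) be_delete (\S+) \((\d+)\)"
)


def deleted_entries(log_text):
    """Return (rid, dn, result_code) for each be_delete found in the log."""
    # Long log lines were wrapped in the mail; joining on a space undoes that.
    flat = " ".join(line.strip() for line in log_text.splitlines())
    return [(rid, dn, int(rc)) for rid, dn, rc in DELETE_RE.findall(flat)]


sample = """\
do_syncrep2: rid 888 LDAP_RES_SEARCH_RESULT
syncrepl_del_nonpresent: rid 888 be_delete
uid=replicator,dc=ossa,dc=linagora,dc=org (0)
syncrepl_del_nonpresent: rid 888 be_delete uid=root,dc=ossa,dc=linagora,dc=org
(0)
"""

for rid, dn, rc in deleted_entries(sample):
    print(rid, dn, rc)
```

On the excerpt above this reports the two entries under
dc=ossa,dc=linagora,dc=org, both deleted with result code 0.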
On HEAD, things are quite different:
1) Start the TOP server.
2) Start the MIDDLE server. Errors happen immediately, on the first sync attempt.
The output is the following with loglevel=stats+sync:
8>---------------------------------------------------------------
slapd starting
request done: ld 0x82c2228 msgid 1
syncrepl_entry: rid=7 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
syncrepl_entry: rid=7 inserted UUID b5797e8c-0486-102c-83e0-79137da6179f
syncrepl_entry: rid=7 be_search (32)
syncrepl_entry: rid=7 dc=ossa,dc=linagora,dc=org
syncrepl_entry: rid=7 be_add (0)
syncrepl_entry: rid=7 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
syncrepl_entry: rid=7 inserted UUID b5a2f834-0486-102c-83e1-79137da6179f
syncrepl_entry: rid=7 be_search (0)
syncrepl_entry: rid=7 uid=root,dc=ossa,dc=linagora,dc=org
syncrepl_entry: rid=7 be_add (0)
syncrepl_entry: rid=7 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
syncrepl_entry: rid=7 inserted UUID b5a31526-0486-102c-83e2-79137da6179f
syncrepl_entry: rid=7 be_search (0)
syncrepl_entry: rid=7 uid=replicator,dc=ossa,dc=linagora,dc=org
syncrepl_entry: rid=7 be_add (0)
request done: ld 0x82c2228 msgid 2
do_syncrep2: rid=7 LDAP_RES_SEARCH_RESULT
nonpresent_callback: rid=7 got UUID b5797e8c-0486-102c-83e0-79137da6179f, dn
dc=ossa,dc=linagora,dc=org
nonpresent_callback: rid=7 got UUID b5a2f834-0486-102c-83e1-79137da6179f, dn
uid=root,dc=ossa,dc=linagora,dc=org
nonpresent_callback: rid=7 got UUID b5a31526-0486-102c-83e2-79137da6179f, dn
uid=replicator,dc=ossa,dc=linagora,dc=org
adresse de be_modify : 80c8840
null_callback : error code 0x32
syncrepl_updateCookie: rid=7 be_modify failed (50)
request done: ld 0x82c2228 msgid 1
syncrepl_entry: rid=7 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
syncrepl_entry: rid=7 inserted UUID b5797e8c-0486-102c-83e0-79137da6179f
dn_callback : entries have identical CSN dc=ossa,dc=linagora,dc=org ours
20071001162546.703481Z#000000#000#000000, new
20071001162546.703481Z#000000#000#000000
syncrepl_entry: rid=7 be_search (0)
syncrepl_entry: rid=7 dc=ossa,dc=linagora,dc=org
syncrepl_entry: rid=7 entry unchanged, ignored (dc=ossa,dc=linagora,dc=org)
syncrepl_entry: rid=7 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
syncrepl_entry: rid=7 inserted UUID b5a2f834-0486-102c-83e1-79137da6179f
dn_callback : entries have identical CSN uid=root,dc=ossa,dc=linagora,dc=org
ours 20071001162546.975377Z#000000#000#000000, new
20071001162546.975377Z#000000#000#000000
syncrepl_entry: rid=7 be_search (0)
syncrepl_entry: rid=7 uid=root,dc=ossa,dc=linagora,dc=org
syncrepl_entry: rid=7 entry unchanged, ignored
(uid=root,dc=ossa,dc=linagora,dc=org)
syncrepl_entry: rid=7 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
syncrepl_entry: rid=7 inserted UUID b5a31526-0486-102c-83e2-79137da6179f
dn_callback : entries have identical CSN
uid=replicator,dc=ossa,dc=linagora,dc=org ours
20071001162546.976133Z#000000#000#000000, new
20071001162546.976133Z#000000#000#000000
syncrepl_entry: rid=7 be_search (0)
syncrepl_entry: rid=7 uid=replicator,dc=ossa,dc=linagora,dc=org
syncrepl_entry: rid=7 entry unchanged, ignored
(uid=replicator,dc=ossa,dc=linagora,dc=org)
8>---------------------------------------------------------------
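The "entries have identical CSN" lines compare the local and incoming CSN of
each entry. As a rough sketch of what those strings contain, the snippet below
splits one into its four '#'-separated fields. The field names (count, sid,
mod) and the choice to read the last three fields as hexadecimal are my
assumptions about the format, not something taken from the slapd source.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: datetime
    count: int  # assumed: change counter within the same timestamp
    sid: int    # assumed: server/replica ID
    mod: int    # assumed: modification number


def parse_csn(text: str) -> CSN:
    """Split a CSN string such as the ones in the log into its fields."""
    stamp, count, sid, mod = text.split("#")
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ")
    when = when.replace(tzinfo=timezone.utc)
    # Assumption: the three trailing fields are hexadecimal numbers.
    return CSN(when, int(count, 16), int(sid, 16), int(mod, 16))


ours = parse_csn("20071001162546.703481Z#000000#000#000000")
new = parse_csn("20071001162546.703481Z#000000#000#000000")
print(ours.timestamp.isoformat(), ours == new)
```

Because the tuples compare field by field, ours == new is true exactly when
the two CSN strings describe the same change, which is the case slapd reports
as "entry unchanged, ignored" above.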
Obviously, the desired result is that entries are not deleted from the BOTTOM
server when replication happens. I'm a bit at a loss as to the logic behind
these updates, and how to go about correcting them.
I tried applying a patch backported from HEAD to 2.3.38 that makes syncrepl
update the contextCSN in the real root (not the bdb database root). It works in
that the contextCSN is updated correctly, but replication to BOTTOM still has
unwanted deletes. The patch is in the archive attached
(update-root-contextCSN.diff) and corresponds to revisions 1.308 and 1.309 in
syncrepl.c CVS log.
I am happy to provide any further information needed: logs, testing, gdb
output, etc. Any help or pointers are most welcome!
Thanks in advance,
Jon