Hello,
I am having a very odd problem after upgrading from openldap 2.4.16 (stable)
I have a syncrepl provider/ consumer setup using openldap 2.4.19 (stable) and when i start an empty consumer, in the provider logs i am getting:
Nov 4 17:07:51 producer slapd[7250]: [ID 702911 local4.debug] @(#) $OpenLDAP: slapd 2.4.19 (Nov 4 2009 12:53:47) $ Nov 4 17:07:51 producer @qgdevpro:/home/govops/build.local/openldap-2.4.19/servers/slapd Nov 4 17:07:51 producer slapd[7286]: [ID 100111 local4.debug] slapd starting Nov 4 17:08:04 producer slapd[7286]: [ID 848112 local4.debug] conn=0 fd=16 ACCEPT from IP=10.0.0.2:53951 (IP=10.0.0.1:389) Nov 4 17:08:04 producer slapd[7286]: [ID 215403 local4.debug] conn=0 op=0 BIND dn="cn=replicator,dc=example,dc=org" method=128 Nov 4 17:08:04 producer slapd[7286]: [ID 600343 local4.debug] conn=0 op=0 BIND dn="cn=replicator,dc=example,dc=org" mech=SIMPLE ssf=0 Nov 4 17:08:04 producer slapd[7286]: [ID 588225 local4.debug] conn=0 op=0 RESULT tag=97 err=0 text= Nov 4 17:08:04 producer slapd[7286]: [ID 469902 local4.debug] conn=0 op=1 SRCH base="dc=example,dc=org" scope=2 deref=0 filter="(objectClass=*)" Nov 4 17:08:04 producer slapd[7286]: [ID 744844 local4.debug] conn=0 op=1 SRCH attr=* + Nov 4 17:08:04 producer slapd[7286]: [ID 832699 local4.debug] conn=0 op=1 SEARCH RESULT tag=101 err=0 nentries=0 text= Nov 4 17:08:04 producer slapd[7286]: [ID 218904 local4.debug] conn=0 op=2 UNBIND Nov 4 17:08:04 producer slapd[7286]: [ID 952275 local4.debug] conn=0 fd=16 closed
on the consumer i get a lot of (one set after each refresh attempt) :
Nov 4 17:41:04 consumer slapd[7660]: [ID 365351 local4.debug] do_syncrep2: rid=001 LDAP_RES_SEARCH_RESULT Nov 4 17:41:04 consumer slapd[7660]: [ID 664938 local4.debug] do_syncrepl: rid=001 rc -2 retrying
Important part being "nentries=0", i run the equivalent command at the command propmt of the consumer, ie:
ldapsearch -b dc=example,dc=org -D 'cn=replicator,dc=example,dc=org' -w <password> -s sub -x '(objectclass=*) ' '* +'
I get the result i would expect above, ie:
Nov 4 17:20:14 producer slapd[7286]: [ID 848112 local4.debug] conn=16 fd=16 ACCEPT from IP=10.0.0.2:54049 (IP=10.0.0.1:389) Nov 4 17:20:14 producer slapd[7286]: [ID 215403 local4.debug] conn=16 op=0 BIND dn="cn=replicator,dc=example,dc=org" method=128 Nov 4 17:20:14 producer slapd[7286]: [ID 600343 local4.debug] conn=16 op=0 BIND dn="cn=replicator,dc=example,dc=org" mech=SIMPLE ssf=0 Nov 4 17:20:14 producer slapd[7286]: [ID 588225 local4.debug] conn=16 op=0 RESULT tag=97 err=0 text= Nov 4 17:20:14 producer slapd[7286]: [ID 469902 local4.debug] conn=16 op=1 SRCH base="dc=example,dc=org" scope=2 deref=0 filter="(objectClass=*)" Nov 4 17:20:14 producer slapd[7286]: [ID 744844 local4.debug] conn=16 op=1 SRCH attr=* + Nov 4 17:21:03 producer slapd[7286]: [ID 832699 local4.debug] conn=16 op=1 SEARCH RESULT tag=101 err=0 nentries=85611 text= Nov 4 17:21:03 producer slapd[7286]: [ID 218904 local4.debug] conn=16 op=2 UNBIND Nov 4 17:21:03 producer slapd[7286]: [ID 952275 local4.debug] conn=16 fd=16 closed
Note here i get nentries=85611 (with a phole bunch of results) for what is essentialy the same query.
I'd appreciate any feedback, surely i must be missing something really obvious?
My config is below.
Cheers Brett
<< begin of provider slapd >>
###################################################################### # global options ######################################################################
include /usr/local/openldap/etc/openldap/schema/core.schema include /usr/local/openldap/etc/openldap/schema/cosine.schema include /usr/local/openldap/etc/openldap/schema/inetorgperson.schema
modulepath /usr/local/openldap/libexec/openldap #moduleload back_ldbm.la #moduleload back_monitor.la
pidfile /var/openldap/run/slapd.pid argsfile /var/openldap/run/slapd.args
# threads for faster concurrent slapadd tool-threads 4
###################################################################### # global database ACLs ######################################################################
# allow replicator to read all access to * by dn.exact="cn=replicator,dc=example,dc=org" read by * break
[ ..etc.. ]
# default rules access to * by self write by * read
###################################################################### # logging configuration ######################################################################
# testing loglevel stats sync
###################################################################### # primary database ######################################################################
database hdb suffix "dc=example,dc=org"
directory /var/openldap/data rootdn "cn=Manager, dc=example,dc=org" rootpw <password>
checkpoint 2000 15 cachesize 20000 idlcachesize 60000 cachefree 4000
# unlimited dn cache (openldap 2.4.16 and above) dncachesize 0
# General Indexes (there is more than this - but they are all the same form) index default pres,eq index objectClass,uid,mail pres,eq index cn,sn,ou,streetAddress,givenName,title,telephoneNumber eq,sub
# Indices for Syncrepl index entryCSN,entryUUID eq
# allow replicator DN have unlimited searches (per-database) limits dn.exact="cn=replicator,dc=example,dc=org" time=unlimited size=unlimited
###################################################################### # replication information - monitor backend ######################################################################
database monitor
<< end of provider slapd >>
<< below snipit added to above on the consumer only, just before "database monitor", but after the rest of the config >>
###################################################################### # replication information - only for consumer ######################################################################
# Where we pull data from syncrepl rid=001 provider=ldap://provider.example.org:389 bindmethod=simple binddn="cn=replicator,dc=example,dc=org" credentials=<password> searchbase="dc=example,dc=org" filter=(objectclass=*) attrs="*,+" schemachecking=off scope=sub type=refreshAndPersist retry="60 +"
# not using accesslog atm - debugging initial refresh # logbase="cn=accesslog" # logfilter="(&(objectClass=auditWriteObject)(reqResult=0))" # syncdata=accesslog
# Refer all rights to master updateref ldap://provider.example.org:389
On Wed, Nov 4, 2009 at 5:48 PM, Brett @Google brett.maxfield@gmail.comwrote:
Hello,
I am having a very odd problem after upgrading from openldap 2.4.16 (stable)
In the interests of full disclosure, I just did a slapcat / slapadd and the problem has gone away.
I'm not sure is the problem was created because of starting a 2.4.19 slapd, over the top of a database loaded by 2.4.16, or due to the obscure bug, occasionally mentioned, where syncrepl can gets itself into a odd state and wont converge.
The syncrepl state information on the producer would have been created with the slapadd -w option under 2.4.16, but executed under 2.4.19. I thought originally because the berkeleydb version (4.8) had not changed, this would be ok.
Out of interest, has the syncrepl UUID / CSN format changed much between 2.4.16 stable and 2.4.19 stable ?
Cheers Brett
Brett @Google wrote:
On Wed, Nov 4, 2009 at 5:48 PM, Brett @Google <brett.maxfield@gmail.com mailto:brett.maxfield@gmail.com> wrote:
Hello, I am having a very odd problem after upgrading from openldap 2.4.16 (stable)
In the interests of full disclosure, I just did a slapcat / slapadd and the problem has gone away.
I'm not sure is the problem was created because of starting a 2.4.19 slapd, over the top of a database loaded by 2.4.16, or due to the obscure bug, occasionally mentioned, where syncrepl can gets itself into a odd state and wont converge.
The syncrepl state information on the producer would have been created with the slapadd -w option under 2.4.16, but executed under 2.4.19. I thought originally because the berkeleydb version (4.8) had not changed, this would be ok.
Out of interest, has the syncrepl UUID / CSN format changed much between 2.4.16 stable and 2.4.19 stable ?
There have been no format changes. You should have been able to run 2.4.19 directly on the original database. No idea what issue you ran into.
On Thu, Nov 5, 2009 at 4:20 AM, Howard Chu hyc@symas.com wrote:
Out of interest, has the syncrepl UUID / CSN format changed much between 2.4.16 stable and 2.4.19 stable ?
There have been no format changes. You should have been able to run 2.4.19 directly on the original database. No idea what issue you ran into.
The problem came back, but I think i found the root cause. We have a batch script which starts slapd, it has always been the same on both the producers and consumers. The "-w" slapadd option was being provided in this script, for both the consumer and the producer. This has been the case for quite some time (2.3.x and 2.4.16), but it only recently became an issue after the upgrade to 2.4.19. The sync problem would then appear or disappear randomly, depending of which of the provider or consumer was loaded first.
Is it patently wrong to provide the "-w" option on the consumer, or is this a bug ?
At the very least it is probably a usability "gotcha", if not a bug.
If this is an issue perhaps slapadd should refuse to do "-w" on a shadow context, or warn and not actually perform the option ?
Cheers Brett
Brett @Google wrote:
On Thu, Nov 5, 2009 at 4:20 AM, Howard Chu <hyc@symas.com mailto:hyc@symas.com> wrote:
> Out of interest, has the syncrepl UUID / CSN format changed much between > 2.4.16 stable and 2.4.19 stable ? There have been no format changes. You should have been able to run 2.4.19 directly on the original database. No idea what issue you ran into.
The problem came back, but I think i found the root cause. We have a batch script which starts slapd, it has always been the same on both the producers and consumers. The "-w" slapadd option was being provided in this script, for both the consumer and the producer. This has been the case for quite some time (2.3.x and 2.4.16), but it only recently became an issue after the upgrade to 2.4.19. The sync problem would then appear or disappear randomly, depending of which of the provider or consumer was loaded first.
Is it patently wrong to provide the "-w" option on the consumer, or is this a bug ?
Yes, it's generally wrong to use the -w option.
slapadd is intended for use with LDIF that was produced with slapcat. When you are slapcat'ing a database that has already been in use as a sync provider, then there should already be a valid contextCSN in the database and it will already be present in the LDIF. Once you have this LDIF from slapcat you should be able to load it on any other server without using -w.
The only correct usage of -w is when you have an LDIF produced by slapcat on a database that was not being used with replication before, and so is missing the contextCSN. But all the other operational attributes (entryUUID and entryCSN in particular) are present and valid.
If you have an LDIF file that contains no operational attributes, it is very likely that you ought to load it using ldapadd. If your LDIF file did not come from a known good slapd server, then most likely it needs all of the additional validation checks that are performed through ldapadd.
Note that slapadd was changed in 2.4.19 to warn if you're loading a replica and entryCSN or entryUUID are missing.
At the very least it is probably a usability "gotcha", if not a bug.
If this is an issue perhaps slapadd should refuse to do "-w" on a shadow context, or warn and not actually perform the option ?
If the contextCSN was already present, and actually matches the max entryCSN present in the LDIF, then the write that -w performs should be effectively a no-op. If there were no entryCSNs in the LDIF then the value written by -w will just depend on the current time of day when the last entry got processed by slapadd.
On 04/11/2009 08:48, Brett @Google wrote:
Hello,
I am having a very odd problem after upgrading from openldap 2.4.16 (stable)
I have a syncrepl provider/ consumer setup using openldap 2.4.19 (stable) and when i start an empty consumer, in the provider logs i am getting:
Nov 4 17:07:51 producer slapd[7250]: [ID 702911 local4.debug] @(#) $OpenLDAP: slapd 2.4.19 (Nov 4 2009 12:53:47) $ Nov 4 17:07:51 producer @qgdevpro:/home/govops/build.local/openldap-2.4.19/servers/slapd Nov 4 17:07:51 producer slapd[7286]: [ID 100111 local4.debug] slapd starting Nov 4 17:08:04 producer slapd[7286]: [ID 848112 local4.debug] conn=0 fd=16 ACCEPT from IP=10.0.0.2:53951 http://10.0.0.2:53951 (IP=10.0.0.1:389 http://10.0.0.1:389) Nov 4 17:08:04 producer slapd[7286]: [ID 215403 local4.debug] conn=0 op=0 BIND dn="cn=replicator,dc=example,dc=org" method=128 Nov 4 17:08:04 producer slapd[7286]: [ID 600343 local4.debug] conn=0 op=0 BIND dn="cn=replicator,dc=example,dc=org" mech=SIMPLE ssf=0 Nov 4 17:08:04 producer slapd[7286]: [ID 588225 local4.debug] conn=0 op=0 RESULT tag=97 err=0 text= Nov 4 17:08:04 producer slapd[7286]: [ID 469902 local4.debug] conn=0 op=1 SRCH base="dc=example,dc=org" scope=2 deref=0 filter="(objectClass=*)" Nov 4 17:08:04 producer slapd[7286]: [ID 744844 local4.debug] conn=0 op=1 SRCH attr=* + Nov 4 17:08:04 producer slapd[7286]: [ID 832699 local4.debug] conn=0 op=1 SEARCH RESULT tag=101 err=0 nentries=0 text= Nov 4 17:08:04 producer slapd[7286]: [ID 218904 local4.debug] conn=0 op=2 UNBIND Nov 4 17:08:04 producer slapd[7286]: [ID 952275 local4.debug] conn=0 fd=16 closed
on the consumer i get a lot of (one set after each refresh attempt) :
Nov 4 17:41:04 consumer slapd[7660]: [ID 365351 local4.debug] do_syncrep2: rid=001 LDAP_RES_SEARCH_RESULT Nov 4 17:41:04 consumer slapd[7660]: [ID 664938 local4.debug] do_syncrepl: rid=001 rc -2 retrying
Important part being "nentries=0", i run the equivalent command at the command propmt of the consumer, ie:
ldapsearch -b dc=example,dc=org -D 'cn=replicator,dc=example,dc=org' -w <password> -s sub -x '(objectclass=*) ' '* +'
I get the result i would expect above, ie:
Nov 4 17:20:14 producer slapd[7286]: [ID 848112 local4.debug] conn=16 fd=16 ACCEPT from IP=10.0.0.2:54049 http://10.0.0.2:54049 (IP=10.0.0.1:389 http://10.0.0.1:389) Nov 4 17:20:14 producer slapd[7286]: [ID 215403 local4.debug] conn=16 op=0 BIND dn="cn=replicator,dc=example,dc=org" method=128 Nov 4 17:20:14 producer slapd[7286]: [ID 600343 local4.debug] conn=16 op=0 BIND dn="cn=replicator,dc=example,dc=org" mech=SIMPLE ssf=0 Nov 4 17:20:14 producer slapd[7286]: [ID 588225 local4.debug] conn=16 op=0 RESULT tag=97 err=0 text= Nov 4 17:20:14 producer slapd[7286]: [ID 469902 local4.debug] conn=16 op=1 SRCH base="dc=example,dc=org" scope=2 deref=0 filter="(objectClass=*)" Nov 4 17:20:14 producer slapd[7286]: [ID 744844 local4.debug] conn=16 op=1 SRCH attr=* + Nov 4 17:21:03 producer slapd[7286]: [ID 832699 local4.debug] conn=16 op=1 SEARCH RESULT tag=101 err=0 nentries=85611 text= Nov 4 17:21:03 producer slapd[7286]: [ID 218904 local4.debug] conn=16 op=2 UNBIND Nov 4 17:21:03 producer slapd[7286]: [ID 952275 local4.debug] conn=16 fd=16 closed
Note here i get nentries=85611 (with a phole bunch of results) for what is essentialy the same query.
I'd appreciate any feedback, surely i must be missing something really obvious?
My config is below.
Cheers Brett
<< begin of provider slapd >>
###################################################################### # global options ######################################################################
include /usr/local/openldap/etc/openldap/schema/core.schema include /usr/local/openldap/etc/openldap/schema/cosine.schema include /usr/local/openldap/etc/openldap/schema/inetorgperson.schema
modulepath /usr/local/openldap/libexec/openldap #moduleload back_ldbm.la http://back_ldbm.la #moduleload back_monitor.la http://back_monitor.la
pidfile /var/openldap/run/slapd.pid argsfile /var/openldap/run/slapd.args
# threads for faster concurrent slapadd tool-threads 4
###################################################################### # global database ACLs ######################################################################
# allow replicator to read all access to * by dn.exact="cn=replicator,dc=example,dc=org" read by * break
[ ..etc.. ]
# default rules access to * by self write by * read
###################################################################### # logging configuration ######################################################################
# testing loglevel stats sync
###################################################################### # primary database ######################################################################
database hdb suffix "dc=example,dc=org"
directory /var/openldap/data rootdn "cn=Manager, dc=example,dc=org" rootpw <password>
checkpoint 2000 15 cachesize 20000 idlcachesize 60000 cachefree 4000
# unlimited dn cache (openldap 2.4.16 and above) dncachesize 0
# General Indexes (there is more than this - but they are all the same form) index default pres,eq index objectClass,uid,mail pres,eq index cn,sn,ou,streetAddress,givenName,title,telephoneNumber eq,sub
# Indices for Syncrepl index entryCSN,entryUUID eq
# allow replicator DN have unlimited searches (per-database) limits dn.exact="cn=replicator,dc=example,dc=org" time=unlimited size=unlimited
###################################################################### # replication information - monitor backend ######################################################################
database monitor
<< end of provider slapd >>
You don't seem to have "overlay syncprov" on your provider.
Regards, Jonathan
openldap-software@openldap.org