On Friday 27 June 2008 19:26:22 Liutauras Adomaitis wrote:
> hello everybody,
> I'm quite new to OpenLDAP. Actually I've been using it for a few years, but I have no deep knowledge. The problem I'm facing is that my consumer replicas are segfaulting.
There were a number of fixes to syncrepl in 2.4.9.
> My design: I have one master with several branches o=BranchX,dc=example,dc=com. This is the provider. I have several (the number is X-1) replicas, the consumers. Each consumer replicates its own branch o=BranchX,dc=example,dc=com plus one common branch o=BranchMain,dc=example,dc=com. The picture is like this:
>
> Provider:
>   o=BranchMain,dc=example,dc=com
>   o=Branch1,dc=example,dc=com
>   o=Branch2,dc=example,dc=com
>   .....
>   o=BranchX,dc=example,dc=com
>
> Consumer 1:
>   o=BranchMain,dc=example,dc=com
>   o=Branch1,dc=example,dc=com
>
> Consumer 2:
>   o=BranchMain,dc=example,dc=com
>   o=Branch2,dc=example,dc=com
But it seems you have implemented this by using a single database at dc=example,dc=com, with multiple syncrepl statements (one for each subtree that you replicate). As far as I know, this is not supported. Instead, you should consider using a separate database for each syncrepl statement, and glue the databases together by using the 'subordinate' statement in each sub-tree database.
This would look something like this:
database        bdb
suffix          o=BranchMain,dc=example,dc=com
subordinate
syncrepl        ...
[...]

database        bdb
suffix          o=Branch1,dc=example,dc=com
subordinate
syncrepl        ...
[...]

database        bdb
suffix          dc=example,dc=com
syncrepl        ...
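Note that slapd selects the database for an operation by checking the suffixes in the order they appear in slapd.conf, so the subordinate (longer-suffix) databases need to be listed before the superior dc=example,dc=com database, as in the sketch above.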
> At the beginning I had one consumer, which was segfaulting just randomly, once or twice a day. I decided to comment out the syncrepl directives in my conf file, and now it has been running for a day and a half.
I'm running 2.4.9 or 2.4.10 on my own systems at present, but I didn't see any problems like this on small databases with 2.4.8.
> I should mention that after a consumer segfaults I cannot start slapd any more. The only solution I have is to delete the whole contents of the /var/lib/ldap directory (the entire database) and then restart slapd.
Did running database recovery (/etc/init.d/ldap recover) make any difference here?
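In case it helps, recovery would look roughly like this (paths taken from your config; the exact utility name depends on how the Berkeley DB tools are packaged on your system):

  service ldap stop
  # run BDB recovery against the database environment
  db_recover -v -h /var/lib/ldap
  service ldap start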
> If I restart slapd on the old database, the segfault happens again.
> Since this was a small branch, and only one branch, I thought I would debug the problem later. Today I faced the same situation on a bigger consumer: slapd just crashed, and only deleting the database let me start it again.
For a larger database, you should have some database cache specified in the DB_CONFIG file in the database directory.
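As a rough sketch (the cache size here is a placeholder; size it to your data), a minimal DB_CONFIG in the database directory might contain:

  # 256 MB BDB cache in one contiguous region: set_cachesize <gbytes> <bytes> <ncaches>
  set_cachesize 0 268435456 1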
> My systems are Mandriva 2008.1 with slapd version:
> @(#) $OpenLDAP: slapd 2.4.8 (Mar 23 2008 16:49:39) $
>         mandrake@klodia.mandriva.com:/home/mandrake/rpm/BUILD/openldap-2.4.8/servers/slapd
I am considering shipping an official update, most likely to 2.4.10. In the meantime, I have released 2.4.10 to backports for 2008.1. If fixing your configuration doesn't address all your stability problems, you may want to consider upgrading to that package.
> I have one branch running old slapd versions (the ones coming with Mandriva 2007.0), but they seem to work, except that I can replicate only one branch (one rid).
See above; the multiple-database approach (one syncrepl statement per database) would work on 2.3 as well.
> It seems the old slapd doesn't support several rids.
And the new slapd supports multiple syncrepl statements in the same database mainly for multi-master replication, not for the design you've chosen.
> Can anybody help me debug this situation? This configuration is rather new, but I was planning to build all my infrastructure on such a configuration, so segfaulting is a very big issue. The provider (master) configuration is:
> include         /usr/share/openldap/schema/core.schema
> include         /usr/share/openldap/schema/cosine.schema
> include         /usr/share/openldap/schema/corba.schema
> include         /usr/share/openldap/schema/inetorgperson.schema
> include         /usr/share/openldap/schema/nis.schema
> include         /usr/share/openldap/schema/openldap.schema
> include         /usr/share/openldap/schema/samba.schema
> include         /usr/share/openldap/schema/qmail.schema
> include         /etc/openldap/schema/local.schema
> include         /etc/openldap/slapd.access.conf
>
> access to dn.subtree="dc=example,dc=com"
>         by group="cn=Replicator,ou=Group,dc=example,dc=com"
>         by users read
>         by anonymous read
>
> pidfile         /var/run/ldap/slapd.pid
> argsfile        /var/run/ldap/slapd.args
>
> modulepath      /usr/lib64/openldap
> moduleload      syncprov.la
>
> TLSRandFile /dev/random
> TLSCipherSuite HIGH:MEDIUM:+SSLv2+SSLv3
> TLSCertificateFile /etc/pki/tls/certs/slapd.pem
> TLSCertificateKeyFile /etc/pki/tls/certs/slapd.pem
> TLSCACertificatePath /etc/pki/tls/certs/
> TLSCACertificateFile /etc/pki/tls/certs/ca-bundle.crt
> TLSVerifyClient never # ([never]|allow|try|demand)
>
> database bdb
> suffix "dc=example,dc=com"
> rootdn "cn=Manager,dc=example,dc=com"
> rootpw secret
> directory /var/lib/ldap
> checkpoint 256 5
> index mailAlternateAddress eq,sub
> index accountStatus,mailHost,deliveryMode eq
> index default sub
> index objectClass eq
> index cn,mail,surname,givenname eq,subinitial
> index uidNumber,gidNumber,memberuid,member,uniqueMember eq
> index uid eq,subinitial
> index sambaSID,sambaDomainName,displayName eq
> index entryCSN,entryUUID eq
> limits group="cn=Replicator,dc=infosaitas,dc=lt" size=unlimited time=unlimited
>
> access to *
>         by group="cn=Replicator,dc=infosaitas,dc=lt" write
>         by * read
>
> overlay syncprov
> syncprov-checkpoint 100 10
> syncprov-sessionlog 10
> The consumers' configuration (all the same):
>
> include         /usr/share/openldap/schema/core.schema
> include         /usr/share/openldap/schema/cosine.schema
> include         /usr/share/openldap/schema/corba.schema
> include         /usr/share/openldap/schema/inetorgperson.schema
> include         /usr/share/openldap/schema/nis.schema
> include         /usr/share/openldap/schema/openldap.schema
> include         /usr/share/openldap/schema/samba.schema
> include         /usr/share/openldap/schema/qmail.schema
> include         /etc/openldap/schema/local.schema
> include         /etc/openldap/slapd.access.conf
> include         /etc/openldap/slapd.access.ldapauth.conf
>
> access to dn.subtree="dc=example,dc=com"
>         by group="cn=Replicator,ou=Group,dc=example,dc=com"
>         by users read
>         by anonymous read
>
> pidfile         /var/run/ldap/slapd.pid
> argsfile        /var/run/ldap/slapd.args
>
> modulepath      /usr/lib64/openldap
> moduleload      back_ldap.la
>
> TLSCertificateFile /etc/ssl/openldap/ldap.pem
> TLSCertificateKeyFile /etc/ssl/openldap/ldap.pem
> TLSCACertificateFile /etc/ssl/openldap/ldap.pem
>
> overlay chain
> chain-uri "ldap://master.server"
> chain-idassert-bind bindmethod="simple"
>         binddn="cn=Manager,dc=example,dc=com"
>         credentials=secret
>         mode="none"
> chain-tls start
> chain-return-error TRUE
>
> database bdb
> suffix "dc=example,dc=com"
> rootdn "cn=Manager,dc=example,dc=com"
> rootpw secret
> directory /var/lib/ldap
> checkpoint 256 5
> index objectClass eq
> index mailAlternateAddress eq,sub
> index accountStatus,mailHost,deliveryMode eq
> index default sub
> index cn,mail,surname,givenname eq,subinitial
> index uidNumber,gidNumber,memberuid,member,uniqueMember eq
> index uid eq,subinitial
> index sambaSID,sambaDomainName,displayName eq
> limits group="cn=Replicator,ou=Group,dc=example,dc=com" size=unlimited time=unlimited
>
> syncrepl rid=1
>         provider=ldap://master.server:389
>         type=refreshAndPersist
>         retry="60 +"
>         searchbase="o=BranchMain,dc=example,dc=com"
>         filter="(objectClass=*)"
>         scope=sub
>         attrs=*
Using attrs=* will mean you don't replicate operational attributes; either leave attrs unspecified, or use the default ('attrs=*,+'). See the corrected snippet after your second syncrepl statement below.
>         schemachecking=off
>         bindmethod=simple
>         binddn="cn=Manager,dc=example,dc=com"
>         credentials=secret
>         starttls=yes
>
> syncrepl rid=2
>         provider=ldap://master.server:389
>         type=refreshAndPersist
>         retry="60 +"
>         searchbase="o=Branch1,dc=example,dc=com"
>         filter="(objectClass=*)"
>         scope=sub
>         attrs=*
>         schemachecking=off
>         bindmethod=simple
>         binddn="cn=Manager,dc=example,dc=com"
>         credentials=secret
>         starttls=yes
>
> updateref ldap://master.server
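For example, here is your first statement with only the attrs value changed to include operational attributes (everything else exactly as in your config):

  syncrepl rid=1
          provider=ldap://master.server:389
          type=refreshAndPersist
          retry="60 +"
          searchbase="o=BranchMain,dc=example,dc=com"
          filter="(objectClass=*)"
          scope=sub
          attrs="*,+"
          schemachecking=off
          bindmethod=simple
          binddn="cn=Manager,dc=example,dc=com"
          credentials=secret
          starttls=yes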
Regards,
Buchan