Hi,
I am trying to configure and use OpenLDAP two-way multi-master replication for high availability (HA), but the HA solution freezes very often. I have automated the installation and configuration steps with Ansible; I also tried the steps manually before using Ansible for the actual deployment.
The following environment details were used for the installation and configuration:
Two nodes, ldap-test1 and ldap-test2, running RHEL 7 as the base OS.
LDAP rpms installed:
openldap-clients-2.4.39-6.el7.x86_64
openldap-servers-2.4.39-6.el7.x86_64
openldap-2.4.39-6.el7.x86_64
Configuration steps:
1. Add the nis and cosine schemas:
ldapadd -Y EXTERNAL -H ldapi:/// -D "cn=config" -f cosine.ldif
ldapadd -Y EXTERNAL -H ldapi:/// -D "cn=config" -f nis.ldif
2. Load the new global configuration settings into slapd:
ldapadd -Y EXTERNAL -H ldapi:/// -f /etc/openldap/global_config.ldif
# cat global_config.ldif
dn: cn=module{0},cn=config
objectClass: olcModuleList
cn: module{0}
olcModuleLoad: syncprov

dn: olcOverlay=syncprov,olcDatabase={0}config,cn=config
changetype: add
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcSuffix
olcSuffix: dc=example,dc=com

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcRootDN
olcRootDN: cn=Manager,dc=example,dc=com

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcRootPW
olcRootPW: {SSHA}vAp3OToPGMYnWEkh+76RJEVyfCIdnsDg

dn: cn=config
changetype: modify
replace: olcTLSCACertificateFile
olcTLSCACertificateFile: /etc/openldap/certs/cacert.pem

dn: cn=config
changetype: modify
replace: olcTLSCertificateFile
olcTLSCertificateFile: /etc/openldap/certs/slapdcert.pem

dn: cn=config
changetype: modify
replace: olcTLSCertificateKeyFile
olcTLSCertificateKeyFile: /etc/openldap/certs/slapdkey.pem

dn: cn=config
changetype: modify
replace: olcLogLevel
olcLogLevel: -1

dn: olcDatabase={1}monitor,cn=config
changetype: modify
replace: olcAccess
olcAccess: {0}to * by dn.base="gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth" read by dn.base="cn=Manager,dc=example,dc=com" read by * none

dn: cn=config
changetype: modify
add: olcServerID
olcServerID: 1

dn: olcDatabase={0}config,cn=config
changetype: modify
add: olcRootDN
olcRootDN: cn=admin,cn=config
dn: olcDatabase={0}config,cn=config
changetype: modify
replace: olcRootPW
olcRootPW: {SSHA}vAp3OToPGMYnWEkh+76RJEVyfCIdnsDg

dn: cn=config
changetype: modify
replace: olcServerID
olcServerID: 1 ldaps://ldap-test1
olcServerID: 2 ldaps://ldap-test2
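For reference, after loading this file I check (roughly; these are just example queries over the same ldapi:/// socket used above) that the settings were actually applied:

```shell
# Verify the multi-master server IDs were applied to cn=config
ldapsearch -Y EXTERNAL -H ldapi:/// -b cn=config -s base olcServerID

# Verify the syncprov module was loaded
ldapsearch -Y EXTERNAL -H ldapi:/// -b "cn=module{0},cn=config" -s base olcModuleLoad
```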
3. Load base.ldif:
ldapadd -x -w redhat7 -D cn=Manager,dc=example,dc=com -f /etc/openldap/base.ldif
# cat base.ldif
dn: dc=example,dc=com
dc: example
objectClass: top
objectClass: domain

dn: ou=People,dc=example,dc=com
ou: People
objectClass: top
objectClass: organizationalUnit

dn: ou=Group,dc=example,dc=com
ou: Group
objectClass: top
objectClass: organizationalUnit
4. Load hdb_config.ldif:
ldapadd -Y EXTERNAL -H ldapi:/// -f /etc/openldap/hdb_config.ldif
# cat hdb_config.ldif
dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcSuffix
olcSuffix: dc=example,dc=com

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcRootDN
olcRootDN: cn=Manager,dc=example,dc=com

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcRootPW
olcRootPW: {{ ldap_root_password.stdout }}

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcDbIndex
olcDbIndex: entryCSN eq
olcDbIndex: entryUUID eq
5. Load replication.ldif:
ldapadd -Y EXTERNAL -H ldapi:/// -f /etc/openldap/replication.ldif
# cat replication.ldif
dn: olcOverlay=syncprov,olcDatabase={2}hdb,cn=config
changetype: add
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcSyncRepl
olcSyncRepl: rid=101 provider=ldaps://ldap-test1
 binddn="cn=Manager,dc=example,dc=com" bindmethod=simple
 credentials=redhat7 searchbase="dc=example,dc=com"
 type=refreshAndPersist interval=00:00:00:10 retry="5 5 300 5" timeout=1
olcSyncRepl: rid=102 provider=ldaps://ldap-test2
 binddn="cn=Manager,dc=example,dc=com" bindmethod=simple
 credentials=redhat7 searchbase="dc=example,dc=com"
 type=refreshAndPersist interval=00:00:00:10 retry="5 5 300 5" timeout=1
-
replace: olcMirrorMode
olcMirrorMode: TRUE

dn: olcDatabase={0}config,cn=config
changetype: modify
replace: olcSyncRepl
olcSyncRepl: rid=101 provider=ldaps://ldap-test1
 binddn="cn=admin,cn=config" bindmethod=simple
 credentials=redhat7 searchbase="cn=config"
 type=refreshAndPersist retry="5 5 300 5" timeout=1
olcSyncRepl: rid=102 provider=ldaps://ldap-test2
 binddn="cn=admin,cn=config" bindmethod=simple
 credentials=redhat7 searchbase="cn=config"
 type=refreshAndPersist retry="5 5 300 5" timeout=1
-
replace: olcMirrorMode
olcMirrorMode: TRUE
Configuration steps 1 to 4 above were executed in parallel on both nodes; only step 5 (replication.ldif) was executed serially, one node after the other, because executing step 5 in parallel caused the solution to freeze.
Parallel execution on both nodes:
1. Add the nis and cosine schemas
2. Load the new global configuration settings into slapd
3. Load base.ldif
4. Load hdb_config.ldif (executed on only one of the two nodes, assuming the content will be replicated automatically to the other node once the servers are replicating)

Serial execution, on both nodes one after the other:
5. Load replication.ldif
Sometimes LDAP replication causes the solution to freeze; it may hang during the deployment, after the deployment, or while executing basic LDAP operations such as ldapadd/modify/delete.
1. First, is there any specific order we need to follow to avoid the solution freezing, or the ldapadd command hanging, when the two nodes are configured in parallel?
2. Is anything wrong with the configuration attributes used? If so, which attributes do I need to add or update to avoid the command/service hanging during configuration or after the deployment?
3. To verify high availability, I stop the LDAP service on one of the two nodes and send LDAP requests to the other node, but sometimes restarting the service does not bring the two nodes back in sync. I verify replication based on the number of connections established between the two servers (minimum 4: 2 for config replication and 2 for db replication), plus a unit test that creates an LDAP user on one node and performs search and delete operations on the other node.
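For reference, the convergence check I rely on is roughly the following (an example sketch, using the Manager DN and password from the configuration above): comparing the contextCSN of the suffix on both nodes should show whether the databases have converged.

```shell
# Compare contextCSN on both providers; identical values on both
# nodes indicate the databases have converged
# (assumes the Manager DN/password from the configuration above)
for host in ldap-test1 ldap-test2; do
  echo "== $host =="
  ldapsearch -x -w redhat7 -D "cn=Manager,dc=example,dc=com" \
    -H "ldaps://$host" -b "dc=example,dc=com" -s base contextCSN
done
```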
4. Whenever basic LDAP commands (add/modify etc.) hang on one node, I restart the slapd service on that node. Is this the right approach? The HA solution works fine when configured with the steps above in the given order, and after restarting the LDAP services the nodes are in sync, but sometimes the solution still freezes when executing a basic ldapadd command on either node.
There are no specific error messages in the logs to help explain the hang:
May 15 22:20:17 ldap-test2 slapd[8538]: daemon: epoll: listen=7 active_threads=0 tvp=zero
May 15 22:20:17 ldap-test2 slapd[8538]: daemon: epoll: listen=8 active_threads=0 tvp=zero
A single restart is also not always enough to bring the servers back in sync and available; sometimes there are only 3 TCP connections instead of the required 4:
[root@ldap-test1 openldap]# netstat -a | grep ldaps
tcp 0 0 ldap-test1:ldaps 0.0.0.0:* LISTEN
tcp 0 0 ldap-test1:34854 ldap-test2:ldaps ESTABLISHED
tcp 0 0 ldap-test1:ldaps ldap-test2:48493 ESTABLISHED
tcp 0 0 ldap-test1:34856 ldap-test2:ldaps ESTABLISHED
I usually have to restart a couple of times until 4 TCP connections are established and the servers replicate properly.
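For reference, instead of eyeballing the netstat output, I count the established connections roughly like this (an example sketch; with numeric output, port 636 is grepped instead of the ldaps service name):

```shell
# Count established replication connections between the two nodes
# (expecting 4: 2 for cn=config replication and 2 for the hdb database)
netstat -an | grep ':636 ' | grep -c ESTABLISHED
```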
5. Can someone please help me get out of this situation? I did not find any clue in the archived messages for similar kinds of problems.
Thanks & Regards, Shashi
openldap-technical@openldap.org