Hi,

I am trying to configure a two-way OpenLDAP multi-master (MirrorMode) replication setup for high availability (HA), but the solution freezes very often. I have automated the installation and configuration steps with Ansible; I also tried the steps manually before using Ansible for the actual deployment.

Following are the environment details used for installation and configuration:
Two nodes, ldap-test1 and ldap-test2, running RHEL 7 as the base OS.
LDAP RPMs installed:
openldap-clients-2.4.39-6.el7.x86_64
openldap-servers-2.4.39-6.el7.x86_64
openldap-2.4.39-6.el7.x86_64
Configuration steps:
1. Added the nis and cosine schemas
ldapadd -Y EXTERNAL -H ldapi:/// -D "cn=config" -f cosine.ldif
ldapadd -Y EXTERNAL -H ldapi:/// -D "cn=config" -f nis.ldif
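For reference, the loaded schemas can be verified by listing the schema entries over the same local socket (run as root, like the ldapadd above); cosine and nis should both appear:
ldapsearch -Y EXTERNAL -H ldapi:/// -b cn=schema,cn=config -s one 1.1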
2. Sent the new global configuration settings to slapd
ldapadd -Y EXTERNAL -H ldapi:/// -f /etc/openldap/global_config.ldif
# cat global_config.ldif
dn: cn=module{0},cn=config
objectClass: olcModuleList
cn: module{0}
olcModuleLoad: syncprov

dn: olcOverlay=syncprov,olcDatabase={0}config,cn=config
changetype: add
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcSuffix
olcSuffix: dc=example,dc=com

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcRootDN
olcRootDN: cn=Manager,dc=example,dc=com

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcRootPW
olcRootPW: {SSHA}vAp3OToPGMYnWEkh+76RJEVyfCIdnsDg

dn: cn=config
changetype: modify
replace: olcTLSCACertificateFile
olcTLSCACertificateFile: /etc/openldap/certs/cacert.pem

dn: cn=config
changetype: modify
replace: olcTLSCertificateFile
olcTLSCertificateFile: /etc/openldap/certs/slapdcert.pem

dn: cn=config
changetype: modify
replace: olcTLSCertificateKeyFile
olcTLSCertificateKeyFile: /etc/openldap/certs/slapdkey.pem

dn: cn=config
changetype: modify
replace: olcLogLevel
olcLogLevel: -1

dn: olcDatabase={1}monitor,cn=config
changetype: modify
replace: olcAccess
olcAccess: {0}to * by dn.base="gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth" read by dn.base="cn=Manager,dc=example,dc=com" read by * none

dn: cn=config
changetype: modify
add: olcServerID
olcServerID: 1

dn: olcDatabase={0}config,cn=config
changetype: modify
add: olcRootDN
olcRootDN: cn=admin,cn=config

dn: olcDatabase={0}config,cn=config
changetype: modify
replace: olcRootPW
olcRootPW: {SSHA}vAp3OToPGMYnWEkh+76RJEVyfCIdnsDg

dn: cn=config
changetype: modify
replace: olcServerID
olcServerID: 1 ldaps://ldap-test1
olcServerID: 2 ldaps://ldap-test2
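For reference, the server IDs and the syncprov overlay can be read back after this step to confirm they applied (again as root over ldapi:///):
ldapsearch -Y EXTERNAL -H ldapi:/// -b cn=config -s base olcServerID
ldapsearch -Y EXTERNAL -H ldapi:/// -b cn=config '(olcOverlay=syncprov)' 1.1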
3. Load base.ldif
ldapadd -x -w redhat7 -D cn=Manager,dc=example,dc=com -f /etc/openldap/base.ldif
# cat base.ldif
dn: dc=example,dc=com
dc: example
objectClass: top
objectClass: domain

dn: ou=People,dc=example,dc=com
ou: People
objectClass: top
objectClass: organizationalUnit

dn: ou=Group,dc=example,dc=com
ou: Group
objectClass: top
objectClass: organizationalUnit
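For reference, the three entries can then be read back with a simple bind using the rootdn and password set above:
ldapsearch -x -D "cn=Manager,dc=example,dc=com" -w redhat7 -b "dc=example,dc=com" -s sub 1.1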
4. Load hdb_config.ldif
ldapadd -Y EXTERNAL -H ldapi:/// -f /etc/openldap/hdb_config.ldif
# cat hdb_config.ldif
dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcSuffix
olcSuffix: dc=example,dc=com

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcRootDN
olcRootDN: cn=Manager,dc=example,dc=com

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcRootPW
olcRootPW: {{ ldap_root_password.stdout }}

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcDbIndex
olcDbIndex: entryCSN eq
olcDbIndex: entryUUID eq
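For reference, the resulting database settings can be read back as root over ldapi:/// to confirm the modifications applied:
ldapsearch -Y EXTERNAL -H ldapi:/// -b "olcDatabase={2}hdb,cn=config" -s base olcSuffix olcRootDN olcDbIndex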
5. Load replication.ldif
ldapadd -Y EXTERNAL -H ldapi:/// -f /etc/openldap/replication.ldif
# cat replication.ldif
dn: olcOverlay=syncprov,olcDatabase={2}hdb,cn=config
changetype: add
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcSyncRepl
olcSyncRepl: rid=101 provider=ldaps://ldap-test1 binddn="cn=Manager,dc=example,dc=com" bindmethod=simple credentials=redhat7 searchbase="dc=example,dc=com" type=refreshAndPersist interval=00:00:00:10 retry="5 5 300 5" timeout=1
olcSyncRepl: rid=102 provider=ldaps://ldap-test2 binddn="cn=Manager,dc=example,dc=com" bindmethod=simple credentials=redhat7 searchbase="dc=example,dc=com" type=refreshAndPersist interval=00:00:00:10 retry="5 5 300 5" timeout=1
-
replace: olcMirrorMode
olcMirrorMode: TRUE

dn: olcDatabase={0}config,cn=config
changetype: modify
replace: olcSyncRepl
olcSyncRepl: rid=101 provider=ldaps://ldap-test1 binddn="cn=admin,cn=config" bindmethod=simple credentials=redhat7 searchbase="cn=config" type=refreshAndPersist retry="5 5 300 5" timeout=1
olcSyncRepl: rid=102 provider=ldaps://ldap-test2 binddn="cn=admin,cn=config" bindmethod=simple credentials=redhat7 searchbase="cn=config" type=refreshAndPersist retry="5 5 300 5" timeout=1
-
replace: olcMirrorMode
olcMirrorMode: TRUE
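To judge whether the two nodes have converged after this step, the contextCSN of the suffix can be compared on both providers (using the Manager credentials and ldaps URLs from the configuration above); when the nodes are in sync, both return the same set of contextCSN values, one per server ID:
ldapsearch -x -H ldaps://ldap-test1 -D "cn=Manager,dc=example,dc=com" -w redhat7 -b "dc=example,dc=com" -s base contextCSN
ldapsearch -x -H ldaps://ldap-test2 -D "cn=Manager,dc=example,dc=com" -w redhat7 -b "dc=example,dc=com" -s base contextCSN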
Configuration steps 1 to 4 above were executed in parallel on both nodes; only step 5, i.e. replication.ldif, was executed serially, one node after the other, because executing step 5 in parallel caused the solution to freeze.
Parallel execution on both nodes:
1. Added the nis and cosine schemas
2. Sent the new global configuration settings to slapd
3. Loaded base.ldif
4. Loaded hdb_config.ldif; executed on only one of the two nodes, assuming its content will be replicated automatically to the other node once the servers are replicating
Serial execution, on both nodes one after the other:
5. Loaded replication.ldif
Sometimes LDAP replication causes the solution to freeze; it may hang during the deployment, after the deployment, or while executing basic LDAP operations such as ldapadd/ldapmodify/ldapdelete.
1. First, I would like to know: is there any specific order we need to follow to avoid the solution freezing or the ldapadd command hanging when the two nodes are configured in parallel?
2. Is anything wrong with the configuration attributes used? If so, which attributes do I need to add or update to avoid the command/service hanging during configuration or after the deployment?
3. To verify high availability, I stop the LDAP service on one of the two nodes and send LDAP requests to the other node, but sometimes restarting the service does not bring the two nodes back in sync. To verify replication, I check the number of connections established between the two servers (a minimum of 4: 2 for config replication and 2 for db replication), and I run a unit test that verifies db replication by creating an LDAP user on one node and performing search and delete operations on the other node, as sketched below.
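The unit test is roughly the following sequence (uid testuser and the id numbers are placeholder values; the objectClasses come from the cosine and nis schemas loaded in step 1):
# on ldap-test1: create a throwaway entry
ldapadd -x -H ldaps://ldap-test1 -D "cn=Manager,dc=example,dc=com" -w redhat7 <<EOF
dn: uid=testuser,ou=People,dc=example,dc=com
objectClass: account
objectClass: posixAccount
uid: testuser
cn: testuser
uidNumber: 10001
gidNumber: 10001
homeDirectory: /home/testuser
EOF

# on ldap-test2: confirm it replicated, then delete it from there
ldapsearch -x -H ldaps://ldap-test2 -D "cn=Manager,dc=example,dc=com" -w redhat7 -b "uid=testuser,ou=People,dc=example,dc=com" -s base 1.1
ldapdelete -x -H ldaps://ldap-test2 -D "cn=Manager,dc=example,dc=com" -w redhat7 "uid=testuser,ou=People,dc=example,dc=com"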
4. Whenever the basic LDAP commands (add/modify, etc.) hang on one node, I restart the slapd service on that node. Is this the right approach?
The HA solution works fine when the above steps are configured in the given order, and after the LDAP services are restarted the nodes are in sync, but sometimes the solution freezes while executing a basic ldapadd command on one of the nodes. There are no specific error messages in the logs to explain the hang:
May 15 22:20:17 ldap-test2 slapd[8538]: daemon: epoll: listen=7 active_threads=0 tvp=zero
May 15 22:20:17 ldap-test2 slapd[8538]: daemon: epoll: listen=8 active_threads=0 tvp=zero
A single restart also does not always bring the servers back in sync and available; sometimes there are only 3 TCP connections instead of the required 4:
[root@ldap-test1 openldap]# netstat -a | grep ldaps
tcp 0 0 ldap-test1:ldaps 0.0.0.0:* LISTEN
tcp 0 0 ldap-test1:34854 ldap-test2:ldaps ESTABLISHED
tcp 0 0 ldap-test1:ldaps ldap-test2:48493 ESTABLISHED
tcp 0 0 ldap-test1:34856 ldap-test2:ldaps ESTABLISHED
I have to restart slapd a couple of times until 4 TCP connections are established and the servers replicate properly.
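For reference, a quick way to count the established ldaps connections from the same netstat output shown above (4 is the expected number when both config and db replication are up):
netstat -a | grep ldaps | grep -c ESTABLISHED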
5. Can someone please help me resolve this situation? I did not find any clue about similar problems in the archived messages.
Thanks & Regards,
Shashi