Hi,
I am trying to configure and use OpenLDAP two-way multi-master replication
for high availability (HA), but the HA setup freezes very often. I have
automated the installation and configuration steps with Ansible; I also
tried the same steps manually before using Ansible for the actual
deployment.

The following are the environment details used for installation and
configuration.

Two nodes, ldap-test1 and ldap-test2, running RHEL 7 as the base OS.
LDAP rpms installed:
openldap-clients-2.4.39-6.el7.x86_64
openldap-servers-2.4.39-6.el7.x86_64
openldap-2.4.39-6.el7.x86_64
Configuration steps:
1. Added the nis, cosine schemas
ldapadd -Y EXTERNAL -H ldapi:/// -D "cn=config" -f cosine.ldif
ldapadd -Y EXTERNAL -H ldapi:/// -D "cn=config" -f nis.ldif
2. Sent the new global configuration settings to slapd
ldapadd -Y EXTERNAL -H ldapi:/// -f /etc/openldap/global_config.ldif
# cat global_config.ldif
dn: cn=module{0},cn=config
objectClass: olcModuleList
cn: module{0}
olcModuleLoad: syncprov

dn: olcOverlay=syncprov,olcDatabase={0}config,cn=config
changetype: add
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcSuffix
olcSuffix: dc=example,dc=com

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcRootDN
olcRootDN: cn=Manager,dc=example,dc=com

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcRootPW
olcRootPW: {SSHA}vAp3OToPGMYnWEkh+76RJEVyfCIdnsDg

dn: cn=config
changetype: modify
replace: olcTLSCACertificateFile
olcTLSCACertificateFile: /etc/openldap/certs/cacert.pem

dn: cn=config
changetype: modify
replace: olcTLSCertificateFile
olcTLSCertificateFile: /etc/openldap/certs/slapdcert.pem

dn: cn=config
changetype: modify
replace: olcTLSCertificateKeyFile
olcTLSCertificateKeyFile: /etc/openldap/certs/slapdkey.pem

dn: cn=config
changetype: modify
replace: olcLogLevel
olcLogLevel: -1

dn: olcDatabase={1}monitor,cn=config
changetype: modify
replace: olcAccess
olcAccess: {0}to * by
 dn.base="gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth" read by
 dn.base="cn=Manager,dc=example,dc=com" read by * none

dn: cn=config
changetype: modify
add: olcServerID
olcServerID: 1

dn: olcDatabase={0}config,cn=config
changetype: modify
add: olcRootDN
olcRootDN: cn=admin,cn=config

dn: cn=config
changetype: modify

dn: olcDatabase={0}config,cn=config
changetype: modify
replace: olcRootPW
olcRootPW: {SSHA}vAp3OToPGMYnWEkh+76RJEVyfCIdnsDg

dn: cn=config
changetype: modify
replace: olcServerID
olcServerID: 1 ldaps://ldap-test1
olcServerID: 2 ldaps://ldap-test2
3. Load base.ldif
ldapadd -x -w redhat7 -D cn=Manager,dc=example,dc=com -f
/etc/openldap/base.ldif
# cat base.ldif
dn: dc=example,dc=com
dc: example
objectClass: top
objectClass: domain

dn: ou=People,dc=example,dc=com
ou: People
objectClass: top
objectClass: organizationalUnit

dn: ou=Group,dc=example,dc=com
ou: Group
objectClass: top
objectClass: organizationalUnit
4. Load hdb_config.ldif
ldapadd -Y EXTERNAL -H ldapi:/// -f /etc/openldap/hdb_config.ldif
# cat hdb_config.ldif
dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcSuffix
olcSuffix: dc=example,dc=com

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcRootDN
olcRootDN: cn=Manager,dc=example,dc=com

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcRootPW
olcRootPW: {{ ldap_root_password.stdout }}

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcDbIndex
olcDbIndex: entryCSN eq
olcDbIndex: entryUUID eq
5. Load replication.ldif
ldapadd -Y EXTERNAL -H ldapi:/// -f /etc/openldap/replication.ldif
# cat replication.ldif
dn: olcOverlay=syncprov,olcDatabase={2}hdb,cn=config
changetype: add
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov

dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcSyncRepl
olcSyncRepl: rid=101 provider=ldaps://ldap-test1
 binddn="cn=Manager,dc=example,dc=com" bindmethod=simple credentials=redhat7
 searchbase="dc=example,dc=com" type=refreshAndPersist interval=00:00:00:10
 retry="5 5 300 5" timeout=1
olcSyncRepl: rid=102 provider=ldaps://ldap-test2
 binddn="cn=Manager,dc=example,dc=com" bindmethod=simple credentials=redhat7
 searchbase="dc=example,dc=com" type=refreshAndPersist interval=00:00:00:10
 retry="5 5 300 5" timeout=1
-
replace: olcMirrorMode
olcMirrorMode: TRUE

dn: olcDatabase={0}config,cn=config
changetype: modify
replace: olcSyncRepl
olcSyncRepl: rid=101 provider=ldaps://ldap-test1
 binddn="cn=admin,cn=config" bindmethod=simple credentials=redhat7
 searchbase="cn=config" type=refreshAndPersist retry="5 5 300 5" timeout=1
olcSyncRepl: rid=102 provider=ldaps://ldap-test2
 binddn="cn=admin,cn=config" bindmethod=simple credentials=redhat7
 searchbase="cn=config" type=refreshAndPersist retry="5 5 300 5" timeout=1
-
replace: olcMirrorMode
olcMirrorMode: TRUE
Configuration steps 1 to 4 above were executed in parallel on both nodes;
only step 5, i.e. replication.ldif, was executed serially, one node after
the other, because parallel execution of step 5 caused the solution to
freeze.

Parallel execution on both nodes:
1. Added the nis and cosine schemas
2. Sent the new global configuration settings to slapd
3. Loaded base.ldif
4. Loaded hdb_config.ldif; this step was executed on only one of the two
nodes, assuming the content would be replicated automatically to the other
node once the servers were replicating

Serial execution (executed on both nodes, one after the other):
5. Loaded replication.ldif
Sometimes LDAP replication causes the solution to freeze: it may hang
during deployment, after deployment, or while executing basic LDAP
operations such as ldapadd/ldapmodify/ldapdelete.
1. First, is there any specific order we need to follow to avoid the
freeze, or the ldapadd command hanging, when the two nodes are configured
in parallel?
2. Is anything wrong with the configuration attributes used? If so, which
attributes do I need to add or update to avoid the command or service
hanging during configuration or after deployment?
3. To verify high availability, I stop the LDAP service on one of the two
nodes and send LDAP requests to the other node, but sometimes restarting
the stopped service does not bring the two nodes back in sync. I verify
replication based on the number of connections established between the two
servers (a minimum of 4: 2 for config replication and 2 for database
replication).
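For reference, the connection check amounts to a small filter over the
netstat output (the helper name count_established is just for
illustration):

```shell
# Minimal sketch of the replication check: count ESTABLISHED ldaps
# connections in `netstat` output read from stdin. A healthy pair of
# mirrors should show 4 (2 for cn=config, 2 for the hdb database).
count_established() {
    grep ldaps | grep -c ESTABLISHED
}

# Usage on either node:
#   netstat -a | count_established
```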
I also run a unit test that verifies database replication by creating an
LDAP user on one node and performing search and delete operations on the
other node.
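The test entry is roughly like the following (the uid, uidNumber and
gidNumber values are placeholders; the objectClasses come from the cosine
and nis schemas loaded in step 1):

```ldif
# testuser.ldif -- throwaway entry used only for the replication unit test
dn: uid=testuser,ou=People,dc=example,dc=com
objectClass: account
objectClass: posixAccount
uid: testuser
cn: testuser
uidNumber: 10001
gidNumber: 10001
homeDirectory: /home/testuser
```

It is added via ldapadd against ldaps://ldap-test1 with the
cn=Manager,dc=example,dc=com bind DN, then searched and deleted through
ldaps://ldap-test2.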
4. Whenever the basic LDAP commands (add/modify, etc.) hang on one node, I
restart the slapd service on that node. Is that the right way to recover?
The HA solution works fine when configured with the above steps in the
given order: the LDAP services restart and come back in sync, but sometimes
it freezes when executing a basic ldapadd command on either node. There are
no specific error messages in the logs to explain the hang:
May 15 22:20:17 ldap-test2 slapd[8538]: daemon: epoll: listen=7 active_threads=0 tvp=zero
May 15 22:20:17 ldap-test2 slapd[8538]: daemon: epoll: listen=8 active_threads=0 tvp=zero
A single restart also does not always bring the servers back in sync and
available; sometimes there are only 3 TCP connections instead of the
required 4:
[root@ldap-test1 openldap]# netstat -a | grep ldaps
tcp 0 0 ldap-test1:ldaps 0.0.0.0:* LISTEN
tcp 0 0 ldap-test1:34854 ldap-test2:ldaps ESTABLISHED
tcp 0 0 ldap-test1:ldaps ldap-test2:48493 ESTABLISHED
tcp 0 0 ldap-test1:34856 ldap-test2:ldaps ESTABLISHED
I have to restart a couple of times until 4 TCP connections are established
and the servers replicate properly.
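The restart-until-synced procedure I follow amounts to something like this
(a rough sketch only; the function names, retry count, and sleep interval
are arbitrary, and the connection count comes from netstat as above):

```shell
# Sketch of the manual recovery loop: keep restarting slapd until at
# least the expected number of replication connections is established,
# or give up after a fixed number of attempts. probe and restart are
# passed in as shell function names so the loop stays generic.
restart_until_synced() {
    want=$1 max=$2 probe=$3 restart=$4
    i=1
    while [ "$i" -le "$max" ]; do
        count=$("$probe")
        if [ "$count" -ge "$want" ]; then
            return 0            # replication connections are back
        fi
        "$restart"
        sleep "${SLEEP:-10}"    # give slapd time to reconnect
        i=$((i + 1))
    done
    return 1                    # still not in sync after $max attempts
}

# In practice, on RHEL 7:
#   probe()   { netstat -an | grep ldaps | grep -c ESTABLISHED; }
#   restart() { systemctl restart slapd; }
#   restart_until_synced 4 5 probe restart
```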
5. Can someone please help me get out of this situation? I didn't find any
clue in the list archives about similar problems.
Thanks & Regards,
Shashi