Dear OpenLDAP Administrators,
I am not sure whether you have had time to look into this issue yet. It only happens when one of the mirror servers is powered off or loses power, and it could probably be prevented by “sending heart beat” to verify the established connections.
Thanks for your time looking at this email thread and your effort :)
Thanks,
Eric

________________________________
From: owner-qdlcp-security@LIST.ALCATEL-LUCENT.COM [mailto:owner-qdlcp-security@LIST.ALCATEL-LUCENT.COM] On Behalf Of ZHOU Eric JP
Sent: January 4, 2012 15:56
To: openldap-bugs@openldap.org; info@OpenLDAP.org; openldap-technical@openldap.org; openldap-devel@openldap.org
Cc: qdlcp-security@list.alcatel-lucent.com; ANTHONY Michael; HO Yao; VAN RANGELROOIJ Ardo
Subject: OpenLDAP replciation issue with MirrorMode
Dear OpenLDAP Administrators,
Recently we came across a replication issue with OpenLDAP Mirror Mode.
After configuring MIRROR-A and MIRROR-B in mirror mode with the configuration below, replication worked well for a long period. But after MIRROR-A rebooted, MIRROR-B could no longer get modifications from MIRROR-A. After investigating, we found that the original socket on MIRROR-B (the consumer) is not reconnected.
====================================
## MIRROR-A ----------------------------------------------------------------------
## ----------------------------------------------------------------------
serverID 1
## Consumer
syncrepl rid=001
    provider=ldap://10.207.131.1:389
    bindmethod=simple
    binddn="uid=PrivDirUsr,o=CSOSSO"
    credentials=mypassword
    searchbase="o=CSOSSO"
    schemachecking=on
    type=refreshAndPersist
    interval=00:00:01:00
    retry="10 +"
mirrormode on

## MIRROR-B ----------------------------------------------------------------------
## ----------------------------------------------------------------------
serverID 2
## Consumer
syncrepl rid=001
    provider=ldap://10.207.130.1:389
    bindmethod=simple
    binddn="uid=PrivDirUsr,o=CSOSSO"
    credentials=mypassword
    searchbase="o=CSOSSO"
    schemachecking=on
    type=refreshAndPersist
    interval=00:00:01:00
    retry="10 +"
mirrormode on
Below is the socket information after the MIRROR-A reboot:
====================================
## MIRROR-A ----------------------------------------------------------------------
# lsof -i :389 | grep ldap | grep -v sshd | grep -v localhost | grep 10
slapd 16842 root 14u IPv4 36160 TCP ln007-cnfg-p00m000-d0:51114->10.207.131.1:ldap (ESTABLISHED)
## MIRROR-B ----------------------------------------------------------------------
# lsof -i :389 | grep ldap | grep -v sshd | grep -v localhost | grep 10
slapd 4825 root 14u IPv4 168497 TCP ln007-cnfg-p00m001-d0:52239->10.207.131.0:ldap (ESTABLISHED)
slapd 4825 root 18u IPv4 193403 TCP ln007-mi-p00m001-d0:ldap->10.207.130.0:51114 (ESTABLISHED)

Normally it should be:
## MIRROR-A ----------------------------------------------------------------------
# lsof -i :389 | grep ldap | grep -v sshd | grep -v localhost | grep 10
slapd 16842 root 14u IPv4 36160 TCP ln007-cnfg-p00m000-d0:51114->10.207.131.1:ldap (ESTABLISHED)
slapd 4825 root 18u IPv4 193403 TCP ln007-mi-p00m000-d0:ldap->10.207.130.1:52239 (ESTABLISHED) // This link is missing
## MIRROR-B ----------------------------------------------------------------------
# lsof -i :389 | grep ldap | grep -v sshd | grep -v localhost | grep 10
slapd 4825 root 14u IPv4 168497 TCP ln007-cnfg-p00m001-d0:52239->10.207.131.0:ldap (ESTABLISHED)
slapd 4825 root 18u IPv4 193403 TCP ln007-mi-p00m001-d0:ldap->10.207.130.0:51114 (ESTABLISHED)
I would greatly appreciate it if you could provide some suggestions or comments on this, or improve OpenLDAP to avoid it.
To me this looks like a normal TCP server-down scenario, but perhaps you could prevent it from happening in one of the two ways below?
1. Let OpenLDAP send a mutual heart beat so that the client knows when the server is dead.
2. Let OpenLDAP send a message to all its clients when it is dying (e.g. on receiving SIGTERM). // This does not work when MIRROR-A is power cycled.
Sincerely,
Eric Zhou Jianping
On Mon, 9 Jan 2012, ZHOU Eric JP wrote:
This issue only happens when power-off/power cut-off one of the mirror servers, and could be probably prevented by “sending heart beat” to verify the established connections.
[cutting out openldap-devel; this is usage]
Sure; see the associated ITS and the thread "No replication after power failure":
http://www.openldap.org/lists/openldap-bugs/200710/msg00044.html
The only thing I'd note is that "2 hours...is too long" doesn't seem like that strong an excuse. It's easily tunable in all modern OS I can think of, and in some cases (Linux comes to mind) you can just setsockopt() within the application itself so as to not alter the system-wide settings.
Actually, I'm starting to wonder if this should be an option you can pass as a syncrepl (really libldap) configuration directive. There's something to be said for having aggressive-ish keepalives on server-to-server communication (with, hopefully, good bandwidth/latency/communication costs/etc.) and keeping your site defaults for "normal" clients. OpenLDAP already inched in this direction with the ability to set different idletimeout, etc., for the syncrepl client. Is anybody interested in this? Maybe I'll do it (or have a student do it) if a few people think it'd help...
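For reference, the per-socket setsockopt() approach mentioned above might look roughly like the sketch below. It is written in Python for brevity and is not taken from OpenLDAP or libldap; the function name enable_keepalive and the timing values are hypothetical. The TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are platform-specific (they exist on Linux), so the sketch skips any that the local platform does not expose.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive for one socket, leaving system-wide settings alone.

    idle, interval and probes are illustrative values, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options below are platform-specific (present on Linux).
    # Where one is missing, the system-wide default stays in effect.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    # Non-zero means keepalive is enabled on this socket only.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

Only the socket passed in is affected, which is the point made above: server-to-server links can use aggressive timers while other connections keep the site defaults.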
--On Thursday, January 12, 2012 9:14 AM -0500 Aaron Richton richton@nbcs.rutgers.edu wrote:
Actually, I'm starting to wonder if this should be an option you can pass as a syncrepl (really libldap) configuration directive. There's something to be said for having aggressive-ish keepalives on server-to-server communication (with, hopefully, good bandwidth/latency/communication costs/etc.) and keeping your site defaults for "normal" clients. OpenLDAP already inched in this direction with the ability to set different idletimeout, etc., for the syncrepl client. Is anybody interested in this? Maybe I'll do it (or have a student do it) if a few people think it'd help...
I'm guessing you haven't read the syncrepl section of the man pages in a while. See the "keepalive" option.
--Quanah
--
Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc. -------------------- Zimbra :: the leader in open source messaging and collaboration
If you use an OpenLDAP version that doesn't support this on Linux, you should modify the system parameters:
/proc/sys/net/ipv4/tcp_keepalive_intvl
/proc/sys/net/ipv4/tcp_keepalive_probes
/proc/sys/net/ipv4/tcp_keepalive_time
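To see why the stock settings lead to the "2 hours" complaint, the three parameters above can be read and combined as in this small sketch. The helper names read_keepalive and detection_seconds are hypothetical, and the formula is only an approximate upper bound: the kernel waits tcp_keepalive_time seconds of idleness, then sends up to tcp_keepalive_probes probes spaced tcp_keepalive_intvl seconds apart. These timers apply only to sockets that have SO_KEEPALIVE enabled.

```python
from pathlib import Path

# Location of the three tunables on Linux.
PROC_DIR = Path("/proc/sys/net/ipv4")


def read_keepalive(proc_dir=PROC_DIR):
    """Return (time, intvl, probes) as integers read from the given directory."""
    names = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")
    return tuple(int((Path(proc_dir) / n).read_text().strip()) for n in names)


def detection_seconds(time, intvl, probes):
    """Approximate upper bound before an idle connection to a dead peer is dropped."""
    return time + intvl * probes


if __name__ == "__main__":
    if PROC_DIR.is_dir():
        t, i, p = read_keepalive()
        print(f"time={t}s intvl={i}s probes={p} -> about {detection_seconds(t, i, p)}s")
    else:
        print("no /proc/sys/net/ipv4 here; this sketch assumes Linux")
```

With the usual Linux defaults (7200, 75 and 9) the bound works out to 7200 + 75 * 9 = 7875 seconds, a little over two hours.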
Regards,
Olivier
________________________________________
From: openldap-technical-bounces@OpenLDAP.org [openldap-technical-bounces@OpenLDAP.org] on behalf of Quanah Gibson-Mount [quanah@zimbra.com]
Sent: Thursday, January 12, 2012 19:01
To: Aaron Richton; ZHOU Eric JP
Cc: ANTHONY Michael; qdlcp-security@list.alcatel-lucent.com; HO Yao; VAN RANGELROOIJ Ardo; openldap-technical@openldap.org
Subject: RE: OpenLDAP replciation issue with MirrorMode
--On Thursday, January 12, 2012 9:14 AM -0500 Aaron Richton richton@nbcs.rutgers.edu wrote:
Actually, I'm starting to wonder if this should be an option you can pass as a syncrepl (really libldap) configuration directive. There's something to be said for having aggressive-ish keepalives on server-to-server communication (with, hopefully, good bandwidth/latency/communication costs/etc.) and keeping your site defaults for "normal" clients. OpenLDAP already inched in this direction with the ability to set different idletimeout, etc., for the syncrepl client. Is anybody interested in this? Maybe I'll do it (or have a student do it) if a few people think it'd help...
I'm guessing you haven't read the syncrepl section of the man pages in a while. See the "keepalive" option.
--Quanah
--
Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc. -------------------- Zimbra :: the leader in open source messaging and collaboration
Aaron Richton wrote:
On Mon, 9 Jan 2012, ZHOU Eric JP wrote:
This issue only happens when power-off/power cut-off one of the mirror servers, and could be probably prevented by “sending heart beat” to verify the established connections.
[cutting out openldap-devel; this is usage]
Sure; see the associated ITS and the thread "No replication after power failure":
http://www.openldap.org/lists/openldap-bugs/200710/msg00044.html
The only thing I'd note is that "2 hours...is too long" doesn't seem like that strong an excuse. It's easily tunable in all modern OS I can think of, and in some cases (Linux comes to mind) you can just setsockopt() within the application itself so as to not alter the system-wide settings.
Actually, I'm starting to wonder if this should be an option you can pass as a syncrepl (really libldap) configuration directive. There's something to be said for having aggressive-ish keepalives on server-to-server communication (with, hopefully, good bandwidth/latency/communication costs/etc.) and keeping your site defaults for "normal" clients. OpenLDAP already inched in this direction with the ability to set different idletimeout, etc., for the syncrepl client. Is anybody interested in this? Maybe I'll do it (or have a student do it) if a few people think it'd help...
Already done, see the keepalive= keyword in syncrepl config.
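As a rough illustration of the keepalive= keyword: to the best of my reading of slapd.conf(5), its value has the form idle:probes:interval, where idle is the number of seconds a connection must be idle before probing starts, probes is the maximum number of probes, and interval is the number of seconds between probes. The helper below only renders such a value; the name keepalive_option and the numbers are hypothetical, and the field order should be checked against the man page for your version.

```python
def keepalive_option(idle, probes, interval):
    """Render a syncrepl keepalive value as idle:probes:interval (assumed order)."""
    for label, value in (("idle", idle), ("probes", probes), ("interval", interval)):
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{label} must be a non-negative integer, got {value!r}")
    return f"keepalive={idle}:{probes}:{interval}"


if __name__ == "__main__":
    # Hypothetical values: start probing after 4 minutes idle,
    # then send up to 10 probes, 30 seconds apart.
    print(keepalive_option(240, 10, 30))
```

The resulting string would go on its own indented line inside the syncrepl stanza, next to retry= and the other parameters shown in the original configuration.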
openldap-technical@openldap.org