This may be a problem that was fixed between 2.4.6 and 2.4.11. I'm hoping someone recognizes it and can confirm that it has been fixed. (We've committed to 2.4.6 for the short-term future.)
We've got basic refreshAndPersist replication working fine.
Replication declarations from the master slapd.conf...
overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100
Replication declarations from the slave slapd.conf....
syncrepl rid=123
    provider=ldap://<masterIP>:389
    type=refreshAndPersist
    retry="120 +"
    searchbase="o=replDB"
    bindmethod=simple
    binddn="cn=replman,o=replDB"
    credentials=password
When we shut down the slave, the slave hangs. Here's the debug log...
daemon: closing 12582954
slapd shutdown: waiting for 1 threads to terminate
=.do_syncrepl rid=123
connection_get(12582955)
connection_get(12582955): got connid=0
daemon: removing 12582955r
ldap_free_request (origid 2, msgid 2)
ldap_free_connection 1 1
ldap_send_unbind
ber_flush2: 7 bytes to sd 12582955
  0000:  30 05 02 01 03 42 00    0....B.
ldap_write: want=7, written=7
  0000:  30 05 02 01 03 42 00    0....B.
ldap_free_connection: actually freed
We think the problem is in the following code in tpool.c. Specifically, ltp_open_count still has a value of 1 at shutdown, so the loop never exits. If we set ltp_open_count to 0 by hand, the server comes down properly, and we can then restart it.
    while (pool->ltp_open_count) {
        if (!pool->ltp_pause)
            ldap_pvt_thread_cond_broadcast(&pool->ltp_cond);
        ldap_pvt_thread_cond_wait(&pool->ltp_cond, &pool->ltp_mutex);
    }
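To make clear what we believe that loop is waiting for, here is a minimal sketch of the same wait pattern using plain pthreads. It is not the OpenLDAP code: mini_pool, mini_worker, mini_pool_shutdown and mini_pool_demo are hypothetical names made up for illustration, and we are only assuming that each pool thread is expected to decrement the counter and signal the condition variable on its way out. We do not know which thread fails to do so in 2.4.6.

```c
#include <assert.h>
#include <pthread.h>

#define MINI_MAX_THREADS 16

/* Hypothetical stand-in for the thread pool; not the real tpool.c struct. */
struct mini_pool {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             open_count;   /* threads that have not exited yet */
};

/* Worker: on exit, check out of the pool and wake anyone waiting. */
static void *mini_worker(void *arg)
{
    struct mini_pool *pool = arg;

    pthread_mutex_lock(&pool->mutex);
    pool->open_count--;
    pthread_cond_broadcast(&pool->cond);
    pthread_mutex_unlock(&pool->mutex);
    return NULL;
}

/*
 * Shutdown: block until every worker has checked out.
 * If one worker never decrements open_count, this waits forever,
 * which is the kind of hang described above.
 */
static int mini_pool_shutdown(struct mini_pool *pool)
{
    int remaining;

    pthread_mutex_lock(&pool->mutex);
    while (pool->open_count > 0)
        pthread_cond_wait(&pool->cond, &pool->mutex);
    remaining = pool->open_count;
    pthread_mutex_unlock(&pool->mutex);
    return remaining;
}

/* Start nthreads workers, shut down, return the leftover count. */
int mini_pool_demo(int nthreads)
{
    struct mini_pool pool;
    pthread_t tids[MINI_MAX_THREADS];
    int i, started = 0, remaining;

    assert(nthreads >= 0 && nthreads <= MINI_MAX_THREADS);
    pthread_mutex_init(&pool.mutex, NULL);
    pthread_cond_init(&pool.cond, NULL);
    pool.open_count = 0;

    for (i = 0; i < nthreads; i++) {
        /* Count the thread before it starts, so shutdown cannot miss it. */
        pthread_mutex_lock(&pool.mutex);
        pool.open_count++;
        pthread_mutex_unlock(&pool.mutex);

        if (pthread_create(&tids[started], NULL, mini_worker, &pool) == 0) {
            started++;
        } else {
            /* Creation failed: undo the count so shutdown does not hang. */
            pthread_mutex_lock(&pool.mutex);
            pool.open_count--;
            pthread_mutex_unlock(&pool.mutex);
        }
    }

    remaining = mini_pool_shutdown(&pool);

    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_cond_destroy(&pool.cond);
    pthread_mutex_destroy(&pool.mutex);
    return remaining;
}
```

In this sketch, mini_pool_demo() returns 0 because every worker checks out before exiting. If the decrement in mini_worker were skipped for even one thread, mini_pool_shutdown() would never return, which matches what we see when ltp_open_count stays at 1.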
Does this problem look familiar to anyone?
Once again, we apologize for being on a backlevel release... but we committed to 2.4.6 for the first release, which is coming up shortly. We'll be upgrading for future releases... and we're hoping this problem (if it is a real problem) has been fixed.
Thanks in advance...
Brad T Waldorf wrote:
This may be a problem that was fixed between 2.4.6 and 2.4.11. I'm hoping someone recognizes it and can confirm that it has been fixed. (We've committed to 2.4.6 for the short-term future.)
We think the problem is with the following code in tpool.c. Specifically, ltp_open_count has a value of 1, so we're stuck in a loop. If we set ltp_open_count to 0, the server comes down properly. At that point we can restart it again.
    while (pool->ltp_open_count) {
        if (!pool->ltp_pause)
            ldap_pvt_thread_cond_broadcast(&pool->ltp_cond);
        ldap_pvt_thread_cond_wait(&pool->ltp_cond, &pool->ltp_mutex);
    }
Does this problem look familiar to anyone?
Sounds like this:
OpenLDAP 2.4.9 Release (2008/05/07)
    Fixed libldap_r tpool pause checks (ITS#5364, #5407)
You can find this sort of thing yourself in the change log at http://www.openldap.org/software/release/changes.html
openldap-technical@openldap.org