I haven't looked at this part of back-monitor. Someone else care to respond?
Our study shows a possible deadlock in OpenLDAP 2.4.8. We hope you can help explain it. The two code paths involved are:
- monitor_cache_get at servers/slapd/back-monitor/cache.c:163 waiting for mp_mutex while holding mi_cache_mutex
- monitor_cache_release at servers/slapd/back-monitor/cache.c:366 waiting for mi_cache_mutex while holding mp_mutex
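The two reports above form a classic lock-order inversion. As a rough illustration (a Python model, not slapd code), the sketch below turns each report into a "held -> wanted" edge and looks for a cycle. The helper names `build_lock_graph` and `find_cycle` are made up for this example. It also assumes that `mp_mutex` in both reports can refer to the same mutex instance and that the two paths can run concurrently; if either assumption fails, the cycle is only a potential deadlock.

```python
from collections import defaultdict

# Each report is (function, lock already held, lock being waited for),
# copied from the two call sites listed above.
reports = [
    ("monitor_cache_get", "mi_cache_mutex", "mp_mutex"),
    ("monitor_cache_release", "mp_mutex", "mi_cache_mutex"),
]

def build_lock_graph(reports):
    """Edge held -> wanted: some path requests `wanted` while holding `held`."""
    graph = defaultdict(set)
    for _func, held, wanted in reports:
        graph[held].add(wanted)
    return graph

def find_cycle(graph):
    """Return one cycle as a list of lock names, or None if there is none."""
    visiting, done = [], set()

    def visit(node):
        if node in visiting:
            # Back edge: the path from `node` to here closes a cycle.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for nxt in sorted(graph.get(node, ())):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in sorted(graph):
        cycle = visit(start)
        if cycle:
            return cycle
    return None

cycle = find_cycle(build_lock_graph(reports))
print(cycle)
```

A cycle in this graph says only that the acquisition orders are inconsistent by name. Whether it can actually deadlock depends on the runtime conditions noted above.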
We have not been able to verify this in a real running environment, which could be difficult, if not impossible. Therefore your comments would be extremely valuable. Any help would be greatly appreciated.
======= At 2008-01-12, 19:48:38 you wrote: =======
Yin Wang wrote:
Sorry, I am replying to a very old email (below) that you sent last April. Terence is a colleague of mine, and we are still working on the project. I hope to understand the problem better.
When you said "While we can control the order of lock acquisition in the OpenLDAP code, we have no control over it in the BerkeleyDB layer", did you mean that the (possible) deadlock comes from BerkeleyDB itself, or from the interaction between OpenLDAP and BerkeleyDB? If it is the latter, and assuming BerkeleyDB is deadlock-free, I don't understand why using such a library could cause deadlocks.
Your help would be greatly appreciated.
Since you say you're working on a research project, you shouldn't assume anything. You should do some actual research. The BerkeleyDB lock system is fully described in their documentation. Read it.
If you have questions that aren't addressed by the BerkeleyDB docs you can ask those, but I don't have time to answer questions about things that are already well documented.
Yin Wang
Research Assistant
EECS Department, University of Michigan
-----Original Message-----
From: Howard Chu [mailto:email@example.com]
Sent: Thursday, April 19, 2007 5:32 PM
To: Kelly, Terence P
Cc: Project@openldap.org
Subject: Re: deadlocks in OpenLDAP
Kelly, Terence P wrote:
I'm a researcher with interests in concurrent programming issues. I'm writing with a question about deadlocks in OpenLDAP code.
Based on the OpenLDAP issue tracking system, I gather that deadlocks involving circular wait for locks have occurred or have been suspected in slapd.
In principle it's possible to avoid deadlock by consistently acquiring locks in a defined order, but in practice this can be inconvenient or impossible.
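For concreteness, here is a minimal Python sketch of the "defined order" discipline. It is not taken from slapd. The lock names `lock_a` and `lock_b`, the `LOCK_RANK` table, and the `acquire_in_order` helper are all hypothetical. Callers may name the locks in any order, and the helper always takes them by ascending rank, so no circular wait can form among these locks.

```python
import threading
from contextlib import contextmanager

# Hypothetical global ranking: a lower rank is always acquired first.
LOCK_RANK = {"lock_a": 0, "lock_b": 1}
locks = {name: threading.Lock() for name in LOCK_RANK}

@contextmanager
def acquire_in_order(*names):
    """Acquire the named locks by ascending rank, release in reverse."""
    ordered = sorted(set(names), key=LOCK_RANK.__getitem__)
    taken = []
    try:
        for name in ordered:
            locks[name].acquire()
            taken.append(name)
        yield ordered
    finally:
        for name in reversed(taken):
            locks[name].release()

counter = 0

def worker(first, second, rounds=1000):
    global counter
    for _ in range(rounds):
        # The caller's argument order does not matter; the helper reorders.
        with acquire_in_order(first, second):
            counter += 1  # protected: both locks are held here

threads = [
    threading.Thread(target=worker, args=("lock_a", "lock_b")),
    threading.Thread(target=worker, args=("lock_b", "lock_a")),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=10)
print(counter)
```

The catch is that every acquisition must go through the ranking. A lock taken by code outside this discipline (a library, a callback) is invisible to it.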
Can you give me some intuition for why it's hard to prevent deadlocks in slapd? Has your experience with deadlocks in OpenLDAP software given you any generic insights into deadlock and how (not) to avoid it? Would your insights apply to other software in addition to slapd?
Many thanks in advance for any wisdom you can share! Long editorials and brain dumps are particularly welcome.
For OpenLDAP the problem is that there are two layers of locking systems in use - the OpenLDAP code and the BerkeleyDB code. While we can control the order of lock acquisition in the OpenLDAP code, we have no control over it in the BerkeleyDB layer. As such, the usual approach of strictly ordering locks doesn't work here.
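When a global order cannot be imposed across a layer boundary, a common fallback is to let the lower layer detect the deadlock, fail one participant, and have the caller back out and retry. The BerkeleyDB documentation describes its own version of this (an operation chosen as the victim fails with DB_LOCK_DEADLOCK). The sketch below is only a Python model of the retry pattern, assuming a hypothetical `DeadlockVictim` exception and `run_with_retry` helper. It does not use the BerkeleyDB API and is not how slapd is written.

```python
import random
import time

class DeadlockVictim(Exception):
    """Hypothetical signal: the lower layer chose this operation as the victim."""

def run_with_retry(operation, max_attempts=5, base_delay=0.001):
    """Run `operation`, retrying when it is reported as a deadlock victim.

    `operation` is assumed to undo its partial work and release whatever it
    holds before raising, so that a retry starts from a clean state.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DeadlockVictim:
            if attempt == max_attempts:
                raise
            # Randomized backoff so competing retries are less likely to collide.
            time.sleep(base_delay * attempt * random.random())

# Simulated operation: it is picked as the victim twice, then succeeds.
attempts = []

def flaky_update():
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlockVictim()
    return "committed"

result = run_with_retry(flaky_update)
print(result, len(attempts))
```

The retry only helps if the caller really does drop its own locks before trying again. Otherwise the upper layer's locks can still take part in a cycle that the lower layer's detector cannot see.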