Current mdb.master can change MDB_meta.mm_mapsize when another
process has the database open. That can break when one MDB_env
grows the DB file beyond another MDB_env's me_mapsize.
Coexisting MDB_envs must use the same mapsize. Any mm_mapsize
change must be written while mdb_env_open() holds the exclusive lock.
For writing the mapsize:
- I presume the cleanest way would be to write it to the not-current meta
page. But mdb_page_alloc() and mdb_txn_commit()'s "Delete IDLs" loop
look like they expect each recent txnid to have a freelist entry. Is
that right? page_alloc uses MDB_SET; maybe it should use MDB_SET_RANGE
(see the sketch after this list). I think older libmdb versions would,
on seeing such a missing entry, grow the map instead of grabbing the
next freelist entry.
- Or overwrite (current meta).mm_mapsize, but then I don't know what to
do about a failed write. env_open cannot work around that by writing
back an older txnid the way mdb_env_write_meta() does. Maybe just try
to write back the old mapsize before returning failure.
BTW, write_meta with WRITEMAP should undo the change if msync fails.
- Or do mdb_txn_begin(); mdb_txn_commit(); with the new mm_mapsize and
a dummy freelist entry. Simple enough, if excessive.
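To illustrate the MDB_SET/MDB_SET_RANGE point in the first item, here
is a rough sketch against the public cursor API. The function name is
invented, and the freelist DB handle and size_t-sized txnid key follow
mdb.c's internal conventions; this is not the actual mdb_page_alloc()
code:

#include "lmdb.h"

/* Look up a freelist entry for txnid `wanted`. MDB_SET needs an exact
 * key match, so a txnid that wrote no freelist record (e.g. one that
 * only updated mm_mapsize) yields MDB_NOTFOUND even though later
 * txnids have records. MDB_SET_RANGE instead positions the cursor on
 * the first key >= `wanted`, skipping over the gap. */
static int
find_freelist_entry(MDB_txn *txn, MDB_dbi free_dbi, size_t wanted)
{
    MDB_cursor *mc;
    MDB_val key, data;
    int rc = mdb_cursor_open(txn, free_dbi, &mc);
    if (rc)
        return rc;
    key.mv_size = sizeof(wanted);
    key.mv_data = &wanted;
    rc = mdb_cursor_get(mc, &key, &data, MDB_SET);
    if (rc == MDB_NOTFOUND)
        rc = mdb_cursor_get(mc, &key, &data, MDB_SET_RANGE);
    mdb_cursor_close(mc);
    return rc;
}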
As for determining the mapsize, I think it goes something like branch
"mdb/mapsize" in <http://folk.uio.no/hbf/OpenLDAP/openldap.git>.
--
Hallvard
In openldap-technical, Quanah Gibson-Mount wrote:
> As per Howard Chu (Author of MDB, Primary OpenLDAP Developer):
>
> -------------------------------------------------------------
> Full details are in the paper.
> http://www.openldap.org/pub/hyc/mdm-paper.pdf
>
> MDB assumes a unified buffer cache. See section 3.1, references 17, 18, and
> 19.
>
> Note that this requirement can be relaxed in the current version of the
> library. If you create the environment with the MDB_WRITEMAP option then
> all reads and writes are performed using mmap, so the file buffer cache is
> irrelevant. Of course then you lose the protection that the read-only map
> offers.
That's not quite true. mdb_env_open() does a read() of the meta pages.
I presume this can only be a problem when other processes have the
database open? In that situation, I think the read() can be avoided
by maintaining a copy of the relevant MDB_meta information in the
lock file. The read() only needs enough info to know how to map
the data file.
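A minimal sketch of that idea, with invented field names (the real
lockfile header is MDB_txninfo in mdb.c; none of this exists today):

#include <stddef.h>
#include <stdint.h>

/* Hypothetical addition to the lockfile header: a copy of the MDB_meta
 * fields that mdb_env_open() currently read()s from the data file,
 * just enough to set up the mmap without touching the file. */
typedef struct MDB_lockmeta {
    uint32_t lm_format;   /* bumped lockfile format version */
    uint32_t lm_psize;    /* page size, copy of mm_psize */
    size_t   lm_mapsize;  /* copy of mm_mapsize */
    size_t   lm_txnid;    /* txnid the copy was taken under */
} MDB_lockmeta;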
This would require a version increase for the lock file and programs
using it, but not for the database file.
--
Hallvard
/* There are two approaches to the search function:
* 1) walk the list of filter candidates, and see which are in scope.
* 2) walk the scope tree, and see which are in the filter candidates.
* (1) is faster if the filter candidate list is smaller than the scope tree,
* and vice versa. Currently only (1) is implemented.
*
 * We don't know the actual size of the subtree; we only know the count of
 * onelevel children. For subtree searches we take a guess.
*
 * Because the pagedResults cookie is only big enough to store a single
* entryID, if pagedResults is in use we will always do (1). For (2) to work
* reliably we need to use the parent's entryID, to avoid losing our place
* if the target entry is moved or deleted. We'd also need to store the
* count of where we were under the parent. I.e., we need the cookie to be
* two words long. We can examine that in the future, but the pagedResults
* cookie definition is global to all of slapd so the change would impact
* other backends and overlays.
*/
We could alter the dn2id index format, and maintain a numSubordinates counter
there. That would avoid the need to guess. Otherwise I figure we fudge a guess
based on the number of onelevel children and the depth of the baseDN (relative
to the suffix). I.e., the longer the DN, the smaller the guess, relative to
the total number of entries in the DB. Naturally I'd prefer to avoid altering
the dn2id index format.
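For instance, something along these lines; the names and the
halving-per-RDN scale factor are invented here, purely to make the
fudge concrete:

/* Guess the size of the subtree under the search base: start from the
 * total number of entries in the DB and halve it for each RDN the
 * baseDN sits below the suffix, but never guess lower than the known
 * onelevel children count. */
static unsigned long
subtree_guess(unsigned long total_entries, unsigned long onelevel_kids,
    int rdns_below_suffix)
{
    unsigned long guess = total_entries;

    while (rdns_below_suffix-- > 0 && guess > onelevel_kids)
        guess >>= 1;
    return guess > onelevel_kids ? guess : onelevel_kids;
}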
Suggestions?
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Currently we have a single select() (or epoll) loop in daemon.c that listens for
all readable and writable sockets, then passes events off to the thread pool
for processing.
We listen for writable sockets if a write attempt returns incomplete. There's
a pair of mutexes and condition variables used to synch up here between the
writing threads and the listener thread. It's quite a lot of lock overhead. As
far as I can tell the main reason we do this is so that we can stop a writer
thread on demand instead of having it just block forever in write().
We could make the listener's job a lot easier if we only have it listen for
readable sockets, and make each writer thread do its own poll. It would need
to poll on two descriptors - the one it's waiting to write on, and a pipe used
by the listener to terminate the poll. That pipe could be signalled by e.g.
the listener writing a byte to it; all writer threads could poll for its read
status. (One question here - if multiple threads are polling the same
descriptor, do they all receive the wakeup event?)
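A sketch of that writer-side wait, with illustrative names (not actual
slapd code). As to the question: poll() readiness is level-triggered,
so every thread polling the pipe's read end is woken for as long as the
byte stays unread.

#include <poll.h>

/* Wait for sock_fd to become writable, or for the listener to signal
 * shutdown by writing a byte to the wakeup pipe (pipe_rd_fd is the
 * read end). */
static int
writer_wait(int sock_fd, int pipe_rd_fd)
{
    struct pollfd pfd[2];

    pfd[0].fd = sock_fd;    pfd[0].events = POLLOUT;
    pfd[1].fd = pipe_rd_fd; pfd[1].events = POLLIN;

    if (poll(pfd, 2, -1) < 0)
        return -1;      /* error, check errno */
    if (pfd[1].revents & POLLIN)
        return 1;       /* listener says stop */
    return 0;           /* socket is writable */
}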
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
The attached patch avoids grabbing conn->c_mutex unnecessarily. Testing on my
laptop with test008 and SLAPD_DEBUG=0 shows a 33% speedup. Before moving ahead
with this patch, I'd like to verify that ITS#5835 doesn't reappear. Anyone
able to test and report back?
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Richard Silverman wrote:
> I’ve done this, and it may well be epoll-specific; the test has now run
> over twice as long as the longest it has ever required to produce the
> deadlock. With this sort of bug no amount of waiting would make me sure,
> but it seems likely. I’ll leave it running.
>
> epoll(7) specifically mentions the possibility of epoll_wait hanging even
> though there is outstanding unread data on a socket, when using
> edge-triggered operation, and I notice in daemon.c that you switch to
> edge-triggered mode in the event that the client closes the connection (at
> least that’s what the comment suggests):
>
> /* Don't keep reporting the hangup
> */
> if ( SLAP_SOCK_IS_ACTIVE( tid, fd )) {
> SLAP_EPOLL_SOCK_SET( tid, fd, EPOLLET );
> }
>
> Perhaps related?
Indeed. Seems like ITS#5886 has resurfaced. You can get some insight into
this by looking at the history from commit
96192064f3a3daea994eb8293f0413def5379958 onward. I
don't have time to dig further into it at the moment, Christmas dinner(s)
calling...
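For reference, the quoted SLAP_EPOLL_SOCK_SET( tid, fd, EPOLLET ) boils
down to roughly the following with the raw API (a sketch, not the
actual daemon.c macro). Per epoll(7), an edge-triggered fd only reports
on state changes, so data left undrained in the socket can go
unreported:

#include <sys/epoll.h>

/* Switch fd to edge-triggered so the hangup is reported once instead
 * of on every epoll_wait(). EPOLLHUP itself is always reported and
 * need not be requested. */
static int
set_edge_triggered(int epfd, int fd)
{
    struct epoll_event ev;

    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}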
>
>> Also, what kernel version(s) are you testing on?
>
> 2.6.32 (Red Hat Enterprise Linux)
I see a lot of Linux-kernel email traffic about epoll bugs as well, but I'm
not sure which are relevant to this version. Just something to stay aware of.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Richard Silverman wrote:
>
>> OK, so thread 13 is waiting on conn 1370, sd 39. There are actually no error
>> messages associated with this connection or socket anywhere in your logs.
>>
>> Likewise for thread 16: conn 1351, sd 20. No errors anywhere.
>>
>> This looks more like a problem with epoll() than anything else. Are the
>> connections associated with socket 20 and 39 actually dead?
>
> The process which initiated those connections is gone; the test harness
> runs slamd (a load generator) repeatedly and then kills it while it is
> running, to simulate the connections going down in a variety of states.
> Interestingly, though, here’s the lsof output for the slapd file
> descriptors:
>
> Most of the TCP connections are in CLOSE_WAIT, which is what you’d expect
> if the client sent a FIN but the server is frozen and cannot close the
> socket. However, note the odd state of the two sockets you asked about,
> 20 and 39: “can’t identify protocol.” There are 25 IPv4 connections in
> CLOSE_WAIT shown above; by comparison, netstat shows all those, plus one
> more on port 43362 not shown above.
> A little googling produced this and similar comments:
>
> https://idea.popcount.org/2012-12-09-lsof-cant-identify-protocol/
I'm not sure this applies here; perhaps there's more than one way for the
socket to end up in this state.
> ... which I verified using the sample program provided in that blog post; apparently, after a TCP connection is fully closed, Linux just drops the information about it and it shows up like that thereafter. This is the network trace of a connection which ends up in this state:
>
> 03:11:01.875853 IP client.52176 > server.9918: Flags [S], seq 4090195389, win 14600, options [mss 1460,sackOK,TS val 3524517096 ecr 0,nop,wscale 9], length 0
> 03:11:01.875870 IP server.9918 > client.52176: Flags [S.], seq 2724560604, ack 4090195390, win 14480, options [mss 1460,sackOK,TS val 1101728398 ecr 3524517096,nop,wscale 9], length 0
> 03:11:01.876773 IP client.52176 > server.9918: Flags [.], ack 1, win 29, options [nop,nop,TS val 3524517096 ecr 1101728398], length 0
> 03:11:01.876852 IP server.9918 > client.52176: Flags [F.], seq 1, ack 1, win 29, options [nop,nop,TS val 1101728399 ecr 3524517096], length 0
> 03:11:01.877894 IP client.52176 > server.9918: Flags [.], ack 2, win 29, options [nop,nop,TS val 3524517098 ecr 1101728399], length 0
> 03:11:01.878978 IP client.52176 > server.9918: Flags [F.], seq 1, ack 2, win 29, options [nop,nop,TS val 3524517099 ecr 1101728399], length 0
> 03:11:01.878993 IP server.9918 > client.52176: Flags [.], ack 2, win 29, options [nop,nop,TS val 1101728401 ecr 3524517099], length 0
>
> ... which is just a normal server-initiated close. I wonder if this odd socket state is not producing the expected/necessary error indication somewhere (e.g. ber_flush2) and thus mucking things up.
slapd definitely is not initiating the close though.
Something else to try would be to disable the use of epoll() and go back to
select(), to see if this problem is epoll-specific. In slapd/daemon.c, after
the #include "portable.h" add a #undef HAVE_EPOLL and then recompile and test.
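That is:

/* at the top of slapd/daemon.c */
#include "portable.h"
#undef HAVE_EPOLL   /* force the select() code path */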
Also, what kernel version(s) are you testing on?
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Richard Silverman wrote:
> On Thu, 20 Dec 2012, Howard Chu wrote:
>
>> Please post the entire stack trace for all threads. Since we have
>> c_writers = 8, c_writing = 1, c_writewaiter = 1
>>
>> then one of the threads ought to be blocked on the write2_cv (result.c:373).
>> That implies that they're still waiting for epoll() to say that the socket is
>> writable.
>
> Sure; here is a dump of the stacks of all 18 executing threads:
> [Switching to thread 2 (Thread 0x7f21de7fc700 (LWP 8482))]#0 pthread_cond_wait@@GLIBC_2.3.2 ()
> at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
> 162 62: movl (%rsp), %edi
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
> #1 0x000000000056926e in ldap_int_thread_cond_wait (cond=0x7f2210697720, mutex=0x7f22106976d0) at thr_posix.c:277
> #2 0x000000000056b1c2 in ldap_pvt_thread_cond_wait (cond=0x7f2210697720, mutex=0x7f22106976d0) at thr_debug.c:954
> #3 0x000000000043e487 in send_ldap_ber (op=0x7f21e0140260, ber=0x7f21de67a460) at result.c:309
> #4 0x000000000044160b in slap_send_search_entry (op=0x7f21e0140260, rs=0x7f21de7fb920) at result.c:1435
> #5 0x00000000004c5c27 in bdb_search (op=0x7f21e0140260, rs=0x7f21de7fb920) at search.c:1014
> #6 0x000000000042dd6a in fe_op_search (op=0x7f21e0140260, rs=0x7f21de7fb920) at search.c:402
> #7 0x000000000042d5ae in do_search (op=0x7f21e0140260, rs=0x7f21de7fb920) at search.c:247
> #8 0x000000000042a776 in connection_operation (ctx=0x7f21de7fba00, arg_v=0x7f21e0140260) at connection.c:1218
> #9 0x0000000000567cc7 in ldap_int_thread_pool_wrapper (xpool=0x17f0e90) at tpool.c:688
> #10 0x000000000056a767 in ldap_debug_thread_wrapper (arg=0x7f21ec136ad0) at thr_debug.c:770
> #11 0x00000039e3407851 in start_thread (arg=0x7f21de7fc700) at pthread_create.c:301
> #12 0x00000039e2ce811d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
OK, most of the threads look like thread 2.
Thread 13 and thread 16 are waiting for their respective sockets to be writable.
> [Switching to thread 13 (Thread 0x7f21f27fc700 (LWP 8470))]#0 pthread_cond_wait@@GLIBC_2.3.2 ()
> at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
> 162 62: movl (%rsp), %edi
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
> #1 0x000000000056926e in ldap_int_thread_cond_wait (cond=0x7f221069b9a8, mutex=0x7f221069b958) at thr_posix.c:277
> #2 0x000000000056b1c2 in ldap_pvt_thread_cond_wait (cond=0x7f221069b9a8, mutex=0x7f221069b958) at thr_debug.c:954
> #3 0x000000000043e716 in send_ldap_ber (op=0x7f21cc191420, ber=0x7f21f267a460) at result.c:379
> #4 0x000000000044160b in slap_send_search_entry (op=0x7f21cc191420, rs=0x7f21f27fb920) at result.c:1435
> #5 0x00000000004c5c27 in bdb_search (op=0x7f21cc191420, rs=0x7f21f27fb920) at search.c:1014
> #6 0x000000000042dd6a in fe_op_search (op=0x7f21cc191420, rs=0x7f21f27fb920) at search.c:402
> #7 0x000000000042d5ae in do_search (op=0x7f21cc191420, rs=0x7f21f27fb920) at search.c:247
> #8 0x000000000042a776 in connection_operation (ctx=0x7f21f27fba00, arg_v=0x7f21cc191420) at connection.c:1218
> #9 0x0000000000567cc7 in ldap_int_thread_pool_wrapper (xpool=0x17f0e90) at tpool.c:688
> #10 0x000000000056a767 in ldap_debug_thread_wrapper (arg=0x7f21f4000f20) at thr_debug.c:770
> #11 0x00000039e3407851 in start_thread (arg=0x7f21f27fc700) at pthread_create.c:301
> #12 0x00000039e2ce811d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> [Switching to thread 16 (Thread 0x7f21f3fff700 (LWP 8467))]#0 pthread_cond_wait@@GLIBC_2.3.2 ()
> at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
> 162 62: movl (%rsp), %edi
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
> #1 0x000000000056926e in ldap_int_thread_cond_wait (cond=0x7f22106977c0, mutex=0x7f2210697770) at thr_posix.c:277
> #2 0x000000000056b1c2 in ldap_pvt_thread_cond_wait (cond=0x7f22106977c0, mutex=0x7f2210697770) at thr_debug.c:954
> #3 0x000000000043e716 in send_ldap_ber (op=0x7f21c8111140, ber=0x7f21f3e7d460) at result.c:379
> #4 0x000000000044160b in slap_send_search_entry (op=0x7f21c8111140, rs=0x7f21f3ffe920) at result.c:1435
> #5 0x00000000004c5c27 in bdb_search (op=0x7f21c8111140, rs=0x7f21f3ffe920) at search.c:1014
> #6 0x000000000042dd6a in fe_op_search (op=0x7f21c8111140, rs=0x7f21f3ffe920) at search.c:402
> #7 0x000000000042d5ae in do_search (op=0x7f21c8111140, rs=0x7f21f3ffe920) at search.c:247
> #8 0x000000000042a776 in connection_operation (ctx=0x7f21f3ffea00, arg_v=0x7f21c8111140) at connection.c:1218
> #9 0x0000000000567cc7 in ldap_int_thread_pool_wrapper (xpool=0x17f0e90) at tpool.c:688
> #10 0x000000000056a767 in ldap_debug_thread_wrapper (arg=0x7f21f4000a60) at thr_debug.c:770
> #11 0x00000039e3407851 in start_thread (arg=0x7f21f3fff700) at pthread_create.c:301
> #12 0x00000039e2ce811d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
So the question is why are these threads still waiting on their sockets.
Looking at your logs, it's clear that the error occurs and
connection_closing() is called, which will call connection_abandon() to set
the o_abandon flag on all of the connection's ops.
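The writer's wait loop ought to notice that flag. Roughly, as a sketch
(not verbatim result.c; the write2 names follow the cv mentioned
earlier, and the caller is assumed to hold c_write2_mutex):

#include "slap.h"

static int
flush_ber(Operation *op, Connection *conn, BerElement *ber)
{
    while (ber_flush2(conn->c_sb, ber, LBER_FLUSH_FREE_NEVER) < 0) {
        if (op->o_abandon)
            return -1;  /* connection being torn down */
        /* block until the listener signals writability */
        ldap_pvt_thread_cond_wait(&conn->c_write2_cv,
            &conn->c_write2_mutex);
    }
    return 0;
}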
Can you please provide the output for "print *op" and "print *op->o_hdr"
in thread 13 and thread 16, frame 3?
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Richard Silverman wrote:
> As for the rest of the conversation: your insulting and snarky comments
> like “use grep” and “you haven’t read enough” have killed any interest I
> had in it. Let’s just restrict ourselves to something you presumably *are*
> interested in: fixing this showstopper bug which locks up your entire
> server if just a few client TCP connections fail in a particular way. I
> will be happy to provide more information on reproducing the problem, if
> you need it.
In case you hadn't noticed, I've been investigating and trying to help you.
"Use grep" is nothing more than standard practice for a C programmer. It is
what it is; you're reading an attitude into it that doesn't exist.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/