Full_Name: Ali Pouya
Version: 2.3.36
OS: Linux 2.6
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (145.242.11.4)

My directory consists of 4 servers synchronized through syncrepl (RefreshAndPersist). The directory contains about 10 million entries in the BDB back-end. My slapd servers are configured with the default 16 threads. I have a Java injector for write operations. The injector establishes 28 connections to the master and writes about 1400 entries per minute.
After several hours of activity the master hangs. It still accepts TCP connections but does not process LDAP operations: requests remain pending until the client interrupts them. The replicas are OK.
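
The symptom described above (TCP connection accepted, no LDAP answer) can be told apart from a refused connection with a small probe. The following is only a minimal sketch, not part of the original setup: the function name `probe_ldap`, the default port and the timeout are assumptions chosen for illustration. It sends an anonymous LDAPv3 simple bind and waits for any LDAPMessage in reply.

```python
import socket

# BER encoding of an anonymous LDAPv3 simple BindRequest with messageID 1.
ANON_BIND = bytes.fromhex("300c020101600702010304008000")


def probe_ldap(host, port=389, timeout=5.0):
    """Classify an LDAP listener as 'unreachable', 'hung', 'closed' or 'responding'.

    Hypothetical helper: the name and the return values are illustrative only.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"      # TCP connection could not be established
    with sock:
        sock.settimeout(timeout)
        try:
            sock.sendall(ANON_BIND)
            reply = sock.recv(64)
        except socket.timeout:
            return "hung"         # TCP accepted, but no LDAP answer in time
        except OSError:
            return "closed"       # connection reset while waiting
    if not reply:
        return "closed"           # server closed the connection without answering
    # 0x30 is the BER SEQUENCE tag that starts every LDAPMessage.
    return "responding" if reply[:1] == b"\x30" else "closed"
```

A call such as `probe_ldap("master.example.com", timeout=10.0)` (host name hypothetical) would be expected to return `"hung"` in the situation described here, since the kernel completes the TCP handshake even when slapd no longer processes requests.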
Is this problem already known? I checked the CHANGES file for the 2.3.40 release and did not find any mention of such a problem.
I used strace to see what slapd does when it hangs (freezes). It shows one of the threads looping endlessly on epoll_wait.
The strace output is reproduced at the end of this ITS.
Thanks for your help
Best regards
Ali
====================
The strace result:
Process 16970 attached with 18 threads - interrupt to quit
[pid 16601] futex(0xa341abf8, FUTEX_WAIT, 16625, NULL <unfinished ...>
[pid 16625] time( <unfinished ...>
[pid 16626] futex(0xb7f4d3d0, FUTEX_WAIT, 9, NULL <unfinished ...>
[pid 16627] futex(0xb7f4c110, FUTEX_WAIT, 21, NULL <unfinished ...>
[pid 20161] futex(0xb7f53708, FUTEX_WAIT, 3, NULL <unfinished ...>
[pid 20162] futex(0xb7f4c9a8, FUTEX_WAIT, 19, NULL <unfinished ...>
[pid 31353] futex(0xb7f4b5bc, FUTEX_WAIT, 11, NULL <unfinished ...>
[pid 31354] futex(0xb7f4980c, FUTEX_WAIT, 1, NULL <unfinished ...>
[pid 31355] futex(0xb7f4d178, FUTEX_WAIT, 11, NULL <unfinished ...>
[pid 2229] futex(0xb7f4fd9c, FUTEX_WAIT, 1, NULL <unfinished ...>
[pid 2241] futex(0xb7f48cb8, FUTEX_WAIT, 5, NULL <unfinished ...>
[pid 2242] futex(0xb7f50440, FUTEX_WAIT, 3, NULL <unfinished ...>
[pid 2243] futex(0xb7f49d20, FUTEX_WAIT, 17, NULL <unfinished ...>
[pid 16228] futex(0xb7f523e4, FUTEX_WAIT, 17, NULL <unfinished ...>
[pid 16899] futex(0xb7f500bc, FUTEX_WAIT, 19, NULL <unfinished ...>
[pid 16916] futex(0xb7f4aa04, FUTEX_WAIT, 29, NULL <unfinished ...>
[pid 16969] futex(0xb7f52060, FUTEX_WAIT, 23, NULL <unfinished ...>
[pid 16970] futex(0xb7f4c87c, FUTEX_WAIT, 27, NULL <unfinished ...>
[pid 16625] <... time resumed> NULL) = 1203063506
[pid 16625] epoll_wait(6, {{EPOLLERR|EPOLLHUP, {u32=153710248, u64=153710248}},
    {EPOLLERR|EPOLLHUP, {u32=153710244, u64=153710244}},
    {EPOLLERR|EPOLLHUP, {u32=153710232, u64=153710232}},
    {EPOLLERR|EPOLLHUP, {u32=153710220, u64=153710220}},
    {EPOLLERR|EPOLLHUP, {u32=153710204, u64=153710204}},
    {EPOLLERR|EPOLLHUP, {u32=153710196, u64=153710196}},
    {EPOLLERR|EPOLLHUP, {u32=153710224, u64=153710224}},
    {EPOLLERR|EPOLLHUP, {u32=153710192, u64=153710192}},
    {EPOLLERR|EPOLLHUP, {u32=153710180, u64=153710180}},
    {EPOLLERR|EPOLLHUP, {u32=153710176, u64=153710176}},
    {EPOLLERR|EPOLLHUP, {u32=153710164, u64=153710164}},
    {EPOLLERR|EPOLLHUP, {u32=153710160, u64=153710160}},
    {EPOLLERR|EPOLLHUP, {u32=153710144, u64=153710144}},
    {EPOLLERR|EPOLLHUP, {u32=153710140, u64=153710140}},
    {EPOLLERR|EPOLLHUP, {u32=153710128, u64=153710128}},
    {EPOLLERR|EPOLLHUP, {u32=153710116, u64=153710116}},
    {EPOLLERR|EPOLLHUP, {u32=153710108, u64=153710108}},
    {EPOLLERR|EPOLLHUP, {u32=153710104, u64=153710104}},
    {EPOLLERR|EPOLLHUP, {u32=153710100, u64=153710100}},
    {EPOLLERR|EPOLLHUP, {u32=153710096, u64=153710096}},
    {EPOLLERR|EPOLLHUP, {u32=153710092, u64=153710092}},
    {EPOLLERR|EPOLLHUP, {u32=153710088, u64=153710088}},
    {EPOLLERR|EPOLLHUP, {u32=153710080, u64=153710080}},
    {EPOLLERR|EPOLLHUP, {u32=153710068, u64=153710068}},
    {EPOLLERR|EPOLLHUP, {u32=153710056, u64=153710056}},
    {EPOLLERR|EPOLLHUP, {u32=153710052, u64=153710052}},
    {EPOLLERR|EPOLLHUP, {u32=153710048, u64=153710048}},
    {EPOLLERR|EPOLLHUP, {u32=153710044, u64=153710044}},
    {EPOLLERR|EPOLLHUP, {u32= ..........
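
For illustration only: the trace shows epoll_wait returning EPOLLERR|EPOLLHUP for many descriptors. With level-triggered epoll (the default), such a condition is reported on every call for as long as the descriptor stays registered, so a loop that does not remove it never blocks. The sketch below reproduces that general behaviour on Linux with a socket pair. It makes no claim about what slapd's daemon loop actually does, and the function name `count_hup_reports` is hypothetical.

```python
import select
import socket


def count_hup_reports(rounds=5):
    """Poll a hung-up, still-registered socket `rounds` times.

    Returns (number of polls that reported EPOLLHUP,
             number of events seen after unregistering the socket).
    """
    ours, peer = socket.socketpair()
    ep = select.epoll()
    ep.register(ours.fileno(), select.EPOLLIN)  # level-triggered by default
    peer.close()                                # peer goes away: hang-up on our end

    reports = 0
    for _ in range(rounds):
        events = ep.poll(timeout=0.2)
        # The same hang-up is reported again on every call.
        if any(mask & select.EPOLLHUP for _fd, mask in events):
            reports += 1

    # Only removing the descriptor from the epoll set stops the reports.
    ep.unregister(ours.fileno())
    after = ep.poll(timeout=0.2)

    ep.close()
    ours.close()
    return reports, len(after)
```

Every poll before `unregister` reports the hang-up again, and none does afterwards, which is the pattern a busy epoll loop on dead connections would produce.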