syncrepl consumer is slow
by Howard Chu
One thing I just noticed, while testing replication with 3 servers on my
laptop - during a refresh, the provider gets blocked waiting to write to
the consumers after writing about 4000 entries. I.e., the consumers
aren't processing fast enough to keep up with the search running on the
provider.
(That's actually not too surprising since reads are usually faster than
writes anyway.)
The consumer code has lots of problems as it is, just adding this note
to the pile.
I'm considering adding an option to the consumer to write its entries
with dbnosync during the refresh phase. The rationale being, there's
nothing to lose anyway if the refresh is interrupted. I.e., the consumer
can't update its contextCSN until the very end of the refresh, so any
partial refresh that gets interrupted is wasted effort - the consumer
will always have to start over from the beginning on its next refresh
attempt. As such, there's no point in safely/synchronously writing any
of the received entries - they're useless until the final contextCSN update.
The implementation approach would be to define a new control, e.g. "fast
write", for the consumer to pass to the underlying backend on any write
op. We would also have to add, e.g., an MDB_TXN_NOSYNC flag to
mdb_txn_begin() (BDB already has the equivalent flag).
This would only be used for writes that are part of a refresh phase. In
persist mode the provider and consumers' write speeds should be more
closely matched so it wouldn't be necessary or useful.
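A minimal sketch of what that plumbing might look like, assuming a hypothetical
MDB_TXN_NOSYNC flag accepted by mdb_txn_begin() and a per-op marker for the
proposed "fast write" control (neither exists today; names and values are
placeholders, not actual slapd/LMDB code):

/* Sketch only: MDB_TXN_NOSYNC and the "fast write" control are proposed,
 * not existing, features; the flag value below is a placeholder. */
#include "lmdb.h"

#define MDB_TXN_NOSYNC 0x10000  /* placeholder, mirrors the MDB_NOSYNC env flag */

static int refresh_write_entry(MDB_env *env, MDB_dbi dbi,
                               MDB_val *key, MDB_val *data,
                               int fast_write)  /* op carried "fast write" */
{
    MDB_txn *txn;
    int rc;

    /* During refresh, skip the fsync on commit: a partial refresh is
     * useless until the final contextCSN update lands anyway. */
    rc = mdb_txn_begin(env, NULL, fast_write ? MDB_TXN_NOSYNC : 0, &txn);
    if (rc)
        return rc;

    rc = mdb_put(txn, dbi, key, data, 0);
    if (rc) {
        mdb_txn_abort(txn);
        return rc;
    }
    return mdb_txn_commit(txn);
}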
Comments?
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Enhancing back-sock to use JSON
by Dagobert Michelsen
Hi,
I have made some enhancements to back-sock to use JSON for the passed data and JSON-RPC
to map LDAP calls to method invocations. The function signatures of the JSON-RPC calls
are modeled on the ones used in json2ldap (which goes in the opposite direction, talking
LDAP via JSON-RPC) [1]. The previous hand-crafted format passed on the socket was harder
to parse and needed a manually built parser, whereas now a standard library can be used.
However, handling the JSON data structures introduces an additional dependency on
Jansson (a JSON access library in C) [2]. Jansson itself is lightweight and has no
dependencies of its own. Given the limited use of back-sock and the improved ease of
use, I think it would be acceptable to add this dependency.
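To give an idea of the format, here is a rough sketch of how a search request
could be expressed as a JSON-RPC message built with Jansson; the method name
and parameter keys below are my own illustration, not the exact format of the
patch:

/* Sketch: "ldap.search" and the parameter keys are illustrative only. */
#include <stdio.h>
#include <stdlib.h>
#include <jansson.h>

int main(void)
{
    /* Build a JSON-RPC 2.0 request describing an LDAP search. */
    json_t *req = json_pack("{s:s, s:s, s:{s:s, s:s, s:s}, s:i}",
                            "jsonrpc", "2.0",
                            "method",  "ldap.search",
                            "params",
                                "base",   "dc=example,dc=com",
                                "scope",  "sub",
                                "filter", "(objectClass=*)",
                            "id", 1);
    if (!req)
        return 1;

    char *msg = json_dumps(req, JSON_COMPACT);
    printf("%s\n", msg);   /* in back-sock this line would go to the socket */

    free(msg);
    json_decref(req);
    return 0;
}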
I would be glad if this modification could be applied to OpenLDAP and will
happily submit a patch.
Best regards
-- Dago
[1] JSON2LDAP interface from JSON-RPC to LDAP
http://connect2id.com/products/json2ldap/web-api#ldap-compare
[2] Jansson, a C library for reading and writing JSON data structures
http://www.digip.org/jansson/
--
"You don't become great by trying to be great, you become great by wanting to do something,
and then doing it so hard that you become great in the process." - xkcd #896
RE24 testing call #3 (2.4.41), LMDB RE0.9 testing call #3 (0.9.15)
by Quanah Gibson-Mount
OpenLDAP 2.4.41 Engineering
Fixed libldap double free of request during abandon (ITS#7967)
Fixed libldap segfault in ldap_sync_initialize (ITS#8001)
Fixed libldap ldif-wrap off by one error (ITS#8003)
Fixed libldap handling of TLS in async mode (ITS#8022)
Fixed libldap null pointer dereference (ITS#8028)
Fixed libldap mutex handling with LDAP_OPT_SESSION_REFCNT (ITS#8050)
Fixed slapd slapadd onetime leak with -w (ITS#8014)
Fixed slapd syncrepl delta-mmr issue with overlays and slapd.conf
(ITS#7976)
Fixed slapd syncrepl mutex for cookie state (ITS#7968)
Fixed slapd syncrepl memory leaks (ITS#8035)
Fixed slapd syncrepl to free presentlist at end of refresh mode
(ITS#8038)
Fixed slapd syncrepl to streamline presentlist (ITS#8042)
Fixed slapd segfault when using matched values control (ITS#8046)
Fixed slapd-mdb minor case typo (ITS#8049)
Fixed slapd-mdb one-level search (ITS#7975)
Fixed slapd-mdb heap corruption (ITS#7965)
Fixed slapd-mdb crash after deleting in-use schema (ITS#7995)
Fixed slapd-mdb minor code cleanup (ITS#8011)
Fixed slapd-mdb to return errors when using incorrect env flags
(ITS#8016)
Fixed slapd-mdb to correctly update search candidates (ITS#8036,
ITS#7904)
Fixed slapd-meta TLS initialization with ldaps URIs (ITS#8022)
Fixed slapo-collect segfault (ITS#7797)
Fixed slapo-constraint with 0 count constraint (ITS#7780,ITS#7781)
Fixed slapo-deref with empty attribute list (ITS#8027)
Fixed slapo-sock result parser for CONTINUE (ITS#8048)
Fixed slapo-syncprov syncprov_matchops usage of test_filter
(ITS#8013)
Fixed slapo-syncprov segfault on disconnect/abandon
(ITS#5452,ITS#8012)
Fixed slapo-syncprov memory leak (ITS#8039)
Fixed slapo-syncprov segfault on disconnect/abandon (ITS#8043)
Fixed slapo-unique enforcement of uniqueness with manageDSAit
control (ITS#8057)
Build Environment
Fixed libdb detection with gcc 5.x (ITS#8056)
Enhanced contrib modules build paths (ITS#7782)
Fixed contrib/autogroup internal operation identity
(ITS#8006)
Fixed contrib/passwd/sha2 compiler warning (ITS#8000)
Fixed contrib/noopsrch compiler warning (ITS#7998)
Fixed contrib/dupent compiler warnings (ITS#7997)
Test suite: Added vrFilter test (ITS#8046)
Contrib
Added pbkdf2 sha256 and sha512 schemes (ITS#7977)
Fixed autogroup modification callback responses (ITS#6970)
Documentation
Added ldap_get_option(3) LDAP_FEATURE_INFO_VERSION
information (ITS#8032)
Added ldap_get_option(3) LDAP_OPT_API_INFO_VERSION
information (ITS#8032)
LMDB 0.9.15 Release Engineering
Fix txn init (ITS#7961,#7987)
Fix MDB_PREV_DUP (ITS#7955,#7671)
Fix compact of empty env (ITS#7956)
Added workaround for fdatasync bug in ext3fs
Build
Don't use -fPIC for static lib
Update .gitignore (ITS#7952,#7953)
Cleanup for "make test" (ITS#7841)
Misc. Android/Windows cleanup
Documentation
Fix MDB_APPEND doc
Clarify mdb_dbi_open doc
Thanks!
--Quanah
--
Quanah Gibson-Mount
Platform Architect
Zimbra, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
Fwd: multiple sequential lmdb readers + spinning media = slow / thrashes?
by Matthew Moskewicz
warnings: new to list, first post, lmdb noob.
i'm a caffe user:
https://github.com/BVLC/caffe
in one use case, caffe sequentially streams through >100GB lmdbs at a rate
of ~30MB/s in blocks of about 40MB. however, if multiple caffe processes
are reading the same lmdb (opened with MDB_RDONLY), read performance
becomes limiting (i.e. the processes become IO bound), even though the disk
has sufficient read bandwidth (say ~180MB/s). some of the relevant caffe
lmdb code is here:
https://github.com/BVLC/caffe/blob/master/src/caffe/util/db.cpp
however, if i *both*
1) run blockdev --setra 65536 --setfra 65536 /dev/sdwhatever
2) modify lmdb to call posix_madvise(env->me_map, env->me_mapsize,
POSIX_MADV_SEQUENTIAL);
then i can get >1 reader to run without being IO limited.
for (2), see https://github.com/moskewcz/scratch/tree/lmdb_seq_read_opt
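for reference, this is roughly what (2) boils down to, pulled out of mdb.c into
a standalone toy (map the data file read-only and hand the kernel a sequential
hint); the actual change is in the branch above:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <path/to/data.mdb>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    /* the hint that fixes multi-reader throughput: tell the kernel this
     * mapping will be scanned front to back so it reads ahead aggressively */
    int rc = posix_madvise(map, st.st_size, POSIX_MADV_SEQUENTIAL);
    if (rc)
        fprintf(stderr, "posix_madvise: %s\n", strerror(rc));

    /* ... sequential reads over `map` would go here ... */

    munmap(map, st.st_size);
    close(fd);
    return 0;
}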
similarly, using a sequential read microbenchmark designed to model the
caffe reads from here:
https://github.com/moskewcz/boda/blob/master/src/lmdbif.cc
if i run one reader, i get 180MB/s bandwidth.
with two readers, but neither (1) nor (2) above, each gets ~30MB/s
bandwidth.
with (1) and (2) enabled, and two readers, each gets ~90MB/s bandwidth.
any advice?
mwm
PS: backstory (skippable):
caffe originally used LevelDB to get better read performance for
sequentially loading sets of ~1M 227x227x3 raw images (~200GB data).
typically processing time is ~2 hours for this data set size, yielding a
read BW need of 30MB/s or so. it's not really clear if/why LevelDB was used
aside from the fact that the caffe author was a google intern at the time
he wrote it, but anecdotally i think the claim is that reading the raw
.jpgs had perf. issues, although it's unclear exactly what or why. i guess
it was the usual story about not getting sequential reads without using
LevelDB. they switched to lmdb a while back.
syncrepl multicast MMR
by Howard Chu
Been thinking this would be worth trying for a while now. Set a config
option for syncprov to send Persist messages to a multicast group
instead of the original TCP session. All the consumers would also join
the group and listen for updates. This would also exercise the cldap://
support in libldap.
Implementation details: since datagrams are unreliable, we need to
include sequence numbers on each message, which the consumer can check
to make sure it hasn't missed an update. Moreover, it should be able to
send a request to the provider to resend (over the TCP session) the
message corresponding to a given sequence number.
(Currently I envision using a small circular array in the provider to
remember the last N messages for potential retransmit.)
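A rough sketch of that ring (all names and the size are placeholders, not
actual slapd code; the ring is assumed zero-initialized):

/* Provider-side retransmit ring: keep the last N persist messages keyed by
 * sequence number so a consumer that spots a gap can ask for a TCP resend. */
#include <stdlib.h>
#include <string.h>

#define RETX_RING_SIZE 128          /* "last N messages"; N is arbitrary here */

typedef struct retx_msg {
    unsigned long seq;              /* sequence number carried in the datagram */
    size_t len;
    char *data;                     /* encoded persist message */
} retx_msg;

typedef struct retx_ring {
    retx_msg slots[RETX_RING_SIZE];
    unsigned long next_seq;         /* next sequence number to assign */
} retx_ring;

/* Remember a freshly sent message, overwriting the oldest slot. */
unsigned long retx_store(retx_ring *r, const char *data, size_t len)
{
    unsigned long seq = r->next_seq++;
    retx_msg *slot = &r->slots[seq % RETX_RING_SIZE];

    free(slot->data);
    slot->data = malloc(len);
    if (slot->data) {
        memcpy(slot->data, data, len);
        slot->len = len;
        slot->seq = seq;
    }
    return seq;
}

/* Look up a message a consumer asked to have resent; NULL means it already
 * fell out of the ring and the consumer must fall back to a refresh. */
const retx_msg *retx_lookup(const retx_ring *r, unsigned long seq)
{
    const retx_msg *slot = &r->slots[seq % RETX_RING_SIZE];
    return (slot->data && slot->seq == seq) ? slot : NULL;
}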
Config: both consumer and provider will need to be configured with a
particular multicast group ID. It should be possible to participate in
more than 1 group at a time (in which case, an update must be explicitly
sent to each active group) but in general, I expect a cluster of
cooperating MMR servers to all use a single multicast group, and so any
update will only need to be forwarded to the network once.
Thoughts?
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Re: syncrepl multicast MMR
by Emmanuel Lécharny
On 09/02/15 05:15, Howard Chu wrote:
> Emmanuel Lécharny wrote:
>> On 08/02/15 13:52, Howard Chu wrote:
>>> Been thinking this would be worth trying for a while now. Set a config
>>> option for syncprov to send Persist messages to a multicast group
>>> instead of the original TCP session. All the consumers would also join
>>> the group and listen for updates. This would also exercise the
>>> cldap:// support in libldap.
>>>
>>> Implementation details: since datagrams are unreliable, we need to
>>> include sequence numbers on each message, which the consumer can check
>>> to make sure it hasn't missed an update. Moreover, it should be able
>>> to send a request to the provider to resend (over the TCP session) the
>>> message corresponding to a given sequence number.
>>
>> Ok but how do you detect that a consumer has missed an update, if no
>> other update occurs? You may have some desynchronized servers for quite
>> a long period of time if you don't have a mechanism for the consumer to
>> regularly check if it is up to date.
>
> Good point, but easily solved with a periodic keepalive msg.
One more thing: you will have to deal with TLS at some point. There is
an RFC draft
(https://tools.ietf.org/html/draft-keoh-tls-multicast-security-00) that
proposes something, but it seems to be 3 years old and no longer active.
Re: syncrepl multicast MMR
by Emmanuel Lécharny
On 09/02/15 05:15, Howard Chu wrote:
> Emmanuel Lécharny wrote:
>> On 08/02/15 13:52, Howard Chu wrote:
>>> Been thinking this would be worth trying for a while now. Set a config
>>> option for syncprov to send Persist messages to a multicast group
>>> instead of the original TCP session. All the consumers would also join
>>> the group and listen for updates. This would also exercise the
>>> cldap:// support in libldap.
>>>
>>> Implementation details: since datagrams are unreliable, we need to
>>> include sequence numbers on each message, which the consumer can check
>>> to make sure it hasn't missed an update. Moreover, it should be able
>>> to send a request to the provider to resend (over the TCP session) the
>>> message corresponding to a given sequence number.
>>
>> Ok but how do you detect that a consumer has missed an update, if no
>> other update occurs? You may have some desynchronized servers for quite
>> a long period of time if you don't have a mechanism for the consumer to
>> regularly check if it is up to date.
>
> Good point, but easily solved with a periodic keepalive msg.
A heartbeat would be good to have: the producer would periodically
multicast the latest CSN, allowing desynchronized servers to catch up.
Another problem is that datagrams are limited in size, which means big
entries will have to be split into many parts.
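For example, each datagram could carry the message's sequence number plus a
fragment index/count so the consumer can reassemble the entry (just a sketch,
all names are placeholders):

#include <stdint.h>
#include <string.h>

#define FRAG_PAYLOAD 1400           /* stay under a typical Ethernet MTU */

typedef struct frag_hdr {
    uint32_t seq;                   /* sequence number of the whole message */
    uint16_t frag_idx;              /* this fragment's position */
    uint16_t frag_cnt;              /* total fragments in the message */
} frag_hdr;

/* Emit one datagram per FRAG_PAYLOAD-sized slice of msg. */
void send_fragmented(uint32_t seq, const char *msg, size_t len,
                     void (*send_dgram)(const void *buf, size_t n))
{
    uint16_t cnt = (uint16_t)((len + FRAG_PAYLOAD - 1) / FRAG_PAYLOAD);
    char buf[sizeof(frag_hdr) + FRAG_PAYLOAD];

    for (uint16_t i = 0; i < cnt; i++) {
        size_t off = (size_t)i * FRAG_PAYLOAD;
        size_t n = len - off < FRAG_PAYLOAD ? len - off : FRAG_PAYLOAD;
        frag_hdr hdr = { seq, i, cnt };

        memcpy(buf, &hdr, sizeof hdr);
        memcpy(buf + sizeof hdr, msg + off, n);
        send_dgram(buf, sizeof hdr + n);
    }
}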