entry_free() etc. bottlenecks
by Howard Chu
We introduced entry_alloc/entry_free and attr_alloc/attr_free to avoid the
severe heap fragmentation problems we were encountering with glibc malloc.
However, the current implementation is fairly suboptimal, using a global mutex
for the entry and attr free lists. This scales very poorly on multiprocessor
machines.
The obvious fix is to adopt the same strategies that tcmalloc uses. (And
unfortunately we can't simply rely on tcmalloc always being available, or
always being stable in a given environment.) I.e., use per-thread cached free
lists. We maintain some small number of free objects per thread; this
per-thread free list can be used without locking. When the number of free
objects held by a given thread exceeds a particular threshold, we take the
global lock and return a batch of objects to the global list.
In practice this threshold can be very small - any given thread typically
needs no more than 4 entries at a time. (ModDN is the worst case at 3 entries
locked at once. LDAP TXNs would distort this figure but not in any critical
fashion.) For attributes the typical usage is much more variable, but any
number we pick will be an improvement over the current code.
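Here's a rough sketch of that scheme in C. This is not the actual slapd code;
the names (Entry, THREAD_CACHE_MAX, THREAD_CACHE_RETURN) and the use of the
__thread storage class are just placeholders to illustrate the fast-path /
slow-path split:

#include <pthread.h>
#include <stdlib.h>

typedef struct Entry {
	struct Entry *e_next;	/* free-list link */
	/* ... entry payload ... */
} Entry;

#define THREAD_CACHE_MAX	16	/* per-thread high-water mark */
#define THREAD_CACHE_RETURN	8	/* objects returned to the global list on overflow */

static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;
static Entry *global_free_list;

static __thread Entry *thread_free_list;	/* thread-local: no locking needed */
static __thread int thread_free_count;

Entry *entry_alloc( void )
{
	Entry *e = thread_free_list;

	if ( e ) {			/* fast path: pop from the thread cache */
		thread_free_list = e->e_next;
		thread_free_count--;
		return e;
	}

	pthread_mutex_lock( &global_mutex );	/* slow path: take one from the global list */
	e = global_free_list;
	if ( e )
		global_free_list = e->e_next;
	pthread_mutex_unlock( &global_mutex );

	return e ? e : calloc( 1, sizeof( Entry ));
}

void entry_free( Entry *e )
{
	e->e_next = thread_free_list;		/* fast path: push onto the thread cache */
	thread_free_list = e;

	if ( ++thread_free_count > THREAD_CACHE_MAX ) {
		/* overflow: hand a batch back to the global list under the mutex */
		Entry *batch = thread_free_list, *tail = batch;
		int i;

		for ( i = 1; i < THREAD_CACHE_RETURN; i++ )
			tail = tail->e_next;
		thread_free_list = tail->e_next;
		thread_free_count -= THREAD_CACHE_RETURN;

		pthread_mutex_lock( &global_mutex );
		tail->e_next = global_free_list;
		global_free_list = batch;
		pthread_mutex_unlock( &global_mutex );
	}
}

The point of the sketch is that a thread only touches the mutex when its own
cache over- or under-flows; with thresholds on the order of 4-16 objects the
global lock should mostly drop out of the profile.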
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
back-mdb - futures...
by Howard Chu
Just some thoughts on what I'd like to see in a new memory-based backend...
One of the complaints about back-bdb/hdb is the complexity in the tuning;
there are a number of different components that need to be balanced against
each other and the proper balance point varies depending on data size and
workload. One of the directions we were investigating a couple years back was
mechanisms for self-tuning of the caches. (This was essentially the thrust of
Jong-Hyuk Choi's work with zoned allocs for the back-bdb entry cache; it would
allow large chunks of the entry cache to be discarded on demand when system
memory pressure increased.) Unfortunately Jong hasn't been active on the
project in a while and it doesn't appear that anyone else was tracking that
work. Self-tuning is still a goal but it seems to me to be attacking the wrong
problem.
One of the things that annoys me about the current BerkeleyDB-based design is
that we have 3 levels of cache operating at all times - filesystem, BDB, and
slapd. This means at least 2 memory copy operations to get any piece of data
from disk into working memory, and you have to play games with the OS to
minimize the waste in the FS cache. (E.g. on Linux, tweak the swappiness setting.)
Back in the 80s I spent a lot of time working on the Apollo DOMAIN OS, which
was based on the M68K platform. One of their (many) claims to fame was the
notion of a single-level store: the processor architecture supported a full 32
bit address space but it was uncommon for systems to have more than 24 bits
worth of that populated, and nobody had anywhere near 1GB of disk space on
their entire network. As such, every byte of available disk space could be
directly mapped to a virtual memory address, and all disk I/O was done through
mmaps and demand paging. As a result, memory management was completely unified
and memory usage was extremely efficient.
These days you could still take that sort of approach, though on a 32 bit
machine a DB limit of 1-2GB may not be so useful any more. However, with the
ubiquity of 64 bit machines, the idea becomes quite attractive again.
The basic idea is to construct a database that is always mmap'd to a fixed
virtual address, and which returns its mmap'd data pages directly to the
caller (instead of copying them to a newly allocated buffer). Given a fixed
address, it becomes feasible to make the on-disk record format identical to
the in-memory format. Today we have to convert from a BER-like encoding into
our in-memory format, and while that conversion is fast, it still takes up a
measurable amount of time. (Which is one reason our slapd entry cache is still
so much faster than just using BDB's cache.) So instead of storing offsets
into a flattened data record, we store actual pointers (since they all simply
reside in the mmap'd space).
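For concreteness, here's a minimal sketch of what such an access path might
look like. The names (DB_MAP_ADDR, DB_MAP_SIZE, DbEntry, db_map, db_fetch)
are purely illustrative, not an existing API, and the fixed address and region
size are arbitrary placeholder values:

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <stddef.h>

#define DB_MAP_ADDR	((void *) 0x100000000000UL)	/* fixed VM address, chosen at design time */
#define DB_MAP_SIZE	(1UL << 36)			/* placeholder: sized to the maximum DB */

typedef struct DbEntry {
	char		*e_dn;		/* a real pointer: valid because the map address never changes */
	struct DbAttr	*e_attrs;	/* likewise, no offset-to-pointer conversion needed */
} DbEntry;

void *db_map( const char *path )
{
	int fd = open( path, O_RDWR );
	if ( fd < 0 )
		return NULL;

	/* MAP_FIXED guarantees the same virtual address on every startup,
	 * so pointers stored in the file remain valid as-is. */
	void *p = mmap( DB_MAP_ADDR, DB_MAP_SIZE, PROT_READ|PROT_WRITE,
		MAP_SHARED|MAP_FIXED, fd, 0 );
	close( fd );
	return p == MAP_FAILED ? NULL : p;
}

/* A lookup returns a pointer into the map itself - no alloc, no copy. */
DbEntry *db_fetch( void *map, size_t offset )
{
	return (DbEntry *)((char *)map + offset);
}

The key point is the last function: fetching an entry is just pointer
arithmetic into the shared map, with no allocation and no copying.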
Using this directly mmap'd approach immediately eliminates the 3 layers of
caching and brings it down to 1. As another benefit, the DB would require
*zero* cache configuration/tuning - it would be entirely under the control of
the OS memory manager, and its resident set size would grow or shrink
dynamically without any outside intervention.
It's not clear to me that we can modify BDB to operate in this manner. It
currently supports mmap access for read-only DBs, but it doesn't map to fixed
addresses and still does alloc/copy before returning data to the caller.
Also, while BDB development continues, the new development is mainly occurring
in areas that don't matter to us (e.g. BDB replication) and the areas we care
about (B-tree performance) haven't really changed much in quite a while. I've
mentioned B-link trees a few times before on this list; they have much lower
lock contention than plain B-trees and thus can support even greater
concurrency. I've also mentioned them to the BDB team a few times and as yet
they have no plans to implement them. (Here's a good reference:
http://www.springerlink.com/content/eurxct8ewt0h3rxm/ )
As such, it seems likely that we would have to write our own DB engine to
pursue this path. (Clearly such an engine must still provide full ACID
transaction support, so this is a non-trivial undertaking.) Whether and when
we embark on this is unclear; this is somewhat of an "ideal" design and as
always, "good enough" is the enemy of "perfect" ...
This isn't a backend we can simply add to the current slapd source base, so
it's probably an OpenLDAP 3.x target: in order to have a completely canonical
record on disk, we also need pointers to AttributeDescriptions to be recorded
in each entry, and those AttributeDescription pointers must also be persistent.
That means our current AttributeDescription cache must be modified to
also allocate its records from a fixed mmap'd region. (And we'll have to
include a schema-generation stamp, so that if schema elements are deleted we
can force new AD pointers to be looked up when necessary.) (Of course, given
the self-contained nature of the AD cache, we can probably modify its behavior
in this way without impacting any other slapd code...)
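As a very rough sketch of the generation-stamp check, assuming a hypothetical
ADRegion header and an ad_by_name() re-lookup helper (neither exists in slapd
today; AttributeDescription stands in here as an opaque type):

typedef struct AttributeDescription AttributeDescription;

/* hypothetical helper: re-resolve an attribute description by name */
AttributeDescription *ad_by_name( const char *name );

typedef struct ADRegion {
	unsigned long	adr_schema_gen;	/* stamp recorded when the AD region was written */
	/* AttributeDescription records are allocated from the rest of this mmap'd region */
} ADRegion;

extern unsigned long current_schema_gen;	/* bumped whenever schema elements are deleted */

/* Dereference a stored AD pointer; if the region's stamp is stale
 * (schema elements were deleted since it was written), force a fresh lookup. */
AttributeDescription *
ad_deref( ADRegion *adr, AttributeDescription *stored, const char *name )
{
	if ( adr->adr_schema_gen == current_schema_gen )
		return stored;		/* stamp matches: the stored pointer is still valid */

	return ad_by_name( name );	/* stale: look the AD up again by name */
}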
There's also a potential risk to leaving all memory management up to the OS -
the native memory manager on some OSes (e.g. Windows) is abysmal, and the
CLOCK-based cache replacement code we now use in the entry cache is more
efficient than the LRU schemes that some older OS versions use. So we may get
into this and decide we still need to play games with mlock() etc. to control
the cache management. That would be an unfortunate complication, but it would
still allow us to do simpler tuning than we currently need. Still,
establishing a 1:1 correspondence between virtual memory addresses and disk
addresses is a big win for performance, scalability, and reduced complexity
(== greater reliability)...
(And yes, by the way, we have planning for LDAPCon2009 this September in the
works; I imagine the Call For Papers will go out in a week or two. So now's a
good time to pull up whatever other ideas you've had in the back of your mind
for a while...)
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
RE24 testing round 3
by Quanah Gibson-Mount
Please test. Thanks!
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
Fwd: Re: Plans for OpenLDAP 2.4?
by Howard Chu
Any thoughts on adding libldif as an installed library? I don't see it having
any particular stability impact for our code, but it may give packagers a bit
of heartburn if we add a new library in the middle of a release stream.
re: MozNSS multi-init support - we have already been rolling the MozNSS code
into RE24, but we've been omitting the configure switches for it because the
code is still incomplete. Multi-init is one of the key features needed before
this code will be safe for use by general applications. The issue of
ciphersuite management still needs to be addressed as well. (The current code
is usable, under very controlled circumstances...)
-------- Original Message --------
Subject: Re: Plans for OpenLDAP 2.4?
Date: Mon, 31 Aug 2009 15:38:17 -0600
From: Rich Megginson <rmeggins@redhat.com>
Organization: Sec Eng (Directory Server)
To: Howard Chu <hyc@symas.com>
Howard Chu wrote:
> Rich Megginson wrote:
>> Do you have a date after which 2.4 branch development will be closed for
>> new enhancements? We have 2 more features which we would like to get
>> into OpenLDAP in order to be able to use it in our projects:
>> 1) support for NSS multi-init - the guy working on this has a prototype
>> working and has submitted the code to NSS upstream - no ETA yet
>> 2) support for libldif
>>
>> How much time do we have in order to get these features into the 2.4
>> branch?
>
> It seems like we can add both of those without destabilizing any of
> the existing code, so it should be OK. A 2.4.18 release candidate is
> being tested now so it's probably too late for this cut. I'll raise
> this question on the openldap-devel list.
Ok. Thanks.