(ITS#5232) Running out of BDB locks causes truncated searches but no error code
by unix.gurus@gmail.com
Full_Name: Sean Burford
Version: 2.3.32
OS: Linux
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (65.57.245.11)
Searches that span periods of BDB lock exhaustion may return truncated results
with a success result code (err=0). They should instead fail with err=80.
This was with a BDB 4.4.20 backend.
In the log below you can see:
a search starts at 11:00:01
the search returns success:240 entries at 11:00:03
a search starts at 11:15:01
bdb runs out of locks at 11:15:02, informs the mod operation that it failed
the search returns success:128 entries at 11:15:02
Both searches were identical.
Nov 9 11:00:01 conn=71131 op=2 SRCH base="dc=example,dc=com" scope=2 deref=0
filter="(objectClass=exampleClass)"
Nov 9 11:00:01 conn=71131 op=2 SRCH attr=* +
Nov 9 11:00:03 conn=71131 op=2 SEARCH RESULT tag=101 err=0 nentries=240 text=
...
Nov 9 11:15:01 conn=72711 op=2 SRCH base="dc=example,dc=com" scope=2 deref=0
filter="(objectClass=exampleClass)"
Nov 9 11:15:01 conn=72711 op=2 SRCH attr=* +
...
Nov 9 11:15:02 bdb(dc=example,dc=com): Lock table is out of available locks
Nov 9 11:15:02 => bdb_idl_delete_key: c_get id failed: Cannot allocate memory
(12)
Nov 9 11:15:02 bdb(dc=example,dc=com): Lock table is out of available locks
Nov 9 11:15:02 bdb(dc=example,dc=com): Lock table is out of available locks
Nov 9 11:15:02 bdb(dc=example,dc=com): Lock table is out of available locks
...
Nov 9 11:15:02 Attribute index delete failure
Nov 9 11:15:02 bdb(dc=example,dc=com): Lock table is out of available locks
Nov 9 11:15:02 bdb(dc=example,dc=com): Lock table is out of available locks
Nov 9 11:15:02 bdb(dc=example,dc=com): Lock table is out of available locks
Nov 9 11:15:02 conn=72642 op=10 RESULT tag=103 err=80 text=
Nov 9 11:15:02 conn=72131 op=144 RESULT tag=103 err=80 text=internal error
...
Nov 9 11:15:02 conn=72711 op=2 SEARCH RESULT tag=101 err=0 nentries=128 text=
Nov 9 11:15:05 conn=72711 op=3 UNBIND
Nov 9 11:15:05 conn=72711 fd=57 closed ()
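A rough way to spot affected searches in a log like the one above is to flag
any search that returned err=0 although a "Lock table is out of available
locks" message was logged between its SRCH and SEARCH RESULT lines. The
following is a minimal sketch, assuming the syslog layout shown above (month,
day, time, then the message). The function name suspect_searches is made up
for illustration and is not part of OpenLDAP. The correlation is only a
heuristic: a flagged search is a candidate for truncation, not proof of it.

```python
import re

# Patterns follow the log excerpt above; other slapd log formats may differ.
SRCH = re.compile(r"^(\w+ +\d+ [\d:]+) conn=(\d+) op=(\d+) SRCH base=")
RESULT = re.compile(
    r"^(\w+ +\d+ [\d:]+) conn=(\d+) op=(\d+) SEARCH RESULT .*err=(\d+) nentries=(\d+)"
)
NO_LOCKS = re.compile(r"^(\w+ +\d+ [\d:]+) .*Lock table is out of available locks")


def suspect_searches(lines):
    """Return (conn, op, nentries) for searches that reported err=0
    although lock exhaustion was logged while they were in progress."""
    open_searches = {}  # (conn, op) -> True once a lock failure was seen
    suspects = []
    for line in lines:
        m = SRCH.match(line)
        if m:
            open_searches[(m.group(2), m.group(3))] = False
            continue
        if NO_LOCKS.match(line):
            # Mark every search that is currently in progress.
            for key in open_searches:
                open_searches[key] = True
            continue
        m = RESULT.match(line)
        if m:
            key = (m.group(2), m.group(3))
            saw_lock_failure = open_searches.pop(key, False)
            if saw_lock_failure and m.group(4) == "0":
                suspects.append((int(key[0]), int(key[1]), int(m.group(5))))
    return suspects
```

Fed the lines of the excerpt above, this would flag conn=72711 op=2 and leave
conn=71131 op=2 alone.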
The database definition is:
database bdb
suffix "dc=example,dc=com"
directory /var/lib/ldap
overlay auditlog
auditlog /var/lib/ldap/ldif/auditlog/audit.com.ldif
overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 1000
overlay accesslog
logdb cn=accesslog
logops writes
logsuccess TRUE
logpurge 02+23:46 01+23:46
# This limits section applies to the user this bug is about...
limits dn.exact="uid=user1,ou=people,dc=example,dc=com"
time.soft=unlimited time.hard=unlimited size.soft=unlimited
size.hard=unlimited
cachesize 100000
idlcachesize 10000
sizelimit 200000
checkpoint 512 1
lastmod on
idletimeout 300
threads 64
Re: (ITS#5221) cache? of parent failes for hdb
by Dan.Oscarsson@tietoenator.com
Mon 2007-11-12 at 23:17 +0000, quanah(a)zimbra.com wrote:
> --On Monday, November 12, 2007 7:02 AM +0000 hyc(a)symas.com wrote:
>
> > Dan.Oscarsson(a)tietoenator.com wrote:
> >> Full_Name: Dan Oscarsson
> >> Version: 2.3.32
> >> OS: SLES 10
> >> URL: ftp://ftp.openldap.org/incoming/
> >> Submission from: (NULL) (193.15.240.60)
> >
> Also, just some general data on what it is you are doing that is a bit more
> explanatory. For example, what does your tree layout look like? A single
> root with 20,000+ subtrees off of the root? A root with 10 subtrees, with
> thousands of subtrees off of those? How are you doing these modrdn's?
> Why, exactly? Anything that can help us to possibly come up with a
> programmatic generation of dummy data.
The tree is based on a company organisation structure, branching at each
organisational level until you get to a person. So the root has only a
few subtrees, and each subtree has only a few (say 1-30) subtrees. Most
entries will probably be a subtree for a unit, holding the persons belonging
to that unit. People will always be leaves, but an entry may contain both
people and subtrees as children.
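Since a programmatic generator of dummy data was asked for, here is a minimal
sketch of one that follows the layout just described: a few subtrees under the
root, 1-30 subtrees per level, people only as leaves. The suffix o=xx is taken
from the logs in this message. The ou=/cn= naming, the organizationalUnit and
person object classes, the depth, and the number of people per unit are
assumptions for illustration and do not reflect the real, company-internal
schema. The root entry itself is not generated.

```python
import random


def make_tree(suffix="o=xx", depth=3, top_units=4, seed=0):
    """Return a list of (dn, kind) pairs, parents before children.
    kind is "unit" for a subtree node or "person" for a leaf."""
    rng = random.Random(seed)
    entries = []
    counter = {"unit": 0, "person": 0}

    def add_unit(parent_dn, level):
        counter["unit"] += 1
        dn = "ou=unit%d,%s" % (counter["unit"], parent_dn)
        entries.append((dn, "unit"))
        # A unit may hold both people and further sub-units.
        for _ in range(rng.randint(0, 5)):  # assumed people per unit
            counter["person"] += 1
            entries.append(("cn=person%d,%s" % (counter["person"], dn), "person"))
        if level < depth:
            for _ in range(rng.randint(1, 30)):  # "say 1-30" subtrees
                add_unit(dn, level + 1)

    for _ in range(top_units):
        add_unit(suffix, 1)
    return entries


def to_ldif(entries):
    """Render the entries as LDIF add records using standard object classes."""
    out = []
    for dn, kind in entries:
        rdn_val = dn.split(",", 1)[0].split("=", 1)[1]
        out.append("dn: %s" % dn)
        if kind == "unit":
            out.append("objectClass: organizationalUnit")
            out.append("ou: %s" % rdn_val)
        else:
            out.append("objectClass: person")
            out.append("cn: %s" % rdn_val)
            out.append("sn: %s" % rdn_val)
        out.append("")
    return "\n".join(out)
```

Calling print(to_ldif(make_tree())) writes records that could be loaded with
ldapadd once the root entry exists. A fixed seed keeps the data reproducible
between test runs.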
I have triggered the bug when doing a major reorganisation of people, in
which very many of the people are moved to a new place in the tree,
many of them to newly created subtrees.
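Such a reorganisation amounts to a batch of modrdn operations that keep each
person's RDN and change only the parent. A minimal sketch that emits such a
batch as LDIF change records follows. The DNs in the usage note are made up,
and the naive comma split assumes that RDN values contain no escaped commas.

```python
def move_ldif(person_dn, new_parent_dn):
    """Return one LDIF modrdn record that moves person_dn under
    new_parent_dn while keeping its RDN unchanged."""
    rdn = person_dn.split(",", 1)[0]
    return "\n".join([
        "dn: %s" % person_dn,
        "changetype: modrdn",
        "newrdn: %s" % rdn,
        "deleteoldrdn: 0",
        "newsuperior: %s" % new_parent_dn,
        "",
    ])


def reorganise(moves):
    """moves is an iterable of (person_dn, new_parent_dn) pairs."""
    return "\n".join(move_ldif(dn, parent) for dn, parent in moves)
```

For example, reorganise([("cn=person1,ou=unit1,o=xx", "ou=unit2,o=xx")])
produces one record that could be fed to ldapmodify. Newly created subtrees
have to be added before any move that targets them.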
I am now using 2.3.38, which I compiled myself so that I can add tracing to
the code if needed. After running a move of people with logging set to any,
and then running my simple check program, the search at a subtree that
goes wrong (entries are missing) is logged as:
filter: (&(objectClass=enterprisePerson)(ba=telecom & media)(bu=telecom
solutions))
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: search_candidates: base="bu=telecom solutions,ba=telecom & media,cn=organisation,o=xx" (0x0000513a) scope=2
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => hdb_dn2idl("bu=telecom solutions,ba=telecom & media,cn=organisation,o=xx")
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => bdb_filter_candidates
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: AND
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => bdb_list_candidates 0xa0
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => bdb_filter_candidates
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: OR
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => bdb_list_candidates 0xa1
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => bdb_filter_candidates
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: EQUALITY
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => bdb_equality_candidates (objectClass)
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => key_read
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: bdb_idl_fetch_key: [b49d1940]
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: <= bdb_index_read: failed (-30989)
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: <= bdb_equality_candidates: id=0, first=0, last=0
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: <= bdb_filter_candidates: id=0 first=0 last=0
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => bdb_filter_candidates
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: AND
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => bdb_list_candidates 0xa0
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => bdb_filter_candidates
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: EQUALITY
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => bdb_equality_candidates (objectClass)
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => key_read
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: bdb_idl_fetch_key: [a98323a6]
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: <= bdb_index_read 20569 candidates
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: <= bdb_equality_candidates: id=20569, first=1483, last=26650
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: <= bdb_filter_candidates: id=20569 first=1483 last=26650
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => bdb_filter_candidates
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: EQUALITY
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => bdb_equality_candidates (ba)
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => key_read
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: bdb_idl_fetch_key: [410ca247]
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: <= bdb_index_read 6991 candidates
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: <= bdb_equality_candidates: id=6991, first=15620, last=26644
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: <= bdb_filter_candidates: id=6991 first=15620 last=26644
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => bdb_filter_candidates
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: EQUALITY
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => bdb_equality_candidates (bu)
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: => key_read
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: bdb_idl_fetch_key: [8b819570]
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: <= bdb_index_read 2195 candidates
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: <= bdb_equality_candidates: id=2195, first=30, last=26614
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: <= bdb_filter_candidates: id=2195 first=30 last=26614
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: <= bdb_list_candidates: id=2144 first=20813 last=26614
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: <= bdb_filter_candidates: id=2144 first=20813 last=26614
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: <= bdb_list_candidates: id=2144 first=20813 last=26614
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: <= bdb_filter_candidates: id=2144 first=20813 last=26614
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: <= bdb_list_candidates: id=2130 first=20813 last=26614
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: <= bdb_filter_candidates: id=2130 first=20813 last=26614
2007-11-13 15.19.05 ra [local4.debug] slapd[27292]: bdb_search_candidates: id=2130 first=20813 last=26614
After stopping and starting the server, the check works. The above search is
logged the same except for the last part:
2007-11-13 15.19.41 ra [local4.debug] slapd[27460]: bdb_idl_fetch_key: [8b819570]
2007-11-13 15.19.41 ra [local4.debug] slapd[27460]: <= bdb_index_read 2195 candidates
2007-11-13 15.19.41 ra [local4.debug] slapd[27460]: <= bdb_equality_candidates: id=2195, first=30, last=26614
2007-11-13 15.19.41 ra [local4.debug] slapd[27460]: <= bdb_filter_candidates: id=2195 first=30 last=26614
2007-11-13 15.19.41 ra [local4.debug] slapd[27460]: <= bdb_list_candidates: id=2144 first=20813 last=26614
2007-11-13 15.19.41 ra [local4.debug] slapd[27460]: <= bdb_filter_candidates: id=2144 first=20813 last=26614
2007-11-13 15.19.41 ra [local4.debug] slapd[27460]: <= bdb_list_candidates: id=2144 first=20813 last=26614
2007-11-13 15.19.41 ra [local4.debug] slapd[27460]: <= bdb_filter_candidates: id=2144 first=20813 last=26614
2007-11-13 15.19.41 ra [local4.debug] slapd[27460]: <= bdb_list_candidates: id=2141 first=20813 last=26614
2007-11-13 15.19.41 ra [local4.debug] slapd[27460]: <= bdb_filter_candidates: id=2141 first=20813 last=26614
2007-11-13 15.19.41 ra [local4.debug] slapd[27460]: bdb_search_candidates: id=2141 first=20813 last=26614
In both of the above cases, with the same filter but the search base
set to o=xx, which is the database suffix, it works as it should.
In that case the entries that were missing above are found.
I do not know if this might give you some clue as to what the problem is.
Any suggestions on what I should look for, or any suitable place
in the code where I could add debug logging, would be welcome.
The logs get very large when running with logging set to any.
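Because the logs are so large, it may help to reduce each run to the final
candidate summary per search and compare the runs. A minimal sketch, assuming
the "bdb_search_candidates: id=... first=... last=..." line format shown
above. The helper names are hypothetical. Judging from the bdb_index_read
lines, the id field appears to carry the candidate count, but that reading is
an assumption.

```python
import re

# Matches only the final per-search summary, not the intermediate
# "<= bdb_list_candidates" or "<= bdb_filter_candidates" lines.
CANDIDATES = re.compile(
    r"bdb_search_candidates: id=(-?\d+) first=(\d+) last=(\d+)"
)


def search_candidates(lines):
    """Return a list of (id, first, last) tuples, one per
    bdb_search_candidates summary line, in log order."""
    found = []
    for line in lines:
        m = CANDIDATES.search(line)
        if m:
            found.append(tuple(int(g) for g in m.groups()))
    return found


def diff_runs(before, after):
    """Pair up the summaries from two runs of the same check program and
    return the pairs that differ."""
    pairs = zip(search_candidates(before), search_candidates(after))
    return [(b, a) for b, a in pairs if b != a]
```

Applied to the two excerpts above, this would report the pair with id=2130
before the restart and id=2141 after it. It only works if both runs issue the
same searches in the same order.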
Dan
--
Dan Oscarsson
TietoEnator Email: Dan.Oscarsson(a)tietoenator.com
Box 85
201 20 Malmo, Sweden
Re: (ITS#5221) cache? of parent failes for hdb
by quanah@zimbra.com
--On Monday, November 12, 2007 7:02 AM +0000 hyc(a)symas.com wrote:
> Dan.Oscarsson(a)tietoenator.com wrote:
>> Full_Name: Dan Oscarsson
>> Version: 2.3.32
>> OS: SLES 10
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (193.15.240.60)
>
>> From this I suspect that it is the cache of which parent node belongs to
>> that gets corrupted. Or can it be something else?
>> What more could I do to trace down the bug?
>> I do not know if the above information is enough for you to find what is
>> wrong? Cannot include data as it contains company internal information
>> and a simple test program did not give the same error.
>> I have looked at the code, but it takes some time to understand it when
>> doing it for the first time. Maybe some debugging code could be added to
>> find the place where the bug is.
>
> This isn't a lot of information to go on. If you can create a test
> program that shows the problem occurring, using dummy data, that would
> help. --
Also, just some general data on what it is you are doing that is a bit more
explanatory. For example, what does your tree layout look like? A single
root with 20,000+ subtrees off of the root? A root with 10 subtrees, with
thousands of subtrees off of those? How are you doing these modrdn's?
Why, exactly? Anything that can help us to possibly come up with a
programmatic generation of dummy data.
And, you said you had a script to do this, can you include that?
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
Re: (ITS#5183) Index problem with ;x- attributes.
by hyc@symas.com
ando(a)sys-net.it wrote:
> Magnus.Jonsson(a)umdac.umu.se wrote:
>> Full_Name: Magnus Jonsson
>> Version: 2.3.38
>> OS: Debian GNU/linux ”Etch”
>> URL: http://foo.fot.nu/reproduce.txt
>> Submission from: (NULL) (130.239.200.171)
>> We are using ;x- attributes in a specific application to group some attributes.
>>
>> When removing a ;x- attribute, all the indices for that attribute disappear.
>>
>> example:
>>
>> cn: index
>> cn;x-f-1: index
>> cn;x-f-2: index
>>
>> If I remove the cn;x-f-1 attribute I can't search for (cn=index) anymore.
>
> I confirm your report, except that if the type and all subtypes have the
> same value, and if I remove the "cn;x-f-1" value, I can no longer search
> for "cn" equality, but I can still search for "cn;x-f-2".
This seems to be an old problem resurfacing: hash collisions in the index,
where multiple values hash to the same index slot. We either need to use
ref-counts, or we just quit deleting index values at runtime and require an
explicit garbage-collection pass of some kind to regenerate indices.
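To make the failure mode concrete, here is a toy model, not slapd's actual
index code. An index maps a hash slot to a set of entry IDs, so when two
values in one entry land in the same slot, deleting one value drops the entry
ID that the other still needs. The second class sketches the ref-count option.
The class names and the deliberately tiny hash are made up for illustration.

```python
from collections import defaultdict


def slot(value, nslots=4):
    """Deliberately tiny hash so that collisions are easy to provoke."""
    return sum(value.encode()) % nslots


class PlainIndex:
    """slot -> set of entry IDs; deleting a value always removes the ID."""

    def __init__(self):
        self.slots = defaultdict(set)

    def add(self, entry_id, value):
        self.slots[slot(value)].add(entry_id)

    def delete(self, entry_id, value):
        self.slots[slot(value)].discard(entry_id)

    def lookup(self, value):
        return set(self.slots[slot(value)])


class RefCountedIndex:
    """slot -> {entry ID: number of values in that entry using the slot}."""

    def __init__(self):
        self.slots = defaultdict(dict)

    def add(self, entry_id, value):
        refs = self.slots[slot(value)]
        refs[entry_id] = refs.get(entry_id, 0) + 1

    def delete(self, entry_id, value):
        refs = self.slots[slot(value)]
        if entry_id in refs:
            refs[entry_id] -= 1
            if refs[entry_id] == 0:
                del refs[entry_id]  # last user of the slot is gone

    def lookup(self, value):
        return set(self.slots[slot(value)])
```

With the value "index" added twice for entry 1 (standing in for cn and
cn;x-f-1) and then deleted once, PlainIndex no longer finds entry 1, while
RefCountedIndex still does. The cost is a counter per ID per slot, which is
why the garbage-collection alternative is on the table.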
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Re: (ITS#5227) old dsn acceptance
by quanah@zimbra.com
--On Monday, November 12, 2007 2:18 PM +0000 ando(a)sys-net.it wrote:
> Howard Chu wrote:
>
>> Ah, ok. So, do you think we need to integrate this patch?
>
> Do you mean: in 2.3? It might be a good idea in case we want all
> versions to be completely interoperable. Otherwise, as the code is now,
> 2.4 tolerates 2.3 (and 2.2, AFAIK), while 2.2 and 2.3 do not tolerate
> 2.4 (or, which is worse, tolerate but don't understand 2.4: issues could
> arise when comparing CSNs generated by different versions, which only
> 2.4 correctly handles by normalizing to its form). Eventually, this
> could be a problem as soon as someone tries to use 2.4 as master and 2.3
> as slave.
I'm not sure how much we should support any release older than 2.3 when
combined with 2.4. Particularly something as ancient as 2.1. As far as
replication goes, I think given the timestamp changes, the only supported
format would be a 2.3 master with 2.4 slaves. Just my 2c. ;)
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
Re: (ITS#5229) `make install' writes to the source tree as root
by mills@cc.umanitoba.ca
On Sun, Nov 11, 2007 at 04:19:01PM -0800, Howard Chu wrote:
>
> Agreed, it should not be relinking, ever, but that's libtool. I've
> submitted patches to libtool to correct this a number of times over the
> past several years but the problem remains, and it's our policy to avoid
> using customized OpenLDAP-specific versions of the GNU tools so we no
> longer provide patched versions of libtool in our CVS.
Yes, I understand your frustration with libtool and I don't expect
OpenLDAP to include a patched version. Still, the OpenLDAP Makefiles
could be modified to avoid libtool's relinking behavior.
> In the meantime, there's no reason for this to be affecting your source
> tree. You can always build using an object tree separate from the source
> tree. And you can always use "make install DESTDIR=/tmp/foo" on the build
> system to create an alternate hierarchy that can then be copied to any
> other machine.
I didn't realize that I could do that, but it's still second best.
Copying doesn't preserve an existing slapd.conf, for example. `make
install' would be superior if it avoided writing to the source tree.
This is easy enough to accomplish if it's made part of the policy.
> Since there is no fundamental bug in OpenLDAP here, this ITS will be closed.
It's certainly not a bug in OpenLDAP, but just in the build and install
system that accompanies OpenLDAP. I'd consider it a bug.
--
-Gary Mills- -Unix Support- -U of M Academic Computing and Networking-
Re: (ITS#5227) old dsn acceptance
by ando@sys-net.it
Howard Chu wrote:
> Ah, ok. So, do you think we need to integrate this patch?
Do you mean: in 2.3? It might be a good idea in case we want all
versions to be completely interoperable. Otherwise, as the code is now,
2.4 tolerates 2.3 (and 2.2, AFAIK), while 2.2 and 2.3 do not tolerate
2.4 (or, which is worse, tolerate but don't understand 2.4: issues could
arise when comparing CSNs generated by different versions, which only
2.4 correctly handles by normalizing to its form). Eventually, this
could be a problem as soon as someone tries to use 2.4 as master and 2.3
as slave.
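To illustrate why comparing CSNs generated by different versions can go
wrong, here is a minimal sketch. It assumes, and this is an assumption rather
than something stated in this thread, that a 2.3-style CSN has a whole-second
timestamp and a two-digit server ID, while a 2.4-style CSN has a microsecond
fraction and a three-digit server ID. The function names are hypothetical and
this is not slapd's normalizer.

```python
import re

# ASSUMPTION: both layouts are illustrative reconstructions of the 2.3-style
# and 2.4-style CSN strings; check them against real CSNs before relying on it.
CSN_23 = re.compile(
    r"^(\d{14})Z#([0-9a-fA-F]{6})#([0-9a-fA-F]{2})#([0-9a-fA-F]{6})$"
)
CSN_24 = re.compile(
    r"^\d{14}\.\d{6}Z#[0-9a-fA-F]{6}#[0-9a-fA-F]{3}#[0-9a-fA-F]{6}$"
)


def normalize_csn(csn):
    """Return csn in the (assumed) 2.4 layout, or raise ValueError."""
    if CSN_24.match(csn):
        return csn
    m = CSN_23.match(csn)
    if m:
        stamp, count, sid, mod = m.groups()
        # Pad the timestamp with zero microseconds and widen the server ID.
        return "%s.000000Z#%s#0%s#%s" % (stamp, count, sid, mod)
    raise ValueError("unrecognised CSN: %r" % csn)


def csn_newer(a, b):
    """True if CSN a is strictly newer than CSN b after normalization."""
    return normalize_csn(a) > normalize_csn(b)
```

Under these assumptions a plain string comparison misorders mixed CSNs from
the same second, because "Z" sorts after ".", so the old-style CSN always
looks newer. Comparing only after normalizing both sides to one form avoids
that, which is the behaviour attributed to 2.4 above.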
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team
SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
---------------------------------------
Office: +39 02 23998309
Mobile: +39 333 4963172
Email: pierangelo.masarati(a)sys-net.it
---------------------------------------
Re: (ITS#5227) old dsn acceptance
by hyc@symas.com
Pierangelo Masarati wrote:
> hyc(a)symas.com wrote:
>
>> It seems a bit odd to be introducing support for that format now in 2.4, when
>> it was completely unsupported in 2.2 and 2.3. I.e., why wasn't this issue
>> raised during the migration to/thru either 2.2 or 2.3?
>
> Probably because until 2.4
>
> #define csnValidate blobValidate
>
> and as such any incompatible CSN was silently ignored. So this has
> always been broken, only 2.4 detects the inconsistency, but behaved
> badly (assert) until your fix.
Ah, ok. So, do you think we need to integrate this patch?
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/