openldap-2.3.41
db-4.2.52.NC-PLUS_5_PATCHES
SunOS ldapmaster01.unix 5.10 Generic_127128-11
Our LDAP setup has been running rather well for the last 2 years or so, but
increasingly we have had to restart slapd more and more frequently. Occasional
core-dumps and "very large" process footprints as well.
I suspect that it is perhaps stuck on replication, this message show up on the
clients fairly frequently:
Dec 14 11:30:49 forward01.unix slapd[10494]: [ID 190661 local4.debug] <=
entry_decode: slap_str2undef_ad( 20091027142315Z#000001#00#000000):
AttributeDescription contains inappropriate characters
Dec 14 11:30:49 forward01.unix slapd[10494]: [ID 818565 local4.debug]
null_callback: error code 0x50
Dec 14 11:30:49 forward01.unix slapd[10494]: [ID 776556 local4.debug]
syncrepl_entry: rid 405 be_add failed (80)
Dec 14 11:30:49 forward01.unix slapd[10494]: [ID 747041 local4.debug]
do_syncrepl: rid 405 retrying (9 retries left)
Which makes me think that it has been stuck since 20091027. Is there a way for
me to find out which entry 20091027142315Z#000001#00#000000 refers to, so I can
just delete it? (or fix it).
The pertinent lines in ldapmaster are:
lastmod on
checkpoint 128 15
directory /usr/local/var/openldap-data
index objectClass eq
index uid eq
index uidNumber eq
index mail eq
index mailAlternateAddress pres,eq
index deliveryMode eq
index accountStatus eq
index gecos eq
index radiusGroupName eq
index o pres,eq
index entryCSN,entryUUID eq
index gidNumber eq
overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100
And each slave has identical conf, except for the RID which is based on the last
octet of the IP address:
lastmod on
checkpoint 128 15
directory /usr/local/var/openldap-data
index objectClass eq
index uid eq
index uidNumber eq
index mail eq
index mailAlternateAddress pres,eq
index deliveryMode eq
index accountStatus eq
index gecos eq
index radiusGroupName eq
index o pres,eq
index entryCSN,entryUUID eq
index gidNumber eq
syncrepl rid=405
provider=ldap://172.20.12.113
type=refreshAndPersist
interval=00:00:00:30
searchbase="ou=mail,dc=gmo,dc=jp"
filter="(objectClass=*)"
attrs="*"
scope=sub
schemachecking=off
updatedn="cn=admin,dc=gmo,dc=jp"
bindmethod=simple
binddn="cn=admin,dc=gmo,dc=jp"
credentials="*censored*"
retry="60 10 300 +"
updateref ldap://172.20.12.113
In general, do they seem reasonable? Everything has been appearing to be
correct, except for the 20091027 entry. What is the recommended procedure?
Reading the documentation makes me think I should also index "ContextCSN"?
A recent ldapmaster core was triggered here:
#0 0xce7948da in t_delete () from /lib/libc.so.1
#1 0xce79417f in _malloc_unlocked () from /lib/libc.so.1
#2 0xce794058 in malloc () from /lib/libc.so.1
#3 0x08133175 in ber_memalloc_x ()
#4 0x08133487 in ber_dupbv_x ()
#5 0x0813352b in ber_dupbv ()
#6 0x0807c22e in attr_dup ()
... which I believe is replication-related due to a post made earlier in the
list. Most likely also fixed in later versions.
Thanks for any reply,
Jorgen Lundman
--
Jorgen Lundman | <lundman(a)lundman.net>
Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell)
Japan | +81 (0)3 -3375-1767 (home)