openldap-2.3.41 db-4.2.52.NC-PLUS_5_PATCHES SunOS ldapmaster01.unix 5.10 Generic_127128-11
Our LDAP setup has been running rather well for the last two years or so, but lately we have had to restart slapd more and more frequently. We are also seeing occasional core dumps and very large process footprints.
I suspect that it may be stuck on replication; this message shows up on the clients fairly frequently:
Dec 14 11:30:49 forward01.unix slapd[10494]: [ID 190661 local4.debug] <= entry_decode: slap_str2undef_ad( 20091027142315Z#000001#00#000000): AttributeDescription contains inappropriate characters
Dec 14 11:30:49 forward01.unix slapd[10494]: [ID 818565 local4.debug] null_callback: error code 0x50
Dec 14 11:30:49 forward01.unix slapd[10494]: [ID 776556 local4.debug] syncrepl_entry: rid 405 be_add failed (80)
Dec 14 11:30:49 forward01.unix slapd[10494]: [ID 747041 local4.debug] do_syncrepl: rid 405 retrying (9 retries left)
This makes me think it has been stuck since 20091027. Is there a way to find out which entry 20091027142315Z#000001#00#000000 refers to, so that I can simply delete it (or fix it)?
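The string in the log has the shape of an OpenLDAP 2.3 change sequence number (CSN), i.e. a value of the entryCSN operational attribute: a UTC timestamp followed by count, server ID and modification number in hex. Assuming that is what it is, one way to locate the entry is an equality search on entryCSN against the provider, which the "index entryCSN,entryUUID eq" line should make cheap. The sketch below is illustrative only: `parse_csn`, `escape_filter_value` and `csn_search_args` are hypothetical helpers, and the CSN field layout is an assumption based on the 2.3 format. It only builds the ldapsearch argument list (using the standard `-x`, `-H` and `-b` options); it does not run anything.

```python
from datetime import datetime, timezone


def parse_csn(csn: str) -> dict:
    """Split an OpenLDAP 2.3-style CSN into its fields.

    Assumed layout: YYYYmmddHHMMSSZ#count#sid#mod, with the last three
    fields written in hex. Raises ValueError if the string does not fit.
    """
    stamp, count, sid, mod = csn.split("#")
    when = datetime.strptime(stamp, "%Y%m%d%H%M%SZ").replace(tzinfo=timezone.utc)
    return {
        "time": when,
        "count": int(count, 16),
        "sid": int(sid, 16),
        "mod": int(mod, 16),
    }


def escape_filter_value(value: str) -> str:
    """Escape the characters RFC 4515 reserves in filter assertion values."""
    out = []
    for ch in value:
        if ch in "*()\\\0":
            out.append("\\%02x" % ord(ch))  # e.g. '*' becomes \2a
        else:
            out.append(ch)
    return "".join(out)


def csn_search_args(csn: str, uri: str, base: str) -> list:
    """Build an ldapsearch argument list that looks up an entry by entryCSN."""
    parse_csn(csn)  # sanity check: refuse strings that are not CSN-shaped
    return [
        "ldapsearch", "-x", "-H", uri, "-b", base,
        "(entryCSN=%s)" % escape_filter_value(csn),
        "entryCSN", "entryUUID",  # request the operational attributes by name
    ]
```

For example, `shlex.join(csn_search_args("20091027142315Z#000001#00#000000", "ldap://172.20.12.113", "ou=mail,dc=gmo,dc=jp"))` gives a shell-quoted command line to paste into a terminal. Depending on the ACLs, an anonymous bind may not see the entry, in which case `-D "cn=admin,dc=gmo,dc=jp" -W` would need to be added.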
The relevant lines in the slapd.conf on ldapmaster are:
lastmod on
checkpoint 128 15
directory /usr/local/var/openldap-data
index objectClass eq
index uid eq
index uidNumber eq
index mail eq
index mailAlternateAddress pres,eq
index deliveryMode eq
index accountStatus eq
index gecos eq
index radiusGroupName eq
index o pres,eq
index entryCSN,entryUUID eq
index gidNumber eq
overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100
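Both checkpoint directives on the master are "whichever comes first" triggers, as documented in the man pages: back-bdb's `checkpoint 128 15` flushes the transaction log after 128 KB have been written or 15 minutes have passed, and `syncprov-checkpoint 100 10` writes the contextCSN to the database after 100 write operations or 10 minutes. The sketch below models only that documented rule, evaluated at each write; the `Checkpoint` class is hypothetical and slapd's actual scheduling may differ.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Either-threshold trigger, as described for `checkpoint` and `syncprov-checkpoint`."""
    amount_limit: int      # KB written (back-bdb) or write operations (syncprov)
    minutes_limit: float   # minutes since the last checkpoint
    amount: int = 0
    last_minute: float = 0.0

    def record_write(self, amount: int, now_minutes: float) -> bool:
        """Account for one write; return True if a checkpoint is due now."""
        self.amount += amount
        due = (self.amount >= self.amount_limit
               or now_minutes - self.last_minute >= self.minutes_limit)
        if due:
            # Reset both counters once the checkpoint has been taken.
            self.amount = 0
            self.last_minute = now_minutes
        return due
```

With `Checkpoint(100, 10)`, for instance, the 100th single-operation write triggers a checkpoint even if only seconds have passed, while a single write more than 10 minutes after the previous checkpoint triggers one on its own.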
Each slave has an identical configuration, except for the RID, which is based on the last octet of its IP address:
lastmod on
checkpoint 128 15
directory /usr/local/var/openldap-data
index objectClass eq
index uid eq
index uidNumber eq
index mail eq
index mailAlternateAddress pres,eq
index deliveryMode eq
index accountStatus eq
index gecos eq
index radiusGroupName eq
index o pres,eq
index entryCSN,entryUUID eq
index gidNumber eq
syncrepl rid=405 provider=ldap://172.20.12.113 type=refreshAndPersist interval=00:00:00:30 searchbase="ou=mail,dc=gmo,dc=jp" filter="(objectClass=*)" attrs="*" scope=sub schemachecking=off updatedn="cn=admin,dc=gmo,dc=jp" bindmethod=simple binddn="cn=admin,dc=gmo,dc=jp" credentials="*censored*" retry="60 10 300 +"
updateref ldap://172.20.12.113
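The `retry="60 10 300 +"` parameter is a list of `<interval> <count>` pairs, where `+` means retry indefinitely; per slapd.conf(5), this consumer retries every 60 seconds for the first 10 attempts and then every 300 seconds until it succeeds. The "retrying (9 retries left)" log line appears consistent with the first stage. So if one entry can never be added, the consumer presumably never gives up and keeps retrying every five minutes. The hypothetical `retry_schedule` generator below expands such a value into the sequence of delays; it is a sketch of the documented semantics, not slapd code.

```python
from typing import Iterator


def retry_schedule(spec: str) -> Iterator[int]:
    """Yield successive retry delays (seconds) for a syncrepl retry= value."""
    fields = spec.split()
    if len(fields) % 2:
        raise ValueError("retry needs <interval> <count> pairs")
    for interval, count in zip(fields[::2], fields[1::2]):
        if count == "+":
            while True:  # '+' means retry indefinitely at this interval
                yield int(interval)
        for _ in range(int(count)):
            yield int(interval)
```

Because the schedule can be infinite, take a bounded prefix when inspecting it, e.g. `list(itertools.islice(retry_schedule("60 10 300 +"), 12))` gives ten 60-second delays followed by 300-second ones.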
In general, does this configuration seem reasonable? Everything appears to be working correctly except for the 20091027 entry. What is the recommended procedure for fixing it? From reading the documentation, I wonder whether I should also index "ContextCSN"?
A recent core dump on ldapmaster has this backtrace:
#0 0xce7948da in t_delete () from /lib/libc.so.1
#1 0xce79417f in _malloc_unlocked () from /lib/libc.so.1
#2 0xce794058 in malloc () from /lib/libc.so.1
#3 0x08133175 in ber_memalloc_x ()
#4 0x08133487 in ber_dupbv_x ()
#5 0x0813352b in ber_dupbv ()
#6 0x0807c22e in attr_dup ()
Based on an earlier post to this list, I believe this is replication-related, and it is most likely fixed in later versions.
Thanks for any reply,
Jorgen Lundman