I observed similar behaviour when I had invalid data or violated an object class or an overlay rule, such as a unique uid constraint.
Find out which entry or attribute the sync is working on when it fails, and make sure it does not violate any object classes. As a test, you could also turn off the unique overlay, or any other overlay that imposes restrictions, to see if that helps. (Just a suggestion.)
While this is not a solution, it may help diagnose the problem and let you replicate in the interim.
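As a rough aid for reading the consumer log, here is a small Python sketch (my own helper, not part of OpenLDAP) that maps the numeric result codes in lines such as "error code 0x10" or "be_modify failed (16)" to their RFC 4511 names. The function name explain and the regular expression are assumptions modelled on the two consumer log lines quoted below, and the table is only a subset of the result codes.

```python
import re

# Subset of LDAP result codes (RFC 4511) that tend to appear when
# replicated data conflicts with schema or overlay restrictions.
RESULT_CODES = {
    16: "noSuchAttribute",
    17: "undefinedAttributeType",
    19: "constraintViolation",
    20: "attributeOrValueExists",
    21: "invalidAttributeSyntax",
    65: "objectClassViolation",
    68: "entryAlreadyExists",
}

# Assumed patterns, based only on the two quoted consumer log lines:
# "error code 0x10" (hex) and "failed (16)" (decimal).
CODE_RE = re.compile(r"error code (0x[0-9a-fA-F]+)|failed \((\d+)\)")


def explain(line):
    """Return (code, name) pairs for every result code found in a log line."""
    found = []
    for hex_code, dec_code in CODE_RE.findall(line):
        code = int(hex_code, 16) if hex_code else int(dec_code)
        found.append((code, RESULT_CODES.get(code, "unknown")))
    return found


if __name__ == "__main__":
    sample = [
        "slapd[31054]: null_callback : error code 0x10",
        "slapd[31054]: syncrepl_updateCookie: rid=001 be_modify failed (16)",
    ]
    for line in sample:
        print(line, "->", explain(line))
```

Both quoted lines report code 16, which is noSuchAttribute. That alone does not explain the crash, but it may narrow down which modification the consumer could not apply.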
Sellers
openldap wrote:
Hi listers
we are running 3 openldap servers: one as provider with openldap-servers-2.3.39-3.fc8
the other 2 as consumers with openldap-servers-2.4.8-6.fc9.i386
the provider runs fedora 8, the consumers run fedora 9.
as i am aware that the slurpd mechanism for synchronization is no longer supported in openldap-2.4.xxx, i tried to install syncrepl, all to no avail.
........
the actual situation is that as soon as i make a change on the provider, the consumers immediately get a segmentation fault and leave the bdb in an unusable state. when i restart the consumers, they inform me about the unusable db, recover it and redo the pending syncrepl request, which again ends in a segmentation fault.
we use sleepycat dbs exclusively. the version on the consumers is db4-4.6.21-5.fc9.i386
the config with regard to syncrepl is:
---- in the provider:
...
index entryCSN,entryUUID eq
....
modulepath /usr/lib/openldap/
moduleload syncprov.la
...
# syncrepl as a provider
overlay syncprov
#syncprov-checkpoint 100 10
syncprov-sessionlog 100
(i commented out syncprov-checkpoint after having read about a corresponding issue on a mailing list)
---- in the consumers
...
index entryUUID,entryCSN eq
...
modulepath /usr/lib/openldap/
moduleload syncprov.la
...
# syncrepl as a consumer
syncrepl rid=1
  provider=ldap://ldapadmin.mydom.com/
  binddn="cn=manager,dc=mydom,dc=com"
  bindmethod=simple
  credentials=XXXXXXXXXXXXXX
  searchbase="dc=mydom,dc=com"
  filter="(objectClass=*)"
  attrs="*,structuralObjectClass,entryUUID,entryCSN, \
  creatorsName,createTimestamp,modifiersName,modifyTimestamp"
  schemachecking=off
  scope=sub
  type=refreshAndPersist
  retry="5 5 300 5"
..............
when making a change on the provider, the provider's log is:
Jul 24 15:01:18 violina slapd[17811]: conn=25 fd=33 ACCEPT from IP=xxx.xxx.xxx.163:34702 (IP=0.0.0.0:636)
Jul 24 15:01:18 violina slapd[17811]: conn=25 fd=33 TLS established tls_ssf=56 ssf=56
Jul 24 15:01:18 violina slapd[17811]: conn=25 op=0 BIND dn="" method=128
Jul 24 15:01:18 violina slapd[17811]: conn=25 op=0 RESULT tag=97 err=0 text=
Jul 24 15:01:18 violina slapd[17811]: conn=25 op=1 SRCH base="" scope=0 deref=2 filter="(objectClass=*)"
Jul 24 15:01:18 violina slapd[17811]: conn=25 op=1 SRCH attr=namingContexts
Jul 24 15:01:18 violina slapd[17811]: conn=25 op=1 SEARCH RESULT tag=101 err=0 nentries=1 text=
Jul 24 15:01:18 violina slapd[17811]: conn=25 op=2 BIND dn="cn=config" method=163
Jul 24 15:01:18 violina slapd[17811]: conn=25 op=2 RESULT tag=97 err=14 text=
Jul 24 15:01:18 violina slapd[17811]: conn=25 op=3 BIND dn="cn=config" method=163
Jul 24 15:01:18 violina slapd[17811]: conn=25 op=3 BIND authcid="myuser@ldap" authzid="myuser@ldap"
Jul 24 15:01:18 violina slapd[17811]: conn=25 op=3 BIND dn="cn=myuser,ou=pam-ldap,dc=mydom,dc=com" mech=DIGEST-MD5 sasl_ssf=0 ssf=56
Jul 24 15:01:18 violina slapd[17811]: conn=25 op=3 RESULT tag=97 err=0 text=
Jul 24 15:01:18 violina slapd[17811]: conn=25 op=4 MOD dn="cn=ACCUG,ou=LDIF-Test,dc=mydom,dc=com"
Jul 24 15:01:18 violina slapd[17811]: conn=25 op=4 MOD attr=description
Jul 24 15:01:18 violina slapd[17811]: conn=25 op=4 RESULT tag=103 err=0 text=
Jul 24 15:01:18 violina slapd[17811]: conn=25 fd=33 closed (connection lost)
when the syncrepl request from one of the consumers comes in, the provider's log is:
Jul 24 15:13:03 violina slapd[17811]: conn=109 fd=35 ACCEPT from IP=xxx.xxx.xxx.165:59830 (IP=0.0.0.0:389)
Jul 24 15:13:03 violina slapd[17811]: conn=109 op=0 BIND dn="cn=manager,dc=mydom,dc=com" method=128
Jul 24 15:13:03 violina slapd[17811]: conn=109 op=0 BIND dn="cn=Manager,dc=mydom,dc=com" mech=SIMPLE ssf=0
Jul 24 15:13:03 violina slapd[17811]: conn=109 op=0 RESULT tag=97 err=0 text=
Jul 24 15:13:03 violina slapd[17811]: conn=109 op=1 SRCH base="dc=mydom,dc=com" scope=2 deref=0 filter="(objectClass=*)"
Jul 24 15:13:03 violina slapd[17811]: conn=109 op=1 SRCH attr=entryUUID creatorsName createTimestamp modifiersName modifyTimestamp * objectClass structuralObjectClass entryCSN
Jul 24 15:13:03 violina slapd[17811]: conn=109 fd=35 closed (connection lost)
........................
when the consumer tries to do a syncrepl, the consumer's log is:
Jul 24 14:59:16 mirador slapd[31054]: null_callback : error code 0x10
Jul 24 14:59:16 mirador slapd[31054]: syncrepl_updateCookie: rid=001 be_modify failed (16)
and then it fails with a Segmentation fault:
when running slapd -d7, i get: ....
ldap_write: want=265, written=265
  0000: 30 82 01 05 02 01 02 63 81 a9 04 0e 64 63 3d 61 0......c....dc=a
  0010: 79 6e 69 2c 64 63 3d 63 6f 6d 0a 01 02 0a 01 00 yni,dc=com......
  0020: 02 01 00 02 01 00 01 01 00 87 0b 6f 62 6a 65 63 ...........objec
  0030: 74 43 6c 61 73 73 30 7b 04 09 65 6e 74 72 79 55 tClass0{..entryU
  0040: 55 49 44 04 0c 63 72 65 61 74 6f 72 73 4e 61 6d UID..creatorsNam
  0050: 65 04 0f 63 72 65 61 74 65 54 69 6d 65 73 74 61 e..createTimesta
  0060: 6d 70 04 0d 6d 6f 64 69 66 69 65 72 73 4e 61 6d mp..modifiersNam
  0070: 65 04 0f 6d 6f 64 69 66 79 54 69 6d 65 73 74 61 e..modifyTimesta
  0080: 6d 70 04 01 2a 04 0b 6f 62 6a 65 63 74 43 6c 61 mp..*..objectCla
  0090: 73 73 04 15 73 74 72 75 63 74 75 72 61 6c 4f 62 ss..structuralOb
  00a0: 6a 65 63 74 43 6c 61 73 73 04 08 65 6e 74 72 79 jectClass..entry
  00b0: 43 53 4e a0 54 30 52 04 18 31 2e 33 2e 36 2e 31 CSN.T0R..1.3.6.1
  00c0: 2e 34 2e 31 2e 34 32 30 33 2e 31 2e 39 2e 31 2e .4.1.4203.1.9.1.
  00d0: 31 04 36 30 34 0a 01 03 04 2c 72 69 64 3d 30 30 1.604....,rid=00
  00e0: 31 2c 63 73 6e 3d 32 30 30 38 30 37 32 34 30 39 1,csn=2008072409
  00f0: 34 33 33 33 5a 23 30 30 30 30 30 31 23 30 30 23 4333Z#000001#00#
  0100: 30 30 30 30 30 30 01 01 ff 000000...
=>do_syncrep2 rid=001
ldap_result ld 0xb9c94970 msgid 2
wait4msg ld 0xb9c94970 msgid 2 (timeout 0 usec)
wait4msg continue ld 0xb9c94970 msgid 2 all 0
** ld 0xb9c94970 Connections:
- host: ldapadmin.mydom.com port: 389 (default)
  refcnt: 2 status: Connected
  last used: Thu Jul 24 16:02:25 2008
** ld 0xb9c94970 Outstanding Requests:
- msgid 2, origid 2, status InProgress
  outstanding referrals 0, parent count 0
ld 0xb9c94970 request count 1 (abandoned 0)
** ld 0xb9c94970 Response Queue:
  Empty
ld 0xb9c94970 response count 0
ldap_chkResponseList ld 0xb9c94970 msgid 2 all 0
ldap_chkResponseList returns ld 0xb9c94970 NULL
ldap_int_select
connection_get(12)
connection_get(12): got connid=0
=>do_syncrepl rid=001
=>do_syncrep2 rid=001
ldap_result ld 0xb9c94970 msgid 2
wait4msg ld 0xb9c94970 msgid 2 (timeout 0 usec)
wait4msg continue ld 0xb9c94970 msgid 2 all 0
** ld 0xb9c94970 Connections:
- host: ldapadmin.mydom.com port: 389 (default)
  refcnt: 2 status: Connected
  last used: Thu Jul 24 16:02:25 2008
** ld 0xb9c94970 Outstanding Requests:
- msgid 2, origid 2, status InProgress
  outstanding referrals 0, parent count 0
ld 0xb9c94970 request count 1 (abandoned 0)
** ld 0xb9c94970 Response Queue:
  Empty
ld 0xb9c94970 response count 0
ldap_chkResponseList ld 0xb9c94970 msgid 2 all 0
ldap_chkResponseList returns ld 0xb9c94970 NULL
ldap_int_select
read1msg: ld 0xb9c94970 msgid 2 all 0
ber_get_next
ldap_read: want=8, got=8
  0000: 30 48 02 01 02 79 43 80 0H...yC.
ldap_read: want=66, got=66
  0000: 18 31 2e 33 2e 36 2e 31 2e 34 2e 31 2e 34 32 30 .1.3.6.1.4.1.420
  0010: 33 2e 31 2e 39 2e 31 2e 34 81 27 a3 25 04 0c 63 3.1.9.1.4.'.%..c
  0020: 73 6e 3d 2c 72 69 64 3d 30 30 31 01 01 ff 31 12 sn=,rid=001...1.
  0030: 04 10 55 1f 4d f6 a7 44 10 29 99 1e ee 7e f2 a8 ..U.M..D.)...~..
  0040: 1c 52 .R
ber_get_next: tag 0x30 len 72 contents:
read1msg: ld 0xb9c94970 msgid 2 message type intermediate
ldap_parse_intermediate
ber_scanf fmt ({) ber:
ber_scanf fmt (a) ber:
ber_scanf fmt (O) ber:
ber_scanf fmt (t{) ber:
ber_scanf fmt (m) ber:
Segmentation fault
[root@mirador /etc/openldap]#
i also thought it might be a memory error and ran a memory check for more than an hour: nothing was found.
what else can i do?
thanks for any hints.
suomi