syncrepl "be_modify failed (80)"


manu@netbsd.org

4 Feb 2009, 12:48 p.m.

From time to time, syncrepl breaks on the replica, with this in the logs:

slapd[1737]: null_callback : error code 0x50
slapd[1737]: syncrepl_entry: rid=017 be_modify failed (80)
slapd[1737]: do_syncrepl: rid=017 retrying

Code 80 is LDAP_OTHER, which is not very informative. The only way to get syncrepl working again is to wipe out the database and restart slapd.
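For reference, the two numbers in the log are the same value: 0x50 in hexadecimal is 80 in decimal, which RFC 4511 defines as the result code "other" (LDAP_OTHER in the OpenLDAP headers). A minimal Python sketch for decoding result codes as they appear in slapd logs follows; the table is a small hand-picked subset, and the helper name describe_result is made up for illustration:

```python
# Subset of LDAP result codes from RFC 4511, with the OpenLDAP macro names.
# Only a few codes are listed; this is not a complete table.
LDAP_RESULT_NAMES = {
    0: "LDAP_SUCCESS",
    32: "LDAP_NO_SUCH_OBJECT",
    68: "LDAP_ALREADY_EXISTS",
    80: "LDAP_OTHER",
}


def describe_result(code_text):
    """Hypothetical helper: accept '80' or '0x50' and return (int, name)."""
    # Base 0 lets int() handle both plain decimal and 0x-prefixed hex.
    code = int(code_text, 0)
    return code, LDAP_RESULT_NAMES.get(code, "unknown")


print(describe_result("0x50"))
print(describe_result("80"))
```

Both calls resolve to the same code, so "error code 0x50" and "be_modify failed (80)" report one failure, not two.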

Being out of sync is not very pleasant, but there is worse: when several replicas are harassing the master with syncrepl requests, it tends to die horribly, with stuff like this:

1) assertion "c->c_conn_state == SLAP_C_CLOSING" failed: file "connection.c", line 787, function "connection_close"

2) assertion "c->c_struct_state == SLAP_C_USED" failed: file "connection.c", line 680, function "connection_state_closing"

3) slapd: Error detected by libpthread: Invalid mutex. Detected by file "/home/builds/ab/netbsd-4/src/lib/libpthread/pthread_mutex.c", line 295, function "pthread_mutex_trylock".

I end up with a hung slapd that I can only get rid of with kill -9.

All of this happens with 2.4.13. Are these known bugs?

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz manu@netbsd.org


Pierangelo Masarati

4 Feb, 1:06 p.m.

Emmanuel Dreyfus wrote:

> From time to time, syncrepl breaks on the replica, with this in the logs:
>
> slapd[1737]: null_callback : error code 0x50
> slapd[1737]: syncrepl_entry: rid=017 be_modify failed (80)
> slapd[1737]: do_syncrepl: rid=017 retrying
>
> code 80 is LDAP_OTHER, which is not very insightful. The only way to get syncrepl working again is to wipe out the database and restart slapd.
>
> Being out of sync is not very pleasant, but there is worse: when several replicas are harassing the master with syncrepl requests, it tends to die horribly, with stuff like this:
>
> assertion "c->c_conn_state == SLAP_C_CLOSING" failed: file "connection.c", line 787, function "connection_close"
>
> assertion "c->c_struct_state == SLAP_C_USED" failed: file "connection.c", line 680, function "connection_state_closing"
>
> slapd: Error detected by libpthread: Invalid mutex. Detected by file "/home/builds/ab/netbsd-4/src/lib/libpthread/pthread_mutex.c", line 295, function "pthread_mutex_trylock".
>
> I end up with a hung slapd, that I can only get rid of with a kill -9.
>
> All of this happens with 2.4.13. Are these known bugs?

I'm not sure about the reason for the LDAP_OTHER issue, but the connection issue is probably known and fixed in RE24. Can you try with a fresh checkout?

Ing. Pierangelo Masarati OpenLDAP Core Team

SysNet s.r.l. via Dossi, 8 - 27100 Pavia - ITALIA http://www.sys-net.it ----------------------------------- Office: +39 02 23998309 Mobile: +39 333 4963172 Fax: +39 0382 476497 Email: ando@sys-net.it -----------------------------------

manu@netbsd.org

1:14 p.m.

Pierangelo Masarati ando@sys-net.it wrote:

> Not sure about the reason of the LDAP_OTHER issue, but the connection issue is probably known, and fixed in re24. Can you try with a fresh checkout?

What ITS is it? I'd like to just add the fix.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz manu@netbsd.org

Pierangelo Masarati

1:18 p.m.

Emmanuel Dreyfus wrote:

> Pierangelo Masarati ando@sys-net.it wrote:
>
>> Not sure about the reason of the LDAP_OTHER issue, but the connection issue is probably known, and fixed in re24. Can you try with a fresh checkout?
>
> What ITS is it? I'd like to just add the fix.

Should be ITS#5835 and perhaps ITS#5886.

Ing. Pierangelo Masarati OpenLDAP Core Team

Quanah Gibson-Mount

2:22 p.m.

--On Wednesday, February 04, 2009 10:14 PM +0100 Emmanuel Dreyfus manu@netbsd.org wrote:

> Pierangelo Masarati ando@sys-net.it wrote:
>
>> Not sure about the reason of the LDAP_OTHER issue, but the connection issue is probably known, and fixed in re24. Can you try with a fresh checkout?
>
> What ITS is it? I'd like to just add the fix.

In this case, "just adding the fix" would be a lot of work. I'd really advise just using RE24 CVS at this moment.

--Quanah

Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc -------------------- Zimbra :: the leader in open source messaging and collaboration

Pierangelo Masarati

2:29 p.m.

Emmanuel Dreyfus wrote:

> From time to time, syncrepl breaks on the replica, with this in the logs:
>
> slapd[1737]: null_callback : error code 0x50
> slapd[1737]: syncrepl_entry: rid=017 be_modify failed (80)
> slapd[1737]: do_syncrepl: rid=017 retrying
>
> code 80 is LDAP_OTHER, which is not very insightful. The only way to get syncrepl working again is to wipe out the database and restart slapd.

According to the above logs, the consumer fails within an internal modify. It would help to see a little more about that modify. Would it be possible to increase the log level? Adding at least "sync" should tell us what was being replicated when the failure occurred.
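As a rough illustration of what adding "sync" means numerically: slapd's loglevel is a bitmask, and each keyword maps to one bit. The sketch below is in Python; the keyword values are the ones I recall from slapd.conf(5) for OpenLDAP 2.4 and should be checked against the man page of the installed version. The helper name combined_loglevel is made up for illustration.

```python
# Log level keywords and bit values as recalled from slapd.conf(5), OpenLDAP 2.4.
# Partial list; verify against the man page of the installed version.
LOGLEVELS = {
    "trace": 1,
    "args": 4,
    "conns": 8,
    "stats": 256,
    "sync": 16384,
}


def combined_loglevel(*keywords):
    """Hypothetical helper: OR together the bit values of the given keywords."""
    level = 0
    for keyword in keywords:
        level |= LOGLEVELS[keyword]
    return level


# Keep "stats" and add "sync" to get syncrepl consumer processing in the logs.
level = combined_loglevel("stats", "sync")
print(f"loglevel {level}")
```

slapd.conf also accepts the keywords directly (for example "loglevel stats sync"), so computing the integer is only needed where a numeric value is required.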

Is the problem reproducible? Does it take a lot to reproduce?

Ing. Pierangelo Masarati OpenLDAP Core Team

manu@netbsd.org

9:26 p.m.

Pierangelo Masarati ando@sys-net.it wrote:

> According to the above logs, the consumer fails within an internal modify. It could be of help to see a little bit more about that modify. Would it be possible to increase the log level? Adding at least "sync" could at least tell what was being replicated when the failure occurred.
>
> Is the problem reproducible? Does it take a lot to reproduce?

It seems to have vanished for now. I hope it will not come back, but if it does, I will increase the log level to see what is going on. What level do you want?

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz manu@netbsd.org

Quanah Gibson-Mount

10:29 p.m.

--On Thursday, February 05, 2009 6:26 AM +0100 Emmanuel Dreyfus manu@netbsd.org wrote:

> Pierangelo Masarati ando@sys-net.it wrote:
>
>> According to the above logs, the consumer fails within an internal modify. It could be of help to see a little bit more about that modify. Would it be possible to increase the log level? Adding at least "sync" could at least tell what was being replicated when the failure occurred.
>>
>> Is the problem reproducible? Does it take a lot to reproduce?
>
> It seems to have vanished right now. I hope it will not come again, but if it does, I will increase log level to see what is going on. What level do you want?

As he said, at least add "sync".

--Quanah


manu@netbsd.org

8 Feb, 12:10 a.m.

Quanah Gibson-Mount quanah@zimbra.com wrote:

> As he said, at least add "sync".

Here it is. It does not seem to help much:

syncrepl_entry: rid=017 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
syncrepl_entry: rid=017 be_search (0)
syncrepl_entry: rid=017 cn=schema,cn=config
syncrepl_entry: rid=017 be_add (68)
null_callback : error code 0x50
syncrepl_entry: rid=017 be_modify (80)
syncrepl_entry: rid=017 be_modify failed (80)
do_syncrepl: rid=017 retrying
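Read step by step, the trace appears to show the consumer receiving an add for cn=schema,cn=config, the add returning 68 (entryAlreadyExists in RFC 4511), and a subsequent modify on the same entry returning 80 (other). The following Python sketch pulls the backend calls and their result codes out of such a trace. The regular expression is written against the lines above only and is an assumption about the log format, not something taken from the slapd sources.

```python
import re

TRACE = """\
syncrepl_entry: rid=017 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
syncrepl_entry: rid=017 be_search (0)
syncrepl_entry: rid=017 cn=schema,cn=config
syncrepl_entry: rid=017 be_add (68)
null_callback : error code 0x50
syncrepl_entry: rid=017 be_modify (80)
syncrepl_entry: rid=017 be_modify failed (80)
do_syncrepl: rid=017 retrying
"""

# Small subset of RFC 4511 result codes, enough for this trace.
RESULT_NAMES = {0: "success", 68: "entryAlreadyExists", 80: "other"}

# Matches lines such as "syncrepl_entry: rid=017 be_add (68)".
# The "be_modify failed (80)" summary line is deliberately not matched.
STEP = re.compile(r"syncrepl_entry: rid=(\d+) (be_\w+) \((\d+)\)$")


def backend_steps(trace):
    """Return (rid, operation, code, name) for each backend call in the trace."""
    steps = []
    for line in trace.splitlines():
        match = STEP.search(line)
        if match:
            code = int(match.group(3))
            name = RESULT_NAMES.get(code, "unknown")
            steps.append((match.group(1), match.group(2), code, name))
    return steps


for rid, op, code, name in backend_steps(TRACE):
    print(f"rid={rid} {op}: {code} ({name})")
```

Listing the calls this way makes it easier to see which operation first returned a non-zero code, which here is be_add rather than be_modify.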

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz manu@netbsd.org

Quanah Gibson-Mount

11:07 a.m.

--On Sunday, February 08, 2009 9:10 AM +0100 Emmanuel Dreyfus manu@netbsd.org wrote:

> Quanah Gibson-Mount quanah@zimbra.com wrote:
>
>> As he said, at least add "sync".
>
> Here it is. It does not seem to help much:
>
> syncrepl_entry: rid=017 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
> syncrepl_entry: rid=017 be_search (0)
> syncrepl_entry: rid=017 cn=schema,cn=config
> syncrepl_entry: rid=017 be_add (68)
> null_callback : error code 0x50
> syncrepl_entry: rid=017 be_modify (80)
> syncrepl_entry: rid=017 be_modify failed (80)
> do_syncrepl: rid=017 retrying

You can't modify the hardcoded schema, so if that's what it is trying to modify, I'm not surprised. Since it looks like you're replicating back-config, I'd really advise using current CVS of RE24 if you are not already, as a number of issues around doing that have been fixed there.

--Quanah


manu@netbsd.org

12:56 p.m.

Quanah Gibson-Mount quanah@zimbra.com wrote:

> You can't modify the hardcoded schema, so if that's what it is trying to modify, I'm not surprised.

No modification to cn=schema,cn=config was done on the master. I'll consider upgrading to RE24, but it's a lot of machines to upgrade.

-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz manu@netbsd.org

Quanah Gibson-Mount

12:59 p.m.

--On Sunday, February 08, 2009 9:56 PM +0100 Emmanuel Dreyfus manu@netbsd.org wrote:

> Quanah Gibson-Mount quanah@zimbra.com wrote:
>
>> You can't modify the hardcoded schema, so if that's what it is trying to modify, I'm not surprised.
>
> No modification to cn=schema,cn=config was done on the master. I'll consider upgrading to RE24, but it's a lot of machines to upgrade.

No test lab? Ouch.

--Quanah



openldap-software@openldap.org
