Help debugging sync issue with accesslog

BECOT Jérôme

5 Aug 2024

7:38 a.m.

Hello,

We are facing a sync issue between two servers (2.5.14) in mirror mode with accesslog. Some updates to groups are not replicated because the modification is not added to the accesslog on the server where the modification is applied.

There is no plain error when we ADD some groups (they are added to the DIT), but after enabling the 'sync' log level we can see errors like this one:

Aug 5 15:20:18 server1 slapd[15489]: conn=12173218 op=8040 accesslog_response: got result 0x50 adding log entry reqStart=20240805132018.000093Z,cn=accesslog

We found that it is probably due to olcAccessLogSuccess: TRUE, but the log continues with a normal result:

Aug 5 15:20:18 server1 slapd[15489]: conn=12173218 op=8040 RESULT tag=105 err=0 qtime=0.000019 etime=0.034941 text=
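For reference, the value in the accesslog_response line is an LDAP result code printed in hexadecimal: 0x50 is 80, which RFC 4511 names "other", a catch-all for internal errors. A minimal sketch for pulling the code and the target entry out of such a line follows. The regular expression is written against the single log line quoted above, and the function name and the table of codes are illustrative only, not part of OpenLDAP.

```python
import re

# Subset of LDAP result codes from RFC 4511; extend as needed.
LDAP_RESULT_NAMES = {
    0: "success",
    1: "operationsError",
    19: "constraintViolation",
    21: "invalidAttributeSyntax",
    50: "insufficientAccessRights",
    53: "unwillingToPerform",
    65: "objectClassViolation",
    68: "entryAlreadyExists",
    80: "other",
}

# Pattern modelled on the log line quoted in this message (an assumption:
# other slapd versions may word the message differently).
ACCESSLOG_FAILURE_RE = re.compile(
    r"conn=(\d+) op=(\d+) accesslog_response: "
    r"got result (0x[0-9a-fA-F]+) adding log entry (\S+)"
)


def parse_accesslog_failure(line):
    """Return details of a failed accesslog write, or None if no match."""
    match = ACCESSLOG_FAILURE_RE.search(line)
    if match is None:
        return None
    code = int(match.group(3), 16)  # "0x50" -> 80
    return {
        "conn": int(match.group(1)),
        "op": int(match.group(2)),
        "code": code,
        "name": LDAP_RESULT_NAMES.get(code, "unknown"),
        "log_entry": match.group(4),
    }
```

Because "other" says nothing about the cause, the decoded code mainly confirms that the accesslog database itself refused the write, rather than the client operation failing.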

If we later update the group, a fullsync happens and the group is created on the other server. Since increasing loglevel may overwhelm the server, I didn't blindly try any other loglevels.

How should I test and debug what causes the accesslog entry to be rejected (and thus the object not to be replicated)?

Thanks Jerome

Ondřej Kuzník

5 Aug 2024

10:21 a.m.

On Mon, Aug 05, 2024 at 02:38:08PM +0000, BECOT Jérôme wrote:

...

Hi Jérôme,

deltasync operates by setting up a syncrepl session on the accesslog DB. If an entry cannot be added there because of an error, the session will not have anything to replicate. As such, it is important to monitor logs for these kinds of administrative issues, as they will lead to replication failures.
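As a rough illustration of that kind of monitoring, the sketch below scans slapd log lines for writes that returned err=0 to the client while the matching accesslog write failed; these are the changes delta-sync has nothing to replicate for. The two patterns are modelled on the log lines quoted earlier in the thread, the function name is hypothetical, and the code assumes the accesslog_response line is logged before the RESULT line of the same conn/op, as it is in that excerpt.

```python
import re

# Patterns modelled on the two log lines quoted in this thread (assumption:
# the wording may differ between slapd versions).
FAILURE_RE = re.compile(
    r"conn=(\d+) op=(\d+) accesslog_response: got result (0x[0-9a-fA-F]+)"
)
RESULT_RE = re.compile(r"conn=(\d+) op=(\d+) RESULT tag=\d+ err=(\d+)")


def find_unlogged_writes(lines):
    """Return (conn, op, accesslog_code) for writes that succeeded
    for the client but could not be recorded in the accesslog."""
    failed = {}  # (conn, op) -> result code of the failed accesslog write
    unlogged = []
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            key = (int(match.group(1)), int(match.group(2)))
            failed[key] = int(match.group(3), 16)
            continue
        match = RESULT_RE.search(line)
        if match:
            key = (int(match.group(1)), int(match.group(2)))
            code = failed.pop(key, None)
            # Only report operations the client saw as successful.
            if code is not None and int(match.group(3)) == 0:
                unlogged.append((key[0], key[1], code))
    return unlogged
```

Fed with the lines of a syslog file, a non-empty result would be a reasonable trigger for an alert, since each tuple corresponds to a change that exists on one server only.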

This is usually caused by local errors, e.g. the accesslog database running out of space. If you can't immediately see why it might be failing, you might want to temporarily increase logging while the issues appear and see why.

Regards.

--
Ondřej Kuzník
Senior Software Engineer
Symas Corporation http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP

BECOT Jérôme

6 Aug 2024

12:30 a.m.

Which level should I enable to get more insight without getting lost in too many messages?

________________________________
From: Ondřej Kuzník ondra@mistotebe.net
Sent: Monday, 5 August 2024 19:21
To: BECOT Jérôme jbecot@itsgroup.com
Cc: openldap-technical openldap-technical@openldap.org
Subject: Re: Help debugging sync issue with accesslog

Quanah Gibson-Mount

7:42 a.m.

--On Tuesday, August 6, 2024 8:30 AM +0000 BECOT Jérôme jbecot@itsgroup.com wrote:

...

Which level should I enable to get more insights without being lost in too many messages ?

stats+sync is generally what you want for debugging replication issues. However, I would first check the size of your accesslog db on disk compared with your maxsize setting, to ensure that you haven't run the DB out of space.
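Both suggestions can be sketched in a few lines. The numeric values for the stats (256) and sync (16384) levels are the ones documented in slapd.conf(5). The size check assumes the accesslog database uses back-mdb, so that its data lives in a file named data.mdb inside the database directory and maxsize is expressed in bytes. The function names and any directory or size passed in are hypothetical, and the file size is only an approximation of how full the database is.

```python
import os

# Numeric values of the two slapd log levels named above.
LOG_LEVELS = {"stats": 256, "sync": 16384}


def loglevel_mask(*names):
    """Combine named log levels into the integer form of the loglevel."""
    mask = 0
    for name in names:
        mask |= LOG_LEVELS[name]
    return mask


def mdb_usage(db_dir, maxsize_bytes):
    """Return the size of data.mdb as a fraction of the configured maxsize.

    db_dir is the database directory of the accesslog DB and maxsize_bytes
    its maxsize setting; both must be supplied by the caller.
    """
    size = os.path.getsize(os.path.join(db_dir, "data.mdb"))
    return size / maxsize_bytes
```

A ratio close to 1.0 would suggest the database has reached its configured limit, which matches the kind of local error that makes the accesslog write fail while the client operation still succeeds.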

--Quanah
