I'm having an issue with LDAP replication hanging on our AWS LDAP consumers (our master is on prem), so I've been playing with timeouts and keepalives. But sometimes when I restart slapd, it resumes replication and then all of a sudden starts deleting all the users:
syncrepl_del_nonpresent: rid=222 be_delete DN (0)
and the only way I've found to recover is to stop slapd, slapcat from the master, and slapadd the LDIF file into the consumer. Does anyone know why this may be happening? Am I missing some setting that I haven't found yet?
thanks, ds
On Wed, Jan 28, 2026 at 06:23:54AM -0000, steiner@rutgers.edu wrote:
> I'm having an issue with LDAP replication hanging on our AWS ldap users (our master is on prem). So I've been playing with timeouts and keepalive But sometimes when I restart the slapd, it will start to continue replication and then all of a sudden it will start deleting all the users:
> syncrepl_del_nonpresent: rid=222 be_delete DN (0)
> and the only way I've found to recover is to stop the slapd, slapcat from the master and slapadd the ldif file into the consumer. Anyone know why this may be happening? Am I missing some setting that I haven't found yet?
Are you running deltasync by any chance? People sometimes forget that, in that case, the replication user needs unrestricted read access to the actual database as well as the accesslog DB; make sure you have this covered. Even if not, ACLs would be the first thing on my list.
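As a rough illustration of what I mean, here is a minimal slapd.conf sketch for the provider. The suffix and the replication DN (cn=replicator,dc=example,dc=com) are made-up placeholders, not taken from your setup, and the real rule has to be placed ahead of any more restrictive access rules in each database so that it matches first:

```
# Hypothetical provider-side fragment; DNs and suffixes are placeholders.

# Main database: the replication identity can read everything.
database mdb
suffix "dc=example,dc=com"
access to *
    by dn.exact="cn=replicator,dc=example,dc=com" read
    by * break

# Accesslog database (only relevant with deltasync): same read access.
database mdb
suffix "cn=accesslog"
access to *
    by dn.exact="cn=replicator,dc=example,dc=com" read
    by * break
```

If the replication identity cannot see some entries during a refresh, the consumer may treat them as no longer present on the provider, which would be consistent with the syncrepl_del_nonpresent deletions you are seeing.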
Regards,
We've been running OpenLDAP since 2015 and upgraded from v2.4 to v2.6 about a year ago. 99% of the time, replication works fine. I have numerous consumers, and the only ones that have regular issues are the two in AWS. This week (the worst so far), replication hung on 4 out of 5 days and I had to restart both consumers each time. On two of those days, I had the be_delete issue mentioned above. On the others, replication just continued and finished after the restart. I have lowered timeouts and keepalives to see if that would help; current settings are:
idletimeout 30
syncrepl rid=XXX
  ...
  retry="10 10 20 +"
  network-timeout=30
  timeout=60
  keepalive=10:3:10
Unclear if this has helped.
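For reference, this is how I read those values, based on my understanding of slapd.conf(5). The comments are my interpretation, not something I have verified against the AWS behavior:

```
# Close client connections that have been idle for 30 seconds.
idletimeout 30

syncrepl rid=XXX
  ...
  # Retry every 10 seconds for the first 10 attempts,
  # then every 20 seconds indefinitely ("+").
  retry="10 10 20 +"
  # Give up on establishing the connection after 30 seconds.
  network-timeout=30
  # Wait up to 60 seconds for the initial bind to complete.
  timeout=60
  # TCP keepalive as idle:probes:interval. Probing starts after
  # 10 seconds idle, with 3 probes sent 10 seconds apart, so a dead
  # peer should be noticed after roughly 10 + 3*10 = 40 seconds.
  keepalive=10:3:10
```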
Note that if all the operations/tasks finish quickly, the be_delete issue is unlikely. If one of the operations takes a while to finish, be_delete is more likely. I'm assuming this is because systemd's last resort is to send SIGKILL when the initial SIGINT doesn't stop slapd in time.
openldap-technical@openldap.org