ali.pouya@free.fr wrote:
Full_Name: Ali Pouya
Version: 2.4.11
OS: Linux 2.6
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (145.242.11.4)
I think there is a documentation issue in OpenLDAP 2.4.11: chapter 17.4.4 of the Admin Guide recommends configuring TWO syncrepl directives on each mirror. If I do so, the contextCSN of the standby mirror gets corrupted very easily. But if I configure the mirrors with only ONE syncrepl directive, it's OK.
The test environment: I have a test directory with two mirrors, A (sid=1) and B (sid=2), configured as recommended in the Admin Guide, and a replica C connected to A. The directory contains 10 million objects, and I use server A to write 500 000 new ones.
Very often and without any apparent reason the contextCSN in the memory of B gets suddenly corrupted while those of A and C are OK. In this situation the contextCSN of B gets stuck but B continues to receive data from A.
The value of contextCSN in base64 is:
contextCSN: 20080727021429.070493Z#000000#000#000000
contextCSN:: +HYDCTA4MDIwMzM3MTguMzAwMTExWiMwMDAwMDAjMDAxIzAwMDAwMA==
which looks like
4 bytes of garbage + "0802033718.300111Z#000000#001#000000"
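The decoding above can be reproduced with a short Python sketch. It uses only the standard library `base64` module; the variable names are illustrative and not part of OpenLDAP, and the split at 4 bytes is an assumption taken from the observation above.

```python
import base64

# Base64 value of the second contextCSN, copied from the output above.
encoded = "+HYDCTA4MDIwMzM3MTguMzAwMTExWiMwMDAwMDAjMDAxIzAwMDAwMA=="

raw = base64.b64decode(encoded)

# Assumption: the damage is confined to the first 4 bytes, where the
# year of the timestamp would normally be.
garbage, tail = raw[:4], raw[4:]

print(len(raw))              # total length in bytes
print(garbage.hex())         # the non-ASCII prefix, shown as hex
print(tail.decode("ascii"))  # the readable remainder of the CSN
```

The decoded value has the same total length as the well-formed CSN shown first, which suggests the four unreadable bytes overwrite the year rather than being prepended to a complete value.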
I note that, according to the sid values you assigned to servers A and B, the first contextCSN should not appear, since it has sid == 0, while the second one, apart from the corruption, is plausible (as you're writing to server A, with sid == 1).
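As a rough illustration of the sid check, the following Python sketch splits the two values on `#` and reads the third field. `parse_csn` and the `CSN` tuple are hypothetical helpers written for this message, not OpenLDAP code, and treating the numeric fields as hexadecimal is an assumption.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: str
    count: int
    sid: int
    mod: int


def parse_csn(value: str) -> CSN:
    """Split a CSN of the form timestamp#count#sid#mod (hypothetical helper)."""
    timestamp, count, sid, mod = value.split("#")
    # Assumption: the three numeric fields are hexadecimal.
    return CSN(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


first = parse_csn("20080727021429.070493Z#000000#000#000000")
# Readable tail of the second value; the garbled year is left out on purpose.
second = parse_csn("0802033718.300111Z#000000#001#000000")

print(first.sid, second.sid)
```

With A at sid 1 and B at sid 2, a sid of 0 in the first value matches neither server, while the second value carries the sid of the server being written to.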
I note that only the part indicating the year (2008) is garbled. Maybe this part is handled differently?
No.
At service shutdown, B writes the corrupt contextCSN to disk. At service startup, B reads the corrupt contextCSN from disk and begins to scan ALL of the database.
It also sends a sync request to A (a persistent search containing the corrupt contextCSN in the control field), causing A to scan the WHOLE database. The replica C remains safe.
The fact that the two servers scan the whole database is a side effect of the incorrect contextCSN; I wouldn't worry about it, as long as the corruption gets tracked down and fixed.
If I reverse the roles of A and B, the corruption occurs on A (always on the standby mirror).
I had already encountered the contextCSN corruption problem in OpenLDAP 2.3, and this was one of my reasons for migrating to 2.4.11.
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team

SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
-----------------------------------
Office: +39 02 23998309
Mobile: +39 333 4963172
Fax: +39 0382 476497
Email: ando@sys-net.it
-----------------------------------