Dear list,
I was experimenting with slapd by adding a lot of entries and then deleting them. I found that the adding speed is not bad, but the deletion speed is lacking: with or without dbsync, deleting runs at around half the speed of adding. Naively, I thought deleting should be easier than adding, because when adding you actually need to pass and write the whole entry, while when deleting you can just mark the database pages as free. The DB file did not shrink after the massive deletion, which suggests that deleting only marks pages as free and does not return them to the OS. I am using the latest git tree and the mdb backend.
Another related idea is deleting a whole branch from the DIT. LDAP is already hierarchical, so to delete all entries under a branch one would assume that there must be a better way than deleting entries with a client-side script, as I am doing. With SQL you can drop a table. With LDAP, can I delete a whole branch?
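For context on the client-side script: the LDAP delete operation only removes leaf entries, so a script that clears a branch has to delete children before their parents (the stock `ldapdelete -r` does this walk from the client as well). A minimal sketch of that ordering follows; the DNs are made up for illustration, and the depth count assumes commas inside values are backslash-escaped rather than quoted.

```python
def dn_depth(dn: str) -> int:
    """Count RDN components, treating backslash-escaped commas as literal text."""
    if not dn:
        return 0
    depth, escaped = 1, False
    for ch in dn:
        if escaped:
            escaped = False          # character after a backslash is never a separator
        elif ch == "\\":
            escaped = True
        elif ch == ",":
            depth += 1
    return depth


def leaf_first(dns):
    """Order DNs deepest first, so every entry precedes its ancestors."""
    return sorted(dns, key=dn_depth, reverse=True)


# Hypothetical branch to be removed.
subtree = [
    "ou=junk,dc=example,dc=com",
    "cn=a,ou=batch1,ou=junk,dc=example,dc=com",
    "ou=batch1,ou=junk,dc=example,dc=com",
    "cn=b,ou=junk,dc=example,dc=com",
]

for dn in leaf_first(subtree):
    print(dn)
```

Issuing the deletes in this order means no delete is ever attempted on a non-leaf entry; it does not reduce the number of operations the server has to perform.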
Hi Derek,
no idea about the deletion speed, but if a short downtime is an option and the entries to be deleted can be matched by a search filter, you could do a
slapcat -a '(!(<filter of what should be deleted>))'
and then slapadd the resulting file into an empty database. Depending on the number of entries to be deleted relative to the total number of entries in the database, this could be quicker than live deletes.
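A rough sketch of how the two commands could be assembled, without running them. The suffix, filter, and LDIF path are placeholders, and pointing slapadd at the new, empty database is a matter of slapd configuration that is not shown here. The options used (`-a`, `-b`, `-l` for slapcat; `-q`, `-b`, `-l` for slapadd) are the standard ones, and the helper assumes the input is a single filter.

```python
import shlex


def negated_filter(delete_filter: str) -> str:
    """Wrap a single LDAP filter in a NOT, adding parentheses if they are missing."""
    f = delete_filter.strip()
    if not (f.startswith("(") and f.endswith(")")):
        f = f"({f})"
    return f"(!{f})"


def rebuild_commands(delete_filter: str, suffix: str, ldif_path: str):
    """Return (export, reload) argument lists; nothing is executed."""
    export = ["slapcat", "-b", suffix,
              "-a", negated_filter(delete_filter), "-l", ldif_path]
    # -q is slapadd's quick mode; the target database must be empty.
    reload_cmd = ["slapadd", "-q", "-b", suffix, "-l", ldif_path]
    return export, reload_cmd


# Placeholder values for illustration only.
export, reload_cmd = rebuild_commands(
    "description=generated-by-test", "dc=example,dc=com", "/tmp/keep.ldif")
print(shlex.join(export))
print(shlex.join(reload_cmd))
```

One thing to watch: the filter is applied entry by entry, so if it drops a parent but keeps its children, the reload will probably fail on the orphaned entries.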
Best regards,
Karsten

On Sat, 24 Nov 2018 at 18:56, Derek Zhou <derek@shannon-data.com> wrote:
[...]
-- Derek Zhou Shannon Systems http://www.shannon-sys.com
Karsten
Unfortunately, downtime is even worse than slow deletes.
I wish OpenLDAP had an integrated search-and-delete operation, that is, deleting entries based on a search filter, all executed in the same transaction on the server. That should be more efficient than doing the search and the deletes from the client side, because the B+ trees are traversed only once. The server could also impose a size limit on how many entries can be deleted in one go, just as it does for search, to keep anyone from hogging the server.
For the LDAP gurus here, the above idea should be technically possible, right?
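For comparison, the closest client-side approximation is a loop that fetches a bounded batch of matching DNs and deletes them, repeating until nothing matches. The sketch below uses an in-memory set as a stand-in for the directory; `search` and `delete` are hypothetical callables, not an OpenLDAP API, and each delete would still be a separate operation on a real server, so this gives the size limit but not the single-transaction behaviour.

```python
from typing import Callable, List


def delete_matching(search: Callable[[int], List[str]],
                    delete: Callable[[str], None],
                    size_limit: int,
                    max_rounds: int = 1000) -> int:
    """Fetch up to size_limit matching DNs per round and delete them.

    Returns the total number of entries deleted.
    """
    total = 0
    for _ in range(max_rounds):
        batch = search(size_limit)
        if not batch:
            break                      # nothing left that matches
        for dn in batch:
            delete(dn)
        total += len(batch)
    return total


# In-memory stand-in for a directory: 25 junk entries plus one to keep.
directory = {f"cn=junk{i},ou=junk,dc=example,dc=com" for i in range(25)}
directory.add("cn=keep,dc=example,dc=com")


def search(limit: int) -> List[str]:
    """Stand-in for a size-limited search on (cn=junk*)."""
    return sorted(dn for dn in directory if dn.startswith("cn=junk"))[:limit]


deleted = delete_matching(search, directory.discard, size_limit=10)
print(deleted, len(directory))
```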
Derek
On Monday, November 26, 2018 03:36:16 PM Karsten Heymann wrote:
[...]
Derek Zhou wrote:
[...]
In normal operation, as opposed to simple experimentation, why do you actually need to delete a large number of entries quickly? If you just need to make them unavailable to a bunch of clients, you can first do a subtree rename to move them out of the way, and then delete incrementally at whatever speed.
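As an illustration of the subtree rename, the change can be written as an LDIF `modrdn` record with a `newsuperior` line and fed to `ldapmodify`. The sketch below only builds the record; the DNs are invented, the new parent is assumed to exist already, and the RDN is taken naively as everything before the first comma (so escaped commas in the leading RDN and values needing base64 are not handled).

```python
def subtree_rename_ldif(dn: str, new_superior: str) -> str:
    """Build an LDIF modrdn record that moves dn (and its subtree) under new_superior."""
    rdn = dn.split(",", 1)[0]          # naive: assumes no escaped comma in the RDN
    lines = [
        f"dn: {dn}",
        "changetype: modrdn",
        f"newrdn: {rdn}",              # keep the same RDN, only change the parent
        "deleteoldrdn: 0",
        f"newsuperior: {new_superior}",
        "",
    ]
    return "\n".join(lines)


# Hypothetical example: park the branch under ou=trash before deleting it slowly.
print(subtree_rename_ldif("ou=junk,dc=example,dc=com",
                          "ou=trash,dc=example,dc=com"))
```

After the move, the entries live under `ou=junk,ou=trash,dc=example,dc=com`, out of the way of clients searching the original location, and can be deleted at leisure.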
On Tuesday, November 27, 2018 5:13:41 PM CST Howard Chu wrote:
[...]
In normal operation, as opposed to simple experimentation, why do you actually need to delete a large number of entries quickly? [...]
If the database is going to occupy a significant amount of disk space, the sysadmin in me will feel safer if I can delete fast. A file system can do that; a SQL database can do that. Imagine a user application running wild and injecting a lot of junk data: being able to delete fast is crucial to keeping the system up until the culprit is identified and killed. This scenario has happened to me several times, just not with an LDAP database.
Derek
Howard and others:
Again on deleting lots of entries. I have 2 experiments:
1, in a fresh db, insert 10 million entries. Let's call this state A. Then delete 9 million entries overnight. Let's call this state B.
2, in a fresh db, insert 1 million entries. I call this state B'.
In B and B' the remaining 1 million entries are the same, so from the user's perspective B and B' are indistinguishable. However, deleting entries from B is much slower than deleting entries from B', roughly 10x slower. It seems that deletion speed depends on the peak db size and on how full the db currently is.
My question is: is such a wide deletion performance gap expected?
By the way, querying and adding speeds also differ among A, B and B', but the gap is much smaller. The deletion speed gap between A and B' is also not large. Restarting the daemon at state B does not improve deletion speed (so the difference between B and B' is persistent).
In the meantime I will do another experiment: add back 9 million entries to state B and call it A', then compare performance between A and A'.
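One way to make such comparisons concrete is to log a completion timestamp per operation and bucket the timestamps into fixed windows, which gives an operations-per-second curve for each state. This is only a sketch of that bookkeeping; the timestamps are invented and this is not necessarily how the numbers above were collected.

```python
def throughput_by_window(timestamps, window=10.0):
    """Bucket completion times (seconds since test start) into fixed windows.

    Returns a list of (window_start, ops_per_second), including empty windows.
    """
    if not timestamps:
        return []
    counts = {}
    for t in timestamps:
        idx = int(t // window)
        counts[idx] = counts.get(idx, 0) + 1
    last = max(counts)
    return [(i * window, counts.get(i, 0) / window) for i in range(last + 1)]


# Invented completion times: three fast deletes, then a stall.
sample = [0.5, 1.0, 1.5, 2.5, 7.0]
for start, rate in throughput_by_window(sample, window=2.0):
    print(f"{start:6.1f}s  {rate:.2f} ops/s")
```

Running the same bookkeeping over the delete runs for B and B' would show whether the slowdown is uniform or sets in at a particular point.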
On Thursday, December 13, 2018 05:24:53 PM Derek Zhou wrote:
[...]
Further testing shows that this deletion speed drop only happens in writemap mode; the default mode is much better. See the following plots:
1, adding 10 million entries, default vs writemap. Writemap is marginally faster, with a curious periodic performance drop.
2, deleting all 10 million entries previously added, default vs writemap. In writemap mode, after ~10% of the entries are deleted, performance drops hard. That test was not finished; it would probably have taken hours. In default mode, delete speed was better from the beginning; there is also a performance drop after ~10% of the entries are deleted, but it is much less severe. That test finished in ~4800 seconds.
openldap-technical@openldap.org