Ondřej Kuzník wrote:
ITS#8486 suggests we use a more efficient structure to maintain the sessionlog. If we're messing with the sessionlog already, we might as well see if we can address another issue: it is always empty on slapd startup, which leads to unnecessary full refreshes.
slapo-accesslog has most of the data we need to support that and is already sorted in CSN order (much like sessionlog).
AFAIK, we can't use the accesslog database directly, as we can't efficiently search on a single serverID to get the serverID set and the oldest CSN for each.
We could tweak the overlay to always maintain these in the parent entry (auditContainer). Currently the logpurge always sets the container's entryCSN to the oldest remaining CSN.
There are a few tasks that need to be done in order to achieve this:
- configure syncprov with a suffix that contains the slapo-accesslog style logs for our DB
- change struct sessionlog to use a more efficient structure that can be iterated over from any point (only tavl is available at the moment)
We've talked about this before, an in-memory B+tree would be better for all of our AVL/TAVL uses.
- on startup:
  - iterate through the *last* N entries (filtering on successful write ops that affect our suffix) and build slog_entry for each of those
  - for each entry, insert a new slog_entry and update sl_mincsn
- add a control to hint to the database that we need it to iterate from the end backward (back-[mhb]db can support this)
- update accesslog to log entryUUID for the entry that has just been written
- update the test suite to exercise the new failure conditions
There are some caveats to this still:
- if we aren't guaranteed to receive the accesslog entries in reverse CSN order, the resulting sessionlog would be unsafe to use; we have to try to detect this and start with an empty sessionlog instead, resetting the sl_mincsn set to match the database contextCSN
- we might find an accesslog entry we can't use (a modification that doesn't carry enough information); we should still be able to use whatever we built up to that point, but can't continue
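The startup load and the two caveats above could be sketched roughly as follows. This is only an illustration under stated assumptions: `log_entry`, `slog`, `slog_load` and the `load_result` values are hypothetical stand-ins, not the real syncprov structures; a single mincsn is tracked rather than one per serverID; and CSNs are assumed to be in their fixed-width textual form so that plain string comparison orders them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the real slapd structures. */
typedef struct log_entry {
    const char *csn;   /* entryCSN of the accesslog entry */
    const char *uuid;  /* entryUUID of the written entry, NULL if not logged */
} log_entry;

typedef struct slog {
    size_t      count;   /* number of usable entries loaded */
    const char *mincsn;  /* oldest CSN covered by the loaded entries */
} slog;

typedef enum {
    LOAD_OK,         /* every entry was usable */
    LOAD_TRUNCATED,  /* hit an unusable entry: keep what was built, stop */
    LOAD_RESET       /* entries were not in reverse CSN order: log emptied */
} load_result;

/*
 * entries[] is expected newest first (reverse CSN order).
 * context_csn is what mincsn falls back to when the log is empty.
 */
load_result
slog_load(slog *sl, const log_entry *entries, size_t n, const char *context_csn)
{
    const char *prev = NULL;
    size_t i;

    assert(sl != NULL);
    sl->count = 0;
    sl->mincsn = context_csn;

    for (i = 0; i < n; i++) {
        const log_entry *e = &entries[i];

        if (e->uuid == NULL) {
            /* Not enough information: stop here, keep what we have. */
            return LOAD_TRUNCATED;
        }
        if (prev != NULL && strcmp(e->csn, prev) > 0) {
            /* A newer CSN after an older one: ordering is not guaranteed,
             * so discard everything and start with an empty log. */
            sl->count = 0;
            sl->mincsn = context_csn;
            return LOAD_RESET;
        }
        sl->count++;
        sl->mincsn = e->csn;  /* conservative: oldest CSN loaded so far */
        prev = e->csn;
    }
    return LOAD_OK;
}
```

A caller would treat `LOAD_TRUNCATED` as a shorter but still valid log, and `LOAD_RESET` the same way as today's empty sessionlog on startup.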
On Tue, Oct 24, 2017 at 04:52:57PM +0100, Howard Chu wrote:
We could tweak the overlay to always maintain these in the parent entry (auditContainer). Currently the logpurge always sets the container's entryCSN to the oldest remaining CSN.
I'll look into that again, what you say sounds feasible. I should be close to having the code that populates sessionlog from accesslog. When that works, it should be possible to reuse most of that to try and use accesslog directly.
We've talked about this before, an in-memory B+tree would be better for all of our AVL/TAVL uses.
Yes, that would be useful.
On Tue, Oct 24, 2017 at 06:45:42PM +0200, Ondřej Kuzník wrote:
I'll look into that again, what you say sounds feasible. I should be close to having the code that populates sessionlog from accesslog. When that works, it should be possible to reuse most of that to try and use accesslog directly.
The work to load the sessionlog from an accesslog database is here: https://github.com/mistotebe/openldap/tree/ITS8486-load-from-accesslog
However, I did not manage to implement the control to receive entries in reverse order, so the above is only part-way to a full solution. I haven't worked on the B+tree suggestion either.
To use accesslog DB directly, the following are needed:
- maintain the mincsn inside accesslog[0]
  - if mincsn is not set on startup, take the lowest CSN recorded for each serverID
  - whenever we encounter a new serverID, record it both in contextCSN and mincsn
  - while purging entries, before each entry is removed, the mincsn should be updated with the purged entryCSN; without transactions this is the only safe way, while with transactions we can just remove them in batches and record the new mincsn set just before we commit
- update syncprov_op_search to read the mincsn from the audit container, not sl_mincsn
- update syncprov_playlog to run a search on the accesslog database
  - we still need the list of entries that have disappeared between the last sync and when the persistent search starts, so we filter on:
    - objectclass auditWriteObject or auditExtended (we could ignore add requests)
    - entryCSN in the range from lowest CSN in the cookie to highest CSN in the contextCSN at the time of the persistent search
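The filter described above for the syncprov_playlog search could be put together like this. It is a minimal sketch: `playlog_filter` is a hypothetical helper, not an existing syncprov function, and it assumes the two CSNs are well-formed so that no LDAP filter escaping is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: formats an accesslog search filter that matches
 * write/extended operations whose entryCSN lies between the lowest CSN
 * in the consumer cookie and the highest CSN in contextCSN.
 * Returns the filter length, or -1 if buf is too small.
 */
int
playlog_filter(char *buf, size_t len,
               const char *cookie_min_csn, const char *context_max_csn)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len,
        "(&(|(objectClass=auditWriteObject)(objectClass=auditExtended))"
        "(entryCSN>=%s)(entryCSN<=%s))",
        cookie_min_csn, context_max_csn);

    /* snprintf reports the length it needed; treat truncation as failure. */
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

LDAP filters have no strict "greater than", so the lower bound is inclusive and the entry matching the cookie CSN itself would have to be skipped by the caller.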
Not sure how to prevent the accesslog purge from overtaking this search, or how to detect that this has happened and switch to a full refresh in that case, at least not without the overlays communicating in some way.
Is it a concern that we run a search (for each entryUUID in our DB) within a search (for the accesslog entries)? There is a note about ITS#3456 in syncprov that sounds relevant.
[0]. mincsn is the oldest CSN set that can be safely served by the sessionlog: for each serverID, the last CSN expired from the log, oldest CSN in the database or the entry from contextCSN
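To make the per-serverID bookkeeping concrete, here is a rough sketch of advancing the mincsn set as entries are purged. `mincsn_set`, `csn_sid` and `mincsn_note_purge` are hypothetical names for illustration only; the sketch assumes the usual textual CSN layout with the serverID as the third `#`-separated field in hex, and it stores pointers to caller-owned strings rather than copies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SID 4095  /* serverIDs range from 0 to 4095 */

/* Hypothetical container: one "oldest safe CSN" slot per serverID. */
typedef struct mincsn_set {
    const char *csn[MAX_SID + 1];  /* NULL: nothing recorded for this SID */
} mincsn_set;

/* Extract the serverID (third '#'-separated field, hex) from a CSN.
 * Returns -1 if the CSN does not look well-formed. */
int
csn_sid(const char *csn)
{
    const char *p;
    char *end;
    long sid;

    assert(csn != NULL);
    p = strchr(csn, '#');
    if (p == NULL)
        return -1;
    p = strchr(p + 1, '#');
    if (p == NULL)
        return -1;
    sid = strtol(p + 1, &end, 16);
    if (end == p + 1 || *end != '#' || sid < 0 || sid > MAX_SID)
        return -1;
    return (int)sid;
}

/* Call before a purged entry is removed: the mincsn for its serverID
 * moves forward to the purged CSN, and never moves backward.
 * Returns the serverID, or -1 on a malformed CSN. */
int
mincsn_note_purge(mincsn_set *set, const char *purged_csn)
{
    int sid = csn_sid(purged_csn);

    if (sid < 0)
        return -1;
    if (set->csn[sid] == NULL || strcmp(purged_csn, set->csn[sid]) > 0)
        set->csn[sid] = purged_csn;
    return sid;
}
```

With transactions, the same update could be applied once per batch with the newest purged CSN for each serverID, just before the commit.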