Paul B. Henson wrote:
From: Howard Chu Sent: Friday, May 23, 2014 5:30 PM
you have completely changed the semantics of the timestamp. The code explicitly used "now" - the current time when
the
operation completed, and you have changed it to use op->o_time, which is the time when the operation was first received. You said all you wanted to
do
was add microseconds, but this patch changes much more than that.
Yes, I almost went into more detail on that when I posted, I suppose I probably should have.
During my testing, the seconds in op->o_time were never different than now was set to by a call to time(). Assuming an operation never takes more than one second to complete, if you are working with a one second granularity, the start time and the stop time are roughly equivalent, never differing by more than one second (ie, start at 10.8, stop at 11.2, use 11 rather than 10). What is the worst case amount of time it would take to perform a bind operation?
There is no correct answer to that question, therefore you must not assume there is such a thing. With proxy-bind it could take several seconds to get an answer back from a remote server. Under heavy load with a CPU-intensive password hash it could also take unpredictable times.
The change made was in ppolicy_bind_response, which unless I misunderstand, is only called for the bind operation, so the change in semantic time definition would only apply to binds, and impact the timestamp stored in pwdFailureTime, whether or not a given pwdFailureTime value was recent enough to consider, and whether or not a password was considered expired.
Further, while the semantics did change, is the new definition just different, or worse than the previous? Was there a specific reason it was decided to mark the time of failure as when the bind operation completed as opposed to when it began? What benefit does it provide?
The *failure* occurred at that instant, not at the instant the request was received. It is simply a matter of correctness.
I originally did intend to add microseconds to the existing call to time, but after reviewing other uses of fractional seconds, such as in the accesslog module, it seemed more convenient to use the op->o_tincr, which while not time-based also allows you to distinguish operations at a subsecond granularity. For the purpose of keeping track of the failures, it seemed not so important you actually logged the exact microsecond the failure occurred, but simply that you are able to distinguish multiple failures within one second. I considered keeping the call to time(), and simply adding op->o_tincr on to that, but then I would potentially be mixing time intervals.
You need to actually use microseconds, since the time-increment is only unique on the local server and will not guarantee uniqueness in a replication scenario.