Nick Geron wrote:
Howard Chu wrote:
Nick Geron wrote:
We now think some of our problems may be attributable to time granularity. We're seeing missing information on the consumer if multiple successive writes are attempted via a script. If we slow down to human speed or insert sleeps in our test code, this gets a little better. I see that A.2.4 N-Way MultiMaster Replication notes that entryCSNs are now recorded with microsecond resolution, but does this apply to mirrors as well?
CSNs were extended to microsecond resolution only for the benefit of conflict resolution. For all other purposes, the changecount field ensures sufficient granularity.
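To illustrate the point about the changecount field, here is a minimal Python sketch. It assumes the CSN text layout is `timestamp#count#sid#mod` with hexadecimal counters; the sample CSN values and the helper name `parse_csn` are made up for illustration and are not taken from this thread.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: str  # e.g. "20080115200329.000000Z" (fixed width, so it sorts as text)
    count: int      # change counter, distinguishes changes within one timestamp
    sid: int        # server ID
    mod: int        # modification number


def parse_csn(text: str) -> CSN:
    """Split a CSN string into its fields (assumed layout: timestamp#count#sid#mod)."""
    ts, count, sid, mod = text.split("#")
    return CSN(ts, int(count, 16), int(sid, 16), int(mod, 16))


# Two hypothetical CSNs issued within the same timestamp on the same server.
first = parse_csn("20080115200329.000000Z#000000#001#000000")
second = parse_csn("20080115200329.000000Z#000001#001#000000")

same_timestamp = first.timestamp == second.timestamp
still_ordered = first < second  # tuple comparison falls through to the count field
```

Under these assumptions the two changes stay distinct and ordered even though their timestamps are identical, which is why timestamp resolution alone should not affect propagation.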
In that case, why do we see any difference in propagation between scripted (quick) updates and hand/command line (slow) modifications? Or are you simply saying time is not the issue?
Timestamps are not the issue for propagation.
For example - manipulating one particular entry:
- update server 1 adding 1 attribute = propagates to second server
- wait a few seconds
- update server 1 adding 4 attributes = first of four propagates to second server
After waiting a second or so, another successful operation on the 'write' server will propagate all modifications over to the second server as expected. This behavior is why we suspected a time granularity issue. It should be noted that this doesn't work for us (and others as I would expect) as there is no guarantee that another operation on the 'write' server will occur, thereby propagating the current entry.
OK, this sounds like the background thread to propagate updates isn't getting scheduled when it should. That could be a bug in the syncprov overlay.
Can I setup a two node N-Way?
"2" is certainly a valid value of "N".
Well, there's that developer 'charm' I've been reading throughout years of archives. Since the admin doc makes a distinction between the 'hybrid configuration' of MirrorMode and N-Way Multi-Master, I was more looking for clarification of the difference between the two implementations.
Then that is what you should have asked. "Looking for clarification between the implementation of MirrorMode and Multi-Master" is a much clearer question than "Can I setup a two node N-Way", and there is no way one could logically get from the latter to the former, based on the context of your email. If you don't ask useful questions, you have only yourself to blame when you don't get useful answers.
There is no difference now between the MirrorMode and Multi-Master code. The only difference is purely a matter of usage. In a MirrorMode setup you use an external frontend that guarantees that writes are only directed to one server. As long as that guarantee is kept, your servers will have perfect data consistency. In a Multi-Master setup, you allow writes to any server, and data consistency is not guaranteed. In that case the CSNs are used for conflict resolution; when competing writes are made to the same entries, the last writer wins. (Note - the servers will all eventually converge on a consistent view of the data; the issue is that the resulting data may not resemble what you expected. If your servers' clocks are not tightly synchronized, it's pretty certain to be different from what you expected.)
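The effect of clock skew on "last writer wins" can be sketched in a few lines of Python. This is only a toy model of the rule described above, not slapd's code: the function names, the server labels A and B, and the 5-second skew are all hypothetical.

```python
from datetime import datetime, timedelta


def stamp(true_time, clock_skew):
    """Timestamp a server would record, given its (hypothetical) clock skew."""
    return true_time + clock_skew


def last_writer_wins(writes):
    """Pick the value carrying the greatest recorded timestamp."""
    return max(writes, key=lambda w: w[0])[1]


t0 = datetime(2008, 1, 15, 12, 0, 0)
no_skew = timedelta(0)
slow_clock = timedelta(seconds=-5)  # assumed: server B's clock runs 5 s behind

# Server A writes at t0; server B writes the same entry 2 s later in real time.
synced = [
    (stamp(t0, no_skew), "value-from-A"),
    (stamp(t0 + timedelta(seconds=2), no_skew), "value-from-B"),
]
skewed = [
    (stamp(t0, no_skew), "value-from-A"),
    (stamp(t0 + timedelta(seconds=2), slow_clock), "value-from-B"),
]

winner_synced = last_writer_wins(synced)  # B wrote last and wins
winner_skewed = last_writer_wins(skewed)  # A wins although B wrote later
```

In both cases every server ends up with the same value, so the data converges. With skewed clocks the surviving value is simply not the one written last in real time.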
Syncrepl doesn't write session logs. Read RFC4533.
I'll look into it. Thanks.
Switching gears, what would the devs say are the capabilities in operations per second with 2.4.7?
I've recently run a 5GB database in back-hdb at 20,000 indexed searches/second concurrent with 13,000 modifies/second on an 8 core Opteron server (1.9GHz cores). This was tested using slamd and ~80 client threads, sustained over a 2 hour run.
I'm seeing a number of aborts when testing under high load. The latest came from running scripted ldapsearches and ldapmodifies which resulted in a mutex error (or so I am told by one of our developers).
Specifically:
- adding about 100 attributes to an entry
- diffing the output of ldapsearch between the two nodes in a loop
- once synced, grabbing the attributes, shoving them in a temp file with delete instructions and using that with ldapmodify.
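The last step above (a temp file of delete instructions for ldapmodify) might look roughly like the following Python sketch. It only builds the LDIF text; the function name, the DN, and the attribute names are placeholders, not the ones used in our actual test.

```python
def build_delete_ldif(dn, attributes):
    """Build an LDIF 'modify' record that deletes each named attribute from dn."""
    lines = [f"dn: {dn}", "changetype: modify"]
    for name in attributes:
        lines.append(f"delete: {name}")
        lines.append("-")  # separator that ends each modification in the record
    return "\n".join(lines) + "\n"


# Placeholder entry and attribute names, for illustration only.
ldif = build_delete_ldif("cn=test,dc=example,dc=com", ["description", "mail"])
print(ldif, end="")
```

The resulting text can be written to a file and passed to ldapmodify with `-f`. Because no values are listed after each `delete:` line, all values of that attribute are removed.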
I compiled with debugging on, which results in an abort with "connection.c: 676: connection_state_closing: Assertion 'c_struct_state == 0x02' failed" logged.
Interesting. It would be useful to get a gdb stack trace from that situation.