Replication speed and data sizing (mdb)


Brian Wright

18 Jul 2015

5:22 a.m.

We are using 2.4.39. I realize there are newer versions available, but at the time when we started our LDAP project, this was the version available.

We are testing n-way master replication along with a large number of records using lmdb. Here's the config:

* 8-way replication with 8 nodes (each node having 7 other connections)
* 50k records
* Inserting the records into one cluster node to replicate to all the rest

The problems observed:

* Some nodes are faster at replication than others. In general, the time to complete replication is slower than expected. In my test environment I found that 50k records can take up to 2 hours for some nodes to complete. The fastest nodes complete in 1.5 hours. Because these records are brand new insertions, delta based replication wouldn't help here.
* When the replication is completed, some of the data.mdb files are larger than others (sometimes by an order of magnitude).

We would like to understand the reasons behind the two problems above. First, the replication system seems unusually slow. Second, we need to understand why the data.mdb file sometimes grows far larger on one node than on the rest of the nodes. For example, in our production environment, while most nodes were around 1GB in data size, one node stored in excess of 40GB in data.mdb. In my testing lab, the 50k record insertion left most nodes with a data.mdb size of 150MB. On one of the nodes, the data size was 262MB.
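For scale, the lab figures quoted above can be turned into average rates. The following is only arithmetic on the numbers in this message, not a measurement; the helper names are made up for illustration, and MB is assumed to mean 10^6 bytes.

```python
# Back-of-the-envelope rates from the lab figures quoted in this message.
# Helper names are illustrative; MB is assumed to mean 10**6 bytes.

def records_per_second(records, hours):
    """Average replication rate over the whole run."""
    return records / (hours * 3600)

def bytes_per_record(file_size_mb, records):
    """Average data.mdb footprint per record."""
    return file_size_mb * 1_000_000 / records

if __name__ == "__main__":
    print(f"slowest node: {records_per_second(50_000, 2.0):.2f} records/s")
    print(f"fastest node: {records_per_second(50_000, 1.5):.2f} records/s")
    print(f"typical node: {bytes_per_record(150, 50_000):.0f} bytes/record")
    print(f"largest node: {bytes_per_record(262, 50_000):.0f} bytes/record")
```

That is roughly 7 to 9 records per second, and about 3 KB versus 5 KB of data.mdb per record on the typical and largest lab nodes.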

Note that I've also tried alternative replication connectivity approaches to attempt to reduce the number of connections per server, but that did not improve replication performance or the varying data sizes in the end.

If updating to a newer version helps resolve the above observed problems, please let me know.

Any tuning or debugging advice here would be appreciated.

Thanks.

--
Brian Wright
Sr. UNIX Systems Engineer
901 Mariners Island Blvd Suite 200
San Mateo, CA 94404 USA
Email: brianw@marketo.com
Phone: +1.650.539.3530
www.marketo.com


Quanah Gibson-Mount

21 Jul 2015

9:16 p.m.

--On July 17, 2015 at 8:22:15 PM -0700 Brian Wright brianw@marketo.com wrote:


We are using 2.4.39. I realize there are newer versions available, but at the time when we started our LDAP project, this was the version available.

There were several significant changes made in 2.4.41 to attempt to address a number of the issues you are reporting. I would suggest upgrading to 2.4.41 to see whether you find any significant improvements.

--Quanah

--
Quanah Gibson-Mount
Platform Architect
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration

Brian Wright

10:53 p.m.

Hi Quanah,

I will upgrade to 2.4.41 and re-run my testing.

Thanks.


Brian Wright

11 Aug 2015

2:19 a.m.

Hi Quanah or anyone with experience,

I have upgraded to 2.4.41 in a two-node cluster and still see replication slowness. I inserted 300k user records into an lmdb database. The data.mdb ended up 2 GB in size. The insertion took 3 hours to complete (likely mostly due to ldapadd). I then enabled 2-way replication to a second node with the following syncrepl statement (similar on both nodes):

syncrepl rid=1
  provider=ldap://ldap1
  type=refreshAndPersist
  retry="5 5 300 +"
  searchbase="dc=marketo,dc=com"
  attrs="*,+"
  bindmethod=simple
  binddn="cn=admin,dc=marketo,dc=com"
  credentials=<redacted>

I started this replication on Friday and by Monday it is only 28% complete (around 90k records have been transferred -- data.mdb = 571M). These servers have full speed network connections between them, so I don't understand the protocol slowness. Is replication not intended for this amount of transfer load? Are we expected to recover the node via a separate method (i.e., slapcat / slapadd) and then kick replication off only after it's been loaded?
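To put that progress in perspective, here is a rough projection from the numbers above. It assumes the rate stays constant and that "Friday to Monday" means about 72 hours; the message gives no exact start or end times, so the elapsed time is a guess and the function name is illustrative only.

```python
# Rough projection of total sync time from partial progress.
# ASSUMPTIONS: constant rate, and about 72 hours elapsed between
# "Friday" and "Monday" (the message does not give exact times).

def projected_total_hours(done, total, elapsed_hours):
    """Linear extrapolation: hours needed to transfer all records."""
    if done <= 0:
        raise ValueError("no progress yet, cannot extrapolate")
    return elapsed_hours * total / done

if __name__ == "__main__":
    hours = projected_total_hours(done=90_000, total=300_000, elapsed_hours=72)
    rate = 90_000 / (72 * 3600)
    print(f"about {rate:.2f} records/s, {hours / 24:.0f} days for the full sync")
```

Under those assumptions the full transfer would take on the order of ten days, at well under one record per second.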

Additionally, when I have restarted a partially replicated node far into this replication process (90k records), the entire process stops and does not resume on restart. I do not have journaling enabled, but because these are new full records it wouldn't buy much performance speed here other than perhaps better replication recovery.

My questions include...

Is syncrepl configured optimally? Will journaling help with replication recovery?

We're trying to solve the problem of how to recover/replace a failed node in a system containing a very large number of records and bring it back into the cluster as quickly as possible. We're also trying to resolve how to ensure that replication works consistently on restart.

Please let me know.

Thanks.


Aaron Richton

9:06 p.m.

On Mon, 10 Aug 2015, Brian Wright wrote:


this amount of transfer load? Are we expected to recover the node via a separate method (i.e., slapcat / slapadd) and then kick replication off only after it's been loaded?

[...]


We're trying to solve the problem of how to recover/replace a failed node in a system containing a very large number of records and bring it back into the cluster as quickly as possible. We're also trying to resolve how to ensure that replication works consistently on restart.

"Expected" might be too strong; there's more than one way to do it. But by definition, you're going to have slapd(8) backed (hopefully) with some flavor of transactional integrity, and that represents an extremely significant cost in your data store writes. You'll also have various syntax/schema validation, etc. occurring.

So if you bring up your initial load with slapadd(8), safely taking advantage of -q and similar options (see the man page), you'll get the bulk load completed without this overhead. Even if your input LDIF is somewhat "stale" syncrepl should be able to figure out the last delta within a reasonable time.
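A minimal sketch of the bulk-load approach described above, building the two command lines without running them. The `-b`, `-l`, and `-q` options are taken from the slapcat(8) and slapadd(8) man pages; the LDIF path and the helper name are hypothetical, and a real run would also need the site's `-f` or `-F` configuration option, slapd stopped on the target, and an empty target database. Per the man page, an interrupted `-q` load can leave the database unusable, so the load should be restarted from scratch if it fails.

```python
import shlex

# Sketch only: builds the export/import command lines, runs nothing.
# The LDIF path and this helper name are hypothetical.

def bulk_load_commands(suffix, ldif_path):
    """Return (export_cmd, import_cmd) as argv lists."""
    # Run on a healthy node: dump the database for this suffix to LDIF.
    export_cmd = ["slapcat", "-b", suffix, "-l", ldif_path]
    # Run on the new node with slapd stopped: quick-mode bulk load.
    import_cmd = ["slapadd", "-q", "-b", suffix, "-l", ldif_path]
    return export_cmd, import_cmd

if __name__ == "__main__":
    for cmd in bulk_load_commands("dc=marketo,dc=com", "/tmp/export.ldif"):
        print(shlex.join(cmd))
```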

Regardless of method, you can use the standard CSN monitoring techniques (discussed extensively on this list) to "ensure that replication works."
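One way to sketch the CSN check: read the contextCSN values from the suffix entry on each server (for example with a base-scope ldapsearch requesting the contextCSN attribute) and compare them per server ID. The code below only does the comparison on strings already fetched. It assumes the usual CSN layout of timestamp, change count, server ID, and modification number separated by `#`; the function names and sample values are illustrative.

```python
# Compare contextCSN values from two servers, per server ID (SID).
# Assumes CSNs look like "20150810231922.000000Z#000000#001#000000"
# (timestamp#count#sid#mod). Function names are illustrative.

def newest_csn_by_sid(values):
    """Map SID -> newest CSN string seen for that SID."""
    newest = {}
    for csn in values:
        _timestamp, _count, sid, _mod = csn.split("#")
        # Fixed-width fields make plain string comparison chronological.
        if sid not in newest or csn > newest[sid]:
            newest[sid] = csn
    return newest

def lagging_sids(provider_values, consumer_values):
    """SIDs for which the consumer is behind (or missing) vs the provider."""
    provider = newest_csn_by_sid(provider_values)
    consumer = newest_csn_by_sid(consumer_values)
    return sorted(sid for sid, csn in provider.items()
                  if consumer.get(sid, "") < csn)

if __name__ == "__main__":
    provider = ["20150810231922.000000Z#000000#001#000000",
                "20150810231500.000000Z#000000#002#000000"]
    consumer = ["20150810231922.000000Z#000000#001#000000",
                "20150810230000.000000Z#000000#002#000000"]
    print(lagging_sids(provider, consumer))
```

An empty result means the consumer has caught up for every server ID the provider knows about.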

Andrew Findlay

12 Aug 2015

4:29 p.m.

On Mon, Aug 10, 2015 at 05:19:22PM -0700, Brian Wright wrote:


We're trying to solve the problem of how to recover/replace a failed node in a system containing a very large number of records and bring it back into the cluster as quickly as possible. We're also trying to resolve how to ensure that replication works consistently on restart.

In terms of recovering a failed node, the very fastest method is to use a database backup made with mdb_copy. The output from that command is a file that can be used directly as an MDB database so all you have to do is put it in place and restart slapd. Even if the backup is a day or two old, the replication process should bring in the more recent changes from another server.

If your servers have identical software, you can even take a backup from one server and install it on another one. That gives you a quick way of copying in very fresh data.

Note that mdb_copy is not installed by default. For safety you must use a binary built from the same OpenLDAP distribution as your slapd. You will find the source for the MDB tools in openldap-2.4.*/libraries/liblmdb

There are some caveats with mdb_copy. In particular it can cause database bloat if run on a server that has a heavy write load at the time.
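A hedged sketch of the backup step: `mdb_copy` takes a source environment directory and a destination directory, and as far as I know the destination must already exist and be empty. The paths and helper names below are hypothetical; substitute your own database directory.

```python
import os
import subprocess

# Sketch only: paths and helper names are hypothetical.

def mdb_copy_command(src_dir, dst_dir):
    """argv for copying the LMDB environment in src_dir into dst_dir."""
    return ["mdb_copy", src_dir, dst_dir]

def run_backup(src_dir, dst_dir):
    """Create the destination directory, then run mdb_copy."""
    os.makedirs(dst_dir, exist_ok=True)
    subprocess.run(mdb_copy_command(src_dir, dst_dir), check=True)

if __name__ == "__main__":
    # Print the command instead of running it, so this is safe anywhere.
    print(" ".join(mdb_copy_command("/var/lib/ldap", "/backup/ldap")))
```

To restore, the resulting data.mdb would be placed in the database directory of the stopped slapd on the replacement node before starting it, as described above.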

Andrew

--
-----------------------------------------------------------------------
|                 From Andrew Findlay, Skills 1st Ltd                 |
| Consultant in large-scale systems, networks, and directory services |
|     http://www.skills-1st.co.uk/                +44 1628 782565     |
-----------------------------------------------------------------------

Brian Wright

15 Aug 2015

8:17 a.m.

On 8/12/15 7:29 AM, Andrew Findlay wrote:


On Mon, Aug 10, 2015 at 05:19:22PM -0700, Brian Wright wrote:

We're trying to solve the problem of how to recover/replace a failed node in a system containing a very large number of records and bring it back into the cluster as quickly as possible. We're also trying to resolve how to ensure that replication works consistently on restart.

In terms of recovering a failed node, the very fastest method is to use a database backup made with mdb_copy. The output from that command is a file that can be used directly as an MDB database so all you have to do is put it in place and restart slapd. Even if the backup is a day or two old, the replication process should bring in the more recent changes from another server.

[...]

There are some caveats with mdb_copy. In particular it can cause database bloat if run on a server that has a heavy write load at the time.

Andrew

Thanks for the tip. This really helps us a lot with recovering failed nodes. I wouldn't have thought to dig into the libraries/liblmdb area looking for tools. The library yes, but tools no. I had assumed there must be tools somewhere, but since they didn't get installed with the regular package, I didn't know where they were. I guess I should always be more investigative and look through all of the directories of the source. :)

As for the use of our environment, our LDAP traffic will mostly consist of reads, with a much smaller number of writes throughout the day. So our workload should probably not cause much bloat, if any, as long as we're judicious with the tool usage. I will make note of this aspect when I write the docs for our use of the copy tool, though.

Thanks again.

openldap-technical@openldap.org