With the release of Debian 7 (wheezy) I was rebuilding a couple of test systems and was surprised to find that the times for populating the mdb database with slapd have gone up dramatically. A master server load that used to take about 10 minutes just took 35 minutes. The slave is worse: a normal load takes 20 minutes, and the current one is at 31 minutes with an ETA of about 2.5 hours. These systems are using OpenLDAP 2.4.35.
Here are some relevant bits from the configuration.
dn: cn=config
olcToolThreads: 2

dn: olcDatabase={2}mdb,cn=config
olcDbCheckpoint: 1024 5
olcDbEnvFlags: writemap
olcDbEnvFlags: nometasync
olcDbNoSync: FALSE
olcDbMaxSize: 85899345920
The systems are Dell R610s with 16 GB of memory. Our database is currently 3.2 GB on the master server.
I have been loading wheezy/2.4.35 databases for weeks now in preparation for upgrading the OS and installing the new version of OpenLDAP on our production servers. This is the first time I have seen this.
I have fiddled with the hardware enough that I don't think it is a hardware problem. There is not really much tuning to do with mdb, and I would appreciate some suggestions for what to look at next.
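The one load-time knob that does exist is slapadd's quick mode. A sketch of how a bulk load might be done with it, assuming Debian's default paths and that the mdb database is number 2 as in the config above:

```shell
# Stop slapd, then bulk-load with slapadd -q ("quick" mode), which skips
# consistency checks during the import for a much faster load.
# The paths, the -n 2 database index, and the LDIF filename are assumptions.
service slapd stop
slapadd -q -F /etc/ldap/slapd.d -n 2 -l /tmp/backup.ldif
chown -R openldap:openldap /var/lib/ldap
service slapd start
```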
Bill
--On Tuesday, June 25, 2013 10:29 AM -0700 Bill MacAllister whm@stanford.edu wrote:
If you've been doing multiple tests, you likely filled RAM.
I like to:
echo 3 > /proc/sys/vm/drop_caches
before doing an MDB load to empty RAM out.
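One detail worth noting: drop_caches only evicts clean pages, so syncing first makes the drop more complete. A minimal sketch (must be run as root):

```shell
# drop_caches only evicts clean pages, so flush dirty pages to disk first.
sync
# 1 = pagecache, 2 = dentries and inodes, 3 = both
echo 3 > /proc/sys/vm/drop_caches
# buff/cache in the output should now be close to zero
free -m
```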
--Quanah
--
Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
--On Tuesday, June 25, 2013 11:06:50 AM -0700 Quanah Gibson-Mount quanah@zimbra.com wrote:
Thanks for the suggestion.
Tried that and still got a slow load. Then I thought, "I know how to clear the memory cache," and rebooted the system. The slowness persists. The current load of a replica is at 46m04s with an ETA of 02h10m.
I have been setting swappiness to 0 on the LDAP servers for years now. I tried setting it back to the default of 60 with no discernible change.
The load starts out at a rate of about 2 M/s. In the past I remember it dropping to something like 900 k/s and staying there. Now the load starts in the same place, but after 30 seconds it alternates between stalling outright and a rate under 100 k/s, dipping as low as 10 k/s and sometimes reaching 700 k/s. (My undergraduate degree was in watching water boil.)
Bill
--On Tuesday, June 25, 2013 12:38 PM -0700 Bill MacAllister whm@stanford.edu wrote:
What is the partition type? ext4?
What options are set for the partition in fstab?
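The effective options may differ from what fstab requests, so it is worth checking both. A sketch, assuming the database lives under /var/lib/ldap:

```shell
# Show the filesystem type and effective mount options for the mount
# backing /var/lib/ldap (the path is an assumption; adjust as needed).
findmnt -T /var/lib/ldap -o TARGET,FSTYPE,OPTIONS
# Compare against what fstab requests:
grep -v '^#' /etc/fstab
```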
--Quanah
--On June 25, 2013 10:29:26 AM -0700 Bill MacAllister whm@stanford.edu wrote:
I tested file system after file system with one set of options after another and never really moved the problem significantly. I finally realized I should believe my experiments and think about what else could be causing the problem I was seeing.
What changed at about the same time I started building with wheezy/stable was that I removed a partition option that had been added to improve performance on our VM farm, i.e. align-at:4k. Our LDAP servers are physical servers, after all. Reinstating the parameter resulted in dramatically faster, and reproducible, load times on every file system I tried. We are now using ext4 for our LDAP server farm; well, we will be when I am done rebuilding them.
This problem will be specific to the disk in use. The manufacturer's documentation never really states what the block size is, but implies it is 512 bytes. I think that is a lie; I am told most modern disks lie about their geometry. In any case, for the disks we are using, 4k alignment works.
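A quick way to sanity-check alignment: partition start offsets are reported in 512-byte sectors, so a 4k-aligned partition starts on a sector number divisible by 8. A sketch, assuming the data partition is /dev/sda1:

```shell
# Partition start, in 512-byte sectors (the device name is an assumption).
start=$(cat /sys/block/sda/sda1/start)
# 4096 / 512 = 8, so a 4k-aligned partition starts on a multiple of 8.
if [ $((start % 8)) -eq 0 ]; then
    echo "sda1 starts on a 4k boundary"
else
    echo "sda1 is misaligned"
fi
```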
Thanks everyone for the suggestions.
Bill
Bill MacAllister wrote:
Thanks for the followup. Modern hard drives have moved to 4096-byte physical sectors but advertise 512-byte logical sectors for compatibility with older OSes. This will probably become an issue more often, although I expect modern Linux tools to be able to operate with native 4096-byte sectors and make the issue more obvious. There should be a drive option that reports its true sector size; I just don't remember the details at the moment.
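On Linux the kernel exposes both numbers through sysfs, so no vendor tool is needed. A sketch, assuming the drive is sda:

```shell
# What the drive advertises to the OS (often 512 on 4k-native drives)
cat /sys/block/sda/queue/logical_block_size
# The underlying physical sector size (4096 on Advanced Format drives)
cat /sys/block/sda/queue/physical_block_size
# lsblk can show the same per device:
lsblk -o NAME,LOG-SEC,PHY-SEC
```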