--On Tuesday, June 25, 2013 12:58:54 PM -0700 Quanah Gibson-Mount quanah@zimbra.com wrote:
--On Tuesday, June 25, 2013 12:38 PM -0700 Bill MacAllister whm@stanford.edu wrote:
The load starts out at a rate of about 2 M/s. In the past I remember it dropping to something like 900 k/s and staying there. Now the load starts in the same place, but after 30 seconds it alternates between stalling outright and a rate under 100 k/s. It dips as low as under 10 k/s and sometimes goes as high as 700 k/s. (My undergraduate degree was in watching water boil.)
What is the partition type? ext4?
What options are set for the partition in fstab?
This is what I am currently using. The UUIDs are obviously shortened for readability.
UUID=blah1  /                   ext4  defaults,acl,noatime,errors=remount-ro  0  1
UUID=blah2  /var/cache/openafs  ext4  defaults,noatime                        0  2
UUID=blah3  /var/lib/ldap       ext4  defaults,noatime                        0  2
UUID=blah4  none                swap  sw                                      0  0
I also tried ext3 with the same results. This is on a RAID-1. I have also tried splitting the two disks and putting the OS on one and the LDAP database on the other. None of this moved the problem.
It really has the feel of resource exhaustion. The load is now stalled in that the progress display is not updating. top does not show slapd as doing anything.
Bill
Bill MacAllister wrote:
--On Tuesday, June 25, 2013 12:58:54 PM -0700 Quanah Gibson-Mount quanah@zimbra.com wrote:
--On Tuesday, June 25, 2013 12:38 PM -0700 Bill MacAllister whm@stanford.edu wrote:
The load starts out at a rate of about 2 M/s. In the past I remember it dropping to something like 900 k/s and staying there. Now the load starts in the same place, but after 30 seconds it alternates between stalling outright and a rate under 100 k/s. It dips as low as under 10 k/s and sometimes goes as high as 700 k/s. (My undergraduate degree was in watching water boil.)
What is the partition type? ext4?
What options are set for the partition in fstab?
This is what I am currently using. The UUIDs are obviously shortened for readability.
UUID=blah1  /                   ext4  defaults,acl,noatime,errors=remount-ro  0  1
UUID=blah2  /var/cache/openafs  ext4  defaults,noatime                        0  2
UUID=blah3  /var/lib/ldap       ext4  defaults,noatime                        0  2
UUID=blah4  none                swap  sw                                      0  0
I also tried ext3 with the same results. This is on a RAID-1. I have also tried splitting the two disks and putting the OS on one and the LDAP database on the other. None of this moved the problem.
It really has the feel of resource exhaustion. The load is now stalled in that the progress display is not updating. top does not show slapd as doing anything.
Probably bad default FS settings, changed from your previous OS revision.
Also, you should watch vmstat while it runs to get a better idea of how much time the system is spending in I/O wait.
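For example, run it at a 5-second interval while the load is going:

  vmstat 5

The "wa" column on the far right is the percentage of CPU time spent waiting on I/O, and "b" is the number of processes blocked in uninterruptible sleep; if wa sits in double digits while slapd looks idle in top, the disk is the bottleneck.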
--On Tuesday, June 25, 2013 03:10:17 PM -0700 Howard Chu hyc@symas.com wrote:
Bill MacAllister wrote:
--On Tuesday, June 25, 2013 12:58:54 PM -0700 Quanah Gibson-Mount quanah@zimbra.com wrote:
--On Tuesday, June 25, 2013 12:38 PM -0700 Bill MacAllister whm@stanford.edu wrote:
The load starts out at a rate of about 2 M/s. In the past I remember it dropping to something like 900 k/s and staying there. Now the load starts in the same place, but after 30 seconds it alternates between stalling outright and a rate under 100 k/s. It dips as low as under 10 k/s and sometimes goes as high as 700 k/s. (My undergraduate degree was in watching water boil.)
What is the partition type? ext4?
What options are set for the partition in fstab?
This is what I am currently using. The UUIDs are obviously shortened for readability.
UUID=blah1  /                   ext4  defaults,acl,noatime,errors=remount-ro  0  1
UUID=blah2  /var/cache/openafs  ext4  defaults,noatime                        0  2
UUID=blah3  /var/lib/ldap       ext4  defaults,noatime                        0  2
UUID=blah4  none                swap  sw                                      0  0
I also tried ext3 with the same results. This is on a RAID-1. I have also tried splitting the two disks and putting the OS on one and the LDAP database on the other. None of this moved the problem.
It really has the feel of resource exhaustion. The load is now stalled in that the progress display is not updating. top does not show slapd as doing anything.
Probably bad default FS settings, changed from your previous OS revision.
Also, you should watch vmstat while it runs to get a better idea of how much time the system is spending in I/O wait.
I have just re-mkfs'ed the new, slow system to make it look like the old, fast system. Just to make sure nothing else changed I have started a load on the older system. Things look fine.
Now, comparing vmstat output, the new system is clearly badness incarnate.
Fast
====
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 6358656 303468 9301620    0    0     0     0   45   45  0  0 100  0
 0  0      0 6358780 303468 9301620    0    0     0     0   47   41  0  0 100  0
 0  0      0 6358780 303468 9301620    0    0     0     0   47   41  0  0 100  0
 0  0      0 6358780 303468 9301620    0    0     0     0   93   43  0  1  99  0
 0  0      0 6358532 303468 9301620    0    0     0     0  141   71  0  1  99  0
 1  0      0 6358488 303468 9301620    0    0     0    14  116   48  0  1  99  0

Slow
====
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd     free  buff   cache   si   so    bi    bo   in   cs us sy id wa
 1  4      0 13318088 36128 2759600    0    0     0  2134  379   83  0  0  88 12
 0  4      0 13318308 36128 2759600    0    0     0  1044  277   70  0  0  88 12
 0  4      0 13318508 36132 2759600    0    0     0   765  267   69  0  0  88 12
 0  2      0 13318240 36152 2759604    0    0     0   818  593  104  0  0  88 12
 0  2      0 13318332 36168 2759604    0    0     0  2611 1489  138  0  0  89 11
Lots of waiting, lots of blocking. What's the deal with all that free memory on the slow system?
I will iterate on mkfs for a bit, but I thought I would send this off in case something jumps out.
Bill
Bill MacAllister wrote:
--On Tuesday, June 25, 2013 03:10:17 PM -0700 Howard Chu hyc@symas.com wrote:
Probably bad default FS settings, changed from your previous OS revision.
Also, you should watch vmstat while it runs to get a better idea of how much time the system is spending in I/O wait.
I have just re-mkfs'ed the new, slow system to make it look like the old, fast system. Just to make sure nothing else changed I have started a load on the older system. Things look fine.
I meant mount options; mkfs should have very little impact.
ext3 journaling is pretty awful. For ext4 you probably want data=writeback mode, and you should probably compare the commit= value (which defaults to 5 seconds) and also barrier; I believe the barrier default changed between kernel revisions.
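For example, the /var/lib/ldap line from your fstab would become something roughly like:

  UUID=blah3  /var/lib/ldap  ext4  defaults,noatime,data=writeback,commit=60,barrier=0  0  2

(The commit=60 and barrier=0 values there are just something to experiment with, not a recommendation.)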
Now, comparing vmstat output, the new system is clearly badness incarnate.
Fast
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 6358656 303468 9301620    0    0     0     0   45   45  0  0 100  0
 0  0      0 6358780 303468 9301620    0    0     0     0   47   41  0  0 100  0
 0  0      0 6358780 303468 9301620    0    0     0     0   47   41  0  0 100  0
 0  0      0 6358780 303468 9301620    0    0     0     0   93   43  0  1  99  0
 0  0      0 6358532 303468 9301620    0    0     0     0  141   71  0  1  99  0
 1  0      0 6358488 303468 9301620    0    0     0    14  116   48  0  1  99  0
Slow
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd     free  buff   cache   si   so    bi    bo   in   cs us sy id wa
 1  4      0 13318088 36128 2759600    0    0     0  2134  379   83  0  0  88 12
 0  4      0 13318308 36128 2759600    0    0     0  1044  277   70  0  0  88 12
 0  4      0 13318508 36132 2759600    0    0     0   765  267   69  0  0  88 12
 0  2      0 13318240 36152 2759604    0    0     0   818  593  104  0  0  88 12
 0  2      0 13318332 36168 2759604    0    0     0  2611 1489  138  0  0  89 11
Lots of waiting, lots of blocking. What's the deal with all that free memory on the slow system?
I will iterate on mkfs for a bit, but I thought I would send this off in case something jumps out.
Bill
--On Wednesday, June 26, 2013 04:19:27 AM -0700 Howard Chu hyc@symas.com wrote:
Bill MacAllister wrote:
--On Tuesday, June 25, 2013 03:10:17 PM -0700 Howard Chu hyc@symas.com wrote:
Probably bad default FS settings, changed from your previous OS revision.
Also, you should watch vmstat while it runs to get a better idea of how much time the system is spending in I/O wait.
I have just re-mkfs'ed the new, slow system to make it look like the old, fast system. Just to make sure nothing else changed I have started a load on the older system. Things look fine.
I meant mount options; mkfs should have very little impact.
ext3 journaling is pretty awful. For ext4 you probably want data=writeback mode, and you should probably compare the commit= value (which defaults to 5 seconds) and also barrier; I believe the barrier default changed between kernel revisions.
I started with mkfs because I wanted to see if I could make things better by putting the ext4 journal on a different disk from the database. With the Debian default of data=ordered the load time was awful even with the journal on a separate disk. I killed it after about 20 minutes when the ETA topped two hours and was still climbing.
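For reference, an external ext4 journal is set up along these lines (the device names here are made up, and the journal device's block size has to match the filesystem's):

  mke2fs -O journal_dev -b 4096 /dev/sdb1
  mkfs -t ext4 -b 4096 -J device=/dev/sdb1 /dev/sda3
  mount -t ext4 -o noatime /dev/sda3 /var/lib/ldap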
My next attempt was to do away with journaling altogether and create the database on an ext2 file system. Not surprisingly, the load time was great: just a bit over 21 minutes. This is the benchmark I used, i.e. the best that I can expect.
I tried a load on an ext4 system with options 'rw,noatime,user_xattr,barrier=1,data=writeback' and got a load time of 01h40m06s. This is the best time I have gotten so far loading on ext4.
I ended up writing a script that creates an ext2 file system, loads the backend, unmounts the partition, converts it to ext4 journaling, and then mounts the partition again. This will allow me to continue with the server rebuilds, but it is a pretty ugly hack.
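In outline the script does something like this; the device, suffix, and LDIF names below are placeholders rather than the real ones:

  mkfs -t ext2 /dev/sda3
  mount -t ext2 -o noatime /dev/sda3 /var/lib/ldap
  slapadd -q -b dc=example,dc=com -l dump.ldif   # stand-in for the actual load command
  umount /var/lib/ldap
  tune2fs -j /dev/sda3                           # add a journal to the finished filesystem
  mount -t ext4 -o noatime /dev/sda3 /var/lib/ldap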
Bill
Bill MacAllister wrote:
--On Wednesday, June 26, 2013 04:19:27 AM -0700 Howard Chu hyc@symas.com wrote:
Bill MacAllister wrote:
--On Tuesday, June 25, 2013 03:10:17 PM -0700 Howard Chu hyc@symas.com wrote:
Probably bad default FS settings, changed from your previous OS revision.
Also, you should watch vmstat while it runs to get a better idea of how much time the system is spending in I/O wait.
I have just re-mkfs'ed the new, slow system to make it look like the old, fast system. Just to make sure nothing else changed I have started a load on the older system. Things look fine.
I meant mount options; mkfs should have very little impact.
ext3 journaling is pretty awful. For ext4 you probably want data=writeback mode, and you should probably compare the commit= value (which defaults to 5 seconds) and also barrier; I believe the barrier default changed between kernel revisions.
I started with mkfs because I wanted to see if I could make things better by putting the ext4 journal on a different disk from the database. With the Debian default of data=ordered the load time was awful even with the journal on a separate disk. I killed it after about 20 minutes when the ETA topped two hours and was still climbing.
My next attempt was to do away with journaling altogether and create the database on an ext2 file system. Not surprisingly, the load time was great: just a bit over 21 minutes. This is the benchmark I used, i.e. the best that I can expect.
OK, makes sense. And yes, an external journal device should have helped.
I tried a load on an ext4 system with options 'rw,noatime,user_xattr,barrier=1,data=writeback' and got a load time of 01h40m06s. This is the best time I have gotten so far loading on ext4.
Did you try commit=60 barrier=0?
I ended up writing a script that creates an ext2 file system, loads the backend, unmounts the partition, converts it to ext4 journaling, and then mounts the partition again. This will allow me to continue with the server rebuilds, but it is a pretty ugly hack.
--On June 27, 2013 1:40:16 AM -0700 Howard Chu hyc@symas.com wrote:
I tried a load on an ext4 system with options 'rw,noatime,user_xattr,barrier=1,data=writeback' and got a load time of 01h40m06s. This is the best time I have gotten so far loading on ext4.
Did you try commit=60 barrier=0?
Here are the details of a test using commit=60 barrier=0.
mkfs -t ext4 \
  -O "^flex_bg ^huge_file ^uninit_bg ^dir_nlink ^extra_isize ^extent"
mount -t ext4 -o rw,noatime,barrier=0,commit=60,data=writeback

Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
mount options:       rw, noatime, user_xattr, commit=60, barrier=0, data=writeback
elapsed:             01h40m55s
spd:                 211.0 k/s
I ended up writing a script that creates an ext2 file systems, loads the backend, umounts the partition, converts it to ext4 journaling, and then mounts the partition again. This will allow me to continue with the server rebuilds, but it is a pretty ugly hack.
I am now getting close to ext2 performance using ext3, but ext4 is consistently too slow in all of my tests. Here are the results of the fastest ext3 and fastest ext4 tests.
* mkfs -t ext3 -O has_journal
  mount -t ext3 -o rw,noatime,data=writeback
  Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
  mount options:       rw, noatime, errors=continue, user_xattr, acl, barrier=1, data=writeback
  elapsed:             22m03s
  spd:                 965.6 k/s

* mkfs -t ext4 -O "^flex_bg ^huge_file ^uninit_bg ^dir_nlink ^extra_isize"
  mount -t ext4 -o rw,noatime,data=writeback
  Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent sparse_super large_file
  mount options:       rw, noatime, user_xattr, barrier=1, data=writeback
  elapsed:             01h32m19s
  spd:                 230.6 k/s
During a load the status display would stall periodically. The worse the load time, the more frequently the display stalled and the longer it stalled for. I'm guessing that this is data being flushed to disk. I am also guessing that, since mdb uses memory-mapped files, some tuning of memory management might help improve performance. I am not familiar with the tuning knobs there, so any pointers would be appreciated.
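My best guess is that the relevant knobs are the vm.dirty_* sysctls that control when dirty pages get written back, something along the lines of the following, though the values here are made up and untested:

  sysctl -w vm.dirty_background_ratio=5     # start background writeback sooner
  sysctl -w vm.dirty_ratio=40               # let writers dirty more memory before they block
  sysctl -w vm.dirty_expire_centisecs=6000  # let dirty data age to ~60s before it must be written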
Bill