On 2015-04-23 5:56 PM, Howard Chu wrote:
Geoff Swan wrote:
On 2015-04-23 4:43 PM, Howard Chu wrote:
Geoff Swan wrote:
On 2015-04-23 4:07 PM, Ulrich Windl wrote:
Geoff Swan <gswan3@bigpond.net.au> wrote on 22.04.2015 at 23:18 in message 5538100A.3060301@bigpond.net.au:
[...]
Free stats look fine to me. No swap is being used, nor has any been used yet on this system.
             total       used       free     shared    buffers     cached
Mem:     132005400   21928192  110077208      10612     363064   19230072
-/+ buffers/cache:    2335056  129670344
Swap:      8388604          0    8388604
The effect I am seeing is that despite a reasonably fast disc system, the kernel's writeback of dirty pages is painfully slow.
So disk I/O stats are also available via sar. May we see them? Finally there's blktrace, where you can follow the timing and position of each individual block being written, but that's not quite easy to run and analyze (unless I missed the easy way).
I suspect scattered writes are bringing your I/O rate down.
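One quick way to test that theory is to time the same number of synchronous 4 KB writes done sequentially versus at random offsets; on a rotating disc the random case collapses to the seek-limited IOPS rate while the sequential case approaches streaming bandwidth. A rough sketch (file name, counts and sizes are arbitrary; error handling omitted):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define NWRITES 1000
#define BLKSZ   4096
#define SPAN    (1024L * 1024 * 1024)   /* spread random writes over 1 GB */

/* Perform NWRITES synced 4 KB writes, sequential or random; return seconds. */
static double run(int fd, char *buf, int randomize)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < NWRITES; i++) {
        off_t off = randomize ? (off_t)(rand() % (SPAN / BLKSZ)) * BLKSZ
                              : (off_t)i * BLKSZ;
        pwrite(fd, buf, BLKSZ, off);
        fdatasync(fd);               /* force each write out to the disc */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    char *buf = malloc(BLKSZ);
    memset(buf, 'x', BLKSZ);
    int fd = open("seektest.dat", O_RDWR | O_CREAT, 0644);
    ftruncate(fd, SPAN);
    printf("sequential: %.0f writes/s\n", NWRITES / run(fd, buf, 0));
    printf("random:     %.0f writes/s\n", NWRITES / run(fd, buf, 1));
    close(fd);
    return 0;
}

(Build with cc -o seekbench seekbench.c; add -lrt on older glibc.)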
Regards, Ulrich
sysstat (and sar) is not installed on this system, so I can't give you that output.
As I said in my previous email: use iostat and/or vmstat to monitor paging activity and system wait time. You can also install atop and get all of the relevant stats on one screen.
I find it indispensable these days.
BTW, just using nmon whilst running a test script that writes 20K small objects shows a write speed of around 3-5 MB/s to the disc (without dbnosync, so as to see the immediate effects). This is on a SAS disc capable of 140MB/s, checked using regular tools such as dd (writing 1000 files of 384K each). I'm failing to understand why the write operations from the kernel page cache are so drastically slower. With dbnosync enabled, lmdb writes to the pages fast (as expected), but the pdflush that follows writes at the same slow speed, causing delays for further processes.
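To separate slapd from LMDB itself, something like the following standalone sketch could reproduce the small-object write pattern directly against the LMDB C API (the 20,000-object count and 512-byte value size are guesses at what the test script does; the ./testdb directory must exist beforehand):

/* Write 20K small objects, one write transaction each, so every commit
 * pays the full sync cost. Build: cc -o mdbbench mdbbench.c -llmdb */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <lmdb.h>

int main(void)
{
    MDB_env *env;
    MDB_dbi dbi;
    MDB_txn *txn;
    MDB_val key, data;
    char kbuf[32], dbuf[512];
    int i, rc;

    mdb_env_create(&env);
    mdb_env_set_mapsize(env, 1UL * 1024 * 1024 * 1024);  /* 1 GB map */
    rc = mdb_env_open(env, "./testdb", 0, 0664);  /* 0 = safe, fully synced mode */
    if (rc) { fprintf(stderr, "open: %s\n", mdb_strerror(rc)); return 1; }

    memset(dbuf, 'x', sizeof(dbuf));
    for (i = 0; i < 20000; i++) {
        mdb_txn_begin(env, NULL, 0, &txn);
        if (i == 0)
            mdb_dbi_open(txn, NULL, 0, &dbi);  /* default DB, handle reused after commit */
        snprintf(kbuf, sizeof(kbuf), "key%06d", i);
        key.mv_size = strlen(kbuf);  key.mv_data = kbuf;
        data.mv_size = sizeof(dbuf); data.mv_data = dbuf;
        mdb_put(txn, dbi, &key, &data, 0);
        rc = mdb_txn_commit(txn);    /* this is where the sync cost lands */
        if (rc) { fprintf(stderr, "commit: %s\n", mdb_strerror(rc)); return 1; }
    }
    mdb_env_close(env);
    return 0;
}

Timing it under time(1), once as-is and once with MDB_NOSYNC in the open flags, would show how much of the elapsed time is pure sync overhead.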
In normal (safe) operation, every transaction commit performs 2 fsyncs. Your 140MB/s throughput spec isn't relevant here; your disk's IOPS rate is what matters. You can use NOMETASYNC to do only 1 fsync per commit.
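If you were calling LMDB directly, that is the MDB_NOMETASYNC environment flag. A minimal sketch of setting it (the path is arbitrary):

#include <lmdb.h>

/* MDB_NOMETASYNC: flush data pages at commit but defer the meta-page
 * flush, i.e. one fsync per commit instead of two. Database integrity
 * is preserved; a system crash may undo the last committed transaction. */
int open_env_nometasync(const char *path, MDB_env **env)
{
    int rc = mdb_env_create(env);
    if (rc) return rc;
    rc = mdb_env_open(*env, path, MDB_NOMETASYNC, 0664);
    if (rc) mdb_env_close(*env);   /* env must be closed even if open fails */
    return rc;
}

Under back-mdb the equivalent is the "envflags nometasync" directive in slapd.conf.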
OK. I ran a reduced version of the test script (20 processes each performing 40 read/write operations) in normal (safe) mode of operation on a test server with 32GB RAM, everything else identical to the 128GB server. A quick test using vmstat at 1s intervals gave the following output whilst it was running.
procs ---------------memory-------------- ---swap-- -----io---- -system-- ------cpu-----
 r  b  swpd     free   buff  cache  si  so   bi    bo    in    cs us sy  id wa st
20  0     0 32011144 167764 330416   0   0    1    15    40    56  0  0  99  1  0
 0  0     0 31914848 167764 330424   0   0    0  1560  2594  2130  2  1  97  0  0
 0  0     0 31914336 167764 330424   0   0    0  1708   754  1277  0  0 100  0  0
 0  0     0 31914508 167772 330420   0   0    0  2028   779  1300  0  0  99  1  0
 0  0     0 31913956 167772 330428   0   0    0  1996   936  1354  0  0 100  0  0
 0  0     0 31914272 167772 330428   0   0    0  1588   785  1286  0  0 100  0  0
 0  0     0 31914356 167780 330424   0   0    0  1604   832  1388  0  0  99  1  0
 0  0     0 31914572 167780 330428   0   0    0  1240  1258  1910  0  0 100  0  0
 0  0     0 31914596 167780 330428   0   0    0  1776   905  1626  0  0 100  0  0
 0  0     0 31914812 167780 330428   0   0    0  1340  1117  1731  0  0 100  0  0
 0  0     0 31914796 167788 330420   0   0    0  1504  1154  1883  0  0 100  0  0
 0  0     0 31914936 167788 330428   0   0    0  1476   835  1390  0  0 100  0  0
 0  0     0 31914844 167796 330432   0   0    0  1140  1203  1891  0  0 100  0  0
 0  0     0 31914744 167796 330432   0   0    0  1600  1034  1667  0  0 100  0  0
 0  0     0 31914976 167796 330432   0   0    0  1488  1149  1715  0  0 100  0  0
 2  0     0 31915052 167796 330432   0   0    0  1248  1249  1930  0  0 100  0  0
 0  0     0 31914316 167796 330432   0   0    0  1736  1076  1625  0  0  99  0  0
 0  0     0 31914680 167804 330424   0   0    0  1176  1103  1751  0  0 100  0  0
 0  0     0 31915108 167804 330432   0   0    0  1476  1063  1776  0  0 100  0  0
 0  0     0 31914796 167804 330432   0   0    0  1660   967  1594  0  0 100  0  0
 0  0     0 31914820 167812 330424   0   0    0  1336  1358  2124  0  0 100  0  0
 0  0     0 31914904 167812 330432   0   0    0  1608   857  1464  0  0 100  0  0
 0  0     0 31914972 167828 330420   0   0    0  1316  1061  1741  0  0  99  1  0
 0  0     0 31915140 167828 330428   0   0    0  1320  1103  1751  0  0 100  0  0
 0  0     0 31915756 167828 330436   0   0    0  1604  1049  1544  0  0  99  0  0
 0  0     0 31915708 167828 330436   0   0    0  1244  1358  2052  0  0 100  0  0
 0  0     0 31916008 167828 330436   0   0    0  1620   912  1595  0  0 100  0  0
 0  0     0 31915964 167836 330428   0   0    0  1492  1105  1715  0  0 100  0  0
 0  0     0 31915916 167836 330436   0   0    0  1376  1164  1878  0  0 100  0  0
 0  0     0 31915924 167836 330436   0   0    0  1484   848  1390  0  0 100  0  0
 0  0     0 31916176 167844 330428   0   0    0  1368  1447  2123  0  0  99  0  0
 0  0     0 31916864 167844 330444   0   0    0  1444  1355  1832  0  0  99  1  0
The script took about 60s to complete, which is a lot longer than expected. It appears almost entirely I/O bound, at a fairly slow write rate (around 1500 blocks per second; since vmstat counts 1 KB blocks, that is only about 1.5 MB/s).
I'll build atop or iostat to get more detail.