Howard Chu wrote:
Luke Kenneth Casson Leighton wrote:
http://symas.com/mdb/inmem/scaling.html
can i make a suggestion? whilst i am aware that it is generally not recommended in production environments to run more processes than there are cores, try running 128, 256 and even 512 processes all hitting that 64-core system, and monitor its I/O usage (iostat) and loadavg whilst doing so.
Sure, I can conduct that test and collect system stats using atop. Will let ya know. By the way, we're using threads here, not processes. But the overall loading behavior should be the same either way.
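Collecting the stats with atop or iostat directly is the obvious route; as a minimal Python sketch (Linux-specific, reading /proc/loadavg; the sampling interval and duration are arbitrary placeholders), monitoring load alongside a benchmark run might look like:

```python
import time

def parse_loadavg(text):
    """Parse the first three fields of a /proc/loadavg line
    into (1min, 5min, 15min) load averages."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)

def sample_loadavg(duration_s=600, interval_s=5):
    """Sample /proc/loadavg every interval_s seconds for duration_s
    seconds; return the list of 1-minute load averages observed."""
    samples = []
    end = time.time() + duration_s
    while time.time() < end:
        with open("/proc/loadavg") as f:
            samples.append(parse_loadavg(f.read())[0])
        time.sleep(interval_s)
    return samples

# Run alongside the benchmark, then inspect the peak:
#   print(max(sample_loadavg()))
```

This only captures loadavg; atop additionally records per-disk I/O and per-process CPU, which is why it's the better tool for the actual test.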
the hypothesis to test is that performance, instead of degrading reasonably linearly with the ratio of processes to cores, drops like a lead balloon.
Threads  ----------- Run Time -----------   CPU%  DB Size   Process   - Ctx Switches -   Write     Read
         Wall      User         Sys                         Size        Vol     Invol
      1  10:01.39  00:19:39.18  00:00:21.00   199  12647888  12650460      21     1513    45605    275331
      2  10:01.38  00:29:35.21  00:00:24.33   299  12647888  12650472      48     2661    42726    528514
      4  10:01.37  00:49:32.93  00:00:25.42   498  12647888  12650496      84     4106    40961   1068050
      8  10:01.36  01:29:32.68  00:00:23.25   897  12647888  12650756     157     7738    38812   2058741
     16  10:01.36  02:49:22.44  00:00:28.51  1694  12647888  12650852     345    16941    33357   3857045
     32  10:01.36  05:28:35.39  00:01:02.69  3288  12647888  12651308     923   258250    23922   6091558
     64  10:01.38  10:35:44.42  00:01:51.69  6361  12648060  12652132    1766   145585    16571   8724687
    128  10:01.38  10:36:43.09  00:01:45.52  6368  12649296  12654928    3276  2906109     8594   9846720
    256  10:01.48  10:36:53.05  00:01:36.83  6369  12649304  12658056    5365  3557137     4178  10453540
    512  10:02.11  10:36:09.58  00:03:00.83  6369  12649320  12664304    8303  3511456     1947  10728221
Looks to me like the system was reasonably well behaved.
This is reusing a DB that had already had multiple iterations of this benchmark run on it, so the size is larger than for a fresh DB, and it would have significant internal fragmentation - i.e., a lot of sequential data will be in non-adjacent pages.
The only really obvious impact is that the number of involuntary context switches jumps up at 128 threads, which is what you'd expect since there are fewer cores than threads. The writer gets progressively starved, and read rates increase slightly.
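Both effects are easy to pull out of the numbers: the involuntary context-switch count jumps roughly 20x between 64 and 128 threads, and the Write count falls steadily as readers multiply (figures transcribed from the table above):

```python
# Involuntary context switches and write counts, from the table above.
invol = {32: 258250, 64: 145585, 128: 2906109, 256: 3557137, 512: 3511456}
writes = {1: 45605, 64: 16571, 512: 1947}

# Once threads outnumber the 64 cores, the scheduler must preempt them:
jump = invol[128] / invol[64]
print(f"invol ctx switches, 64 -> 128 threads: {jump:.0f}x")

# Meanwhile the single writer is starved by the growing reader load:
print(f"writes, 1 -> 512 threads: {writes[1]} -> {writes[512]} "
      f"({writes[512] / writes[1]:.1%} of baseline)")
```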