hyc@symas.com said:
Martin Lucina wrote:
That still doesn't explain the MIPS issues, any suggestions on how to proceed there? I can give someone access to a MIPS host if that would help.
Copying back to the list:
Martin Lucina wrote:
hyc@symas.com said:
It appears that this system also lacks a coherent FS cache, like some BSDs. I changed mtest.c to use MDB_WRITEMAP and it now runs fine.
The unmodified mtest.c also worked when single-stepping through gdb, which apparently gives the cache time to sort itself out between mdb function calls.
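(For reference, MDB_WRITEMAP is just a flag to mdb_env_open(), so the mtest.c change amounts to passing it there. A minimal sketch, assuming the stock liblmdb API; the path and map size are placeholders, and the path must be an existing directory:

#include <assert.h>
#include "lmdb.h"

int main (void)
{
    MDB_env *env;

    assert (mdb_env_create (&env) == 0);
    assert (mdb_env_set_mapsize (env, 10485760) == 0);
    /* MDB_WRITEMAP: write dirty pages through the map itself instead
       of via write(2), so no write(2)-vs-mmap coherency is needed. */
    assert (mdb_env_open (env, "./testdb", MDB_WRITEMAP, 0664) == 0);
    /* ... open a txn and use the env as usual ... */
    mdb_env_close (env);
    return 0;
})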
Interesting. What you're saying is that without MDB_WRITEMAP, pages are written out separately and it is up to the FS cache to ensure that reading back via the memory map is consistent, correct?
That's the general idea. As the LMDB design paper states, LMDB requires the OS to use a unified buffer cache - so that mmap pages and FS cache pages are the same.
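Concretely, the invariant is that a store made with write(2) through one fd is immediately visible through a MAP_SHARED mapping of the same file. A minimal single-threaded sketch of that assumption (the file name is arbitrary); the test program at the end of this mail stresses the same thing from a second thread with O_DSYNC:

#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main (void)
{
    int fd = open ("/tmp/coherent", O_RDWR | O_CREAT | O_TRUNC, 0600);
    unsigned long v = 0;
    assert (write (fd, &v, sizeof v) == sizeof v);

    /* Read-only shared mapping of the same file. */
    volatile unsigned long *p = mmap (NULL, getpagesize (), PROT_READ,
        MAP_SHARED, fd, 0);
    assert (p != MAP_FAILED);

    /* Store through the fd... */
    v = 42;
    assert (pwrite (fd, &v, sizeof v, 0) == sizeof v);

    /* ...with a unified buffer cache the mapping sees it at once. */
    assert (*p == 42);

    munmap ((void *)p, getpagesize ());
    close (fd);
    return 0;
}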
I'll try and dig through the OpenWRT kernel configuration, they must have changed something that triggers this behaviour.
Frankly it seems unlikely that they could have changed something so fundamental to the VM subsystem of the kernel. It's also possible that we're seeing *CPU* cache inconsistencies, and that adding a few MIPS-specific memory barrier instructions here and there may fix things up.
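If it does turn out to be CPU-level, the candidate fix would be the MIPS "sync" instruction wherever liblmdb hands pages between writer and readers. A sketch of what such a barrier could look like; MDB_BARRIER is a made-up name, and where to place it is exactly the open question:

#if defined (__mips__) && defined (__GNUC__)
/* MIPS full barrier: "sync" orders all earlier loads/stores before
   all later ones; the "memory" clobber also stops compiler reordering. */
#define MDB_BARRIER()   __asm__ __volatile__ ("sync" ::: "memory")
#elif defined (__GNUC__)
/* Portable GCC full barrier, as used in the test program below. */
#define MDB_BARRIER()   __sync_synchronize ()
#endif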
I did some more investigating:
1) Tried adding calls to sync_file_range() (a Linux-specific syscall) and, in desperation, even sync(2) to mdb_txn_commit() just after mdb_page_flush() et al. No change. (A sketch of the call tried is below, after the results.)
2) Compiled the test program below on various platforms. It tries (rather unscientifically) to measure how "long" it takes for an mmap to become consistent after writing to the underlying file through a different fd opened with O_DSYNC (which is what mdb does).
The results are interesting:
x86_64 Core i5M (2 cores, 4 threads), gcc -O2: consistently less than 1k iterations
x86_64 Core i5M (2 cores, 4 threads), gcc -O2 -DNOBARRIER: consistently around 10k iterations
x86_64 dual 4-core Xeon, gcc -O2: around 2k iterations
x86_64 dual 4-core Xeon, gcc -O2 -DNOBARRIER: 10-15k iterations
MIPS target, musl gcc -O2 -mips32r2: varies, mostly 1; in every 10 runs at least one run completes in the high 100k's of iterations
MIPS target, musl gcc -O2 -mips32r2 -DNOBARRIER: about the same as the previous, but when not 1 the result is subjectively higher (around 1M iterations)
single-CPU SPARCv9, Solaris 10, Sun cc -fast -mt: always[*] 1
single-CPU SPARCv9, Solaris 10, CSW gcc -O2, with or without -DNOBARRIER: always[*] 1
ia64 dual Itanium 2, Linux, gcc -O2: around 2k iterations
ia64 dual Itanium 2, Linux, gcc -O2 -DNOBARRIER: anywhere between 3k and 8k iterations
[*] very rarely several million iterations
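For completeness, the sync_file_range() variant mentioned in 1) was along these lines; this is a sketch of the syscall usage rather than the exact patch, and flush_range() is just an illustrative name:

#define _GNU_SOURCE
#include <fcntl.h>

/* Strongest flag combination: wait for any in-flight write-out, start
   write-out of the whole file (offset 0, nbytes 0 = through EOF), then
   wait for that to complete. */
static int flush_range (int fd)
{
    return sync_file_range (fd, 0, 0,
        SYNC_FILE_RANGE_WAIT_BEFORE |
        SYNC_FILE_RANGE_WRITE |
        SYNC_FILE_RANGE_WAIT_AFTER);
}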
Does this help in any way? It certainly seems to suggest that the MIPS target's FS cache is only eventually consistent.
Any pointers on how to proceed, what else to try, or who else to ask would be much appreciated.
Martin
----test program----
#include <fcntl.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <assert.h>
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

pthread_barrier_t b;

/* Writer: once released by the barrier, rewrite the first word of the
   file as 1 through a separate fd opened with O_DSYNC (as mdb does). */
static void *thread (void *arg)
{
    int fd;

    pthread_barrier_wait (&b);
    fd = open ("/tmp/testfile", O_RDWR | O_CREAT | O_DSYNC, 0600);
    unsigned long v = 1;
    assert (write (fd, &v, sizeof v) == sizeof v);
    close (fd);
    return NULL;
}

int main (int argc, char *argv[])
{
    int fd;
    pthread_barrier_init (&b, NULL, 2);

    /* Create the file containing a single 0 and map it read-only. */
    unlink ("/tmp/testfile");
    fd = open ("/tmp/testfile", O_RDWR | O_CREAT, 0600);
    unsigned long v = 0;
    assert (write (fd, &v, sizeof v) == sizeof v);
    volatile unsigned long *p = mmap (NULL, getpagesize (), PROT_READ,
        MAP_SHARED, fd, 0);
    assert (p != MAP_FAILED);

    int i = 0;
    pthread_t thread_id = 0;
    pthread_create (&thread_id, NULL, thread, NULL);

    /* Spin until the writer's store shows up through the mapping,
       counting iterations; release the writer on the first pass. */
    while (*p != 1) {
        if (!i)
            pthread_barrier_wait (&b);
        i++;
#if defined (__GNUC__) && !defined (NOBARRIER)
        __sync_synchronize ();
#endif
    }
    printf ("%d\n", i);

    munmap ((void *)p, getpagesize ());
    close (fd);
    return 0;
}
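(To reproduce: build with something like "gcc -O2 -o dsync_test test.c -lpthread", adding -DNOBARRIER to drop the barrier, and run it repeatedly. The printed count is the number of spins the main thread made before the writer's store became visible through the mapping; a count of 1 means it was visible on the first check after the writer was released.)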