Fixing this will either require adding a bunch of ugly code, or changing the on-disk format again. Opinions?
Currently the page in-use offsets mp_lower and mp_upper range from [PAGEHDRSZ to pagesize]. IMO this was a stupid choice, carried over from the original btree code. It should instead have ranged from [0 to pagesize-PAGEHDRSZ] and then we'd have no issue right now. Adjusting this would require only a few minor tweaks to the code, but would require a full dump/reload of existing databases.
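To make the range argument concrete, here is a minimal sketch, not code from mdb.c. It assumes the offsets are stored in 16-bit fields (the unsigned short mentioned elsewhere in this thread) and uses a made-up header size of 16 bytes; PAGEHDRSZ_SKETCH and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical header size for illustration only; the real PAGEHDRSZ
 * is derived from the page struct layout in mdb.c. */
#define PAGEHDRSZ_SKETCH 16u

/* Largest offset under the current scheme: offsets are measured from
 * the start of the page, so the upper bound can equal pagesize. */
uint32_t max_offset_current(uint32_t pagesize) {
    assert(pagesize > PAGEHDRSZ_SKETCH);
    return pagesize;
}

/* Largest offset under the proposed scheme: offsets are measured from
 * the end of the page header instead. */
uint32_t max_offset_rebased(uint32_t pagesize) {
    assert(pagesize > PAGEHDRSZ_SKETCH);
    return pagesize - PAGEHDRSZ_SKETCH;
}

/* Nonzero when v can be stored in a 16-bit unsigned offset field. */
int fits_in_u16(uint32_t v) {
    return v <= UINT16_MAX;
}
```

With a 65536-byte page, fits_in_u16(max_offset_current(65536)) is false while fits_in_u16(max_offset_rebased(65536)) is true, which is the whole difference between the two ranges.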
-------- Original Message --------
Subject: Re: (ITS#7713) Segmentation fault if the pagesize of the Operating system is not equal to 4096.
Date: Tue, 1 Oct 2013 07:16:11 GMT
From: hyc@symas.com
To: openldap-its@openldap.org
sumantk2@linux.vnet.ibm.com wrote:
Full_Name: sumanth k
Version: 2.4.35 to any recent version with mdb support
OS: Linux - ppc64
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (122.248.161.59)
The page size on Linux for the x86 and s390x architectures is 4096 bytes, whereas on PowerPC (ppc64) it is 65536 bytes by default. When the page size is not equal to 4096, the segmentation fault occurs. I compiled the powerpc64 kernel with a page size of 4096 and the problem disappeared; everything runs smoothly. But by default the powerpc64 architecture runs with a 65536-byte page size, so there seems to be a problem in the mdb_env_open2() function in mdb.c when the page size is not equal to 4096.
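For anyone wanting to confirm the page size on their own machine, a minimal sketch assuming a POSIX system follows. sysconf(_SC_PAGESIZE) is the standard call; the helper names os_page_size and page_size_matches are hypothetical and not part of liblmdb.

```c
#include <assert.h>
#include <unistd.h>

/* Page size reported by the running kernel, or -1 on error. */
long os_page_size(void) {
    return sysconf(_SC_PAGESIZE);
}

/* Hypothetical helper: does the runtime page size match a value that
 * the code was built to expect (for example 4096)? */
int page_size_matches(long assumed) {
    assert(assumed > 0);
    return os_page_size() == assumed;
}
```

On the systems described here one would expect page_size_matches(4096) to hold on x86 and s390x and to fail on a default ppc64 kernel.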
Thanks for the report. I believe you should be able to change the definition of MDB_PAGESIZE to 65536 instead of forcing your machine to use 4096-byte pages.
There are other problems though; we use an unsigned short for page offsets. I'm not sure the assert that you tripped will succeed in this case.
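A minimal sketch of one way the assertion could trip, under two assumptions that are not confirmed here: unsigned short is 16 bits wide, and an empty page starts with its lower offset at the header size and its upper offset at the page size. The struct and function names are hypothetical stand-ins, not the real MDB_page layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two in-use offsets of a page. */
struct page_offsets_sketch {
    uint16_t lower; /* end of the area growing up from the header */
    uint16_t upper; /* start of the area growing down from the page end */
};

/* Set up the offsets of an empty page. Converting pagesize to a 16-bit
 * field reduces it modulo 65536, so 65536 is stored as 0. */
struct page_offsets_sketch empty_page_offsets(uint32_t header_size,
                                              uint32_t pagesize) {
    struct page_offsets_sketch p;
    assert(header_size <= UINT16_MAX);
    p.lower = (uint16_t)header_size;
    p.upper = (uint16_t)pagesize;
    return p;
}

/* The condition checked by the failing assertion: upper >= lower. */
int offsets_consistent(struct page_offsets_sketch p) {
    return p.upper >= p.lower;
}
```

With a 4096-byte page the check holds; with a 65536-byte page the upper offset wraps to 0 and the check fails, which matches the shape of the assertion that was tripped.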
These are my observations:
Compiled the source with -O0 optimization.
/home/openldap-2.4.36/tests/../servers/slapd/slapd -s0 -f /home/openldap-2.4.36/tests/testrun/slapd.1.conf -h ldap://localhost:9011/ -d 0x4105
./scripts/test000-rootdse: line 31: 19059 Aborted (core dumped) $SLAPD -f $CONF1 -h $URI1 -d $LVL $TIMING > $LOG1 2>&1
Core file:
bash-4.2# cat .gdbinit
b mdb_db_open
b mdb_env_open
b mdb_env_open2
b mdb_txn_begin
b mdb_txn_renew0
b mdb_dbi_open
b mdb_cursor_init
b mdb_cursor_set
b mdb_page_search
b mdb_page_get
b mdb_page_search_root
r -s0 -f /home/openldap-2.4.36/tests/testrun/slapd.1.conf -h ldap://localhost:9011/ -d 0x4105
# Core was generated by `/home/openldap-2.4.36/tests/../servers/slapd/slapd -s0 -f /home/openldap-2.4.36'.
Program terminated with signal 6, Aborted.
#0 0x00001fffff7adb70 in .raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-8.mcp8_0.2.ppc64 cyrus-sasl-md5-2.1.26-8.mcp8_0.2.ppc64 glibc-2.17-4.mcp8_0.6.ppc64 keyutils-libs-1.5.5-4.mcp8_0.2.ppc64 krb5-libs-1.11.3-1.mcp8_0.2.ppc64 libcom_err-1.42.7-2.mcp8_0.1.ppc64 libdb-5.3.21-8.mcp8_0.4.ppc64 libselinux-2.1.13-15.mcp8_0.3.ppc64 nss-softokn-freebl-3.14.3-1.mcp8_0.2.ppc64 openssl-libs-1.0.1e-4.mcp8_0.1.ppc64 pcre-8.32-7.mcp8_0.2.ppc64 zlib-1.2.7-10.mcp8_0.1.ppc64
(gdb) where
#0 0x00001fffff7adb70 in .raise () from /lib64/libc.so.6
#1 0x00001fffff7afb64 in .abort () from /lib64/libc.so.6
#2 0x00001fffff7a455c in .__assert_fail_base () from /lib64/libc.so.6
#3 0x00001fffff7a464c in .__assert_fail () from /lib64/libc.so.6
#4 0x00000000101498cc in mdb_node_add (mc=0x3fffe89a7d40, indx=0, key=0x3fffe89a7d20, data=0x3fffe89a7d30, pgno=0, flags=2) at ./../../../libraries/liblmdb/mdb.c:6160
#5 0x000000001014882c in mdb_cursor_put (mc=0x3fffe89a7d40, key=0x3fffe89a7d20, data=0x3fffe89a7d30, flags=2) at ./../../../libraries/liblmdb/mdb.c:5877
#6 0x00000000101516a8 in mdb_dbi_open (txn=0x1000f675980, name=0x1027b9c8 "ad2i", flags=262152, dbi=0x1fffff0800a0) at ./../../../libraries/liblmdb/mdb.c:7902
#7 0x0000000010139cd4 in mdb_db_open (be=0x1000f4f83a0, cr=0x3fffe89a81c0) at init.c:207
#8 0x0000000010050e34 in backend_startup_one (be=0x1000f4f83a0, cr=0x3fffe89a81c0) at backend.c:224
#9 0x0000000010051588 in backend_startup (be=0x1000f4f83a0) at backend.c:325
#10 0x0000000010089a7c in slap_startup (be=0x0) at init.c:219
#11 0x000000001000a9c8 in main (argc=8, argv=0x3fffe89a8958) at main.c:991
Here is the error message:
#/home/openldap-2.4.36/tests/../servers/slapd/slapd -s0 -f /home/openldap-2.4.36/tests/testrun/slapd.1.conf -h ldap://localhost:9011/ -d 0x4105
(...some messages...)
524a2fe1 mdb_db_open: database "o=OpenLDAP Project,l=Internet": dbenv_open(/home/openldap-2.4.36/tests/testrun/db.1.a).
slapd: ./../../../libraries/liblmdb/mdb.c:6160: mdb_node_add: Assertion `mp->mp_pb.pb.pb_upper >= mp->mp_pb.pb.pb_lower' failed. < === fails in assert() ;
Aborted (core dumped)
Some of my observations:
In libraries/liblmdb/mdb.c, on x86 the call rc = mdb_cursor_set(&mc, &key, &data, MDB_SET, &exact); returns rc = MDB_SUCCESS, but on ppc64 it returns MDB_NOTFOUND. This is due to the fact that md_root != 2.
The value of md_root is 2 for env->me_metas[1]->mm_dbs on x86, but some huge value on ppc64. The value of md_pad is 4096 on x86 and 65536 on ppc64.
in x86:
Breakpoint 2, mdb_txn_begin (env=0x9c5050, parent=0x0, flags=0, ret=0x7fffffffdf98) at ./../../../libraries/liblmdb/mdb.c:2219
2219 int rc, size, tsize = sizeof(MDB_txn);
(gdb) p env->me_metas[1]->mm_dbs
$1 = {{md_pad = 4096, md_flags = 8, md_depth = 0, md_branch_pages = 0, md_leaf_pages = 0, md_overflow_pages = 0, md_entries = 0, md_root = 18446744073709551615},
  {md_pad = 0, md_flags = 0, md_depth = 1, md_branch_pages = 0, md_leaf_pages = 1, md_overflow_pages = 0, md_entries = 4, md_root = 2}}
(gdb) p env->me_metas[0]->mm_dbs
$2 = {{md_pad = 4096, md_flags = 8, md_depth = 0, md_branch_pages = 0, md_leaf_pages = 0, md_overflow_pages = 0, md_entries = 0, md_root = 18446744073709551615},
  {md_pad = 0, md_flags = 0, md_depth = 0, md_branch_pages = 0, md_leaf_pages = 0, md_overflow_pages = 0, md_entries = 0, md_root = 18446744073709551615}}
in ppc64:
Breakpoint 2, mdb_txn_begin (env=0x10493ef0, parent=0x0, flags=0, ret=0x3fffffffe590) at ./../../../libraries/liblmdb/mdb.c:2219
2219 int rc, size, tsize = sizeof(MDB_txn);
(gdb) p env->me_metas[0]->mm_dbs
$1 = {{md_pad = 65536, md_flags = 8, md_depth = 0, md_branch_pages = 0, md_leaf_pages = 0, md_overflow_pages = 0, md_entries = 0, md_root = 18446744073709551615},
  {md_pad = 0, md_flags = 0, md_depth = 0, md_branch_pages = 0, md_leaf_pages = 0, md_overflow_pages = 0, md_entries = 0, md_root = 18446744073709551615}}
(gdb) p env->me_metas[1]->mm_dbs
$2 = {{md_pad = 65536, md_flags = 8, md_depth = 0, md_branch_pages = 0, md_leaf_pages = 0, md_overflow_pages = 0, md_entries = 0, md_root = 18446744073709551615},
  {md_pad = 0, md_flags = 0, md_depth = 0, md_branch_pages = 0, md_leaf_pages = 0, md_overflow_pages = 0, md_entries = 0, md_root = 18446744073709551615}}
From further investigation, the value of env->me_metas[1] is initialized in mdb_env_open2() at:
    p = (MDB_page *)env->me_map;
    env->me_metas[0] = METADATA(p);
    env->me_metas[1] = (MDB_meta *)((char *)env->me_metas[0] + meta.mm_psize);
Here meta.mm_psize on ppc64 is 65536, which seems to be where the problem lies. If the value of meta.mm_psize is 4096, everything works fine.
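The pointer arithmetic quoted above can be sketched as plain byte offsets. This is only an illustration: it assumes METADATA() skips a fixed-size page header, and both HDR_SKETCH (set to 16 here) and meta_offset are hypothetical names that do not exist in mdb.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page header size, for illustration only. */
#define HDR_SKETCH 16u

/* Byte offset, from the start of the memory map, of meta page n.
 * Meta 0 sits just past the header of the first page; meta 1 is
 * psize bytes after meta 0, mirroring the quoted arithmetic. */
size_t meta_offset(unsigned n, size_t psize) {
    assert(n < 2);
    return HDR_SKETCH + n * psize;
}
```

Under these assumptions meta_offset(1, 4096) is 4112 and meta_offset(1, 65536) is 65552, so the second meta page is simply looked up one page further in, whatever the page size is. That suggests the address computation itself may be fine and the fault lies elsewhere.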
As I don't have deep knowledge of OpenLDAP, some help is needed.
Thank you, Sumanth K