Hello,
I've written a program that writes data to an LMDB database using the Python bindings. I had written about 300 GB of data, but when I ran my script again to re-read the database, it crashed immediately. I ran a backtrace in gdb; the output is below.
I really have no clue what's going on here. I suspect that the server rebooted (or was rebooted by someone) while I was running the script, but I thought LMDB was crash-proof.
Do you have any idea how best to troubleshoot this? The database is over 300 GB, so it's not easy to share, not to mention that it contains sensitive information.
Starting program: /usr/bin/python3 read_events.py -d jul012022-23/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
build/lib/mdb.c:3274: Assertion 'len >= 0 && id <= env->me_pglast' failed in mdb_freelist_save()
Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737350283264) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) backtrace
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737350283264) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=140737350283264) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=140737350283264, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007ffff7c96476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007ffff7c7c7f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007ffff62fb3e2 in mdb_assert_fail (env=0x5555565b49f0, expr_txt=expr_txt@entry=0x7ffff62fe308 "len >= 0 && id <= env->me_pglast", func=func@entry=0x7ffff62fe930 <__func__.13> "mdb_freelist_save", line=line@entry=3274, file=0x7ffff62fe010 "build/lib/mdb.c") at build/lib/mdb.c:1545
#6 0x00007ffff62f10df in mdb_freelist_save (txn=0x555556752950) at build/lib/mdb.c:3274
#7 mdb_txn_commit (txn=0x555556752950) at build/lib/mdb.c:3646
#8 0x00007ffff62f369b in txn_db_from_name (env=env@entry=0x7ffff61c8090, name=<optimized out>, flags=262144) at lmdb/cpython.c:1017
#9 0x00007ffff62f6e1d in env_open_db (self=0x7ffff61c8090, args=<optimized out>, kwds=<optimized out>) at lmdb/cpython.c:1665
#10 0x00005555556b23a9 in ?? ()
#11 0x0000555555699c14 in _PyEval_EvalFrameDefault ()
#12 0x0000555555696176 in ?? ()
#13 0x000055555578bc56 in PyEval_EvalCode ()
#14 0x00005555557b8b18 in ?? ()
#15 0x00005555557b196b in ?? ()
#16 0x00005555557b8865 in ?? ()
#17 0x00005555557b7d48 in _PyRun_SimpleFileObject ()
#18 0x00005555557b7a43 in _PyRun_AnyFileObject ()
#19 0x00005555557a8c3e in Py_RunMain ()
#20 0x000055555577ebcd in Py_BytesMain ()
#21 0x00007ffff7c7dd90 in __libc_start_call_main (main=main@entry=0x55555577eb90, argc=argc@entry=14, argv=argv@entry=0x7fffffffe378) at ../sysdeps/nptl/libc_start_call_main.h:58
#22 0x00007ffff7c7de40 in __libc_start_main_impl (main=0x55555577eb90, argc=14, argv=0x7fffffffe378, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe368) at ../csu/libc-start.c:392
#23 0x000055555577eac5 in _start ()
(gdb) client_loop: send disconnect: Broken pipe
For what it's worth, this is how I've opened the database environment:
dbenv = lmdb.open(db_dir, map_size=1099511627776, max_dbs=11, readahead=False)
mark jayson wrote:
Hello,
I've created a program that writes data to an lmdb database using python bindings. I've written about 300gb worth of data but when I was re-reading the database by running my script again, it crashes immediately. Did a backtrace in gdb and this is what I got:
Really have no clue what's going on here. I suspect that the server has rebooted (or was rebooted by someone while I was running the script but I thought lmdb is crash-proof.
It is. Extensively tested so, but storage devices can still lie about whether they successfully wrote data.
Do you have any idea how to best troubleshoot this? The database is worth 300gb+ so it's not easy to share it, not to mention it contains sensitive information.
Try using mdb_dump to see if it will back up the contents without crashing.
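As a rough sketch of the mdb_dump suggestion, the following Python wrapper builds the command line and runs it only if the tool is installed. The helper names `build_mdb_dump_cmd` and `try_dump` and the output file name are hypothetical; the `-a` (all named databases) and `-f` (output file) options are those documented in the mdb_dump man page.

```python
import shutil
import subprocess


def build_mdb_dump_cmd(env_dir, out_file, binary="mdb_dump"):
    """Build an mdb_dump command line for an LMDB environment directory."""
    # -a dumps all named sub-databases; -f writes to a file instead of stdout.
    return [binary, "-a", "-f", str(out_file), str(env_dir)]


def try_dump(env_dir, out_file):
    """Run mdb_dump if it is on PATH.

    Returns the process return code, or None when mdb_dump is not installed.
    A non-zero or negative code (killed by a signal such as SIGABRT) means
    the dump did not complete cleanly.
    """
    if shutil.which("mdb_dump") is None:
        return None
    cmd = build_mdb_dump_cmd(env_dir, out_file)
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # "jul012022-23/" is the directory from the original command line;
    # "backup.dump" is an illustrative output name.
    print(build_mdb_dump_cmd("jul012022-23/", "backup.dump"))
```

Running the dump in a separate process means an assertion failure inside liblmdb ends that process rather than the calling script.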
Also, when asking for help you should specify exactly what version of liblmdb you used.
The backtrace is a bit odd because normally just opening an environment shouldn't require committing any write txns.
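The trace shows env_open_db reaching mdb_txn_commit, which suggests the named database was opened through a write transaction. A minimal sketch of opening the same environment strictly for reading follows. It assumes py-lmdb's `readonly=True` and `create=False` options behave as documented; `open_for_reading` and the database name are hypothetical, and nothing here guarantees that a damaged file will open cleanly.

```python
try:
    import lmdb  # py-lmdb, the binding used in this thread
except ImportError:  # keep the sketch importable without the binding
    lmdb = None


def open_for_reading(db_dir, db_name, map_size=1099511627776, max_dbs=11):
    """Open an existing environment and named database without writing.

    readonly=True asks the binding not to start write transactions, and
    create=False avoids requesting creation of a missing named database.
    """
    if lmdb is None:
        raise RuntimeError("py-lmdb is not installed")
    env = lmdb.open(
        db_dir,
        map_size=map_size,
        max_dbs=max_dbs,
        readonly=True,
        readahead=False,
    )
    db = env.open_db(db_name, create=False)
    return env, db
```

A caller would then read with `env.begin(db=db)`, which starts a read-only transaction by default.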
Thanks Howard. I think I've figured out what in the data is causing the crash. The initial crash (while the program was running and writing the 300 GB database) was different, but I think it is related. My program has a line that loads a value from the database and JSON-decodes it. When I ran the program again against that 300 GB database, it crashed on that line, which tells me exactly which value is throwing off the script.
The value being loaded is a date range spanning one year, split into 2-minute intervals. These date ranges are used as filters when querying a remote system. Sometimes when running the program, the results returned contain odd characters such as hex escape sequences. I thought the remote system was returning this invalid data, but when I checked the data that my program itself generated and stored in the database (those date ranges), lo and behold, they are also mixed with odd characters. As you can see below (last line), some of the date ranges contain sequences such as \xb023961+0\xb000.
"2022-08-03T15:46:00.023992+0000","2022-08-03T13:36:00.023928+0000":"2022-08-03T13:38:00.023928+0000","2022-08-03T15:56:00.023998+0000":"2022-08-03T15:58:00.023998+0000","2022-08-03T14:38:00.023959+0000" :"2022-08-03T14:40:00.023959+0000","2022-08-03T13:50:00.023935+0000":"2022-08-03T13:52:00.023935+0000","2022-08-03T14:18:00.023949+0000":"2022-08-03T14:20:00.023949+0000","2022-08-03T12:50:00.023905+0000 ":"2022-08-03T12:52:00.023905+0000","2022-08-03T14:40:00.023960+0000":"2022-08-03T14:42:00.023960+0000","2022-08-03T13:58:00.023939+0000":"2022-08-03T14:00:00.023939+0000","2022-08-03T12:40:00.023900+000 0":"2022-08-03T12:42:00.023900+0000","2022-08-03T14:42:00.\xb023961+0\xb000":"2022-08-03\xd414:44:02.023961+0000","0022-08-03T14:30:00.023955+0000"\xba"2022-08-03T14:32:00.023955+0000","2022-08-03T13:52:
The lmdb version I'm using is:
Python 3.10.6
lmdb==1.4.1
uname -a
Linux vmmachine 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
I don't know whether this is caused by the SSD storage I'm using, which could be faulty, or by a bug in the Python bindings for LMDB, but it certainly occurs randomly. If I give my program a lengthy date range, I'm pretty sure it will crash again at some point when parsing retrieved data that contains these odd characters.
Do you have an alternate suggestion given this?
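One way to look more closely at the damaged date ranges quoted above is to compare each corrupted string with what it was presumably meant to be, byte by byte. The sketch below does that for the two damaged timestamps in the sample. It assumes the `\xNN` sequences stand for single raw bytes, and the "expected" strings are inferred from the neighbouring entries, not taken from the database.

```python
def byte_diffs(expected: bytes, observed: bytes):
    """Return (index, expected_byte, observed_byte, xor) for each mismatch."""
    if len(expected) != len(observed):
        raise ValueError("inputs must have equal length")
    return [
        (i, e, o, e ^ o)
        for i, (e, o) in enumerate(zip(expected, observed))
        if e != o
    ]


def is_single_bit(mask: int) -> bool:
    """True when exactly one bit is set in mask."""
    return mask != 0 and mask & (mask - 1) == 0


# Observed values are copied from the sample; expected values are assumptions.
observed_key = b"2022-08-03T14:42:00.\xb023961+0\xb000"
expected_key = b"2022-08-03T14:42:00.023961+0000"
observed_val = b"2022-08-03\xd414:44:02.023961+0000"
expected_val = b"2022-08-03T14:44:00.023961+0000"

if __name__ == "__main__":
    pairs = (("key", expected_key, observed_key),
             ("value", expected_val, observed_val))
    for name, exp, obs in pairs:
        for index, e, o, mask in byte_diffs(exp, obs):
            print(f"{name}[{index}]: {e:#04x} -> {o:#04x} "
                  f"xor={mask:#04x} single_bit={is_single_bit(mask)}")
```

Under those assumptions every mismatch in these two strings differs from the expected byte by exactly one bit. That pattern is worth noting when deciding where to look next, although it does not by itself identify the cause.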
On Tue, Jul 18, 2023 at 11:46 PM Howard Chu hyc@symas.com wrote:
--
  -- Howard Chu
  CTO, Symas Corp. http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP http://www.openldap.org/project/
--On Wednesday, July 19, 2023 2:59 AM +0800 mark jayson mark.alvarez123@gmail.com wrote:
The lmdb version I'm using is:
Python 3.10.6
lmdb==1.4.1
I'm going to guess that's the version of the Python LMDB library. But the deeper question is: which version of the LMDB C library is it linked to?
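A short sketch of how both versions could be reported from Python, assuming the py-lmdb binding: `lmdb.__version__` is the binding's own version and `lmdb.version()` returns the version tuple of the LMDB C library it is linked against. The helper name `describe_lmdb_versions` is hypothetical.

```python
try:
    import lmdb  # py-lmdb
except ImportError:  # allow the sketch to load where the binding is absent
    lmdb = None


def describe_lmdb_versions():
    """Return the binding and C library versions, or None without py-lmdb."""
    if lmdb is None:
        return None
    return {
        # Version of the Python package (what "lmdb==1.4.1" refers to).
        "binding": lmdb.__version__,
        # Version of the liblmdb C library the binding was built against.
        "liblmdb": ".".join(str(part) for part in lmdb.version()),
    }


if __name__ == "__main__":
    print(describe_lmdb_versions())
```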
--Quanah
openldap-technical@openldap.org