Thanks Howard. I think I've figured out what in the data is causing the crash. The initial crash (which happened while the program was running and writing the 300 GB database) was different, but I think it is related.
In my program there is a line that loads a value from the database and JSON-decodes it. When I ran the program again against that 300 GB database, it crashed on that line, which pointed me to the exact value that's throwing off the script.
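To find every bad value rather than stopping at the first crash, the decode step can be wrapped so it reports failures instead of raising. The sketch below is a hypothetical helper, not the code from my program. It assumes each value is a JSON document stored as bytes. `json.loads` on bytes first decodes them as UTF-8, so stray bytes such as `\xb0` raise `UnicodeDecodeError`, and malformed JSON raises `json.JSONDecodeError`. Both are subclasses of `ValueError`.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def find_undecodable(items: Iterable[Tuple[bytes, bytes]]) -> Iterator[Tuple[bytes, str]]:
    """Yield (key, error description) for each value that fails to JSON-decode.

    `items` is any iterable of (key, value) byte pairs, e.g. a py-lmdb cursor.
    """
    for key, value in items:
        try:
            json.loads(value)
        except ValueError as exc:
            # Catches both UnicodeDecodeError (invalid UTF-8 bytes) and
            # json.JSONDecodeError (syntactically broken JSON).
            yield key, f"{type(exc).__name__}: {exc}"


def scan_lmdb(path: str) -> List[Tuple[bytes, str]]:
    """Scan a whole LMDB database read-only; the path is a placeholder."""
    import lmdb  # imported here so the helper above works without lmdb installed

    env = lmdb.open(path, readonly=True, lock=False)
    try:
        with env.begin() as txn:
            # Iterating a py-lmdb Cursor yields (key, value) tuples.
            return list(find_undecodable(txn.cursor()))
    finally:
        env.close()
```

Calling `scan_lmdb("/path/to/db")` would list every corrupted key in one pass. That shows whether the damage is confined to a few records or spread throughout the file.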
The value being loaded is a one-year date range split into 2-minute intervals. These date ranges are used as filters when querying a remote system. Sometimes the results returned by that system contain weird characters, such as hex escape sequences. I assumed the remote system was returning invalid data, but when I checked the data that my program itself generated and stored in the database (those date ranges), lo and behold, they also contain weird characters. You can see below (near the end of the excerpt) that some of the date ranges contain sequences such as \xb023961+0\xb000.
"2022-08-03T15:46:00.023992+0000","2022-08-03T13:36:00.023928+0000":"2022-08-03T13:38:00.023928+0000","2022-08-03T15:56:00.023998+0000":"2022-08-03T15:58:00.023998+0000","2022-08-03T14:38:00.023959+0000":"2022-08-03T14:40:00.023959+0000","2022-08-03T13:50:00.023935+0000":"2022-08-03T13:52:00.023935+0000","2022-08-03T14:18:00.023949+0000":"2022-08-03T14:20:00.023949+0000","2022-08-03T12:50:00.023905+0000":"2022-08-03T12:52:00.023905+0000","2022-08-03T14:40:00.023960+0000":"2022-08-03T14:42:00.023960+0000","2022-08-03T13:58:00.023939+0000":"2022-08-03T14:00:00.023939+0000","2022-08-03T12:40:00.023900+0000":"2022-08-03T12:42:00.023900+0000","2022-08-03T14:42:00.\xb023961+0\xb000":"2022-08-03\xd414:44:02.023961+0000","0022-08-03T14:30:00.023955+0000"\xba"2022-08-03T14:32:00.023955+0000","2022-08-03T13:52:
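Besides bytes that break JSON decoding, the excerpt above also shows damage that still decodes, such as a year of `0022` and an end time of `14:44:02`. A semantic check catches those. The sketch below is hypothetical: `build_ranges` only illustrates how such a start-to-end mapping could be produced (my actual generator isn't shown here). `invalid_ranges` flags any pair that fails to parse or is not exactly 2 minutes apart. The format string is an assumption that matches the stored strings, e.g. `2022-08-03T13:36:00.023928+0000`. For scale, a year of 2-minute intervals is 365 × 24 × 30 = 262,800 entries.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Assumed format of the stored timestamps, e.g. "2022-08-03T13:36:00.023928+0000".
FMT = "%Y-%m-%dT%H:%M:%S.%f%z"
STEP = timedelta(minutes=2)


def build_ranges(start: datetime, end: datetime, step: timedelta = STEP) -> Dict[str, str]:
    """Map each interval's start string to its end string (use tz-aware datetimes)."""
    ranges: Dict[str, str] = {}
    cur = start
    while cur < end:
        nxt = cur + step
        ranges[cur.strftime(FMT)] = nxt.strftime(FMT)
        cur = nxt
    return ranges


def invalid_ranges(ranges: Dict[str, str], step: timedelta = STEP) -> List[str]:
    """Return the start keys whose pair fails to parse or is not exactly `step` apart."""
    bad: List[str] = []
    for start_s, end_s in ranges.items():
        try:
            start = datetime.strptime(start_s, FMT)
            end = datetime.strptime(end_s, FMT)
        except ValueError:
            # Stray characters (e.g. "\xd4" in place of "T") make parsing fail.
            bad.append(start_s)
            continue
        if end - start != step:
            # Parses but is inconsistent, e.g. year "0022" or a ":02" seconds field.
            bad.append(start_s)
    return bad
```

Running `invalid_ranges` on each decoded value, right after generating it and again after reading it back, would show whether the corruption appears before or after the data reaches LMDB.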
The versions I'm using are:
Python 3.10.6
lmdb==1.4.1
uname -a
Linux vmmachine 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
I don't know whether this is caused by a faulty SSD or by a bug in the Python bindings for LMDB, but it is definitely occurring at random. If I give my program a long date range, I'm fairly sure it will crash again at some point when parsing retrieved data that contains these weird characters.
Given this, do you have an alternative suggestion?