From timur.kristof@gmail.com Wed Mar 26 08:22:30 2014 From: timur.kristof@gmail.com To: openldap-bugs@openldap.org Subject: Re: (ITS#7825) LMDB: misleading error message Date: Wed, 26 Mar 2014 08:22:29 +0000 Message-ID: <201403260822.s2Q8MTtM079409@boole.openldap.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0076875886949946785==" --===============0076875886949946785== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable --001a11c13346e990f604f57e2991 Content-Type: text/plain; charset=3DUTF-8 Hi, For further information, see the discussion here: https://github.com/Venemo/node-lmdb/pull/20 We genuinely thought that there had been something wrong with our value sizes somewhere, so I wasted several hours debugging in the wrong place. Then, by accident I discovered that it's in fact an issue with V8's GC. I definitely suggest to choose one of the solutions that don't cost any execution time when there is no error. Perhaps mdb_dbi_close could return an error (or crash) when the dbi is still used by a cursor. That'd eliminate the issue but it'd also mean an API/ABI break. Best regards, Timur On Tue, Mar 25, 2014 at 10:37 AM, Hallvard Breien Furuseth < h.b.furuseth(a)usit.uio.no> wrote: > On Sat, 2014-03-22 at 22:49 +0000, timur.kristof(a)gmail.com wrote: > > When you operate on a dbi using a cursor and accidentally close the dbi > before > > committing the transaction of the cursor, mdb_txn_commit will return > > MDB_BAD_VALSIZE. This is quite misleading because the problem doesn't > actually > > have anything to do with what the error message suggests. ("Too big > key/data, > > key is empty, or wrong DUPFIXED size") > > > > I suggest to either add a new kind of error code for this situation or > to extend > > the description of MDB_BAD_VALSIZE. > > Let's see -- here are the options I can see (after an IRC chat): > > 1) Extending the MDB_BAD_VALSIZE description. Simple. However: > > This is documented as a user error, and it's one which lmdb > won't always catch. If you also called mdb_dbi_open() and the > DBI got reused, lmdb silently updates the wrong named database. > > So I'm a bit wary about both of these fixes: Documenting this > case can be misleading if it makes users expect it to be caught. > > Unless people know better - it corresponds to EBADF for using a > closed file descriptor, not caught if open() reused it. > > > 2) Catch this case and instead return a generic "something is > wrong" - MDB_BAD_TXN or MDB_INCOMPATIBLE. > And maybe assert(did not happen) to teach users to avoid this. > > Needs to be done where md_name is used: Commit/drop(named DB), > page_search(stale named DB), cursor_touch(clean named DB). Or > pass F_SUBDATA inwards and tweak the "return MDB_BAD_VALSIZE" > statements to detect this. > > After playing around a bit, I guess this is my favorite. > > > 3) Leave the misleading message alone, which can send the user > off to a wild goose chase (as described on IRC). > > > Or we can add code to also catch close followed by open: > > 4) Put hash(DB name) in MDB_db.md_pad and verify it before > overwriting a named DB. Costs execution time even when there's > no error. Must be done in commit, drop and maybe cursor_touch. > (I've pushed a demo patch for just commit.) > > 5) Add field MDB_dbx.md_dbiseq =3D DBI usage sequence number, > incremented when dbi_open creates (not reuses) a DBI and in > dbi_close. Copy it to a new malloced array txn->mt_dbiseqs[] > in write txns, verify it in at least commit and drop. > This means close()-open(same DB) will also be caught, which > seems a good thing since with (4) it will work unreliably: > Only if open() reuses the same DBI for the same DB. > --001a11c13346e990f604f57e2991 Content-Type: text/html; charset=3DUTF-8 Content-Transfer-Encoding: quoted-printable
On Sat, 2014-03-22 at 22:49 +0000, timur.kristof(a)gmail.com wrote:<= br> > When you operate on a dbi using a cursor and accidentally close the db=3D i before
> committing the transaction of the cursor, mdb_txn_commit will return > MDB_BAD_VALSIZE. This is quite misleading because the problem doesn=3D 9;t actually
> have anything to do with what the error message suggests. ("Too b=3D ig key/data,
> key is empty, or wrong DUPFIXED size")
>
> I suggest to either add a new kind of error code for this situation or=3D to extend
> the description of MDB_BAD_VALSIZE.
Let's see -- here are the options I can see (after an IRC chat):
1) Extending the MDB_BAD_VALSIZE description. Simple. However:
This is documented as a user error, and it's one which lmdb
won't always catch. =3DC2=3DA0If you also called mdb_dbi_open() and the DBI got reused, lmdb silently updates the wrong named database.
So I'm a bit wary about both of these fixes: Documenting this
case can be misleading if it makes users expect it to be caught.
Unless people know better - it corresponds to EBADF for using a
closed file descriptor, not caught if open() reused it.
2) Catch this case and instead return a generic "something is
wrong" - MDB_BAD_TXN or MDB_INCOMPATIBLE.
And maybe assert(did not happen) to teach users to avoid this.
Needs to be done where md_name is used: Commit/drop(named DB),
page_search(stale named DB), cursor_touch(clean named DB). =3DC2=3DA0Or
pass F_SUBDATA inwards and tweak the "return MDB_BAD_VALSIZE"
statements to detect this.
After playing around a bit, I guess this is my favorite.
3) Leave the misleading message alone, which can send the user
off to a wild goose chase (as described on IRC).
Or we can add code to also catch close followed by open:
4) Put hash(DB name) in MDB_db.md_pad and verify it before
overwriting a named DB. =3DC2=3DA0Costs execution time even when there's<= br=3D > no error. =3DC2=3DA0Must be done in commit, drop and maybe cursor_touch.
(I've pushed a demo patch for just commit.)
5) Add field MDB_dbx.md_dbiseq =3D3D DBI usage sequence number,
incremented when dbi_open creates (not reuses) a DBI and in
dbi_close. =3DC2=3DA0Copy it to a new malloced array txn->mt_dbiseqs[]
in write txns, verify it in at least commit and drop.
This means close()-open(same DB) will also be caught, which
seems a good thing since with (4) it will work unreliably:
Only if open() reuses the same DBI for the same DB.