From dimitrij.denissenko@blacksquaremedia.com Tue Jul 30 21:27:40 2013
From: dimitrij.denissenko@blacksquaremedia.com
To: openldap-bugs@openldap.org
Subject: Re: (ITS#7651) LMDB: Uncontrolled database when opened from multiple processes
Date: Tue, 30 Jul 2013 21:27:40 +0000
Message-ID: <201307302127.r6ULReaT075775@boole.openldap.org>

Hi,

All writes occur in the parent process only. The child (normally) only
reopens the environment and performs a few short reads.

But, it's the actual opening of the env in the forked child that is causing
the database growth. I tried to close the env straight after opening it in
the child (without performing any reads), and have encountered the same
issues.

Hope that makes sense,
Dimitrij

On 30 Jul 2013 21:19, "Howard Chu" <hyc(a)symas.com> wrote:

> dimitrij.denissenko@blacksquaremedia.com wrote:
>
>> Full_Name: Dimitrij Denissenko
>> Version:
>> OS: Ubuntu 12.04
>> URL:
>> Submission from: (NULL) (62.30.100.0)
>>
>>
>> Hi,
>>
>> I found an interesting issue with LMDB. I have populated the DB with a
>> bunch of records and it uses ~30M on disk (after sync). Then I added a
>> background process to my app and populated the database again with the
>> same record set. Surprisingly, the resulting size on disk was >70M.
>>
>> The background process is forked periodically to perform some
>> maintenance tasks, here is my (simplified) code:
>>
>> /* Close env before forking */
>> mdb_env_close(env);
>>
>> if ((childpid = fork()) == 0) {
>>     /* Child */
>>     rc = mdb_env_open(env, ".", MDB_NOSYNC, 0644);
>>     ...
>> } else {
>>     /* Parent */
>>     rc = mdb_env_open(env, ".", MDB_NOSYNC, 0644);
>>     ...
>> }
>>
>> I could narrow it down to the mdb_env_open call in the child. If I add
>> exit(0) before the mdb_env_open line, the DB size remains consistently
>> at ~30M. The data size seems to grow proportionally to the number of
>> forks performed during data load. What could be causing the growth?
>> What can I do to prevent it?
>>
>> Thanks in advance
>>
>> PS: I tried it with MDB_FIXMAP and without, same result.
>>
>
> Without seeing more of your code, it's impossible to tell. Are you adding
> the data on both sides of the fork? In the above code snippet, where are
> your mdb_put calls occurring? Are both the parent and child processes
> writing identical data?
>
> --
>   -- Howard Chu
>   CTO, Symas Corp.           http://www.symas.com
>   Director, Highland Sun     http://highlandsun.com/hyc/
>   Chief Architect, OpenLDAP  http://www.openldap.org/project/
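
The report narrows the problem down by comparing the on-disk size with and without the child's mdb_env_open call. A minimal sketch of how that comparison could be automated per fork is given here, using only POSIX calls. The helper names (file_size, growth_across_fork, noop_child) are hypothetical and not part of LMDB or of the reporter's code; the callback is a placeholder for whatever the child does in the real application.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: apparent size of `path` in bytes, or -1 on error. */
static long long file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Placeholder child body that does nothing. In the scenario from the
 * report, this is where the child would open and close the environment. */
static void noop_child(void)
{
}

/*
 * Hypothetical helper: fork, run child_work() in the child, wait for the
 * child to exit, and return how many bytes `path` grew in the meantime.
 * *ok is set to 1 on success and 0 on any failure (in which case the
 * return value is 0 and carries no meaning).
 */
static long long growth_across_fork(const char *path,
                                    void (*child_work)(void), int *ok)
{
    long long before, after;
    pid_t pid;
    int status;

    *ok = 0;
    before = file_size(path);
    if (before < 0)
        return 0;

    fflush(NULL);               /* do not duplicate buffered stdio output */
    pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0) {
        child_work();
        _exit(0);               /* skip the parent's atexit handlers */
    }

    if (waitpid(pid, &status, 0) != pid)
        return 0;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return 0;

    after = file_size(path);
    if (after < 0)
        return 0;

    *ok = 1;
    return after - before;
}
```

With the environment opened in "." as in the snippet above, the file to watch would be ./data.mdb (LMDB's default data file name). Calling growth_across_fork("./data.mdb", noop_child, &ok) gives a baseline; swapping noop_child for a function that performs the child's open and close would show whether that step alone changes the size. Note that this measures the apparent file size (st_size), not the number of blocks actually allocated.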