It occurs to me that there is the potential to support an interesting use case with LMDB when the database resides on remote shared storage. In the context of slapd, you could run multiple read-only slapds concurrent with a single read-write slapd on a single database.
The current liblmdb would need a couple small modifications to make this safe - an option to use fcntl(LOCK) when obtaining a reader slot, and an msync() when writing to a reader slot, to force reader lock table changes back to the server before progressing on a read txn.
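Roughly, the two changes might look like this (untested sketch; the slot layout and names are illustrative, not actual liblmdb internals):

    #include <sys/types.h>
    #include <fcntl.h>
    #include <sys/mman.h>

    /* Take an fcntl() byte-range lock covering this reader slot, so
     * other hosts sharing the lock file over NFS see it as claimed. */
    static int lock_reader_slot(int lockfd, off_t slot_off, size_t slot_len)
    {
        struct flock fl = {
            .l_type   = F_WRLCK,
            .l_whence = SEEK_SET,
            .l_start  = slot_off,
            .l_len    = (off_t)slot_len,
        };
        return fcntl(lockfd, F_SETLKW, &fl);
    }

    /* After writing the txn ID into the slot, msync() the containing
     * page-aligned region so the update reaches the file server before
     * the read txn proceeds. */
    static int publish_reader_slot(void *slot_page, size_t pagesize)
    {
        return msync(slot_page, pagesize, MS_SYNC);
    }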
With an appropriate sharding director (like the feature recently added to back-meta) you could arrange so that each slapd instance serves reads for a distinct portion of the overall database. Then each host's memory would be caching a distinct set of data, maximizing cache effectiveness. The DB size could then grow arbitrarily large, and you simply add more machines/RAM/slapds as needed to keep serving from cache.
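For instance, the director could be configured along these lines (slapd.conf sketch; the hostnames and subtrees are made up, and the exact back-meta sharding directives may differ from the feature I'm referring to):

    # front-end director: route each subtree to the slapd that caches it
    database  meta
    suffix    "dc=example,dc=com"
    uri       "ldap://reader1.example.com/ou=people,dc=example,dc=com"
    uri       "ldap://reader2.example.com/ou=groups,dc=example,dc=com"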
Howard Chu writes:
> It occurs to me that there is the potential to support an interesting use case with LMDB when the database resides on remote shared storage. In the context of slapd, you could run multiple read-only slapds concurrent with a single read-write slapd on a single database.
Not quite... only slap tools open an MDB (or BDB) environment in read-only mode, as far as I can tell. slapd always opens read/write. The "readonly" slapd.conf option only restricts LDAP operations.
Also there's the issue of agreeing who gets to create (and maybe reset?) a lockfile. IIRC that's where people use mkdir for atomic NFS behavior, unless modern NFS fixes that. Though maybe it's enough to omit O_CREAT for the lockfile in the read-only slapds, if that gets supported.
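For reference, the mkdir trick is the classic one, since mkdir() is atomic even on old NFS where O_CREAT|O_EXCL was not reliable. A minimal sketch (the lock path is illustrative):

    #include <sys/stat.h>

    /* Returns 0 if we created the lock directory (lock acquired);
     * -1 with errno == EEXIST if another host already holds it. */
    static int acquire_nfs_lock(const char *lockdir)
    {
        return mkdir(lockdir, 0700);
    }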
> The current liblmdb would need a couple small modifications to make this safe - an option to use fcntl(LOCK) when obtaining a reader slot, and an msync() when writing to a reader slot, to force reader lock table changes back to the server before progressing on a read txn.
And maybe another sync call for the writer to pick the change up before progressing? OTOH readers may not need to await the msync() if an older reader in the same process is still live.
Hallvard Breien Furuseth wrote:
> Howard Chu writes:
>> It occurs to me that there is the potential to support an interesting use case with LMDB when the database resides on remote shared storage. In the context of slapd, you could run multiple read-only slapds concurrent with a single read-write slapd on a single database.
> Not quite... only slap tools open an MDB (or BDB) environment in read-only mode, as far as I can tell. slapd always opens read/write. The "readonly" slapd.conf option only restricts LDAP operations.
We can certainly change this for back-mdb if desired. Add a new config keyword for this purpose, etc.
> Also there's the issue of agreeing who gets to create (and maybe reset?) a lockfile. IIRC that's where people use mkdir for atomic NFS behavior, unless modern NFS fixes that. Though maybe it's enough to omit O_CREAT for the lockfile in the read-only slapds, if that gets supported.
I would expect the single writing slapd to do all environment initialization. A reading slapd would require the environment to already exist.
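I.e., something like this (untested sketch using the standard liblmdb API; the path is illustrative):

    #include <lmdb.h>

    /* A reading slapd opens the environment read-only and simply fails
     * if the writer has not created it yet. */
    static int open_reader_env(MDB_env **envp, const char *path)
    {
        int rc = mdb_env_create(envp);
        if (rc)
            return rc;
        /* MDB_RDONLY: never create or grow the data file; a missing
         * environment surfaces as an error from mdb_env_open(). */
        rc = mdb_env_open(*envp, path, MDB_RDONLY, 0644);
        if (rc) {
            mdb_env_close(*envp);
            *envp = NULL;
        }
        return rc;
    }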
>> The current liblmdb would need a couple small modifications to make this safe - an option to use fcntl(LOCK) when obtaining a reader slot, and an msync() when writing to a reader slot, to force reader lock table changes back to the server before progressing on a read txn.
> And maybe another sync call for the writer to pick the change up before progressing? OTOH readers may not need to await the msync() if an older reader in the same process is still live.
Yeah I don't think that would be necessary.
--On Tuesday, July 09, 2013 12:08 PM -0700 Howard Chu hyc@symas.com wrote:
> Hallvard Breien Furuseth wrote:
>> Howard Chu writes:
>>> It occurs to me that there is the potential to support an interesting use case with LMDB when the database resides on remote shared storage. In the context of slapd, you could run multiple read-only slapds concurrent with a single read-write slapd on a single database.
>> Not quite... only slap tools open an MDB (or BDB) environment in read-only mode, as far as I can tell. slapd always opens read/write. The "readonly" slapd.conf option only restricts LDAP operations.
> We can certainly change this for back-mdb if desired. Add a new config keyword for this purpose, etc.
>> Also there's the issue of agreeing who gets to create (and maybe reset?) a lockfile. IIRC that's where people use mkdir for atomic NFS behavior, unless modern NFS fixes that. Though maybe it's enough to omit O_CREAT for the lockfile in the read-only slapds, if that gets supported.
> I would expect the single writing slapd to do all environment initialization. A reading slapd would require the environment to already exist.
So the downside would be a single point of failure for writes? I.e., if the system with the slapd configured for doing writes went down due to hardware or power issues, you'd need to configure one of the other slapds to accept writes, and then update all the clients to use that server.
--Quanah
Quanah Gibson-Mount wrote:
> --On Tuesday, July 09, 2013 12:08 PM -0700 Howard Chu hyc@symas.com wrote:
>> Hallvard Breien Furuseth wrote:
>>> Howard Chu writes:
>>>> It occurs to me that there is the potential to support an interesting use case with LMDB when the database resides on remote shared storage. In the context of slapd, you could run multiple read-only slapds concurrent with a single read-write slapd on a single database.
>>> Not quite... only slap tools open an MDB (or BDB) environment in read-only mode, as far as I can tell. slapd always opens read/write. The "readonly" slapd.conf option only restricts LDAP operations.
>> We can certainly change this for back-mdb if desired. Add a new config keyword for this purpose, etc.
>>> Also there's the issue of agreeing who gets to create (and maybe reset?) a lockfile. IIRC that's where people use mkdir for atomic NFS behavior, unless modern NFS fixes that. Though maybe it's enough to omit O_CREAT for the lockfile in the read-only slapds, if that gets supported.
>> I would expect the single writing slapd to do all environment initialization. A reading slapd would require the environment to already exist.
> So the downside would be a single point of failure for writes? I.e., if the system with the slapd configured for doing writes went down due to hardware or power issues, you'd need to configure one of the other slapds to accept writes, and then update all the clients to use that server.
Yes. We could do this fairly transparently using something like the chaining overlay. Have it identically configured on all servers, with a prioritized list of write masters. (Could do this to make mirrormode easier to set up too.) If the current node is the write master, allow write ops thru, otherwise chain them to the current write master. Then clients can send ops to any server they want.
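A configuration sketch of the idea (hypothetical: slapo-chain and chain-uri exist today, but the prioritized write-master fallback described here would be a new behavior; hostnames are made up):

    # identical on every server
    overlay    chain
    chain-uri  "ldap://master1.example.com"    # preferred write master
    chain-uri  "ldap://master2.example.com"    # fallback if master1 is down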
--On Tuesday, July 09, 2013 1:25 PM -0700 Howard Chu hyc@symas.com wrote:
>> So the downside would be a single point of failure for writes? I.e., if the system with the slapd configured for doing writes went down due to hardware or power issues, you'd need to configure one of the other slapds to accept writes, and then update all the clients to use that server.
> Yes. We could do this fairly transparently using something like the chaining overlay. Have it identically configured on all servers, with a prioritized list of write masters. (Could do this to make mirrormode easier to set up too.) If the current node is the write master, allow write ops thru, otherwise chain them to the current write master. Then clients can send ops to any server they want.
Sounds cool to me. :)
--Quanah
Quanah Gibson-Mount wrote:
> --On Tuesday, July 09, 2013 1:25 PM -0700 Howard Chu hyc@symas.com wrote:
>>> So the downside would be a single point of failure for writes? I.e., if the system with the slapd configured for doing writes went down due to hardware or power issues, you'd need to configure one of the other slapds to accept writes, and then update all the clients to use that server.
>> Yes. We could do this fairly transparently using something like the chaining overlay. Have it identically configured on all servers, with a prioritized list of write masters. (Could do this to make mirrormode easier to set up too.) If the current node is the write master, allow write ops thru, otherwise chain them to the current write master. Then clients can send ops to any server they want.
> Sounds cool to me. :)
We should probably overhaul the connection manager and threadpool before this; otherwise the chaining overhead will be too high.
--On Tuesday, July 09, 2013 1:49 PM -0700 Howard Chu hyc@symas.com wrote:
> Quanah Gibson-Mount wrote:
>> --On Tuesday, July 09, 2013 1:25 PM -0700 Howard Chu hyc@symas.com wrote:
>>>> So the downside would be a single point of failure for writes? I.e., if the system with the slapd configured for doing writes went down due to hardware or power issues, you'd need to configure one of the other slapds to accept writes, and then update all the clients to use that server.
>>> Yes. We could do this fairly transparently using something like the chaining overlay. Have it identically configured on all servers, with a prioritized list of write masters. (Could do this to make mirrormode easier to set up too.) If the current node is the write master, allow write ops thru, otherwise chain them to the current write master. Then clients can send ops to any server they want.
>> Sounds cool to me. :)
> We should probably overhaul the connection manager and threadpool before this; otherwise the chaining overhead will be too high.
Sounds like we have a 2.5 roadmap then.
--Quanah
Howard Chu writes:
> It occurs to me that there is the potential to support an interesting use case with LMDB when the database resides on remote shared storage. In the context of slapd, you could run multiple read-only slapds concurrent with a single read-write slapd on a single database.
I'll dig up the stale-reader detection work. Wouldn't want a crashed reader machine to cause unconstrained DB growth.
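For the local case, liblmdb's mdb_reader_check() already reaps slots whose owning process died; a writer could call it periodically, as sketched below. Its PID-based liveness test can't see readers on other machines, though, which is where the extra detection work comes in.

    #include <lmdb.h>
    #include <stdio.h>

    /* Run periodically from the writer: frees reader-table slots whose
     * owning local process is gone, so old snapshot pages can be
     * recycled and the DB stops growing without bound. */
    static void reap_stale_readers(MDB_env *env)
    {
        int dead = 0;
        if (mdb_reader_check(env, &dead) == 0 && dead > 0)
            fprintf(stderr, "cleared %d stale reader slot(s)\n", dead);
    }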