Hi All
I'm involved with a project where we're exporting information from a couple of bespoke systems for analytics purposes; however, there is a degree of sensitivity associated with some of the information.
The developers of the analytics system have requested three APIs for accessing objects, as follows. The first two APIs would be interactive.
1 Input: (user_id, object); output: a boolean indicating whether the user can access the object.
2 Input: (user_id, set of objects); output: the set of objects the user can access.
3 Input: (user_id); output: the set of objects the user can access, or possibly the set the user can't access. (This API can be a couple of orders of magnitude slower than the previous two, as it will only be called once per session.)
We have been considering a URL/URI format for object identification.
Each object will be associated with either a set of users or a set of groups, and we're leaning towards groups. Initially there may only be two groups, sensitive and general; however, over time a finer-grained model has some longer-term business appeal.
Rather than create yet another bit of bespoke infrastructure, I was considering recommending that the API be implemented using LDAP.
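To make the LDAP idea concrete, here is a rough sketch (python-ldap) of how calls 1 and 3 might map onto plain searches, with call 2 being the same group filter applied to the supplied set. The DIT layout under dc=example,dc=com, the accessGroup attribute on objects, groupOfNames groups and the bind credentials are all invented for illustration, not a schema proposal:

import ldap
from ldap.filter import escape_filter_chars

# Assumed layout (made up for this sketch): users under ou=users, groupOfNames
# groups under ou=groups, exported objects under ou=objects carrying an
# accessGroup attribute naming the group(s) allowed to see them.
OBJECTS = "ou=objects,dc=example,dc=com"
GROUPS = "ou=groups,dc=example,dc=com"

conn = ldap.initialize("ldap://ldap.example.com")
conn.simple_bind_s("cn=analytics,dc=example,dc=com", "secret")

def user_groups(user_id):
    # Groups whose member attribute holds the user's DN.
    user_dn = "uid=%s,ou=users,dc=example,dc=com" % escape_filter_chars(user_id)
    res = conn.search_s(GROUPS, ldap.SCOPE_ONELEVEL,
                        "(member=%s)" % user_dn, ["cn"])
    return [e["cn"][0].decode() for _dn, e in res]

def group_filter(user_id):
    # OR the user's groups together against the object's accessGroup attribute.
    groups = user_groups(user_id)
    if not groups:
        return "(accessGroup=__no_group__)"  # matches nothing
    return "(|%s)" % "".join("(accessGroup=%s)" % escape_filter_chars(g)
                             for g in groups)

def can_access(user_id, object_dn):
    # API 1: base-scope search against the object itself, filtered by group.
    try:
        res = conn.search_s(object_dn, ldap.SCOPE_BASE,
                            group_filter(user_id), ["1.1"])
        return bool(res)
    except ldap.NO_SUCH_OBJECT:
        return False

def accessible_objects(user_id):
    # API 3: the expensive once-per-session call - every matching object DN.
    res = conn.search_s(OBJECTS, ldap.SCOPE_SUBTREE,
                        group_filter(user_id), ["1.1"])
    return {dn for dn, _e in res}

Call 2 would either loop can_access over the supplied set or fold the set into one filter; either way the heavy lifting is ordinary LDAP search filters rather than a bespoke service.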
The kicker is that there could be up to two or three hundred million objects, split across 4000 or so users broken up into 500 groups. Most of these users currently exist in an Oracle backend for a bespoke app, and most of the objects will be exported as rows from the database, so essentially a row is an object.
The shop is a mixed Windows/Unix/Linux shop where the directory is currently AD. The internal developers are primarily MS people, and one has already suggested putting the objects into AD; however, I've said that this would probably hose operational systems and wouldn't be such a bright idea.
ADAM will be the next suggestion from the MS side, and if that occurs I will be suggesting that OpenLDAP be thrown into the mix for a bakeoff, on the grounds that the best-performing system should provide the service. If this comes to pass I will also be making sure that load times are in scope.
Based upon the information above:
1 Is this completely insane?
2 Any thoughts or suggestions as to appropriate schemas for implementing this, or database backends (i.e. BDB, HDB, a malloc replacement, etc.)? LDIF scripts for loading objects would be welcome :)
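Since LDIF for loading came up: below is a trivial sketch of generating LDIF from exported rows. The objectClass analyticsObject and the attributes objId, objectUri and accessGroup are entirely made up; a real deployment would need a small custom schema (or reuse of something like labeledURIObject) to match.

# Minimal sketch: turn exported (uri, group) rows into an LDIF file suitable
# for offline bulk loading. All attribute/objectClass names are placeholders.
rows = [
    ("https://bespoke.example.com/obj/1", "general"),
    ("https://bespoke.example.com/obj/2", "sensitive"),
]

with open("objects.ldif", "w") as out:
    for i, (uri, group) in enumerate(rows, start=1):
        out.write("dn: objId=%d,ou=objects,dc=example,dc=com\n" % i)
        out.write("objectClass: analyticsObject\n")
        out.write("objId: %d\n" % i)
        out.write("objectUri: %s\n" % uri)
        out.write("accessGroup: %s\n" % group)
        out.write("\n")

At two or three hundred million entries you would load the resulting file offline with slapadd -q against a stopped server rather than over the wire with ldapadd, and that load time is exactly the number worth putting in scope for the bakeoff.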
On the bright side, I do have a 16-core Sun 4600 with 32 GB of RAM at home for doing some initial feasibility work. In production I would expect that the limits on memory would be driven by response times and the amount of memory that AMD/Intel hardware can support. I would expect that commercial support would be needed as well; however, everything is too embryonic at present to push that barrow.
My background is in the Unix/Linux open-source area; however, I haven't really played enough with OpenLDAP to be confident of any concepts that I have the opportunity to push.
Cheers Ian
On Sat, 26 Mar 2011 17:29:25 +1100, Ian Willis <Ian@checksum.net.au> wrote:
> Based upon the information above:
> 1 Is this completely insane?
Whether this is insane, you have to decide for yourself. There are only a few LDAP directories that can manage such an amount of entries; OpenLDAP can handle this, although the maximum number of entries I have ever managed was in the range of 100 million. You might consider splitting the tree into two or three partitions. With regard to the so-called APIs, it would probably be better to define proper search filters and access control. Depending on the number of connections, the bottleneck would probably be the network.
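To illustrate the access-control side of that, a slapd.conf-style ACL sketch is below; the suffix, OUs and the accessGroup attribute are the same invented names as in the earlier sketch, so treat it as a shape rather than working configuration:

# Assumed layout: objects under ou=objects carry an accessGroup attribute,
# groups are groupOfNames entries under ou=groups.
access to dn.subtree="ou=objects,dc=example,dc=com" filter=(accessGroup=sensitive)
        by group.exact="cn=sensitive,ou=groups,dc=example,dc=com" read
        by * none
access to dn.subtree="ou=objects,dc=example,dc=com"
        by users read
        by * none

With ACLs along those lines the three "APIs" largely collapse into the analytics front end binding as (or proxying for) the user and searching; the server only ever returns entries the bound identity is allowed to read.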
> 2 Any thoughts or suggestions as to appropriate schemas for implementing this, or database backends (i.e. BDB, HDB, a malloc replacement, etc.)? LDIF scripts for loading objects would be welcome :)
At this early stage it is almost impossible to discuss schema and such.
> On the bright side, I do have a 16-core Sun 4600 with 32 GB of RAM at home for doing some initial feasibility work. In production I would expect that the limits on memory would be driven by response times and the amount of memory that AMD/Intel hardware can support. I would expect that commercial support would be needed as well; however, everything is too embryonic at present to push that barrow.
If you know the average size of an entry you can calculate the required disk and RAM space quite easily.
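For example (entry sizes assumed purely for illustration): if an exported row serialises to roughly 1 KB as an LDAP entry, 300 million entries is on the order of 300 GB of raw entry data before indexes and database overhead, so the full DIT will not come close to fitting in 32 GB of RAM and cache sizing (and possibly the partitioning mentioned above) will dominate the tuning; at around 100 bytes per entry it is more like 30 GB and largely cacheable.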
> My background is in the Unix/Linux open-source area; however, I haven't really played enough with OpenLDAP to be confident of any concepts that I have the opportunity to push.
Well, then it is time to get acquainted with OpenLDAP :-)
-Dieter
openldap-technical@openldap.org