I was looking at adding support for ordered indexing for Integer attributes. This would be an incompatible format change for index databases. In fact I'd need to change the Presence index key as well, so it would affect all index databases, not just those for Integer attributes.
Currently the Presence index uses a hardcoded 4 byte key of 0x00000001. I want to change it to a 2 byte key of 0x0000 instead, to prevent it from colliding with the Integer key space.
So far, 2.4 and 2.3 have totally identical database formats. Is it worthwhile to break this compatibility to gain this feature, or better to preserve compatibility and ignore this feature for now? Any thoughts on going ahead with it here in RE24?
--On Tuesday, November 20, 2007 6:57 AM -0800 Howard Chu hyc@symas.com wrote:
I was looking at adding support for ordered indexing for Integer attributes. This would be an incompatible format change for index databases. In fact I'd need to change the Presence index key as well, so it would affect all index databases, not just those for Integer attributes.
Currently the Presence index uses a hardcoded 4 byte key of 0x00000001. I want to change it to a 2 byte key of 0x0000 instead, to prevent it from colliding with the Integer key space.
So far, 2.4 and 2.3 have totally identical database formats. Is it worthwhile to break this compatibility to gain this feature, or better to preserve compatibility and ignore this feature for now? Any thoughts on going ahead with it here in RE24?
I think every other release has had incompatible changes, and if we are contemplating one for 2.4, we need to do it sooner rather than later. Since 2.3.39 was just marked stable, I'm guessing not too many people have migrated to 2.4.6 so far. We'd definitely want to highlight this change when pushing 2.4.7. As for the feature, it sounds quite useful; I've wanted to do ordered integer searches in the past...
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
Agreed that this would be a useful addition (I've run into/designed around this a couple times). To answer "is it a big deal that we lose binary compat with RE23," I don't consider that a strong issue at all. We always tell people to slapcat/slapadd as part of a release change, and this would be covered in that recommendation. My bigger concern would be what the 2.4.6 -> 2.4.7 (or whenever this hits) procedure would look like. i.e., are you proposing dropping read support for the RE23-style databases, so that a 2.4.6 -> 2.4.7 would require a 2.4.6 slapcat? Users might not be used to that.
Aaron Richton wrote:
Agreed that this would be a useful addition (I've run into/designed around this a couple times). To answer "is it a big deal that we lose binary compat with RE23," I don't consider that a strong issue at all. We always tell people to slapcat/slapadd as part of a release change, and this would be covered in that recommendation. My bigger concern would be what the 2.4.6 -> 2.4.7 (or whenever this hits) procedure would look like. i.e., are you proposing dropping read support for the RE23-style databases, so that a 2.4.6 -> 2.4.7 would require a 2.4.6 slapcat? Users might not be used to that.
Yes, dropping support for RE23 format index files.
In fact you would only need to run slapindex to regenerate them, and it would only be needed if you have presence indexing enabled on general attributes, or equality indexing on integer attributes. Since presence indexing is typically not too useful to begin with, I think this is a small enough issue that we can cover it in the 2.4.7 release announcement.
Quanah Gibson-Mount wrote:
I think every other release has had incompatible changes, and if we are contemplating one for 2.4, we need to do it sooner rather than later. Since 2.3.39 was just marked stable, I'm guessing not too many people have migrated to 2.4.6 so far. We'd definitely want to highlight this change when pushing 2.4.7. As for the feature, it sounds quite useful; I've wanted to do ordered integer searches in the past...
I agree this feature could be very useful and at this time 2.4 should be early enough in its life to undergo this change. Otherwise, we'd have to move to 2.5 without dramatic improvements but an incompatible database format change.
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team

SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
---------------------------------------
Office: +39 02 23998309
Mobile: +39 333 4963172
Email: pierangelo.masarati@sys-net.it
---------------------------------------
Quanah Gibson-Mount writes:
I think every other release has had incompatible changes, and if we are contemplating one for 2.4, we need to do it sooner rather than later.
Yes. And I'd like to see this.
If incompatibility worries people even as early as this in 2.4's life, we could leave the old format as the default in 2.4, and provide a slapd.conf-option to enable ordered indexing.
Howard Chu writes:
Currently the Presence index uses a hardcoded 4 byte key of 0x00000001. I want to change it to a 2 byte key of 0x0000 instead, to prevent it from colliding with the Integer key space.
What representation are you planning? My thoughts about it tend towards DER-inspired formats, something like this:
- For values small enough to keep the exact value in the index: <0x80 + sign*(#bytes needed)> <those bytes of 2's complement value>
- For larger absolute values: same, but for the sake of speed read the value as hex, so we need only read the first digits (skipping initial zeros).
- For huge values, something like: (value < 0 ? 0 : 255), <0x8000... + sign*#length octets> <sign*length...> <initial part of two's complement value-as-hex>
Too complicated?
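For what it's worth, the small-value case above can be sketched like this (Python purely for illustration; slapd itself is C, and the exact header convention here is my own reading of the sketch, not committed code):

```python
def tc_len(v: int) -> int:
    """Minimal number of octets holding v in two's complement (0 for v == 0)."""
    n = 0 if v == 0 else 1
    while n and not -(1 << (8 * n - 1)) <= v < (1 << (8 * n - 1)):
        n += 1
    return n

def encode_small(v: int) -> bytes:
    """<0x80 + sign * (#bytes needed)> <those bytes of 2's complement value>."""
    n = tc_len(v)
    sign = (v > 0) - (v < 0)
    return bytes([0x80 + sign * n]) + v.to_bytes(n, "big", signed=True)

# Unsigned byte-wise comparison (what the B-tree does) matches numeric order:
vals = [-300, -256, -129, -128, -1, 0, 1, 127, 128, 300]
keys = [encode_small(v) for v in vals]
assert keys == sorted(keys)
```

The header byte makes longer (bigger-magnitude) positives sort after shorter ones and longer negatives sort before shorter ones, so the variable-length keys still compare correctly as unsigned bytes.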
So far, 2.4 and 2.3 have totally identical database formats. Is it worthwhile to break this compatibility to gain this feature, or better to preserve compatibility and ignore this feature for now? Any thoughts on going ahead with it here in RE24?
I think it's worth it. But it would be possible to support both too (at least until 2.5). Could make the max size of a key configurable, and only use this format if that option is set.
(And if there is going to be a slapd.conf-option, that could just as well be an option with the max index-key size.)
Hallvard B Furuseth wrote:
Quanah Gibson-Mount writes:
I think every other release has had incompatible changes, and if we are contemplating one for 2.4, we need to do it sooner rather than later.
Yes. And I'd like to see this.
If incompatibility worries people even as early as this in 2.4's life, we could leave the old format as the default in 2.4, and provide a slapd.conf-option to enable ordered indexing.
OK, I guess we can add a config option to switch formats.
Howard Chu writes:
Currently the Presence index uses a hardcoded 4 byte key of 0x00000001. I want to change it to a 2 byte key of 0x0000 instead, to prevent it from colliding with the Integer key space.
What representation are you planning? My thoughts about it tend towards DER-inspired formats, something like this:
I was thinking of a fixed word size. Anything with variable length will require a custom B-tree sort function, and we really don't want to go there.
8-byte keys representing 64 bit integers would do. (In fact we could leave the presence index unchanged in that case.) We'd have to use a negated 2's complement representation, since the B-tree sorts keys as unsigned characters.
Howard Chu writes:
I was thinking of a fixed word size. Anything with variable length will require a custom B-tree sort function, and we really don't want to go there. 8-byte keys representing 64 bit integers would do. (In fact we could leave the presence index unchanged in that case.) We'd have to use a negated 2's complement representation, since the B-tree sorts keys as unsigned characters.
value + 1<<63, not negated. (Otherwise we get -2 > -1 or something.) And a min and max value for huge integers.
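The bias trick can be illustrated as follows (Python for brevity; the clamping of out-of-range values to min/max sentinel keys is my reading of the "min and max value for huge integers" suggestion, not actual slapd code):

```python
BIAS = 1 << 63  # adding this flips the sign bit, so unsigned byte order == numeric order

def encode_fixed64(v: int) -> bytes:
    """8-byte big-endian key: value + 2**63, clamped for huge integers."""
    lo, hi = -(1 << 63), (1 << 63) - 1
    v = max(lo, min(hi, v))  # out-of-range values map to min/max sentinel keys
    return (v + BIAS).to_bytes(8, "big")

vals = [-(1 << 70), -2, -1, 0, 1, 1 << 40, 1 << 70]
keys = [encode_fixed64(v) for v in vals]
assert keys == sorted(keys)  # -2 sorts before -1, unlike plain negation
```

With plain negation, -2 would encode to a larger unsigned value than -1 and sort after it; adding the bias keeps the whole range in order.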
Howard Chu wrote:
If incompatibility worries people even as early as this in 2.4's life, we could leave the old format as the default in 2.4, and provide a slapd.conf-option to enable ordered indexing.
OK, I guess we can add a config option to switch formats.
For how long will this database backward compability be guaranteed? Will this config switch affect other changes in database format? Don't open this can of worms...
Since others also agreed that 2.4.7 is still early enough, I'd prefer to avoid another config option. But 2.4.7 should be released within the next two weeks to introduce this change early enough.
IIRC, in some former days the database format changed frequently even within a minor release series (I don't remember which one). IMO migration from 2.3.x to 2.4.7+ is a significant update with lots of changes; slapcat/slapadd seems a reasonable migration action anyway.
Ciao, Michael.
Michael Ströder wrote:
Howard Chu wrote:
If incompatibility worries people even as early as this in 2.4's life, we could leave the old format as the default in 2.4, and provide a slapd.conf-option to enable ordered indexing.
OK, I guess we can add a config option to switch formats.
For how long will this database backward compability be guaranteed? Will this config switch affect other changes in database format? Don't open this can of worms...
Nothing else is affected, just the Equality index for Integer attributes.
Since others also agreed that 2.4.7 is still early enough, I'd prefer to avoid another config option. But 2.4.7 should be released within the next two weeks to introduce this change early enough.
We can make it the default and remove the config option. I'm fine either way. Currently the default is still the old hash format, but it's easy enough to remove the option.
IIRC, in some former days the database format changed frequently even within a minor release series (I don't remember which one). IMO migration from 2.3.x to 2.4.7+ is a significant update with lots of changes; slapcat/slapadd seems a reasonable migration action anyway.
Well, from 2.3 to 2.4 a change wouldn't be very surprising. This is 2.4.6 to 2.4.7, which may be more surprising.
On Wednesday, 21 November 2007, Howard Chu wrote:
Well, from 2.3 to 2.4 a change wouldn't be very surprising. This is 2.4.6 to 2.4.7, which may be more surprising.
I don't think we have too many 2.4.6 users currently, and if we communicate the change clearly enough we can afford such a change. Like Michael, I'd prefer to avoid the additional config setting and make the new index format the default. My reasoning is that the option only really makes sense when using old 2.3 (and 2.4.6) databases; it is of very little value when creating new databases, I guess. (And you would have to explicitly set it for a new database to be able to use ordered indexes.)
Ralf Haferkamp wrote:
We can make it the default and remove the config option. I'm fine either way. Currently the default is still the old hash format, but it's easy enough to remove the option.
I partially take this back - the config option also lets you set the size of the index keys to use, and that flexibility is probably still useful.
Well, from 2.3 to 2.4 a change wouldn't be very surprising. This is 2.4.6 to 2.4.7, which may be more surprising.
I don't think we have too many 2.4.6 users currently, and if we communicate the change clearly enough we can afford such a change. Like Michael, I'd prefer to avoid the additional config setting and make the new index format the default. My reasoning is that the option only really makes sense when using old 2.3 (and 2.4.6) databases; it is of very little value when creating new databases, I guess. (And you would have to explicitly set it for a new database to be able to use ordered indexes.)
It may be surprising in that it's not in keeping with RE22/23 practice, but there are a lot of benefits here. If this is to make it into RE24 at all, I'd want it to get into the train as early as possible (2.4.7?) and agree with it being a default. Of course there will be no bugs -- but if there are, I'd like them seen at more sites as they continue their 2.4 installations. As Ralf pointed out, 2.4 isn't often in production, so if this is introduced as default in 2.4.7 we'll probably get a lot of free testing from sites evaluating for RE23->RE24 transition.
Yes, this might be a bit painful for the early adopters. Is there any sort of magic number or similar where slapd could bail out "sorry, please slapindex" if given a RE23 format database? Then at least the FAQ-O-Matic could point out "use 2.4.6 slapcat then upgrade again" or some other concrete procedure.
Aaron Richton wrote:
It may be surprising in that it's not in keeping with RE22/23 practice, but there are a lot of benefits here. If this is to make it into RE24 at all, I'd want it to get into the train as early as possible (2.4.7?) and agree with it being a default. Of course there will be no bugs -- but if there are, I'd like them seen at more sites as they continue their 2.4 installations. As Ralf pointed out, 2.4 isn't often in production, so if this is introduced as default in 2.4.7 we'll probably get a lot of free testing from sites evaluating for RE23->RE24 transition.
OK, I guess it's fine to make it the default. However, because I think we still want the key size to be tunable, I'm going to keep the config keyword.
Yes, this might be a bit painful for the early adopters. Is there any sort of magic number or similar where slapd could bail out "sorry, please slapindex" if given a RE23 format database? Then at least the FAQ-O-Matic could point out "use 2.4.6 slapcat then upgrade again" or some other concrete procedure.
In this case, all you'd need is to run "slapindex -qt" to recreate the relevant indices. (Whether that's actually faster than slapcat/slapadd, I don't really know. Historically it's been pretty slow...)
I guess we can detect an old format index when it gets referenced, but currently we don't reference them until an actual search operation comes along. I.e., it's not something we would normally check at startup time, and it would be pretty awkward at runtime.
On Wednesday, 21 November 2007, Howard Chu wrote:
I guess we can detect an old format index when it gets referenced, but currently we don't reference them until an actual search operation comes along. I.e., it's not something we would normally check at startup time, and it would be pretty awkward at runtime.
On the other hand we could even let the runtime indexer task recreate the indexes on the fly when such an old index is discovered. ;-)
Ralf Haferkamp wrote:
On the other hand we could even let the runtime indexer task recreate the indexes on the fly when such an old index is discovered. ;-)
From an operational perspective I'm always a bit scared of things automagically happening during run-time (or even during startup). Although this sounds appealing at first, it might be a nightmare in large deployments.
Ciao, Michael.
On Wednesday, 21 November 2007, Michael Ströder wrote:
Ralf Haferkamp wrote:
On the other hand we could even let the runtime indexer task recreate the indexes on the fly when such an old index is discovered. ;-)
From an operational perspective I'm always a bit scared of things automagically happening during run-time (or even during startup). Although this sounds appealing at first it might be a nightmare in large deployments.
Yes, I agree. I am not a big fan of too much automagic action either. My suggestion wasn't meant too seriously. It's just interesting what kind of things can be done with the dynamic configuration support.
Ralf Haferkamp wrote:
On the other hand we could even let the runtime indexer task recreate the indexes on the fly when such an old index is discovered. ;-)
Yes, I guess so. Things will be very slow though; queries against that attribute will have to behave as un-indexed until the indexer task finishes. The existing hash keys won't match the new keys so they'll be useless and need to be deleted. (But currently the runtime indexer only adds new data, it doesn't delete old index data.)
Yeah, I agree with Michael, an automagic action here may get really ugly.
Howard Chu wrote:
OK, I guess it's fine to make it the default. However, because I think we still want the key size to be tunable, I'm going to keep the config keyword. [..] In this case, all you'd need is to run "slapindex -qt" to recreate the relevant indices.
Any implications with dynamic reconfiguration of that particular config value in back-config? Automagically trigger reindexing?
Ciao, Michael.
Michael Ströder wrote:
Howard Chu wrote:
OK, I guess it's fine to make it the default. However, because I think we still want the key size to be tunable, I'm going to keep the config keyword. [..] In this case, all you'd need is to run "slapindex -qt" to recreate the relevant indices.
Any implications with dynamic reconfiguration of that particular config value in back-config? Automagically trigger reindexing?
Unfortunately no, not yet. There's a hook for backends to register that they're interested in updates to particular config settings, but we never figured out when it would be needed, so that functionality wasn't fleshed out.
Howard Chu wrote:
Well, from 2.3 to 2.4 a change wouldn't be very surprising. This is 2.4.6 to 2.4.7, which may be more surprising.
I share the opinion of others that 2.4.6 is probably not widely deployed yet. Therefore I believe that if 2.4.7 comes really soon, for most deployers it would be a migration from 2.3.x to 2.4.7+.
Ciao, Michael.
Hallvard B Furuseth wrote:
What representation are you planning? My thoughts about it tend towards DER-inspired formats, something like this:
- For values small enough to keep the exact value in the index: <0x80 + sign*(#bytes needed)> <those bytes of 2's complement value>
- For larger absolute values: same, but for the sake of speed read the value as hex, so we need only read the first digits (skipping initial zeros).
- For huge values, something like: (value < 0 ? 0 : 255), <0x8000... + sign*#length octets> <sign*length...> <initial part of two's complement value-as-hex>
I guess you're really describing something like floating-point representation, except our exponents are always positive. That might be the better way to go.
Howard Chu writes:
I guess you're really describing something like floating-point representation, except our exponents are always positive.
A mixture of floating point and DER, I suppose. I've hardly looked inside a floating-point number for decades. Fixed length like you say, and "exponent" = "length it would have if represented exactly", as with floating point. But with a length-of-length component, and counted in octets, not bits, as in DER. And I don't see much point in doing the base-10 multiplication or division needed for true floating point. Unless floating-point hardware is sufficiently magical nowadays that it's an advantage to make use of it. I don't have a clue myself.
I don't know if anyone uses huge enough integers in LDAP that it's a gain to avoid base-10 conversion, but I don't know that they don't either. Probably a bad idea now that I think of it, in case one later moves to an LDAP/X.500 server with ASN.1 encoding, which will then have to convert between binary and decimal anyway.
Another point with huge integers is that one might store a group of them with small differences, and want an equality index to tell them apart. For that an index of <length, first digits, last digits> could be useful. Then the inequality index and inequality filter for a huge value would need to generate different index keys though. (The filter would need to replace the "last digits" part of huge values with 0x00... or 0xff...) I don't know if OpenLDAP's indexing supports that.
In any case, for similar reasons it might be an idea to keep supporting the old hash(decimal value) index format.
Hallvard B Furuseth wrote:
I don't know if anyone uses huge enough integers in LDAP that it's a gain to avoid base-10 conversion, but I don't know that they don't either. Probably a bad idea now that I think of it, in case one later moves to an LDAP/X.500 server with ASN.1 encoding, which will then have to convert between binary and decimal anyway.
Right, better to stick with binary.
Another point with huge integers is that one might store a group of them with small differences, and want an equality index to tell them apart. For that an index of <length, first digits, last digits> could be useful. Then the inequality index and inequality filter for a huge value would need to generate different index keys though. (The filter would need to replace the "last digits" part of huge values with 0x00... or 0xff...) I don't know if OpenLDAP's indexing supports that.
Schemes like that will always manage to go wrong. No matter what lengths you choose, you'll always have situations where the keys are in the wrong order. E.g., 0x99xx0088: <x,99,88> and 0x99xx0277: <x,99,77>
In any case, for similar reasons it might be an idea to keep supporting the old hash(decimal value) index format.
I tend to wonder if huge integers need to be accounted for here at all. I think it may be sufficient to just use <length, first N bytes> and probably N=4 is good enough.
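A sketch of that <length, first N bytes> idea (Python, illustrative only; the header and sign handling here are my own construction for the example, not the committed slapd code):

```python
N = 4  # number of leading magnitude bytes kept in the key (hypothetical tuning knob)

def prefix_key(v: int) -> bytes:
    """Key = <sign/length header byte> <first N bytes of the magnitude>.
    Equal keys mean only "possibly equal": huge values sharing their first
    N bytes collide, so candidates must still be checked against the real
    attribute values."""
    if v == 0:
        return bytes([0x80]) + bytes(N)
    mag = abs(v)
    length = (mag.bit_length() + 7) // 8              # octets needed for |v|
    head = mag.to_bytes(length, "big")[:N].ljust(N, b"\x00")
    if v > 0:
        return bytes([0x80 + min(length, 0x7F)]) + head
    # Negative: smaller header for longer (more negative) values, and the
    # magnitude bytes complemented so bigger magnitudes sort first.
    return bytes([0x80 - min(length, 0x7F)]) + bytes(255 - b for b in head)
```

Note the trade-off this encodes: ordering is exact, but equality on huge values degrades to a candidate match, which is exactly why one might keep the hashed equality index around as well.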
Howard Chu hyc@symas.com writes:
So far, 2.4 and 2.3 have totally identical database formats. Is it worthwhile to break this compatibility to gain this feature, or better to preserve compatibility and ignore this feature for now? Any thoughts on going ahead with it here in RE24?
I say go for it. It's early in the 2.4 lifecycle, not many people will have upgraded yet, and I expect a lot of people, when upgrading to 2.4, to take advantage of the opportunity to go to a newer version of BerkeleyDB and therefore have to dump and restore their database anyway. (Distributions in particular will want to try to minimize the number of BerkeleyDB libraries they keep around, and if 4.6 works properly, there's a strong motivation to drop 4.2.)
Russ Allbery wrote:
I say go for it. It's early in the 2.4 lifecycle, not many people will have upgraded yet, and I expect a lot of people, when upgrading to 2.4, to take advantage of the opportunity to go to a newer version of BerkeleyDB and therefore have to dump and restore their database anyway. (Distributions in particular will want to try to minimize the number of BerkeleyDB libraries they keep around, and if 4.6 works properly, there's a strong motivation to drop 4.2.)
Good enough. OK, it's now in HEAD. The feature is disabled by default (so backward compatibility is still there by default); you have to configure index_intlen to enable it. The setting specifies how many bytes of an integer to use as the key value. The default is zero, which disables it. (Deleting the setting with ldapmodify also disables it.)
The minimum setting is 4, which will use the first 4 bytes of an integer. The largest value would be 255; presumably no one will ever have integers that long but whatever...
Finally had a look at the implementation of this (in RE24). It is wrong. The following values produce these indexes, ordered as:

   0: 0480000000
  +1: 0480000001
  -1: 04800000ff
+257: 0480000101
-257: 048000feff

To get the correct ordering, use normal (fully sign-extended) two's complement but with the sign bit inverted. (Same as value + max positive value + 1)
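The rule above (two's complement with the sign bit inverted, i.e. value plus a bias of 2**(8*width-1)) can be checked with a small sketch; `int_key` is a hypothetical helper, not the slapd code, and it omits the leading length byte of the real index keys:

```python
def int_key(n, width=4):
    # Two's complement with the sign bit inverted = add a bias of
    # 2**(8*width - 1); byte-wise order then matches numeric order.
    return (n + (1 << (8 * width - 1))).to_bytes(width, 'big')

vals = [-257, -1, 0, 1, 257]
keys = [int_key(v) for v in vals]
assert keys == sorted(keys)              # lexicographic == numeric order
assert int_key(257).hex() == '80000101'  # the value bytes after the 04 length byte
```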
Also overflow gives wrong ordering:

value 0x100000000: index 0480000000
value 0x80000000:  index 0580800000

I'm too tired to figure out what the code does at overflow now, and maybe I misunderstood the discussion earlier, but it should give the right _ordering_ for overflow even if it can't make a very exact index.
I wrote:
To get the correct ordering, use normal (fully sign-extended) two's complement but with the sign bit inverted. (Same as value + max positive value + 1)
Actually, looking at lutil_str2bin() one's complement would be easier:-) And maybe it needs to turn '-0' into '0'? I think that's invalid LDAP Integer syntax so it doesn't matter in that context, but still.
Hallvard B Furuseth wrote:
Also overflow gives wrong ordering:

value 0x100000000: index 0480000000
value 0x80000000:  index 0580800000

I'm too tired to figure out what the code does at overflow now, and maybe I misunderstood the discussion earlier, but it should give the right _ordering_ for overflow even if it can't make a very exact index.
Right. All fixed.
I wrote:
To get the correct ordering, use normal (fully sign-extended) two's complement but with the sign bit inverted. (Same as value + max positive value + 1)
The sign bit was already being handled, but I was zero-extending instead of sign-extending. (doh...)
Actually, looking at lutil_str2bin() one's complement would be easier:-)
It might have been, but the current 2's complement is BER-compatible.
And maybe it needs to turn '-0' into '0'? I think that's invalid LDAP Integer syntax so it doesn't matter in that context, but still.
Right, invalid; integerValidate will reject it so it'll never get here.
Howard Chu wrote:
Hallvard B Furuseth wrote:
Actually, looking at lutil_str2bin() one's complement would be easier:-)
It might have been, but the current 2's complement is BER-compatible.
Which of course was the original point of lutil_str2bin, and why the previous patch broke test021. Sigh. Running all tests now.
Still wrong. Current ordering:
index       value        (hex value)
0400000000  -2147483648  (-0x80000000)
0400000001  -2147483647  (-0x7fffffff)
0440000000  -1073741824  (-0x40000000)
0460000000  -9126805504  (-0x220000000)
0470000000  -268435456   (-0x10000000)
047ffffeff  -257         (-0x101)
047fffffff  -1           (-0x1)
0480000000  0            (0x0)
0480000001  1            (0x1)
0480000101  257          (0x101)
0490000000  268435456    (0x10000000)
04a0000000  9126805504   (0x220000000)
04c0000000  1073741824   (0x40000000)
04ffffffff  2147483647   (0x7fffffff)
05007fffff  -2147483649  (-0x80000001)
057e000000  -8589934592  (-0x200000000)
057effffff  -4294967297  (-0x100000001)
057f000000  -4294967296  (-0x100000000)
0580800000  2147483648   (0x80000000)
0580800000  2147483649   (0x80000001)
0581000000  4294967296   (0x100000000)
0581000000  4294967297   (0x100000001)
0582000000  8589934592   (0x200000000)
Hallvard B Furuseth wrote:
Still wrong. Current ordering:
Thanks, fixed now. Seems we should have a test script for this though.
[Oops, sent this only to Howard at first.]
Howard Chu writes:
Thanks, fixed now. Seems we should have a test script for this though.
A quick one in python so far. And it found more problems...
------------------------------------------------------------------------
#!/usr/bin/python
import os, random, re, sys
slapadd = '../RE24/servers/slapd/slapadd'
conffile = 'test.conf'
ldiffile = 'test.ldif'
dumpfile = 'test.dump'
debug = False
def testkeys():
    randint = random.Random().randint
    yield 0
    for n in (1, 0x101, 0x10000000, 0x40000000, 0x7fffffff,
              0x80000000, 0x80000001, 0x100000000, 0x100000001,
              0x200000000, 0x220000000, 0x1000000000000000000):
        yield n
        yield -n
    for i in xrange(0, 4097, 40):
        v = 1L << randint(i, i + 50)
        v = randint(v/2, v)
        yield -v
        yield v
def printall(all):
    for n, key in all:
        print "%s\t%d\t(%s0x%x)" % (key, n, (n < 0 and '-' or ''), abs(n))
f = file(conffile, "w")
print >>f, """include /ldap/src/openldap/db/schema/core.schema
attributetype ( 1.2.3.4.5.6.7.8.9.10 NAME 'testNumber'
    EQUALITY integerMatch
    ORDERING integerOrderingMatch
    SYNTAX 1.3.6.1.4.1.1466.115.121.1.27 )

database bdb
directory /ldap/src/openldap/db
suffix o=test
rootdn o=test
rootpw secret
index objectClass,testNumber eq

database monitor"""
f.close()
found = []
find_index = re.compile(r'\nHEADER=END\n (\w+)\n \w+\nDATA=END\n').search
for n in testkeys():
    os.system("rm -f __db.00* *.bdb log.0000000001 alock")
    f = file(ldiffile, "w")
    print >>f, """dn: o=test
o: test
objectClass: organization
objectClass: extensibleObject
testNumber: """ + str(n)
    f.close()
    if os.system("%s -f %s -l %s" % (slapadd, conffile, ldiffile)):
        sys.exit(slapadd + ' failed')
    if os.system("db_dump testNumber.bdb >" + dumpfile):
        sys.exit('db_dump failed')
    key = find_index(file(dumpfile).read()).group(1)
    found.append((n, key))
found.sort()
if debug:
    printall(found)
prev = found.pop(0)
for next in found:
    if prev[1] > next[1]:
        print "\nBad ordering:"
        printall((prev, next))
    prev = next
------------------------------------------------------------------------
When you prepend a negative sign byte you should add 0xff (like when sign-extending), not 0x80.
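The difference matters for ordering: in two's complement the sign-extension byte for a negative value is 0xff, while 0x80 encodes a very different number. A small sketch (`twos_complement` is a hypothetical helper, not lutil code):

```python
def twos_complement(n):
    # Minimal big-endian two's complement of n (BER-style: shortest
    # form whose top bit agrees with the sign).
    nbytes = 1
    while not -(1 << (8 * nbytes - 1)) <= n < (1 << (8 * nbytes - 1)):
        nbytes += 1
    return (n & ((1 << (8 * nbytes)) - 1)).to_bytes(nbytes, 'big')

assert twos_complement(-129) == b'\xff\x7f'   # prepended sign byte is 0xff
# Prepending 0x80 instead would decode as a different number entirely:
assert int.from_bytes(b'\x80\x7f', 'big', signed=True) == -32641
```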
Got crashes due to a write after 'tmp', incremented size by 1. Haven't looked at why yet, or whether that's the complete fix. Also the utils.c code is rather larger than necessary. So I added a few other things while I was at it:
- add with carry can be done in a one-liner, and doesn't need any shifts. - can use the same code to prepend positive and negative sign byte - it makes little sense to check if there is room for the sign byte and then return wrong result if not. For now I added a failure there, but either that error check can be removed, or there should be more error checks.
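The one-liner in question, `while ( i-- && ! (ptr[i] = (ptr[i] + 1) & 0xff )) ;`, stops as soon as a byte doesn't wrap around to zero. A Python rendering of the same loop, for illustration only:

```python
def add_one_with_carry(buf):
    # Increment a big-endian byte buffer in place, rightmost byte
    # first, carrying only while bytes wrap around to zero.
    i = len(buf)
    while i:
        i -= 1
        buf[i] = (buf[i] + 1) & 0xff
        if buf[i]:       # no wrap-around: carry stops here
            break
    return buf

assert add_one_with_carry(bytearray(b'\x00\xff\xff')) == bytearray(b'\x01\x00\x00')
```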
Index: utils.c
===================================================================
RCS file: /repo/OpenLDAP/pkg/ldap/libraries/liblutil/utils.c,v
retrieving revision 1.33.2.12
diff -u -2 -r1.33.2.12 utils.c
--- utils.c	1 Dec 2007 10:17:31 -0000	1.33.2.12
+++ utils.c	1 Dec 2007 13:55:53 -0000
@@ -664,4 +664,6 @@
  * Hex input can be "0x1234" or "'1234'H"
  *
+ * Temporarily modifies the input string.
+ *
  * Note: High bit of binary form is always the sign bit. If the number
  * is supposed to be positive but has the high bit set, a zero byte
@@ -737,5 +739,5 @@
 	num.len = 0;
 	if ( pin[0] == '-' ) {
-		neg = 1;
+		neg = 0xff;
 		len--;
 		pin++;
@@ -744,6 +746,6 @@
 #define	DECMAX	8	/* 8 digits at a time */

-	if ( len > sizeof(tmpbuf)) {
-		tmp = ber_memalloc( len );
+	if ( len >= sizeof(tmpbuf)) {
+		tmp = ber_memalloc( len+1 );
 	} else {
 		tmp = tmpbuf;
@@ -779,27 +781,16 @@
 			ptr[i] ^= 0xff;

-		/* Add 1, with carry */
-		i--;
-		j = 1;
-		for ( ; i>=0; i-- ) {
-			j += ptr[i];
-			ptr[i] = j & 0xff;
-			j >>= 8;
-			if (!j)
-				break;
-		}
-		/* If we overflowed and there's still room,
-		 * set an explicit sign byte
-		 */
-		if ( !( ptr[0] & 0x80 ) && num.beg ) {
-			num.beg--;
-			num.len++;
-			num.buf[num.beg] = 0x80;
+		/* add 1, with carry - overflow handled below */
+		while ( i-- && ! (ptr[i] = (ptr[i] + 1) & 0xff )) ;
+	}
+	/* Prepend sign byte if wrong sign bit */
+	if (( num.buf[num.beg] ^ neg ) & 0x80 ) {
+		if ( !num.beg ) {
+			rc = -1;
+			goto decfail;
 		}
-	} else if (( num.buf[num.beg] & 0x80 ) && num.beg ) {
-		/* positive int with high bit set, prepend 0 */
 		num.beg--;
 		num.len++;
-		num.buf[num.beg] = 0;
+		num.buf[num.beg] = neg;
 	}
 	if ( num.beg )
------------------------------------------------------------------------
Also the index is wrong for huge numbers. At some point the indexing should just give up and use max/min values, but I suppose at least cryptography-sized numbers should be usefully indexed. I.e. at least two length bytes.
So here is a suggestion similar to what I wrote before, except using the utf-8 trick to count initial bits to say how many length-bytes are needed. That also means only one bit is needed to say 'no length bytes', so I reduced the indexkey size to exactly index_intlen.
Index: schema_init.c
===================================================================
RCS file: /repo/OpenLDAP/pkg/ldap/servers/slapd/schema_init.c,v
retrieving revision 1.386.2.16
diff -u -2 -r1.386.2.16 schema_init.c
--- schema_init.c	1 Dec 2007 10:45:01 -0000	1.386.2.16
+++ schema_init.c	1 Dec 2007 14:34:55 -0000
@@ -2120,42 +2120,59 @@
 )
 {
-	struct berval iv;
-	int neg;
+	/* index format:
+	 * one's complement (sign*length) if too large to fit in index,
+	 * two's complement value (sign-extended or chopped as needed),
+	 * with 1st byte of the above adjusted as follows:
+	 *   inverse sign in the top <number of length bytes + 1> bits,
+	 *   the next bit is the sign as delimiter.
+	 */
+
+	ber_len_t k;
+	struct berval iv = *tmp;
+	unsigned char neg = 0, signmask = 0x80;

-	iv = *tmp;
 	if ( lutil_str2bin( val, &iv )) {
 		return LDAP_INVALID_SYNTAX;
 	}

-	neg = iv.bv_val[0] & 0x80;
+	if ( iv.bv_val[0] & 0x80 )
+		neg = 0xff;

-	/* Omit leading 0 pad byte */
-	if ( !iv.bv_val[0] ) {
+	/* Omit leading sign byte */
+	if ( iv.bv_val[0] == neg ) {
 		iv.bv_val++;
 		iv.bv_len--;
 	}

-	/* If too small, sign-extend */
-	if ( iv.bv_len < index_intlen ) {
-		int j, k, pad;
-		key->bv_val[0] = index_intlen;
-		k = index_intlen - iv.bv_len + 1;
-		if ( neg )
-			pad = 0xff;
-		else
-			pad = 0;
-		for ( j=1; j<k; j++)
-			key->bv_val[j] = pad;
-		for ( j = 0; j<iv.bv_len; j++ )
-			key->bv_val[j+k] = iv.bv_val[j];
+	if ( index_intlen > iv.bv_len ) {
+		k = index_intlen - iv.bv_len;
+		memset( key->bv_val, neg, k );	/* sign-extend */
 	} else {
-		key->bv_val[0] = iv.bv_len;
-		memcpy( key->bv_val+1, iv.bv_val, index_intlen );
-	}
-	if ( neg ) {
-		key->bv_val[0] = -key->bv_val[0];
+		k = iv.bv_len - index_intlen;
+		if ( k || ((iv.bv_val[0] ^ neg) & 0xc0) ) {
+			unsigned char lenbuf[sizeof(k) + 2];
+			unsigned char *lenp = lenbuf + sizeof(lenbuf);
+			do {
+				*--lenp = k ^ neg;
+				signmask >>= 1;
+			} while ( (k >>= 8) != 0 );
+			if ( (signmask >> 1) <= (*lenp ^ neg) ) {
+				*--lenp = neg;
+				signmask >>= 1;
+			}
+			k = (lenbuf + sizeof(lenbuf)) - lenp;
+			if ( k > 7 ) {	/* overflow in length of length */
+				k = index_intlen;
+				memset( key->bv_val, ~neg, k );
+			} else {
+				if ( k > index_intlen )
+					k = index_intlen;
+				memcpy( key->bv_val, lenp, k );
+			}
+			iv.bv_len = index_intlen - k;
+		}
 	}
-	/* convert signed to unsigned */
-	key->bv_val[0] ^= 0x80;
+	memcpy( key->bv_val + k, iv.bv_val, iv.bv_len );
+	key->bv_val[0] ^= -signmask;	/* invert sign */
 	return 0;
 }
@@ -2187,6 +2204,6 @@
 	keys = slap_sl_malloc( sizeof( struct berval ) * (i+1), ctx );
 	for ( i = 0; !BER_BVISNULL( &values[i] ); i++ ) {
-		keys[i].bv_len = index_intlen+1;
-		keys[i].bv_val = slap_sl_malloc( index_intlen+1, ctx );
+		keys[i].bv_len = index_intlen;
+		keys[i].bv_val = slap_sl_malloc( index_intlen, ctx );
 	}
 	keys[i].bv_len = 0;
@@ -2239,6 +2256,6 @@
 	keys = slap_sl_malloc( sizeof( struct berval ) * 2, ctx );

-	keys[0].bv_len = index_intlen + 1;
-	keys[0].bv_val = slap_sl_malloc( index_intlen+1, ctx );
+	keys[0].bv_len = index_intlen;
+	keys[0].bv_val = slap_sl_malloc( index_intlen, ctx );

 	if ( value->bv_len > sizeof( ibuf )) {
I wrote:
Also the index is wrong for huge numbers. At some point the indexing should just give up and use max/min values, but I suppose at least cryptography-sized numbers should be usefully indexed. I.e. at least two length bytes.
Eeh. It makes more sense to check for ridiculous-sized numbers before parsing them and just output a min/max value depending on sign. (Or right-truncate e.g. n*12 digits and add n*5 to the length.) Anyway, this needs the numbers to be normalized before passed to index/filter functions. Are they? With different valid text representations of the same number it gets hard to check against the cut-off size.
Hallvard B Furuseth wrote:
I wrote:
Also the index is wrong for huge numbers. At some point the indexing should just give up and use max/min values, but I suppose at least cryptography-sized numbers should be usefully indexed. I.e. at least two length bytes.
I wonder about that. Two length bytes implies 512Kbit numbers. Who's going to be storing those in LDAP? A single value of that size is larger than most entries... Common crypto algorithms use 1024bit numbers and even those aren't stored on their own in decimal integer format.
Eeh. It makes more sense to check for ridiculous-sized numbers before parsing them and just output a min/max value depending on sign. (Or right-truncate e.g. n*12 digits and add n*5 to the length.)
Sounds fine.
Anyway, this needs the numbers to be normalized before passed to index/filter functions. Are they? With different valid text representations of the same number it gets hard to check against the cut-off size.
Yes, the numbers are already normalized before they get here.
Howard Chu writes:
Hallvard B Furuseth wrote:
I wrote:
Also the index is wrong for huge numbers. At some point the indexing should just give up and use max/min values, but I suppose at least cryptography-sized numbers should be usefully indexed. I.e. at least two length bytes.
I wonder about that. Two length bytes implies 512Kbit numbers. Who's going to be storing those in LDAP?
2 "length bytes" - (3 sign bits and delimiter bit) => 31Kbit, I think. But yes. I merely wrote support for more than 2 length bytes since it didn't seem more code than for 0-2 length bytes.
Eeh. It makes more sense to check for ridiculous-sized numbers before parsing them and just output a min/max value depending on sign. (Or right-truncate e.g. n*12 digits and add n*5 to the length.)
Sounds fine.
Looking at it now... not sure if I'll have time today though. (I assume I can go ahead and change the size, since you just changed the presence key size.)
Hallvard B Furuseth wrote:
Howard Chu writes:
Hallvard B Furuseth wrote:
Eeh. It makes more sense to check for ridiculous-sized numbers before parsing them and just output a min/max value depending on sign. (Or right-truncate e.g. n*12 digits and add n*5 to the length.)
Sounds fine.
Looking at it now... not sure if I'll have time today though. (I assume I can go ahead and change the size, since you just changed the presence key size.)
Yes, go ahead.
I wrote:
It makes more sense to check for ridiculous-sized numbers before parsing them and just output a min/max value depending on sign. (Or right-truncate e.g. n*12 digits and add n*5 to the length.)
Committed. Chopping n*7 digits and adding n*3 to the binary-integer length instead. More wasteful of the exponent bits, but leaves fewer digits to parse.
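The constants fit together because 10**7 < 256**3: dropping 7 decimal digits divides the value by 10**7, which shrinks its binary form by at most 3 bytes, so crediting 3 length bytes per chop can only over-estimate (hence "more wasteful of the exponent bits"). A quick check:

```python
# 10**7 < 256**3, so chopping 7 decimal digits removes at most
# 3 bytes from the binary form; adding 3 to the length field per
# chop therefore only over-estimates, preserving relative order.
assert 10**7 < 256**3

def nbytes(v):
    # bytes needed for the unsigned binary form of v
    return (v.bit_length() + 7) // 8

for n in (10**7, 9_999_999_999, 10**20 + 12345):
    assert nbytes(n) - nbytes(n // 10**7) <= 3
```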
Could possibly chop just as many digits as needed and compute the length more exactly, but I wasn't up to figuring out what algorithm to use so an exponent taken from the parsed digits could add to overlap an exponent taken from chopping. Will fiddle a bit with it now and see if I come up with something.
One detail: Looks like lutil_str2bin() could use a ctx parameter, for the temporary malloc/free in that function.
Hallvard B Furuseth wrote:
I wrote:
It makes more sense to check for ridiculous-sized numbers before parsing them and just output a min/max value depending on sign. (Or right-truncate e.g. n*12 digits and add n*5 to the length.)
Committed. Chopping n*7 digits and adding n*3 to the binary-integer length instead. More wasteful of the exponent bits, but leaves fewer digits to parse.
Could possibly chop just as many digits as needed and compute the length more exactly, but I wasn't up to figuring out what algorithm to use so an exponent taken from the parsed digits could add to overlap an exponent taken from chopping. Will fiddle a bit with it now and see if I come up with something.
One detail: Looks like lutil_str2bin() could use a ctx parameter, for the temporary malloc/free in that function.
Good idea. I'll add that.
This change was unnecessary: @@ -2114,49 +2116,69 @@
 static int
 integerVal2Key(
-	struct berval *val,
+	struct berval val,
 	struct berval *key,
-	struct berval *tmp
+	struct berval itmp
 )
 {
Passing structures by value may be perfectly legal now, but we never do it in this source base. Why start now?
Hallvard B Furuseth wrote:
I wrote:
It makes more sense to check for ridiculous-sized numbers before parsing them and just output a min/max value depending on sign. (Or right-truncate e.g. n*12 digits and add n*5 to the length.)
Committed. Chopping n*7 digits and adding n*3 to the binary-integer length instead. More wasteful of the exponent bits, but leaves fewer digits to parse.
This code is wrong:
	/* Chop least significant digits, increase length instead */
	if ( val.bv_len > k ) {
		chop = (val.bv_len - k + 2) / 7;	/* 2 fewer digits */
		val.bv_len -= chop * INDEX_INTLEN_CHOP;	/* #digits chopped */
		chop *= 3;	/* >#key bytes chopped: 256**3 > 10**7 */
		if ( chop > 0x7fffffff ) {
			memset( key->bv_val, neg ^ 0xff, index_intlen );
			return 0;
		}
	}
On most 32 bit machines the "if( chop ...)" comparison must always fail, since ber_slen_t is only 32bits. You should do the test before multiplying chop by 3.
Howard Chu writes:
This code is wrong: (..) On most 32 bit machines the "if( chop ...)" comparison must always fail, since ber_slen_t is only 32bits. You should do the test before multiplying chop by 3.
No, chop already fits in ber_slen_t, since it is a ber_len_t/7*3. The check is for overflow in the length field in the final value, the algorithm doesn't handle more than 7 (or was it 6?) bytes. Could add code to distinguish even bigger integers, but there didn't seem much point. Anyway, I'll add a comment.
Hallvard B Furuseth wrote:
Howard Chu writes:
This code is wrong: (..) On most 32 bit machines the "if( chop ...)" comparison must always fail, since ber_slen_t is only 32bits. You should do the test before multiplying chop by 3.
No, chop already fits in ber_slen_t, since it is a ber_len_t/7*3. The check is for overflow in the length field in the final value, the algorithm doesn't handle more than 7 (or was it 6?) bytes. Could add code to distinguish even bigger integers, but there didn't seem much point. Anyway, I'll add a comment.
I think you misunderstood.
No matter what you're actually testing for, there is no way you're going to get a TRUE result from comparing ( signed 32 bit integer > 0x7fffffff ).
If it is actually possible for ( chop * 3 > 0x7fffffff ) then this test won't detect that fact. You need to test for ( chop > 0x7fffffff / 3 )
Howard Chu writes:
I think you misunderstood.
No, I wrote too briefly.
No matter what you're actually testing for, there is no way you're going to get a TRUE result from comparing ( signed 32 bit integer > 0x7fffffff ).
Right, and chop <= (max ber_len_t)/2.3333 so the test is a no-op on 32-bit hosts. But on 64-bit hosts, chop can be an 8-byte integer which would require the first byte in the output key to hold 10 bits: inverse sign bit, 8 more bits (one for each chop byte), final sign bit.
Hallvard B Furuseth wrote:
Howard Chu writes:
I think you misunderstood.
No, I wrote too briefly.
No matter what you're actually testing for, there is no way you're going to get a TRUE result from comparing ( signed 32 bit integer > 0x7fffffff ).
Right, and chop <= (max ber_len_t)/2.3333 so the test is a no-op on 32-bit hosts. But on 64-bit hosts, chop can be an 8-byte integer which would require the first byte in the output key to hold 10 bits: inverse sign bit, 8 more bits (one for each chop byte), final sign bit.
If you simply rewrote the test it would eliminate all ambiguity.
If it is actually possible for ( chop * 3 > 0x7fffffff ) then this test won't detect that fact. You need to test for ( chop > 0x7fffffff / 3 )
The other point being, it saves a multiply; the division is a compile time constant.
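The pitfall is easy to reproduce by wrapping the product to 32 bits the way typical C int arithmetic would (`s32` is a hypothetical helper; in C the signed overflow is also undefined behavior, which is worse than merely wrapping):

```python
def s32(x):
    # Wrap x to a signed 32-bit value, mimicking typical C int overflow.
    x &= 0xffffffff
    return x - 0x100000000 if x & 0x80000000 else x

chop = 0x60000000                         # chop * 3 overflows a 32-bit int
assert s32(chop * 3) == 0x20000000        # wrapped product looks small...
assert not (s32(chop * 3) > 0x7fffffff)   # ...so the post-multiply test misses it
assert chop > 0x7fffffff // 3             # the pre-division test catches it
```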
Hallvard B Furuseth wrote:
When you prepend a negative sign byte you should add 0xff (like when sign-extending), not 0x80.
Right, good catch.
Got crashes due to a write after 'tmp', incremented size by 1. Haven't looked at why yet, or whether that's the complete fix.
That should never happen. Do you have the backtrace from when the write occurred?
Also the utils.c code is rather larger than necessary. So I added a few other things while I was at it:
- add with carry can be done in a one-liner, and doesn't need any shifts.
- can use the same code to prepend positive and negative sign byte
OK.
- it makes little sense to check if there is room for the sign byte and then return wrong result if not. For now I added a failure there, but either that error check can be removed, or there should be more error checks.
It should never run out of room; the binary representation will always be smaller than the buffer length. Just remove that error check.
Also the index is wrong for huge numbers. At some point the indexing should just give up and use max/min values, but I suppose at least cryptography-sized numbers should be usefully indexed. I.e. at least two length bytes.
I guess that means the cutoff size should depend on the index_intlen. I don't think most people will need to handle huge values.
So here is a suggestion similar to what I wrote before, except using the utf-8 trick to count initial bits to say how many length-bytes are needed. That also means only one bit is needed to say 'no length bytes', so I reduced the indexkey size to exactly index_intlen.
In that case we must change the presence key in back-bdb. I left that unchanged since the key sizes were different so far, and ordered equality keys must be different in size from any other key types.
Howard Chu writes:
Hallvard B Furuseth wrote:
Got crashes due to a write after 'tmp', incremented size by 1. Haven't looked at why yet, or whether that's the complete fix.
That should never happen. Do you have the backtrace from when the write occurred?
No, got memory corruption later. Valgrind pointed out the tmp write, but I don't see in the doc how to get valgrind to core at first error.
Hallvard B Furuseth wrote:
Howard Chu writes:
Hallvard B Furuseth wrote:
Got crashes due to a write after 'tmp', incremented size by 1. Haven't looked at why yet, or whether that's the complete fix.
That should never happen. Do you have the backtrace from when the write occurred?
No, got memory corruption later. Valgrind pointed out the tmp write, but I don't see in the doc how to get valgrind to core at first error.
Never mind, I see. Your fix is fine. It's the final NUL-terminator in the call to strtol().
Howard Chu wrote:
Hallvard B Furuseth wrote:
Howard Chu writes:
Hallvard B Furuseth wrote:
Got crashes due to a write after 'tmp', incremented size by 1. Haven't looked at why yet, or whether that's the complete fix.
That should never happen. Do you have the backtrace from when the write occurred?
No, got memory corruption later. Valgrind pointed out the tmp write, but I don't see in the doc how to get valgrind to core at first error.
Never mind, I see. Your fix is fine. It's the final NUL-terminator in the call to strtol().
I take that back. That's completely unrelated...
Hallvard B Furuseth wrote:
Howard Chu writes:
Hallvard B Furuseth wrote:
Got crashes due to a write after 'tmp', incremented size by 1. Haven't looked at why yet, or whether that's the complete fix.
That should never happen. Do you have the backtrace from when the write occurred?
No, got memory corruption later. Valgrind pointed out the tmp write, but I don't see in the doc how to get valgrind to core at first error.
If you run it with --db-attach it should prompt you for whether you want to drop into gdb on every error.
Running your utils.c code (without the len+1 change) doesn't show any problems in ElectricFence. Strange.
Howard Chu wrote:
Hallvard B Furuseth wrote:
Howard Chu writes:
Hallvard B Furuseth wrote:
Got crashes due to a write after 'tmp', incremented size by 1. Haven't looked at why yet, or whether that's the complete fix.
That should never happen. Do you have the backtrace from when the write occurred?
No, got memory corruption later. Valgrind pointed out the tmp write, but I don't see in the doc how to get valgrind to core at first error.
If you run it with --db-attach it should prompt you for whether you want to drop into gdb on every error.
Running your utils.c code (without the len+1 change) doesn't show any problems in ElectricFence. Strange.
Found it, fixed in HEAD.