Please test RE24, it has a few new updates. This is a release candidate.
--Quanah
--
Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
On 06/24/11 13:31, Quanah Gibson-Mount wrote:
> Please test RE24, it has a few new updates. This is a release candidate.
Revision 7080b68fee1fad5b1f8e14befa9e564269dd8f06 passed all tests on FreeBSD/amd64 9.0-CURRENT.
Cheers,
--
Xin LI delphij@delphij.net https://www.delphij.net/
FreeBSD - The Power to Serve! Live free or die
We can drop the new CRC code in back-ldif in favor of the stronger MD5 from liblutil. They have the same speed on my host: CRC is simpler, but memory-bound. CRC guarantees to catch certain transmission errors like a few wrong bits, but this comes at the expense of its quality as a general hash function.
Or we could use a slower SHA function, but I figure MD5 is already stronger than we need.
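The "few wrong bits" guarantee can be checked with a small experiment. This is only a sketch: it uses Python's `zlib.crc32` as a stand-in for a CRC-32, not the back-ldif code itself, and the sample entry and the helper name `undetected_single_bit_flips` are made up for illustration.

```python
import zlib

def undetected_single_bit_flips(data: bytes) -> int:
    """Count single-bit corruptions of `data` that leave the CRC-32 unchanged."""
    reference = zlib.crc32(data)
    buf = bytearray(data)
    missed = 0
    for byte_index in range(len(buf)):
        for bit in range(8):
            buf[byte_index] ^= 1 << bit       # corrupt exactly one bit
            if zlib.crc32(bytes(buf)) == reference:
                missed += 1
            buf[byte_index] ^= 1 << bit       # restore the original bit
    return missed

# Hypothetical LDIF-like sample; any byte string works.
sample = b"dn: cn=example,dc=example,dc=com\ncn: example\n"
print(undetected_single_bit_flips(sample))    # prints 0
```

A general-purpose hash such as MD5 will almost certainly catch the same flips too, but only probabilistically; the CRC catches every single-bit error by construction.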
Hallvard B Furuseth wrote:
> We can drop the new CRC code in back-ldif in favor of the stronger MD5 from liblutil. They have the same speed on my host: CRC is simpler, but memory-bound. CRC guarantees to catch certain transmission errors like a few wrong bits, but this comes at the expense of its quality as a general hash function.
> Or we could use a slower SHA function, but I figure MD5 is already stronger than we need.
I had considered MD5 before (especially since we already had code for it) but it was slower, and we're not looking for cryptographic assurances or hash distribution anyway. Basically all of these crypto hash functions are overkill, in terms of hash size and computation. We're only looking to detect casual misuse or corruption, not malicious deception.
I didn't really spend a lot of time comparing the two functions' speed. But even with the memory access bottleneck, I would guess that on a loaded system with many threads running, the algorithm with fewer instructions is the better choice. Have you measured the throughput when multiple threads are executing?
Howard Chu writes:
> I didn't really spend a lot of time comparing the two functions' speed. But even with the memory access bottleneck, I would guess that on a loaded system with many threads running, the algorithm with fewer instructions is the better choice. Have you measured the throughput when multiple threads are executing?
Good point. I just did a quick single-threaded test program.
Howard Chu writes:
> I had considered MD5 before (especially since we already had code for it) but it was slower, and we're not looking for cryptographic assurances or hash distribution anyway.
Yes, I was thinking mostly of killing unnecessary code and exposed features. After a brief test, I could see no good reason to keep CRC: Neither speed nor quality. On my machine, anyway.
> Basically all of these crypto hash functions are overkill, in terms of hash size and computation. We're only looking to detect casual misuse or corruption, not malicious deception.
Yes. The main reason I see to use either CRC or MD5 is that they are likely to be installed somewhere so non-programmers can use them. We could use a faster hash like MurmurHash3, but would then need to provide a 'slaputil hash' command or something for use by shell programs.
> I didn't really spend a lot of time comparing the two functions' speed. But even with the memory access bottleneck, I would guess that on a loaded system with many threads running, the algorithm with fewer instructions is the better choice. Have you measured the throughput when multiple threads are executing?
On my 32-bit host, MD5 on a threaded test program had 90-95% of CRC's throughput instead of 105% or whatever it was unthreaded. OTOH crc32() from linking -lz gave ~275%. OTOH I'd expect MD5 to be more costly on an older or cheaper machine where the CPU hasn't outpaced memory speed as much as on modern workhorses.
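For anyone wanting to repeat this kind of comparison, here is a rough sketch of a single-threaded versus multi-threaded throughput test. It is not the test program used above: it times Python's `zlib.crc32` and `hashlib.md5` rather than the back-ldif or liblutil code, the helper names and payload size are arbitrary, and CPython's interpreter lock may limit how well the threaded case scales. Treat the ratios it prints as illustrative only.

```python
import hashlib
import time
import zlib
from concurrent.futures import ThreadPoolExecutor

def crc32_digest(data: bytes) -> int:
    return zlib.crc32(data)

def md5_digest(data: bytes) -> bytes:
    return hashlib.md5(data).digest()

def throughput(func, data: bytes, rounds: int, threads: int) -> float:
    """Bytes hashed per second when `threads` workers each call `func` `rounds` times."""
    def worker(_worker_id):
        for _ in range(rounds):
            func(data)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(worker, range(threads)))   # wait for every worker
    elapsed = time.perf_counter() - start
    return len(data) * rounds * threads / elapsed

payload = bytes(range(256)) * 256                # 64 KiB of arbitrary data
for n in (1, 4):
    crc = throughput(crc32_digest, payload, 200, n)
    md5 = throughput(md5_digest, payload, 200, n)
    print(f"{n} thread(s): MD5 at {100 * md5 / crc:.0f}% of CRC-32 throughput")
```

The numbers depend heavily on the host, the payload size relative to the caches, and the library build, so results from one machine should not be generalized.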
Hallvard B Furuseth wrote:
> Howard Chu writes:
>> Basically all of these crypto hash functions are overkill, in terms of hash size and computation. We're only looking to detect casual misuse or corruption, not malicious deception.
> Yes. The main reason I see to use either CRC or MD5 is that they are likely to be installed somewhere so non-programmers can use them. We could use a faster hash like MurmurHash3, but would then need to provide a 'slaputil hash' command or something for use by shell programs.
Right. The fewer custom utilities involved the better.
>> I didn't really spend a lot of time comparing the two functions' speed. But even with the memory access bottleneck, I would guess that on a loaded system with many threads running, the algorithm with fewer instructions is the better choice. Have you measured the throughput when multiple threads are executing?
> On my 32-bit host, MD5 on a threaded test program had 90-95% of CRC's throughput instead of 105% or whatever it was unthreaded. OTOH crc32() from linking -lz gave ~275%. OTOH I'd expect MD5 to be more costly on an older or cheaper machine where the CPU hasn't outpaced memory speed as much as on modern workhorses.
Sounds to me like we should just use zlib's crc32 code then.
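As a sketch of what using zlib's CRC-32 looks like, the snippet below feeds data in chunks through a running checksum, via Python's `zlib.crc32` binding rather than the C `crc32()` call. The helper name `crc32_of_chunks` and the sample entry are hypothetical, and nothing here reflects how back-ldif would actually store or verify the value.

```python
import zlib

def crc32_of_chunks(chunks) -> int:
    """Compute one CRC-32 over a sequence of byte chunks, feeding them in order."""
    crc = 0                                # starting value for a fresh checksum
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)       # continue from the running value
    return crc

# Hypothetical LDIF-like entry, split at arbitrary points.
entry = b"dn: cn=example,dc=example,dc=com\ncn: example\n"
pieces = [entry[:10], entry[10:25], entry[25:]]

# Chunked and one-shot computation agree.
assert crc32_of_chunks(pieces) == zlib.crc32(entry)

# Standard CRC-32 check value for the ASCII string "123456789".
print(f"{zlib.crc32(b'123456789'):08x}")   # prints cbf43926
```

One caveat for the shell-tools argument: the POSIX `cksum` utility computes a different CRC variant than zlib's `crc32()`, so values from the two are not interchangeable.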