Amazon.com Widgets

Quiz: Primes in the BCL... part 2

Some good comments on my BCL prime quiz… but still one unanswered question (at the end)…

 

Thomas Woelfer quickly got that the place in the BCL for generating prime numbers is in Hashtable.cs and Eric Wilson does a good job explaining why…

Specifically, primes are used when the size of a hash table must be increased because of the load has reached a certain level. The size of the hashtable is increased such that the number of buckets in the hashtable increases to the smallest prime number that is larger than twice the current number of Hashtable buckets.

 

The reason for this is two fold. Using a prime number of buckets reduces the effect of clustering in the hashtable (having multiple keys hash to the same location). The size is at least doubled because by doing that the BCL can insure that over a series of N insert, the total time for all inserts takes no more than O(log(N)) total time

And I really love the reference to Juan Felipe Machado gives: “Knuth's Art of Computer Programming, Vol. 3, p. 528-9”, clearly a desktop reference favorite.   How long before it is common place to quote the SLAR v1, pp 239-243 ;-)

 

In addition, Norman Headlam gives some of the code in question, but the method I was really talking about is this one…

00149         private bool Primep (int candidate) {
00150             if ((candidate & 1) != 0) {
00151                 for (int divisor = 3; divisor < (int)Math.Sqrt (candidate); divisor+=2){
00152                     if ((candidate % divisor) == 0)
00153                         return false;
00154                 }
00155                 return true;
00156             }
00157             else {
00158                 return (candidate == 2);
00159             }
00160         }
00161    

 

A few of you pointed out some interesting issues, but not what I was looking for… the problem is in the code above and it results in some numbers being return as prime that are not really prime…The bug can be found for “normal” input ranges such as [1-100].

 

Published 28 July 04 08:12 by BradA
Filed under:

Comments

# S N said on July 28, 2004 9:33 PM:
It should have been

divisor <= (int)Math.Sqrt(candidate) in the for loop condition.
# Mike Dunn said on July 28, 2004 11:24 PM:
If you pass in 9, the function will return true because divisor starts at 3, which is not less than sqrt(9), so the for loop never runs and the code drops to the return true line.
# b. schwehn said on July 29, 2004 12:23 AM:
same if you put in 1 which by definition isn't prime.
# b. schwehn said on July 29, 2004 12:35 AM:
btw, using divisor <= (int)Math.Sqrt(candidate) still returns true for candidate = 1
# damien morton said on July 29, 2004 1:25 AM:
for (int divisor = 3; divisor < (int)Math.Sqrt (candidate); divisor+=2)

should be:
sqrtcand = Math.Sqrt(candidate);
for (int divisor = 3; divisor < sqrtcand; divisor+=2)

its a minor thing, but you dont want to be calling Math.Sqrt on every loop iteration.
# Fons Sonnemans said on July 29, 2004 1:46 AM:
What about:

int sqrtcand = (int)Math.Sqrt(candidate);
for (int divisor = 3; divisor <= sqrtcand; divisor+=2)
# Jonathan Perret said on July 29, 2004 2:24 AM:
Indeed, every squared prime will be considered prime.
Funny how I had a slight doubt yesterday about that '<' but I didn't push into it as I was looking for something more subtle...

Cheers,
--Jonathan
# b.schwehn said on July 29, 2004 3:15 AM:
<quote>for (int divisor = 3; divisor < (int)Math.Sqrt (candidate); divisor+=2)

should be:
sqrtcand = Math.Sqrt(candidate);
for (int divisor = 3; divisor < sqrtcand; divisor+=2)</quote>

I believe this is neccessary, the compiler knows that candidate is constant and therefore sqrt(candidate) is constant as well and will do this optimisation itself
# b.schwehn said on July 29, 2004 3:47 AM:
Actually I was wrong, the nither the C# compilter nor the JIT compiler are able to optimize this themselves, presuably because they don't know whether Math.Sqrt has any intended Sideeffects besides the result. I wonder if there is a way to indicate this to the compiler..?

<code>
if ((candidate & 1) != 0) {
00000000 push ebp
00000001 mov ebp,esp
00000003 sub esp,20h
00000006 push edi
00000007 push esi
00000008 push ebx
00000009 mov dword ptr [ebp-4],ecx
0000000c mov esi,edx
0000000e xor ebx,ebx
00000010 xor edi,edi
00000012 test esi,1
00000018 je 00000056
for (int divisor = 3; divisor < (int)Math.Sqrt (candidate); divisor+=2){
0000001a mov ebx,3
0000001f nop
00000020 jmp 00000031
if ((candidate % divisor) == 0)
00000022 mov eax,esi
00000024 cdq
00000025 idiv eax,ebx
00000027 test edx,edx
00000029 jne 0000002F
return false;
0000002b xor edi,edi
0000002d jmp 00000063
for (int divisor = 3; divisor < (int)Math.Sqrt (candidate); divisor+=2){
0000002f inc ebx
00000030 inc ebx
00000031 mov dword ptr [ebp-14h],ebx
00000034 mov dword ptr [ebp-20h],esi
00000037 fild dword ptr [ebp-20h]
0000003a fsqrt
0000003c fstp qword ptr [ebp-1Ch]
0000003f push dword ptr [ebp-18h]
00000042 push dword ptr [ebp-1Ch]
00000045 call 725CFA7C
0000004a cmp eax,dword ptr [ebp-14h]
0000004d jg 00000022
return true;
0000004f mov edi,1
00000054 jmp 00000063
return (candidate == 2);
00000056 cmp esi,2
00000059 sete al
0000005c movzx eax,al
0000005f mov edi,eax
00000061 jmp 00000063
}
</code>

versus
<code>
if ((candidate & 1) != 0) {
00000000 push ebp
00000001 mov ebp,esp
00000003 sub esp,20h
00000006 push edi
00000007 push esi
00000008 push ebx
00000009 mov dword ptr [ebp-4],ecx
0000000c mov esi,edx
0000000e mov dword ptr [ebp-0Ch],0
00000015 xor ebx,ebx
00000017 xor edi,edi
00000019 test esi,1
0000001f je 0000005D
int tt = (int)Math.Sqrt (candidate);
00000021 mov dword ptr [ebp-20h],esi
00000024 fild dword ptr [ebp-20h]
00000027 fsqrt
00000029 fstp qword ptr [ebp-1Ch]
0000002c push dword ptr [ebp-18h]
0000002f push dword ptr [ebp-1Ch]
00000032 call 725CF9FC
00000037 mov dword ptr [ebp-0Ch],eax
for (int divisor = 3; divisor < tt;divisor+=2){
0000003a mov ebx,3
0000003f nop
00000040 jmp 00000051
if ((candidate % divisor) == 0)
00000042 mov eax,esi
00000044 cdq
00000045 idiv eax,ebx
00000047 test edx,edx
00000049 jne 0000004F
return false;
0000004b xor edi,edi
0000004d jmp 0000006A
for (int divisor = 3; divisor < tt;divisor+=2){
0000004f inc ebx
00000050 inc ebx
00000051 cmp ebx,dword ptr [ebp-0Ch]
00000054 jl 00000042
return true;
00000056 mov edi,1
0000005b jmp 0000006A
return (candidate == 2);
0000005d cmp esi,2
00000060 sete al
00000063 movzx eax,al
00000066 mov edi,eax
00000068 jmp 0000006A
}
</code>
# Richard said on July 29, 2004 11:14 AM:

Use of prime number sizes for the array of hash table buckets is traditional. However the VC++ 7 & 7.1's hash based containers require a number of buckets that is a power of two:

The integer constant min_buckets specifies the minimum number of buckets to maintain in the hash table. It must be a power of two and greater than zero. The value supplied by hash_compare is 8.

(http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vcstdlib/html/vclrfHashcompare.asp?frame=true)

(This is the description of the hash_compare template, which you specialise to override the defaults in hash_set and hash_map container instances.)

It does some clever things to avoid rehashing everything as the number of buckets increases.
# Brad Abrams said on July 29, 2004 9:59 PM:
I was really going for the <= in line 151 as SN points out... Those off-by-one errors are really a kicker aren't they?
But good to see the other comments as well...
# Eric Wilson said on July 30, 2004 6:41 AM:
Is this fixed in Widbey?
# Brad Abrams said on July 30, 2004 6:44 AM:
Yes, this is fixed in Whidbey
# Frank Hileman said on July 30, 2004 11:51 AM:
Knuth's mod prime reference is so common, you would think that is the best way to do a hash table. But in fact, if you can get a better hash to start with, you don't need a mod prime, as the hash is already well distributed. Then you can use faster and better aligned power of 2 sized tables. That mod prime technique has been slow and obsolete for a long time now.

For an example of a more modern hash algorithm, look at FNV:
http://www.isthe.com/chongo/tech/comp/fnv/

That algorithm has already proven itself, speeding up a number of OS hashtables that used the obsolete mod prime method.
# damien morton said on August 3, 2004 11:36 AM:
FNV is for hashing strings, and is only indirectly related to hashtables and their hash functions.
# Web Log di Gianluca Carucci said on August 27, 2004 12:38 PM:
# Scott on Writing said on April 9, 2005 9:28 PM:
New Comments to this post are disabled

Search

This Blog

Syndication

Page view tracker