diff options
-rw-r--r-- | proc/primes.c | 67 |
1 files changed, 66 insertions, 1 deletions
diff --git a/proc/primes.c b/proc/primes.c index 1561cf21..a0be5acb 100644 --- a/proc/primes.c +++ b/proc/primes.c @@ -29,7 +29,8 @@ nextprime (int n) int *m; int i, j; - /* You are not expected to understand this. */ + /* See the comment at the end for an explanation of the algorithm + used. */ if (!q) { @@ -86,4 +87,68 @@ nextprime (int n) return q[p]; } +/* [This code originally contained the comment "You are not expected +to understand this" (on the theory that every Unix-like system should +have such a comment somewhere, and now I have to find somewhere else +to put it). I then offered this function as a challenge to Jim +Blandy (jimb@totoro.bio.indiana.edu). At that time only the six +comments in the function and the description at the top were present. +Jim produced the following brilliant explanation. + -mib] + + +The static variable q points to a sorted array of the first l natural +prime numbers. k is the number of elements which have been allocated +to q, l <= k; we occasionally double k and realloc q accordingly to +maintain this invariant. + +The table is initialized to contain a few primes (lines 26, 27, +34-40). Subsequent code assumes the table isn't empty. + +When passed a number n, we grow q until it contains a prime >= n +(lines 45--70), do a binary search in q to find the least prime >= n +(lines 72--84), and return that. + +We grow q using a "sieve of Eratosthenes" procedure. Let p be the +largest prime we've yet found, q[l-1]. We allocate a boolean array m +of p^2 elements, and initialize all its elements to false. (Upon +completion, m[j] will be false iff j is a prime.) For each number +q[i] in q, we set all m[j] to true, where j is a multiple of q[i], and +j is a valid index in m. Once this is done, since every number j (p < +j < p^2) is either prime, or has a prime factor not greater than p, +m[j] will be false iff j is prime. We scan m for false elements, and +add their indices to q. + +As an optimization, we take advantage of the fact that 2 is the first +prime. But essentially, the code works as described above. + +Why is m's size chosen as it is? Note that the sieve only guarantees +to mark multiples of the numbers in q. Given that q contains all the +prime numbers from 2 to p, we can safely sieve for prime numbers +between 2 and p^2, because any composite number in that range must +have a prime factor not greater than its square root, and thus not +greater than p; q already contains all such primes. I suppose we +could trust the sieve a bit farther ahead, but I'm not sure we could +go very far at all. + + +Possible bug: if there is no prime j such that p < j <= p^2, then the +growth loop will add no primes to q, and thus never exit. Is there +any guarantee that there will be such a j? I thought that people had +found proofs that the average density of primes was logarithmic in +their size, or something like that, but no guarantees. Perhaps there +is no bug, and this is the part I am not expected to understand; oh, +well. + + +[I don't know if this is a bug or not. The proofs of the density of +primes only deal with average long-term density, and there might be +stretches where prims thin out to the point that this algorithm fails. +However, it can only compute primes up to maxint (because ints are +used as indexes into the array M in the seive step). It's certainly +the case that no such thinning out happens before 2^32, so we're safe. +And since this function is only used to size process tables, I don't +think it will ever get that far even. -mib] + +*/ |