--------
[ academics ] Message 9266 (8 left): Sun July 11, 1999  10:37pm
From: Aaron Emigh (aaron@alumni.cse.ucsc.edu)
Subject: more bit twiddling
20 lines (?)

I have an O(lg n) solution, modulo bickering over machine capabilities.

Think of a number in binary notation as having, in each bit position,
the "number of ones" in each bit position, 0 or 1.  Now think of using
each bit *pair* to hold the "number of ones" (zero, one or two) in that
pair of bits in the original number.  Now think of each bit 4-tuple as
containing the "number of ones" (0-4) in the corresponding nibble of
the original number, and so on.  Each successive stage of this transformation
can be done in O(1) with a shift and add (assuming a barrel shifter and
number of bits no greater than the native word size).

With lots of intermediate variables and funny formatting for exposition:

        sum_2  = (n      & 0x55555555) + ((n      & 0xaaaaaaaa) >> 1);
        sum_4  = (sum_2  & 0x33333333) + ((sum_2  & 0xcccccccc) >> 2);
        sum_8  = (sum_4  & 0x0f0f0f0f) + ((sum_4  & 0xf0f0f0f0) >> 4);
        sum_16 = (sum_8  & 0x00ff00ff) + ((sum_8  & 0xff00ff00) >> 8);
        count  = (sum_16 & 0x0000ffff) + ((sum_16 & 0xffff0000) >> 16);


--------
[ academics ] Message 9274 (0 left): Mon July 12, 1999  11:23am
From: Aaron Emigh (aaron@alumni.cse.ucsc.edu)
Subject: restatement
22 lines (?)

I'll go for the unification theory:

The algorithm as stated is O(lg n), where the fundamental operations are
AND, shifting and addition.

These fundamental operations are not in fact constant-time on arbitrarily
long datasets on real architectures -- they take n/<word-size>, or O(n)
time.  So you could call an implementation that followed the basic
algorithm exactly O(n lg n).

Of course, you wouldn't implement it that way -- it would be silly to
do these operations across the whole dataset.  You'd just do it piecewise,
one natural word at a time, and add up the intermediate results.  Each
word-sized piece will take O(lg <word-size>) = O(1) time since <word-size>
is constant, and there are O(n) performances of the intermediate count and
subsequent additions, making the whole thing O(n), though with a smaller
constant factor than more traditional techniques.

Whether you want to call it O(lg n) or O(n) is ultimately a matter of
personal style -- I claim it's O(lg n) in theory, O(n) in practice.