Mar  4 00:28 2002 from Josh
Sweet.  I just got a new favorite de-skewing method. :-) (Thank you John von Neumann!)
  Old favorite: Take output bits and XOR them.  Someone, sometime along the line told me that this resulted in the entropy adding (up to 1 bit, of course).  This has always struck me as wrong, though I just ran the math today.  Indeed, it is wrong.  It is cumulative, but not literally additive.
 So XORing 2 bits, each with half a bit of information entropy, does not yield a single bit with a full bit of information entropy... more like .75 bits, and discovering the exact amount seems to be a remarkable pain in the ass, though perhaps I'll think of a better way of making this calculation...
 Mathematically:
 H(X) = Sum (i = 1 to n) { p_i * lg (1/p_i) }
  (lg x is the notation I'm using for log base 2)
 for this problem, there are only two states: 0 and 1.  Define P(X = 1) = p.  So, P(X = 0) = 1-p.  So, the basic formula for information entropy for a binary system is:
 H(X) = (p-1)*(lg(1-p)) - p*lg(p)
 OK, so assume a bit, which is skewed (p != .5).  My previous favorite method for removing skew was just to XOR the bits together. Unfortunately, making general statements about the entropy of the resulting bit is irritating.
  X_0
             1              0       X_1
   1       0 (p^2)      1 (p)(1-p)
   0       1 (p)(1-p)   0 (1-p)^2

 so, P(X_c = 1) = 2*p*(1-p) = p_c
 (and P(X_c = 0) = p^2 + (1-p)^2 = 1 - P(X_c = 1) which works out)
 OK, great, but you'll note that if p!=.5, then p_c != .5, so the information entropy of the bit is still less than one, no matter what... Do this enough, and you asymptotically approach 1 bit of information entropy, but that's the best you get.
  Other common approaches at de-skewing (stream parity, FFT, compression, integrating other "random" values like sound, video, hard drive timing, etc, are either ridiculously complex, or just pushing the problem out one additional level, and are consequently on very shaky ground from an information theoretic point of view.
  Enter Transition Mappings, as proposed by von Neumann: Take your two independent (but skewed) bits.  If the bits are 11 or 00 discard them.  If they are 10 the output is 1.  If they are 01, the output is 0.  Ok, let's look at this:
 (NO = no output)
  X_0
             1              0       X_1
   1       NO (p^2)      1 (p)(1-p)
   0       0 (p)(1-p)    NO (1-p)^2

 Well, fuck'n a Bubba! Look at that, the probability for both
 P(X_t = 1) = P(X_t = 0) = = (p)(1-p)
 That's 1 bit of information entropy, baby! Oh yeah!