TODO:

* Calculate statistics based on the actual size of the Bloom filter.
  Use exact binomial statistics; the normal distribution approximation
  is inappropriate.

* Speed up by optimizing the matching function (don't check words that
  will be ignored anyway).

* At the same time, re-introduce the double-hit rule to further reduce
  false positives.

* Parallelize! And make it run efficiently on 8-CPU machines.

* Check each direction separately (it doesn't make sense to combine
  hits in both directions).

* Limit hits to region/window.

Etc.
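
Sketch for the first item (exact binomial statistics). This is only an
illustration, not the project's code: all function and parameter names
are hypothetical, and it assumes each word lookup is an independent
trial with the same false-positive probability. The false-positive
rate is taken from the observed fill fraction of the filter (fraction
of set bits raised to the number of hash functions), and the tail is
summed in log space so large word counts do not overflow.

```python
import math


def bloom_fp_rate(bits_set, total_bits, num_hashes):
    """False-positive probability from the observed fill of the filter.

    Hypothetical helper: a lookup is a false positive when all
    num_hashes probed bits happen to be set.
    """
    return (bits_set / total_bits) ** num_hashes


def binomial_tail(n, k, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), requiring 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(max(k, 0), n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i)
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def normal_tail(n, k, p):
    """Normal approximation to P(X >= k), with continuity correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    z = (k - 0.5 - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

With few expected hits (for example n = 1000 words, p = 0.001, k = 5)
the normal approximation gives a much smaller tail probability than
the exact sum, which is the regime where it would overstate the
significance of a match.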