TODO:
* Calculate statistics based on the actual size of the Bloom filter. Use real binomial statistics; the normal-distribution approximation is inappropriate.
* Speed up by optimizing the matching function (don't check words that will be ignored anyway).
* At the same time, re-introduce the double-hit rule to further reduce false positives.
* Parallelize! And make it run efficiently on 8-CPU machines.
* Check each direction separately (it doesn't make sense to combine hits in both directions).
* Limit hits to a region/window.
* Etc.
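
For the first item, a minimal sketch of what an exact binomial calculation might look like, next to the normal approximation it would replace. Everything here is an assumption made for illustration, not the project's code. The function and parameter names are hypothetical. The null model (each checked word is an independent false hit with probability equal to the filter's false-positive rate) is assumed. The false-positive rate uses the standard textbook approximation (1 - e^(-kn/m))^k for a filter of m bits, n inserted items and k hash functions.

```python
import math


def bloom_false_positive_rate(m_bits, n_items, k_hashes):
    """Textbook approximation of a Bloom filter's false-positive rate.

    m_bits is the actual filter size in bits (hypothetical parameter name).
    """
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


def binomial_tail(n_trials, p, min_hits):
    """Exact P(X >= min_hits) for X ~ Binomial(n_trials, p).

    Terms are computed in log space so large n_trials does not overflow.
    """
    if min_hits <= 0:
        return 1.0
    if min_hits > n_trials or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(min_hits, n_trials + 1):
        log_term = (
            math.lgamma(n_trials + 1)
            - math.lgamma(i + 1)
            - math.lgamma(n_trials - i + 1)
            + i * log_p
            + (n_trials - i) * log_q
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def normal_tail(n_trials, p, min_hits):
    """Normal approximation (with continuity correction), for comparison only."""
    mu = n_trials * p
    sigma = math.sqrt(n_trials * p * (1.0 - p))
    z = (min_hits - 0.5 - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Illustrative numbers only: a 1 Mbit filter holding 100k words, 3 hashes.
    p = bloom_false_positive_rate(m_bits=1_000_000, n_items=100_000, k_hashes=3)
    print("false-positive rate:", p)
    print("exact tail:         ", binomial_tail(n_trials=50, p=p, min_hits=5))
    print("normal approx tail: ", normal_tail(n_trials=50, p=p, min_hits=5))
```

When the per-word false-positive rate is small and only a handful of words are checked, the binomial distribution is strongly skewed, which is the regime where the two tails would be expected to disagree most.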