To build these tools, you will need a Haskell compiler (the most likely
candidate being GHC), and my bioinformatics library and the SimpleArgs
module installed (Downloadable from: ).

This contains the following tools:

filter - remove unwanted sequences from a clustering.
    Usage: filter seq.list < cluster.L > cluster2.L
    cluster2.L will only contain sequence labels found in seq.list.

hist - produce a histogram of cluster sizes from a "label"-formatted
    clustering.

clusc - compare clusterings, calculating numerous pair-based and
    entropy-based indices.

xcerpt - given a file containing a list of sequence labels (e.g. a
    "label"-formatted clustering), extract matching sequences from a
    FASTA file. Like "agrep -d '^>'" without the bugs.
    Usage: xcerpt list.txt fasta.seq
    Creates "fasta.seq.match" and "fasta.seq.rest".

add_single - add singletons to a clustering.
    Usage: add_single all.L clustering.L
    Creates clustering.L_s, listing all sequences in all.L but not in
    clustering.L, one per line.

ace2contigs - parse an ACE assembly file, and output the contigs in a
    FASTA file (named by tacking on .fasta to the ACE file name), and
    the corresponding quality information (.qual).

ace2fasta - parse an ACE assembly, and output each assembly in a
    separate FASTA-formatted file, with the necessary gaps inserted to
    align the sequences (suitable for import into e.g. Seaview).

ace2clusters - parse an ACE assembly, and output clusters composed of
    the sequences used for each contig. The format is similar to
    TGICL's, with each cluster output as one line consisting of a '>'
    and the contig name, and the next line containing the names of the
    sequences that comprise the cluster.

clusterlibs - given a table of regular expressions and library names,
    along with a clustering (TGICL format), output a table of clusters
    with the library name prepended to the sequences.
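
As an illustration of the TGICL-like cluster format described for ace2clusters (a '>' line with the contig name, followed by a line of sequence names), here is a minimal Python sketch that reads such text and tallies cluster sizes, roughly in the spirit of hist. It is not the Haskell source of any of these tools. The function names parse_clusters and size_histogram are hypothetical, and the sketch assumes that sequence names are separated by whitespace and that the contig name is the first token after '>'.

```python
from collections import Counter


def parse_clusters(text):
    """Map each contig name to the list of sequence names on the line(s) after it.

    Assumption: names are whitespace-separated and the contig name is the
    first token following '>' on a header line.
    """
    clusters = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            clusters[name] = []
        elif name is not None:
            # Member line: append its names to the current cluster.
            clusters[name].extend(line.split())
    return clusters


def size_histogram(clusters):
    """Count how many clusters there are of each size."""
    return Counter(len(members) for members in clusters.values())


# Made-up example data, for illustration only.
example = """>Contig1
seqA seqB seqC
>Contig2
seqD seqE
>Contig3
seqF seqG
"""

clusters = parse_clusters(example)
hist = size_histogram(clusters)
for size in sorted(hist):
    print(size, hist[size])
```

With the made-up data above, the loop prints one line per cluster size: the size, then the number of clusters of that size.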