Bioinformatics (and other) tools in Haskell

An assorted collection of tools for (mainly) sequence analysis that I've found useful or interesting.

There's also a blog where I will document code, provide examples, and generally jot down stuff about bioinformatics and Haskell that's on my mind.

The tools are mostly written in Haskell and available as darcs repositories.

Oh, and there's a health warning image in SVG format. But I digress.

You could also check the bin directory, where I'll try to have current, statically linked Linux binaries floating around. I'll also build Debian packages when I get around to it.

Stability may vary, but drop me an email at <ketil(at)malde.org> if you run into problems or have any questions.

Non-Bioinformatic tools in Haskell

Okay, these aren't bioinformatics-related, really.

SimpleArgs

A 'getArgs' that is somewhat more flexible and informative (in case of incorrect usage) than the one in System.Environment. Not a replacement for GetOpt, but useful for quick hacks. Check the README for some examples.

darcs get http://malde.org/~ketil/simpleargs
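
As a rough illustration of the idea (not the SimpleArgs API itself, for which the README is the place to look), typed arguments with an informative failure message can be sketched on top of System.Environment. The helper name parseArgs and the COUNT FILE argument layout are assumptions made up for this example.

```haskell
import System.Environment (getArgs, getProgName)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)
import Text.Read (readMaybe)

-- | Hypothetical helper (not part of SimpleArgs): accept exactly two
-- arguments, an integer count and a file name, or say what went wrong.
parseArgs :: [String] -> Either String (Int, FilePath)
parseArgs [n, file] =
  case readMaybe n of
    Just k  -> Right (k, file)
    Nothing -> Left ("expected an integer count, but got " ++ show n)
parseArgs args =
  Left ("expected 2 arguments, but got " ++ show (length args))

main :: IO ()
main = do
  args <- getArgs
  case parseArgs args of
    Right (count, file) ->
      putStrLn ("count = " ++ show count ++ ", file = " ++ file)
    Left problem -> do
      -- Report the problem and a usage line on stderr, then fail.
      prog <- getProgName
      hPutStrLn stderr (prog ++ ": " ++ problem)
      hPutStrLn stderr ("usage: " ++ prog ++ " COUNT FILE")
      exitFailure
```

Keeping the parsing in a pure function means the error messages can be checked without actually running the program with bad arguments.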

interlude

Tired of anonymous error messages from 'read', 'head' or 'tail'? Interlude is an include file that defines replacements for partial functions as CPP macros, so that you'll at least get the line number where the function failed.

darcs get http://malde.org/~ketil/interlude
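
The general technique can be sketched as follows. This is a minimal example, not the contents of the interlude include file: the helper name headAt and the exact macro definition are assumptions for illustration. The CPP macro passes the file name and line number of each call site to an ordinary function, which uses them in its error message.

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

import Control.Exception (ErrorCall (..), evaluate, try)

-- | Hypothetical helper: behaves like the Prelude function it replaces,
-- but puts the source location of the caller in the error message.
headAt :: FilePath -> Int -> [a] -> a
headAt _    _    (x:_) = x
headAt file line []    =
  error (file ++ ":" ++ show line ++ ": head of empty list")

-- From here on, the preprocessor rewrites every use of the name,
-- filling in the file and line of the call site.
#define head (headAt __FILE__ __LINE__)

main :: IO ()
main = do
  -- A successful call works as usual.
  print (head [1, 2, 3 :: Int])
  -- A failing call now carries its location in the message.
  result <- try (evaluate (head ([] :: [Int])))
  case result of
    Left (ErrorCall msg) -> putStrLn ("caught: " ++ msg)
    Right x              -> print x
```

The macro is defined after the helper so that the helper's own definition is left alone by the preprocessor.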

Bioinformatics tools in Haskell

(Note that these are now grouped under a separate subdirectory, biohaskell.)

repeats from ESTs

darcs get http://malde.org/~ketil/biohaskell/estreps

xml2x

Converts XML output from BLASTN or BLASTX (BLASTP as soon as I, or you, need it), supporting both pre- and post-2.2.13 output.

darcs get http://malde.org/~ketil/biohaskell/xml2x

simseq

Okay, I'd really have preferred to have cleaned this up a bit first, but if it is useful to anybody... This is a simulator that can generate simulated sequences -- primarily EST type sequences, but quite possibly other types as well. Mail me for further information on usage etc.

darcs get http://malde.org/~ketil/biohaskell/simseq

dephd

A tool for converting phd files (phred output) to...well, different things. It can produce FASTA sequences and quality, quality plots, and rank sequences by a couple of different quality metrics. There's a README file with a bit more detail.

darcs get http://malde.org/~ketil/biohaskell/dephd
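
To give an idea of what the phd-to-FASTA conversion involves, here is a minimal sketch. It is not dephd's actual code. It assumes the usual phd layout, where a BEGIN_SEQUENCE line names the read and each line between BEGIN_DNA and END_DNA holds a base, its quality and a trace position. The function names are made up for the example, and lines are not wrapped to a fixed width.

```haskell
import Text.Read (readMaybe)

-- | One base call: the nucleotide and its phred quality
-- (the trace position in the third column is ignored).
type BaseCall = (Char, Int)

-- | Extract the read name and the base calls from the text of a phd file.
parsePhd :: String -> (String, [BaseCall])
parsePhd contents = (name, calls)
  where
    ls    = map words (lines contents)
    name  = case [n | ("BEGIN_SEQUENCE" : n : _) <- ls] of
              (n : _) -> n
              []      -> "unknown"
    -- Keep only the lines strictly between BEGIN_DNA and END_DNA.
    dna   = takeWhile (/= ["END_DNA"])
              (drop 1 (dropWhile (/= ["BEGIN_DNA"]) ls))
    -- Skip any line that is not a single base followed by a number.
    calls = [(b, q) | ([b] : qs : _) <- dna, Just q <- [readMaybe qs]]

-- | Render the bases as a FASTA record.
toFasta :: String -> [BaseCall] -> String
toFasta name calls = unlines [">" ++ name, map fst calls]

-- | Render the qualities as a FASTA-style quality record.
toQual :: String -> [BaseCall] -> String
toQual name calls = unlines [">" ++ name, unwords (map (show . snd) calls)]

-- | A tiny made-up input for demonstration.
example :: String
example = unlines
  [ "BEGIN_SEQUENCE read1"
  , "BEGIN_DNA"
  , "a 9 6"
  , "c 12 18"
  , "g 30 27"
  , "END_DNA"
  , "END_SEQUENCE"
  ]

main :: IO ()
main = do
  let (name, calls) = parsePhd example
  putStr (toFasta name calls)
  putStr (toQual name calls)
```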

rbr

RBR is a tool for masking EST sequences. It uses a statistical model to identify suspicious sequence parts, and masks them. The README has more details.

darcs get http://malde.org/~ketil/biohaskell/rbr

xsact

Xsact is an EST clustering program with a variety of output options.

darcs get http://malde.org/~ketil/biohaskell/xsact

cluster tools

This is a bunch of stuff I needed at some point for manipulating sequence clusters. See the README for details.

darcs get http://malde.org/~ketil/biohaskell/cluster_tools

the bioinformatics library

I've collected most of the recyclable material in a library. It currently contains functionality for sequence data, including a bunch of file formats.

darcs get http://malde.org/~ketil/biohaskell/biolib
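
For a flavour of what "sequence data plus file formats" can look like in Haskell, here is a small sketch of a FASTA reader. The Sequence record and the parseFasta function are hypothetical and are not the library's actual types or API.

```haskell
-- | Hypothetical record; the library's real sequence type may differ.
data Sequence = Sequence
  { seqHeader :: String  -- ^ header line, without the leading '>'
  , seqData   :: String  -- ^ residues, with line breaks removed
  } deriving (Eq, Show)

-- | Parse FASTA text: each '>' line starts a record, and the lines that
-- follow, up to the next '>' line, are joined into the sequence data.
-- Anything before the first header is dropped.
parseFasta :: String -> [Sequence]
parseFasta = go . dropWhile (not . isHeader) . lines
  where
    isHeader l = take 1 l == ">"
    go [] = []
    go (h : rest) =
      let (body, more) = break isHeader rest
      in  Sequence (drop 1 h) (concat body) : go more

main :: IO ()
main = mapM_ print (parseFasta ">s1 test\nACGT\nTT\n>s2\nGG\n")
```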