  Synopsis
  --------

korfu - Ketil's ORF Utility

Reads nucleotide sequences and outputs potential open reading frames (ORFs).
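
As a rough illustration of what "potential ORFs" means, here is a minimal
sketch of a forward-strand ORF scan. It assumes an ORF runs from an ATG to the
first in-frame stop codon (TAA, TAG or TGA). That definition, and the names
codons, orfsInFrame and forwardOrfs, are assumptions made for this example;
korfu's own rules and its use of the 'bio' library may well differ.

```haskell
import Data.Char (toUpper)
import Data.List (unfoldr)

-- Split a sequence into consecutive codons, dropping an incomplete tail.
codons :: String -> [String]
codons = unfoldr step
  where
    step s = case splitAt 3 s of
      (c@[_, _, _], rest) -> Just (c, rest)
      _                   -> Nothing

-- The three stop codons of the standard genetic code.
isStop :: String -> Bool
isStop c = c `elem` ["TAA", "TAG", "TGA"]

-- ORFs within one frame: skip to the next ATG, read up to and including
-- the first stop codon, then continue after it. An ATG that is never
-- followed by an in-frame stop is discarded.
orfsInFrame :: [String] -> [[String]]
orfsInFrame cs =
  case break isStop (dropWhile (/= "ATG") cs) of
    (body@(_ : _), stop : rest) -> (body ++ [stop]) : orfsInFrame rest
    _                           -> []

-- ORFs in the three forward reading frames, tagged with the frame offset.
-- The reverse strand is not handled in this sketch.
forwardOrfs :: String -> [(Int, String)]
forwardOrfs dna =
  [ (frame, concat orf)
  | frame <- [0, 1, 2]
  , orf <- orfsInFrame (codons (drop frame (map toUpper dna)))
  ]

main :: IO ()
main = mapM_ print (forwardOrfs "ccATGaaaTAAgg")
```

This version reports non-overlapping ORFs only; an ATG nested inside a longer
ORF is not reported separately.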

  Installation
  ------------

You need the GHC compiler or, if you know what you are doing, another
Haskell compiler or interpreter with Cabal.  You also need to install
the 'bio' library (darcs get http://malde.org/~ketil/bio).

With those in place, you should be able to run:

     chmod +x Setup.hs
     ./Setup.hs configure
     ./Setup.hs build
     sudo ./Setup.hs install

Optionally, add "--prefix $HOME" (without the quotes) after configure
to install to your home directory.

  Usage
  -----

        korfu -[a|f|s] [fasta1.seq [fasta2.seq..]]

Where:

  -a  prints all (overlapping) ORFs
  -f  outputs translated sequences in FASTA format
  -s  is not implemented, but will eventually output some statistics

If no files are specified, korfu will read from stdin.
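
To show the kind of output -f refers to, here is a small sketch that
translates a nucleotide sequence with the standard genetic code and wraps the
result as a FASTA record. The function names, the 'X' placeholder for
unrecognised codons, the 60-column line width and the header text are all
assumptions made for this example; they are not taken from korfu itself.

```haskell
import Data.Char (toUpper)
import Data.Maybe (fromMaybe)

-- Standard genetic code, with codons enumerated in TCAG order.
codonTable :: [(String, Char)]
codonTable = zip allCodons aminoAcids
  where
    bases      = "TCAG"
    allCodons  = [[a, b, c] | a <- bases, b <- bases, c <- bases]
    aminoAcids =
      "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

-- Translate codon by codon from the first base; stop codons become '*',
-- unknown codons (for example those containing N) become 'X', and an
-- incomplete trailing codon is dropped.
translate :: String -> String
translate = go . map toUpper
  where
    go (a : b : c : rest) = fromMaybe 'X' (lookup [a, b, c] codonTable) : go rest
    go _                  = []

-- Format a header and a sequence as one FASTA record, 60 residues per line.
fastaRecord :: String -> String -> String
fastaRecord header residues = unlines (('>' : header) : chunk residues)
  where
    chunk [] = []
    chunk s  = let (line, rest) = splitAt 60 s in line : chunk rest

main :: IO ()
main = putStr (fastaRecord "example_orf" (translate "ATGAAATAA"))
```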


  Bugs
  ----

I'm not finished with the features yet.  Please come back later.

  - annotate Kozak or Shine-Dalgarno patterns
  - calculate amino acid composition
  - codon bias
  - later: input ACE files, and calculate Ka/Ks etc.
  - quality?  frame shifts?

Done:

  * poly-A bonus

Todo:

  * quality: variable "p" (penalty)
  * improved path selection (M/STP bonus)
  * output quality values
  * linker sequence bonus - we use GAACTT(....AAA..AAA)CTCGAG, is this universal?

  Related work
  ------------

(List at http://biolinfo.org/EST/orf.html)

  * ESTSCAN2 - uses HMM of hexanucleotides, can fix gaps (like GENSCAN)
  * DECODER (RIKEN - not found)
  * DIANA-EST - ANN + "statistics", 90% accuracy in a test
  * Diogenes - not specific
  * TargetIdentifier - blastx-based, no frame-shift support?
    https://fungalgenome.concordia.ca/tools/docs/TargetIdentifier_faq.html

Codon bias by organism:
 http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=72036&aa=1&style=N
GC is often predominant in the first codon position, less so in the second
and (least in?) the third.

Calculate overall GC, and adjust?
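
One way to explore this idea is to compare the overall GC fraction with the GC
fraction at each codon position. The sketch below does only that measurement;
it assumes the sequence is already in frame (position 0 is the first base of a
codon), and the names gcFraction, positionBases and gcByPosition are
hypothetical. How the numbers would feed into korfu's scoring is left open.

```haskell
import Data.Char (toUpper)

-- True for G or C, in either case.
isGC :: Char -> Bool
isGC c = toUpper c `elem` "GC"

-- Fraction of G/C bases in a sequence; 0 for an empty sequence.
gcFraction :: String -> Double
gcFraction [] = 0
gcFraction s  = fromIntegral (length (filter isGC s)) / fromIntegral (length s)

-- Bases found at codon position p (0, 1 or 2), reading in frame 0.
positionBases :: Int -> String -> String
positionBases p s = [c | (i, c) <- zip [0 :: Int ..] s, i `mod` 3 == p]

-- GC fraction at the first, second and third codon positions.
gcByPosition :: String -> [Double]
gcByPosition s = [gcFraction (positionBases p s) | p <- [0, 1, 2]]

main :: IO ()
main = do
  let sq = "GATGATGAT"
  print (gcFraction sq)
  print (gcByPosition sq)
```

Comparing each per-position value against the overall fraction is one possible
form of the "adjust" step, so that GC-rich and GC-poor organisms are judged on
the same footing.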