Synopsis
--------

korfu - Ketil's ORF Utility

Reads nucleotide sequences and outputs potential ORFs.

Installation
------------

You need the GHC compiler, or if you know what you are doing, another
Haskell compiler or interpreter with Cabal.

You also need to install the 'bio' library
(darcs get http://malde.org/~ketil/bio)

With those things in place, you should be able to do

    chmod +x Setup.hs
    ./Setup.hs configure
    ./Setup.hs build
    sudo ./Setup.hs install

Optionally, add "--prefix $HOME" (without the quotes) after configure
to install to your home directory.

Usage
-----

    korfu -[a|f|s] [fasta1.seq [fasta2.seq..]]

Where:

    -a  prints all (overlapping) ORFs
    -f  outputs translated sequences in FASTA format
    -s  is not implemented, but will eventually output some statistics

If no files are specified, korfu will read from stdin.

Bugs
----

I'm not finished with the features yet.  Please come back later.

 - annotate Kozak or Shine-Dalgarno patterns
 - calculate amino acid composition
 - codon bias
 - later: input ACE files, and calculate Ka/Ks etc
 - quality?  frame shifts?

Done:
 * poly-A bonus

Todo:
 * quality: variable "p" (penalty)
 * improved path selection (M/STP bonus)
 * output quality values
 * linker sequence bonus
   - we use GAACTT(....AAA..AAA)CTCGAG, is this universal?

Related work:
-------------

(List at http://biolinfo.org/EST/orf.html)

ESTSCAN2 - uses HMM of hexanucleotides, can fix gaps (like GENSCAN)
DECODER (riken - not found)
DIANA-EST - ANN + "statistics", 90% accuracy in a test
Diogenes - not specific
TargetIdentifier - blastx-based, no frame-shift support?
    https://fungalgenome.concordia.ca/tools/docs/TargetIdentifier_faq.html

Codon bias by organism:
    http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=72036&aa=1&style=N

GC is often predominant in first position, less so in second and
(least in?) third.  Calculate overall GC, and adjust?
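
The note above on GC content per codon position can be made concrete
with a small sketch.  This is not korfu's code (korfu is written in
Haskell); it is a stand-alone Python illustration, and the function
name gc_by_codon_position and its frame parameter are hypothetical.
It counts G/C at each of the three codon positions for one reading
frame and also reports the overall GC fraction, which is the number
one would compare against (or subtract) if adjusting for base
composition.

```python
def gc_by_codon_position(seq, frame=0):
    """Return (overall_gc, [gc_pos1, gc_pos2, gc_pos3]) for one frame.

    seq   -- nucleotide sequence as a string (case-insensitive)
    frame -- offset 0, 1 or 2 at which the first codon starts
    Bases other than A, C, G, T (e.g. N) are skipped when counting,
    but still occupy a codon position.
    """
    gc = [0, 0, 0]      # G/C count per codon position
    total = [0, 0, 0]   # unambiguous base count per codon position
    for i, base in enumerate(seq.upper()[frame:]):
        if base not in "ACGT":
            continue
        pos = i % 3
        total[pos] += 1
        if base in "GC":
            gc[pos] += 1
    per_pos = [g / t if t else 0.0 for g, t in zip(gc, total)]
    n = sum(total)
    overall = sum(gc) / n if n else 0.0
    return overall, per_pos


if __name__ == "__main__":
    overall, per_pos = gc_by_codon_position("GCAGCTGCC")
    print("overall GC:", overall)
    print("GC by codon position:", per_pos)
```

Whether the per-position values should be adjusted by subtracting the
overall GC fraction, by dividing by it, or in some other way is left
open here, as in the note above.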