* Author: Wesley Carl Maness
* Email: wesley.maness@yale.edu
* Class: MBB452/CPSC752
* Assignment: Final Project
* Due: April 29, 2005
* By comparing the frequency of codons in a region of a species' genome,
* read in a given frame, with the typical frequency of codons in that
* species' genes, it is possible to estimate the likelihood that the
* region codes for a protein in that frame. Regions in which codons
* are used with frequencies similar to the typical species codon
* frequencies are likely to be coding (exons), while regions in
* which the codon distribution is uniform -- 1/64 (0.015625) -- can
* be considered non-coding (introns).
*
* Algorithm: My implementation uses the Table of Human Codon Frequencies,
* which is on a log-likelihood scale. I slide a window of size 120
* (configurable) along the entire raw DNA sequence and, for each window,
* compute its log-likelihood value from the table. I then scan the array of
* these values and identify runs of positive values: windows with positive
* values are more likely to lie in coding regions for Homo sapiens.
* I simply scan until I find the first positive value, continue to the last
* positive value of that run, display the run, and repeat.
*
* Reference: The Table of Human Codon Frequencies is from the Weizmann
* Institute of Science.
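The algorithm described above can be sketched roughly as follows. This is a minimal illustration, not the code in CodingRegionFinder.java: the class and method names (WindowScanSketch, scoreWindow, scoreAll, positiveRuns) are hypothetical, the two table entries are made-up values rather than numbers from freq.txt, and it assumes the window size is counted in bases, that windows advance one codon (3 bases) at a time, and that codons missing from the table contribute zero.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WindowScanSketch {

    /** Sums the per-codon log-likelihood scores of seq[start, start + window), read in frame from start. */
    public static double scoreWindow(String seq, int start, int window, Map<String, Double> table) {
        double score = 0.0;
        for (int i = start; i + 3 <= start + window; i += 3) {
            Double s = table.get(seq.substring(i, i + 3));
            if (s != null) {
                score += s; // assumption: codons not in the table (e.g. containing N) add 0
            }
        }
        return score;
    }

    /** Scores every window of the given size, advancing by step bases each time. */
    public static double[] scoreAll(String seq, int window, int step, Map<String, Double> table) {
        if (seq.length() < window) {
            return new double[0];
        }
        int n = (seq.length() - window) / step + 1;
        double[] scores = new double[n];
        for (int k = 0; k < n; k++) {
            scores[k] = scoreWindow(seq, k * step, window, table);
        }
        return scores;
    }

    /** Returns {first, last} window indices (inclusive) for each maximal run of positive scores. */
    public static List<int[]> positiveRuns(double[] scores) {
        List<int[]> runs = new ArrayList<>();
        int i = 0;
        while (i < scores.length) {
            if (scores[i] > 0) {
                int first = i;
                while (i + 1 < scores.length && scores[i + 1] > 0) {
                    i++; // extend the run to its last positive window
                }
                runs.add(new int[] {first, i});
            }
            i++;
        }
        return runs;
    }

    public static void main(String[] args) {
        // Made-up log-likelihood values for illustration only (not from freq.txt).
        Map<String, Double> table = new HashMap<>();
        table.put("GCC", 0.5);
        table.put("TAA", -0.7);

        String seq = "TAATAAGCCGCCGCCTAATAA";
        int window = 6; // the project default is 120; a tiny window keeps the toy example readable
        int step = 3;
        double[] scores = scoreAll(seq, window, step, table);
        for (int[] run : positiveRuns(scores)) {
            int firstBase = run[0] * step;
            int lastBase = run[1] * step + window - 1;
            System.out.println("windows " + run[0] + ".." + run[1]
                    + " cover bases " + firstBase + ".." + lastBase);
        }
    }
}
```

With this toy input only windows 2 and 3 (the GCC stretch) score positive, so a single run is reported. A real run would load the table from freq.txt and would probably repeat the scan for each reading frame.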
NOTE: Because of browser policies on active content, please use Microsoft's
Internet Explorer. I am not endorsing Microsoft; I am simply suggesting their
browser because its laxer security makes it easier for my applet to run
in-browser. Firefox will cause problems and will not run Coding Finder
correctly. If you get any errors, please email me.
Thanks!
Code Package: all files in one: {zip file}, {CodingRegionFinder.java},
{CodingRegionFinderApplet.java}, {DataLoader.java}, {freq.txt},
and a sample DNA sequence from Homo sapiens dUTPase exon 3 {sequence}.