This quiz will count for four percent of your final grade, and should take no more than twenty minutes to complete.
(2 points)
1 a) Explain briefly how the central idea of dynamic programming is violated in structural alignment.
Dynamic programming splits a big task into small subproblems, each solved once and then assembled into the final answer. In this case, that means computing a similarity matrix and summing it. However, using this matrix to adjust the alignment between the two structures can CHANGE the matrix, so it must be recomputed. Therefore, this is not a job where each calculation can be done once and assembled at the end, since any step can wipe the slate and require the whole matrix to be computed again. (1 point)
1 b) How is this overcome?
This is overcome by iterating -- recalculating the matrix at every step (or at least at every step where it changes and therefore needs to be recalculated). (1 point)
(1 point)
2) What is SQL and what does it stand for?
Structured Query Language -- a simple way to create and access tables and to run database queries (1 point)
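For markers who want a concrete picture of "create and access tables, run queries", here is a minimal sketch using Python's built-in sqlite3 module. The `genes` table, its columns, and its rows are made up for illustration and are not part of the course material.

```python
import sqlite3

# In-memory database; the "genes" table and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE genes (name TEXT PRIMARY KEY, organism TEXT, length INTEGER)"
)
conn.executemany(
    "INSERT INTO genes VALUES (?, ?, ?)",
    [("geneA", "yeast", 1200), ("geneB", "yeast", 800), ("geneC", "E. coli", 950)],
)

# A query: names of yeast genes longer than 1000 bases.
long_yeast_genes = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM genes WHERE organism = ? AND length > ? ORDER BY name",
        ("yeast", 1000),
    )
]
print(long_yeast_genes)
conn.close()
```

The three statements used (CREATE TABLE, INSERT, SELECT) cover the "create", "access", and "query" parts of the expected answer.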
(4 points)
3 a) Define top-down and bottom-up clustering.
Top-down: you tell it how many groups there are. Bottom-up: it decides how many groups / divisions should exist in the data. (2 points)
b) Briefly contrast supervised and unsupervised machine learning.
Supervised: the goal of the analysis is given by the user. When no explicit goal parameter is available, the analysis is called unsupervised. (2 points)
NB this question is SUPER EASY POINTS
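To make the bottom-up idea from 3 a) concrete, here is a minimal sketch of one possible bottom-up scheme: single-linkage merging of 1-D points. The function name `bottom_up_clusters`, the `max_gap` stopping threshold, and the sample points are all assumptions chosen for this illustration; the course may have used a different method.

```python
def bottom_up_clusters(points, max_gap):
    """Single-linkage agglomerative clustering of 1-D points.

    Every point starts in its own cluster. The two closest clusters are
    merged repeatedly until the closest pair is more than max_gap apart.
    """
    clusters = [[p] for p in sorted(points)]
    while len(clusters) > 1:
        # In sorted 1-D data the closest clusters are always neighbours,
        # so only the gaps between adjacent clusters need checking.
        gaps = [
            clusters[i + 1][0] - clusters[i][-1]
            for i in range(len(clusters) - 1)
        ]
        smallest = min(gaps)
        if smallest > max_gap:
            break
        i = gaps.index(smallest)
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


# Hypothetical data: the number of groups is not given in advance.
groups = bottom_up_clusters([1, 2, 3, 10, 12, 30], max_gap=3)
print(groups)
```

Note that the number of groups is never passed in; it falls out of the data and the stopping threshold, which is the contrast with the top-down case described in the answer.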
(2 points)
4) Briefly, what does it mean to NORMALIZE a table?
ANSWER --
To split data into separate tables so that updating is easy and redundancy is minimized.
Properly, it is 3 steps:
(They do not need to write this much; this is so RINN can mark better. This is the full and proper definition, and they need to have some concept of it.)
The first normal form: The key
In order to make a table meet the criteria of the first normal form, the database designer must ensure that every row is unique, that every cell in a column utilizes the same data type, and that each cell only holds one value. Typically, the designer needs to focus on removing any repeating groups to move a database schema into the first normal form.
The second normal form: The whole key
For the second normal form, you must ensure that all nonkey columns are dependent on the whole key. This is targeted at tables with composite keys: no nonkey column may depend on only part of the key.
The third normal form: Nothing but the key
The third normal form is concerned with removing what are known as transitive dependencies, which occur when nonkey columns are actually dependent on other nonkey columns.
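As an illustration of the short answer (separate tables, easy updates, minimal redundancy), here is a minimal sketch using Python's built-in sqlite3 module. The `organisms` and `genes` tables and their contents are hypothetical. The organism name is stored once and referenced by key, instead of being repeated on every gene row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: organism details live in their own table.
CREATE TABLE organisms (
    organism_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE genes (
    gene_id     INTEGER PRIMARY KEY,
    gene_name   TEXT NOT NULL,
    organism_id INTEGER NOT NULL REFERENCES organisms(organism_id)
);
INSERT INTO organisms VALUES (1, 'S. cerevisiae');
INSERT INTO genes VALUES (1, 'geneA', 1), (2, 'geneB', 1);
""")

# One update fixes the organism name for every gene that refers to it.
conn.execute(
    "UPDATE organisms SET name = 'Saccharomyces cerevisiae' WHERE organism_id = 1"
)

rows = conn.execute(
    "SELECT g.gene_name, o.name "
    "FROM genes g JOIN organisms o ON g.organism_id = o.organism_id "
    "ORDER BY g.gene_name"
).fetchall()
print(rows)
conn.close()
```

Had the organism name been a column of `genes`, it would depend on `organism_id` and not on the gene key (a transitive dependency), and the same correction would have to be made on every row.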
(6 points)
5) You are given the following decision tree (see attached page).
PART ONE
What is the probability that the following organisms will be pathogenic? (Writing a ratio is fine if you can't work out the percentage in your head.)
a) ORGANISM 1: Eukaryotic, introns in 35% of genes, does not live on minimal media
population is 85/15 so chance is 85/100 = 85%
b) ORGANISM 2: Prokaryotic, not thermophilic, has XYZ gene family
175 / (175 + 125) = 175 / 300, or about 58%
c) ORGANISM 3: Prokaryotic, not thermophilic, no additional data known
This one stops before asking the XYZ question. The leaf ratio is 200 / (200 + 200) = 200 / 400, or 50%
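The Part One arithmetic can be checked with a short sketch. It assumes the leaf counts quoted in the answers are read as (pathogenic, non-pathogenic), as the answers themselves imply; the tree is on the attached page and is not reproduced here. The helper name `pathogenic_probability` is made up for this example.

```python
from fractions import Fraction


def pathogenic_probability(pathogenic, non_pathogenic):
    """Fraction of organisms in a leaf or node that are pathogenic."""
    return Fraction(pathogenic, pathogenic + non_pathogenic)


# Counts taken from the answers above, assumed to be
# (pathogenic, non-pathogenic) in each case.
organism_1 = pathogenic_probability(85, 15)
organism_2 = pathogenic_probability(175, 125)
organism_3 = pathogenic_probability(200, 200)

for label, p in [("a", organism_1), ("b", organism_2), ("c", organism_3)]:
    print(f"{label}) {p} = {float(p):.1%}")
```

Either the ratio or the percentage should earn the marks, as the question allows.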
PART TWO
c) It turns out that new data has been published and the tree needs to be revised: now, 50% (instead of 75%) of the eukaryotes in the data set have introns in >20% of their genes (meaning 50%, instead of 25%, do not). Assuming all other RATIOS remain the same, how do your answers to a, b, and c change?
This means that the total numbers in each leaf will change, but since the ratios remain the same, there is no change to the probability in a). There is also no change to b) or c), since the right-hand branch of the tree is not affected by this.
d) Given the conditions in c) above, how do the numbers in leaf X change?
Now we see a change. Previously, the total in leaf X was 75 + 25 = 100, or 25% of the 400 in the above node. This is now changing to 50% of the above node, meaning the total population of leaf X is now 200, instead of 100. The ratio remains the same, so 75/25 becomes 150/50.
e) Which decision pathway is most enriched for pathogenic organisms, meaning what set of conditions is most likely to result in an organism being pathogenic?
Simple: look for the leaf / node with the most pathogenicity. This is the third from the left on the bottom, and the ratio is 125/175. Therefore, the conditions are those which lead to this node:
NOT EUKARYOTIC → NOT THERMOPHILIC → HAS XYZ GENE FAMILY
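The rescaling of leaf X in part d) can be checked with a short sketch. The numbers (a parent node of 400, leaf X at 75/25, and a share rising from 25% to 50%) come from the answer to d); the variable names are made up for this example.

```python
from fractions import Fraction

parent_total = 400               # organisms in the node above leaf X
old_counts = (75, 25)            # leaf X before the revision
new_share = Fraction(1, 2)       # leaf X now holds 50% of the parent node

new_total = parent_total * new_share          # 200 organisms in leaf X
scale = new_total / sum(old_counts)           # how much the leaf grows
new_counts = tuple(int(c * scale) for c in old_counts)

# The within-leaf ratio is unchanged by the rescaling.
old_ratio = Fraction(old_counts[0], sum(old_counts))
new_ratio = Fraction(new_counts[0], sum(new_counts))
print(new_counts, old_ratio == new_ratio)
```

Because only the leaf total changes and not the ratio inside it, the probabilities asked about in Part One stay the same.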
MBB452a/752a Genomics & Bioinformatics QUIZ #2
Wednesday December 3 2003 15 POINTS TOTAL
Name: ___________________________________________ Credit / Audit (C/A)? _____
MBB452a QUIZ 2 with answers -- Michael Seringhaus (michael.seringhaus@yale.edu)