Sequence variations in G-protein-coupled receptors: analysis of single nucleotide polymorphisms
Suganthi Balasubramanian, Yu Xia, Elizaveta Freinkman and Mark Gerstein
We assessed the disease-causing potential of single nucleotide polymorphisms (SNPs) based on a simple set of sequence-based features. We focused on SNPs from dbSNP in G-protein-coupled receptors (GPCRs), a large class of important transmembrane (TM) proteins. Apart from the location of the SNP in the protein, we evaluated the predictive power of three major classes of features to differentiate between disease-causing mutations and neutral changes: (1) properties derived from amino-acid scales, such as volume and hydrophobicity; (2) position-specific phylogenetic features reflecting evolutionary conservation, such as normalized site entropy, residue frequency and SIFT score; and (3) substitution-matrix scores, such as those from the BLOSUM62, GRANTHAM and PHAT matrices. We validated this approach using a control dataset consisting of known disease-causing mutations and neutral variations. Logistic regression analyses indicated that position-specific phylogenetic features, which describe the conservation of an amino acid at a specific site, are the best discriminators of disease mutations versus neutral variations, and that integrating all the features improves discrimination power. Overall, we identify 115 SNPs in GPCRs from dbSNP that are likely to be associated with disease and thus are good candidates for genotyping in association studies.
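
To make the position-specific conservation features concrete, the following minimal Python sketch computes a normalized site entropy and a residue frequency for one column of a multiple sequence alignment. It is an illustration under stated assumptions, not the procedure used in this study: the function names, the exclusion of gaps and non-standard symbols, and the normalization by the logarithm of 20 (the number of standard amino acids) are choices made for this example only.

```python
import math
from collections import Counter

# The 20 standard amino acids; gaps and other symbols are ignored (an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _standard_residues(column):
    """Return the standard amino-acid characters found in an alignment column."""
    residues = [r for r in column.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("column contains no standard amino acids")
    return residues


def normalized_site_entropy(column):
    """Shannon entropy of the column, scaled to [0, 1] by dividing by ln(20).

    0 means a fully conserved site; 1 means all 20 residues are equally frequent.
    This normalization is an assumption for illustration.
    """
    residues = _standard_residues(column)
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return entropy / math.log(len(AMINO_ACIDS))


def residue_frequency(column, residue):
    """Fraction of standard residues in the column that match the given residue."""
    residues = _standard_residues(column)
    return residues.count(residue.upper()) / len(residues)


if __name__ == "__main__":
    # Hypothetical alignment column: one character per aligned sequence.
    column = "LLLLIVLL-L"
    print(normalized_site_entropy(column))
    print(residue_frequency(column, "L"))
```

Under these assumptions, a substitution at a site with low normalized entropy, replacing a residue with high frequency at that site, would be the kind of change that conservation-based features flag as more likely to be deleterious.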