A Fourier Tour of Protein Function Prediction
ECE, Georgia Tech
Predicting the biological functions of proteins from their amino acid sequences is one of the long-standing challenges in biology. A comprehensive solution has remained elusive because the combinatorial space of sequences is vast and our ability to probe it experimentally is limited. In this primer, we view protein function prediction from a signal recovery perspective through the lens of the Fourier transform, also known as the Walsh-Hadamard (WH) transform for sequence functions. We discuss how the WH transform allows us to view a protein function as a multilinear polynomial and to interpret it in terms of epistasis, a familiar concept in statistical genetics. We demonstrate that an intuitive divide-and-conquer strategy can find the polynomial using a number of samples and an amount of time that grow only linearly with the length of the protein sequence. Next, we discuss how we can leverage natural assumptions about the polynomial, such as sparsity, to develop efficient protein function prediction algorithms rooted in signal processing and coding theory.
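To make the polynomial view concrete, the sketch below applies a textbook fast WH transform to a toy function over three binary sites. Everything in it is an illustrative assumption rather than the primer's own method: the binary alphabet (real proteins have 20 amino acids per site), the encoding z_i = (-1)^(x_i), the toy landscape, and the helper name `fwht`. It also evaluates all 2^n sequences, so it shows only what the WH coefficients mean, not the sample-efficient recovery described above.

```python
import numpy as np

def fwht(values):
    """Normalized fast Walsh-Hadamard transform (natural/Hadamard ordering).

    Hypothetical helper for illustration. Input length must be a power of two.
    """
    a = np.asarray(values, dtype=float).copy()
    n = a.size
    h = 1
    while h < n:
        # Butterfly step: combine blocks of length h pairwise.
        for i in range(0, n, 2 * h):
            x = a[i:i + h].copy()
            y = a[i + h:i + 2 * h].copy()
            a[i:i + h] = x + y
            a[i + h:i + 2 * h] = x - y
        h *= 2
    return a / n

# Toy landscape over 3 binary sites (an assumption, not data from the primer):
#   f(x) = 1.0 + 2.0*z0 - 0.5*z0*z2,  with z_i = (-1)^(x_i)
# Sequence index m encodes the sites as bits: x_i = (m >> i) & 1.
num_sites = 3
f = np.empty(2 ** num_sites)
for m in range(2 ** num_sites):
    z = [1 - 2 * ((m >> i) & 1) for i in range(num_sites)]
    f[m] = 1.0 + 2.0 * z[0] - 0.5 * z[0] * z[2]

coeffs = fwht(f)

# Each nonzero coefficient is one monomial of the multilinear polynomial;
# the set bits of its index name the interacting sites (its epistatic order).
for k, c in enumerate(coeffs):
    if abs(c) > 1e-12:
        sites = [i for i in range(num_sites) if (k >> i) & 1]
        print(f"sites {sites}: coefficient {c:+.2f}")
```

Under this encoding, the coefficient at index 0 is the mean effect, the coefficient at index 1 is the additive effect of site 0, and the coefficient at index 5 (bits 0 and 2) is a pairwise epistatic interaction between sites 0 and 2. The remaining coefficients vanish, which is the kind of sparsity the later algorithms exploit.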