Overlapping Protein Coding Regions
A research article entitled "A First Look at ARFome: Dual-Coding Genes in Mammalian Genomes," authored by Wen-Yu Chung1, Samir Wadhawan1, Radek Szklarczyk2, Sergei Kosakovsky Pond3, and Anton Nekrutenko1, contains this interesting opening paragraph:
Coding of multiple proteins by overlapping reading frames is not a feature one would associate with eukaryotic genes. Indeed, codependency between codons of overlapping protein-coding regions imposes a unique set of evolutionary constraints, making it a costly arrangement. Yet in cases of tightly coexpressed interacting proteins, dual coding may be advantageous. Here we show that although dual coding is nearly impossible by chance, a number of human transcripts contain overlapping coding regions. Using newly developed statistical techniques, we identified 40 candidate genes with evolutionarily conserved overlapping coding regions. Because our approach is conservative, we expect mammals to possess more dual-coding genes. Our results emphasize that the skepticism surrounding eukaryotic dual coding is unwarranted: rather than being artifacts, overlapping reading frames are often hallmarks of fascinating biology.
Coding of multiple proteins by overlapping reading frames is indeed an unexpected outcome of a selection process whose options are stochastically generated. The authors allude to the obvious when they observe that dual coding constrains which amino acids each frame can encode. Note the following paragraph and its introductory headline, which contain some helpful definitions:
Dual Coding Is Virtually Impossible by Chance
Before describing our analyses, we define terms used in this paper. A dual-coding gene contains two frames read in the same direction: canonical (annotated as protein coding in literature and/or databases) and alternative. The alternative reading frame (ARF) is shifted forward one or two nucleotides relative to the canonical frame (+1 and +2 ARFs, respectively). To identify dual-coding genes, we used a comparative genomics strategy, because all presently known alternative reading frames are conserved in multiple species. For example, ARFs in Gnas1, XBP1, and INK4A are conserved in all sequenced mammals [8,10,12].
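The frame definitions quoted above can be made concrete with a short sketch. It is only an illustration and not the authors' method: the function names and the example sequence are hypothetical, and it assumes the standard genetic code's three stop codons (TAA, TAG, TGA). It reads a DNA string in the canonical frame (shift 0) and in the +1 and +2 alternative frames, then reports which alternative frames are free of stop codons.

```python
# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq, shift=0):
    """Split seq into complete codons, starting `shift` nucleotides in.

    shift=0 is the canonical frame; shift=1 and shift=2 correspond to
    the +1 and +2 alternative reading frames described in the quote.
    """
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(shift, len(seq) - 2, 3)]


def frame_is_open(seq, shift):
    """Return True if the frame at `shift` contains no stop codon."""
    return not any(codon in STOP_CODONS for codon in codons(seq, shift))


def open_alternative_frames(seq):
    """List the alternative frame shifts (1 and/or 2) that stay open."""
    return [shift for shift in (1, 2) if frame_is_open(seq, shift)]


# Hypothetical 12-nucleotide example, not taken from the article.
example = "ATGGCTGAAAAC"
print(codons(example, 0))                # canonical frame
print(codons(example, 1))                # +1 ARF
print(codons(example, 2))                # +2 ARF
print(open_alternative_frames(example))  # shifts with no stop codon
```

In this toy sequence the +2 frame hits a TGA stop while the +1 frame stays open, which shows how every nucleotide is shared between codons of the overlapping frames.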
It is highly improbable that the coding sequences for two distinct proteins would align exactly as required to confer function on both. The article's conclusion relies on this improbability to infer that a conserved ARF is likely to be functional:
Maintenance of dual-coding regions is evolutionarily costly and their occurrence by chance is statistically improbable. Therefore, an ARF that is conserved in multiple species is highly likely to be functional.
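A rough back-of-the-envelope calculation gives a feel for why a long open alternative frame is unlikely by chance. This is a deliberately simplified sketch and not the statistical technique used in the article: it assumes each codon in the alternative frame is drawn independently and uniformly from the 64 possible codons, of which 3 are stops in the standard genetic code. Real sequences violate both assumptions, since nucleotide composition is not uniform and the alternative frame's codons depend on the canonical frame's codons. The function name is hypothetical.

```python
def prob_frame_open_by_chance(n_codons, n_stops=3, n_total=64):
    """Probability that n_codons random codons contain no stop codon.

    Simplifying assumption: codons are independent and uniformly
    distributed, so each one avoids a stop with probability
    (n_total - n_stops) / n_total.
    """
    return ((n_total - n_stops) / n_total) ** n_codons


for length in (10, 50, 100, 200):
    p = prob_frame_open_by_chance(length)
    print(f"{length:4d} codons: P(no stop) = {p:.6f}")
```

Under these assumptions the probability shrinks geometrically with length, dropping below 1% by 100 codons. Avoiding stop codons is also only the weakest requirement; it says nothing about whether the second frame encodes a functional protein.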
1 Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania, United States of America, 2 Integrative Bioinformatics Institute, Vrije Universiteit, Amsterdam, The Netherlands, 3 Antiviral Research Center, University of California San Diego, La Jolla, California, United States of America