What Happens When the Conclusion is Assumed
'Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families' is an article appearing in PLOS Biology and can be viewed at the linked site. Snippets are included in this post for the purpose of commentary. The first of two:
"The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature."1
This sounds like an exciting project, likely to yield a gold mine of new data. A vast number of hitherto unknown proteins are being discovered, including new protein families and new sequences. But how does this shed light on evolution? One can envision the data eventually being fitted into an evolutionary theoretical framework, but what are the authors able to state now to back the claim that the predicted proteins "shed light on their evolution"? What would have happened if IDists, upon discovering new protein families, had announced that these discoveries shed light on the intelligent design of such proteins? Can you imagine the reaction? The shedding-of-light comment makes sense only when an evolutionary outcome is presumed. But if that is the case, the discovery can hardly be evidence for the conclusion that evolutionary pathways led to the proteins. More from the cited article:
"Our analysis predicted more than six million proteins in the GOS data—nearly twice the number of proteins present in current databases. These predictions add tremendous diversity to known protein families and cover nearly all known prokaryotic protein families. Some of the predicted proteins had no similarity to any currently known proteins and therefore represent new families. A higher than expected fraction of these novel families is predicted to be of viral origin. We also found that several protein domains that were previously thought to be kingdom specific have GOS examples in other kingdoms. Our analysis opens the door for a multitude of follow-up protein family analyses and indicates that we are a long way from sampling all the protein families that exist in nature."2
An acknowledgement that we are a long way from knowing what remains to be known about all the protein families that exist in nature is reason for pause. Might we be in for surprises that do not accord with current theories and make their revision necessary? Let's wait and see.
1. Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families; Shibu Yooseph, Granger Sutton, Douglas B. Rusch, Aaron L. Halpern, Shannon J. Williamson, Karin Remington, Jonathan A. Eisen, Karla B. Heidelberg, Gerard Manning, Weizhong Li, Lukasz Jaroszewski, Piotr Cieplak, Christopher S. Miller, Huiying Li, Susan T. Mashiyama, Marcin P. Joachimiak, Christopher van Belle, John-Marc Chandonia, David A. Soergel, Yufeng Zhai, Kannan Natarajan, Shaun Lee, Benjamin J. Raphael, Vineet Bafna, Robert Friedman, Steven E. Brenner, Adam Godzik, David Eisenberg, Jack E. Dixon, Susan S. Taylor, Robert L. Strausberg, Marvin Frazier, J. Craig Venter; PLOS Biology; http://biology.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pbio.0050016