A well-known mechanism by which new protein-coding genes originate is the modification of pre-existing genes, e.g. by duplication or horizontal transfer. In contrast, many viruses generate protein-coding genes de novo, by overprinting a new reading frame onto an existing ("ancestral") frame. This mechanism is thought to play an important role in viral pathogenicity but has been poorly explored, perhaps because identifying de novo frames is very challenging; a new approach is therefore needed. We assembled a reference set of overlapping genes for which we could reliably determine the ancestral frame, and found that the codon usage of ancestral frames was significantly closer to that of the rest of the genome than was the codon usage of de novo frames. Based on this observation, we designed a method that identifies de novo frames with very good specificity but intermediate sensitivity. Using this method, we predicted that the deltaretrovirus Rex gene originated de novo by overprinting the Tax gene. Intriguingly, we showed that several genes in the same genomic region, encoding proteins that refine or regulate the functions of Tax, also originated de novo. Such "gene nurseries", encoding proteins with complementary functions that originated de novo and/or by horizontal transfer, may be common in viral genomes. Finally, our results confirm that genomic GC content is not the only determinant of codon usage in viruses and suggest that a constraint linked to translation must influence codon usage.
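The core observation, that the ancestral frame of an overlap resembles the rest of the genome in codon usage more than the de novo frame does, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the published method: the function names are hypothetical, Pearson correlation stands in for whatever similarity measure the authors used, and no significance test is applied, so this sketch always makes a call whenever the two similarities differ.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 64 codons in a fixed order, so frequency vectors are comparable.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(seq, offset=0):
    """Relative frequency of each codon in seq, read from the given offset."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    # Codons containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[c] for c in CODONS)
    return [counts[c] / total if total else 0.0 for c in CODONS]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def predict_de_novo_frame(overlap, offset_a, offset_b, reference_freqs):
    """Return the offset of the frame whose codon usage is LESS similar to
    the reference (assumed here to be the candidate de novo frame),
    or None if both frames are equally similar."""
    sim_a = pearson(codon_frequencies(overlap, offset_a), reference_freqs)
    sim_b = pearson(codon_frequencies(overlap, offset_b), reference_freqs)
    if sim_a == sim_b:
        return None
    return offset_a if sim_a < sim_b else offset_b
```

In this sketch, `reference_freqs` would be computed with `codon_frequencies` over the non-overlapping coding regions of the same genome, and `overlap` is the nucleotide sequence shared by the two genes. A real analysis would need a statistical test before making a call, which is consistent with the intermediate sensitivity reported above.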
|Ministerial classification:||Journal article|