An open reading frame (ORF) is a contiguous sequence of nucleotides in DNA or RNA that could encode a polypeptide. It begins with a start codon and continues in the same reading frame, uninterrupted by stop codons, until the first in‑frame termination codon.
Identification and Characteristics
In most organisms the canonical start codon is AUG (ATG in DNA), which specifies methionine. The three stop codons of the standard genetic code are UAA, UAG and UGA. Because nucleic acids are read in triplets, every stretch of double‑stranded DNA contains six possible reading frames: three in the forward direction and three on the reverse complement. An ORF is defined within one of these frames as a run of consecutive codons containing no termination codon; its length is measured in codons. In prokaryotes, ORFs correspond closely to protein‑coding genes because prokaryotic genes generally lack introns. In eukaryotes, coding sequences are often interrupted by introns, so an ORF that is continuous in mature mRNA may be split across several exons in genomic DNA. Short upstream ORFs in 5′ untranslated regions can regulate translation of downstream coding sequences. Overlapping ORFs, where two proteins are encoded in different reading frames on the same nucleotide sequence, are common in viruses and compact bacterial genomes. Identifying ORFs typically involves scanning sequence data for start and stop codons and then assessing coding potential based on codon usage and sequence conservation.
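The scanning step described above can be illustrated with a minimal Python sketch. It is not the algorithm used by any particular tool, and it makes several simplifying assumptions: the input is DNA containing only A, C, G and T, the standard genetic code applies, ATG is the only start codon, and each ORF runs from the first ATG after the previous stop to the next in‑frame stop. The function names and the `min_codons` parameter are hypothetical, chosen for illustration. Coding potential (codon usage, conservation) is not assessed.

```python
# Illustrative sketch: report ORFs in all six reading frames of a DNA sequence.
START = "ATG"                        # DNA equivalent of AUG
STOPS = {"TAA", "TAG", "TGA"}        # DNA equivalents of UAA, UAG, UGA
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, offset, min_codons=1):
    """Yield (start, end) pairs for ORFs in one reading frame.

    start is the index of the first base of the start codon;
    end is the index just past the stop codon.
    """
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == START:
            start = i                      # first ATG since the last stop
        elif start is not None and codon in STOPS:
            n_codons = (i - start) // 3    # codons before the stop
            if n_codons >= min_codons:
                yield start, i + 3
            start = None                   # resume searching after the stop


def find_orfs(seq, min_codons=1):
    """Return (strand, frame_offset, start, end, orf_sequence) tuples.

    Coordinates on the '-' strand refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            for start, end in orfs_in_frame(s, offset, min_codons):
                results.append((strand, offset, start, end, s[start:end]))
    return results


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATAGGG"):
        print(orf)   # ('+', 2, 2, 11, 'ATGAAATAG')
```

Because the sketch keeps only the first ATG after each stop, it reports the longest ORF ending at a given stop codon and does not list shorter ORFs that start at internal in‑frame ATGs. An ORF that has a start codon but runs off the end of the sequence without a stop is not reported.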
Applications in Genomics and Research
ORF prediction is a fundamental step in genome annotation and helps determine the complement of proteins encoded by an organism. Bioinformatics tools such as ORFfinder and GeneMark automatically locate ORFs within sequenced genomes and predict coding sequences. Molecular biologists insert ORFs into expression vectors to produce recombinant proteins in cells or cell‑free systems. In virology, characterization of ORFs reveals strategies such as polyprotein synthesis and ribosomal frameshifting. Comparative genomics uses ORF length and sequence conservation to distinguish functional genes from pseudogenes and from spurious ORFs that arise by chance. Cataloguing ORFs has led to the discovery of numerous small proteins and micropeptides previously overlooked due to their short length.
Open reading frames provide the basic framework for translating nucleic acid sequences into proteins. Understanding and identifying ORFs is essential for interpreting genome sequences, studying gene expression and engineering synthetic constructs.
Related Terms: Codon, Coding sequence, Start codon, Stop codon, Reading frame