Nascent RNA sequencing reveals distinct features in plant transcription

of nascent and steady-state transcripts in Arabidopsis thaliana seedlings using global nuclear run-on sequencing (GRO-seq), 5′GRO-seq, and RNA-seq and reanalyzed published maize data to

showed a lack of enhancer RNAs, promoter-proximal pausing, and divergent transcription in Arabidopsis seedlings and maize, which are commonly present in yeast and humans. In contrast, Arabidopsis and maize genes accumulate RNA polymerases in proximity of the polyadenylation site, a trend that coincided with longer genes

indicate Arabidopsis may regulate transcription predominantly at the level of initiation. Our findings provide insight into plant transcription

commonly bind the proximal promoter around −150 to −50 bp upstream of the transcriptional start site (TSS) ( 1, 2). At the core promoter, located approximately ±50 bp relative to the TSS, basal TFs cooperate with conserved DNA sequence

motifs to orchestrate recruitment of the RNA polymerase (RNAP) ( 1, 3). Transcription has been studied extensively in a number of species ( 1– 3) but not in plant model systems. Studies focusing on promoter-enriched sequences were hindered by the lack of precise TSSs

( 4, 5) but have improved dramatically through techniques such as paired end analysis of transcription start sites (3PEAT) ( 6) and cap analysis gene expression (CAGE) ( 7). Both methods, however, are affected by RNA processing and transcript stability.

RNA sequencing by global nuclear run-on sequencing (GRO-seq) ( 8), precision nuclear run-on sequencing (PRO-seq) ( 9), or native elongating transcript sequencing (NET-seq) ( 10) highlighted the abundance of unstable transcripts in some eukaryotes such as yeast and mammals ( 11), and yet these methods have been difficult to perform in plants. GRO-seq was recently used in maize seedlings and provided

important insight into monocot transcription ( 12) but with limited TSS data and the omission of sarkosyl during the run-on reaction.

Sarkosyl is required to block new RNAP initiation and to allow unhindered elongation and efficient pause release ( 13, 14). We thus sought to optimize traditional GRO-seq for plants using Arabidopsis as a model, with the aim of making it readily available to the community.

Here, we report an adapted GRO-seq method ( 8), as well as a new version of HOMER ( 15), to facilitate analysis of plant next-generation sequencing (NGS) data. In this study, we focus on 7meG-capped transcripts

as generated by RNAP II from 6-day-old Arabidopsis seedlings to identify transcripts of protein-coding genes, microRNAs (miRNAs), and other noncoding RNAs. De novo annotation

of nascent transcripts revealed many unstable noncoding transcripts, although these transcripts were underrepresented in Arabidopsis compared with mammals. Motif analysis identified previously unreported promoter motifs and revealed comparable structures

than in Arabidopsis ( Arabidopsis, r² = 0.57; Human, r² = 0.32; Fig. S5 A), underlining a much tighter correlation between transcription and steady-state RNA levels in Arabidopsis. Only exons were used to avoid bias associated with differential intron length between species. Together with the absence

of promoter-proximal pausing, this correlation suggests that Arabidopsis transcription is regulated more predominantly at the level of transcription initiation than it is in humans.
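To illustrate the kind of comparison behind the reported r² values, the following minimal sketch computes the squared Pearson correlation between per-gene nascent (GRO-seq) and steady-state (RNA-seq) exon read densities. It is not the pipeline used in this study: the function name `r_squared`, the log2 transform, the pseudocount, and the toy values are all assumptions made for illustration only.

```python
import math


def r_squared(nascent, steady, pseudocount=1.0):
    """Squared Pearson correlation of log2-transformed expression values.

    nascent, steady: per-gene exon read densities (same gene order).
    pseudocount: added before the log transform so zero counts are defined
    (an assumption of this sketch, not a parameter taken from the study).
    """
    if len(nascent) != len(steady) or len(nascent) < 2:
        raise ValueError("need two equally long lists with at least 2 genes")

    x = [math.log2(v + pseudocount) for v in nascent]
    y = [math.log2(v + pseudocount) for v in steady]

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Sums of squares and cross-products around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")

    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical exon read densities for five genes (not real data).
    gro_seq = [12.0, 3.5, 40.2, 0.8, 7.1]
    rna_seq = [20.4, 2.9, 55.0, 1.5, 4.3]
    print(f"r^2 = {r_squared(gro_seq, rna_seq):.2f}")
```

Restricting both inputs to exonic reads, as described above, keeps intron length from inflating the nascent signal of long genes relative to their steady-state signal.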

of nascent transcripts and definition of TSSs revealed distinct characteristics of Arabidopsis transcription and their connection to other eukaryotic systems. The lack of divergent transcription in Arabidopsis and likely maize contrasts with the notion that eukaryotic promoters are inherently divergent ( 41). Highly directional initiation of transcription was also observed in Drosophila ( 38). Notably, both Arabidopsis and Drosophila display strong core promoter signatures, suggesting a prominent role for the core promoter and its motifs in mediating transcriptional

directionality. Arabidopsis core promoters were enriched for distinct Inr-like motifs and the TATA-box (80% and 30%, respectively). The strong prevalence

TATA-box binding protein (TBP) gene, plants lack TBP-related factors ( 42). In bilaterally symmetric animals, these factors were shown to support different transcription systems, enabling regulatory

diversity through core promoter motif diversity ( 42, 43). Arabidopsis, on the other hand, encodes two additional eukaryotic RNAPs: RNAP IV and RNAP V, which are integral to the repression of

a subset of genes and transposons through RNA-directed DNA methylation ( 44). These additional RNAPs may reflect a different evolutionary approach to increasing the regulatory diversity of the genome.

GRO-seq identified 9,200 transcripts in 6-d-old Arabidopsis seedlings, of which only 153 were noncoding transcripts generated by RNAP II. This number is considerably lower than in humans

pausing ( 45). eRNAs were reported to mediate release of NELF-dependent pausing ( 46). Therefore, given the absence of NELF, potential eRNAs may not have provided the same selective advantages in plants. In

contrast, however, Zhu et al. ( 25) predicted over 10,000 plant enhancers based on chromatin signatures in leaves and flowers. Without tissue-matched GRO-seq

a high correlation between nascent and steady-state transcript levels argues that Arabidopsis transcription is predominantly regulated at the level of initiation. Li et al. ( 35) reported transcription to be the most regulated step in human gene regulation. In this light, transcription initiation may

RNAP pausing, a major regulator of transcription elongation in mammals ( 48), was observed predominantly downstream of the PAS in Arabidopsis and maize. The underlying mechanism is unknown, but this 3′ pausing is likely a common feature of plant transcription. Previous in vitro yeast

and therefore a higher chance of degradation ( 39). This idea may hold true in plants based on the higher GRO-seq signal compared with RNA-seq for longer Arabidopsis genes, which also show higher amounts of 3′ pausing compared with shorter genes. In addition, DNA methylation was shown to