Abstract
Generation of productive transcripts of protein-coding genes in eukaryotes is a complex, multistep process centrally controlled by the RNA polymerase II (Pol II) complex. The carboxy-terminal domain (CTD) of the largest subunit of the enzyme is modified by differential phosphorylation and plays a key role in orchestrating the multiple events of the process by interacting with a host of transcription-associated proteins (TAPs) at different stages. We analyzed, in silico, the role of CTD serine phosphorylation in the molecular interaction between different TAPs and a representative part of the CTD repeat structure. Using molecular docking, we investigated eight proteins involved in capping, elongation, splicing, 3’ end cleavage, or polyadenylation during transcription. Among the different phosphorylated forms of CTD, the form with the highest affinity for a particular protein was also the form that predominates during the corresponding process. The only exception was Spt4, which bound S2PCTD with an affinity as high as that for S5PCTD, although S5PCTD is the known active form during elongation. The phosphoserine unique to each CTD form associated with a TAP was an important participant in the association between the two molecules. These studies also identified other TAP residues interacting with CTD that had not been recognized as functionally significant in previous studies. These findings add to an emerging body of literature on the regulatory aspects of genomics and proteomics and thus might catalyze future applications for discovery and translational omics science.
Introduction
Several transcription-associated proteins (TAPs), involved either directly in the modification of the primary transcript or in the modification of the polymerase itself, interact with Pol II through the CTD. Several structural motifs mediating these interactions have been identified. CTD-binding factors bind to the CTD via conserved domains such as tryptophan (WW) domains, phenylalanine (FF) domains, and CTD-interacting domains (CIDs) (Fabrega et al., 2003; Meinhart and Cramer, 2004; Verdecia et al., 2000). Examples include the guanylyl transferase domain of the capping enzyme Cgt1 (Fabrega et al., 2003), the WW domain of the peptidyl-prolyl isomerase Pin1 (Verdecia et al., 2000), and the CID of the cleavage/polyadenylation factor Pcf11 (Meinhart and Cramer, 2004).
In the present study, we analyzed the molecular interaction between different TAPs and a representative part of the CTD repeat structure. The CTD assumes different phosphorylation states during transcription. Using molecular docking, we investigated how the alternate forms of CTD participate in the interaction with TAPs and observed that the interaction with the biologically relevant form of CTD was the strongest and the resulting complex structurally robust. We also identified the key residues involved in these interactions for three TAPs.
Materials and Methods
Of the proteins reported to have biochemical or genetic associations with the Pol II CTD, eight proteins involved at various stages of transcription were chosen for the study, and their structures were retrieved from the Protein Data Bank (PDB) (Table 1). For three of these proteins (Cgt1, Scp1, and Pcf11), structures bound to a particular phosphorylated form of CTD are also available in the PDB. The structures of S5PCTD, S2S5PCTD, and S2PCTD were extracted from their respective complexes with Cgt1, Scp1, and Pcf11. The structural coordinates of the CTD peptides, and of the proteins with their CTD partners removed, were energy minimized using Schrödinger's Impact tool (Impact version, 2005).
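The extraction step can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: it assumes standard fixed-column PDB records, that the CTD peptide is a separate chain whose identifier is known, and that phosphoserines carry the wwPDB residue name SEP with phosphate atoms named P, O1P, O2P, and O3P. The optional dephosphorylation mirrors the derivation of the unphosphorylated CTD described under Results. The function name `extract_chain`, the file name, and the chain identifier in the usage comment are hypothetical.

```python
# Phosphate-group atom names of phosphoserine (wwPDB residue SEP); the two
# hydrogens are included in case the file carries explicit hydrogens.
PHOSPHATE_ATOMS = {"P", "O1P", "O2P", "O3P", "HOP2", "HOP3"}


def extract_chain(pdb_lines, chain_id, dephosphorylate=False):
    """Return the ATOM/HETATM records of one chain from PDB-format lines.

    If `dephosphorylate` is True, phosphoserine (SEP) is converted to serine
    by dropping its phosphate atoms and renaming the residue to SER.
    Fixed PDB columns: record 1-6, atom name 13-16, residue 18-20, chain 22.
    """
    out = []
    for line in pdb_lines:
        if line[0:6].strip() not in ("ATOM", "HETATM"):
            continue  # skip headers, TER, END, etc.
        if line[21] != chain_id:
            continue
        res_name = line[17:20].strip()
        atom_name = line[12:16].strip()
        if dephosphorylate and res_name == "SEP":
            if atom_name in PHOSPHATE_ATOMS:
                continue  # remove the phosphate group
            # Keep all other columns; rename residue and use an ATOM record.
            line = "ATOM  " + line[6:17] + "SER" + line[20:]
        out.append(line)
    return out


# Usage (hypothetical file name and chain identifier):
# with open("cgt1_ctd_complex.pdb") as fh:
#     ctd = extract_chain(fh.read().splitlines(), "B", dephosphorylate=True)
```

Slicing by fixed columns keeps every other field (serial number, coordinates, occupancy) byte-for-byte intact, so the output can be passed directly to a minimizer that reads PDB files.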
Docking analysis
Hex 4.5 docking software (Ritchie and Kemp, 2000) was used to estimate the strength of interaction between the proteins and CTD peptides. In Hex's docking calculations, each molecule is modeled using 3D parametric functions that encode shape and surface geometry, electrostatic charge, and potential distributions, so that each property is represented by a vector of coefficients. For each interaction, the docking trial was run for 100 simulations; the resulting solutions were ranked in order of increasing docking energy and grouped into clusters of similar conformation. Each of the eight transcription factors was docked with the four forms of CTD (phosphorylated at Ser5, at Ser2, at both, or at neither). Table 2 summarizes the structural data of the CTDs used in this study. The docking energy was the sum of the contribution from the geometric complementarity of the ligand–receptor pair and the contribution from the various forces between the two molecules.
Table 2 note: the marker indicates the positions of phosphorylated serines.
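The ranking and clustering of docking solutions described above can be illustrated with a short Python sketch. It is not Hex's internal algorithm: the two-term energy (a shape-complementarity term plus an electrostatic term standing in for "various forces"), the RMSD computed in the fixed receptor frame without superposition, the greedy assignment to the lowest-energy representative, and the 3 Å default cutoff are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    """One docking solution: ligand atom coordinates plus two energy terms."""
    coords: list    # [(x, y, z), ...] ligand atoms in the fixed receptor frame
    e_shape: float  # shape-complementarity contribution (assumed term)
    e_elec: float   # electrostatic contribution (assumed term)

    @property
    def energy(self) -> float:
        # Total docking energy taken as the sum of the two contributions.
        return self.e_shape + self.e_elec


def rmsd(a, b):
    """RMSD between two poses of the same ligand, without superposition."""
    sq = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
             for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(sq / len(a))


def rank_and_cluster(poses, cutoff=3.0):
    """Rank poses by increasing energy, then cluster them greedily.

    Each pose joins the first existing cluster whose representative (its
    lowest-energy member) lies within `cutoff` RMSD; otherwise it starts a
    new cluster. Clusters are therefore ordered by their best energy.
    """
    ranked = sorted(poses, key=lambda p: p.energy)
    clusters = []  # clusters[i][0] is the representative of cluster i
    for pose in ranked:
        for cluster in clusters:
            if rmsd(pose.coords, cluster[0].coords) <= cutoff:
                cluster.append(pose)
                break
        else:
            clusters.append([pose])
    return clusters


# Usage with three toy single-atom poses (hypothetical values):
poses = [Pose([(1.0, 0.0, 0.0)], -50.0, -40.0),
         Pose([(10.0, 0.0, 0.0)], -55.0, -40.0),
         Pose([(0.0, 0.0, 0.0)], -60.0, -40.0)]
for rank, cl in enumerate(rank_and_cluster(poses), start=1):
    print(rank, len(cl), cl[0].energy)  # prints "1 2 -100.0" then "2 1 -95.0"
```

Sorting before clustering guarantees that each cluster is represented by its best-scoring member, which is the solution one would carry forward for further analysis.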
The interacting residues in each molecular pair were visualized in a two-dimensional view of the ligand–receptor complex. The docked complexes were first minimized using Schrödinger's Impact software and then analyzed through their corresponding ligplots. Pcf11, Cgt1, and Scp1 were selected for this analysis because their X-ray crystallographic structures in complex with a particular phosphorylated form of CTD were available in the Protein Data Bank. For each of these proteins, the interacting residues were compared across its complexes with all four forms of CTD.
Results
The eukaryotic RNA Pol II interacts with several different proteins as different tasks are accomplished. At the same time, during its journey from the beginning of a transcription unit to its end, the CTD of RNA Pol II assumes several alternate phosphorylated forms. During these transitions, the CTD is likely to initiate the interaction with different proteins at different stages of transcription. The present study was undertaken to examine how the alternate phosphorylated forms of CTD might facilitate the recruitment of the various proteins involved in different transcription-associated events.
CTD structure
Because the structure of the unphosphorylated form of CTD was not available in any publicly accessible database, either as part of the RNA Pol II molecule or in association with any TAP, it was derived by removing the phosphate groups from the available structures of phosphorylated CTDs. These were taken from the complexes of CTD with Cgt1 (Fabrega et al., 2003), Scp1 (Zhang et al., 2006), and Pcf11 (Meinhart and Cramer, 2004). To test whether the derived structures were comparable with one another and not influenced by their source or by the interacting protein, the binding affinities between the different TAPs and the unphosphorylated CTD from each source were determined independently by docking analysis. Each structure was tested individually, and the mean and standard deviation of the binding affinities from the independent docking scores are shown in Table 3. The binding affinities of each protein for the peptides from the three sources varied very little, even though the phosphorylated CTD structures used as input had been solved with different proteins and in different laboratories (Fabrega et al., 2003; Meinhart and Cramer, 2004; Zhang et al., 2006). The unphosphorylated CTD structures derived from these alternate sources were therefore within reliable limits of variation for comparative analysis, which suggests that deriving structural data by minor modification of existing peptide structures may be broadly applicable. For unphosphorylated CTD interactions, the mean binding affinity was used in further analyses.
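The consistency check summarized in Table 3 amounts to computing, for each TAP, the mean and standard deviation of its docking energies against the three source-derived unphosphorylated CTD structures. The sketch below also reports the coefficient of variation (SD divided by the magnitude of the mean), a measure not used in the original analysis, as one way to quantify the small variation. The energies and dictionary keys are hypothetical placeholders, not the values in Table 3.

```python
import statistics


def summarize_scores(scores_by_source):
    """Mean, sample SD, and coefficient of variation of docking energies
    obtained with CTD structures derived from different source complexes."""
    values = list(scores_by_source.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    cv = sd / abs(mean)            # relative spread; small = consistent
    return mean, sd, cv


# Hypothetical energies for one TAP docked to unphosphorylated CTD derived
# from three complexes; these are NOT the values reported in Table 3.
example = {"from_Cgt1": -300.0, "from_Scp1": -310.0, "from_Pcf11": -290.0}
mean, sd, cv = summarize_scores(example)
print(f"mean={mean:.1f} sd={sd:.1f} cv={cv:.3f}")  # mean=-300.0 sd=10.0 cv=0.033
```

A coefficient of variation of a few percent would support treating the derived structures as interchangeable and using their mean in later comparisons.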
Binding affinity of transcription-associated proteins with different phosphorylated forms of CTD
Because different phosphorylated forms of CTD have been reported at different stages of transcription, it has been postulated that these forms facilitate the recruitment of different proteins at different stages (Bartkowiak et al., 2011; Bentley, 2002; Hirose and Manley, 2000; Kim et al., 2010; Maniatis and Reed, 2002; Proudfoot, 2004). To evaluate this idea further, eight proteins involved during different stages of transcription (Table 1) were analyzed by docking for their interaction with the different forms of CTD. The binding affinities for all combinations of molecular pairs are shown in Table 4. The strength of association between the proteins and the different phosphorylated forms of CTD varied widely, suggesting that phosphorylation influences the interaction. For Cgt1 and Set2, the affinity of the most interactive form of CTD was 5- to 10-fold that of the least interactive form, whereas in the other cases the difference between the best and the worst form was less than 5-fold. In all cases except Spt4, the strongest binding was seen with the phosphorylated form reported to be biochemically active in the transcription process. This suggests that the affinity between CTD peptides and TAPs is an important component in holding the molecular complex together.
Table 4 note: the location of the phosphorylated amino acid in each peptide is indicated; the preferred binding form indicated by the literature is shown in bold font.
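Identifying the strongest-binding form and the fold difference for each protein can be sketched as below. Two points are assumptions: the fold difference is taken as the ratio of the most negative to the least negative docking energy (meaningful only if all energies are negative), and forms within a small tolerance of the best energy are treated as tied, as happens for Spt4. The function name, the tolerance, and the energies are hypothetical.

```python
def binding_summary(energies, preferred, tol=1e-6):
    """Summarize one protein's docking energies against the CTD forms.

    energies: mapping of CTD form -> docking energy (more negative = stronger).
    Returns (forms tied for strongest binding, fold difference between the
    strongest and weakest form, whether the strongest form is uniquely the
    literature-preferred one).
    """
    best = min(energies.values())
    worst = max(energies.values())
    if worst >= 0:
        raise ValueError("fold ratio assumes all docking energies are negative")
    # Forms whose energy is within `tol` of the best are treated as tied.
    top_forms = [form for form, e in energies.items() if e - best <= tol]
    # Assumption: "fold" = ratio of energy magnitudes (best / worst).
    fold = best / worst
    return top_forms, fold, top_forms == [preferred]


# Hypothetical energies (arbitrary units), NOT the values in Table 4.
example = {"CTD": -50.0, "S2PCTD": -120.0, "S5PCTD": -420.0, "S2S5PCTD": -200.0}
print(binding_summary(example, preferred="S5PCTD"))  # (['S5PCTD'], 8.4, True)
```

Returning all tied forms, rather than a single minimum, keeps a case like Spt4 visible as an ambiguous match instead of silently picking whichever form happens to come first.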
Cgt1 and Set2 have catalytic functions while associated with the transcription complex. In these two cases, the association with the unphosphorylated form of CTD was very weak, suggesting that the enzymatic reaction may not occur in the absence of CTD phosphorylation. For Spt4, S2PCTD and S5PCTD showed equally strong binding, although only S5PCTD has been reported to be the functional form. Thus, in this case, the structural association did not match the reported function.
Critical residues in the interaction between transcription-associated-proteins and different phosphorylated forms of CTD
The differential affinity of the TAPs for the various forms of CTD indicated their capacity to discriminate among these forms. To understand the differences generated by alternate phosphorylation, the basis of the interaction was studied using ligplot analysis of the protein–CTD complexes. The structures of Cgt1, Scp1, and Pcf11 with their cognate CTD peptides have been solved and are available in the PDB. Docked structures of these three proteins in complex with each of the four phosphorylation states of the CTD peptide were generated, and their ligplots were analyzed to deduce the interacting residues and the nature of the association. The results are tabulated in Table 5. None of these proteins showed any interacting residues with the unphosphorylated form of CTD, reflecting the low binding affinities seen earlier (Table 4). With Cgt1, the unphosphorylated CTD showed no association with any residue of the protein; Scp1 and Pcf11 had relatively stronger binding affinities for the unphosphorylated CTD, but still no specific amino acid contacts were seen (Table 5).
Table 5 note: residues in bold font are from the CTD.
The strongly associating molecular pairs (Cgt1 with S5PCTD, Scp1 with S2PCTD, and Pcf11 with S2PCTD) showed more hydrogen-bonded contacts, suggesting that these contacts contribute substantially to the association. In each of these pairs, the phosphorylated serine residues formed either hydrogen-bonded or nonbonded contacts with the corresponding protein. Thus, phosphorylation of the specific serine appeared to be necessary for establishing an interaction with these proteins.
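The contact bookkeeping behind this comparison can be sketched as follows, assuming the contacts read off each ligplot have been transcribed as (protein residue, CTD residue, contact type) tuples and that phosphoserines are labeled distinctly (here "pS5"). The encoding, the function name, and the contact list are hypothetical and are not taken from Table 5.

```python
from collections import Counter


def contact_summary(contacts, phospho_labels):
    """Count contact types and list the contacts made by phosphoserines.

    contacts: iterable of (protein_residue, ctd_residue, kind) tuples,
              where kind is "hbond" or "nonbonded" (hypothetical encoding
              of the contacts read off a ligplot).
    phospho_labels: CTD residue labels that are phosphoserines, e.g. {"pS5"}.
    """
    contacts = list(contacts)
    kinds = Counter(kind for _, _, kind in contacts)
    pser = [c for c in contacts if c[1] in phospho_labels]
    return kinds["hbond"], kinds["nonbonded"], pser


# Invented contact list for illustration only (not taken from Table 5).
example = [
    ("ArgA", "pS5", "hbond"),
    ("LysB", "pS5", "hbond"),
    ("LeuC", "Y1", "nonbonded"),
    ("GlyD", "S2", "hbond"),
]
n_hb, n_nb, pser_contacts = contact_summary(example, {"pS5"})
print(n_hb, n_nb, len(pser_contacts))  # 3 1 2
```

Running the same summary over the complexes of one protein with all four CTD forms gives the per-form hydrogen-bond counts and shows directly whether the phosphoserine takes part in the interface.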
Discussion
Structure of unphosphorylated CTD
The consensus heptad of the CTD can be extensively modified post-translationally, and the peptide is too small to adopt a firm structural conformation on its own. Obtaining structural coordinates for this peptide for in silico analyses can therefore be difficult. Our approach of using structures solved in association with a scaffold-like binding protein could be a way to overcome this problem. The consistency of the docking scores among the differently sourced peptides with all the proteins tested in this study suggests that this approach is viable.
Docking of TAPs to the phosphorylated CTD variants
Docking analysis showed that, among the phosphorylated forms of CTD, the most active form had the greatest affinity for each of the TAPs analyzed. The only exception was Spt4, which also showed high affinity for S2PCTD. One explanation could be that this analysis was performed with a putative CTD structure of one to two heptads (Table 2) and without the Pol II molecule, whereas in cells the CTD consists of multiple repeats tethered to Pol II. Also, the phosphorylation pattern has been reported to be heterogeneous, albeit predominantly one of the four forms. It is likely that, for the recruitment of TAPs, attraction by a few copies of the CTD heptad in an appropriate configuration would suffice. Alternatively, S2PCTD could be a functional form not reported earlier. Hartzog and colleagues (1998) reported a strong physical interaction between Spt4 and Spt5, and this complex has been shown to function as a positive mediator of transcription elongation (Wada et al., 1998). Lindstrom and Hartzog (2001) reported that the Spt4–Spt5 complex functions early in elongation, and that the kinase Bur1 directly regulates the phosphorylation of Spt4–Spt5 and mediates their function in elongation. Apart from their role in early elongation, many reports indicate the involvement of Spt proteins in activities related to mRNA capping (Wen and Shatkin, 1999). Kaplan and colleagues (2000) reported an association of the Drosophila Spt5 protein with a domain implicated in the regulation of splicing. Our analysis suggests that Spt4 might in fact bind S2PCTD with an affinity equal to that for S5PCTD, and that Spt4 may also be important in the later stages of elongation.
The capping complex adds the m7G cap to the nascent transcript as it exits the core polymerase, stabilizing the mRNA by protecting it from degradation by 5’-3’ exonucleases. The CTD repeats proximal to the core Pol II are ideally placed near the RNA exit tunnel to facilitate this capping reaction, and phosphorylation of S5 plays an important role in recruiting the capping enzyme complex. Ho and Shuman (1999) reported that, in vitro, mammalian RNA guanylyl transferase bound to CTD phosphorylated at either S2 or S5, but its enzymatic activity was stimulated only when the CTD was phosphorylated at S5. In our analysis, Cgt1 bound S5PCTD 3.5-fold more strongly than S2PCTD, and its binding to the other CTD forms was much weaker. Thus, at this position in the transcription complex, S5 phosphorylation is more likely to be functionally consequential than the other phosphorylated forms of the CTD.
Molecular interactions
The salient residues of Cgt1, Scp1, and Pcf11 reported in other published work to interact with CTD are summarized in Table 6. In Cgt1, residues L163 and G164 have been shown to be important for capping activity in many species (Cramer et al., 2001); in our analyses, these residues made nonbonded contacts with the CTD residues Y1 and S2. In the crystal structure reported by Fabrega et al. (2003), CTD docking site 1 (CDS1) was composed of residues K152, R157, and Y165. In our analysis, K152 formed hydrogen-bonded contacts with CTD residues, but R157 and Y165 were not part of any association in the ligplot. Fabrega et al. (2003) also reported a second docking site (CDS2) involving residues R140, D175, and K178 of Cgt1, which agreed with our results. However, the nonbonded contacts of residues K161, T167, and E168 identified in our study have not been reported previously. Since these three residues are polar (and K161 and E168 are charged), they could help stabilize polar and ionic interactions between the protein and the peptide.
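The comparison between literature-reported and docking-observed residues is essentially a set operation. The sketch below applies it to the Cgt1 residues named in this paragraph; it assumes that the CDS2 residues described as agreeing with our results were observed in the docked complexes, and the function name is hypothetical. The same function could be applied to the Scp1 and Pcf11 residues.

```python
def compare_residues(reported, observed):
    """Split residues into those confirmed by docking, reported but not
    seen, and seen but not previously reported."""
    return {
        "confirmed": sorted(reported & observed),
        "not_observed": sorted(reported - observed),
        "novel": sorted(observed - reported),
    }


# Cgt1 residues reported in earlier work, as cited in this section.
reported_cgt1 = {
    "L163", "G164",          # capping activity (Cramer et al., 2001)
    "K152", "R157", "Y165",  # CDS1 (Fabrega et al., 2003)
    "R140", "D175", "K178",  # CDS2 (Fabrega et al., 2003)
}
# Cgt1 residues described in this section as contacting CTD after docking.
observed_cgt1 = {"L163", "G164", "K152", "R140", "D175", "K178",
                 "K161", "T167", "E168"}

for category, residues in compare_residues(reported_cgt1, observed_cgt1).items():
    print(category, residues)
```

For Cgt1 this yields R157 and Y165 as reported but not observed, and K161, T167, and E168 as newly observed, matching the description above.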
In the case of Scp1, Komarnitsky et al. (2000) showed, based on mutational and structural analyses, that residues R178 and K190 are important for phosphatase activity and that D98 functions as a general base in the final catalytic step. In our results, R178 was hydrogen bonded to T4 and also made a nonbonded contact with P6 of the doubly phosphorylated CTD. In addition, K190 formed a hydrogen bond with S7, and D98 made a nonbonded contact with T4. None of these interactions were seen with the other forms of CTD. The docked structures also revealed several residues not reported earlier, including hydrogen-bonded contacts with K140, N187, Y188, and K190 and nonbonded contacts with D121, H125, A153, and S154. The significance of these associations warrants further investigation.
Residues D68, S69, and I28 of Pcf11 were shown by Meinhart and Cramer (2004) to be important for the polyadenylation reaction. In our analyses, D68 formed a hydrogen bond with Y1, and S69 made a nonbonded contact with Y1 of S2PCTD; the other forms of CTD showed no significant associations. In Pcf11 docked with S2PCTD, all the residues listed in the CID formed either hydrogen-bonded or nonbonded contacts. The only exception was K104, a contacting residue that has not been identified in earlier studies.
Conclusion
The structure of the consensus CTD heptapeptide was remarkably consistent across its complexes with TAPs, and this could be used as a basis for deriving the structures of its phosphorylated modifications. With the exception of Spt4, the most active biochemical form of CTD had the highest affinity for each TAP, while the other forms had weaker or no interaction. CTD peptide–TAP interactions were predominantly mediated by hydrogen bonds, and the phosphoserine specific to each CTD form was a key residue in the interaction with the TAPs. These molecular docking studies identified key residues participating in the association between the CTD peptide and three TAPs: Cgt1, Scp1, and Pcf11. Although some of these residues have been reported in earlier studies, not all reportedly important residues were involved in the molecular association. These studies also revealed some residues not identified in earlier biochemical, genetic, or crystallographic studies.
Footnotes
Acknowledgments
The authors thank the management of PSG College of Technology for the infrastructure facilities provided. VS thanks AICTE for financial support through a National Doctoral Fellowship.
Author Disclosure Statement
The authors declare that no conflicting financial interests exist.
