Exploring a Drosophila Transcription Factor Interaction Network to Identify Cis-Regulatory Modules

Abstract

Multiple transcription factors (TFs) bind to specific sites in the genome and interact among themselves to form the cis-regulatory modules (CRMs). They are essential in modulating the expression of genes, and it is important to study this interplay to understand gene regulation. In the present study, we integrated experimentally identified TF binding sites collected from published studies with computationally predicted TF binding sites to identify Drosophila CRMs. Along with the detection of the previously known CRMs, this approach identified novel protein combinations. We determined high-occupancy target sites, where a large number of TFs bind. Investigating these sites revealed that Giant, Dichaete, and Knirp are highly enriched in these locations. A common TAG team motif was observed at these sites, which might play a role in recruiting other TFs. While comparing the binding sites at distal and proximal promoters, we found that certain regulatory TFs, such as Zelda, were highly enriched in enhancers. Our study has shown that, from the information available concerning the TF binding sites, the real CRMs could be predicted accurately and efficiently. Although we only may claim co-occurrence of these proteins in this study, it may actually point to their interaction (as known interaction proteins typically co-occur together). Such an integrative approach can, therefore, help us to provide a better understanding of the interplay among the factors, even though further experimental verification is required.

1. INTRODUCTION

Multiple transcription factors (TFs) bind to a specific site in the genome. These sites are noncoding DNA sequences of a few hundred base pairs (bp) to several thousand base pairs long and are referred to as cis-regulatory elements (CREs) (Ong and Corces, 2011; Wittkopp and Kalay 2011). The binding of TFs to these CREs may result in the activation of chromatin remodeling (Siggers et al., 2011; Slattery et al., 2011) or the recruitment of a transcription preinitiation complex, which ensures the regulation of genes. The transcription factor binding site (TFBS) motifs are short in length and often imprecise in nature and are referred to as degenerate TFBSs (Stormo, 2000; Bulyk, 2004). This makes the accurate identification of TFBSs an abiding challenge (Wasserman and Sandelin, 2004; Grove and Walhout, 2008). In addition, most of the computationally predicted functional TFBSs are not always available for binding, presumably due to chromatin structure and/or local epigenetic properties. TFBSs often show preferential co-occurrence with other TFBSs (Cao et al., 2010; Yao et al., 2013). The spatial combination of such multiple TFBSs is nonrandom in nature and forms the cis-regulatory modules (CRMs) (Ludwig et al., 1998; Yuh et al., 1998; Krivan and Wasserman, 2001).

For example, even-skipped (eve) stripe 2 enhancer in Drosophila contains multiple binding sites of Kr, Bcd, Gt, and Hb within 500 bp (Small et al., 1992). With the advent of advanced techniques, such as immunoprecipitation of in vivo cross-linked chromatin followed by microarray (ChIP-chip) or sequencing (ChIP-seq), the precise locations of TFBSs can be determined. However, such high-throughput empirical methods remain expensive and technically challenging despite advancements in, and decreasing costs of, the respective technologies. However, computational approaches can be a quicker and more cost-effective solution for CRM or TFBS prediction.

A number of in silico studies have been carried out to determine CRMs (Su et al., 2010; Aerts, 2012; Hardison and Taylor, 2012; Nandi et al., 2013; Suryamohan and Halfon, 2015). These studies (Frith et al., 2001; Pilpel et al., 2001; Berman et al., 2002; Markstein et al., 2002; Rajewsky et al., 2002; Frith et al., 2003; Sharan et al., 2004; Gotea and Ovcharenko, 2008; Kazemian et al., 2013; Spivak and Stormo, 2016) often depend on the computational identification of individual TFBS across the genome. The parameters required for these methods may include prior information on interacting TFs and the expected distance between TFBSs, prior knowledge of the co-regulated genes, and information on specific search areas, such as promoters of certain genes, with a background model for the search space or sequences. Some of the methods (Grundy et al., 1997; Guhathakurta and Stormo, 2001; Sinha et al., 2006; Kim et al., 2010; Wu and Xie, 2010) first deduce the probabilistic grammar rules for the functional module, which are later detected in the genome or preferred search area. Such models predict CRMs by comparing the observed likelihood of the CRMs with the likelihood of CRMs generated by the background models (Aerts, 2012).

As mentioned above, a computational prediction is not free from random co-occurrence, and this could arise from an inappropriate search space. Some methods, instead of considering the whole genome as a search space, target evolutionary conserved space (Chiang et al., 2003; Kellis et al., 2003; Blanchette et al., 2006; Hallikas et al., 2006; Jiang and Singh, 2014; Spivak and Stormo, 2016), which is again not a trivial feat. There are few methods to examine the experimental ChIP sequencing data as a search space to improve CRM prediction (Aguilar and Oliva, 2008). The efficiency of CRM identification can be improved by integrating more information from experimental data. For example, certain regulatory regions, such as enhancers, can be predicted with high accuracy if certain epigenetic features are considered (Hon et al., 2009), such as the monomethylation of lysine 4 of histone H3 (H3K4me1) (Heintzman et al., 2007), the acetylation of histone (Roh et al., 2005; Calo and Wysocka, 2013), or the binding of coactivator CBP/p300 (Heintzman et al., 2007) to the genome.

Several experimental studies have been conducted to identify and categorize the TF-TF interactions in the model organism Drosophila (Giot et al., 2003; Stanyon et al., 2004; Guruharsha et al., 2011; Rhee et al., 2014). A number of studies based on co-expression have been carried out (Tomancak et al., 2007; Adryan and Teichmann, 2010) to explore the TF interactome. Although these studies have categorized a large number of TF-TF interactions in Drosophila, the catalog is still far from complete, since these studies do not include the complete range of TFs. Moreover, there are some databases available that contain a large volume of binding site information of numerous TFs. For example, TRANSFAC (Matys et al., 2006) and JASPAR (Mathelier et al., 2015) contain the binding site information of the TFs in the form of aligned DNA motifs or frequency tables. These databases can be utilized to predict the binding sites and determine the CRMs.

Therefore, to complement the present gap in the experimental studies, we integrated experimental data and computational predictions to obtain additional speculative associations among TFs. This integration helped us to identify novel combinations in addition to the known CRMs. This analysis also identified paired (prd) as an essential factor during developmental processes along with others, such as Zelda (zld). Moreover, after investigating the TF co-occurrence in enhancers, Zelda and Dichaete (D) were found to be highly enriched.

Many different TFBSs can occupy a compact genome locus, referred to as a high-occupancy target (HOT) region (Moorman et al., 2006; Macarthur et al., 2009; Kvon et al., 2012). These studies have revealed that numerous factors, often functionally unrelated, bind to these HOT regions, which result in the activation of nearby genes. Studies in Drosophila (Weigel et al., 1990; Borok et al., 2010) have shown that genomic loci for binding sites of several TFs and cofactors strongly overlap with each other. Previous studies have indicated the sequence signature of certain TFs such as Vielfaltig/Zelda and Trithorax-like/GAGA in these HOT regions to be enriched. In the present study, we identified the HOT regions using data sets similar to those previously used (Consortium, 2010). The observations made in our study indicated that the binding motifs of known TFs are depleted in the HOT regions, although we used more known motifs in our study. Furthermore, the present study has shown that the de novo predicted motifs, Giant, Dichaete, and Knirp, are enriched in the HOT regions.

2. METHODS

2.1. Identification of genomic binding locations/regions of TFs

We determined the genomic binding locations of the recent chromatin immunoprecipitation experiment for the genome-wide binding of 38 TFs in Drosophila (Macarthur et al., 2009; Contrino et al., 2012) by the following method (Supplementary Table S1). All ChIP-chip raw data sets were converted to a current genomic version by using tools available in the UPCSC database (Siepel et al., 2005). The intensity ratio of chromatin immunoprecipitation by antibodies (Ip) and for control DNA input (In) as well as the ratio of Ip to mock precipitation by IgG (IgG) were calculated using the Tiling Array Software (TAS). TAS was used with default parameters and Probe Analysis settings: bandwidth, 300; Test Type, one side upper; perfect match probe intensities. Bound regions were defined by array probes that passed the three standard deviation cutoff within a window size of 200 bp. Regions of less than 5 probes were excluded, and a region was extended as long as there was a value within 200 bp of the previous value. The midposition of the six highest consecutive probe values was set as the center of each detected region. The value of each detected region was set as the average of the highest six probe ratios. After the region had been detected, all the binding values of TFs were linearly adjusted to 0 and 1. In this current study, we compared the identified binding sites with two more available experimental data (Negre et al., 2011; Paris et al., 2013). These studies mapped a large number of TFs to the Drosophila genome. The binding sites obtained from these studies highly overlap with our identified binding sites.

2.2. Prediction of binding sites in Drosophila promoters

The genomic coordinates of the transcription start sites (TSSs) of all known genes were obtained from the Drosophila melanogaster genome (version dm3), downloaded using BioMart (Durinck et al., 2005). The promoter sequences, defined as 1000 bp upstream and 1000 bp downstream sequences around the TSS, were extracted from the Drosophila R5 genome sequence (Marygold et al., 2013). Binding site information in the form of a frequency table was obtained from JASPAR (Version: 5.0 ALPHA) (Mathelier et al., 2015), and position weight matrices (PWMs), which represent the log-odd probabilities of finding each base at each position, were calculated from the frequency table using the method described in Gershenzon et al. (2005) and Nandi and Ioshikhes (2012). The PWMs were then applied to the extracted promoter sequences to predict the binding sites.

The threshold was derived from a background model of randomized promoter sequences, which were created by shuffling the promoter sequences (Nandi et al., 2013). Each PWM was applied to the randomized promoter sequences to build the background model. The background model generated the “Randomized Occurrence Frequency” (OFr) at a given threshold, which essentially represents how many hits could be obtained by chance for the threshold. The threshold value was selected for each PWM by choosing the threshold that generated OFr value of 0.0001 with an intention to minimize the detection of false positives from the promoter sequences.

2.3. Co-occurrence count

In the genomic binding location, if any two factors were found to bind within the range of 100 bp upstream or 100 bp downstream of each other, these factors were considered as associated/co-occurring factors. From the complete genome-wide binding data, the number of times a factor was associated with another factor was retrieved using the criterion. We call this number the co-occurrence count. This co-occurrence count is represented by the thickness of the edges between two nodes, which show the magnitude or number of times each pairwise combination was observed. The widths of the edges were scaled as the ratio of pairwise co-occurrence count of two given factors with the highest pairwise co-occurrence count of any two factors in the particular network. The width was calculated using the following formula: $w i d t h_{(a, b)} = \frac{μ_{(a, b)}}{\begin{matrix} N \end{matrix}},$

where μ_(a,b) = co-occurrence count of the combination of factors a and b; N = total number of TFBSs.

2.4. Generation of network

The network layout was drawn using Yifan Hu layout implemented in Gephi (Gephi 0.9.1, https://gephi.org) (Hu, 2005). The following parameters were used: optimal distance = 250, relative strength = 0.2, initial step site = 20.0, step ratio = 0.95, adaptive cooling is checked, and theta = 1.2. The bigger the optimal distance and the relative strength make the nodes farther apart. The smaller theta means better accuracy. Eigenvector centrality was calculated with the algorithm implemented in Gephi. The eigenvector centrality is based not only of the degree of the connections to the nodes, but it also favors the connection to the central nodes (Bonacich, 2007).

2.5. Statistical significance of the co-occurring factors

Identified binding sites conforming to the criterion given above were used to determine the co-occurrence. The co-occurrence count was obtained from the designated promoters and compared with the count from shuffled sequences. The z-score was then calculated as follows: $z - s c o r e = \frac{(O b s - e x p)}{\sqrt{e x p}},$

where Obs is the observed co-occurrence count in the designated promoters, and exp is the expected co-occurrence count in the shuffled promoter sequences. The combinations were selected if the z-score is >3.

2.6. Protein–protein interaction

The network data collected from DroPNet (Yu et al., 2008; Renaud et al., 2012) and Cytoscape (Shannon et al., 2003) were used for better visualization and editing. The connections in the network are based on human interologs, DPIM (Drosophila Protein Interaction Mapping) co-AP/MS, or other physical interactions. Human orthologs in Drosophila were used due to the protein interaction data of the corresponding TFs being unavailable. DPIM co-AP/MS (co-affinity purification combined with mass spectrometric analysis) is a Drosophila protein interaction map using affinity-tag-based protein purification followed by mass spectrometry-based proteomics. Other physical interaction data sets are based on a yeast two-hybrid system retrieved from DroID (Renaud et al., 2012).

2.7. Identification of the HOT regions

The binding profiles of 38 early embryonic TFs were used to identify the genomic regions where more than one TFBSs were found within 200 bp. The central peak of a TF binding region was used as an initial point, and all other peak centers were included within a certain region if their peaks were not separated by more than 200 bp from each other. The HOTness value of a genomic region was calculated based on the number of TFBSs within the genomic region. For the occurrence of each TFBS in the region, 1 was added to the HOTness value. And if the TFBS was already present in the given region, 0.5 was added to the HOTness value (Supplementary Fig. S1). All the determined TFBSs were assigned to one of five classes according to their HOTness values. Class I contained TFBSs with a HOTness value of 1 to 1.5, Class II (2 ≤ HOTness <6), Class III (6 ≤ HOTness <8), Class IV (8 ≤ HOTness <13), and Class V (13 ≤ HOTness). Regions in which ≥8 TFs bind were defined as the HOT regions.

2.8. De novo motif identification

We used previously developed tools, Sigfinder [Sigfinder V2012, with default parameters with probe setting parameter: createProbesBylength(6)] and Aligner (Aligner V2012, with default parameters), for de novo motif discovery (Philip et al., 2012). Supervised multivariate method, orthogonal partial least squares discriminant analysis (OPLS-DA) (Bylesjo et al., 2006), was implemented to identify motifs or sequences that could be used to predict TF recruitment in the HOT regions. For motif prediction, the peak centers were used together with the region, which extended to 100 bp both upstream and downstream. Only intronic regions of Class I HOTness that were strongly bound by the corresponding TFs were selected as a positive set. Class I regions, which were bound by TFs other than their corresponding one, were randomly selected and used as a training set for constructing the model. The top 1000 sequence words from each OPLS-DA model were used. Aligner detects sequence differences between genes strongly bound and weakly bound by the TFs based on the top 1000 sequence words obtained from the OPLS-DA models.

2.9. Identification of enhancers

Heintzman et al. (2007) have shown that enhancers are enriched with H3K4me1 and CBP/p300 protein, which form the signature of active transcriptional enhancers. In the present study, to predict the enhancers, the H3K4me1 regions and CBP/p300 bound regions were obtained from modENCODE (Consortium, 2010) for different time points, from early stage to 12 hours. These regions were then overlapped, which narrowed down the search space.

3. RESULTS

3.1. Higher association and CRMs among the TFs at early developmental stages

The experimentally determined TFBS information (Macarthur et al., 2009; Contrino et al., 2012) was used to determine the cooperative binding or association among the factors across the genome. Based on the developmental stages considered, the data were divided into two groups: 0–3 and 0–12 hours. From the experimentally identified genomic binding locations, closely associated or co-occurring TFs were used to derive a network (Fig. 1A). Factors Vielfaltig/Zelda (zld), Ultrabithorax (Ubx), Zn finger homeodomain 1 (zfh1), twist (twi), and Stat92E (stat) were observed to be highly connected with all the factors in the experimental study. These factors were followed by snail (sna), schnurri (shn), zeste (z), paired (prd), sloppy paired 1 (slp), runt (run), medea (med), and Mothers against dpp (Mad). Previous studies have indicated that these factors are involved in the developmental processes (Lewis et al., 2000; Castanon et al., 2001; Czermin et al., 2002; Leatherman and Dinardo, 2008; Liang et al., 2008). The observation of high connectedness of these factors in the network indicates that their co-occurrence is essential for the process of development.

FIG. 1.

TF-TF interactions derived from experimental data (Giot et al., 2003; Adryan and Teichmann, 2010). (A) Overall network. Each node represents the TF, and the edge represents the connection between two TFs. The width of the edge represents the strength of connectedness, and the size of the node represents the eigenvector centrality (Bonacich, 2007), which represent the influence of a node in a network. (B) TF-TF interactions of the factors studied for 0–3 hours in ChIP experiments. (C) The network obtained for the factors studied for 0–12 hours in the ChIP experiments. (D and E) Distribution of the factors with respect to prd and zld, respectively. The positions of other factors are represented by blue dots; the red bars show the range of distribution of the factors. In the plot, the prd and zld are centered at 0. TF, transcription factor.

The number of factors in the 0–3 hours (Fig. 1B) and 0–12 hours (Fig. 1C) data sets were 23 and 15, respectively. The factors present at 0–3 hours (Fig. 1B) seem to associate very strongly as presented in the thickness of the connections in the overall interaction (Fig. 1A) and can be observed as the factors toward the center of the network. Investigating the number of co-occurrence of each factor with others revealed that the factor prd was connected with other 21 factors. The mutual positioning of the other factors in the whole genome with respect to prd was investigated to determine if there were any positional preferences in terms of their distances from each other (Fig. 1D). Figure 1D reveals that prd combines with other factors in an overlapping manner, for example, Mothers against dpp (Mad), twist (twi), daughterless (da), and so on, which is shown using blue dots in the figure. The spread of these factors with respect to the prd binding site was very restricted and is represented by the red bars in Figure 1D. The second most highly connected factor was zld in the 0–3 hours network (Fig. 1E). This analysis shows that the ChIP data used in the analysis were of good resolution. This resolution can be understood from the distribution of the factors with respect to prd and zld in Figure 1D and E, respectively. The distribution represented by the red bar shows that the spread is confined.

In the 0–3 hours data, a previously reported daughterless-twist (da-twi) (Wong et al., 2008) was observed as the highest co-occurring factors. Following the pair, dorsal (dl) was found to be paired with several other common factors, including Zelda, Twist, Dichaete, and Daughterless. The co-occurrence of many factors with Dorsal signifies Dorsal as being an important factor during the early developmental stage of Drosophila. This pair was followed by many factors paired with Ultrabithorax (Ubx). The factors paired with Ubx were as follows: bab, cnc, GAGA factor (GAF), and Distal-less (Dll). However, the most highly interacting factors were zld, twi, and z in the 0–3 hours network; and zfh and ubx in the 0–12 hours network. Previous studies show that these are essential developmental factors. This study also reveals the importance of these factors from the binding sites association investigation.

3.2. Predicted TFBS associations

Co-occurrence of factors from JASPAR database (Mathelier et al., 2015) was identified following the criteria used in the previous section (Fig. 2A). The connections represent the pairwise combination of two factors in promoters. The width of the edge between two factors represents the count of pairs in the promoters. In the network, mirr was found to be the most highly connected. This high interaction, however, was due to the short and low complexity motif of mirr, which increased the possibility of discovering a large number of random hits. Mirr was therefore removed from the downstream analysis to avoid the noise.

FIG. 2.

TF interaction network generated from predicted TFBSs. (A) TF-TF interaction network generated using the predicted TFBS from the JASPAR database. TFs are represented by the node; the edge represents the corresponding connection between two TFs. The thickness of the edge represents the strength of connectedness, and the size of the node represents the eigenvector centrality, which represents the influence of a node in a network. Node colors show the time of peak expression of the factors: light blue for 0–6 hours, followed by deeper blue for 7–12 hours, and yellow for later than 12 hours. (B–D) The expression pattern of hb, onecut, and Cf2_II across different developmental time points of Drosophila. TFBS, transcription factor binding site.

The obtained network was found to have three distinct clusters or modules; these were termed as the first, second, and third module. The expression pattern of all the factors along different developmental stages of Drosophila was integrated into the network (Graveley et al., 2010). The nodes of the network were colored to show the time point at which the expression of TFs reaches the peak: 0–6 hours, 7–12 hours, and later than 12 hours (Fig. 2A). Interestingly, the first module was popularized by the factors with peak expression during the 0–6 hours of development. The second module was populated with both 0–6 and 7–12 hour factors, and the third module was populated with factors of later than 12 hours of development.

Until recently, of around 700 TFs, only 38 were experimentally mapped to the Drosophila genome. To determine whether there were any other possible associations of factors not otherwise included in the ChIP experiments (Contrino et al., 2012), the ChIP experiment was overlapped with the predicted co-occurrence network from JASPAR. Common factors between the two networks were selected, and a subnetwork was generated from these selected factors as shown in Figure 3. The green nodes in the network indicate those factors common between the two networks, whereas the orange ones are the factors only identified in one of the networks. Essentially, this analysis highlighted the additional associations with the factors in JASPAR, which were otherwise missing from the experimental data. The edge widths of the networks were also adjusted based on the co-occurrence count. From this analysis, we identified factors with strong co-occurrence counts, such as Mother against dpp (Mad)–snail (sna)–brinker (brk)–CTCF, huckebein (hkb)–Mad–Buttonhead (btd), caudal (CAD)–CG4328–hunchback (hb), and Homeobox protein caupolican (caup)–hunchback (hb)–Homeobox protein araucan (ara). These factors individually have a role in chromatin organization and developmental processes. However, their roles in combinations need to be explored.

FIG. 3.

Overlapping network. The network generated with the common TFs between the ChIP experiment and the JASPAR database.

3.3. CRMs in enhancers

Previous studies and comparative genomics have revealed that regulatory elements involved in early eukaryotic development are located at remote sites, far away from the TSS of their target genes (Weigel et al., 1990; Lettice et al., 2003; Nobrega et al., 2003; Woolfe et al., 2005; Borok et al., 2010). In this study, the binding sites were categorized as proximal and distal according to their position with respect to the TSS and the enhancers. If the distance of the binding sites were ≤1000 bp to the TSS than we regarded them as proximal and if the distance of the binding sites were >1000 bp from TSS, we considered them as distal/enhancers. The networks were generated by connecting two TFs if their binding sites are present close to each other within ±100 bp (Supplementary Figs. S2 and S3). We observed around 15,000 and 36,000 sites in the proximal and distal regions, respectively. The network generated from these segregated data revealed that Ultrabithorax (Ubx), Twist (twi), Dorsal (dl), and dachs (d) were highly co-occurring factors in the proximal region. However, Ubx was not detected to be interacting in this manner at enhancers. In enhancers, Runt (run), Zelda (zld), Invected (inv), Sloppy paired 1 (slp1), and Giant (gt) were found to be highly connected with other factors. This difference indicates that these factors exhibit a preferential binding pattern with respect to the TSS.

3.4. Genome-wide identification of the HOT sites

Some recent studies have shown the presence of large numbers of TFBSs at the regulatory regions, termed as the HOT regions, and large numbers of single factor binding sites available in the HOT regions in Drosophila have been demonstrated (Bylesjo et al., 2006; Roy et al., 2010). The binding profiles of 38 early embryonic TFs were used to identify 21,146 genomic regions to which between 1 and up to 22 TFs were bound within 200 bp of each other. The HOT regions were scored or a HOTness value was assigned to each HOT regions based on the occurrence of the TFBS. A total of 1244 regions were found to be the HOT regions, which comprised 5% of all binding sites. However, most binding sites were single factor binding sites or Class I binding sites, which comprised 67% of the total (Supplementary Fig. S4).

3.5. Known motifs of TFs are depleted in the HOT regions

The known motifs of the TFs in the different HOTness class regions were examined to determine whether the recruitment of factors to the HOT regions was specific. The enrichment scores of known regulatory motifs of the 30 TFs were calculated throughout all TFBSs. To determine the enrichment score, we have used an in-house built algorithm written in Perl where we searched the PWM around a genomic region in bins. The length normalization was carried out to optimize the variation in genomic regions. The motif score cutoff was set based on the average enrichment score in a similar number of the randomly selected control region from the genome of variable length. The motif enrichment was calculated using the hypergeometric test by comparing with the probability of observing the given number (or more) of target sequences with the motif by chance. The aim was to observe if the presence of individual TFs could be predicted using these motifs. Most of the TFs were commonly present in the HOT regions while their DNA motifs were not (e.g., Hairy and Snail). Other factors, such as GAF, Dorsal, and Bicoid, seemed to correlate well with their binding motifs. These differences are not likely to be caused by differences in motif quality since most motifs correlate well with the TFs presence at the sites of low HOTness. It is well known that many TFs interact physically, and it may be true that some are mainly recruited to the HOT regions by other TFs.

3.6. HOT regions have a specific sequence signature

Thirteen TFs were selected based on their enrichment in the high HOTness HOT (Fig. 4, Clusters 1 and 2). These 13 TFs were used to identify the motif enrichment in the HOT regions. After the de novo motif discovery for all the 13 TFs, both the known motifs and the predicted motifs were mapped across the peak centers of each TFBS. In most cases, motifs of the selected TFs were not enriched in the HOT regions, whereas the presence of some TFs' motifs decreased with increasing HOTness. Although some TFs were enriched in the HOT region, their dependency on motifs to bind in the HOT regions disappears. However, three de novo motifs of TFs (Giant, Dichaete, and Knirp) were enriched in the HOT regions. Their predicted motifs were similar, and motifs of these three factors reported earlier do not seem to be present in their binding sites. De novo motif discovery in the HOT regions revealed an enrichment of a common motif CAGGtA in both promoter and intronic HOT regions (Supplementary Fig. S5). This motif is very similar to the zld motif. Other predicted motifs do not match with any other known motifs of TFs of D. melanogaster.

FIG. 4.

TF occupancy among different classes of HOTness. Occupancy (in percentage) is calculated according to the presence of the corresponding TF against the total number of TFBS within a particular class. TF occupancy is calculated according to their presence as a percentage in a particular class. For the 13 TFs (Clusters 1 and 2), the occupancy increases with increasing HOTness value. Clusters 1 and 2 represent the HOT sites. K-means clustering was used to group the 13 TFs into 4 clusters. HOT, high-occupancy target.

4. DISCUSSION

The CREs in genome contain a set of TFBSs. The nonrandom binding of TFs or CRMs to the CREs determines the transcriptional regulatory activity of them. Therefore, identifying and characterizing these modules is highly essential to understand the gene regulation mechanism. The genome of Drosophila contains around 14,000 genes, and ∼708 genes are TFs (Hammonds et al., 2013; Rhee et al., 2014). However, to the best of our knowledge, the experimentally determined genome-wide binding information is available only for the 38 TFs (Macarthur et al., 2009; Contrino et al., 2012). Therefore, the associations among other factors have not been determined. The factors examined in these experimental studies were mostly developmental factors. The studies investigated the binding site association of these 38 factors and disclosed that some of these factors are associated strongly with other factors, such as zld, z, ubx, and zfh. This strong association may indicate that these factors play an important role in a concerted manner during development. To compensate the limit of 38 experimentally studied factors, computational prediction of the binding sites was therefore integrated with experimental data to compensate for this gap in our knowledge.

The primary objective of this study was to identify the novel association and interactions of the TFBSs. This integrative analysis identified novel associations, such as Mad-sna-brk-CTCF, hkb-Mad-btd, pnr-slp1-BEAF_32, and CAD-CG4328-hb. From the gene ontologies from FlyBase (flybase.org) (Gramates et al., 2017), it is evident that these factors play a role in development. Therefore, it is crucial to study these associations and determine the role they play together. In addition, CRMs reported in previous studies were also detected in our study. For example, factors defining the anterior–posterior axis of the embryo Bicoid (bcd), Caudal (cad), hunchback (hb), Kruppel (kr), and Knirps (kni) were detected. Factors participating in development and differentiation, such as repo, zen, unpg, and Lim1, were also observed as CRMs. Apart from the integrative analysis, the genome-wide binding profile of 38 TFs in Drosophila was exploited to determine the associations among them.

In addition to the identification of the existing associations such as daughterless-twist (da-twi), we discovered novel associations such as between kn and zfh. These factors are individually involved in the formation and development of somatic muscle and wing pattern formation (De Celis, 2003; Leatherman and Dinardo, 2008). Involvement of this combination in any biological or developmental activity of Drosophila was not found in the curated Drosophila transcriptional CRM database, “REDfly” (v5.2.2) (Gallo et al., 2011). From FlyBase (Gramates et al., 2017), however, it was found that kn and zfh take part in the developmental process independently. It needs to be established, what function they perform being together. Some genes, such as Lamp1, sxc, Eph, Hand, and CalpA, were found to have this combination in their proximal promoter, and FlyBase reveals that these genes are involved in the developmental processes of Drosophila. Moreover, expression data reveal that these genes are expressed during the early stages of development. This observation suggests that the novel combination, kn-zfh, may regulate the expression of these genes. We do not yet know what function they perform in combination. Other enriched pairs of TFs were also observed with Ubx. Marin et al. (2012) demonstrated that Ubx plays a critical role in conferring segment-appropriate morphology. The factors paired with Ubx were as follows: bab, cnc, GAF, and Distal-less (Dll).

Previous studies have revealed that these factors are involved in the developmental processes, including limb formation, labral and mandibular development, and pattern formation (Kopp et al., 2000; Lewis et al., 2000; Kojima, 2004; Coulcher and Telford, 2012). Since these factors individually are involved in the developmental process, they might cooperate with each other to accomplish their function, and their associations should be studied further to investigate their combinatorial role in development. The co-occurrence of central regulatory factors, such as dorsal or zld, was also observed in the early hours of development but not in later stages. Dorsal was found to be paired with a number of other common factors, including Zelda, Twist, Dichaete, and Daughterless. Dorsal is also known from previous studies as the focal protein for dorsoventral polarity in developing Drosophila fly (Roth et al., 1989; Rushlow et al., 1989; Steward, 1989; von Ohlen and Doe, 2000). The observation of Dorsal as the central factor in the present study corroborates these other findings concerning its importance during development.

In the present study, the co-occurrence of the factors was investigated in the enhancers. It was found that Zelda remained as the most connected factor. Factors co-occurring with Zelda are known for their involvement in development. For example, Dichaete is involved in central nervous system (CNS) development in Drosophila (Ma et al., 2000; Overton et al., 2002), and daughterless is involved in sex determination and neural differentiation (Vaessin et al., 1994; Zeitlinger et al., 2007). The higher occurrence of Zelda was followed by Dichaete or Sox Box protein of high-mobility group in enhancers. Previous studies have shown that this factor, along with enhancers-binding protein, is capable of binding and bending DNA (Travis et al., 1991). Enhancers are also known to be able to regulate multiple genes (Mohrs et al., 2001; Pennacchio et al., 2013). Furthermore, from a previous study (Liang et al., 2008), it is clear that Zelda initiates the transcriptional activation during the initial stages of development. The observed positioning of Zelda and Dichaete binding sites at enhancers may, therefore, be strategically necessary for the initiation of transcriptional activation of a vast number of genes during the early stage of development. This observation corroborates the findings of previous work that regulatory elements involved in early eukaryotic development are located at remote sites, far away from the TSS of the target genes (Weigel et al., 1990; Lettice et al., 2003; Nobrega et al., 2003; Woolfe et al., 2005; Borok et al., 2010).

Investigating the number of co-occurrence of each factor with others in the network (Fig. 1A) revealed that the factors prd and zld were most highly connected factors. However, little information regarding the interaction of prd with other TFs has been reported in the literature to date. Figure 1D and E reveals that these factors combine with other factors in an overlapping manner. Higher connections with zld were expected, as a previous study using global expression profiling confirmed that zld has a major role in the activation of the early zygotic genome and suggested that zld may also regulate maternal RNA degradation during the maternal-to-zygotic transition (Liang et al., 2008).

By using these 38 TFBSs from the experimental data, we were able to determine a large number of HOT sites. It was observed that the motifs of known TFs were depleted in these HOT regions. The absence of a known motif was not due to the low quality of the motif because most of the motifs correlated well with a single factor binding site or region with a lower HOTness value. Therefore, to understand the phenomenon, we searched for a de novo motif for each TF among the HOT sites and single binding sites of the corresponding TF. De novo motif predictions also exhibit the same phenomenon. This indicates that the binding of TFs to HOT sites is a more complex process than being merely by sequence specificity. Binding of one TF to the genome may recruit additional TFs in the neighborhood through protein–protein interactions. TFs in the HOT sites tend to be highly interactive with other TFs in the protein–protein interaction network (Fig. 5).

FIG. 5.

Protein–protein interaction network. Protein interaction networks based on human interologs data collected from DroPNet (Yu et al., 2008; Renaud et al., 2012) and Cytoscape (Shannon et al., 2003) were generated by using the information for known interactions of human proteins to show the interaction of their orthologs in Drosophila.

For example, Mad, Ubx, dl, and bcd all seem to be highly interactive with other TFs; this was also shown in the computationally generated network (Figs. 1A and 2A). Furthermore, to determine the missing interactions in the integrated experimentally and computationally generated networks, we explored protein–protein interaction networks. In doing so, we found that twist (twi) and daughterless (da), as well as knirps (kni) and Kruppel (Kr), were involved in the protein–protein interaction. The protein–protein interaction network also suggests the presence of intermediary TFs that might help to recruit other TFs to the HOT sites by protein–protein interactions. More experimental data would provide us with the true structure and function of TF networks and their role in development. The HOT regions were also found to be enriched with the Giant, Dichaete, and Knirp binding motifs. A common TAG team motif, CAGGtA, was observed at both promoters and introns HOT regions, which might play a role in recruiting other TFs.

5. CONCLUSIONS

Our results were drawn from the resources available including the information of the TFBS, which has yet to be completed. The efficiency of such an approach depends primarily on the data generated from experimental studies. However, by integrating the existing resources with the ChIP sequencing data and the computational prediction of TFBSs, we have been able to identify novel combinations. These hypothetical CRMs so discovered should be studied further to determine their role in the development of Drosophila. Although we only may claim co-occurrence of proteins in this study, it may actually point to their interaction (as known interaction proteins typically co-occur together). To check the latter hypothesis, experimental verification is required. As more information becomes available in the future, more precise predictions will be possible. Such enriched information could then lead to a better understanding of developmental processes and improve our knowledge of any interactions involved. Such an approach would also be of great benefit in elucidating the interacting pairs of factors involved in diseases. Although we discovered novel interactions in this study, further study will be required to establish the full range of functions these factors perform when they operate cooperatively.

Footnotes

ACKNOWLEDGMENT

We thank Dr. Mads Kaern for critically reading the article, providing helpful comments, editing, and proofreading the article.

AUTHOR DISCLOSURE STATEMENT

The authors declare they have no conflicting financial interests.

FUNDING INFORMATION

S.N. was supported by a Ramalingaswami Fellowship (BT/RLF/Re-entry/48/2013) from the Department of Biotechnology, Government of India. The funding body had no role in the design of the study, data collection, analysis, interpretation of data, writing of the article, or the decision to publish.

Supplementary Material

References

Adryan

, and Teichmann

S.A.

2010. The developmental expression dynamics of Drosophila melanogaster transcription factors. Genome Biol. 11, R40.

Aerts

2012. Computational strategies for the genome-wide identification of cis-regulatory elements and transcriptional targets. Curr. Top Dev. Biol. 98, 121–145.

Aguilar

, and Oliva

2008. Topological comparison of methods for predicting transcriptional cooperativity in yeast. BMC Genomics, 9, 137.

Berman

B.P.

, Nibu

, Pfeiffer

B.D.

, et al. 2002. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl. Acad. Sci. U. S. A. 99, 757–762.

Blanchette

, Bataille

A.R.

, Chen

, et al. 2006. Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res. 16, 656–668.

Bonacich

2007. Some unique properties of eigenvector centrality. Soc. Netw. 29, 9.

Borok

M.J.

, Tran

D.A.

, Ho

M.C.W.

, et al. 2010. Dissecting the regulatory switches of development: Lessons from enhancer evolution in Drosophila. Development (Cambridge, England), 137, 5–13.

Bulyk

M.L.

2004. Computational prediction of transcription-factor binding site locations. Genome Biol. 5, 201.

Bylesjo

, Rantalainen

, Cloarec

, et al. 2006. OPLS discriminant analysis: Combining the strengths of PLS-DA and SIMCA classification. J. Chemometr. 20, 341–351.

10.

Calo

, and Wysocka

2013. Modification of enhancer chromatin: What, how, and why?. Mol. Cell. 49, 825–837.

11.

Cao

, Yao

, Sarkar

, et al. 2010. Genome-wide MyoD binding in skeletal muscle cells: A potential for broad cellular reprogramming. Dev. Cell. 18, 662–674.

12.

Castanon

, Von Stetina

, Kass

, et al. 2001. Dimerization partners determine the activity of the Twist bHLH protein during Drosophila mesoderm development. Development, 128, 3145–3159.

13.

Chiang

D.Y.

, Moses

A.M.

, Kellis

, et al. 2003. Phylogenetically and spatially conserved word pairs associated with gene-expression changes in yeasts. Genome Biol. 4, R43.

14.

Consortium

T.M.

2010. Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science, 27, 157–187.

15.

Contrino

, Smith

R.N.

, Butano

, et al. 2012. modMine: Flexible access to modENCODE data. Nucleic Acids Res. 40, D1082–D1088.

16.

Coulcher

J.F.

, and Telford

M.J.

2012. Cap ‘n'collar differentiates the mandible from the maxilla in the beetle Tribolium castaneum. Evodevo, 3, 25.

17.

Czermin

, Melfi

, Mccabe

, et al. 2002. Drosophila enhancer of Zeste/ESC complexes have a histone H3 methyltransferase activity that marks chromosomal Polycomb sites. Cell, 111, 185–196.

18.

De Celis

J.F.

2003. Pattern formation in the Drosophila wing: The development of the veins. Bioessays, 25, 443–451.

19.

Durinck

, Moreau

, Kasprzyk

, et al. 2005. BioMart and Bioconductor: A powerful link between biological databases and microarray data analysis. Bioinformatics, 21, 3439–3440.

20.

Frith

M.C.

, Hansen

, and Weng

2001. Detection of cis-element clusters in higher eukaryotic DNA. Bioinformatics, 17, 878–889.

21.

Frith

M.C.

, Li

M.C.

, and Weng

2003. Cluster-Buster: Finding dense clusters of motifs in DNA sequences. Nucleic Acids Res. 31, 3666–3668.

22.

Gallo

S.M.

, Gerrard

D.T.

, Miner

, et al. 2011. REDfly v3.0: Toward a comprehensive database of transcriptional regulatory elements in Drosophila. Nucleic Acids Res. 39, D118–D123.

23.

Gershenzon

N.I.

, Stormo

G.D.

, and Ioshikhes

I.P.

2005. Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites. Nucleic Acids Res. 33, 2290–2301.

24.

Giot

, Bader

J.S.

, Brouwer

, et al. 2003. A protein interaction map of Drosophila melanogaster. Science, 302, 1727–1736.

25.

Gotea

, and Ovcharenko

2008. DiRE: Identifying distant regulatory elements of co-expressed genes. Nucleic Acids Res. 36, W133–W139.

26.

Gramates

L.S.

, Marygold

S.J.

, Santos

G.D.

, et al. 2017. FlyBase at 25: Looking to the future. Nucleic Acids Res. 45, D663–D671.

27.

Graveley

B.R.

, Brooks

A.N.

, Carlson

J.W.

, et al. 2011. The developmental transcriptome of Drosophila melanogaster. Nature, 471, 473–479.

28.

Grove

C.A.

, and Walhout

A.J.M.

2008. Transcription factor functionality and transcription regulatory networks. Mol. Biosyst. 4, 309.

29.

Grundy

W.N.

, Bailey

T.L.

, Elkan

C.P.

, et al. 1997. Meta-MEME: Motif-based hidden Markov models of protein families. Comput. Appl. Biosci. 13, 397–406.

30.

Guhathakurta

, and Stormo

G.D.

2001. Identifying target sites for cooperatively binding factors. Bioinformatics (Oxford, England), 17, 608–621.

31.

Guruharsha

K.G.

, Rual

J.-F.

, Zhai

, et al. 2011. A protein complex network of Drosophila melanogaster. Cell, 147, 690–703.

32.

Hallikas

, Palin

, Sinjushina

, et al. 2006. Genome-wide prediction of mammalian enhancers based on analysis of transcription-factor binding affinity. Cell, 124, 47–59.

33.

Hammonds

A.S.

, Bristow

C.A.

, Fisher

W.W.

, et al. 2013. Spatial expression of transcription factors in Drosophila embryonic organ development. Genome Biol. 14, R140.

34.

Hardison

R.C.

, and Taylor

2012. Genomic approaches towards finding cis-regulatory modules in animals. Nat. Rev. Genet. 13, 469–483.

35.

Heintzman

N.D.

, Stuart

R.K.

, Hon

, et al. 2007. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 39, 311–318.

36.

Hon

G.C.

, Hawkins

R.D.

, and Ren

2009. Predictive chromatin signatures in the mammalian genome. Hum. Mol. Genet. 18, R195–R201.

37.

Y.F.

2005. Efficient and high quality force-directed graph drawing. Math. J. 10, 34.

38.

Jiang

, and Singh

2014. CCAT: Combinatorial Code Analysis Tool for transcriptional regulation. Nucleic Acids Res. 42, 2833–2847.

39.

Kazemian

, Pham

, Wolfe

S.A.

, et al. 2013. Widespread evidence of cooperative DNA binding by transcription factors in Drosophila development. Nucleic Acids Res. 41, 8237–8252.

40.

Kellis

, Patterson

, Endrizzi

, et al. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature, 423, 241–254.

41.

Kim

, Cunningham

, James

, et al. 2010. Functional characterization of transcription factor motifs using cross-species comparison across large evolutionary distances. PLoS Comput Biol. 6, e1000652.

42.

Kojima

2004. The mechanism of Drosophila leg development along the proximodistal axis. Dev. Growth Differ. 46, 115–129.

43.

Kopp

, Duncan

, Godt

, et al. 2000. Genetic control and evolution of sexually dimorphic characters in Drosophila. Nature, 408, 553–559.

44.

Krivan

, and Wasserman

W.W.

2001. A predictive model for regulatory sequences directing liver-specific transcription. Genome Res. 11, 1559–1566.

45.

Kvon

E.Z.

, Stampfel

, Omar Yáññez-Cuna

, et al. 2012. HOT regions function as patterned developmental enhancers and have a distinct cis-regulatory signature. Genes Dev. 26, 908–913.

46.

Leatherman

J.L.

, and Dinardo

2008. Zfh-1 controls somatic stem cell self-renewal in the Drosophila testis and nonautonomously influences germline stem cell self-renewal. Cell Stem Cell. 3, 44–54.

47.

Lettice

L.A.

, Heaney

S.J.H.

, Purdie

L.A.

, et al. 2003. A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. 12, 1725–1735.

48.

Lewis

D.L.

, Decamillis

, and Bennett

R.L.

2000. Distinct roles of the homeotic genes Ubx and abd-A in beetle embryonic abdominal appendage development. Proc. Natl. Acad. Sci. U. S. A. 97, 4504–4509.

49.

Liang

H.-L.

, Nien

C.-Y.

, Liu

H.-Y.

, et al. 2008. The zinc-finger protein Zelda is a key activator of the early zygotic genome in Drosophila. Nature, 456, 400–403.

50.

Ludwig

M.Z.

, Patel

N.H.

, and Kreitman

1998. Functional analysis of eve stripe 2 enhancer evolution in Drosophila: Rules governing conservation and change. Development, 125, 949–958.

51.

, Certel

, Gao

, et al. 2000. Functional interactions between Drosophila bHLH/PAS, Sox, and POU transcription factors regulate CNS midline expression of the slit gene. J. Neurosci. 20, 4596–4605.

52.

Macarthur

, Li

X.Y.

, Li

, et al. 2009. Developmental roles of 21 Drosophila transcription factors are determined by quantitative differences in binding to an overlapping set of thousands of genomic regions. Genome Biol. 10, R80.

53.

Marin

E.C.

, Dry

K.E.

, Alaimo

D.R.

, et al. 2012. Ultrabithorax confers spatial identity in a context-specific manner in the Drosophila postembryonic ventral nervous system. Neural Dev. 7, 31.

54.

Markstein

, Markstein

, et al. 2002. Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc. Natl. Acad. Sci. U. S. A. 99, 763–768.

55.

Marygold

S.J.

, Leyland

P.C.

, Seal

R.L.

, et al. 2013. FlyBase: Improvements to the bibliography. Nucleic Acids Res. 41, D751–D757.

56.

Mathelier

, Fornes

, Arenillas

D.J.

, et al. 2015. JASPAR 2016: A major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 44, gkv1176.

57.

Matys

, Kel-Margoulis

O.V.

, Fricke

, et al. 2006. TRANSFAC and its module TRANSCompel: Transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 34, D108–D110.

58.

Mohrs

, Blankespoor

C.M.

, Wang

Z.E.

, et al. 2001. Deletion of a coordinate regulator of type 2 cytokine expression in mice. Nat. Immunol. 2, 842–847.

59.

Moorman

, Sun

L.V.

, Wang

, et al. 2006. Hotspots of transcription factor colocalization in the genome of Drosophila melanogaster. Proc. Natl. Acad. Sci. U. S. A. 103, 12027–12032.

60.

Nandi

, Blais

, and Ioshikhes

2013. Identification of cis-regulatory modules in promoters of human genes exploiting mutual positioning of transcription factors. Nucleic Acids Res. 41, 8822–8841.

61.

Nandi

, and Ioshikhes

2012. Optimizing the GATA-3 position weight matrix to improve the identification of novel binding sites. BMC Genomics. 13, 416.

62.

Negre

, Brown

C.D.

, Ma

, et al. 2011. A cis-regulatory map of the Drosophila genome. Nature, 471, 527–531.

63.

Nobrega

M.A.

, Ovcharenko

, Afzal

, et al. 2003. Scanning human gene deserts for long-range enhancers. Science, 302, 413.

64.

Ong

C.T.

, and Corces

V.G.

2011. Enhancer function: New insights into the regulation of tissue-specific gene expression. Nat. Rev. Genet. 12, 283–293.

65.

Overton

P.M.

, Meadows

L.A.

, Urban

, et al. 2002. Evidence for differential and redundant function of the Sox genes Dichaete and SoxN during CNS development in Drosophila. Development, 129, 4219–4228.

66.

Paris

, Kaplan

, Li

X.Y.

, et al. 2013. Extensive divergence of transcription factor binding in Drosophila embryos with highly conserved gene expression. PLoS Genet. 9, e1003748.

67.

Pennacchio

L.A.

, Bickmore

, Dean

, et al. 2013. Enhancers: Five essential questions. Nat. Rev. Genet. 14, 288–295.

68.

Philip

, Pettersson

, and Stenberg

2012. Sequence signatures involved in targeting the male-specific lethal complex to X-chromosomal genes in Drosophila melanogaster. BMC Genomics, 13, 97.

69.

Pilpel

, Sudarsanam

, and Church

G.M.

2001. Identifying regulatory networks by combinatorial analysis of promoter elements. Nat. Genet. 29, 153–159.

70.

Rajewsky

, Vergassola

, Gaul

, et al. 2002. Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo. BMC Bioinformatics, 3, 30.

71.

Renaud

, Baillif

, Perez

J.-B.

, et al. 2012. DroPNet: A web portal for integrated analysis of Drosophila protein-protein interaction networks. Nucleic Acids Res. 40, W134–W139.

72.

Rhee

D.Y.

, Cho

D.Y.

, Zhai

, et al. 2014. Transcription factor networks in Drosophila melanogaster. Cell Rep. 8, 2031–2043.

73.

Roh

T.Y.

, Cuddapah

, and Zhao

2005. Active chromatin domains are defined by acetylation islands revealed by genome-wide mapping. Genes Dev. 19, 542–552.

74.

Roth

, Stein

, and Nüsslein-Volhard

1989. A gradient of nuclear localization of the dorsal protein determines dorsoventral pattern in the Drosophila embryo. Cell, 59, 1189–1202.

75.

Roy

, Ernst

, Kharchenko

P.V.

, et al. 2010. Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science (New York, N.Y.), 330, 1787–1797.

76.

Rushlow

C.A.

, Han

, Manley

J.L.

, et al. 1989. The graded distribution of the dorsal morphogen is initiated by selective nuclear transport in Drosophila. Cell, 59, 1165–1177.

77.

Shannon

, Markiel

, Ozier

, et al. 2003. Cytoscape: A software Environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504.

78.

Sharan

, Ben-Hur

, Loots

G.G.

, et al. 2004. CREME: Cis-regulatory module explorer for the human genome. Nucleic Acids Res. 32, W253–W256.

79.

Siepel

, Bejerano

, Pedersen

J.S.

, et al. 2005. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050.

80.

Siggers

, Duyzend

M.H.

, Reddy

, et al. 2011. Non-DNA-binding cofactors enhance DNA-binding specificity of a transcriptional regulatory complex. Mol. Syst. Biol. 7, 1–14.

81.

Sinha

, Liang

, Siggia

2006. Stubb: A program for discovery and analysis of cis-regulatory modules. Nucleic Acids Res. 34, W555–W559.

82.

Slattery

, Riley

, Liu

, et al. 2011. Cofactor binding evokes latent differences in DNA binding specificity between hox proteins. Cell, 147, 1270–1282.

83.

Small

, Blair

, and Levine

1992. Regulation of even-skipped stripe 2 in the Drosophila embryo. EMBO J. 11, 4047–4057.

84.

Spivak

A.T.

, and Stormo

G.D.

2016. Combinatorial Cis-regulation in Saccharomyces Species. G3 (Bethesda, Md.), 6, 653–667.

85.

Stanyon

C.A.

, Liu

, Mangiola

B.A.

, et al. 2004. A Drosophila protein-interaction map centered on cell-cycle regulators. Genome Biol. 5, R96.

86.

Steward

1989. Relocalization of the dorsal protein from the cytoplasm to the nucleus correlates with its function. Cell, 59, 1179–1188.

87.

Stormo

G.D.

2000. DNA binding sites: Representation and discovery. Bioinformatics, 16, 16–23.

88.

, Teichmann

S.A.

, and Down

T.A.

2010. Assessing computational methods of cis-regulatory module prediction. PLoS Comput. Biol. 6, e1001020.

89.

Suryamohan

, and Halfon

M.S.

2015. Identifying transcriptional cis-regulatory modules in animal genomes. Wiley Interdiscip. Rev. Dev. Biol. 4, 59–84.

90.

Tomancak

, Berman

B.P.

, Beaton

, et al. 2007. Global analysis of patterns of gene expression during Drosophila embryogenesis. Genome Biol. 8, R145.

91.

Travis

, Amsterdam

, Belanger

, et al. 1991. LEF-1, a gene encoding a lymphoid-specific protein with an HMG domain, regulates T-cell receptor alpha enhancer function [corrected]. Genes Dev. 5, 880–894.

92.

Vaessin

, Brand

, Jan

L.Y.

, et al. 1994. Daughterless is essential for neuronal precursor differentiation but not for initiation of neuronal precursor formation in Drosophila embryo. Development (Cambridge, England), 120, 935–945.

93.

Von Ohlen

, and Doe

C.Q.

2000. Convergence of dorsal, dpp, and egfr signaling pathways subdivides the drosophila neuroectoderm into three dorsal-ventral columns. Dev. Biol. 224, 362–372.

94.

Wasserman

W.W.

, and Sandelin

2004. Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 5, 276–287.

95.

Weigel

, Seifert

, Reuter

, et al. 1990. Regulatory elements controlling expression of the Drosophila homeotic gene fork head. EMBO J. 9, 1199–1207.

96.

Wittkopp

P.J.

, and Kalay

2011. Cis-regulatory elements: Molecular mechanisms and evolutionary processes underlying divergence. Nat. Rev. Genet. 13, 59–69.

97.

Wong

M.C.

, Castanon

, and Baylies

M.K.

2008. Daughterless dictates Twist activity in a context-dependent manner during somatic myogenesis. Dev. Biol. 317, 417–429.

98.

Woolfe

, Goodson

, Goode

D.K.

, et al. 2005. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. 3, e7.

99.

, Xie

2010. Hidden Markov model and its applications in motif findings. Methods Mol. Biol. 620, 405–416.

100.

Yao

Z.Z.

, Fong

A.P.

, Cao

, et al. 2013. Comparison of endogenous and overexpressed MyoD shows enhanced binding of physiologically bound sites. Skelet Muscle. 3, 8.

101.

, Pacifico

, Liu

, et al. 2008. DroID: The Drosophila Interactions Database, a comprehensive resource for annotated gene and protein interactions. BMC Genomics. 9, 461.

102.

Yuh

C.H.

, Bolouri

, and Davidson

E.H.

1998. Genomic cis-regulatory logic: Experimental and computational analysis of a sea urchin gene. Science, 279, 1896–1902.

103.

Zeitlinger

, Zinzen

R.P.

, Stark

, et al. 2007. Whole-genome ChIP-chip analysis of Dorsal, Twist, and Snail suggests integration of diverse patterning processes in the Drosophila embryo. Genes Dev. 21, 385–390.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.14 MB

0.21 MB

0.07 MB

0.10 MB

0.02 MB