Comparative Genomics Analysis of Trichoderma reesei Strains

Abstract

Trichoderma reesei is a key fungus for industrial production of lignocellulolytic enzymes. The genome sequences of the T. reesei hyper-cellulolytic strain RUT-C30 and its parental strain QM6a were compared at the nucleotide level. Approximately 97% of the 87 genomic-sequence scaffolds of T. reesei QM6a (33 Mb) were found to have the corresponding nucleotide in the 182 genome-sequence scaffolds of RUT-C30 (32 Mb). There are 455 loci within the QM6a sequence not detected in the RUT-C30 sequence. Regions at the termini of QM6a scaffolds as well as 14 small scaffolds do not have corresponding regions in RUT-C30 genomic scaffolds. Seventy-eight protein-encoding genes are included within these regions. Mutated nucleotide(s) in 2,371 positions, including short insertion/deletions (indels), were detected in the aligned regions. The predicted protein-coding regions of 97 gene models contain mutations, 34 of which were not previously described. Twenty-seven out of 34 newly discovered genes were found to have mutations in the peptide amino acid sequence. This is in addition to 63 genes described in a previous study based on low coverage sequencing of RUT-C30. These newly identified proteins are involved in signal transduction, transcription, RNA processing and modification, and post-translational modification according to their annotations. Similar distributions of eukaryotic orthologous group (KOG) categories between the mutated and all other proteins suggest random mutation. The roles of the mutated genes and potential regulatory regions in the observed phenotype of RUT-C30 remain to be explored in a targeted fashion.

Introduction

Trichoderma reesei (teleomorph Hypocrea jecorina) is a soft-rot ascomycete fungus that has been used as an industrial production host for cellulolytic and hemicellulolytic enzymes. The wild type strain QM6a was collected in the Solomon Islands during World War II and is the parent of a number of high enzyme-secreting strains derived by classical mutagenesis. Because it is the origin of several hyper-cellulolytic mutants, it is well regarded as a model in industrial fungal biotechnology.

A wide variety of filamentous fungi, including T. reesei, are used as production hosts for a multitude of proteins.^1,2 Strains of T. reesei are reported to produce a complex mix of cellulolytic enzymes in excess of 100 g/L.³ The amount of the secreted enzymes is significant as it is not only elevated for cellulase production, but also more generally for protein secretion.

The majority of T. reesei mutants used in industry have been generated from the wild type QM6a strain. Because RUT-C30 and other T. reesei strains were generated by traditional mutagenesis and screening, the nature of the mutations is not fully characterized.⁴ Knowledge of the genetic factors responsible for the cellulase hyper-producing phenotype is important for improving biomass saccharification as well as for enhancing the expression of heterologous proteins. Genome-wide analysis of the mutations in RUT-C30 lays a genetic foundation that may be translated into more economical production of enzymes needed for renewable bioenergy applications.

Prior to the genome-wide mutation analysis using massively parallel DNA sequencing, three mutations in T. reesei RUT-C30 were known: a truncation of the cre1 gene (tre120117), which is a key regulator in carbon catabolite repression; a frameshift mutation in the glucosidase II alpha subunit gene gls2 (tre121351) involved in protein glycosylation; and an 85-kb deletion that eliminated 29 genes.^5–8 However, the mutant RUT-C30 has been known to differ from wild type in its growth morphology, as well as its hyper-cellulase-producing and catabolite-derepressed phenotypes. There are likely to be multiple genes that, when mutated, account for these phenotypes.

The genome of the wild type strain QM6a was sequenced by the conventional method in 2008.⁹ The comparison of genome sequences between the hyper-cellulase-producing mutant RUT-C30 and the wild type QM6a has been intriguing, with two studies published so far.^5,10 Vitikainen et al. used the array comparative genomic hybridization method to analyze the genome with 45–85 nucleotide probes and a mean tiling distance of ∼15 bp.¹⁰ They found five rearrangement break points and 126 positions mutated in RUT-C30. Le Crom et al. mapped read sequences generated from massively parallel sequencing to the wild type QM6a genome. They identified 223 single nucleotide variations between the two strains and found that 42 proteins were affected by the mutations.⁵ In this study, we generated a de novo genomic assembly and a set of gene models for RUT-C30 and report the results of a comparative genomic analysis between this strain and the wild type T. reesei strain QM6a.

Materials and Methods

Genome Assembly

The T. reesei RUT-C30 genome was sequenced using the Illumina (San Diego, CA) platform (2×150 0.2kb ill-std-pe IlluminaGAIIx, 2×76 4kb ill-pe-crelox IlluminaGAIIx, 2×76 16kb ill-pe-jmp IlluminaGAIIx) (Table 1). All general aspects of library construction and sequencing can be found at the Department of Energy Joint Genome Institute (JGI) website.¹¹ Each fastq file was filtered for artifact/process contamination and subsequently assembled with AllPathsLG release version R36892 (Broad Institute, Cambridge, MA).¹² This resulted in a 208x coverage assembly with 182 scaffolds, 1,015 kb scaffold N50, 32,689,233 bp total scaffold, 623 contigs, 111.8 kb contig N50, and 32,495,743 bp total contig. No expressed sequence tag (EST) data were available for analysis.

Table 1.

Library Coverage

TYPE	lib_name	lib_stats	n_reads	%USED	scov	n_pairs	pcov
frag	1462.2.1389	−42+/−10	44,549,998	93.6	160.3	20,224,898	129.0
jump	1513.2.1414	3317+/−587	36,414,408	92.1	47.4	14,283,852	1484.2
jump	16kbCLIP	16270+/−1842	3,938,850	3.0	0.2	57,502	31.7

n_reads, number of reads in input; %USED, % of reads assembled; scov, sequence coverage; n_pairs, number of valid pairs assembled; pcov, physical coverage.

Genome Annotation

The genome was annotated using the JGI annotation pipeline, which combines several gene prediction and annotation methods and integrates the annotated genome into MycoCosm, a web-based fungal resource for comparative genomics.^13,14

Before gene prediction, assembly scaffolds were masked using RepeatMasker, RepBase library, and most frequent (>150 times) repeats recognized by RepeatScout.^15–17 The following combination of gene predictors was run on the masked assembly: ab initio Fgenesh and GeneMark; homology-based Fgenesh+ and Genewise seeded by BLASTx alignments against National Center for Biotechnology Information (NCBI) non-redundant (NR) database; and transcriptome-based CombEST.^18–21 In addition to protein-coding genes, tRNAs were predicted using tRNAscan-SE.²² All predicted proteins were functionally annotated using SignalP for signal sequences, TMHMM for transmembrane domains, InterProScan for integrated collection of functional and structural protein domains, protein alignments to NCBI NR, SwissProt, KEGG for metabolic pathways, and KOG for eukaryotic clusters of orthologs.^23–29 InterPro and SwissProt hits were used to map gene ontology terms.³⁰ For each genomic locus, the best representative gene model was selected based on a combination of protein homology and EST support, which resulted in the final set of 6,903 gene models used for further analysis.

Comparative Genomic Analysis

Genomic scaffold sequences of QM6a were aligned against those of RUT-C30 using BLAST+ version 2.2.27 (NCBI) with default parameters and without using a mask of sequences.³¹ Subsequently, aligned pair-wise sequences with significant homologies were identified; only aligned regions with significant similarity (E-value <1e-10 and mutation <0.01%) were included, and overlaps of these regions were eliminated manually. Unaligned regions were submitted for further homology search using the same programs, and 96.8% of the scaffold sequences were detected to have corresponding regions in RUT-C30. In the final round of analysis, regions with homology (mutation <0.1%) were included. The relation of the scaffold regions was verified with the alignment analysis of the reverse relation (RUT-C30 sequences were aligned against QM6a sequence) and the alignments derived by MUMmer 3.0.³²
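The two filtering rounds described above can be sketched as follows. This is a minimal illustration, not the script used in the study. It assumes the alignments were exported in the standard 12-column BLAST+ tabular layout, and it assumes that "mutation" means (mismatches + gap openings) divided by alignment length; neither detail is stated in the text. The names `Hit`, `parse_hits`, and `keep_hit` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str      # QM6a scaffold
    subject: str    # RUT-C30 scaffold
    length: int     # alignment length
    mismatch: int   # number of mismatches
    gapopen: int    # number of gap openings
    qstart: int
    qend: int
    evalue: float


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse 12-column tabular BLAST output (assumed export format)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], int(f[3]), int(f[4]), int(f[5]),
                        int(f[6]), int(f[7]), float(f[10])))
    return hits


def keep_hit(hit: Hit, max_evalue: float = 1e-10,
             max_mutation_fraction: float = 0.0001) -> bool:
    """First-round filter: E-value < 1e-10 and mutation < 0.01%.

    The mutation fraction used here is an assumed definition.
    """
    mutation_fraction = (hit.mismatch + hit.gapopen) / hit.length
    return hit.evalue < max_evalue and mutation_fraction < max_mutation_fraction
```

Calling `keep_hit(hit, max_mutation_fraction=0.001)` would correspond to the relaxed 0.1% cutoff of the final round. Overlap removal, which was done manually, is not covered by this sketch.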

The dot plots were generated by MUMmer 3.0, and some breakpoints were detected from the plots. The positions and sequences of mutations were extracted using our own script from the alignments derived from Blastn analysis with the b-top option. The positions of mutations/deletions relative to exons and introns were checked using the gtf information derived from the JGI's Genome Portal.³³ The nucleotide mutations were translated to the amino acid sequence and checked manually to determine whether the mutation affected the amino acid sequence. Mutations and deletions in non-coding regions 1,000 bp upstream and 1,000 bp downstream were also detected.
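The extraction of mutation positions from a traceback string can be illustrated with the short sketch below. It is not the authors' script. It assumes the BLAST traceback (BTOP) encoding, in which a number stands for a run of matching bases and a two-character token gives the query character followed by the subject character, with "-" marking a gap. It also assumes a plus-strand query. The function name `btop_differences` is hypothetical.

```python
import re
from typing import List, Tuple

# A token is either a run length of matches or a query/subject character pair.
_BTOP_TOKEN = re.compile(r"\d+|[A-Za-z*-]{2}")


def btop_differences(btop: str, qstart: int) -> List[Tuple[int, str, str]]:
    """Return (query_position, query_base, subject_base) for each difference.

    qstart is the 1-based query coordinate of the first aligned base.
    For a gap in the query, the reported position is that of the next
    query base, because the extra subject base sits just before it.
    """
    diffs = []
    qpos = qstart
    for token in _BTOP_TOKEN.findall(btop):
        if token.isdigit():
            qpos += int(token)          # skip over a run of matches
            continue
        qbase, sbase = token[0], token[1]
        diffs.append((qpos, qbase, sbase))
        if qbase != "-":
            qpos += 1                   # only real query bases advance the query
    return diffs
```

For example, `btop_differences("3AG2-C1T-4", 101)` lists one substitution, one base present only in the subject, and one base present only in the query, each with its query coordinate.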

Eukaryotic orthologous groups (KOG), protein domain, and signal sequence information were derived from the Genome Portal. The “best hits” against Saccharomyces cerevisiae coding sequences (CDSs) were detected by Blastp with the cutoff of E-value 1e-5. The annotation of the yeast CDSs was derived from the Saccharomyces Genome Database.³⁴

Results and Discussion

Comparative Genomic Analysis

A variety of methods has been used to shed light on the biology of enhanced enzyme secretion in T. reesei RUT-C30 (Table 2).^{4,6,7,35–39} Different morphologies for the endoplasmic reticulum and Golgi apparatus have been described in mutant strains, but it is difficult to associate these phenotypes with specific genetic mutations. Some genetically engineered mutants with altered production of enzymes have been reported.^40,41 It is interesting to consider the ways in which the morphological and secretion phenotypes of the RUT-C30 are due to alterations in the production level of the enzymes.

Table 2.

Phenotypes of RUT-C30 and Comparison with QM6a

PHENOTYPE	DESCRIPTION	GENE INVOLVED
Cellulase activity	RUT-C30 yields enzyme productivity similar to that of RUT-NG14, 15 filter paper units/mL under controlled fermentor conditions, which is 15–20 times that of QM6a grown in shake flasks.³⁵
Catabolite derepression	RUT-C30 was screened as resistant to catabolite repression by glucose or glycerol. RUT-C30 makes as much filter paper degrading activity in the presence of 5% glycerol as it does during growth on cellulose alone.³⁵	cre1 (tre120117)⁶
Cellulase secretion	When grown in fermentor on roll-milled cotton for 14 days, RUT-C30 secreted 19 mg/mL of cellulase, which was 2.7 times that of QM6a (7 mg/mL).³⁶
Phospholipid	The phospholipid content of the mycelium in RUT-C30 was approximately double that of QM6a cells grown for 4 days.³⁷
Endoplasmic reticulum (ER) content	The quantitative electron microscopic analysis showed 6–7 fold higher ER content for RUT-C30 than for the wild type QM6a during the secretory phase.³⁷
Mycelial protein content	The cell growth, expressed as mycelial protein content, was 4–5 times higher for RUT-C30 than for QM6a.³⁸
ER morphology	QM6a showed only short stretches of ER. The ER of RUT-C30 was remarkably different from QM6a because the membrane profiles were long and frequently these could be traced throughout an entire thin section plane.³⁸
Glycosylation	The glucose end-capped α1-3 branch was observed for Cel7A protein derived from the RUT-C30 strain, which prevents a possible second phosphomannosyl transfer. Glycans carrying two mannophosphoryl groups were detected with wild type QM6a.³⁹	gls2 (tre121351)⁷
Lamellae structure	RUT-C30 seems to lack annulate lamellae, which are abundant in the wild-type QM6a, and possesses excessive transverse parallel cisternae and a small amount of punctate-like bodies.⁴
Pigment	The conidia of RUT-C30 have a lighter green color, and the bottom view of colonies growing on an agar plate lacks the yellow pigment typical of the wild type.⁴

The genome of the wild type strain QM6a was sequenced by the Sanger method.⁹ The genome sequence was generated by shotgun sequencing to approximately 9× coverage from three libraries differing by insert size. These data were combined with more than 6,000 bacterial artificial chromosome (BAC)-end sequences, and a high-quality draft assembly was created using the JGI shotgun assembler Jazz. The number of scaffolds is 87, and the total length of the scaffold sequences is approximately 33.5 Mb (Table 3). These scaffold sequences are maintained in the JGI Genome Portal site.³³

Table 3.

Statistics of Genome Sequencing of the Two Strains

STRAIN	QM6a	RUT-C30
Description	Wild type	Hyper-cellulolytic mutant
Method	Sanger method	Massively parallel sequencing
Size (bp)	33,454,791	32,689,233
No. scaffolds	87	182
N50 of scaffolds	9	10
L50 of scaffolds (bp)	1,219,543	1,023,047
No. genes	9,129	9,852
% guanine-cytosine content	52.0%	53.3%
% N	0.1%	0.6%

Le Crom et al. mapped read sequences generated from massively parallel sequencing to the wild type QM6a genome sequence.⁵ In this study, the genome sequence of RUT-C30 was determined using the same massively parallel sequencing method as was used in the mapping study by Le Crom et al., but with higher read coverage of the genome (208x). Thus, a more detailed comparison at the single nucleotide base level is expected.

Due to the short read length (∼100 bp) in the sequencing, we expected that the average scaffold length would be shorter and the number of scaffolds greater than those typically associated with Sanger sequencing (∼800 bp). The number of scaffolds of the RUT-C30 genome (182) is more than that of QM6a (87), and the L50 value of the RUT-C30 genome (1.02 Mb) is less than that of QM6a (1.22 Mb). However, the cumulative lengths of the largest 10 scaffolds are nearly the same for RUT-C30 (17.1 Mb) and QM6a (17.9 Mb). Although the sequencing methods applied to the two genomes were different, and the read length of massively parallel sequencing is shorter than that of the conventional Sanger method, the quality of the de novo assembled genome sequence of the RUT-C30 strain is high and sufficient for comparison to that of QM6a.
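The contiguity statistics compared above can be computed from a list of scaffold lengths, as in the following sketch. Table 3 lists the scaffold count under "N50" and the length in bp under "L50"; the function returns both values so that either labeling convention can be matched. The function names are hypothetical, and the lengths in the usage note are toy values, not data from either assembly.

```python
from typing import List, Sequence, Tuple


def half_assembly_stats(lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (scaffold_count, scaffold_length) at half of the assembly.

    Scaffolds are taken from longest to shortest until their cumulative
    length reaches at least 50% of the total assembly length.
    """
    ordered: List[int] = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return count, length
    raise ValueError("no scaffold lengths given")


def cumulative_top(lengths: Sequence[int], k: int) -> int:
    """Total length of the k largest scaffolds."""
    return sum(sorted(lengths, reverse=True)[:k])
```

With the toy lengths `[100, 80, 60, 40, 20]`, half of the total (150) is reached by the second scaffold, so the function returns a count of 2 and a length of 80. `cumulative_top(lengths, 10)` corresponds to the "largest 10 scaffolds" comparison in the text.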

We performed an alignment of the assembled scaffolds with the reference genome sequence using the Blastn program with default settings to detect the differences between the two strains. Based on the output, we selected aligned regions that are 1,000 bp or longer with mutation of less than 0.1% of the region. From the mapping comparison of the two genomes, we calculated a mutation rate of 6.56×10⁻⁶ per whole genome.⁵ Thus, an aligned region with mutation of more than 0.1% (1×10⁻³) is not expected in the mutation process. In fact, we were able to map QM6a scaffolds to RUT-C30 scaffolds one-to-one at a single base level except for the two base pairs at a rearrangement point. Using this mapping method, 32,460,861 bp of the QM6a genome sequence (97.0%) were covered, indicating that nearly all of the mutations were scattered randomly. Data are included in Supplementary Table S1 (Supplementary Materials are available online at www.liebertpub.com/ind).
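The arithmetic behind the coverage figure and the choice of the 0.1% cutoff can be checked with a few lines. The sketch below uses only numbers quoted in the text and in Table 3. Treating the reported mutation rate as a per-base fraction is an assumption made for illustration, and the function names are hypothetical.

```python
QM6A_TOTAL_BP = 33_454_791      # QM6a assembly size (Table 3)
ALIGNED_BP = 32_460_861         # QM6a bases covered by the mapping
REPORTED_RATE = 6.56e-6         # rate derived from the read-mapping study
MAX_MUTATION_FRACTION = 1e-3    # 0.1% cutoff applied to aligned regions
MIN_REGION_BP = 1_000           # minimum length of an accepted region


def covered_percent(aligned_bp: int, total_bp: int) -> float:
    """Share of the reference covered by accepted alignments, in percent."""
    return 100.0 * aligned_bp / total_bp


def expected_mutations(region_bp: int, rate_per_base: float) -> float:
    """Expected number of mutated positions in a region (assumes a per-base rate)."""
    return region_bp * rate_per_base


coverage = covered_percent(ALIGNED_BP, QM6A_TOTAL_BP)
per_region = expected_mutations(MIN_REGION_BP, REPORTED_RATE)
margin = MAX_MUTATION_FRACTION / REPORTED_RATE

print(f"coverage of QM6a: {coverage:.1f}%")
print(f"expected mutations per {MIN_REGION_BP} bp: {per_region:.5f}")
print(f"cutoff exceeds the reported rate by a factor of about {margin:.0f}")
```

Under these assumptions the cutoff is more than two orders of magnitude above the reported rate, which is consistent with the statement that regions exceeding 0.1% are not expected to arise from the mutation process alone.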

Fourteen scaffolds of QM6a do not have counterparts in RUT-C30: QM6a scaffold numbers 65, 66, 67, 71, 75, 76, 79, 80, 81, 82, 83, 84, 85, and 86. On the other hand, 23 scaffolds of RUT-C30 do not have counterparts in QM6a: RUT-C30 scaffold numbers 91, 99, 101, 103, 106, 114, 117, 119, 120, 121, 122, 124, 126, 132, 134, 136, 137, 142, 145, 147, 161, 176, and 181. Only a few RUT-C30 scaffolds have weak homology with the termini of QM6a scaffolds (Supplementary Table S2), and nearly all the termini of QM6a scaffolds have no counterpart in RUT-C30.

Rearrangement

Rearrangements generated from multiple rounds of mutagenesis were detected as breakpoints in dot-plot comparisons to QM6a scaffolds with significant alignment (Fig. 1). The detailed breakpoints were analyzed from the list of scaffolds matched between the two strains (Supplementary Table S3). If a single scaffold of RUT-C30 had more than two counterpart QM6a scaffolds, it was considered to be indicative of a rearrangement. We identified 11 potential rearrangement points. At least some of these may be due to an assembly error, although three were previously identified by Vitikainen et al. using the array comparative genome hybridization method.¹⁰ An example of the complicated relationship of breakpoints and connections of scaffolds is shown in Supplementary Fig. S1.¹⁰
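The scaffold-counting criterion can be expressed compactly, as in the sketch below. The input is assumed to be a list of (RUT-C30 scaffold, QM6a scaffold) pairs taken from the accepted alignments; this pair format and the function name are hypothetical. The default threshold of three partners follows the "more than two" wording above and is exposed as a parameter.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def rearrangement_candidates(
    pairs: Iterable[Tuple[Hashable, Hashable]],
    min_partners: int = 3,
) -> Dict[Hashable, List[Hashable]]:
    """Map each RUT-C30 scaffold to its QM6a partners if it has enough of them.

    A scaffold is reported when it aligns to at least `min_partners`
    distinct QM6a scaffolds (3 corresponds to "more than two").
    """
    partners = defaultdict(set)
    for rut_scaffold, qm6a_scaffold in pairs:
        partners[rut_scaffold].add(qm6a_scaffold)
    return {
        rut: sorted(qm6a)
        for rut, qm6a in partners.items()
        if len(qm6a) >= min_partners
    }
```

A scaffold flagged this way is only a candidate: as noted above, some apparent breakpoints may reflect assembly errors, so each one would still need to be checked against the dot plot.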

Fig. 1.

Dot plot of the scaffolds of the two T. reesei strains. Scaffolds were sorted by their size. Three breakpoints (arrows) indicate where chromosomal rearrangements might occur.

With the advent of electrophoretic karyotyping, researchers are able to use contour-clamped homogeneous electric field gel electrophoresis to compare the genomes of QM6a, RUT-C30, and other hyper-cellulolytic strains.^42,43 Prior to genome analysis by the comparative genomic array hybridization method, some rearrangements had been predicted by electrophoretic karyotyping.¹⁰ Carter et al. reported that seven DNA bands were seen for both QM6a and RUT-C30, but the separation between chromosomes V and VI was less in RUT-C30 than in the wild type, indicating a chromosomal rearrangement in RUT-C30.⁴³ They reported genome sizes of 32.5 Mbp and 34.7 Mbp for QM6a and RUT-C30, respectively. Interestingly, the cumulative size of scaffolds of RUT-C30 (32 Mb) is less than that of QM6a (33 Mb) in this study.

Mutation and Deletion

By aligning RUT-C30 and QM6a scaffold sequences at the single base level, we identified 2,371 sites with differences in sequence (Supplementary Table S4). Many are contiguous sequences of a single nucleotide (e.g., CCCCC). Ninety-seven of the 2,371 sites are inside protein-coding regions. It is interesting that the ratio 97/2,371 (4.1%) is much less than the ratio of coding region in the whole genome sequence (40.40%), indicating that most mutations took place in the intergenic regions.⁹ Sixty-three out of the 97 protein genes were detected in the previous analysis by mapping.⁵ Of the 34 newly identified mutated genes, 27 are predicted to harbor nonsynonymous mutations.
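The comparison of the two ratios can be made explicit with a short calculation. The sketch below asks how many of the 2,371 sites would fall in coding sequence if sites were spread uniformly over the genome, using the coding fraction quoted above. The uniform-distribution baseline is an assumption for illustration, not a statistical test performed in the study, and the function names are hypothetical.

```python
N_SITES = 2_371           # sites that differ between the two strains
N_CODING_SITES = 97       # sites inside protein-coding regions
CODING_FRACTION = 0.4040  # share of the genome that is coding sequence


def observed_fraction(coding_sites: int, all_sites: int) -> float:
    """Fraction of differing sites that lie in coding regions."""
    return coding_sites / all_sites


def expected_coding_sites(all_sites: int, coding_fraction: float) -> float:
    """Coding-region sites expected if sites were placed uniformly (assumed baseline)."""
    return all_sites * coding_fraction


observed = observed_fraction(N_CODING_SITES, N_SITES)
expected = expected_coding_sites(N_SITES, CODING_FRACTION)
depletion = expected / N_CODING_SITES

print(f"observed coding share: {observed:.1%}")
print(f"expected coding sites under a uniform baseline: {expected:.0f}")
print(f"observed count is lower by a factor of about {depletion:.1f}")
```

The roughly tenfold shortfall relative to the uniform baseline is the quantitative basis for the statement that most differences lie in intergenic regions.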

There are 455 loci for which no homologous region was detected by the alignment in the RUT-C30 scaffolds, indicating the possibility of deletion (Supplementary Table S5). In some cases, the “missing” sequence of the RUT-C30 scaffold is a contiguous sequence of N, while in other cases, the length of the corresponding scaffold is significantly different and no alignment was constructed. We note that the mutation ratio is much higher than that reported by Le Crom et al. (6.56×10⁻⁶).⁵ A few thousand bases of every terminus of QM6a scaffolds were missing. There are 65 potential deletions of loci that contain protein-coding regions of 78 genes, excluding the known large deletion of 85 kb and cre1. Twenty of the 78 genes have partial sequences in RUT-C30 scaffolds.

List of Genes Affected

A list containing all of the affected genes is shown in Table 4, including silent mutations.^5–8,10 The list also contains mutated and deleted genes described previously. In total, 108 genes have been found to have deleted sequences, including cre1 and 29 genes in an 85-kb deletion discovered previously.⁸ While several affected genes were found in the previous analysis to be related to the phenotype of the strain, we found a similar tendency in the annotation descriptions of silent mutation genes.⁵ We found the distributions of mutated genes by the KOG categories to be similar to those of all the genes in the genome (Supplementary Table S6), indicating the mutagenesis event took place randomly over the whole genome.
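A comparison of KOG category distributions of the kind described above could be tabulated as in the sketch below. It assumes that each annotation string carries its category letters in a parenthesized group of capital letters, as in the KOG ANNOTATION column of Table 4 (for example "(G)" or "(TW)"), and that a multi-letter group counts once for each letter. The function names are hypothetical, and the sketch only computes proportions; it does not reproduce any test used for Supplementary Table S6.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Matches groups such as "(G)" or "(TW)"; groups with digits, lowercase
# letters, or spaces, such as "(MAP4K)" or "(Dot1)", are ignored.
_CATEGORY = re.compile(r"\(([A-Z]+)\)")


def kog_letters(annotation: str) -> List[str]:
    """Extract single-letter KOG categories from one annotation string."""
    letters: List[str] = []
    for group in _CATEGORY.findall(annotation):
        letters.extend(group)
    return letters


def category_fractions(annotations: Iterable[str]) -> Dict[str, float]:
    """Fraction of category assignments that fall in each KOG category."""
    counts: Counter = Counter()
    for annotation in annotations:
        counts.update(kog_letters(annotation))
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}
```

Running `category_fractions` once on the annotations of the mutated genes and once on those of all gene models gives two dictionaries whose values can be compared category by category.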

Table 4.

Proteins Affected by Mutation/Deletion Between the Two Strains

ID	SIGNAL^a	KOG ANNOTATION (CATEGORY)	Pfam ANNOTATION	YEAST HOMOLOG	YEAST ANNOTATION	REFERENCE
1. Proteins in the region that were not aligned between the two strains
120117		FOG:Zn-finger (R)	Zn-finger, C2H2 type			6
4726	S	Cytochrome P450 CYP2 subfamily (Q)	Cytochrome P450			8
25224		Acid trehalase (G)	Coagulation factor 5/8 type, C-terminal, central catalytic, and N-terminal	YPR026W	Acid trehalase required for utilization of extracellular trehalose	8
43418	S					8
49946		Glutathione S-transferase (O)	Glutathione S-transferase, N-terminal			8
64906	S			YBR056W	Putative cytoplasmic protein of unknown function	8
64956		Voltage-gated shaker-like K⁺ channel, subunit beta/KCNAB (C)	Aldo/keto reductase	YDL243C	Putative aryl-alcohol dehydrogenase	8
64959		Predicted phosphatase (R)		YKR070W	Putative protein of unknown function	8
64971		Amino acid transporters (E)	Amino acid permease-associated region	YPL265W	Dicarboxylic amino acid permease	8
65036	S	Cytochrome P450 CYP4/CYP19/CYP26 subfamilies (IQ)	Cytochrome P450	YDR402C	N-formyltyrosine oxidase involved in the production of N,N-bisformyl dityrosine	8
65041		Putative N2,N2-dimethylguanosine tRNA methyltransferase (A)		YBR271W	S-adenosylmethionine-dependent methyltransferase	8
65067		Voltage-gated shaker-like K⁺ channel, subunit beta/KCNAB (C)	Aldo/keto reductase	YFL056C	Putative aryl-alcohol dehydrogenase; involved in oxidative stress response	8
65097		Zinc-binding oxidoreductase (CR)	Zinc-containing alcohol dehydrogenase superfamily	YMR152W	Protein of unknown function	8
65117		CASK-interacting adaptor protein (caskin) and related proteins with ankyrin repeats and SAM domain (T)	Ankyrin	YIL112W	Subunit of the Set3 complex, which is a meiotic-specific repressor of sporulation specific genes	8
65142		Aldehyde dehydrogenase (C)	Aldehyde dehydrogenase	YMR170C	Cytoplasmic aldehyde dehydrogenase	8
65172		Animal-type fatty acid synthase and related proteins (I)	Zinc-containing alcohol dehydrogenase superfamily & Beta-ketoacyl synthase	YER061C	Mitochondrial beta-keto-acyl synthase	8
65191		Predicted transporter (major facilitator superfamily) (R)	General substrate transporter	YBR298C	Maltose permease	8
65215			Polysaccharide deacetylase			8
71817	S	Synaptic vesicle transporter SVOP and related transporters (major facilitator superfamily) (R)		YPR156C	Polyamine transport protein specific for spermine	8
71823			Fungal specific transcription factor			8
79725			Fungal transcriptional regulatory protein, N-terminal & Fungal specific transcription factor	YLR228C	Sterol regulatory element binding protein	8
79726						8
109199						8
109201			FAD linked oxidase, N-terminal			8
109206			Heterokaryon incompatibility			8
109211		Monocarboxylate transporter (G)		YNL125C	Protein with similarity to monocarboxylate permeases	8
109219						8
109221		Phospholipase/carboxyhydrolase (E)				8
						8
122778		Aldo/keto reductase family proteins (R)	Aldo/keto reductase	YOR120W	Putative NADP(+) coupled glycerol dehydrogenase	8
122780	S		Glycoside hydrolase, family 28	YJR153W	Endo-polygalacturonase, pectolytic enzyme	8
34413		Similar to cyclophilin-type peptidyl-prolyl cis-trans isomerase (O)				Current study
42942						Current study
43199		Mitogen-activated protein kinase kinase kinase kinase (MAP4K), germinal center kinase family (T)				Current study
43302						Current study
43392		Integrin beta subunit (N-terminal portion of extracellular region) (TW)				Current study
43427		Transcriptional effector CCR4-related protein (K)				Current study
48080				YGL079W	Putative protein of unknown function; likely member of BLOC complex involved in endosomal cargo sorting	Current study
48266						Current study
61593				YLR154W-A	Dubious open reading frame	Current study
61995				YLR154W-E	Dubious open reading frame	Current study
71126	S					Current study
71146		FOG:Zn-finger (R)				Current study
71154		FOG:Zn-finger (R)				Current study
71166		Mitogen-activated protein kinase kinase kinase kinase (MAP4K), germinal center kinase family (T)				Current study
71167	S	FOG:Zn-finger (R)				Current study
71177		FOG:Zn-finger (R)				Current study
71180						Current study
71192						Current study
71196		Neural adherens junction protein Plakophilin and related Armadillo repeat proteins (TW)				Current study
71212		Integrin beta subunit (N-terminal portion of extracellular region) (TW)				Current study
73103	S	C-type lectin (V) and C-type lectin (T)				Current study
103043						Current study
103045						Current study
104046						Current study
104343						Current study
104501						Current study
104576						Current study
104716						Current study
107285						Current study
107346						Current study
109071		Vesicle-associated membrane protein involved in inositol metabolism (U)				Current study
109120						Current study
109607	S					Current study
110688						Current study
111318						Current study
111588						Current study
111730						Current study
112030						Current study
112103						Current study
112558						Current study
112601						Current study
112602						Current study
112603						Current study
112629						Current study
112649	S					Current study
112651						Current study
112674						Current study
112675		Dystroglycan (W)				Current study
112676						Current study
112677		Predicted RNA-binding protein (R)				Current study
112678						Current study
112680						Current study
112681		Uncharacterized conserved protein (S)				Current study
112682						Current study
112683						Current study
112687						Current study
112688						Current study
112689		E3 ubiquitin ligase, Cullin 1 component (O)				Current study
112695	S					Current study
58404		Uncharacterized conserved protein (S)	Protein of unknown function UPF0041	YGL080W	Highly conserved subunit of the mitochondrial pyruvate carrier	Current study^b
72076			Fungal transcriptional regulatory protein, N-terminal and fungal specific transcription factor			10^b
103044						Current study^b
104435						Current study^b
104577						Current study^b
104623						Current study^b
104715						Current study^b
105216		Ankyrin (M)	Ankyrin			Current study^b
106252						Current study^b
106253						Current study^b
107347	S					Current study^b
109267						Current study^b
109670						Current study^b
109903		Nucleolar GTPase/ATPase p130 (Y)				Current study^b
110735						Current study^b
111861						Current study^b
112146						Current study^b
112559						Current study^b
112609		Splicing coactivator SRm160/300, subunit SRm300 (A)				Current study^b
2. Proteins with nucleotide and amino acid sequence changed
121351	S	Glucosidase II catalytic (alpha) subunit	Glycoside hydrolase, family 31	YBR229C	Glucosidase II catalytic subunit	7
3400		FOG/RRM domain (R)				5
22294		RasGAP SH3 binding protein rasputin, contains NTF2 and RRM domains (T)	RNA-binding region RNP-1 (RNA recognition motif) & Nuclear transport factor 2	YNR051C	Ubiquitin protease cofactor, forms deubiquitination complex	5
22365						5
23001		Low density lipoprotein B-like protein (I)				5
28409		Synaptic vesicle transporter SVOP and related transporters (major facilitator superfamily) (R)		YHR048W	Presumed antiporter of the DHA1 family	5
35768						5
43599		Vacuolar sorting protein VPS1, dynamin, and related proteins (UR)	Dynamin & Dynamin central region & Dynamin GTPase effector	YKR001C	Dynamin-like GTPase required for vacuolar sorting	5
52368			Fungal transcriptional regulatory protein, N-terminal & Fungal specific transcription factor	YBR297W	MAL-activator protein, part of complex locus MAL3	5
54157		Predicted hydrolase related to dienelactone hydrolase (R)		YAL049C	Cytoplasmic protein involved in mitochondrial function or organization	5
57609		Heterochromatin-associated protein HP1 and related CHROMO domain proteins (B)		YML034W1		5
58073		Uroporphyrinogen decarboxylase (H)	Uroporphyrinogen decarboxylase (URO-D)	YDR047W	Uroporphyrinogen decarboxylase, catalyzes the fifth step in the heme biosynthetic pathway	5
58391		Uncharacterized conserved protein WDR8, contains WD repeats (R)				5
58790		Predicted haloacid-halidohydrolase and related hydrolases (R)	Haloacid dehalogenase-like hydrolase	YER062C	One of two redundant DL-glycerol-3-phosphatases	5
59388		Predicted transporter (major facilitator superfamily) (R)	General substrate transporter	YBR298C	Maltose permease, high-affinity maltose transporter (alpha-glucoside transporter)	5
59801						5
59952		Amino acid transporters (E)	Amino acid permease-associated region	YGL077C	Choline/ethanolamine transporter	5
60458		Non-ribosomal peptide synthetase/alpha-aminoadipate reductase and related enzymes (Q)	AMP-dependent synthetase and ligase & Phosphopantetheine-binding & Condensation domain	YBR115C	Alpha aminoadipate reductase	5
63935		Isoleucyl-tRNA synthetase (J)	Aminoacyl-tRNA synthetase, class Ia & FPG and IleRS zinc finger	YPL040C	Mitochondrial isoleucyl-tRNA synthetase	5
64375		KEKE-like motif-containing transcription regulator (Rlr1)/suppressor of sin4 (K) & Putative RNA binding protein (R)		YLR300W	Major exo-1,3-beta-glucanase of the cell wall	5
65104		Vacuolar protein sorting-associated protein (U)	Protein of unknown function DUF1162	YLL040C	Protein of unknown function involved in sporulation, vacuolar protein sorting, prospore membrane formation and protein-Golgi retention	5
65106		Carbon-nitrogen hydrolase (E)	Nitrilase/cyanide hydratase and apolipoprotein N-acyltransferase	YIL164C	Nitrilase, member of the nitrilase branch of the nitrilase superfamily	5
66895		Exosomal 3'-5' exoribonuclease complex, subunit Rrp45 (J)	3' exoribonuclease	YDR280W	Exosome non-catalytic core component	5
70071		Putative protein methyltransferase involved in meiosis and transcriptional silencing (Dot1) (KD)	Fungal transcriptional regulatory protein, N-terminal & Fungal specific transcription factor	YIL130W	Zinc cluster protein proposed to function as a transcriptional regulator involved in the stress response	5
74570		Predicted unusual protein kinase (R)	ABC-1	YLR253W	Putative protein of unknown function	5
77513			Fungal specific transcription factor	YLR014C	Zinc finger transcription factor containing a Zn(2)-Cys(6) binuclear cluster domain	5
78158		Karyopherin (importin) beta 3 (YU)		YMR308C	Karyopherin/importin that interacts with the nuclear pore complex	5
79014		Vacuolar H+-ATPase V0 sector, subunits c/c' (C)	H+-transporting two-sector ATPase, C subunit	YEL027W	Proteolipid subunit of the vacuolar H(+)-ATPase V0 sector
79304		5'-3' exonuclease (L)	XPG N-terminal & XPG I & Ubiquitin interacting motif	YGR258C	Single-stranded DNA endonuclease	5
80691		Uncharacterized conserved protein (S)	YjeF-related protein, N-terminal	YEL015W	Non-essential conserved protein of unknown function, plays a role in mRNA decapping	5
82499	S	Predicted integral membrane protein (R)				5
106009			Fungal specific transcription factor	YIL130W	Zinc cluster protein proposed to function as a transcriptional regulator involved in the stress response	5
107078		Nucleolar GTPase/ATPase p130 (Y)
108962		Uncharacterized conserved protein with TLDc domain (S)	TLDc	YOR118W	Protein of unknown function	5
109619		Class 2 transcription repressor NC2, beta subunit (Dr1) (K)	Transcription factor CBF/NF-Y/archaeal histone	YDR397C	Subunit of a heterodimeric NC2 transcription regulator complex with Bur6p	5
110423		mRNA deadenylase subunit (A)	Ribonuclease CAF1	YNR052C	RNase of the DEDD superfamily	5
112231		Kinesin-like protein (Z)	Kinesin, motor region	YBL063W	Kinesin-related motor protein required for mitotic spindle assembly, chromosome segregation	5
112346		Nuclear protein Ataxin-7 (B)		YGL066W	SAGA complex subunit	5
119999		Splicing coactivator SRm160/300, subunit SRm300 (A)	Myb, DNA-binding			5
120806		Serine/threonine protein kinase Chk2 and related proteins (D)	Protein kinase	YPL153C	Protein kinase, required for cell-cycle arrest in response to DNA damage	5
122689		WD-repeat protein WDR6, WD repeat superfamily (R)	G-protein beta WD-40 repeat	YPL183C	WD40 domain-containing protein involved in endosomal recycling	5
123126		Kinesin light chain (Z)				5
123786		Non-ribosomal peptide synthetase/alpha-aminoadipate reductase and related enzymes (Q)	Beta-ketoacyl synthase & AMP-dependent synthetase and ligase & Phosphopantetheine-binding & Condensation domain & Acyl transferase region	YBR115C	Alpha aminoadipate reductase	5
3830		MEKK and related serine/threonine protein kinases (T)	Protein kinase	YJL095W	Mitogen-activated protein (MAP) kinase kinase kinase acting in the protein kinase C signaling pathway	Current study
42752		Trypsin (E)				Current study
43472		Uncharacterized conserved protein (S)				Current study
45689		Rac GTPase-activating protein BCR/ABR (T)	Pleckstrin-like & Fungal transcriptional regulatory protein, N-terminal & RhoGAP & Fungal specific transcription factor	YPL115C	Rho GTPase activating protein (RhoGAP) involved in control of the cytoskeleton organization	Current study
53811		Clathrin-associated protein medium chain (U)	Acyl transferase region & Clathrin adaptor complex, medium chain	YPL259C	Mu1-like medium subunit of the clathrin-associated protein complex (AP-1)	10
57857		Predicted beta-mannosidase (G)	Glycoside hydrolase family 2, immunoglobulin-like beta-sandwich domain			Current study
61350			von Willebrand factor, type A			Current study
65055		Delta 6-fatty acid desaturase/delta-8 sphingolipid desaturase (I)	Cytochrome b5 & Fatty acid desaturase			Current study
75152						Current study
76238		Plasma membrane H+-transporting ATPase (P)	E1-E2 ATPase-associated region & Cation transporting ATPase, N-terminal & Haloacid dehalogenase-like hydrolase	YGL008C	Plasma membrane H+-ATPase	Current study
103330		Ribosome biogenesis protein - Nop58p/Nop5p (JA)				Current study
104168						Current study
104762		Splicing coactivator SRm160/300, subunit SRm300 (A)				Current study
104898						Current study
105054		Protein interacting with poly(A)-binding protein (A)	Ataxin-2, N-terminal	YGR178C	Component of glucose deprivation induced stress granules, involved in P-body-dependent granule assembly	Current study
105158		Halotolerance protein HAL3 (contains flavoprotein domain) (DP)				Current study
105342		von Willebrand factor and related coagulation proteins (VW)				Current study
106492						Current study
108642	S	von Willebrand factor and related coagulation proteins (VW) & Rho GTPase effector BNI1 and related formins (TZ)		YDR077W	Major stress-induced structural GPI-cell wall glycoprotein in stationary-phase cells	Current study
108721		RNA helicase (A)				Current study
109304						Current study
109512		Putative transcription factor HALR/MLL3, involved in embryonic development (R)	AT-rich interaction region			Current study
109801		Transcription factor containing C2HC type Zn finger (K)				Current study
110056	S	E3 ubiquitin ligase, Cullin 1 component(O)				Current study
111253						Current study
112679	S					Current study
120086						Current study
3. Proteins with nucleotide mutation but no mutation in amino acid sequence (silent mutation)
3027		Predicted hydrolase involved in interstrand cross-link repair (L)	DNA repair metallo-beta-lactamase	YMR137C	Nuclease required for a post-incision step in the repair of DNA single and double-strand breaks	5
22841				YMR098C	Mitochondrial protein required for the stability of Oli1p (Atp9p) mRNA and for the Oli1p ring formation	5
22994		AAA+-type ATPase (O)	AAA ATPase, central region&AAA ATPase VAT, N-terminal	YDL126C	AAA ATPase involved in multiple processes; subunit of a polyubiquitin-selective segregase complex involved in ERAD, cell wall integrity during heat stress and mitotic spindle disassembly	5
26255			Fungal transcriptional regulatory protein, N-terminal & Fungal specific transcription factor	YIL130W	Zinc cluster protein proposed to function as a transcriptional regulator involved in the stress response	5
44956		Permease of the major facilitator superfamily (R)		YCR023C	Vacuolar membrane protein of unknown function	5
58628		RNA polymerase II, large subunit (K)	Zn-finger, C2H2 type			5
63702		N-end rule pathway, recognition component UBR1 (O)	Zn-finger (putative), N-recognin	YGR184C	E3 ubiquitin ligase (N-recognin)	5
65773		AAA+-type ATPase (O)	AAA ATPase, central region	YGR028W	Mitochondrial protein involved in sorting of proteins in the mitochondria	5
67732		Multidrug resistance-associated protein/mitoxantrone resistance protein, ABC superfamily (Q)	ABC transporter & ABC transporter, transmembrane region	YLL048C	Transporter of the ATP-binding cassette (ABC) family	5
69437		Cyclin B and related kinase-activating proteins (D)	Cyclin, N-terminal & Cyclin, C-terminal	YLR210W	B-type cyclin involved in cell cycle progression	5
75105		Poly(A) polymerase and related nucleotidyltransferases (A)	Endonuclease/exonuclease/phosphatase	YKR002W	Poly(A) polymerase, one of three factors required for mRNA 3'-end polyadenylation	5
78465		Predicted transporter (major facilitator superfamily) (R)		YEL065W	Ferrioxamine B transporter, member of the ARN family of transporters	5
82153		Protein phosphatase 1, regulatory subunit, and related proteins (T)	Leucine-rich repeat	YOR373W	Component of the spindle pole body outer plaque	5
103061						5
105391						5
110570		SWI-SNF chromatin-remodeling complex protein (B)				5
110882		Transcription-coupled repair protein CSB/RAD26 (contains SNF2 family DNA-dependent ATPase domain) (KL)	SNF2-related & Helicase, C-terminal	YJR035W	Protein involved in transcription-coupled nucleotide excision repair of UV-induced DNA lesions	5
111236		N-acetyltransferase (R)	GCN5-related N-acetyltransferase			5
121453		beta-1,6-N-acetylglucosaminyltransferase, contains WSC domain (OG)				5
122212		Predicted signal transduction protein(R)	BRO1	YPL084W	Cytoplasmic class E vacuolar protein sorting (VPS) factor that coordinates deubiquitination in the multivesicular body (MVB) pathway	5
44330		Dual-specificity tyrosine-phosphorylation regulated kinase (R)	Protein kinase	YJL141C	Serine-threonine protein kinase	Current study
62964		DNA damage-responsive repressor GIS1/RPH1, jumonji superfamily (L)	HMG-I and HMG-Y, DNA-binding & Transcription factor jumonji, jmjC & Transcription factor jumonji, JmjN	YER169W	JmjC domain-containing histone demethylase; specifically demethylates H3K36 tri- and dimethyl modification states	Current study
79244	S	Serine/threonine protein kinase (T)				Current study
80265		SWI-SNF chromatin-remodeling complex protein (B)				Current study
108232		Putative signal transduction protein involved in RNA splicing (AT)				Current study
112283		Dynactin, subunit p25 (Z)	Bacterial transferase hexapeptide repeat			Current study
119768		Nuclear receptor coregulator SMRT/SMRTER, contains Myb-like domains (K)	Myb, DNA-binding	YCR033W	Subunit of the Set3C deacetylase complex	Current study

S: A sorting signal was detected by the SignalP program.

Gene has partial sequences homologous to the corresponding QM6a gene.

Many genes are thought to be relevant to the high cellulolytic enzyme secretion phenotype of Trichoderma, and many of them were found to have mutations or deletions in the RUT-C30 sequence (Supplementary Table S7). Four of the 228 CAZy genes have deletions, and three have mutations in their exons.⁴⁴ Four genes with deletions and two genes with point mutations are regulated by Lae1.⁴⁵ Several of the listed genes have deletions upstream or downstream of the gene and could therefore be affected at the level of expression (Supplementary Table S7).⁴⁴ It is interesting to note that a Ras GTPase (tre120150), which is reported to modulate morphogenesis, sporulation, and cellulase gene expression in another strain (QM9414), has a mutation in its upstream region.⁴⁶
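The distinction drawn above between mutations in exons and mutations upstream or downstream of a gene can be illustrated with a minimal sketch. It assumes a hypothetical `Gene` record with 1-based inclusive exon coordinates and an arbitrary 1,000 bp flanking window. The data structure, the window size, and the example coordinates are all assumptions made for illustration; this is not the procedure used to build Supplementary Table S7.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    """Hypothetical gene model; exon coordinates are 1-based and inclusive."""
    name: str
    strand: str                    # "+" or "-"
    exons: List[Tuple[int, int]]   # (start, end) pairs on the scaffold


def classify(pos: int, gene: Gene, flank: int = 1000) -> str:
    """Classify a scaffold position relative to a gene model.

    The flank size of 1,000 bp is an assumed value, not one taken from the study.
    """
    start = min(s for s, _ in gene.exons)
    end = max(e for _, e in gene.exons)
    if any(s <= pos <= e for s, e in gene.exons):
        return "exon"
    if start <= pos <= end:
        return "intron"
    # Upstream/downstream depends on the strand the gene is encoded on.
    if start - flank <= pos < start:
        return "upstream" if gene.strand == "+" else "downstream"
    if end < pos <= end + flank:
        return "downstream" if gene.strand == "+" else "upstream"
    return "intergenic"


# Invented coordinates for a minus-strand gene.
example = Gene("hypothetical_gene", "-", [(1000, 1200), (1300, 1500)])
print(classify(1600, example))
```

Because the example gene lies on the minus strand, a position just beyond its highest coordinate is reported as upstream, which is the case relevant to a promoter-region mutation.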

Conclusions

Comparative genomic analysis of two strains of T. reesei has revealed numerous mutations that distinguish them. From these mutations we hope to find the genetic factors that underlie their differences in phenotype.

Traditionally, the function of a gene is tested by disruption using gene-targeting techniques and is verified by a complementation test that involves introducing the wild-type gene(s) into the mutant. It is possible to test the function of all of the mutated genes by introducing QM6a genes into RUT-C30 and observing whether the phenotype of RUT-C30 changes to that of QM6a. The function of the cre1 deletion of RUT-C30 was tested using this method.⁶ Several genetic engineering tools have been established for this fungus.⁴⁷,⁴⁸ The comparative genomic analysis of the two strains shows mutations in 2,371 loci and deletions in 455 loci, including the 178 affected protein genes. While the relatively modest size of the T. reesei research community may prevent the generation of a genome-wide knockout collection, it would be feasible to generate a collection of knockout strains for the 178 genes that differ between the two strains.⁴⁹ Moreover, the discovery of genetic crossing using QM6a enables the combination of different properties of strains by sexual crossing and makes possible a forward genetic strategy based on identifying genes associated with hyperproduction of cellulolytic enzymes.⁴⁸,⁵⁰

Genome-wide, high-throughput analysis of additional types of omics data may help to narrow down the possible candidates. Studies focused on the proteomic analysis of cellulolytic enzymes for direct comparison between T. reesei RUT-C30 and QM6a were performed. Herpoël-Gimbert et al. analyzed the proteins secreted by another efficient cellulolytic T. reesei strain, CL847, using two-dimensional electrophoresis (2DE) and matrix-assisted laser desorption-ionization time-of-flight mass spectrometry (MALDI-TOF) or liquid chromatography-tandem mass spectrometry (LC-MS/MS).⁵¹ They identified a total of 22 protein species and, based on the identified spots on 2DE, compared the proteins secreted by RUT-C30 and CL847 grown on lactose as the only carbon source. They found that the 2DE profiles of the two strains were very different in terms of both spot numbers and protein composition.⁵¹ It is interesting that the differences in the protein spots representing the highest percentage of the total spot volume are due to the presence of several Cel7A isoforms for CL847, while a single, larger spot is visible for RUT-C30. They also found that trypsin was absent in RUT-C30 and hypothesized that the presence of the protease in CL847 resulted in degraded forms of proteins and hence a much higher number of spots.

The production of cellulolytic enzymes is tightly regulated by a variety of transcription factors and other proteins such as Lae1.⁴⁵ Transcriptional regulation plays a key role in determining the cellulolytic phenotypes, and it is likely that there are differences in transcriptional regulation between strains. Moving forward, direct RNA-seq-based comparisons of the gene-expression differences between QM6a and RUT-C30 will deepen our understanding of the known genetic differences reported between these two strains.

Arvas et al. found that, in certain gene clusters, the expression of every expressed gene showed a significant positive or negative correlation with protein secretion.⁵² One such gene cluster is located very near the rearrangement point on scaffold 1 (data not shown), though other clusters are not explained by the rearrangements. It is interesting to note that genes whose expression is regulated by Lae1 have mutations or deletions in exons as well as in upstream and/or downstream regions.⁴⁵
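The idea of a gene cluster whose members all correlate with protein secretion in the same direction can be sketched as follows. This is an illustrative sketch only: it uses a plain Pearson correlation and an assumed cutoff of |r| ≥ 0.8 as a stand-in for a significance test, and the function names and input layout are hypothetical. It is not the statistical procedure of Arvas et al.

```python
from math import sqrt
from typing import Dict, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def cluster_direction(
    expression: Dict[str, Sequence[float]],
    secretion: Sequence[float],
    cutoff: float = 0.8,  # assumed threshold, not taken from the cited study
) -> Optional[str]:
    """Return "positive" or "negative" if every gene in the cluster correlates
    with secretion in that direction at |r| >= cutoff; otherwise return None."""
    if not expression:
        return None  # an empty cluster has no direction
    rs = [pearson(values, secretion) for values in expression.values()]
    if all(r >= cutoff for r in rs):
        return "positive"
    if all(r <= -cutoff for r in rs):
        return "negative"
    return None


# Invented numbers: expression of two genes and secretion rate in four conditions.
secretion_rate = [1.0, 2.0, 3.0, 4.0]
cluster = {"gene_a": [2.0, 4.0, 6.0, 8.0], "gene_b": [1.0, 2.0, 3.0, 5.0]}
print(cluster_direction(cluster, secretion_rate))
```

A real analysis would replace the fixed cutoff with a multiple-testing-corrected significance test across all clusters.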

The genome assembly and annotations described here can be interactively accessed through the JGI fungal genome portal MycoCosm.⁵³,⁵⁴

Because of its high secretion level of cellulases and hemicellulases, T. reesei has become a model research system for fungal biotechnology, especially with regard to protein production.⁵⁵ The amount of cellulolytic enzymes produced by T. reesei is high, not only relative to cellulase production in other fungi but also relative to protein production by microorganisms in general. Much research has been conducted to reveal the biology that underlies its high protein production and secretion. Studies involving functional genomic analysis of T. reesei have continued to increase since the genome sequence of wild-type QM6a was reported.⁹ Because several hyper-cellulolytic mutants have been derived from the wild type by classical mutagenesis, knowledge of the genetic factors responsible for the differences between the wild type and these mutants will continue to be especially valuable. Among the mutants, RUT-C30 is one of the most productive. Comparative genomic analysis between RUT-C30 and QM6a has revealed a substantial list of mutations across the whole genome and has provided tools to explore these genes in greater detail, so that their relationship to phenotypes of high interest can be established.⁴⁹ Using a combination of cellular and molecular biology tools set against a robust genomics landscape, the genetic factors responsible for the hyper-cellulolytic phenotype are expected to be elucidated.

Acknowledgments

A portion of this research was performed at the Environmental Molecular Sciences Laboratory (EMSL; Richland, WA), a national scientific user facility sponsored by the Department of Energy's (DOE) Office of Biological and Environmental Research and located at Pacific Northwest National Laboratory (PNNL). PNNL is a multiprogram national laboratory operated by Battelle Memorial Institute for the DOE under Contract DE-AC05-76RLO 1830. Hideaki Koike is an EMSL Wiley Visiting Scientist and is also supported by the grant-in-aid of the Ministry of Economy, Trade, and Industry, Japan. The work conducted by the US DOE JGI is supported by the Office of Science of the US Department of Energy under Contract No. DE-AC02-05CH11231. The authors thank Mark Butcher for editorial assistance.

Author Disclosure Statement

No competing financial interests exist.

References

1. Nevalainen, Te'o, Bergquist. Heterologous protein expression in filamentous fungi. Trends Biotechnol, 2005; 23(9):468–474.

2. , Schmitz, Zhang, et al. Heterologous gene expression in filamentous fungi. Adv Appl Microbiol, 2012; 81:1–61.

3. Cherry, Fidantsef. Directed evolution of industrial enzymes: An update review article. Curr Opin Biotechnol, 2003; 14(4):438–443.

4. Peterson, Nevalainen. Trichoderma reesei RUT-C30 – thirty years of strain improvement. Microbiology, 2012; 158(Pt 1):58–68.

5. Le Crom, Schackwitz, Pennacchio, et al. Tracking the roots of cellulase hyperproduction by the fungus Trichoderma reesei using massively parallel DNA sequencing. Proc Natl Acad Sci USA, 2009; 106(38):16151–16156.

6. Ilmén, Thrane, Penttilä. The glucose repressor gene cre1 of Trichoderma: Isolation and expression of a full-length and a truncated mutant form. Mol Gen Genet, 1996; 251(4):451–460.

7. Geysens, Pakula, Uusitalo, et al. Cloning and characterization of the glucosidase II alpha subunit gene of Trichoderma reesei: A frameshift mutation results in the aberrant glycosylation profile of the hypercellulolytic strain Rut-C30. Appl Environ Microbiol, 2005; 71(6):2910–2924.

8. Seidl, Gamauf, Druzhinina, et al. The Hypocrea jecorina (Trichoderma reesei) hypercellulolytic mutant RUT C30 lacks a 85 kb (29 gene-encoding) region of the wild-type genome. BMC Genomics, 2008; 9:327.

9. Martinez, Berka, Henrissat, et al. Genome sequencing and analysis of the biomass-degrading fungus Trichoderma reesei (syn. Hypocrea jecorina). Nat Biotechnol, 2008; 26(5):553–560. Erratum in: Nat Biotechnol 2008; 26(10):1193.

10. Vitikainen, Arvas, Pakula, et al. Array comparative genomic hybridization analysis of Trichoderma reesei strains with enhanced cellulase production properties. BMC Genomics, 2010; 11:441.

11. US Department of Energy. DOE Joint Genome Institute. Available at: http://jgi.doe.gov (Last accessed November 2013).

12. Gnerre, Maccallum, Przybylski, et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci USA, 2011; 108(4):1513–1518.

13. Grigoriev, Aerts, Terry, et al. JGI Genome Annotation Pipeline. In: Plant & Animal Genomes XVI Conference. San Diego, CA 2008; W61.

14. Grigoriev, Nordberg, Shabalov, et al. The Genome Portal of the Department of Energy Joint Genome Institute. Nucleic Acids Res, 2011; 40:D26–D32.

15. Smit AFA, Hubley, Green. RepeatMasker Open-3.0. (1996–2010) Available at http://repeatmasker.org (Last accessed November 2013).

16. Jurka, Kapitonov, Pavlicek, et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res, 2005; 110:462–467.

17. Price, Jones, Pevzner. De novo identification of repeat families in large genomes. Bioinformatics, 2005; 21(Suppl 1):i351–358.

18. Salamov, Solovyev. Ab initio gene finding in Drosophila genomic DNA. Genome Res, 2000; 10:516–522.

19. Ter-Hovhannisyan, Lomsadze, Chernoff, Borodovsky. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res, 2008; 18(12):1979–1990.

20. Birney, Clamp, Durbin. GeneWise and Genomewise. Genome Res, 2004; 14:988–995.

21. Zhou, et al. US Department of Energy Joint Genome Institute. (personal communication/unpublished work).

22. Lowe, Eddy. tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res, 1997; 25(5):955–964.

23. Nielsen, Engelbrecht, Brunak, von Heijne. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng, 1997; 10(1):1–6.

24. Bendtsen, Nielsen, von Heijne, Brunak. Improved prediction of signal peptides: SignalP 3.0. J Mol Biol, 2004; 340(4):783–795.

25. Melen, Krogh, von Heijne. Reliability measures for membrane protein topology prediction algorithms. J Mol Biol, 2003; 327(3):735–744.

26. Quevillon, Silventoinen, Pillai, et al. InterProScan: Protein domains identifier. Nucleic Acids Res, 2005; 33:W116–120.

27. SwissProt. UniProtKB/Swiss-Prot. Available at: http://expasy.org/sprot (Last accessed November 2013).

28. Kanehisa, Goto, Hattori, et al. From genomics to chemical genomics: New developments in KEGG. Nucleic Acids Res, 2006; 34:D354–357.

29. Koonin, Fedorova, Jackson, et al. A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol, 2004; 5(2):R7.

30. Ashburner, Ball, Blake, et al. Gene ontology: Tool for the unification of biology. Nat Genet, 2000; 25(1):25–29.

31. Camacho, Coulouris, Avagyan, et al. BLAST+: Architecture and applications. BMC Bioinformatics, 2009; 10:421.

32. Kurtz, Phillippy, Delcher, et al. Versatile and open software for comparing large genomes. Genome Biol, 2004; 5:R12.

33. JGI. Trichoderma reesei v.2.0. Available at: http://genome.jgi-psf.org/Trire2/Trire2.home.html (Last accessed November 2013).

34. Saccharomyces Genome Database. Available at: http://yeastgenome.org (Last accessed November 2013).

35. Montenecourt, Eveleigh. Selective screening methods for the isolation of high yielding cellulase mutants of Trichoderma reesei. In: Brown, Jurasek, eds. Hydrolysis of Cellulose: Mechanisms of Enzymatic and Acid Catalysis, Advances in Chemistry Series, Washington, DC: American Chemical Society. 1979; 181:289–301.

36. Ryu DDY, Mandels. Cellulases: Biosynthesis and applications. Enzyme Microb Technol, 1980; 2(2):91–102.

37. Ghosh, Al-Rabiai, Ghosh, et al. Increased endoplasmic reticulum content of a mutant of Trichoderma reesei (RUT-C30) in relation to cellulase synthesis. Enzyme Microb Technol, 1982; 4(2):110–113.

38. Ghosh, Ghosh, Trimino-Vazquez, et al. Cellulase secretion from a hypercellulolytic mutant of Trichoderma reesei Rut-C30. Arch Microbiol, 1984; 140(2,3):126–133.

39. Stals, Sandra, Devreese, et al. Factors influencing glycosylation of Trichoderma reesei cellulases. II: N-glycosylation of Cel7A core protein isolated from different strains. Glycobiology, 2004; 14(8):713–724.

40. Brody, Maiyuran. RNAi-mediated gene silencing of highly expressed genes in the industrial fungi Trichoderma reesei and Aspergillus niger. Ind Biotechnol, 2009; 5(1):53–60.

41. Kautto, Grinyer, Paulsen, et al. Stress effects caused by the expression of a mutant cellobiohydrolase I and proteasome inhibition in Trichoderma reesei Rut-C30. N Biotechnol, 2013; 30(2):183–191.

42. Mäntylä, Rossi, Vanhanen, et al. Electrophoretic karyotyping of wild-type and mutant Trichoderma longibrachiatum (reesei) strains. Curr Genet, 1992; 21(6):471–477.

43. Carter, Allison, Rey, Dunn-Coleman. Chromosomal and genetic analysis of the electrophoretic karyotype of Trichoderma reesei: Mapping of the cellulase and xylanase genes. Mol Microbiol, 1992; 6(15):2167–2174.

44. Hakkinen, Arvas, Oja, et al. Re-annotation of the CAZy genes of Trichoderma reesei and transcription in the presence of lignocellulosic substrates. Microb Cell Fact, 2012; 11:134.

45. Seiboth, Karimi, Phatale, et al. The putative protein methyltransferase LAE1 controls cellulase gene expression in Trichoderma reesei. Mol Microbiol, 2012; 84(6):1150–1164.

46. Zhang, Zhang, Zhong, et al. Ras GTPases modulate morphogenesis, sporulation and cellulase gene expression in the cellulolytic fungus Trichoderma reesei. PLoS One, 2012; 7:e48786.

47. Schuster, Bruno, Collet, et al. A versatile toolkit for high throughput functional genomics with Trichoderma reesei. Biotechnol Biofuels, 2012; 5(1):1.

48. Seidl, Seiboth. Trichoderma reesei: Genetic approaches to improving strain efficiency. Biofuel, 2010; 1(2):343–354.

49. Kubicek. Systems biological approaches towards understanding cellulase production by Trichoderma reesei. J Biotechnol, 2013; 163(2):133–142.

50. Seidl, Seibel, Kubicek, Schmoll. Sexual development in the industrial workhorse Trichoderma reesei. Proc Natl Acad Sci USA, 2009; 106(33):13909–13914.

51. Herpoël-Gimbert, Margeot, Dolla, et al. Comparative secretome analyses of two Trichoderma reesei RUT-C30 and CL847 hypersecretory strains. Biotechnol Biofuels, 2008; 1(1):18.

52. Arvas, Pakula, Smit, et al. Correlation of gene expression and protein production rate - a system wide study. BMC Genomics, 2011; 12:616.

53. JGI. Fungal Genomics Program. Available at: http://jgi.doe.gov/fungi (Last accessed November 2013).

54. , Singh, Himmel. Perspectives and new directions for the production of bioethanol using consolidated bioprocessing of lignocelluloses. Curr Opin Biotechnol, 2009; 20(3):364–371.

55. Grigoriev, Nordberg, Shabalov, et al. The genome portal of the Department of Energy Joint Genome Institute. Nucleic Acids Res, 2012; 40:D26–32.