The organization of the mammalian genome into gene subsets corresponding to specific functional classes has provided key tools for systems biology research. Here, we have created a web-accessible resource called the Mammalian Metabolic Enzyme Database (https://hpcwebapps.cit.nih.gov/ESBL/Database/MetabolicEnzymes/MetabolicEnzymeDatabase.html) keyed to the biochemical reactions represented on iconic metabolic pathway wall charts created in the previous century. Overall, we have mapped 1,647 genes to these pathways, representing ~7 percent of the protein-coding genome. To illustrate the use of the database, we apply it to the area of kidney physiology. In so doing, we have created an additional database (Database of Metabolic Enzymes in Kidney Tubule Segments: https://hpcwebapps.cit.nih.gov/ESBL/Database/MetabolicEnzymes/), mapping mRNA abundance measurements (mined from RNA-Seq studies) for all metabolic enzymes to each of 14 renal tubule segments. We carry out bioinformatics analysis of the enzyme expression pattern among renal tubule segments and mine various data sources to identify vasopressin-regulated metabolic enzymes in the renal collecting duct.
- collecting duct
The organization of the mammalian genome into specific functional classes has provided key tools for systems biology research. For example, a list of all protein kinases encoded in the mammalian genome has been curated (13) and made available online (http://www.uniprot.org/docs/pkinfam). This list was an essential component of Bayesian analysis of cell signaling in the renal collecting duct (3, 27). Similar lists of transcription factors (http://www.bioguo.org/AnimalTFDB/), proteases (http://merops.sanger.ac.uk/), protein phosphatases (https://hpcwebapps.cit.nih.gov/ESBL/Database/Phosphatases/) (12), and ubiquitin E3 ligases (https://hpcwebapps.cit.nih.gov/ESBL/Database/E3-ligases/) (14) have been crucial to other systems-level studies. The goal of this paper is to map the set of all metabolic enzymes to transcriptomic data for individual renal tubule segments to provide a starting point for systems-level analysis of metabolic pathways along the nephron. The first step in this task is to obtain a closed list of genes (and official gene symbols) that code for metabolic enzymes in mammalian genomes. Typically, systems-level studies use official gene symbols to specify individual proteins. Historically, however, metabolic enzymes have not been classified using official gene symbols, but instead have been classified by EC numbers from the International Enzyme Commission (http://www.chem.qmul.ac.uk/iubmb/enzyme/history.html) or from lists of the reactions that the enzymes catalyze (24). Curation of metabolic enzymes has also been done in the form of iconic wall charts from Roche (http://biochemical-pathways.com/#/map/1) and Sigma-Aldrich (http://www.sigmaaldrich.com/content/dam/sigma-aldrich/docs/Sigma/General_Information/metabolic_pathways_poster.pdf), but these do not include official gene symbols.
Thus, before undertaking a mapping of metabolic enzymes to transcriptomic data from kidney, we needed to create a list of official gene symbols for all metabolic enzymes that catalyze the reactions represented on the wall charts.
This paper consists of three parts: 1) a newly curated database listing official gene symbols for mammalian metabolic enzymes; 2) a database showing the mRNA expression levels for all metabolic enzymes in each renal tubule segment; and 3) a bioinformatics analysis of the enzyme expression pattern among renal tubule segments with a view toward understanding the physiology of the nephron and the actions of vasopressin in the renal collecting duct.
METHODS
List of metabolic enzymes and biological pathways.
To create a closed list of metabolic enzymes and their corresponding pathways, we used two established metabolic pathway charts from Roche (available at http://biochemical-pathways.com) and Sigma-Aldrich (available at http://www.sigmaaldrich.com/content/dam/sigma-aldrich/docs/Sigma/General_Information/metabolic_pathways_poster.pdf). These were supplemented with data from Sigma-Aldrich “MiniMaps” associated with the larger metabolic pathways chart (available via http://www.iubmb-nicholson.org/minimaps.html). These charts do not list official gene symbols but rather EC numbers (Sigma-Aldrich) or enzyme names (Roche). These were mapped to gene symbols as described in the following sections.
Roche metabolic pathways chart.
From the Roche metabolic pathways chart (http://biochemical-pathways.com), we manually extracted three data elements, viz. enzyme names along with the classification terms that we refer to as “Sector” and “Subsector” (Fig. 1). Roche Sector terms were determined by the clearly delineated, light blue areas on the map. The Subsector terms were the orange, boxed pathway names indicated on the chart. The enzyme names were mapped to EC numbers using the ExPASy Enzyme Nomenclature Database (http://enzyme.expasy.org/ or ftp://ftp.expasy.org/databases/enzyme/enzyme.dat). To do this, we simplified the enzyme name strings by eliminating all spaces and punctuation (commas, apostrophes, and dashes) used in the wall chart. For those enzymes that did not map to EC numbers via string comparison, we searched the ExPASy database manually using modified enzyme-name strings to assign EC numbers.
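The string comparison described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the lowercasing step is an assumption (the text mentions only removal of spaces and punctuation), and "Mystery enzyme" is a made-up placeholder standing in for a chart name that needs a manual search.

```python
import re

# Characters removed before comparison: whitespace, commas, apostrophes
# (straight and curly), and hyphen/dash variants (U+2010 to U+2015).
_STRIP = re.compile(r"[\s,'\u2019\-\u2010-\u2015]")


def normalize_enzyme_name(name: str) -> str:
    """Drop spaces and punctuation; lowercasing is an added assumption."""
    return _STRIP.sub("", name).lower()


def match_names_to_ec(chart_names, reference_names_to_ec):
    """Return (matched, unmatched) for enzyme names taken from a chart.

    reference_names_to_ec maps a reference enzyme name to its EC number.
    Names left in `unmatched` would need a manual search.
    """
    index = {normalize_enzyme_name(n): ec
             for n, ec in reference_names_to_ec.items()}
    matched, unmatched = {}, []
    for name in chart_names:
        ec = index.get(normalize_enzyme_name(name))
        if ec is None:
            unmatched.append(name)
        else:
            matched[name] = ec
    return matched, unmatched


reference = {"Alcohol dehydrogenase": "1.1.1.1",
             "Glucose-6-phosphatase": "3.1.3.9"}
chart = ["Alcohol Dehydrogenase", "Glucose 6-phosphatase", "Mystery enzyme"]
matched, unmatched = match_names_to_ec(chart, reference)
print(matched)
print(unmatched)
```

Names that differ only in spacing, hyphenation, or capitalization collapse to the same key, so only genuinely different names fall through to the manual step.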
Sigma-Aldrich metabolic pathways chart and minimaps.
From the Sigma-Aldrich metabolic pathways chart (http://www.sigmaaldrich.com/content/dam/sigma-aldrich/docs/Sigma/General_Information/metabolic_pathways_poster.pdf), we manually extracted three data elements, viz. EC number, the substrate type, and the general descriptor. The substrate type was determined by the color of the substrate molecules, corresponding to a molecule class in the chart legend (Fig. 2). The general descriptor refers to the terms listed on the margins of the chart, associated with broad regions. For each of the Sigma MiniMaps (http://www.iubmb-nicholson.org/minimaps.html), all listed EC numbers were recorded with the title of the MiniMap as the associated data.
EC number to official gene symbol conversion.
After compiling a list of EC numbers mappable to one or both charts, we converted the EC numbers to mammalian official gene symbols as follows. The ExPASy Enzyme Nomenclature Database, which lists EC numbers and associated UniProt entries, was parsed to link EC numbers to official gene symbols. EC numbers that could not be linked to mammalian genes using the ExPASy database were searched manually in NCBI Protein (https://www.ncbi.nlm.nih.gov/protein) to find associated gene symbols. Some enzymes were determined to be bacterial and were thus eliminated. Paralogs were then added to the list of genes (e.g., adding Acss2 when Acss1 and Acss3 were listed), completing the list of mammalian metabolic enzyme gene symbols.
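The parsing step can be illustrated with a short sketch. It assumes the flat-file layout of enzyme.dat, in which each record has an `ID` line carrying the EC number, `DR` lines carrying "accession, entry name;" pairs, and a `//` terminator. The species-suffix filter, the function name, and the sample records (including the accession strings and the EC number 9.9.9.9) are placeholders written for this illustration, not data from the study. Converting the retained UniProt entries to official gene symbols is a further lookup that is not shown.

```python
import re
from collections import defaultdict

# One "accession, ENTRY_NAME;" pair on a DR line.
DR_ENTRY = re.compile(r"(\w+),\s*(\w+);")


def parse_enzyme_dat(lines, species_suffixes=("_RAT", "_MOUSE", "_HUMAN")):
    """Map EC number -> list of (accession, entry name) for chosen species.

    The suffix filter is an assumed stand-in for "mammalian"; EC numbers
    with no retained entry are absent from the result.
    """
    ec_to_entries = defaultdict(list)
    current_ec = None
    for line in lines:
        code, value = line[:2], line[5:].strip()
        if code == "ID":
            current_ec = value
        elif code == "DR" and current_ec is not None:
            for accession, entry_name in DR_ENTRY.findall(value):
                if entry_name.endswith(species_suffixes):
                    ec_to_entries[current_ec].append((accession, entry_name))
        elif code == "//":
            current_ec = None
    return dict(ec_to_entries)


# Hand-written sample in the enzyme.dat layout; all accessions are placeholders.
SAMPLE = """\
ID   1.1.1.1
DE   alcohol dehydrogenase.
DR   A00001, ADH1_RAT;  A00002, ADHX_ECOLI;  A00003, ADH1_MOUSE;
//
ID   9.9.9.9
DE   placeholder enzyme with bacterial entries only.
DR   A00004, PLCH_ECOLI;
//
""".splitlines()

ec_links = parse_enzyme_dat(SAMPLE)
print(ec_links)
```

In this sketch, an EC number that has only bacterial entries simply does not appear in the output, which mirrors the elimination of bacterial enzymes described above.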
Using EC-to-gene connections, the completed list of mammalian official gene symbols was linked back to the Roche and Sigma chart information. Protein annotations and GI numbers were identified for each official gene symbol using our Automated Bioinformatics Extractor (ABE, https://hpcwebapps.cit.nih.gov/ESBL/ABE/). Using the MitoCarta database of mitochondrial proteins (15) and the Recon2 database (“Virtual Metabolic Human”) (24), we added an additional 333 genes that code for metabolic enzymes or cofactors that were not identified in the mapping from metabolic wall charts.
Bioinformatics analysis and statistics.
Data mining utilized the Biological Information Gatherer (28) (https://big.nhlbi.nih.gov/), the Knowledge Base of Vasopressin Actions in Kidney (21) (https://helixweb.nih.gov/ESBL/TinyUrls/Vaso_portal.html) and the STRING Database version 10.0 (http://string-db.org/). To test whether a given descriptor is present more frequently in association with a given set of genes (vs. a suitably chosen control gene set) we used Fisher’s exact test. We used Medusa (downloadable from https://sourceforge.net/projects/graph-medusa/) to display data in the form of undirected graphs.
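For readers who wish to reproduce this type of enrichment test, a minimal sketch follows. It assumes a one-sided (enrichment) alternative and a 2×2 table laid out as described in the comments; the text does not specify the sidedness or the software used, so both are assumptions, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout:
      a = genes in the test set that carry the descriptor
      b = genes in the test set that do not
      c = genes in the control set that carry the descriptor
      d = genes in the control set that do not
    Returns P(X >= a) under the hypergeometric distribution with the
    table margins held fixed.
    """
    row1 = a + b          # size of the test set
    col1 = a + c          # genes carrying the descriptor
    n = a + b + c + d     # all genes
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Worked check: for [[3, 1], [1, 3]] the tail is (16 + 1) / 70.
p_value = fisher_exact_greater(3, 1, 1, 3)
print(round(p_value, 4))  # 0.2429
```

A two-sided test, if preferred, would sum the probabilities of all tables that are no more likely than the observed one; the one-sided form is shown here only because it is the simpler illustration of an enrichment question.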
RESULTS AND DISCUSSION
Mammalian Metabolic Enzyme Database.
We used the information from two iconic metabolic pathway charts, namely the Roche and Sigma-Aldrich charts, to identify a set of 1,647 mammalian metabolic enzymes that we mapped to official gene symbols (see methods). To create a resource available to other investigators, we created a database webpage called the Mammalian Metabolic Enzyme Database (https://hpcwebapps.cit.nih.gov/ESBL/Database/MetabolicEnzymes/MetabolicEnzymeDatabase.html).
The Mammalian Metabolic Enzyme Database website (screenshot in Fig. 3) displays the data in abbreviated tabular form and also offers the complete data set for download (red box A). On the web page, the columns may be sorted by clicking the column headers (red box B). The data may be filtered by user-selected terms, using the “Search” box (red box C). In the example, a filter has been executed by entering the string “adh”, revealing entries that contain that string. The table displays the locations of the enzymes in the Roche and Sigma metabolic pathway charts. Each entry is linked to the corresponding NCBI protein page (red box D) and to the appropriate NCBI BioSystems page (red box E). Gene symbols follow rat nomenclature, but the genes listed are common to most mammals.
Metabolic enzyme expression in renal tubule segments.
A screenshot of the Database of Metabolic Enzymes in Kidney Tubule Segments (https://hpcwebapps.cit.nih.gov/ESBL/Database/MetabolicEnzymes/) is presented in Fig. 4. The website displays the data in a tabular format similar to Fig. 3 (red boxes A–E). A dropdown menu allows users to sort the genes by sector terms from the Roche metabolic pathways chart (red box F). The table displays transcriptomic data for each of 14 renal tubule segments, obtained from the single-tubule RNA-seq studies of Lee et al. (11). The values are in RPKM (reads per kilobase of exon length per million mapped reads). The data can be sorted according to mRNA abundance in any designated segment. The data can also be downloaded into an electronic spreadsheet to allow users to carry out more complex analyses.
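As an aid to users working with the downloaded data, the RPKM unit and the sort-by-segment operation can be sketched as follows. The function names are hypothetical, the gene names are placeholders, and all numerical values are made up for illustration; none are taken from the database.

```python
def rpkm(read_count: float, exon_length_bp: float,
         total_mapped_reads: float) -> float:
    """Reads per kilobase of exon length per million mapped reads."""
    kilobases = exon_length_bp / 1e3
    millions = total_mapped_reads / 1e6
    return read_count / (kilobases * millions)


def sort_by_segment(table: dict, segment: str) -> list:
    """Gene names ordered from highest to lowest abundance in one segment."""
    return sorted(table, key=lambda gene: table[gene][segment], reverse=True)


# 500 reads on 2,000 bp of exon in a library of 10 million mapped reads.
print(rpkm(500, 2_000, 10_000_000))  # 25.0

# Placeholder genes and made-up RPKM values for two segments.
example = {
    "GeneA": {"S1": 12.0, "CCD": 0.5},
    "GeneB": {"S1": 0.1, "CCD": 30.0},
    "GeneC": {"S1": 4.0, "CCD": 4.0},
}
print(sort_by_segment(example, "S1"))   # ['GeneA', 'GeneC', 'GeneB']
print(sort_by_segment(example, "CCD"))  # ['GeneB', 'GeneC', 'GeneA']
```

Because RPKM normalizes for both transcript length and sequencing depth, values can be compared across genes within a segment as well as across segments.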
Figure 5 summarizes the distributions along the renal tubule of transcripts for key metabolic enzymes in various pathways (see Fig. 5 legend for segment abbreviations). Four pathways appear to be limited to the proximal tubule segments S1, S2, and S3, viz., fructolysis (Dak, Khk), gluconeogenesis (Fbp1, Pck1, Pck2), glucose dephosphorylation (G6pc), and uric acid metabolism (Xdh), reflecting known metabolic functions of the renal proximal tubule (10). In contrast, hexokinases (Hk1, Hk2, and Hk3) are selectively absent from the proximal tubule segments in accord with the inability of proximal tubule cells to metabolize glucose (10). Instead, proximal tubule cells rely on fatty acids and amino acids for ATP generation. Interestingly, Ptgs1 and Ptgs2 (also known as COX1 and COX2), the enzymes that are rate limiting in prostanoid synthesis, have differing distributions. Ptgs1 is present throughout the collecting duct system and in thin limbs of Henle’s loop, whereas Ptgs2 is only detected in the cortical thick ascending limb, which includes the macula densa. This distribution is similar to that found by Vitzthum et al. (26) by RT-PCR in microdissected rat renal tubule segments.
Figure 6 summarizes the Roche Sector terms whose associated genes were significantly more frequent in a given renal tubule segment than in all segments taken together (Fisher’s exact test, P < 0.05). Seven renal tubule segments had no enriched Roche Sector terms: S2, LDLIM, cTAL, DCT, CNT, CCD, and OMCD. The Roche Sector term “Citrate and Glyoxalate” was significantly enriched in six renal tubule segments: S1, S3, SDL, LDLOM, mTAL, and IMCD. “Amino Acid Metabolism: Leucine, Isoleucine, Valine” was significantly enriched in two renal tubule segments, LDLOM and IMCD, suggesting a special role for branched chain amino acid metabolism in these segments. “Nucleotide Metabolism: Purines” was significantly enriched in only one renal tubule segment, the IMCD. The Roche term “Carbohydrate Metabolism: Inositol” was also significantly enriched in only one renal tubule segment, the thin ascending limb of Henle (tAL).
We also present a similar web page reporting normalized values for all metabolic enzyme mRNAs expressed in biochemically isolated proximal tubules, medullary thick ascending limbs of Henle, and inner medullary collecting ducts from Affymetrix expression array studies in rats (https://hpcwebapps.cit.nih.gov/ESBL/Database/MetabolicEnzymes/MetabolicEnzymesinKidneyTubuleSegments.html). As with the previous web page, data can be selectively sorted or downloaded into a spreadsheet.
Metabolic enzymes regulated by vasopressin.
Vasopressin is a major regulator of transport in the distal nephron and collecting duct. Changes in transport rates may require changes in ATP production, which may be achieved in part through regulation of metabolic enzymes. We used the newly curated Mammalian Metabolic Enzyme Database, described above, to interrogate existing experimental data using the Biological Information Gatherer (BIG) (28) and the Knowledge Base of Vasopressin Actions in Kidney (21) to identify vasopressin-regulated metabolic enzymes in the renal collecting duct (Tables 1 and 2). Table 1 lists mRNA species found to be increased in abundance in response to vasopressin, while Table 2 lists proteins increased in abundance by vasopressin. Interestingly, six genes are listed in both tables (asterisks), indicating coordinate regulation of mRNA and protein levels. [Presumably other mRNA/protein pairs are actually correlated, but their correlations were not detected because of sensitivity limits of the methods used for transcriptomic and proteomic quantification. However, in a combined transcriptomics/proteomics study in cultured mpkCCD cells, more than one-third of proteins whose abundances were regulated by vasopressin were found to lack changes in the corresponding mRNA species despite sufficient statistical power (8).] Also, the abundance of glyceraldehyde-3-phosphate dehydrogenase (GAPDH) increases in response to vasopressin (Table 2), raising questions about the appropriateness of using measurements of GAPDH to normalize experimental data. Several gene products in Tables 1 and 2 play recognized regulatory roles in collecting duct cells [corticosteroid 11-beta-dehydrogenase, nitric oxide synthase 2 (iNOS), nitric oxide synthase 3 (eNOS), cyclic nucleotide phosphodiesterase 4B]. In addition, Tables 1 and 2 include several enzymes involved in amino acid metabolism (arginase 2, branched chain aminotransferase 1, iNOS, eNOS, glutaminase, glutamate dehydrogenase).
Table 3 lists phosphorylation sites at which phosphorylation is increased by vasopressin, and Table 4 lists proteins whose translation rates are increased by vasopressin in collecting duct cells. Several of the proteins listed in Table 4 are also present in Tables 1 and 2, as indicated.
To further classify the vasopressin-regulated gene products, we used STRING (http://string-db.org/) to create a relational network of genes and added labels for specific groups of vasopressin-regulated targets using a combination of terms based on the Mammalian Metabolic Enzyme Database (Fig. 7). In general, this analysis provided evidence compatible with vasopressin-regulated glucose metabolism, fatty acid metabolism and amino acid metabolism that could play a role in the regulation of ATP production. Also interesting is the large number of upregulated glutathione S-transferases (GSTs). GST proteins conjugate the tripeptide glutathione to a variety of substrates, including cysteines present in proteins. We previously demonstrated that vasopressin generally increases protein glutathionylation in collecting duct cells (20). Beyond this, Tamma et al. (23) demonstrated that the vasopressin-regulated water channel aquaporin-2 becomes glutathionylated in collecting duct principal cells.
In this paper, we have curated a database of genes that code for mammalian metabolic enzymes using the iconic Roche and Sigma wall charts as a starting point. To provide an unambiguous data “key” for information lookups, we have converted enzyme lists to a list of official gene symbols, including all paralogous genes that code for enzymes that catalyze the same biochemical reactions. The compiled list contains a total of 1,647 genes, which correspond to ~7 percent of protein-coding genes in mammalian genomes. The gene lists have been made available to investigators via publicly accessible web pages. These databases are designed for bioinformatics analysis of proteomics and transcriptomics data. They can also be used in metabolomics studies in combination with the Biochemical, Genetic and Genomic (BiGG) Knowledge Base at http://bigg.ucsd.edu/ described by King et al. (9). An examination of the expression of these metabolic enzyme genes throughout the body can provide information regarding the metabolic functions performed by different cell types, which may serve to reinforce past studies and prompt new investigations. Applying this concept to a specific biological problem, we have mapped the set of metabolic-enzyme-encoding genes to transcriptomic data in each of the 14 renal tubule segments. This analysis has allowed us to elucidate expression patterns among the 14 segments that relate to differences in biological function. Furthermore, we have used data mining tools to identify a set of vasopressin-regulated gene products in the renal collecting duct.
The work was funded by the Division of Intramural Research, NHLBI (Project ZA1-HL-001285; M. A. Knepper). C. C. Corcoran was supported by the NHLBI Summer Internship Program (H. Geller, Director).
No conflicts of interest, financial or otherwise, are declared by the author(s).
C.C.C., C.R.G., and M.A.K. performed experiments; C.C.C., C.R.G., T.P., J.P., and M.A.K. analyzed data; C.C.C., C.R.G., T.P., and M.A.K. interpreted results of experiments; C.C.C. and C.R.G. prepared figures; C.C.C. and M.A.K. drafted manuscript; C.C.C., C.R.G., T.P., J.P., and M.A.K. edited and revised manuscript; C.C.C., C.R.G., T.P., J.P., and M.A.K. approved final version of manuscript; M.A.K. conceived and designed research.
We thank Dr. Elizabeth Murphy of the National Heart, Lung, and Blood Institute (NHLBI) for helpful suggestions.
- Copyright © 2017 the American Physiological Society