Appendix 1: Flux balance analysis primer

The computational analysis of genome sequence information is beginning to reveal the complete set of molecular components involved in cellular activities. However, it is clear that cellular functions are intricate, and the integrated function of biological systems involves many complex interactions among the molecular components within the cell. To understand the complexity inherent in cellular networks, approaches that focus on the systemic properties of the network are required ^{1} . The focus of such research represents a departure from the classical *reductionist approach* to the *integrated approach* ^{2} to understanding the interrelatedness of gene function and the role of each gene in the context of multi-genetic cellular functions or *genetic circuits* ^{3,4} .

The engineering approach to analysis and design relies on a mathematical or computer model (e.g., a dynamic simulator) of a cellular process that is based on fundamental physicochemical laws and principles. Mathematical modeling of metabolic systems has a long history, dating back to the 1960s ^{5-7} . However, the available enzyme kinetic information was fragmented, and attention turned to developing methods that could shed light on the relative importance of various metabolic events. Methods for sensitivity analysis of metabolic regulation began in the 1960s ^{8} and continued into the 1970s ^{9,10} , leading to the biochemical systems theory (BST) and metabolic control analysis (MCA).

Although the ultimate goal is the development of dynamic models for the complete simulation of cellular systems ^{11} , the success of such approaches has been severely hampered by the current lack of kinetic information on the dynamics and regulation of metabolic reactions. However, in the absence of kinetic information it is still possible to accurately assess the theoretical capabilities and operative modes of metabolic systems using metabolic flux balance analysis (FBA) ^{4,12-16} . FBA is based on the fundamental physicochemical constraints on metabolic networks. FBA only requires information regarding the stoichiometry of metabolic pathways and the metabolic demands; furthermore, FBA can incorporate additional information when it is available. FBA is particularly applicable for post-genomic analysis, because the stoichiometric parameters can be defined from the annotated genome sequence ^{14} . In this appendix, we will describe the basic concepts of FBA and how it relates to genomics.

The complete genome for organisms with a genome size of approximately a few million base pairs can be rapidly sequenced, and currently many are available online (The Institute for Genomic Research). The annotated whole genome sequence of an organism can be used to reconstruct the metabolic network, and this process involves several challenges ^{17-19} .

The first step toward reconstructing the metabolic network is to identify the coding regions or open reading frames (ORFs) within a genomic sequence. Subsequently, each ORF is searched against databases with the goal of identifying homologous genes. Homology often provides the first clues regarding the functionality of a newly sequenced gene. Through such analysis of the genome sequence, a large fraction of the genes can be assigned a putative function. It is to be expected that over the coming years, the ability to identify functionally related genes will improve.

We have constructed a database of known metabolic reactions from the extensive literature regarding the metabolism of *E. coli* ^{20} and several online databases ^{21-23} . The reaction database contains the following information: the substrates, products, and stoichiometry of each metabolic reaction; the name of the enzyme catalyzing the reaction; the genes that code for the respective enzymes; and the EC number of each metabolic reaction. The list of reactions (Supplementary Table 1) is available online.
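The fields listed above map naturally onto a small record type. The following is a minimal sketch only: the class name, field names, and metabolite identifiers are illustrative assumptions, not the schema of the authors' database. The example entry uses glucose-6-phosphate isomerase (gene *pgi*, EC 5.3.1.9).

```python
from dataclasses import dataclass


@dataclass
class Reaction:
    """One reaction record (hypothetical layout, not the authors' schema)."""
    enzyme: str           # name of the enzyme catalyzing the reaction
    stoichiometry: dict   # metabolite id -> coefficient (negative = substrate)
    genes: tuple          # genes that code for the enzyme
    ec_number: str        # EC number of the reaction

    def substrates(self):
        # Metabolites consumed by the reaction
        return sorted(m for m, c in self.stoichiometry.items() if c < 0)

    def products(self):
        # Metabolites produced by the reaction
        return sorted(m for m, c in self.stoichiometry.items() if c > 0)


# Illustrative entry: glucose 6-phosphate <-> fructose 6-phosphate
pgi = Reaction(
    enzyme="glucose-6-phosphate isomerase",
    stoichiometry={"g6p": -1, "f6p": 1},
    genes=("pgi",),
    ec_number="5.3.1.9",
)
print(pgi.substrates(), pgi.products())
```

Storing signed coefficients in one mapping keeps each record directly convertible into a column of the stoichiometric matrix.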

All of the metabolic genes in the cell compose a subset of the full genotype. This subset will be referred to as the metabolic genotype of a particular organism, and the *in silico* representation of the metabolic genotype will be referred to as the *in silico* metabolic genotype. The gene products derived from the genes in the metabolic genotype carry out all of the enzymatic reactions and transport processes that occur within the cell. For example, the *E. coli* *in silico* metabolic genotype included the genes involved in central metabolism, amino acid metabolism, nucleotide metabolism, fatty acid and lipid metabolism, carbohydrate assimilation, vitamin and cofactor biosynthesis, energy and redox generation, and macromolecule production (i.e. peptidoglycan, glycogen, RNA, and DNA). A hypothetical metabolic genotype is shown in Figure 2a.

This hypothetical metabolic genotype is used to reconstruct the hypothetical metabolic network (Figure 2b) and to define the stoichiometric matrix (Figure 4).

The basic methodology used to construct the *E. coli* metabolic genotype is described below. First, the annotated *E. coli* K-12 genome sequence (ref 24) was searched against our database. This process selected the metabolic reactions in our database whose genes were found in the genome. However, some genes annotated with a metabolic function in *E. coli* were not represented in our database; these genes were flagged for further investigation. Subsequently, each of the flagged genes was researched (in the literature and the online databases) to determine whether the gene/reaction should be included. In this way, our database was updated to serve as a complete database of the metabolic reactions in *E. coli*.
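The selection and flagging steps can be sketched as simple set operations. This is an illustration under stated assumptions, not the authors' procedure: the function name, the input layout, the reaction identifiers, and the gene `yzzX` are hypothetical, and a reaction is selected as soon as any one of its genes is present (which suits isozymes but not multi-subunit complexes).

```python
def select_reactions(annotated_genes, reaction_db):
    """Split an annotation into matched reactions and flagged genes.

    annotated_genes: dict gene -> True if annotated with a metabolic function
    reaction_db:     dict reaction id -> set of genes coding for its enzyme
    """
    genome = set(annotated_genes)
    # Reactions with at least one encoding gene present in the genome
    selected = sorted(r for r, genes in reaction_db.items() if genes & genome)
    # Every gene the database knows about
    known = set().union(*reaction_db.values())
    # Metabolic genes missing from the database need manual review
    flagged = sorted(g for g, metabolic in annotated_genes.items()
                     if metabolic and g not in known)
    return selected, flagged


# Toy inputs (illustrative only)
annotation = {"pgi": True, "pfkA": True, "yzzX": True, "rpoB": False}
database = {"PGI": {"pgi"}, "PFK": {"pfkA", "pfkB"}, "RXN_ABSENT": {"geneQ"}}

selected, flagged = select_reactions(annotation, database)
print(selected)  # reactions kept in the metabolic genotype
print(flagged)   # genes to research in the literature
```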

At this point, a few *E. coli* metabolic genes/reactions were still not included in the metabolic genotype. One reason may be that a known biochemical conversion is carried out by a gene that has not yet been characterized. Therefore, upon careful review of the existing biochemical literature, we added the necessary genes/reactions to the metabolic genotype. A complete list of the reactions is available (Table 1). See Covert *et al.* for a more detailed description of this process ^{19} .

All of the information in the metabolic genotype regarding the stoichiometry of the metabolic reactions can be used to give an *in silico* representation of the metabolic network, or the *in silico* metabolic genotype. Given the myriad of details required to model cellular behavior, modeling cellular functions has proved a difficult task. However, given a complete list of the molecular components in a cellular system, we can constrain cellular behavior and define the systemic capabilities/constraints of the metabolic network. The capabilities of the metabolic network can then be analyzed, and the optimal characteristics within the capabilities can be identified. Below, we will discuss the methodology we used to convert the metabolic genotype into an *in silico* representation of the metabolic capabilities/constraints.

The fundamentals of flux balance analysis (FBA) have been reviewed ^{12,13,15} . Below we describe the procedure we used to construct the *in silico* representation of the capabilities/constraints of the *E. coli* metabolic network and discuss the fundamentals of FBA.

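A flux balance was written for each metabolite (*X*_{i}) within the metabolic network to yield the dynamic mass balance equation for each metabolite in the network. Figure 1 depicts an example system of fluxes (*V*_{syn}, *V*_{deg}, *V*_{trans}, *V*_{use}) affecting a particular metabolite (*X*_{i}). For this system, the mass balance is:

$$\frac{dX_i}{dt} = V_{syn} - V_{deg} - V_{use} \pm V_{trans} \tag{1}$$

where the subscripts 'syn' and 'deg' refer to the synthesis and degradation reactions of metabolite *X*_{i}. The metabolic fluxes *V*_{trans} and *V*_{use} correspond to transport across the system boundary and to utilization of the metabolite, respectively. Equation 1 can be written as:

$$\frac{dX_i}{dt} = V_{syn} - V_{deg} - V_{use} + b_i \tag{2}$$

where *b*_{i} is the net transport of *X*_{i} into the defined metabolic system. For a metabolic network that contains *m* metabolites and *n* metabolic fluxes, all of the transient material balances can be represented by a single matrix equation:

$$\frac{d\mathbf{X}}{dt} = \mathbf{S}\cdot\mathbf{v} + \mathbf{b} \tag{3}$$

where **X** is an *m*-dimensional vector defining the quantity of each metabolite within the cell, **v** is the vector of *n* metabolic fluxes, **S** is the *m* × *n* stoichiometric matrix, and **b** is the vector of net transport fluxes.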

The time constants characterizing metabolic transients are typically very rapid compared to the time constants of cell growth and process dynamics (e.g., ref 30); therefore, the transient mass balances were simplified to only consider the steady-state behavior. Eliminating the time derivative in Equation 3 (assuming a steady state) and rearranging the equation yielded:

$$\mathbf{S}\cdot\mathbf{v} + \mathbf{I}\cdot\mathbf{b} = 0 \tag{4}$$

where I is the identity matrix. This equation states that on time scales longer than the doubling time, all the formation, degradation, utilization, and transport fluxes are balanced. Otherwise, significant amounts of the metabolite would accumulate inside the cell.
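As a numerical illustration of the steady-state balance, the sketch below builds a stoichiometric matrix for a three-metabolite toy network and checks that the residual **S**·**v** + **I**·**b** vanishes. The network, the flux values, and the sign convention (positive *b*_{i} meaning net transport into the system) are assumptions made for this example only.

```python
import numpy as np

# Toy network (hypothetical): coefficients are negative for substrates
reactions = {
    "R1": {"A": -1, "B": 1},   # A -> B
    "R2": {"B": -1, "C": 1},   # B -> C
    "R3": {"A": -1, "C": 1},   # A -> C
}
metabolites = ["A", "B", "C"]


def stoichiometric_matrix(reactions, metabolites):
    """Rows are metabolites, columns are reactions."""
    row = {m: i for i, m in enumerate(metabolites)}
    S = np.zeros((len(metabolites), len(reactions)))
    for j, coeffs in enumerate(reactions.values()):
        for met, coeff in coeffs.items():
            S[row[met], j] = coeff
    return S


S = stoichiometric_matrix(reactions, metabolites)
v = np.array([2.0, 2.0, 1.0])      # chosen internal fluxes for R1, R2, R3

# Net transport required to hold every metabolite at steady state
b = -S @ v
residual = S @ v + np.eye(len(metabolites)) @ b

print(b)         # A is imported, B is not transported, C is exported
print(residual)  # zero for every metabolite at steady state
```

With these fluxes, metabolite B needs no transport term, which is the situation that motivates dropping the corresponding row of **b** in the next step.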

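Not all of the metabolites are capable of transport into or out of the cell; therefore, the **I**·**b** term was simplified by removing the rows in the **b** vector that correspond to metabolites that are not transported across the cell membrane, forming a vector **b**_{r}. Additionally, the corresponding columns in **I** were removed, forming the matrix **U**:

$$\mathbf{S}\cdot\mathbf{v} + \mathbf{U}\cdot\mathbf{b}_r = 0 \tag{5}$$

Let

$$\mathbf{S'} = \begin{bmatrix}\mathbf{S} & \mathbf{U}\end{bmatrix}, \qquad \mathbf{v'} = \begin{bmatrix}\mathbf{v} \\ \mathbf{b}_r\end{bmatrix}$$

Therefore, we generated the following equation (note that in the literature this equation is written **S**·**v** = 0 for simplicity):

$$\mathbf{S'}\cdot\mathbf{v'} = 0 \tag{6}$$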

where **S'** is the *m* × *n'* stoichiometric matrix and *n'* is the total number of fluxes (this includes fictitious fluxes that only transport material across the system boundary). Every metabolite inside the system boundary corresponded to a row in the stoichiometric matrix; however, some of these metabolites were intracellular and some were extracellular (Figures 2b & 4 show how the metabolic network is converted into the stoichiometric matrix while considering extracellular metabolites). The stoichiometric matrix was arranged such that the *m*_{i} internal metabolites were entered first, and then the *m*_{e} external metabolites, giving:

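$$\mathbf{U} = \begin{bmatrix}\mathbf{0} \\ \mathbf{I}\end{bmatrix} \tag{7}$$

where **U** is an *m* × *m*_{e} matrix, **0** is the *m*_{i} × *m*_{e} zero matrix, and **I** is the *m*_{e} × *m*_{e} identity matrix.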

Equation 6 defined the mass, energy, and redox potential constraints on the metabolic network, thus effectively defining the capabilities and constraints of the metabolic genotype. All vectors, **v'**, that satisfied Equation 6 (the nullspace of **S'**) were steady-state metabolic flux distributions that did not violate the mass, energy, or redox balance constraints. However, many vectors within the nullspace were not physiologically feasible, and additional constraints were placed on the metabolic network.
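The nullspace can be computed numerically for a small case. The sketch below assumes the augmented matrix is formed as **S'** = [**S** **U**], with **U** obtained by keeping only the identity columns of the transported metabolites; the three-metabolite network and the choice of transported metabolites are hypothetical. It uses `scipy.linalg.null_space`, which returns an orthonormal basis.

```python
import numpy as np
from scipy.linalg import null_space

# Toy network: rows A, B, C; columns A->B, B->C, A->C
S = np.array([
    [-1.0,  0.0, -1.0],
    [ 1.0, -1.0,  0.0],
    [ 0.0,  1.0,  1.0],
])

# Assumption: only A and C cross the system boundary
transported = [0, 2]
U = np.eye(3)[:, transported]      # identity with the column for B removed

S_prime = np.hstack([S, U])        # 3 metabolites x 5 fluxes
N = null_space(S_prime)            # columns span all steady-state flux vectors

print(S_prime.shape, N.shape)
print(np.allclose(S_prime @ N, 0.0))
```

Any steady-state flux vector of this toy network is a linear combination of the columns of `N`; the inequality constraints discussed next select the physiologically meaningful part of that space.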

Equation 6 defined the mass, energy, and redox balance constraints on the metabolic system. Additional constraints were also placed on the metabolic network, and in the limiting case where all the constraints on the metabolic network are known (as well as the initial conditions), the intersection between the nullspace and the region defined by all other constraints may be reduced to a point. Herein, we have considered the stoichiometric constraints (mass, energy, and redox balance constraints), capacity constraints on the exchange fluxes, and a limited set of the physicochemical constraints that includes basic thermodynamics (reversibility and irreversibility of the metabolic reactions).

The capacity and thermodynamic constraints were realized by constraining the value of the flux through the metabolic reactions by using linear inequalities (*α*_{i} ≤ *v*_{i} ≤ *β*_{i}).

The exchange fluxes for inorganic phosphate, ammonia, carbon dioxide, sulfate, potassium, and sodium were unrestrained (*α*_{i} = −∞ and *β*_{i} = ∞).
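The reversibility constraints can be expressed as a list of lower and upper bounds, one pair per flux. This is a sketch under assumptions: it takes the lower bound of an irreversible reaction to be zero and places no finite upper limit on internal fluxes, which is a common convention but is not stated explicitly in the text above. The function names are hypothetical.

```python
import math


def internal_bounds(reversible_flags):
    """Return one (lower, upper) pair per internal flux.

    Assumed convention: irreversible reactions carry only non-negative flux,
    reversible reactions are unbounded in both directions.
    """
    return [(-math.inf, math.inf) if reversible else (0.0, math.inf)
            for reversible in reversible_flags]


def satisfies_bounds(v, bounds):
    """Check lower <= v_i <= upper for every flux."""
    return all(lower <= flux <= upper
               for flux, (lower, upper) in zip(v, bounds))


# Three fluxes: irreversible, reversible, irreversible
bounds = internal_bounds([False, True, False])
print(satisfies_bounds([1.0, -2.0, 0.0], bounds))   # reverse flux allowed on flux 2
print(satisfies_bounds([-1.0, 0.0, 0.0], bounds))   # negative flux on an irreversible step
```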

The formalism described above constrained the operation of the metabolic network. With this formalism, we have defined the capabilities of the metabolic network, therefore defining what it can and cannot do. The result was a feasible region in multidimensional space within which the steady-state flux vector, **v'**, must lie. Adding further constraints can reduce the size of the space, and if all constraints are considered (including initial conditions), the feasible region may be reduced to a point. Herein, we considered the stoichiometric, capacity, and thermodynamic constraints. These constraints, enforced simultaneously, defined the feasible region that contains all steady-state flux vectors satisfying the imposed constraints. Within this set, we can find a particular steady-state metabolic flux vector that maximizes/minimizes an objective function.

Herein, we used linear programming to find a feasible steady-state flux vector that maximizes an objective function. The solution to Equation 6, subject to the inequality constraints, was formulated as a linear programming (LP) problem. Mathematically, the LP problem was stated as:

$$\text{maximize } Z = \sum_i c_i v_i = \mathbf{c}\cdot\mathbf{v'} \tag{8}$$

where *Z* is the objective function that was represented as a linear combination of metabolic fluxes *v*_{i}. For our analysis, the vector **c** was defined as the unit vector in the direction of the growth flux.

The growth flux was defined in terms of the biosynthetic requirements based on the biomass composition defined in the literature ^{25,27,28} . Thus, biomass generation was defined as a reaction flux draining the intermediate metabolites in the appropriate ratios (Figure 2b & 4), and this flux was defined as the objective function ^{25,29} . A commercially available package was used to solve the LP problem (LINDO, Lindo Systems Inc.).
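The LP formulation can be illustrated on a very small network. The sketch below is not the LINDO model used in the study: it substitutes `scipy.optimize.linprog` as the solver, and the two-metabolite network, the uptake limit of 10, and the single-precursor growth drain are all assumptions chosen to keep the example readable. Because `linprog` minimizes, the objective vector is negated.

```python
import numpy as np
from scipy.optimize import linprog

# Rows: metabolites A, B
# Columns: uptake (-> A), conversion (A -> B), growth drain (B ->), secretion (A ->)
S_prime = np.array([
    [1.0, -1.0,  0.0, -1.0],
    [0.0,  1.0, -1.0,  0.0],
])

# Capacity constraint on uptake; all other fluxes irreversible and uncapped
bounds = [(0, 10), (0, None), (0, None), (0, None)]

# c is the unit vector in the direction of the growth flux (column 2)
c = np.zeros(4)
c[2] = 1.0

# Maximize c.v' subject to S'.v' = 0 and the bounds
res = linprog(-c, A_eq=S_prime, b_eq=np.zeros(2), bounds=bounds, method="highs")
growth = -res.fun

print(res.status)   # 0 means an optimal solution was found
print(growth)       # optimal growth flux
print(res.x)        # one optimal steady-state flux vector
```

In this toy case all substrate taken up is routed to growth, so the optimal growth flux equals the uptake limit and the secretion flux is zero.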

The methodology for formulating genomically derived *in silico* metabolic genotypes described above provided a computational method for the analysis of the metabolic physiology and the systemic metabolic constraints.

The assignment of gene functions based solely on sequence similarity often provides an incomplete set of metabolic pathways. However, a metabolic pathway reconstruction takes into consideration the comprehensiveness of the entire metabolic network or metabolic pathway ^{17,18,31,32} . Within the framework of metabolic pathway reconstructions, the accuracy of each functional assignment is examined in the context of the holistic function of the metabolic network. For example, an amino acid biosynthesis pathway may be complete except for a single aminotransferase; in this case, it is likely that the respective biochemical activity is assumed by a nonspecific aminotransferase ^{17} . Furthermore, functional assignments for which only a single reaction in a pathway is identified should be further investigated for accuracy.

The metabolic pathway reconstruction that we have used was examined by comparing the known behavior and nutritional requirements of *E. coli* to the *in silico* analysis. We have utilized FBA to assist in our metabolic pathway reconstruction. The capability of the metabolic network to synthesize each metabolite in the biomass requirements on various carbon sources was examined. Inconsistencies between the *in silico* analysis and the experimental observations were further investigated. The complete list of reactions that were used in the analysis is available on the web.
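The check on biomass precursors can be sketched by adding, for each metabolite in turn, a drain flux and maximizing it with LP: a maximum of zero indicates that the network cannot synthesize that metabolite from the available substrate. The function name, the toy network, and the uptake limit are hypothetical, and `scipy.optimize.linprog` stands in for the solver used in the study.

```python
import numpy as np
from scipy.optimize import linprog


def max_drain(S, bounds, met_index):
    """Maximum steady-state flux that can be drained from one metabolite."""
    m, n = S.shape
    drain = np.zeros((m, 1))
    drain[met_index, 0] = -1.0                 # drain consumes the metabolite
    S_aug = np.hstack([S, drain])
    c = np.zeros(n + 1)
    c[-1] = -1.0                               # linprog minimizes, so negate
    res = linprog(c, A_eq=S_aug, b_eq=np.zeros(m),
                  bounds=list(bounds) + [(0, None)], method="highs")
    return -res.fun if res.status == 0 else 0.0


# Rows: A, B, C. Columns: uptake (-> A), conversion (A -> B).
# No reaction produces C, mimicking a gap in the reconstruction.
metabolites = ["A", "B", "C"]
S = np.array([
    [1.0, -1.0],
    [0.0,  1.0],
    [0.0,  0.0],
])
bounds = [(0, 10), (0, None)]

producible = {name: max_drain(S, bounds, i) > 1e-9
              for i, name in enumerate(metabolites)}
print(producible)
```

A metabolite reported as not producible points to a missing reaction or an incorrect functional assignment that should be investigated further.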

All feasible *E. coli* *in silico* metabolic flux distributions are mathematically confined to the feasible set, which is a region in flux space (ℝ^{n}), where each solution in this space corresponds to a particular internal metabolic flux distribution (or a particular metabolic phenotype) ^{15} . Optimal metabolic behavior under specified growth conditions can be determined from this set of all possible phenotypes using LP.

Phenotype phase planes (PhPPs): PhPPs are essentially two- (or three-) dimensional representations of the feasible set, and the formalism for constructing the PhPP is briefly discussed next. Two parameters that describe the growth conditions (such as substrate and oxygen uptake rates) were defined as the two axes of the two-dimensional space. The optimal flux distribution was calculated (using LP) for all points in this plane by repeatedly solving the LP problem while adjusting the exchange fluxes defining the two-dimensional space. A finite number of qualitatively different metabolic pathway utilization patterns were identified in such a plane ^{33} , and lines were drawn to demarcate these regions. Each phase is denoted by Pn_{x,y}, where 'n' denotes the number of the demarcated phase (see Figure 5 for an example), and 'x,y' denotes the two uptake rates on the axes of the PhPP. The PhPP can also be generated for a mutant genotype, represented as P^{gene}n_{x,y}.

One demarcation line in the PhPP is defined as the line of optimality (LO). This line represents the optimal relation between respective metabolic fluxes. The LO is identified by varying the x-axis flux and calculating the optimal y-axis flux with the objective function defined as the growth flux ^{33} .
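The repeated-LP construction of a phase plane and of the line of optimality can be sketched on a toy network. Everything about the network is an assumption made for illustration: respiration yields 4 ATP per carbon and consumes one oxygen, fermentation yields 1 ATP per carbon, and growth consumes one carbon and 2 ATP. Uptake rates are treated as upper limits rather than fixed values, and the sketch only maps optimal growth; it does not compute the shadow prices used to demarcate phases.

```python
import numpy as np
from scipy.optimize import linprog

# Rows: carbon, oxygen, ATP
# Columns: carbon uptake, oxygen uptake, respiration, fermentation, growth
S = np.array([
    [1.0, 0.0, -1.0, -1.0, -1.0],
    [0.0, 1.0, -1.0,  0.0,  0.0],
    [0.0, 0.0,  4.0,  1.0, -2.0],
])
OXYGEN, GROWTH = 1, 4


def optimal_growth(carbon_max, oxygen_max=None):
    """Maximize growth for given uptake limits; None leaves oxygen uncapped."""
    bounds = [(0, carbon_max), (0, oxygen_max), (0, None), (0, None), (0, None)]
    c = np.zeros(5)
    c[GROWTH] = -1.0                       # linprog minimizes, so negate
    res = linprog(c, A_eq=S, b_eq=np.zeros(3), bounds=bounds, method="highs")
    return -res.fun, res.x


carbon_axis = np.linspace(0.0, 6.0, 7)
oxygen_axis = np.linspace(0.0, 3.0, 7)

# Optimal growth at every grid point of the (carbon, oxygen) plane
phase_plane = np.array([[optimal_growth(x, y)[0] for x in carbon_axis]
                        for y in oxygen_axis])

# Line of optimality: oxygen uptake chosen by the LP when oxygen is uncapped
line_of_optimality = [(x, optimal_growth(x)[1][OXYGEN]) for x in carbon_axis]

print(phase_plane.round(2))
print(line_of_optimality)
```

In this toy network growth is oxygen-limited below the line of optimality and carbon-limited above it, giving two qualitatively different regions of the plane.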


**References**

1. Weng, G., Bhalla, U. S. & Iyengar, R. Complexity in biological signaling systems. *Science* **284**, 92-6 (1999).

2. Kanehisa, M. Databases of biological information. *Trends Guide to Bioinformatics*, 24-26 (1998).

3. Palsson, B. O. What lies beyond bioinformatics? *Nature Biotechnology* **15**, 3-4 (1997).

4. Edwards, J. S. & Palsson, B. O. How will bioinformatics influence metabolic engineering? *Biotechnology and Bioengineering* **58**, 162-169 (1998).

5. Hess, B. & Boiteux, A. Oscillatory organization in cells, a dynamic theory of cellular control processes. *Hoppe-Seylers Zeitschrift fur Physiologische Chemie* **349**, 1567 - 1574 (1968).

6. Tyson, J. J. & Othmer, H. G. The dynamics of feedback control circuits in biochemical pathways. *Progress in Theoretical Biology* **5**, 1 - 62 (1978).

7. Goodwin, B. C. *Temporal organization in cells: a dynamic theory of cellular control processes* (Academic Press, New York, 1963).

8. Savageau, M. A. Biochemical systems analysis. I. Some mathematical properties of the rate law for the component enzymatic reactions. *J Theor Biol* **25**, 365-9 (1969).

9. Heinrich, R., Rapaport, S. M. & Rapaport, T. A. Metabolic regulation and mathematical models. *Progress in Biophysics and Molecular Biology* **32**, 1 - 82 (1977).

10. Kacser, H. & Burns, J. A. The control of flux. *Symposium for the Society of Experimental Biology* **27**, 65 - 104 (1973).

11. Tomita, M. et al. E-CELL: software environment for whole-cell simulation. *Bioinformatics* **15**, 72-84 (1999).

12. Bonarius, H. P. J., Schmid, G. & Tramper, J. Flux analysis of underdetermined metabolic networks: The quest for the missing constraints. *Trends in Biotechnology* **15**, 308-314 (1997).

13. Edwards, J. S., Ramakrishna, R., Schilling, C. H. & Palsson, B. O. in *Metabolic Engineering* (eds. Lee, S. Y. & Papoutsakis, E. T.) 13-57 (Marcel Dekker, 1999).

14. Edwards, J. S. & Palsson, B. O. Systems Properties of the *Haemophilus influenzae* Rd Metabolic Genotype. *Journal of Biological Chemistry* **274**, 17410-17416 (1999).

15. Varma, A. & Palsson, B. O. Metabolic Flux Balancing: Basic concepts, Scientific and Practical Use. *Bio/Technology* **12**, 994-998 (1994).

16. Sauer, U., Cameron, D. C. & Bailey, J. E. Metabolic capacity of *Bacillus subtilis* for the production of purine nucleosides, riboflavin, and folic acid. *Biotechnology and Bioengineering* **59**, 227-238 (1998).

17. Bono, H., Ogata, H., Goto, S. & Kanehisa, M. Reconstruction of amino acid biosynthesis pathways from the complete genome sequence. *Genome Research* **8**, 203-10 (1998).

18. Selkov, E., Maltsev, N., Olsen, G. J., Overbeek, R. & Whitman, W. B. A reconstruction of the metabolism of Methanococcus jannaschii from sequence data. *Gene* **197**, GC11-26 (1997).

19. Covert, M. W. et al. Metabolic Modeling of Microbial Strains *in silico*. *Trends in Biochemical Sciences* **In Press** (2001).

20. Neidhardt, F. C. (ed.) *Escherichia coli and Salmonella: cellular and molecular biology* (ASM Press, Washington, D.C., 1996).

21. Karp, P. D., Riley, M., Paley, S. M., Pellegrini-Toole, A. & Krummenacker, M. EcoCyc: Encyclopedia of *Escherichia coli* genes and metabolism. *Nucleic Acids Research* **26**, 50-3 (1998).

22. Selkov, E., Jr., Grechkin, Y., Mikhailova, N. & Selkov, E. MPW: the Metabolic Pathways Database. *Nucleic Acids Research* **26**, 43-5 (1998).

23. Kanehisa, M. A database for post-genome analysis. *Trends in Genetics* **13**, 375-6 (1997).

24. Blattner, F. R. et al. The complete genome sequence of *Escherichia coli* K-12. *Science* **277**, 1453-74 (1997).

25. Pramanik, J. & Keasling, J. D. Stoichiometric model of *Escherichia coli* metabolism: Incorporation of growth-rate dependent biomass composition and mechanistic energy requirements. *Biotechnology and Bioengineering* **56**, 398-421 (1997).

26. Varma, A. & Palsson, B. O. Stoichiometric flux balance models quantitatively predict growth and metabolic by-product secretion in wild-type *Escherichia coli* W3110. *Applied and Environmental Microbiology* **60**, 3724-3731 (1994).

27. Neidhardt, F. C., Ingraham, J. L. & Schaechter, M. *Physiology of the bacterial cell* (Sinauer Associates, Inc., Sunderland, MA, 1990).

28. Ingraham, J. L., Maaløe, O. & Neidhardt, F. C. *Growth of the bacterial cell* (Sinauer Associates, Inc., Sunderland, MA, 1983).

29. Varma, A. & Palsson, B. O. Metabolic capabilities of *Escherichia coli*: II. Optimal growth patterns. *Journal of Theoretical Biology* **165**, 503-522 (1993).

30. Vallino, J. & Stephanopoulos, G. Metabolic Flux Distributions in *Corynebacterium glutamicum* During Growth and Lysine Overproduction. *Biotechnology and Bioengineering* **41**, 633-646 (1993).

31. Overbeek, R., Larsen, N., Smith, W., Maltsev, N. & Selkov, E. Representation of function: the next step. *Gene* **191**, GC1-GC9 (1997).

32. Selkov, E. et al. The metabolic pathway collection from EMP: the enzymes and metabolic pathways database. *Nucleic Acids Res* **24**, 26-8 (1996).
