Chapter 9 Web Topics

9.1 Equations of Change and Evolutionary Models

Introduction

A standard way to study dynamic systems is to derive an equation that uses the current values of one or more system variables (right-hand side) to predict the values of those variables in the next time interval (left-hand side). One then assigns initial values to the system variables and applies the equation iteratively to track the trajectory of the system over successive time intervals. Because the output values from one application of the equation are used as the inputs for the next iteration, the process is said to be recursive and the relevant equation is called a recursion equation. The right-hand side of a full recursion equation combines the current values of the variables with a rate of change term. In many cases, modelers instead place the rate of change term on the left side and the variables that affect that rate on the right side. The resulting rate equation is called a difference equation when time intervals are discrete, and a differential equation when time is continuous. The variable values in the next time interval are obtained by adding the rate term to the prior values if using a difference equation, or by integrating over a time period from given initial values if using a differential equation. A good introduction to the use of recursion and rate equations in ecological and evolutionary biology is provided by Nowak (2006).
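As a minimal sketch of the two formats described above, the following Python fragment builds a full recursion equation from a rate of change term and iterates it from an initial value. The discrete logistic rate function and its parameter r are illustrative assumptions, not taken from the text.

```python
def iterate_recursion(step, x0, n_steps):
    """Apply the recursion x_{t+1} = step(x_t) repeatedly; return the trajectory."""
    trajectory = [x0]
    for _ in range(n_steps):
        # The output of one application becomes the input of the next.
        trajectory.append(step(trajectory[-1]))
    return trajectory


def rate(x, r=0.5):
    """Hypothetical rate of change term (discrete logistic growth)."""
    return r * x * (1.0 - x)


def step(x):
    """Full recursion: next value = current value + rate of change."""
    return x + rate(x)


traj = iterate_recursion(step, x0=0.1, n_steps=50)
print(traj[:3], traj[-1])
```

With these assumed values the trajectory rises toward x = 1, the point at which the rate of change term equals zero.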

Evolutionary modeling often uses recursion equations to track potential evolutionary trajectories. Approaches differ in whether the right hand side of the equation includes both genetic and environmental terms, and if both are present, how they are defined and weighted. They may also differ in whether they limit consideration to linear effects or instead allow for nonlinear processes (see Web Topic 2.8).

Some approaches use recursion equations to identify stable equilibria by looking for values of the system variables at which the rate of change equals zero and slight perturbations off the equilibrium will return to the equilibrium point. This is typical of evolutionary game theory modeling and some applications of quantitative genetics. In contrast, practitioners of adaptive dynamics are interested in all trajectories, whether they lead to equilibrium points or not. A number of fairly sophisticated tests can be applied to recursion equations to predict types of trajectories without having to actually track examples.

Below, we provide an introduction to several types of recursion equations used to model evolutionary processes. Each is given in the rate of change format. Only basic introductions to the methods are provided; all of them can become quite complicated as more details are added. Interested readers can consult the cited references for examples.

Quantitative Genetics

The key recursion equation in quantitative genetics is the breeder’s equation (Bulmer 1980; Falconer 1996). When only a single continuously variable phenotypic trait is considered, this is

Δz = h²S

where the response to selection, Δz, is the change in the mean phenotypic value of the trait between the parental and the offspring generations, h² is the narrow-sense heritability of the trait (measured as the ratio of the additive genetic variance in the trait to the total phenotypic variance in the trait), and S is the selection differential, measured classically as the difference between the mean trait value of those that breed and the mean trait value of the entire parental generation.
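A small numerical sketch of the single-trait breeder's equation follows. The trait means and the heritability are hypothetical values chosen only for illustration.

```python
def selection_differential(mean_breeders, mean_parents):
    """S: mean trait value of breeders minus mean of the whole parental generation."""
    return mean_breeders - mean_parents


def response_to_selection(h2, S):
    """Breeder's equation: predicted change in the mean trait value."""
    return h2 * S


# Hypothetical values: parental mean 10.0, breeders' mean 12.0, h2 = 0.4.
S = selection_differential(mean_breeders=12.0, mean_parents=10.0)
dz = response_to_selection(h2=0.4, S=S)
print(S, dz)
```

Here only 40% of the selection differential is expected to be transmitted to the offspring generation.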

This equation ignores the likely possibility that a focal trait might be affected both by direct selection on it and by indirect effects propagated to it by genetic correlations with other traits also under selection. Relevant genetic correlations would include pleiotropy and linkage disequilibrium. Assuming weak selection, resilience of the additive genetic variance in the face of selection, and roughly normal (Gaussian) distributions of trait values, a polygenic version of the breeder’s equation is

Δz = GP⁻¹S

where Δz is now a vector containing the changes in mean trait value for each of a set of genetically correlated traits, G is a square matrix containing the additive genetic variance for each trait along the main diagonal and the additive genetic covariances between traits in the off-diagonal entries, P⁻¹ is the inverse of the equivalent matrix for phenotypic variation, and S is a vector containing the selection differentials on each trait (Lande 1979, 1980, 1981). The latter are typically computed as the covariance between each trait’s value and relative fitness. The equation can be further simplified by defining β = P⁻¹S, so that Δz = Gβ. β can then be interpreted as a vector of partial regression coefficients of relative fitness on each trait. The values of β can also be used to create an adaptive landscape, with a particular combination of phenotypic trait values defining a location in that landscape and the height of the landscape at that point indicating relative fitness (Steppan et al. 2002; Arnold et al. 2008). For this reason, β is referred to as a selection gradient, in contrast to the vector of selection differentials, S, in the polygenic equation. Evolutionary trajectories that lead the mean of a population to a peak or ridge in that landscape are likely to end there, since on top of a peak β = 0. This defines evolutionary equilibria in quantitative genetics models.
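The polygenic equation can be illustrated with a two-trait sketch in plain Python. The entries of G, P, and S below are hypothetical; they are chosen so that trait 2 experiences no selection differential of its own but still responds through its genetic covariance with trait 1.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_vec(m, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


# Hypothetical values (not from the text).
G = [[1.0, 0.5], [0.5, 1.0]]   # additive genetic variances and covariances
P = [[2.0, 0.0], [0.0, 2.0]]   # phenotypic variances and covariances
S = [1.0, 0.0]                 # selection differentials

beta = mat_vec(inverse_2x2(P), S)   # selection gradient, beta = P^-1 S
dz = mat_vec(G, beta)               # response, dz = G beta
print(beta, dz)
```

Under these assumed values the mean of trait 2 shifts by 0.25 even though its own selection differential is zero.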

Ideally, this equation would be applied recursively to predict evolutionary trajectories of traits or set to zero to identify equilibria. One concern, however, is the degree to which G (called the G-matrix) remains untouched by selection between generations (Dieckmann et al. 2006; Pigliucci 2006). Considerable effort is currently underway to measure and compare G-matrices in real systems, use simulations to see under which conditions the G-matrix might or might not be stable, and otherwise test the assumptions of the polygenic recursion equation (Steppan et al. 2002; Revell 2007; Roff 2007; Arnold et al. 2008; Calsbeek and Goodnight 2009). An alternative is to relax the assumptions used in the classic polygenic equation, and use a series of concurrent recursion equations to track both phenotypic and genotypic changes across generations (Barton and Turelli 1991; Bürger 1991; Christiansen 2000). This takes extensive computation, but increasingly robust methods are now available including a set of functions for use in the commercial program Mathematica (Kirkpatrick et al. 2002).

Adaptive Dynamics

Recursion equations have long been used in ecology to describe not only the equilibrium points but also the dynamics of competition, predator/prey cycles, speciation, and community stability (Ellner 2006; Pastor 2008). It was only natural that the initial efforts to identify equilibria in behavioral strategies (e.g., evolutionarily stable strategies [Maynard Smith 1982]) would be extended to track the dynamics of such systems (Metz et al. 1996; Hofbauer and Sigmund 1998; Nowak 2006). This has in turn encouraged researchers to seek general recursion equations that could be applied to all types of evolutionary processes. One candidate is the canonical equation of adaptive dynamics (Dieckmann and Law 1996). This differential equation assumes that evolution is ultimately limited by low rates and low magnitudes of mutations. This assumption ensures ample time for concomitant ecological dynamics to play out before significant mutational change occurs. It thus divides evolution into two successive time scales and simplifies the mathematics. The canonical equation for a single continuous trait divides the rate of change in the average trait value (z) into two multiplicative components: an overall evolutionary rate coefficient, k, and a selection derivative, D:

dz/dt = kD

where k equals a scaled product of the equilibrium population size, the mutation rate per birth, and the variance of mutational effects on that trait, and D equals the change in fitness resulting from deviations of the trait value away from the current population mean. The latter is usually written as a partial derivative of fitness with respect to trait values and can be envisioned graphically as the slope of an adaptive surface in the vicinity of the current population mean. D is thus similar to the β of quantitative genetics. If D = 0, the population is at a singular point, which may or may not be an equilibrium. In fact, at least eight theoretical, and in practice six realizable, types of singular points are possible for even a single continuous trait (Geritz et al. 1997; Apaloo et al. 2009). An evolutionarily stable strategy (ESS) is only one of these: it is defined as an outcome in which the most common strategy has the trait values defined by the singular point and no alternative strategy can invade that population when rare (Maynard Smith 1982). Another possible outcome is a convergently stable strategy (Eshel 1983). This is a singular point at which any mutant whose strategy is more similar to the singular point than the currently common strategies in a population can invade that population. Adaptive dynamics analyses show (surprisingly) that an ESS need not be convergently stable, and a convergently stable singular point need not be an ESS (Eshel 1983; Eshel and Feldman 1984; Geritz et al. 1997, 1998). Another surprising outcome is the possible bifurcation or branching of a population near a singular point into two coexisting strategies, in the form of a stable polymorphism or even speciation. Simply looking for stable equilibria would usually miss most of these additional evolutionary possibilities.
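The single-trait canonical equation can be sketched numerically with simple Euler steps. The selection derivative below (the slope of a quadratic fitness surface peaked at a hypothetical optimum z_opt) and the constant value of k are assumptions made for illustration; in a real application D would be derived from an ecological model and k could itself change with z.

```python
def selection_derivative(z, z_opt=2.0):
    """Hypothetical D(z): slope of a quadratic fitness surface peaked at z_opt."""
    return -(z - z_opt)


def integrate_canonical(z0, k, dt, n_steps, D=selection_derivative):
    """Euler integration of dz/dt = k * D(z), with k held constant (an assumption)."""
    z = z0
    trajectory = [z]
    for _ in range(n_steps):
        z = z + dt * k * D(z)
        trajectory.append(z)
    return trajectory


traj = integrate_canonical(z0=0.0, k=0.1, dt=0.1, n_steps=2000)
print(traj[-1], selection_derivative(traj[-1]))
```

The trajectory approaches the singular point at which D = 0. This sketch shows only that the point attracts nearby trait values; whether such a point also resists invasion would require a separate analysis of the fitness of rare mutants.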

As with the breeder’s equation, the canonical equation of adaptive dynamics can be generalized to accommodate simultaneous tracking of multiple traits. In the multiple trait version, dz/dt is now the rate of change in a vector (z) of values for the set of traits, the mutation rate in k depends on the values of the entire current trait vector (z), and the variance in the single-trait equation is replaced by a variance/covariance matrix describing mutational correlations between traits. D is now a multidimensional measure of the fitness gradient around the current population mean (Dieckmann et al. 2006; Durinx et al. 2008; Apaloo and Butler 2009; Leimar 2009). Methods have been developed that allow one to predict patterns of stability or instability simply by examining the form of D (Leimar 2009).

This approach has been criticized because the rates of mutation assumed in the models are usually far smaller than those actually found in real systems (Abrams 2001, 2005, 2009; Barton and Polechova 2005). In addition, studies of quantitative trait loci indicate that some important traits depend on only a few genes, and at least some of these appear to have arisen through mutations of large effect (Lynch and Walsh 1998; Roff 2007; Kelly 2009). However, the fact that applications of adaptive dynamics often give predictions similar to those of quantitative genetics suggests that its assumptions may not be that limiting (Leimar 2009).

The Price Equation

The Price equation (Price 1970, 1972) has quite general applicability. It is used to dissect the difference in average values of some property in two populations into additive components. The focal property considered can be anything: alleles, phenotypic traits, learned behaviors, types of poetry, etc. In evolutionary applications, the relevant populations are a parental generation and an offspring generation, and the property is usually some index of allele frequencies or phenotypic traits. The general form of the equation is

Δz̄ = Cov(w,z) + E(wΔz)

where z is the value of the property of interest in a parental individual, w is that individual's relative fitness (e.g., its absolute fitness divided by the mean fitness of the parental population), Δz is the difference between a parent's property value and that of its offspring, and Δz̄ is the difference in mean property value between the offspring and parental populations. The Cov(w,z) term is the covariance between relative fitness and property values in the parental population, and is thus equivalent to the selection component of the breeder's equation and adaptive dynamics.

The second term, E(wΔz), is called the transmission component, and is the weighted average (based on relative fitnesses) of the difference between a parent's property value and that of its offspring. If the property is allele frequencies, and parents produce gametes that accurately reflect their own allelic frequencies, random variations away from parental patterns summed across all parents in the population will usually cancel out and the second term will be negligible (Frank 1997; Grafen 2000, 2006, 2007b). Where this is true, the Price equation and phenotypic approaches such as evolutionary game theory converge on a common strategy for predicting evolution: in both cases, selection can then be described as an optimization process in which relative fitness is the payoff being maximized (Grafen 2000, 2002, 2007b, 2008). The transmission component will not be negligible if a species has sex ratio or other allelic distorters during gamete production. It must then be included in the model. It can also serve as an “error term” that provides better matches between model predictions and real systems when the underlying genetics are complicated. For example, by defining the relevant property as some higher power version of allelic frequencies, the transmission term can be used to incorporate epistasis, nonrandom mating, and other processes into models (Frank and Slatkin 1990; Frank 1995, 1997); most quantitative genetic and adaptive dynamics models avoid these complications. Finally, because the Price equation is linear and thus additive, either the selection term or the transmission term (or both) can be further partitioned into additive sub-components to build more complicated models. For example, the selection term can be divided into group and individual components for modeling group selection processes (Wilson 1975; Wade 1985; Queller 1992).
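The partition into a selection component and a transmission component can be checked numerically. In the sketch below, the three parents, their offspring numbers, and the mean property values of their offspring are hypothetical; the two Price terms are computed separately and compared with the change in the population mean computed directly.

```python
def mean(xs):
    return sum(xs) / len(xs)


def price_terms(z_parent, n_offspring, z_offspring_mean):
    """Return (Cov(w, z), E(w * dz)) over the parental population."""
    w_bar = mean(n_offspring)
    w = [n / w_bar for n in n_offspring]          # relative fitness, mean 1
    wz = [wi * zi for wi, zi in zip(w, z_parent)]
    cov_wz = mean(wz) - mean(w) * mean(z_parent)  # population covariance
    dz = [zo - zp for zo, zp in zip(z_offspring_mean, z_parent)]
    transmission = mean([wi * di for wi, di in zip(w, dz)])
    return cov_wz, transmission


def direct_change(z_parent, n_offspring, z_offspring_mean):
    """Offspring population mean minus parental population mean."""
    total = sum(n_offspring)
    z_bar_off = sum(n * zo for n, zo in zip(n_offspring, z_offspring_mean)) / total
    return z_bar_off - mean(z_parent)


# Hypothetical data: one entry per parent.
z_parent = [1.0, 2.0, 3.0]
n_offspring = [1, 2, 3]
z_offspring_mean = [1.0, 2.5, 3.0]   # mean property value of each parent's offspring

selection, transmission = price_terms(z_parent, n_offspring, z_offspring_mean)
change = direct_change(z_parent, n_offspring, z_offspring_mean)
print(selection, transmission, change)
```

For these values the selection term is 1/3, the transmission term is 1/6, and their sum matches the directly computed change in the mean of 0.5.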

There is a cost to the great flexibility of the Price equation: its application only provides information about the mean property values in the next generation; it cannot compute the variances or covariances in the offspring generation that would be required to predict a subsequent generation (Barton and Turelli 1987). This has led a number of critics to question the utility of the Price equation in evolutionary modeling (Lewontin 1974; Gould and Lewontin 1979; Ewens 2004). However, it should be pointed out that the only reason alternative approaches such as quantitative genetics and adaptive dynamics can plot multiple generation trajectories is because they make some rather stringent assumptions about the underlying genetics (e.g., constancy of the G-matrix in quantitative genetics and low mutation rates in adaptive dynamics).

A clear example of how to use the Price equation for a single locus situation can be found in Box 3, p. 1246 of Grafen (2007b). For a two-allele locus in a diploid animal, Grafen suggests assigning each individual i a “p-score,” p_i, which takes the value 1 if that individual has the focal allele on both chromosomes, 0.5 if it has the focal allele on only one chromosome, and 0 if it lacks the focal allele on both chromosomes. The left side of the Price equation, Δp, then represents the change in the frequency of the focal allele in the population across generations. Like the other methods, the single-locus version of p-scores can be generalized by tracking the dynamics of several loci concurrently. Each individual has only one relative fitness value, w_i, no matter how many loci and associated alleles are considered. It is thus feasible to expand the computation of each individual’s p-score as a linear sum of the values assigned to alleles at each of many loci (Grafen 2008). This sum will be similar to the combined additive genetic components of traits used in quantitative genetics models. Because the Price equation is linear and additive throughout, it is possible to accommodate uncertainty in appropriate terms by weighting alternative values by the probabilities that each will occur, and adding these products together to obtain an expected value (Grafen 2000). Extensions of the Price equation approach have also been developed to accommodate cooperative behavior among relatives, different classes of individuals (age, sex, or status) within evolving populations, social interactions in networks, and evolutionary competition between groups (Frank 1997, 1998, 2009; Grafen 2006, 2007a, 2007c, 2009; Grafen and Archetti 2008; Gardner and Grafen 2009).
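The p-score idea can be sketched as follows. The genotype labels (with "A" as the focal allele), the four individuals, and their fitness values are hypothetical, and the sketch assumes that the transmission term is zero, so that the expected change in the mean p-score equals the covariance between relative fitness and p-score.

```python
# p-score for each genotype at a two-allele locus; "A" is the focal allele (a label assumed here).
P_SCORE = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}


def expected_change_in_mean_p(genotypes, fitnesses):
    """Cov(w, p) in the parental population; assumes a zero transmission term."""
    p = [P_SCORE[g] for g in genotypes]
    w_bar = sum(fitnesses) / len(fitnesses)
    w = [f / w_bar for f in fitnesses]        # relative fitness, mean 1
    p_bar = sum(p) / len(p)
    mean_wp = sum(wi * pi for wi, pi in zip(w, p)) / len(p)
    return mean_wp - p_bar                    # covariance, since mean(w) = 1


# Hypothetical population of four individuals.
genotypes = ["AA", "Aa", "Aa", "aa"]
fitnesses = [3.0, 2.0, 2.0, 1.0]
delta_p = expected_change_in_mean_p(genotypes, fitnesses)
print(delta_p)
```

With these values the mean p-score of 0.5 is expected to rise by 0.125, because individuals carrying more copies of the focal allele were assigned higher fitness.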

Literature Cited

Abrams, P.A. 2001. Adaptive dynamics: Neither F nor G. Evolutionary Ecology Research 3: 369–373.

Abrams, P.A. 2005. ‘Adaptive Dynamics’ vs. ‘adaptive dynamics.’ Journal of Evolutionary Biology 18: 1162–1165.

Abrams, P.A. 2009. Analysis of evolutionary processes: The adaptive dynamics approach and its applications. American Journal of Human Biology 21: 714–715.

Apaloo, J., J.S. Brown and T.L. Vincent. 2009. Evolutionary game theory: ESS, convergence stability, and NIS. Evolutionary Ecology Research 11: 489–515.

Apaloo, J. and S. Butler. 2009. Evolutionary stabilities in multidimensional-traits and several-species models. Evolutionary Ecology Research 11: 637–650.

Arnold, S.J., R. Bürger, P.A. Hohenlohe, B.C. Ajie and A.G. Jones. 2008. Understanding the evolution and stability of the G-matrix. Evolution 62: 2451–2461.

Barton, N.H. and J. Polechova. 2005. The limitations of adaptive dynamics as a model of evolution. Journal of Evolutionary Biology 18: 1186–1190.

Barton, N.H. and M. Turelli. 1987. Adaptive landscapes, genetic distance and the evolution of quantitative characters. Genetical Research 49: 157–173.

Barton, N.H. and M. Turelli. 1991. Natural and sexual selection on many loci. Genetics 127: 229–255.

Bulmer, M.G. 1980. The Mathematical Theory of Quantitative Genetics. Oxford UK: Clarendon Press.

Bürger, R. 1991. Moments, cumulants, and polygenic dynamics. Journal of Mathematical Biology 30: 199–213.

Calsbeek, B. and C.J. Goodnight. 2009. Empirical comparison of G matrix test statistics: finding biologically relevant change. Evolution 63: 2627–2635.

Christiansen, F.B. 2000. Population Genetics of Multiple Loci. New York NY: John Wiley and Sons.

Dieckmann, U., M. Heino and K. Parvinen. 2006. The adaptive dynamics of function-valued traits. Journal of Theoretical Biology 241: 370–389.

Dieckmann, U. and R. Law. 1996. The dynamical theory of coevolution: A derivation from stochastic ecological processes. Journal of Mathematical Biology 34: 579–612.

Durinx, M., J. Metz and G. Meszena. 2008. Adaptive dynamics for physiologically structured population models. Journal of Mathematical Biology 56: 673–742.

Ellner, S.P. 2006. Dynamic Models in Biology. Princeton NJ: Princeton University Press.

Eshel, I. 1983. Evolutionary and continuous stability. Journal of Theoretical Biology 103: 99–111.

Eshel, I. and M.W. Feldman. 1984. Initial increase of new mutants and some continuity properties of ESS in two-locus systems. The American Naturalist 124: 631–640.

Ewens, W.J. 2004. Mathematical Population Genetics. I. Theoretical Introduction. Berlin Germany: Springer.

Falconer, D.S. 1996. Introduction to Quantitative Genetics, 4th Edition. Harlow UK: Pearson Education Limited.

Frank, S.A. 1995. George Price’s contributions to evolutionary genetics. Journal of Theoretical Biology 175: 373–388.

Frank, S.A. 1997. The Price Equation, Fisher’s fundamental theorem, kin selection, and causal analysis. Evolution 51: 1712–1729.

Frank, S.A. 1998. The Foundations of Social Evolution. Princeton NJ: Princeton University Press.

Frank, S.A. 2009. Natural selection maximizes Fisher information. Journal of Evolutionary Biology 22: 231–244.

Frank, S.A. and M. Slatkin. 1990. The distribution of allelic effects under mutation and selection. Genetical Research 55: 111–117.

Gardner, A. and A. Grafen. 2009. Capturing the superorganism: a formal theory of group adaptation. Journal of Evolutionary Biology 22: 659–671.

Geritz, S.A.H., E. Kisdi, G. Meszena and J.A.J. Metz. 1998. Evolutionarily singular strategies and the adaptive growth and branching of the evolutionary tree. Evolutionary Ecology 12: 35–57.

Geritz, S.A.H., J.A.J. Metz, E. Kisdi and G. Meszena. 1997. Dynamics of adaptation and evolutionary branching. Physical Review Letters 78: 2024–2027.

Gould, S.J. and R.C. Lewontin. 1979. The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme. Proceedings of the Royal Society of London Series B-Biological Sciences 205: 581–598.

Grafen, A. 2000. Developments of the Price equation and natural selection under uncertainty. Proceedings of the Royal Society of London Series B-Biological Sciences 267: 1223–1227.

Grafen, A. 2002. A first formal link between the price equation and an optimization program. Journal of Theoretical Biology 217: 75–91.

Grafen, A. 2006. Optimization of inclusive fitness. Journal of Theoretical Biology 238: 541–563.

Grafen, A. 2007a. Detecting kin selection at work using inclusive fitness. Proceedings of the Royal Society of London Series B-Biological Sciences 274: 713–719.

Grafen, A. 2007b. The formal Darwinism project: a mid-term report. Journal of Evolutionary Biology 20: 1243–1254.

Grafen, A. 2007c. An inclusive fitness analysis of altruism on a cyclical network. Journal of Evolutionary Biology 20: 2278–2283.

Grafen, A. 2008. The simplest formal argument for fitness optimization. Journal of Genetics 87: 421–433.

Grafen, A. 2009. Formalizing Darwinism and inclusive fitness theory. Philosophical Transactions of the Royal Society B-Biological Sciences 364: 3135–3141.

Grafen, A. and M. Archetti. 2008. Natural selection of altruism in inelastic viscous homogeneous populations. Journal of Theoretical Biology 252: 694–710.

Hofbauer, J. and K. Sigmund. 1998. The Theory of Evolution and Dynamical Systems. Cambridge UK: Cambridge University Press.

Kelly, J.K. 2009. Connecting QTLs to the G-matrix of evolutionary quantitative genetics. Evolution 63: 813–825.

Kirkpatrick, M., T. Johnson and N. Barton. 2002. General models of multilocus evolution. Genetics 161: 1727–1750.

Lande, R. 1979. Quantitative genetic analysis of multivariate evolution, applied to brain-body size allometry. Evolution 33: 314–334.

Lande, R. 1980. The genetic covariance between characters maintained by pleiotropic mutation. Genetics 94: 203–215.

Lande, R. 1981. Models of speciation by sexual selection on polygenic traits. Proceedings of the National Academy of Sciences of the United States of America 78: 3721–3725.

Leimar, O. 2009. Multidimensional convergence stability. Evolutionary Ecology Research 11: 191–208.

Lewontin, R.C. 1974. The Genetic Basis of Evolutionary Change. New York NY: Columbia University Press.

Lynch, M. and B. Walsh. 1998. Genetics and Analysis of Quantitative Traits. Sunderland, MA: Sinauer Associates.

Maynard Smith, J. 1982. Evolution and the Theory of Games. Cambridge UK: Cambridge University Press.

Metz, J.A.J., S.A.H. Geritz, G. Meszéna, F.J.A. Jacobs and J.S. van Heerwaarden. 1996. Adaptive dynamics, a geometrical study of the consequences of nearly faithful reproduction. In, Stochastic and Spatial Structures of Dynamical Systems (van Strien, S.J. and S.M. Verduyn Lunel, eds.). Amsterdam, Netherlands: North-Holland Publishing Company. pp. 183–231.

Nowak, M.A. 2006. Evolutionary Dynamics: Exploring the Equations of Life. Cambridge, MA: Belknap (Harvard University) Press.

Pastor, J. 2008. Mathematical Ecology of Populations and Ecosystems. Oxford UK: Wiley-Blackwell.

Pigliucci, M. 2006. Genetic variance-covariance matrices: A critique of the evolutionary quantitative genetics research program. Biology and Philosophy 21: 1–23.

Price, G.R. 1970. Selection and covariance. Nature 227: 520–521.

Price, G.R. 1972. Extension of covariance selection mathematics. Annals of Human Genetics 35: 485–490.

Queller, D.C. 1992. Quantitative genetics, inclusive fitness, and group selection. The American Naturalist 139: 540–558.

Revell, L.J. 2007. The G matrix under fluctuating correlational mutation and selection. Evolution 61: 1857–1872.

Roff, D.A. 2007. A centennial celebration for quantitative genetics. Evolution 61: 1017–1032.

Steppan, S.J., P.C. Phillips and D. Houle. 2002. Comparative quantitative genetics: evolution of the G matrix. Trends in Ecology and Evolution 17: 320–327.

Wade, M.J. 1985. Soft selection, hard selection, kin selection, and group selection. The American Naturalist 125: 61–73.

Wilson, D.S. 1975. A theory of group selection. Proceedings of the National Academy of Sciences of the United States of America 72: 143–146.

9.2 Computing the Value of Information

Introduction

Animals rely on a variety of sources of information in making a decision about subsequent actions. Likely sources include genetic biases, probabilities of events or conditions occurring based on prior sampling, concurrent or recent cue information, and signals. Some sources of information are more costly to exploit than others. We expect selection to favor receiver attention to combinations of sources for which benefits outweigh costs on average. The value of information provides a measure for comparing two or more alternative combinations of information sources. In biology, it is usually given as the difference in average fitness between two alternative strategies. The concept originated in economics (Savage 1954; Good 1966; Raiffa 1968; Gould 1974; Chavas and Pope 1984; Ramsey 1990), but has recently been applied to a number of problems in evolutionary biology (Stephens 1989; Bradbury and Vehrencamp 2000; Koops 2004; Lachmann and Bergstrom 2004; Szalai and Szamado 2009; Donaldson-Matasci et al. 2010; McLinn and Stephens 2010; McNamara and Dall 2010; Schmidt et al. 2010).

In the context of animal communication, one might want to compute the value of information when an animal uses a particular signal set versus when that same party does not use it. In this case, we would compare the average fitness of relying on that signal set with the average fitness derived from using some non-signal combination of information sources to make the same decisions. Alternatively, we might consider the value of information provided by one type of signal set, say continuous and graded signals, when compared with an alternative discrete set. Each strategy would provide its own average amounts of information, have its own reliabilities, and result in different performance costs. The value of information would combine the benefits and costs of each strategy and compute the difference between them. In the following sections, we first apply the value of information to compare reliance by a receiver on a discrete dyadic signal set with the alternative strategy of ignoring those signals. This will demonstrate some of the utility of the approach. We will then review more formal and complex applications and conclude with some Matlab M-files for computing the value of information with more than two alternative signals and actions.

The value of information for discrete dyadic signal sets

Consider a situation in which a receiver must decide whether to perform action A1 or A2. A1 yields the higher payoff when condition C1 is true, and A2 yields the higher payoff when condition C2 is true. We shall assume that this information is already available to the receiver. The actual prior probabilities that C1 and C2 occur can be denoted by p for C1 and (1–p) for C2. We can summarize this information in a payoff matrix with the priors indicated in a lower row:

                      C1 true    C2 true
Payoff of A1          R₁₁        R₁₂
Payoff of A2          R₂₁        R₂₂
Prior probability     p          1–p

Different receivers may have access to different information prior to making a decision. A receiver that ignores signals and cues would rely only on its best estimates, based on prior sampling, of how often C1 and C2 occur on average. If it does this well, these will be close to the prior probabilities listed in the table. A receiver that uses its prior estimates and any recent cues might have somewhat different information on which to base its decision. And a receiver that has access to signals may have different information again. Each type of receiver then makes a decision on what to do next (A1 or A2). We shall denote the fraction of time it makes the correct decision (i.e., the one that gives the highest payoff) when C1 is true by ϕ₁, and the fraction of time it makes the correct decision when C2 is true by ϕ₂. We can add this to our table as follows:

                                    C1 true    C2 true
Payoff of A1                        R₁₁        R₁₂
Payoff of A2                        R₂₁        R₂₂
Prior probability                   p          1–p
Probability of correct decision     ϕ₁         ϕ₂

The average payoff to a receiver making a decision given the variables in this table is:

PO = p(ϕ₁R₁₁+(1–ϕ₁)R₂₁) + (1–p)(ϕ₂R₂₂+(1–ϕ₂)R₁₂)

The values of the priors and payoffs are assumed to be the same for all receiver strategies; what differs are the values of ϕ₁ and ϕ₂ resulting from their particular access to information. We can compute the value of information for any two strategies by finding the difference in the average payoffs (using the above formula) for the two strategies.

Suppose we have two alternative strategies that differ in the nature and amount of information used in making the same decision. We assume the payoffs of different actions and the prior probabilities of conditions C1 and C2 are fixed. Suppose receiver strategy S1 uses one set of information sources, at a cost of obtaining that information of –K₁ (where K₁ ≥ 0), and makes correct decisions fractions ϕ₁ and ϕ₂ of the time. Its average payoff per decision is:

PO(S1) = p(ϕ₁R₁₁+(1 – ϕ₁)R₂₁) + (1 – p) (ϕ₂R₂₂+(1 – ϕ₂)R₁₂) – K₁

Other receivers use a different strategy, S2, which relies on a different set or weighting of information sources at a different sampling cost, –K₂, resulting in different probabilities of correct choices, θ₁ and θ₂. Their average payoff is:

PO(S2) = p(θ₁R₁₁+(1 – θ₁)R₂₁) + (1 – p) (θ₂R₂₂+(1 – θ₂)R₁₂) – K₂

The value of using different information, say for using S2 instead of S1, is then:

VI(S2,S1) = PO(S2) – PO(S1)
= p(θ₁ – ϕ₁) (R₁₁ – R₂₁) + (1 – p)(θ₂ – ϕ₂) (R₂₂ – R₁₂) – (K₂ – K₁)

We can simplify this further by letting ΔR₁ = (R₁₁– R₂₁) and ΔR₂ = (R₂₂– R₁₂).

If VI(S2,S1) is positive, then S2 will be favored by selection over S1. If it is negative, then S1 is favored over S2. By definition, p, (1 – p), ΔR₁, ΔR₂, K₁, and K₂ are all ≥ 0. Therefore, VI(S2,S1) can only be positive if at least one of (θ₁ – ϕ₁), (θ₂ – ϕ₂), or (K₁ – K₂) is also positive; that is, S2 must be more reliable than S1 under at least one condition, or cheaper. That term, in combination with the corresponding prior probabilities and payoff differences of right versus wrong decisions, must be large enough to offset any other negative terms. The terms (θ₁ – ϕ₁) and (θ₂ – ϕ₂) are the differences in reliabilities for the two strategies when C1 and C2 respectively are true. Reliabilities depend upon the degree to which alternative cues and signals are correlated with conditions, the levels of distortion during stimulus propagation, the sensory abilities of the receiver, its ability to classify stimuli, its relative weighting of alternative stimuli such as cues and signals, its method of updating probabilities given recent stimuli (Bayesian updates, linear operators, rules-of-thumb, etc.) and any associated biases, its estimates of relative payoffs, and its rules for making decisions. Any of these could vary for the two strategies.
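The average payoff and value of information formulas above can be checked with a short sketch. All numerical values below (prior, payoffs, reliabilities, and costs) are hypothetical; the function and argument names are likewise introduced only for illustration.

```python
def average_payoff(p, rel1, rel2, R11, R21, R22, R12, K=0.0):
    """Average payoff PO for reliabilities rel1 (when C1 is true) and rel2 (when C2 is true)."""
    return (p * (rel1 * R11 + (1 - rel1) * R21)
            + (1 - p) * (rel2 * R22 + (1 - rel2) * R12)
            - K)


def value_of_information(p, theta, phi, dR1, dR2, K2, K1):
    """Factored form of VI(S2,S1); theta and phi are (rel. under C1, rel. under C2) pairs."""
    theta1, theta2 = theta
    phi1, phi2 = phi
    return (p * (theta1 - phi1) * dR1
            + (1 - p) * (theta2 - phi2) * dR2
            - (K2 - K1))


# Hypothetical values.
p = 0.6
R11, R21, R22, R12 = 10.0, 2.0, 8.0, 1.0
phi, K1 = (0.7, 0.6), 0.5      # strategy S1
theta, K2 = (0.9, 0.8), 1.0    # strategy S2

po_s1 = average_payoff(p, phi[0], phi[1], R11, R21, R22, R12, K1)
po_s2 = average_payoff(p, theta[0], theta[1], R11, R21, R22, R12, K2)
vi = value_of_information(p, theta, phi, R11 - R21, R22 - R12, K2, K1)
print(po_s1, po_s2, po_s2 - po_s1, vi)
```

Both routes give a value of information of 1.02 for these numbers: the higher reliability of S2 more than offsets its additional cost of 0.5.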

Signaling versus ignoring signals

Let us now apply the general result above to a specific situation. Consider two classes of receivers of the same species. They are faced with the same decision, have exactly the same prior estimates of which condition is most likely true, and would get the same payoffs for each combination of action and condition. Strategy S1 ignores all signals. As discussed in the text, it will optimally invoke a “red line” approach in which it always does A1 if its current estimate that C1 is true is above a threshold value, and always does A2 if it is not. In the text, the adopted action is called the “default strategy.” The value at which the red line is set depends on the relative differences in payoff between correct versus wrong decisions for the two conditions (see Web Topic 8.1). Suppose that prior sampling, recent cues, and current payoff value assessments place the current estimated probability that C1 is true above the red line. Receivers using the S1 strategy will then always do A1. This means that they always make the correct decision when C1 is true, but always make the wrong decision when C2 is true. For this strategy, ϕ₁ = 1 and ϕ₂ = 0 (note that we could have assumed that A2 was the preferred action for S1 receivers; the conclusions below would be similar).

Now consider an alternative strategy S2, which has access to all the same prior and cue information that an S1 receiver uses, but in addition, S2 receivers attend to signals produced by senders that are correlated with whether C1 or C2 is currently true. Again, we denote the reliabilities of these receivers’ decisions by θ₁ and θ₂. We shall also assume that S2 receivers expend the same costs as S1 receivers to monitor cues and sample past events. The critical difference is then the additional costs, K₂, that an S2 receiver expends to attend to and process signals.

The value of information for a receiver that switches from S1 to S2 will then be:

VI(S2,S1) = p(θ₁ – 1)ΔR₁ + (1 – p)(θ₂ – 0)ΔR₂ – (K₂ – 0)

The first term on the right hand side of this expression will be negative unless the reliability of S2 receivers when C1 is true (θ₁) is 100%, in which case the term will equal zero. This makes sense: a receiver using S1 will always make the correct choice when C1 is true, whereas one using signals, which are usually imperfect, will have a reliability less than 100%. Thus an S1 receiver that switches to attending to signals will make more errors when C1 is true than it did before. On the other hand, the second term in the expression is likely to be positive, since an S1 receiver never makes the right decision when C2 is true, whereas an S2 receiver gets it right a fraction θ₂ of the time. The third term will be negative as long as there are additional costs of attending to signals. For S2 to be favored by selection, the middle term must more than compensate for the two negative terms.
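
To make the three terms concrete, the following Matlab sketch evaluates the expression for one set of parameter values. All numbers are hypothetical and chosen only for illustration, and the variable names (th1, th2, dR1, dR2) are ours. The first check confirms that pΔR₁ > (1 – p)ΔR₂, which is the condition under which A1 has the higher average payoff in the absence of signals and is therefore the default action assumed above.

```matlab
% Hypothetical parameter values, for illustration only
p   = 0.7;   % prior probability that C1 is true
th1 = 0.9;   % theta_1: reliability of S2 receivers when C1 is true
th2 = 0.8;   % theta_2: reliability of S2 receivers when C2 is true
dR1 = 4;     % payoff difference, right versus wrong decision, when C1 is true
dR2 = 3;     % payoff difference, right versus wrong decision, when C2 is true
K2  = 0.2;   % additional cost of attending to signals

% A1 is the default action only if it has the higher average payoff
assert(p*dR1 > (1 - p)*dR2);

term1 = p*(th1 - 1)*dR1;        % errors introduced when C1 is true (<= 0)
term2 = (1 - p)*(th2 - 0)*dR2;  % correct decisions gained when C2 is true (>= 0)
term3 = -(K2 - 0);              % additional cost of using signals (<= 0)
VI = term1 + term2 + term3;

fprintf('terms: %.2f  %.2f  %.2f   VI(S2,S1) = %.2f\n', term1, term2, term3, VI);
```

With these values the three terms are –0.28, 0.72, and –0.20, so the value of information is 0.24 and S2 would be favored.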

Note that increasing overall reliability, for example by improving a receiver’s sensory equipment or spending more time sampling signals, will increase both θ₁ and θ₂. This will increase the magnitude of the positive term while reducing the magnitude of the first negative term. Taken alone, these effects will increase the likelihood that S2 will be favored over S1. However, some minimal level of reliability must be attained before the positive term exceeds the first negative one. In addition, it is likely that the costs of participating in communication (K₂) will have to rise to accomplish any improved reliability. Whether S2 is favored or not will then depend on which terms rise faster.

We can show both points graphically as follows. Suppose the reliabilities are the same for C1 and C2, i.e., let Q = θ₁ = θ₂, and let us ignore the cost term for the moment. Any benefit of attending to signals over ignoring them can then be written as:

B(S2, S1) = p(Q – 1)ΔR₁ + (1 – p)(Q – 0) ΔR₂

or rewriting

B(S2, S1) = Q(pΔR₁ + (1 – p) ΔR₂) – pΔR₁

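This is the equation for a straight line relating B(S2,S1) to Q, with slope (pΔR₁ + (1 – p)ΔR₂) and intercept –pΔR₁. On a graph (red line) this is:

[Figure not reproduced: B(S2,S1) plotted against Q as a rising red line that crosses zero at Q = Q_c]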

This shows that average reliability must exceed a threshold value Q_c, the value of Q at which the line crosses zero, before there can be any benefit to attending to signals.
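
Setting B(S2,S1) = 0 in the straight-line equation and solving for Q gives Q_c = pΔR₁ / (pΔR₁ + (1 – p)ΔR₂). The Matlab sketch below evaluates this threshold and checks the sign of the benefit on either side of it. The parameter values are hypothetical and serve only as an illustration.

```matlab
% Hypothetical parameter values, for illustration only
p   = 0.7;   % prior probability that C1 is true
dR1 = 4;     % payoff difference, right versus wrong decision, when C1 is true
dR2 = 3;     % payoff difference, right versus wrong decision, when C2 is true

slope     = p*dR1 + (1 - p)*dR2;    % slope of the benefit line
intercept = -p*dR1;                 % benefit when Q = 0
B  = @(Q) Q*slope + intercept;      % B(S2,S1) as a function of Q
Qc = p*dR1/slope;                   % reliability at which B(S2,S1) = 0

fprintf('Q_c = %.4f\n', Qc);
fprintf('B(0.6) = %.2f, B(0.9) = %.2f\n', B(0.6), B(0.9));
```

For these values Q_c is about 0.757. Because A1 is the default action only when pΔR₁ > (1 – p)ΔR₂, the threshold Q_c is always greater than 0.5 in this two-condition case.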

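Now we can add the absolute value of the costs, K₂, to the same graph. Costs are likely to rise with Q in an accelerating way: it may not take much investment to improve reliability a bit, but the remaining error becomes increasingly costly to remove. Adding such costs (blue line) to the benefit plot gives:

[Figure not reproduced: the red benefit line and an accelerating blue cost curve plotted against Q, with Q_opt marked where the red line lies farthest above the blue curve]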

The value of information to an S2 receiver is the difference between the red curve and the blue curve. In this example, it is maximal at an intermediate reliability value (Q_opt) which is greater than the minimal threshold value at which the value of information becomes positive. If the blue curve were to rise faster, this would push the optimal Q to lower values, and eventually the red curve would be entirely beneath the blue curve: costs of improved reliability rise faster than the benefits and selection will not favor S2 receivers over S1 receivers.
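
The same argument can be sketched numerically. The cost function used below, K₂(Q) = cQ/(1 – Q), is purely an assumption made for illustration: it is one convenient accelerating curve that rises slowly at low Q and steeply as Q approaches 1, and it is not taken from the models cited here. The parameter values are likewise hypothetical. For this particular cost function the optimum can also be found analytically as Q_opt = 1 – √(c/slope), which the sketch uses as a check on the grid search.

```matlab
% Hypothetical parameter values and cost function, for illustration only
p   = 0.7;  dR1 = 4;  dR2 = 3;
slope   = p*dR1 + (1 - p)*dR2;      % slope of the benefit (red) line
Q       = 0:0.001:0.999;            % candidate reliabilities, Q < 1
benefit = Q*slope - p*dR1;          % B(S2,S1) at each Q

cLow  = 0.05;                       % cost scale: slowly rising cost curve
cHigh = 0.5;                        % cost scale: rapidly rising cost curve
costLow  = cLow *Q./(1 - Q);        % assumed accelerating cost K2(Q)
costHigh = cHigh*Q./(1 - Q);

% Value of information is benefit minus cost; find its maximum on the grid
[viLow,  iLow ] = max(benefit - costLow);
[viHigh, iHigh] = max(benefit - costHigh);
QoptLow  = Q(iLow);
QoptHigh = Q(iHigh);

% Analytical optimum for this cost function: slope = c/(1 - Q)^2
QoptExact = 1 - sqrt(cLow/slope);

fprintf('slow costs: Q_opt = %.3f (exact %.3f), max VI = %.3f\n', QoptLow, QoptExact, viLow);
fprintf('fast costs: Q_opt = %.3f, max VI = %.3f\n', QoptHigh, viHigh);
```

With the slowly rising cost curve the maximum value of information is positive at an intermediate Q. With the rapidly rising curve the optimum shifts to a lower Q and the maximum value of information is negative, so S2 would not be favored at any reliability.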

We can derive similar functions, and similar graphs, for senders. The value of information for senders is similar to that of receivers except for the relative payoffs of right and wrong decisions and the costs of participating in communication; prior probabilities and the reliabilities for a given signal exchange would be the same. Because the payoffs differ, the slope and intercept for the benefits line may differ between senders and receivers. Similarly, the cost curve for increasing Q is likely to differ for the two parties. This can create a different optimum Q, and thus different optimum investments for sender and receiver. See Bradbury and Vehrencamp (2000) and Koops (2004) for further details on this type of model.

More complex models and alternative applications

Here, we summarize several recent publications using the value of information in behavioral ecology and ecology. Stephens (1989) resurrected an earlier paper in economics by Gould (1974) defining the value of information. Whereas our example above assumed discrete conditions, signals, and actions, Stephens lets alternative actions be continuous (at least up to an interval scale), allowing him to approximate the costs of small deviations from the optimal action for each condition with a Taylor expansion. His analysis shows that the value of information depends upon the variance in optimal actions and, verifying a claim by Gould, not on the variance of alternative conditions (unless conditions and actions are sufficiently linked that increasing condition variance automatically increases optimal action variance). His approach allows one to predict whether the value of information increases or decreases if the relative shapes and heights of the payoff function curves for different conditions are varied.
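
The dependence on the variance in optimal actions can be illustrated with a deliberately simplified case. This is our own sketch, not Stephens's derivation. We assume that the payoff under each condition falls off quadratically around that condition's optimal action, with the same curvature k for every condition, and that an informed receiver has perfect information. Under those assumptions an uninformed receiver does best by always using the prior-weighted mean of the optimal actions, and its expected shortfall, which is the value of perfect information, equals (k/2) times the prior-weighted variance of the optimal actions. All numbers and variable names below are hypothetical.

```matlab
% Hypothetical values, for illustration only
aStar = [1 2 4];         % optimal action under each of three conditions
p     = [0.5 0.3 0.2];   % prior probabilities of the three conditions
k     = 2;               % assumed common curvature of the payoff functions

% Expected payoff shortfall of a receiver that always uses the fixed action a
shortfall = @(a) sum(p .* (k/2) .* (a - aStar).^2);

aGrid = 0:0.001:5;                                   % candidate fixed actions
[minLoss, idx] = min(arrayfun(shortfall, aGrid));    % best uninformed action

aBar = p*aStar';                  % prior-weighted mean of the optimal actions
varA = p*((aStar - aBar).^2)';    % prior-weighted variance of the optimal actions

% A perfectly informed receiver has zero shortfall, so the value of perfect
% information equals the minimum shortfall of the uninformed receiver
fprintf('best fixed action = %.3f, mean optimal action = %.3f\n', aGrid(idx), aBar);
fprintf('value of perfect information = %.4f, (k/2)*variance = %.4f\n', minLoss, (k/2)*varA);
```

In this sketch the conditions enter only through their optimal actions: changing how different the conditions are has no effect unless it also changes the spread of the values in aStar.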

Koops (2004) adopted Stephens’s notation and approach and divided the value of information into the value of perfect information and the value of incorrect information. This allowed him to examine the ratio of additional benefits to additional costs experienced by an animal relying on signals when compared to one using a default strategy. He then computed the minimum reliability required to justify engaging in communication both with and without additional costs of signaling and reception (K₂ and K₁ in our model above), and assuming reasonable scalings of the latter costs with reliability, derived optimal reliability values for each party.

Lachmann and Bergstrom (2004) use the value of information to compare non-combinatorial with combinatorial signaling schemes. Their analysis demonstrates the greater vulnerability of combinatorial coding schemes to potential deception. Donaldson-Matasci et al. (2010) focus on cues and community ecology, but their conclusions are just as applicable to animal signals. They note that when the value of information is based on relative growth rates within populations across generations, increases in cue or signal reliability cause changes in the value of information that scale identically with the corresponding measures of the amount of information provided. Kåhre (2002) also argues that alternative weightings of reliabilities, properly defined, can preserve the relative scaling and properties of the unweighted reliabilities. McNamara and Dall (2010) prove that the benefit component of the value of information is never negative if the receiver is using Bayesian updating of cues and signals. McLinn and Stephens (2010) apply value of information models to experiments with jays and find that both the reliability of signals and the uncertainty of the alternative conditions play roles in determining whether receivers attend to signals or not. Finally, Schmidt et al. (2010) provide a broad view of the uses of information and its relative value at different levels in natural ecosystems.

Sample Matlab Routines for Discrete Signal Systems

Below, we provide some Matlab routines for computing the value of information for discrete signal and discrete action systems with more than two alternatives each. The contrast is between using a given signal set and not using it. The primary M-file, VI, requires access to several other M-files: Bayes, AbsRel, and getPC. The latter tries to estimate the minimum probability for each alternative condition, holding the others fixed at their prior relative probabilities, at which a receiver should switch to the corresponding optimal action. Note that this is just a guess: relative probabilities when there are more than 2 conditions can vary in complicated ways with successive sampling and updating. However, the computed value of information does not depend on these computations.

function B=Bayes (S,P)
 % S is the coding matrix with conditions as columns and signals as rows
 % P gives prior probabilities as row vector with columns as conditions
 % D.*S computes the "Condition AND Signal" matrix from S and P
 % S*P' computes the row totals in the AND matrix and thus the total
 % fraction of time that each signal is given across all conditions
 % The last half of the expression divides each AND cell value by
 % the corresponding row total; this is the actual Bayes calculation
 % Values in the output matrix cells are a posteriori probabilities of
 % each condition being true (rows) after having received the signal assigned
 % to that column. Note reversal of axis assignments from S matrix.
 % Inclusion of signals that are never used results in 0 Bayesian estimates.
 %
[a b]=size(S);
D=ones(a,b);
DD=D;
SP=zeros(a,1); %preallocate reciprocals of signal totals
for i=1:a
  D(i,:)=P;
end
AND=D.*S; %Compute joint (AND) matrix of conditions and signals
SS=S*P'; %compute total fraction of time each signal is given
for i=1:a
  if (SS(i)>0)
    SP(i)=1/SS(i);
  else
    SP(i)=0; %signal never used: its a posteriori values stay at 0
  end
end
for i=1:b
  DD(:,i)=SP;
end
B=AND.*DD; %compute a posteriori prob of conditions given signal
B=B'; %Reverse axes so that original inputs and outputs reversed
      % and columns add to 1.0.

function [M N G]=AbsRel(S,P)
      % This routine takes a coding matrix S, in which inputs are listed as 
      % columns and outputs are listed as rows, and a horizontal vector P 
      % which summarizes the prior probabilities of each input listed in S,
      % and computes the a posteriori probabilities that a receiver using this
      % signal set and Bayesian updating will assign to each alternative input
      % option (rows) when a given input (columns) is in fact true.
      % These probabilities are summarized in a square matrix G in which
      % the main diagonal reports correct assignments, and off-diagonal
      % values are the probabilities of erroneous assignments.
      % If the output is given as the vector [M N G], a number of values is
      % output. M is the weighted average (by P) of the probabilities that
      % a receiver will identify the correct input after attending to
      % signals and updating. It will equal 1.0 for perfect coding, and 
      % the reciprocal of the number of inputs for chance coding matrices
      % (ones with the same value in all cells). N is a weighted average of the
      % changes in probabilities from priors when using signals compared to the
      % maximal change that would occur with perfect coding. 
      % It will vary between 0 and 1. It is assigned a value of 0 if any
      % prior probability is 1.0. If no assignment is given for this routine,
      % only M is generated. 
      % If several coding matrices are involved in a sequence, e.g. if there
      % is a sender matrix SS, a transmission matrix T, and a receiver matrix
      % R summarizing how receivers assign transmitted signals to expected
      % categories, the input to this routine should be the chain product
      % S=R*T*SS, and the vector P lists the prior probabilities for the inputs
      % to SS. Note that this routine requires access to the Bayes function. 
      %  Jack Bradbury October 6, 2009 
      %
G=Bayes(S,P)*S;
GG=diag(G);
M=P*GG;
N=0;
[a b]=size(P);
ON=ones(a,b);
if max(P)<1.0 % leave N at 0 if any prior is 1.0 (avoids division by zero)
  W=(GG'-P)./(ON-P);
  N=W*P';
end

function [T MN I]=getPC(R,P,C)
      % This routine takes a payoff matrix R in which alternative conditions are
      % listed as columns and alternative actions as rows. Cell values are payoffs
      % of performing a given action when a given condition is true. Prior 
      % probabilities for each condition are given in horizontal vector P.
      % Routine finds minimum probability for a condition that must be true before
      % it pays to adopt that action which gives the maximal payoff for that condition.
      % Solution varies one probability holding relative values of others fixed
      % until average payoff of the corresponding action is the maximum among the
      % alternative actions. Result T is a matrix in which each row gives a
      % probability vector for the alternative conditions in which the main
      % diagonal gives the minimal probability for that condition to elicit that
      % action. MN gives the minimal probability that must be true for each column
      % before the action giving the highest PO for that column will be adopted.
      % I is the row that has the maximum payoff in R. C is an argument defining
      % how refined the search is.
[a b]=size(R); % get size of payoff matrix
[X,I]=max(R); % Find action (row) with max payoff for each column 
T=zeros(b,b); % set up empty final T matrix (one row per condition)
MN=zeros(1,b); % set up empty final MN vector
for i=1:b  % start one of b runs varying one prob while others held constant
  G=P;  % Assign prior prob to each of b cells
  G(i)=0;  % Set focal cell to 0 to begin sampling
  SG=sum(G); % Compute sum of remaining cell values in vector 
  G=G/SG; % Standardize remaining cell values so sum is 1.0
  j=0;
  D=1/C; % define increments to increase focal cell probability  
  test=1;
  while (test~=0)
    j=j+1;
    PP=j*D; % define increment at this step 
    PQ=G*(1-PP); % decrease non-focal cell values by proportional amounts needed 
      % to ensure that sum of b cells is 1.0 once focal cell is augmented in next step
    PQ(i)=PP; % Augment focal cell to next step value
    V=R*PQ'; % Compute average payoff of each action given this set of cell values
    [X,II]=max(V); % Keep track of which action has the maximal average payoff
    test=ne(I(i),II); % test becomes 0 once optimal action for condition i is chosen
  end  
  T(i,:)=PQ; % Store probability vector at the switch point, then move on
end
for i=1:b
  MN(i)=T(i,i); % threshold for condition i is the focal (diagonal) value of run i
end
    
function [MV D VD]=VI(S,P,R)
      % This function computes the value of information, V, in which average
      % fitness using a signal set is compared to a default strategy not using
      % that signal set. It depends on a coding matrix S with conditions as 
      % columns and signals as rows, prior condition probabilities as a 
      % row vector P, and a payoff matrix R in which alternative conditions 
      % are listed as columns, alternative actions as rows,
      % and cells give payoffs. Ideally, payoff matrix reflects changes in payoffs
      % above some standard average fitness. Thus negative values imply decreases
      % in fitness and positive ones are increases due to taking that action when
      % that condition is true. 
      %
      % The routine computes the average payoff for each alternative action given 
      % priors and assigns action with maximal average payoff to default strategy. 
      % This is action animal always takes in absence of signals (or changes in 
      % priors). 
      %
      % It then holds relative probabilities of all but one condition fixed, and 
      % varies probability of remaining one from 0 up until adopting the action 
      % with the highest payoff for that condition just becomes the optimal 
      % strategy. This is the threshold probability for a switch from default to 
      % that strategy. 
      %
      % Routine then computes a posteriori probabilities of each condition when 
      % attending to signals. These are compared to threshold values and 
      % differences are output as matrix D. All positive values in D mean signals
      % led to updated probability greater than threshold-meaning signals should 
      % change choice of action. 
      %
      % Routine then computes reliability values: probabilities that receiver will
      % assign signals to particular conditions and then take corresponding 
      % actions.
      %
      % Finally, routine computes average payoff when attending to signals. This
      % is the sum of the products between a particular payoff and the reliability
      % probability that that action will be instigated when a given condition is
      % true. The difference between each of these sums and the payoff if the
      % animal only adopted the default strategy are output in the vector VD.
      % These values for each condition are next discounted by their prior
      % probabilities to compute the overall average payoff. The value of
      % information is this value minus that when no signals are used and the
      % receiver always uses the default strategy. 
 
      % Computation ignores performance costs to either party of communicating. 
      %
C=50000;%Set high limit to estimate threshold probabilities
[a b]=size(S);
[V I]=max(R); %Identify optimal actions for each condition in R
[T M J]=getPC(R,P,C); %Find minimal probabilities to adopt actions
NI=R*P'; %Compute average payoffs of alt actions with no signals
[NPO K]=max(NI); %Compute PO and identify default action
B=Bayes(S,P); %Compute a posteriori probabilities of using this code
MM=zeros(a,b);
for i=1:a
  MM(i,:)=M; %Create threshold matrix
end
D=B-MM'; %Compute differences in a posteriori prob and thresholds
[A1 B1 RR]=AbsRel(S,P);
W=sum(R.*RR); %Compute products of reliability prob and payoff for each cell
VD=W-R(K,:);
IPO=W*P'; %Compute overall average payoff when using signal system
V=IPO-NPO; % Compute value of information as difference
titles=char('Default Action: ','Aver PO no Signals: ','Aver PO w/ Signals: ','Value of Info: ');
VV=num2str([K;NPO;IPO;V;],'%4f');
MV=[titles VV];
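
As a check on how the routines above relate to the two-condition expression derived earlier, the following self-contained sketch repeats the core calculations of Bayes, AbsRel, and VI in vectorized form for a small example and compares the result with p(θ₁ – 1)ΔR₁ + (1 – p)θ₂ΔR₂. It does not call the M-files themselves. The matrices S, P, and R hold hypothetical values chosen only for illustration, and the sketch assumes that every signal is given at least some of the time, so no guard against division by zero is included.

```matlab
% Hypothetical two-signal, two-condition example, for illustration only
S = [0.9 0.2; 0.1 0.8];   % coding matrix: rows = signals, columns = conditions
P = [0.7 0.3];            % prior probabilities of C1 and C2
R = [4 0; 0 3];           % payoffs: rows = actions A1 and A2, columns = C1 and C2

joint  = S*diag(P);                   % "Condition AND Signal" matrix
sigTot = S*P';                        % fraction of time each signal is given
post   = (diag(1./sigTot)*joint)';    % a posteriori probabilities:
                                      % rows = conditions, columns = signals
G      = post*S;                      % reliabilities: assigned condition (rows)
                                      % versus true condition (columns)

[NPO, K] = max(R*P');                 % default action K and its average payoff
IPO = sum(R.*G)*P';                   % average payoff when attending to signals
V   = IPO - NPO;                      % value of information, costs ignored

% Two-condition expression with phi1 = 1, phi2 = 0 and no cost term
th1 = G(1,1);  th2 = G(2,2);
dR1 = R(1,1) - R(2,1);
dR2 = R(2,2) - R(1,2);
Vclosed = P(1)*(th1 - 1)*dR1 + P(2)*(th2 - 0)*dR2;

fprintf('theta1 = %.4f, theta2 = %.4f\n', th1, th2);
fprintf('V = %.4f, two-condition expression = %.4f\n', V, Vclosed);
```

For these values the default action is A1 and both calculations give a value of information of approximately 0.14.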

Literature Cited

Bradbury, J.W. and S.L. Vehrencamp. 2000. Economic models of animal communication. Animal Behaviour 59: 259–268.

Chavas, J.-P. and R.D. Pope. 1984. Information: its measurement and valuation. American Journal of Agricultural Economics 66: 705–711.

Donaldson-Matasci, M.C., C.T. Bergstrom and M. Lachmann. 2010. The fitness value of information. Oikos 119: 219–230.

Good, I.J. 1966. On the principle of total evidence. British Journal for the Philosophy of Science 17: 319–321.

Gould, J.P. 1974. Risk, stochastic preference, and the value of information. Journal of Economic Theory 8: 64–84.

Kåhre, J. 2002. The Mathematical Theory of Information. New York, NY: Springer Verlag.

Koops, M.A. 2004. Reliability and the value of information. Animal Behaviour 67: 103–111.

Lachmann, M. and C.T. Bergstrom. 2004. The disadvantage of combinatorial communication. Proceedings of the Royal Society of London Series B-Biological Sciences 271: 2337–2343.

McLinn, C.M. and D.W. Stephens. 2010. An experimental analysis of receiver economics: cost, reliability and uncertainty interact to determine a signal’s value. Oikos 119: 254–263.

McNamara, J.M. and S.R.X. Dall. 2010. Information is a fitness enhancing resource. Oikos 119: 231–236.

Raiffa, H. 1968. Decision Analysis: Introductory Lectures on Choices and Uncertainty. Reading, MA: Addison-Wesley.

Ramsey, F.P. 1990. Weight or value of knowledge. British Journal for the Philosophy of Science 41: 1–4.

Savage, L.J. 1954. The Foundations of Statistics. New York, NY: Wiley.

Schmidt, K.A., S.R.X. Dall and J.A. van Gils. 2010. The ecology of information: an overview on the ecological significance of making informed decisions. Oikos 119: 304–316.

Stephens, D.W. 1989. Variance and the value of information. The American Naturalist 134: 128–140.

Szalai, F. and S. Szamado. 2009. Honest and cheating strategies in a simple model of aggressive communication. Animal Behaviour 78: 949–959.