Learning that occurs when stimuli that are similar but not identical to the cs produce the cr

The discovery of blocking and related cue-competition phenomena initially led students of traditional learning paradigms to describe the laws or principles of learning in informational terms (Rescorla, 1972). The underlying intuition was that for learning to occur a CS had to convey information about the US. Learning was driven by the correlation between the CS and the US rather than by their temporal pairing (Rescorla, 1968, 1972). Learning did not occur unless the CS conveyed new information, that is, unless the US “surprised” the subject (Kamin, 1969a) because it was “unexpected.” There has also been some theorizing that the rewarding property of conditioned stimuli was related to the extent to which they reduced uncertainty about the delivery of primary reward (Bloomfield, 1972; Cantor & Wilson, 1981; Egger & Miller, 1962). These formulations have intuitive appeal, but they did not gain much traction, both because of the resilience of contiguity-based theorizing and because there was no reason to think such processes could be instantiated in neurobiology (Clayton, Emery, & Dickinson, 2005). More recently, brain circuits that modulate learning by predictive error signals have been identified (Montague, Dayan, & Sejnowski, 1996; Montague, Hyman, & Cohen, 2004; Schultz, 2002; Takahashi et al., 2009). Thus the time seemed appropriate to rethink how an information-theoretic approach might be applied to a broad range of learning phenomena.

The just-reviewed characteristics of timing and temporal memory and their role in formation of conditioned behavior form the foundation of a quantitative formulation of these intuitions (Balsam & Gallistel, 2009). The account rests on the same information-theoretic conceptual foundations as quantitative analyses of the transmission of information by sequences of action potentials (Rieke, Warland, de Ruyter van Steveninck, & Bialek, 1997). Thus it may point to a relatively direct way to map information in the world to its representation in the nervous system.

The information that a signal (for example, a CS) communicates to a receiver (the subject in a conditioning experiment) is measured by the reduction in the receiver’s uncertainty regarding the state of some stochastic aspect of the world (Shannon, 1948). The amount of information that can be communicated is limited first by the available information (source entropy, how much variation there is in that aspect of the world) and, second, by the mutual information between the signal the subject gets and the variable state of the world (roughly, the correlation between the signal received and the state of the world). The foundation of an information-theoretic analysis is the specification and quantification of the relevant uncertainties: In the present case it is the timing of the US relative to the CSs. Effective Pavlovian CSs change the subject’s uncertainty about when the next US will occur.

In simple cases, the source entropy (available information) can readily be calculated. We begin by distinguishing between paradigms like the one used by Rescorla (1968), in which the CS signals a change in the rate parameter, and more conventional paradigms, in which the onset of the CS occurs at a fixed interval prior to the US.

In the experiment that Rescorla (1968) used to dissociate CS-US correlation from the temporal pairing of the CS and US, the US was generated by a random rate (Poisson) process, which is entirely characterized by the rate λ (the average number of USs per unit time). Random rate processes, which make the next occurrence equally probable at any moment in time, are of special interest in analyzing temporal uncertainty because they maximize the source entropy. For any other stochastic process there is less objective uncertainty about when the next event will occur; that is, there is less source entropy, less available information per event.

Rescorla (1968) held the US rate constant in the presence of the CS and varied the US rate in the periods when the CS was absent. Subjects developed a conditioned response to the CS, except in the critical condition, when the US rate was the same in the absence of the CS as in its presence (Figure 1). This result is not predicted by the hypothesis that the temporal pairing of CS and US leads to the development of a conditioned response, because the temporal pairing between CS and US was the same in all conditions. We have previously shown that this result is predicted by considering the uncertainty about US timing in the presence and absence of the CS (Balsam & Gallistel, 2009), and we present a variant of that derivation here.

The quantity of uncertainty, H, is called the entropy. The difference between the uncertainty about the timing of the next US in a given context and the uncertainty about the timing of the next US given a CS is the information that the CS conveys about the timing of the US. Entropies, hence also differences in entropy (information), are commonly measured in bits. The entropy rate for a Poisson process, which is the uncertainty per unit time, may also be thought of as the information available from that process per unit time (see red plot in Figure 7):

$$\dot{H} = \lambda \log_2\!\left(\frac{e}{\lambda\,\Delta\tau}\right) \tag{1}$$

where Δτ is the minimum difference in time that the subject can resolve, which is assumed to be much smaller than the average interval between events [see Rieke et al. (1997), p. 116 & Appendix A10 for derivation]. The average interval between the events is Ī = 1/λ, so the average entropy per event (see green plot in Figure 7) is

Figure 7. Relations among information flow, uncertainty, and the US rate. The red curve is the plot of Equation (1), information flow (bits per second) from a random rate process as a function of the rate. The green curve is the plot of Equation (2), uncertainty per event as a function of the rate. As the rate at which events occur goes up, the flow of information (bits per unit time) increases while the uncertainty about when the next event will occur decreases.

$$\bar{H} = \frac{1}{\lambda}\,\lambda \log_2\!\left(\frac{e}{\lambda\,\Delta\tau}\right) = k - \log_2 \lambda \tag{2}$$

where $k = \log_2(e/\Delta\tau)$. The difference in the per-event entropies is:

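$$\bar{H}_C - \bar{H}_{CS} = (k - \log_2 \lambda_C) - (k - \log_2 \lambda_{CS}) = \log_2 \lambda_{CS} - \log_2 \lambda_C = \log_2\!\left(\frac{\lambda_{CS}}{\lambda_C}\right) = \log_2\!\left(\frac{\bar{I}_C}{\bar{I}_{CS}}\right)$$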

where λC is the overall US rate (the context or background rate), λCS is the rate when the transient CS is also present, and ĪC and ĪCS are the reciprocals of these rates, that is, the expected intervals between USs. The critical condition in Rescorla’s (1968) experiment was the one where λCS = λC, in which case λCS/λC = 1 and H̄C − H̄CS = log2(1) = 0. In words, the presence of the CS conveys no information about the timing of the next US. Its onset does not increase the flow of information.
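The calculation above can be sketched numerically. The following is a minimal illustration, not the authors' code: the function names and the example resolution `dt = 0.01` (standing in for Δτ) are assumptions introduced here. It evaluates Equations (1) and (2) and shows that the information conveyed by the CS depends only on the ratio of the two rates.

```python
import math

def entropy_rate_bits(rate, dt):
    """Equation (1): information flow from a Poisson process (bits per unit time).
    rate is lambda; dt is the temporal resolution (Delta tau), assumed << 1/rate."""
    return rate * math.log2(math.e / (rate * dt))

def entropy_per_event_bits(rate, dt):
    """Equation (2): entropy per event = entropy rate times the mean interval 1/rate."""
    return entropy_rate_bits(rate, dt) / rate

def cs_information_bits(rate_cs, rate_context, dt=0.01):
    """Per-event entropy in the context minus per-event entropy during the CS.
    The dt terms cancel, leaving log2(rate_cs / rate_context)."""
    return entropy_per_event_bits(rate_context, dt) - entropy_per_event_bits(rate_cs, dt)

if __name__ == "__main__":
    # Hypothetical rates (USs per unit time), chosen only for illustration.
    print(cs_information_bits(0.4, 0.1))  # fourfold higher rate during the CS
    print(cs_information_bits(0.1, 0.1))  # equal rates: the critical zero-contingency case
```

Because Δτ cancels in the difference, the choice of `dt` affects the individual entropies but not the information attributed to the CS.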

The Rescorla (1968) result has often been analyzed in terms of the conditional probabilities of the US in the presence and absence of the CS, but such an analysis is incomplete. Differences in rates cannot be straightforwardly reduced to differences in conditional probabilities, because there may be more than one occurrence of the US during a single occurrence of the CS. More importantly, an analysis in terms of differences in conditional probabilities does not reveal the critical role of the relative temporal intervals in the strength of the CR, nor does it clarify the meaning of contiguity.

The unusual methodology in Rescorla’s (1968) experiments calls attention to unresolved problems in specifying what constitutes temporal pairing. Traditionally, two stimuli are regarded as temporally paired when their onset asynchrony (the CS-US interval) falls within a window of associability (Gluck & Thompson, 1987; Hawkins & Kandel, 1984). However, there has been a longstanding inability to specify what that window is (Rescorla, 1972; Rescorla & Wagner, 1972). In the Rescorla (1968) experiment, unlike in most Pavlovian conditioning experiments, the temporal interval between CS onset and the US was not fixed; USs could and did occur at any time after CS onset —near the onset, in the middle of the CS or near its end. And, more than one US could occur within a single CS. This highlights the unanswered questions of where in time to position a window of associability relative to CS onset, how wide to make it, and what to do when there is more than one US within a single such window.

The difficulty of quantifying contiguity is also evident in studies of contextual learning. When subjects learn about CSs, they simultaneously learn about contexts: they become conditioned to the experimental chamber itself (Balsam, 1985). Contextual learning, also known as background conditioning, has become an important part of modern associative theorizing. So far as we know, no one has attempted to say how it could be understood in terms of a window of associability (Colwill, Absher, & Roberts, 1988), because a single experience of the chamber (one experimental session) encompasses many USs occurring at unpredictable intervals. The “CS” (the chamber itself) may last an hour or more, with many USs during that single CS. In short, at this time, there is no rigorously formulated notion of temporal pairing, despite the fundamental role that the notion of temporal pairing plays in associative theory.

If conditioning is seen as driven by the change that a CS produces in a subject’s uncertainty about the timing of the next US, there is no longer a theoretical problem. We have already seen that a simple formal development applies to the case in which the rate of US occurrence is conditioned on the presence or absence of the CS. The same analysis explains background conditioning, because placement in the experimental chamber changes the expected rate of US occurrence, hence, the subject’s uncertainty about when the next US will occur. More formally, the per-event entropy, conditioned on the subject’s being in the chamber, is less than the unconditioned per-event entropy over the course of days or longer (Balsam, 1985). Put yet another way, the flow of information from this random rate process increases when the subject is placed in the context where that process operates. We would expect the strength of anticipatory responding controlled by a context to be a function of the overall US rate in the context, and the empirical data are consistent with this expectation (Mustaca, Gabelli, Balsam, & Papini, 1991). Thus, the information-theoretic analysis readily applies to paradigms in which the US occurs repeatedly within a single occurrence of the CS and/or there is no fixed interval between CS onset and US onset, cases in which temporal pairing, as traditionally understood, is undefined.

If this conception is correct, then we should see that the strength of a CR after a fixed amount of training ought to be a function of the informativeness of the CS. We have previously shown this to be the case in the conditioning of pigeon keypecking (Balsam, Fairhurst, & Gallistel, 2006). Here we replot the original Rescorla (1968) contingency experiment (Figure 8). As Figure 8 shows, the degree of fear conditioning increases monotonically as a function of the bits of information conveyed by the CS. An additional implication of this view is that adding unsignaled reinforcers in the ITI is no more detrimental than the effects of massing trials, so long as the overall reinforcement rates are equivalent (Balsam, Fairhurst, & Gallistel, 2006). Jenkins, Barnes, & Barrera (1981) reported such a finding. In that experiment (Experiment 13), the percentage of ITI reinforcers preceded by the autoshaping cue was varied from 3 to 100%. All subjects that acquired keypecking did so after the same number of CS-US pairings. This is consistent with the view that the detrimental effect of adding reinforcers to the ITI is mediated by changes in background reinforcement rate.

Figure 8. The average suppression ratios for the experimental groups in which contingency was manipulated (Rescorla, 1968). The suppression ratios are plotted as a function of the bits of information that the CS conveys about the time of expected US presentation. The regression line, Y = −0.16X + 0.47, accounts for 88% of the variance. The number pairs by each datum give the λCS and λITI for the group of subjects from which the datum comes (in USs per 2 minutes).
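As a rough illustration of how the regression line reported for Figure 8 relates rates to suppression, the sketch below converts a pair of rates into bits and evaluates the line. The slope and intercept are those stated in the caption; the function names and the example rate pairs are hypothetical and are not the values used in the experiment.

```python
import math

def cs_bits(rate_cs, rate_iti):
    """Information conveyed by the CS: log2 of the ratio of US rates (CS vs. ITI)."""
    return math.log2(rate_cs / rate_iti)

def predicted_suppression_ratio(bits, slope=-0.16, intercept=0.47):
    """Regression line from the Figure 8 caption: Y = -0.16 X + 0.47."""
    return slope * bits + intercept

if __name__ == "__main__":
    # Hypothetical (rate_cs, rate_iti) pairs in USs per 2 minutes, for illustration only.
    for rate_cs, rate_iti in [(0.4, 0.4), (0.4, 0.2), (0.4, 0.1)]:
        bits = cs_bits(rate_cs, rate_iti)
        print(rate_cs, rate_iti, round(bits, 2), round(predicted_suppression_ratio(bits), 2))
```

On this line, a CS that conveys no information falls near a suppression ratio of 0.47, and each additional bit lowers the predicted ratio by 0.16.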

One apparent contradiction to this conclusion comes from studies in which the ITI reinforcers are signaled by a cue that is different from the target cue (Durlach, 1983; Rescorla, 1972). If only the overall rate of reinforcement modulated acquisition, it would not matter whether the added reinforcers were unsignaled, signaled by the target cue (as in the Jenkins experiment), or signaled by a different cue. However, signaling the ITI reinforcers with a different cue does not decrement responding to the same extent as unsignaled USs (Cooper, Aronson, Balsam, & Gibbon, 1990; Durlach, 1983; Rescorla, 1972). To deal with this effect, Cooper et al. (1990) posited that when the ITI cue and target cue differ, the two experiences are segregated. The differently signaled ITI reinforcers do not enter into the calculation of the overall rate for the target, and the target reinforcers do not enter into the calculation for the alternative cue. Though the segregation of event streams into different representations was an appealing idea, it did not follow in a principled way from any theoretical formulation.

The information-theoretic analysis offers a principled account of these results by rationalizing the more or less arbitrary features of the Rescorla-Wagner formulation. The essential assumptions in the Rescorla-Wagner formulation are that associations combine additively and that their combined strength reduces the potential for further associative growth, because there is an upper limit on the sum of associative strengths. These assumptions are arbitrary in that nothing about the concept of associative strength, as traditionally understood, suggests that summing associative strengths is a meaningful operation or that there should be an upper limit on the sum (which is not itself an associative strength). By contrast, in the information-theoretic formulation, the source entropy of the context (the amount of uncertainty per unit time regarding the next occurrence of the US) constitutes an objective limit on the amount of information about US timing that can be provided by all sources of information combined; they cannot provide more information than is available. And, entropies add. Thus, the information about US timing provided by, for example, the experimental context, is diminished when events that occur within that context provide more of the same information. If those events together provide more information about US timing than is available from the context, then the context directly provides no information about US timing. Put another way, the flow of information from the US-generating process is attributed to the stimuli that signal its operation.

Rate Estimation Theory (RET; Gallistel & Gibbon, 2000), which is similar in spirit to the present proposal, demonstrated that these properties (additivity with an inherent upper limit on the sum) suffice to explain why, for example, providing a second CS that predicts the USs not predicted by the first CS “rescues” the first CS in Rescorla’s truly random control (Durlach, 1983; see Figure 9). In RET, what combines additively are the rates of US occurrence predicted by different CSs (including the context). The upper limit is imposed by the fact that the sum of the rates ascribed to two or more predictors must equal the rates observed during periods when those predictors are simultaneously present. A spreadsheet implementation of RET (Gallistel, 1992) is available from CRG. Readers may use it to verify that it does predict the Durlach (1983) result. The present formulation predicts the result for the same reasons, as we now explain.

Figure 9. Timeline schematic of the Durlach (1983) experiment. Subjects were first trained with the white CS, which unfailingly predicted the food US (dot) at the end of its 10-s duration. Then, subjects were divided into two groups to be trained with the gray CS. For one group (top line), the US was no more frequent when the CS was on than when it was not on, as in Rescorla’s (1968) truly random control experiment. As in Rescorla’s experiment, this group did not develop a CR to the gray CS. The second group (second line) had the same ITI reinforcements (the reinforcements unsignaled by the gray CS), but these reinforcements were signaled by the white CS. This group did develop a conditioned response to the gray CS. The third line shows the effective event stream for the gray CS, when the white CS-US pairings (and the durations they consume) are excised. This represents the hypothesized treatment of the gray CS if its event stream is segregated from that of the white CS. Now the contingency between the gray CS and the remaining dots is obvious (the gray CS provides information about the temporal location of the reinforcements that are “unexplained” by the white CS). Note that the rate of CS reinforcement in the top line (random control group) is the same as the background rate of reinforcement in the absence of the CS. Thus, the expected time to the next reinforcement is independent of whether the CS is or is not present, making the CS uninformative. By contrast, in the bottom line, this same rate is considerably greater than the rate of occurrence of otherwise unexplained USs (reinforcements not attributable to the white CS). Thus, the expected time to reinforcement in the presence of this CS is shorter than in its absence, making the gray CS informative.
Because information is both additive and limited by the available information, the information about US occurrence carried by the white CS in line 2 reduces the information provided by the simultaneously present background, thereby increasing the information provided by the gray CS, which competes with this same background.

We see from Equation (1) above (see also Figure 7) that the flow of information increases as the estimated rate of US occurrence increases. Information-conveying power accrues to a CS (to a potential predictor) insofar as the rate of information flow increases when that CS comes on. The accrual of information to one predictor comes at the expense of other competing predictors, because information (differences in entropy), like entropy itself, is both additive and limited by the available information. Thus, when the flow of information from the US-generating process is attributed to a transient CS, the flow attributed to the continuously present context is necessarily reduced. In Rescorla’s truly random control, the flow of information does not increase when the transient CS comes on; thus, none of the flow is attributed to it. In Durlach’s protocol, the flow of information increases dramatically when the non-target CS comes on (Figure 9, white CS). The resulting ascription of a high flow of information to the non-target CS must come at the expense of the simultaneously present context, reducing the flow ascribed to the context. But the information flow during the target CS is not affected (Figure 9, gray CS); the rate when the target CS (and the context) are present is the same as in Rescorla’s truly random control condition. Therefore, the information flow when the target CS comes on increases above that ascribed to the context, and this increase is ascribed to the target CS. In short, the non-target CS rescues the target CS by reducing the information flow ascribed to the context.
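The attribution logic in this paragraph can be illustrated with a deliberately simplified sketch. It assumes, as in the description of RET above, that the US rate observed while a CS is on equals the rate ascribed to the context plus the rate ascribed to that CS. The function name and all numerical rates are hypothetical; this is neither the RET spreadsheet nor a full model of the Durlach (1983) protocol.

```python
def rate_ascribed_to_cs(rate_during_cs, rate_ascribed_to_context):
    """Additive partition under a limit: the US rate observed while the CS is on
    equals the context's rate plus the CS's rate, so the CS is credited with the
    excess over what the context already explains."""
    return rate_during_cs - rate_ascribed_to_context

if __name__ == "__main__":
    rate_during_gray = 0.1  # hypothetical US rate while the gray (target) CS is on

    # Truly random control: ITI USs are unsignaled, so the context is credited
    # with the same rate that is observed during the gray CS.
    print(rate_ascribed_to_cs(rate_during_gray, rate_ascribed_to_context=0.1))

    # Durlach's second group: ITI USs are credited to the white CS, so little or
    # nothing is left for the context (0.0 is an assumed value for illustration).
    print(rate_ascribed_to_cs(rate_during_gray, rate_ascribed_to_context=0.0))
```

The observed rate during the target CS is identical in the two cases; only the rate left over for the context differs, which is what lets the non-target CS "rescue" the target CS.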

As we have already noted, the information-theoretic explanation of cue competition phenomena rests on mathematical foundations (additivity under a limit) similar to those of the explanations offered by the Rescorla-Wagner formulation and by RET. Unlike them, it gives an empirically supported definition of the elusive notion of “temporal contiguity,” as we now explain.

We consider now the application of an information-theoretic analysis to the traditional temporal pairing case, in which the US occurs a fixed time after CS onset. We assume a random rate of US occurrence while the subject is in the apparatus (hence, a variable intertrial interval), with an expected (average) interval between USs in that context of ĪC = 1/λC.

In the traditional temporal-pairing paradigm, the presence of a CS does not in one sense change the US rate. A subject that could not perceive the CS would detect no changes in US rate in this paradigm, whereas in some of Rescorla’s conditions, a subject that could not perceive the CS might nonetheless detect the changes in the US rate. (It might detect the otherwise undetectable presence of the CS by detecting the change in the US rate during the CS.) In the more traditional paradigm, the CS does not signal a change in rate; it signals when the US will occur, because each occurrence of the US is preceded by a CS of fixed duration, T, whose termination coincides with the US. If, however, the signaled interval is appreciably shorter than the otherwise expected interval to the next US, the CS does signal an apparent change in rate.

Given the empirically well-established scalar uncertainty in subjects’ representation of temporal intervals (Gibbon, 1977), we assume that after CS onset a subject’s probability distribution for the time at which the US will occur is a Gaussian distribution with σ = wT, where w is the Weber fraction (coefficient of variation) and T is the duration of the CS-US interval (see the plot of the subject’s uncertainty about te in Figure 11). The experimental value for w, based on the coefficient of variation in the stop times in the peak procedure, is about 0.16. It is surprisingly constant for widely differing values of T and subject species (Gallistel, King, & McDonald, 2004). The entropy of a Gaussian distribution is:

$$H = \frac{1}{2}\log_2\!\left(\frac{2\pi e\,\sigma^2}{(\Delta\tau)^2}\right)$$

Figure 11. Reinforcement expectation as a function of the time after an event (CS or US) signaling a fixed interval, te, to the next reinforcement. The solid green plot is the inverse of the (objective) expected time to reinforcement, 1/(te − t), which is infinite when t = te and negative when t > te. The solid red plot is the subjective hazard function, g(t, te, wte)/(1 − G(t, te, wte)), where g and G are the Gaussian PDF and CDF, with expectation te and standard deviation wte (w is the subject’s Weber fraction for temporal intervals). The dashed green plot is the inverse of the subjective survival function, 1/(te(1 − G(t, te, wte))). Note that the (subjective) conditional hazard function drops to zero at CS onset and rises as the time of reinforcement is approached and exceeded. The other two functions jump up from the unconditional (baseline) expectation to 1/te and rise further from there. The baseline expectation is 1/ĪC, where ĪC is the average US-US interval. In the case of fixed-time reinforcements with no CS, t = 0 is the time of occurrence of the most recent US and te = ĪC.

Substituting wT for σ and expanding, we obtain an expression for the subject’s uncertainty about the timing of the next US after CS onset.

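$$
\begin{aligned}
H &= \tfrac{1}{2}\log_2\!\left[\frac{2\pi e\,(wT)^2}{(\Delta\tau)^2}\right] \\
&= \tfrac{1}{2}\log_2 2\pi e + \tfrac{1}{2}\log_2 w^2 + \tfrac{1}{2}\log_2 T^2 + \tfrac{1}{2}\log_2\!\left(\frac{1}{\Delta\tau}\right)^{2} \\
&= \tfrac{1}{2}\log_2 2\pi e + \log_2 T + \log_2 w + \log_2\!\left(\frac{1}{\Delta\tau}\right) \\
&= \tfrac{1}{2}\log_2 2\pi e + \log_2 T + \log_2 w - \log_2 \Delta\tau
\end{aligned}
$$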

Equation (2), when written in terms of ĪC rather than λC, is

$$\bar{H}_C = \log_2\!\left(\frac{e\,\bar{I}_C}{\Delta\tau}\right) = \log_2 e + \log_2 \bar{I}_C - \log_2 \Delta\tau$$

The difference between this background uncertainty and the uncertainty immediately after CS onset is

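$$
\begin{aligned}
&\left(\log_2 e + \log_2 \bar{I}_C - \log_2 \Delta\tau\right) - \left(\tfrac{1}{2}\log_2 2\pi e + \log_2 T + \log_2 w - \log_2 \Delta\tau\right) \\
&\quad= \log_2 e + \log_2 \bar{I}_C - \log_2 \Delta\tau - \tfrac{1}{2}\log_2 2\pi - \tfrac{1}{2}\log_2 e - \log_2 T - \log_2 w + \log_2 \Delta\tau \\
&\quad= \tfrac{1}{2}\left(\log_2 e - \log_2 2\pi\right) + \log_2 \bar{I}_C - \log_2 T - \log_2 w \\
&\quad= \log_2\!\left(\frac{\bar{I}_C}{T}\right) + k,
\end{aligned}
\tag{3}
$$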

where $k = \tfrac{1}{2}\log_2\!\left(\frac{e}{2\pi}\right) - \log_2 w$ (a different constant from the k in Equation (2)).
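Equation (3) can be checked numerically against the two entropies from which it is derived. The sketch below is illustrative only: the function names, the example intervals, and the resolution `dt` (standing in for Δτ) are assumptions made here, and the default w = 0.16 is the Weber fraction quoted in the text.

```python
import math

def background_entropy_bits(mean_us_us_interval, dt):
    """Equation (2) written in terms of the mean US-US interval."""
    return math.log2(math.e * mean_us_us_interval / dt)

def gaussian_timing_entropy_bits(cs_us_interval, w, dt):
    """Entropy of a Gaussian with sigma = w * T, at temporal resolution dt."""
    sigma = w * cs_us_interval
    return 0.5 * math.log2(2 * math.pi * math.e * sigma**2 / dt**2)

def information_at_cs_onset_bits(mean_us_us_interval, cs_us_interval, w=0.16):
    """Equation (3): log2(I_C / T) + k, with k = 0.5*log2(e / (2*pi)) - log2(w)."""
    k = 0.5 * math.log2(math.e / (2 * math.pi)) - math.log2(w)
    return math.log2(mean_us_us_interval / cs_us_interval) + k

if __name__ == "__main__":
    i_c, t, w, dt = 120.0, 10.0, 0.16, 0.01  # hypothetical values, in seconds
    direct = background_entropy_bits(i_c, dt) - gaussian_timing_entropy_bits(t, w, dt)
    print(direct, information_at_cs_onset_bits(i_c, t, w))  # the two should agree
```

Halving the CS-US interval T while holding the mean US-US interval fixed adds exactly one bit, which is the sense in which closeness is relative.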

Equation (3) gives the intuitively obvious result that the closer CS onset is to the US, the more it reduces the subject’s uncertainty about when the US will occur. We suggest that this intuition is what underlies the widespread but erroneous conviction that temporal pairing is an essential feature of conditioning. Importantly, Equation (3) shows that closeness is relative. What matters is not the duration of T, the CS-US interval, but rather ĪC/T, the CS-US interval relative to the average US-US interval. This explains why it is impossible to define a window of associability —that is, a range of CS–US intervals that support associative learning. There is no such window. The relevant quantity is a unitless proportion, not an interval. Moreover, there is no critical value for this proportion. Rather, the empirically determined “associability” of the CS and US is strictly proportional to the ĪC/T ratio, as we now explain. (In what follows, we define and use “associability” in a purely operational sense, without commitment to the hypothesis that there is an underlying associative connection, if by “associative connection” one understands a signal-conducting pathway whose conductance depends on past experience.)

Equation (3) gives a quantitative explanation for Figure 10 [replotted from Gibbon & Balsam (1981)], which is a plot of the number of CS-US pairings (“reinforcements”) required for the appearance of an anticipatory response to the CS, as a function of ĪC/T, on double logarithmic coordinates. It is commonly assumed that the less associable the CS and the US, the more CS-US pairings will be required to produce an anticipatory response. We make this assumption quantitative by assuming A= 1/NCS-US, where A = associability and NCS-US = the number of “reinforcements” (CS-US pairings) required before we observe an anticipatory response. Equation (3) says that the unitless ratio ĪC/T (the ratio of the average US-US interval to the average CS-US interval) is the protocol parameter that determines the amount of information that CS onset conveys about US timing. (The other relevant parameter is w, the measure of the precision with which a subject can represent a temporal interval.) This is the same quantity as ĪC/ĪCS, which proved to be critical in analyzing the information content of the CS in the Rescorla paradigm. We call this ratio the informativeness of the CS-US relation.

Figure 10. Reinforcements to acquisition as a function of the ratio between the average US-US interval (ĪC) and the CS-US interval (T) on double-logarithmic coordinates. These speed-of-acquisition data come from a form of Pavlovian conditioning with pigeons called autoshaping, in which the illumination of a small circular light CS is followed by a food US. The slope of the regression is not significantly different from −1. Based on an earlier plot (Gibbon & Balsam, 1981) with data from many different labs.

Figure 10 is a plot of log(NCS-US) against log(ĪC/T). Remarkably, its slope is approximately −1. Thus, empirically (up to an additive constant), −log NCS-US = log(ĪC/ĪCS) = log(ĪC/T). Taking antilogs, 1/NCS-US = A ∝ ĪC/ĪCS. In words, operationally defined associability is proportional to informativeness.
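A slope of −1 on double-logarithmic coordinates means that the number of reinforcements to acquisition is inversely proportional to the informativeness ratio. The sketch below illustrates only that proportionality. The constant `c` is a placeholder introduced here and is not a value from the text or from the Figure 10 regression; the function name is likewise hypothetical.

```python
def reinforcements_to_acquisition(mean_us_us_interval, cs_us_interval, c=300.0):
    """N = c / (I_C / T): a slope of -1 in log-log coordinates.
    c is a hypothetical proportionality constant, NOT taken from the source."""
    informativeness = mean_us_us_interval / cs_us_interval
    return c / informativeness

if __name__ == "__main__":
    # Doubling the I_C / T ratio halves the required number of CS-US pairings.
    print(reinforcements_to_acquisition(100.0, 10.0))  # ratio 10
    print(reinforcements_to_acquisition(200.0, 10.0))  # ratio 20
```

Whatever the value of the constant, the associability A = 1/N scales linearly with ĪC/T, so proportional changes in both intervals leave N unchanged.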

Our approach to the operational definition of associability parallels the strategy in which the sensitivity of a sensory mechanism is defined to be the reciprocal of the stimulus intensity required to produce a response (as, for example, in the determination of the scotopic spectral sensitivity curve or the spatial and/or modulation transfer functions in visual psychophysics). Our analog to the required stimulus intensity is the required number of reinforcements; our operational definition of associability as the reciprocal of the required number of pairings is the analog of sensitivity (the reciprocal of required intensity). In an associative conceptual framework, the associability is the rate of learning. In our framework, associability is the speed with which a behavioral response to a predictive relation emerges: the stronger the predictive relation between cue and consequence, the fewer repetitions of the experience are required before the subject decides to respond to it. In the usual Pavlovian experiment associability is the speed with which the subject decides that an anticipatory response is appropriate for a particular temporal arrangement of events.

Consider next the case in which we let the US function as its own CS by fixing IC, the US-US interval. In this case, it is the preceding US that enables the subject to anticipate when the next US will occur. Now, T = IC. The informativeness (their ratio) is now 1, so log2(IC/T) = 0, and Equation (3) says that the information conveyed by the preceding US is $k = \tfrac{1}{2}\log_2\!\left(\frac{e}{2\pi}\right) - \log_2 w \approx -0.6 - \log_2 w$. (Note that for w < 1, −log2 w > 0.)
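As a worked check of this constant, take the Weber fraction w ≈ 0.16 quoted earlier for the peak procedure. The arithmetic, rounded to two or three decimal places, is:

$$
\begin{aligned}
\tfrac{1}{2}\log_2\!\left(\frac{e}{2\pi}\right) &\approx \tfrac{1}{2}\log_2(0.433) \approx \tfrac{1}{2}(-1.209) \approx -0.60, \\
-\log_2(0.16) &\approx 2.64, \\
k &\approx -0.60 + 2.64 = 2.04 \text{ bits}.
\end{aligned}
$$

For comparison, a fourfold difference in rate in the Rescorla paradigm conveys $\log_2 4 = 2$ bits.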

The smaller w is (that is, the more precisely a subject can time and remember a fixed interval), the more information one US gives about the timing of the next US. For w = 0.16, k ≅ 2; that is, fixing the US-US interval gives as much information about the timing of the next US as a 4-fold change of rate in the Rescorla paradigm. We know that subjects are sensitive to the information in a fixed US-US interval because there is an increase in anticipatory responding as the fixed interval between USs elapses (Kirkpatrick & Church, 2000a; Pavlov, 1927; Staddon & Higa, 1993). Sensitivity to fixed inter-reinforcement intervals is also apparent in the well-known increased likelihood of responding in anticipation of the next reward, which is seen in fixed interval operant conditioning schedules (Ferster & Skinner, 1957; Gibbon et al., 1977). In an important set of experiments conducted in Doug Williams’ lab (Williams, Lawson, Cook, Mather, & Johns, 2008), subjects were exposed to a zero contingency procedure in which the rate of reinforcement in the presence of a CS was equal to the rate of reinforcement in the absence of the CS. When the CS signaled a fixed interval from its onset until US presentation, excitatory responding emerged, providing a dramatic demonstration that the information provided by fixing the CS-US interval contributes to the emergence of anticipatory conditioning beyond that contained in the simple rate ratio. The generality of such results and their consistency with quantitative information accounts remain to be explored.

A hypothesis that merits testing is that when there is a fixed interval between a CS and a US or between successive USs, the timing of the CR that anticipates the US is governed by the subjective hazard function. The subjective hazard function conditioned on the occurrence of a temporal landmark at a fixed interval before the US is the Gaussian subjective probability density function (PDF) divided by the Gaussian subjective survival function, g(t, te, wte)/(1 − G(t, te, wte)), where te is the expected time to reinforcement following a temporal landmark, t is the time elapsed since the onset of the preceding event at t = 0, w is the temporal Weber fraction, and g and G are the Gaussian probability density function and cumulative distribution function. The conditional subjective hazard function is the (subjective) probability that the next US (reinforcement) occurs at a given moment in the future divided by the (subjective) probability that it will not have occurred prior to that moment, conditioned on the occurrence of the temporal landmark at t = 0. Put less formally, it is the momentary expectation of reinforcement. When the USs are exponentially distributed, the unconditional (baseline) hazard function is flat; the expectation of reinforcement does not change from moment to moment. The conditional hazard drops to essentially zero immediately after the temporal landmark, when t ≪ te; that is, the onset of the CS temporarily reduces the momentary expectation of reinforcement. As time elapses, conditional hazard increases until at some point it exceeds the baseline hazard (see solid red curve in Figure 11). How far before te the crossing of the baseline hazard occurs depends on the subject’s Weber fraction. The more precisely it represents the interval, the later in the interval of anticipation the (subjective) conditional hazard will exceed the baseline hazard.
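The subjective hazard function described above can be sketched directly from its definition. This is a minimal illustration under stated assumptions: the function names, the step size, and the example values (te = 10, mean US-US interval = 60, in arbitrary time units) are introduced here and do not come from the text. The survival function is computed with the complementary error function to keep it numerically stable.

```python
import math

def gaussian_pdf(t, mean, sd):
    z = (t - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def gaussian_survival(t, mean, sd):
    """1 - G(t): subjective probability that the US has not yet occurred by time t."""
    return 0.5 * math.erfc((t - mean) / (sd * math.sqrt(2)))

def subjective_hazard(t, te, w=0.16):
    """g(t, te, w*te) / (1 - G(t, te, w*te))."""
    sd = w * te
    return gaussian_pdf(t, te, sd) / gaussian_survival(t, te, sd)

def crossing_time(te, baseline_hazard, w=0.16, step=0.01):
    """First time (on a grid of width step) at which the conditional hazard
    exceeds the flat baseline hazard; the search stops at te."""
    t = 0.0
    while t < te and subjective_hazard(t, te, w) <= baseline_hazard:
        t += step
    return t

if __name__ == "__main__":
    te, baseline = 10.0, 1.0 / 60.0  # hypothetical interval and baseline hazard 1/I_C
    print(subjective_hazard(0.0, te))             # essentially zero at the landmark
    print(crossing_time(te, baseline, w=0.16))
    print(crossing_time(te, baseline, w=0.08))    # a more precise timer crosses later
```

With these assumed values, the hazard is negligible at the landmark and crosses the baseline before te, and a smaller Weber fraction moves the crossing closer to te.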

Kirkpatrick and Church (2003) suggest that the probability of initiating a bout of conditioned responding is inversely proportional to the expected time to reinforcement. The inverse of (te − t) is not well behaved when t reaches and exceeds te: it becomes infinite at te and negative thereafter (see solid green curve in Figure 11). A more plausible suggestion in the same spirit, a suggestion that takes account of the subject’s uncertainty about when te has been reached, is the inverse of the subjective survival function (see dashed green curve in Figure 11).
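The contrast between the two candidate functions can be illustrated with a brief sketch. Both are left unscaled here, because the scaling that relates them to a response probability is not specified in the text. The values te = 10 and w = 0.16 and the function names are assumptions chosen for illustration.

```python
import math

def inverse_expected_time(t: float, te: float) -> float:
    """1 / (te - t): diverges at t = te and is negative afterwards."""
    remaining = te - t
    return math.inf if remaining == 0 else 1.0 / remaining

def inverse_survival(t: float, te: float, w: float) -> float:
    """1 / (1 - G(t, te, w*te)) for a Gaussian subjective distribution."""
    sd = w * te
    survival = 0.5 * math.erfc((t - te) / (sd * math.sqrt(2.0)))
    return 1.0 / survival

te, w = 10.0, 0.16  # hypothetical values
for t in (0.0, 5.0, 9.0, 10.0, 11.0):
    print(t, inverse_expected_time(t, te), round(inverse_survival(t, te, w), 3))
```

The inverse of the expected time changes sign once t passes te, whereas the inverse survival function stays finite, positive, and increasing over the same range.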

Unlike the subjective conditional hazard function, the inverse of the expected time to reinforcement and the inverse of the subjective survival function are never less than the baseline hazard; that is, the onset of a CS cannot decrease the subject’s expectation of reinforcement. Fairhurst, Gallistel and Gibbon (2003) trained short- and long-duration CSs that were sometimes presented in isolation and sometimes in compound (with asynchronous onsets, preserving their respective temporal relations to the US). When they were presented in compound, the probability of a bout of conditioned responding increased following the onset of the long-duration CS, but then, at the onset of the short-duration CS, it was strongly (but, of course, transiently) suppressed. In other words, the onset of the short-duration CS temporarily reduced the expectation of reinforcement. This result would seem to favor the assumption that the temporal control of conditioned responding is by the hazard function rather than by the inverse of the subjective survival function, because only the hazard function goes down at CS onset.

We return finally to the blocking, overshadowing and relative validity experiments that, in combination with the Rescorla (1968) experiment, originally inspired intuitions about the importance of the informativeness of the CS-US relation. These experiments showed that unless a CS is informative, subjects do not develop an anticipatory response to it, no matter how often it is paired with the US. In the Blocking Protocol, the expected time to the next US given the blocking CS alone is the same as the expected time to the next US given both CSs. Therefore, the reduction factor for the blocked CS is 1 (no reduction), and its informativeness is 0. If either CS is attended to (i.e., if its informativeness is computed), then the informativeness of the other is 0; so subjects should learn to use only one of two perfectly redundant CSs to anticipate the US. Similarly, in each of the Relative Validity protocols only one CS can be fully informative. If the protocol consists of CS1+/CS1&CS2+ trials, then if CS1 is attended to, CS2 has no informativeness. If the protocol consists of CS1&CS2+/CS2&CS3+, then CS1 and CS3 have no informativeness.
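The blocking argument reduces to a one-line calculation, sketched below. The sketch assumes that informativeness is the base-2 logarithm of the factor by which a CS reduces the expected time to the next US, which is consistent with a reduction factor of 1 giving an informativeness of 0. The intervals of 120 s and 10 s and the function name are hypothetical.

```python
import math

def informativeness_bits(expected_wait_before: float, expected_wait_after: float) -> float:
    """log2 of the reduction factor: the expected time to the next US before
    the cue is taken into account, divided by the expected time once it is."""
    reduction_factor = expected_wait_before / expected_wait_after
    return math.log2(reduction_factor)

context_wait = 120.0  # hypothetical expected US-US interval in the context (s)
cs_wait = 10.0        # hypothetical expected CS-US interval (s)

# Blocking CS: evaluated against the context alone.
blocking = informativeness_bits(context_wait, cs_wait)

# Blocked CS: evaluated against what the blocking CS already predicts.
# The expected wait is unchanged, so the reduction factor is 1.
blocked = informativeness_bits(cs_wait, cs_wait)

print(f"blocking CS: {blocking:.2f} bits, blocked CS: {blocked:.2f} bits")
```

With these illustrative numbers the blocking CS carries about 3.58 bits and the blocked CS carries none.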

In their classic form, blocking and overshadowing experiments use the temporal pairing paradigm, and the competing CSs have the same onsets. Consequently, the entropy of the (subjective) US timing distribution given one CS onset is the same as the entropy given both CSs. Thus, processing both CSs does not yield any greater reduction in the uncertainty about the timing of the US than can be achieved by processing only one of them. We assume that there is a processing cost in handling the information about the timing of the next US. We further assume that subjects choose not to incur this cost unless it purchases a reduction in their uncertainty. These assumptions explain the blocking, overshadowing and relative validity effects, the principal effects in the extensive literature on “cue competition.” All of these procedures employ multiple cues that are redundant with respect to the information they provide about the time of expected outcomes.

This formulation predicts complete overshadowing when the training protocol makes two CSs completely redundant. Complete overshadowing and blocking have been observed (Kamin, 1967, 1969a; Rescorla, 1968; Reynolds, 1961; Wagner et al., 1968), although it is widely believed that overshadowing and blocking are incomplete (partial). There is not room here for a complete review of the extensive and complex literature on this question. We note, however, that in some commonly cited examples of partial overshadowing (Kehoe, 1982, 1986; Thein, Westbrook, & Harris, 2008), the two CSs were not completely redundant because, in addition to the compound trials on which the two CSs were presented together, there were repeated probe trials in which they were presented singly, either with or without reinforcement. Whether or not they are reinforced when presented singly, a repeated-probe-trial design makes the single CSs and their compound differentially informative. Consequently, these studies do not provide a clean test of whether overshadowing and blocking are complete when CS redundancy is complete.

Another issue in this complex literature is the effect of averaging across subjects and across trials. If CS1 completely overshadows CS2 for some subjects in a group, while for other subjects the reverse is true, the group average will imply partial overshadowing when in fact the overshadowing is complete in every subject (cf. Reynolds, 1961). This problem is similar to the one encountered when averaging over blocks of trials before fitting an acquisition function (e.g., Thein et al., 2008), as it may give the misleading impression that the emergence of responding is continuous rather than step-like, as is often evident from an analysis of individual subjects (Balci et al., 2009; Estes, 1956; Estes & Maddox, 2005; Gallistel et al., 2004; Morris & Bouton, 2006; Papachristos & Gallistel, 2006).
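The averaging artifact is easy to reproduce with made-up numbers. In the sketch below every hypothetical subject is all-or-none, yet the group means take intermediate values. The onset trials and the per-subject overshadowing outcomes are invented for illustration and are not data from any of the cited studies.

```python
def step_curve(onset_trial: int, n_trials: int) -> list[float]:
    """All-or-none acquisition: no responding before onset_trial, full after."""
    return [0.0 if trial < onset_trial else 1.0 for trial in range(n_trials)]

# Acquisition: each subject steps up abruptly, but at a different trial.
onsets = [5, 12, 20, 31, 45]  # hypothetical onset trials
n_trials = 50
individual = [step_curve(onset, n_trials) for onset in onsets]
group_mean = [sum(column) / len(column) for column in zip(*individual)]
distinct_levels = sorted(set(group_mean))
print("levels in the group curve:", distinct_levels)

# Overshadowing: responding to CS2 is either absent or complete per subject.
cs2_responding = [1.0, 0.0, 1.0, 0.0]  # hypothetical subjects
cs2_group_mean = sum(cs2_responding) / len(cs2_responding)
print("group mean responding to CS2:", cs2_group_mean)
```

The group acquisition curve climbs through several intermediate levels, and the group mean for CS2 sits at an intermediate value, although no individual subject shows a graded or partial effect.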


Schematic of the experimental protocols by which Rescorla (1968) demonstrated that CS-US contingency, not the temporal pairing of the CS and US, produces a US-anticipatory response (CR). The temporal pairing of CS and US is identical in the two groups, but there is no CS-US contingency in the second group (the truly random control), because the US occurs as frequently in the absence of the CS as in its presence, that is, λCS = λC. The subjects in Group 1 develop a conditioned response to the CS; the subjects in Group 2 do not. (They did, however, develop a conditioned response to the context, that is, to the experimental chamber.) This was one of the findings that called into question the foundational assumption that the learning mechanism was activated by the temporal pairing of CS and US. CS = the conditioned stimulus (e.g., a tone); US = the unconditioned stimulus (e.g., shock to the feet); ITI = intertrial interval; T = duration of a CS presentation.


Reinforcements to acquisition as a function of the ratio between the average US-US interval (ĪC) and the CS-US interval (T) on double-logarithmic coordinates. These speed-of-acquisition data come from a form of Pavlovian conditioning with pigeons called autoshaping, in which the illumination of a small circular light CS is followed by a food US. The slope of the regression is not significantly different from -1. Based on an earlier plot (Gibbon & Balsam, 1981) with data from many different labs.
