Chapter 8 - Surprise and Misrepresentation
Presented by Alex Kiefer
Chapter 8 of The Predictive Mind explores longstanding and unresolved debates about the nature of mental representation and representational content from the point of view of the Prediction Error Minimization framework. The chapter is concerned primarily with perceptual representation, in keeping with the emphasis on perception throughout the book.
Jakob offers novel perspectives on a wide range of topics in the theory of content and the philosophy of mind more generally. In this post I'll focus on the two topics that I take to be most crucial for characterizing the account of representational content that best fits with the PEM framework: misrepresentation and causal vs. descriptive theories of content. The positions sketched in the chapter with respect to these topics can be summarized in the following two claims:
Misrepresentation: Misrepresentation is perceptual inference that minimizes short-term prediction error while undermining long-term prediction error minimization.
Causal vs. descriptive theories: Cognitive systems that minimize prediction error represent the world by maintaining causally guided descriptions (modes of presentation) of states of affairs in the world.
In what follows I'll discuss these claims and Jakob's arguments for them in more detail, then consider challenges for each position as well as connections between the two topics.
Misrepresentation and the disjunction problem
A central goal of the chapter is to consider how a reductive theory of content might explain, in terms of the PEM framework, the difference between accurate perception and misperception. As Jakob notes, such a theory must overcome what Fodor (1990) and others call “the disjunction problem”: suppose that a dog in a field at night is misperceived as a sheep. What makes it the case that the perceptual representation tokened on this occasion is an inaccurate representation of a dog as a sheep, rather than an accurate representation of a sheep-or-dog-at-night as such?
Jakob relies on the statistical theories of content offered by Eliasmith (2000) and Usher (2001) as points of departure for his own account. According to these theories, the content of a representation is (roughly) whatever in the world enjoys the highest statistical dependence with that representation averaged across all stimulus conditions; the relevant measure is mutual information (MI), definable in terms of either joint or conditional probabilities together with marginal probabilities. Misrepresentation occurs on those occasions when the content of a representation, so defined, differs from whatever has the highest MI with that representation under the then-current stimulus conditions.
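For reference, the quantity these theories appeal to can be written out explicitly (the notation here is mine). The mutual information between a representational vehicle R and a worldly state S has equivalent joint-probability and conditional-probability forms,

\[ I(R;S) \;=\; \sum_{r,s} p(r,s)\,\log\frac{p(r,s)}{p(r)\,p(s)} \;=\; \sum_{r,s} p(r,s)\,\log\frac{p(r \mid s)}{p(r)}, \]

and the “current conditions” clause amounts to conditioning everything on the present stimulus situation, i.e. to comparing values of \(I(R;S \mid C = c)\).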
As Jakob discusses in chapter 2 of the book, a cognitive system that minimizes prediction error will exhibit high MI between the parameters of its internal model and states of the world. Given this, the contrast between short-term and on-average MI relations may map onto a contrast between short-term and on-average prediction error minimization. The hypothesis selected to explain away the sensory input caused by the dog in the example (i.e. that that thing is a sheep) will best minimize prediction error in the short term (otherwise it wouldn't have been selected), and the representational vehicle with that content will presumably enjoy higher MI with the presence of the dog than with any other external object under those perceptual circumstances. But that same vehicle carries the most information about the presence of sheep on average, so its tokening in the present situation will undermine prediction error minimization in the long term, which would best be served by representing the dog-at-night as a dog. Misperceptions can then be characterized as perceptual inferences that minimize prediction error in the short term but “undermine average, long-term prediction error minimization” (p. 176).
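To see how this is meant to handle the sheep/dog case, consider a toy calculation in which the candidate “sheep” vehicle carries the most information about sheep averaged across conditions, but more information about dogs under night-time conditions. Everything here (the state space, the numbers, the helper names p_world, p_token, mutual_info) is my own illustrative construction, not something from the chapter:

```python
import numpy as np
from itertools import product

# World states are animal x lighting; p_token gives the probability that
# the candidate "sheep" vehicle V is tokened in each state. The made-up
# numbers are chosen so that dogs at night fool the system.
p_world = {(a, c): 1 / 6 for a in ('sheep', 'dog', 'other')
           for c in ('day', 'night')}
p_token = {('sheep', 'day'): 0.95, ('sheep', 'night'): 0.60,
           ('dog', 'day'): 0.05, ('dog', 'night'): 0.90,
           ('other', 'day'): 0.02, ('other', 'night'): 0.05}

def mutual_info(animal, lighting=None):
    """MI (in bits) between V's tokening and `animal` being present,
    optionally restricted to one lighting condition."""
    states = [s for s in p_world if lighting in (None, s[1])]
    z = sum(p_world[s] for s in states)
    joint = np.zeros((2, 2))            # axes: [animal present?, V tokens?]
    for s in states:
        w, a = p_world[s] / z, int(s[0] == animal)
        joint[a, 1] += w * p_token[s]
        joint[a, 0] += w * (1 - p_token[s])
    return sum(joint[i, j] * np.log2(joint[i, j]
               / (joint[i].sum() * joint[:, j].sum()))
               for i, j in product(range(2), range(2)) if joint[i, j] > 0)

print(mutual_info('sheep'), mutual_info('dog'))                    # sheep wins on average
print(mutual_info('sheep', 'night'), mutual_info('dog', 'night'))  # dog wins at night
```

On these made-up numbers the vehicle's content comes out as sheep, and its night-time tokenings in the presence of dogs come out as misrepresentations, just as the account requires.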
Causal and descriptive theories of content
As Jakob puts it, “philosophers typically are divided in two camps on representation: either aboutness [i.e. a representation's having content] is said to be a causal relation between things in the world and states of the mind, or it is said to be a relation that comes about when the state of the mind uniquely fits or describes the way the world is” (p. 173). One aim of Chapter 8 is to assess the extent to which the PEM framework favors one or the other of these theories. It seems that there are considerations that pull in both directions.
On the one hand, the appeal to mutual information to explain the relation between the parameters of the generative model and states of affairs in the world suggests a causal covariance theory, of which Usher's (2001) theory, for example, can be taken to be a probabilistic generalization. As Jakob puts it, “the aim of the perceptual system is to fine-tune the causal relation between states of the internal model and states of affairs in the world...so that they tend to predict each other's occurrence” (p. 182).
On the other hand, the hierarchical structure of the generative model and the functional importance of interrelations between hypotheses at various levels about properties of the environment at different spatiotemporal scales suggest a description theory. “For a given object or event in the world, the perceptual hierarchy builds up a highly structured definite description, which will attempt to pick out just that object or event” (p. 182). Though the account in this chapter is only a sketch, the picture Jakob suggests is of a statistical network in which the content of each variable is determined by its probabilistic relations to all the others (its internal causal or inferential role).
The conclusion is that the theory of content that fits best with PEM incorporates both causal and descriptive factors: representation of the world is “not just a matter of causal relations but rather a matter of causally guided modes of representation maintained in the brain” (p. 183).
Prediction error and mutual information
I turn now to a critical discussion of the positions sketched so far. One issue is that the contrast between short-term and long-term prediction error minimization that Jakob relies on to confront the disjunction problem seems to be at least conceptually independent of the contrast between mutual information relations under current conditions and on average. Even if the two approaches classify the same inferences as misperceptions (which is not immediately clear), it's not mutual information but the inferential consequences of erroneous perceptual inference that do the explanatory work in the story about why long-term PEM is undermined.
Jakob's argument, I take it, is something like the following: selection of a false hypothesis at one level of the hierarchy in order to minimize local prediction error is bound to impair representation at other levels, given the interdependence of the hypotheses, and misperception occurs to the extent that the total revisions to the model due to an inference tend to raise long-term prediction error (p. 176). Inferring that the dog-at-night is a sheep may, for example, lead to inaccurate beliefs about the “whereabouts of sheep and dogs” (p. 175). The case can also be made without appeal to inferences distinct from the perceptual inference in question: when I adopt the hypothesis that the dog-at-night is a sheep, I alter (however subtly) the priors that apply to perceptually similar situations, so that I become more likely to draw the same faulty inference in similar situations in the future.
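The prior-drift point admits of a minimal Bayesian sketch. Everything below is my own toy construction, with made-up numbers: treating the winning hypothesis as veridical nudges a contextual prior toward it, so the same faulty inference becomes more probable on the next encounter.

```python
likelihood = {'sheep': 0.5, 'dog': 0.5}      # dusk input is genuinely ambiguous
prior = {'sheep': 0.6, 'dog': 0.4}           # slight initial sheep bias

def posterior(prior, likelihood):
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

for trial in range(3):
    post = posterior(prior, likelihood)
    winner = max(post, key=post.get)         # 'sheep' is selected each time
    # Crude stand-in for learning empirical priors from one's own percepts:
    prior = {h: 0.8 * prior[h] + 0.2 * (h == winner) for h in prior}
    print(trial, winner, round(prior['sheep'], 3))   # 0.68, 0.744, 0.795
```

With equal likelihoods the posterior simply mirrors the prior, so each faulty “sheep” verdict makes the next one likelier: the overall model gets worse even though each individual inference locally minimized prediction error.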
A concern about this account is that, while drawing a certain type of perceptual inference regularly may predictably lead to an increase in prediction error, there may be single instances of perceptual inference that would intuitively count as misrepresentations but that never in fact lead to prediction errors. It seems that Jakob could meet this challenge by identifying cases of misperception as those that increase the risk (rather than the actual incidence) of future prediction error, given the way the world actually is. On this account, what marks off some perceptual inferences as cases of misperception is the fact that they result in an overall worse model of the world.
Troubles with statistical causal theories
The considerations of the previous section weaken the case for a causal component to content determination within the PEM framework, because the account of content in terms of mutual information seems to carry no explanatory weight with regard to misrepresentation. There are additional challenges to the view that mutual information relations can be used to isolate the causes relevant to determining content. I mention here only one such challenge that is particularly salient in the context of hierarchical PEM.
The challenge, considered by Eliasmith (2000, pp. 59-60), is that any physical state will carry more information about its immediate causes than about its more distant causes, since each link in the causal chain introduces noise. Given this, we should expect representational vehicles to covary more reliably with the intermediate links in the causal chains connecting them to distal stimuli than with the distal stimuli themselves, and most reliably of all with other internal vehicles, for example the states of sensory transducers (see also Fodor 1990, pp. 108-111). Eliasmith replies by ruling out, as content-determining, those dependencies that can be fully explained by computational links within the cognitive system, but this seems too strong: it rules out the possibility of a system representing its own states in a way that's unmediated by exteroception.
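The first half of this challenge is, in effect, the data processing inequality (the formal gloss is mine): if the distal cause S affects the vehicle R only via the proximal stimulus P, so that S → P → R forms a Markov chain, then

\[ I(R;S) \;\le\; I(R;P), \]

so on a purely informational criterion the proximal stimulus always looks like at least as good a candidate content as the distal cause.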
The latter is potentially problematic for hierarchical PEM systems in particular, in which the causes of sensory input are represented in virtue of the system's ability to predict its own states. If “prediction” is univocal and each prediction corresponds to a hypothesis with representational content, then PEM systems represent states of the environment by virtue of many layers of higher-order representation of their own states. This is how Hinton et al. (1995, p. 2), for example, characterize the representational properties of a Helmholtz machine that includes a multi-layer generative model whose parameters are fit to data by minimizing free energy.
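The structural point can be sketched with a toy two-layer model (my own construction, not Hinton et al.'s implementation): the top-level state predicts the mid-level state, and only the mid-level state predicts the sensory data, so the environment is reached entirely through predictions of the system's own states.

```python
import numpy as np
rng = np.random.default_rng(0)

W2 = rng.normal(size=(3, 4))        # top-level z2 -> predicted mid-level z1
W1 = rng.normal(size=(4, 8))        # mid-level z1 -> predicted data x
x = rng.normal(size=8)              # "sensory" input
z2, z1 = np.zeros(3), np.zeros(4)

for _ in range(500):                # perceptual inference as gradient descent
    e1 = z1 - z2 @ W2               # higher-order error: z2's prediction of z1
    e0 = x - z1 @ W1                # sensory error: z1's prediction of x
    z1 += 0.02 * (e0 @ W1.T - e1)   # descend on the total squared error
    z2 += 0.02 * (e1 @ W2.T)

print(np.abs(e0).mean(), np.abs(e1).mean())   # both errors are driven down
```

Errors are reduced rather than eliminated, since the model trades sensory fit against top-down coherence; but note that only the bottom-level error ever touches the data. Every level above it is in the business of predicting another internal state, which is the sense in which environmental representation rides on layers of self-prediction.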
This claim about the ubiquity of higher-order representation in PEM systems is contentious, but is supported by the fact (discussed in my reply to Zoe's post on chapter 4) that representations high in the perceptual hierarchy function both as parts of the overall hypothesis about the external world and as predictions about the properties (such as precision) of hypotheses at lower levels in the hierarchy (see again Hinton et al. (1995)). And if each vehicle plays multiple representational roles, as this consideration suggests, then the contents of a vehicle can't be limited to the unique thing with which it covaries most reliably.
How do generative models represent?
The prospects for a content-determining role for one-one causal relations between individual vehicles and environmental states thus don't seem promising within the PEM framework. In addition to the argument just discussed, there are considerations in favor of the view that inferential roles determine the contents of the states of a generative model in a way that can't be explained by reference to their individual external causes.
First, a consistent theme in Jakob's book is that the internal model represents the world by recapitulating its causal structure. This suggests that the relation between the model and the world that's of representational interest is isomorphism (or, more weakly, homomorphism): the statistical relations between parameters of the model define a structure that is similar (in the ideal case, identical) to the causal structure of the bit of the world that's currently being represented, which explains the ability of an organism possessing such a model to respond rationally to features of the environment and their relations to one another. The role of the world is only to provide the error signal used to update the model in response to selective sampling by the senses (p. 183).
Second, the idea (discussed in chapter 5) that binding is inference favors a descriptivist or inferentialist account of content. The proposal is that the error signal in a given case may be explained away in terms of a hypothesis involving one cause or one involving multiple causes, depending only on which has the highest posterior probability. This suggests that perception of objects in a scene is in fact a special case of perception of the scene as having some property (in this case, as containing some determinate number of objects).
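Here is that proposal in miniature (the numbers are mine, purely illustrative): a one-cause and a two-cause hypothesis compete to explain the same evidence, and “binding” is nothing over and above the one-cause hypothesis winning the posterior comparison.

```python
prior = {'one_common_cause': 0.5, 'two_separate_causes': 0.5}
likelihood = {'one_common_cause': 0.30,      # made-up values of P(e | h)
              'two_separate_causes': 0.12}

z = sum(prior[h] * likelihood[h] for h in prior)
posterior = {h: prior[h] * likelihood[h] / z for h in prior}
print(posterior)   # features are bound into one object iff one_common_cause wins
```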
Despite this, causal relations to the world clearly play an explanatory role, as Jakob says. That role is in explaining how model parameters are updated, and thus in explaining the etiology and maintenance of representations. Thus, it may be said that from the PEM perspective external causes play a diachronic (and genetic or etiological) role in fixing content, and inferential roles play a synchronic (and individuating) one.
Eliasmith, C. (2000). How Neurons Mean: A Neurocomputational Theory of Representational Content. Ph.D. dissertation, Washington University in St. Louis.
Fodor, Jerry A. (1990). A Theory of Content and Other Essays. MIT Press.
Hinton, G. E., Dayan, P., Frey, B. J., and Neal, R. M. (1995). “The Wake-Sleep Algorithm for Unsupervised Neural Networks”. Science 268: 1158-1161.
Usher, M. (2001). “A Statistical Referential Theory of Content: Using Information Theory to Account for Misrepresentation”. Mind & Language 16(3): 311-334.