Philosophy@Birmingham: Philosophy of Mind and Psychology Reading Group -- The Predictive Mind chapter 2

Welcome to the second post of the online reading group in the Philosophy of Mind and Psychology hosted by the Philosophy@Birmingham blog. This month, Alastair Wilson (Birmingham Fellow in Philosophy specialising in Metaphysics and Philosophy of Physics) presents chapter 2 of The Predictive Mind by Jakob Hohwy (OUP 2013).

Chapter 2 - Prediction Error Minimization
Presented by Alastair Wilson

According to the prediction error minimization (PEM) model, perception is inferential. Chapter 2 addresses a crucial question for any inferential approach - where do the priors come from? - and argues that PEM offers a new answer to this question.

A Statistical Illustration

Translating talk of Bayesian priors into curve-fitting for illustrative purposes, what we want is a mechanism which will optimize the trade-off between fitting the data accurately and over-fitting it. This requires factoring in the expected noise levels in the incoming data when determining how low to force the prediction error. PEM implements this via the following feedback loop:

Use prior beliefs harnessed to an internal model to generate predictions of the sensory input → Revise the model's prediction, or change the sensory input, to minimize prediction error, subject to expectations of noise → (repeat from the first step)
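The loop can be made concrete with a minimal Python sketch. It rests on assumptions that are not in the chapter summary: a single hidden cause, Gaussian noise with a known variance, and a function name and parameters (`minimize_prediction_error`, `noise_var`) chosen purely for illustration. It covers only the "revise the model's prediction" branch of the loop, not the option of changing the sensory input.

```python
def minimize_prediction_error(samples, prior_mean, prior_var, noise_var):
    """Revise an estimate of one hidden cause from noisy samples.

    Illustrative assumption: Gaussian prior and Gaussian noise with a
    known variance, so each revision is a standard conjugate update.
    """
    mean, var = prior_mean, prior_var
    for sample in samples:
        error = sample - mean            # prediction error for this input
        gain = var / (var + noise_var)   # smaller when more noise is expected
        mean = mean + gain * error       # revise the prediction
        var = (1.0 - gain) * var         # remaining uncertainty shrinks
    return mean, var


samples = [4.0, 6.0, 5.0, 5.0]

# Little expected noise: the estimate moves almost all the way to the data.
low_noise = minimize_prediction_error(samples, 0.0, 1.0, noise_var=0.01)

# A lot of expected noise: the estimate stays close to the prior.
high_noise = minimize_prediction_error(samples, 0.0, 1.0, noise_var=100.0)

print(low_noise[0], high_noise[0])
```

The same inputs lead to very different revisions depending on the expected noise, which is the sense in which the prediction error should not be forced as low as possible regardless of noise.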

Reconceiving the Relation to the World

PEM goes beyond the analogy with trial-and-error procedures. The primary representational content of perception is encoded in the downwards/backwards connections between levels in the hierarchy rather than in the upwards/forwards connections. That is, perceptual content primarily consists in the predictions that higher levels are making about lower levels rather than in any top-down interpretation of the signals that higher levels receive from lower levels. "The functional role of the bottom-up signal from the world is then to be feedback on the internal models of the world." (47)

Being Supervised by the World

The priors involved in Bayesian models of PEM are sub-personal: they control the outputs each individual layer of the perceptual hierarchy passes to the nearby layers based on what its inputs from those layers are. But where do these sub-personal priors come from in the first place? It looks like only something with priors designed into it by some supervisor could in turn design something with priors. This can be posed as an instance of a very general form of explanatory or causal regress.

The solution Jakob favours is that the process of perceptual inference is 'supervised by the world': "supervision... emerges once we operate with a comparison of predicted and actual input since revision of our internal model parameters can happen in the light of the difference between them" (49). The point is that there is a difference between two kinds of information-bearing signal that we might be receiving from the world. On a passive or receptive model of perception, the signal is essentially a picture of the world itself. But on the PEM model, the signal is a picture of the difference between the world and our internal prediction of it. Jakob argues that the latter can play the role of a supervisory signal in a way that the former cannot.

The PEM mechanism requires that the brain be able to make direct measurements of prediction error. Given such a measurement capacity, minimizing prediction error will tend to maximize the mutual information between the state of a neural population and a genuine cause in the external world.

A Deeper Perspective

The surprisal of an outcome is a measure of how unexpected the outcome is. Surprisal is relative to a model, so agents with different internal models, and hence different expected outcomes, will find the same outcome differently surprising. This lets us define our phenotype in terms of the states we most probably occupy. We can't minimize surprisal directly, because it is too computationally complex to evaluate. But we can evaluate, and hence minimize, free energy, which is equal to surprisal plus a quantity called perceptual divergence, which is something like noise in the data and is always non-negative. Free energy is therefore an upper bound on surprisal, and minimizing it serves as a proxy for minimizing surprisal.
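The relation between free energy and surprisal can be checked numerically on a toy case. The sketch below is only an illustration under stated assumptions: the two-cause model and its numbers are invented, and the divergence term is read in the standard variational way, as the Kullback-Leibler divergence between the current hypothesis and the exact posterior, which the summary above does not spell out.

```python
import math

# Hypothetical toy model: two hidden causes of one sensory input (wet grass).
PRIOR = {"rain": 0.3, "sprinkler": 0.7}        # p(cause)
LIKELIHOOD = {"rain": 0.9, "sprinkler": 0.5}   # p(wet grass | cause)


def evidence(prior, likelihood):
    """Probability the model assigns to the sensory input."""
    return sum(prior[h] * likelihood[h] for h in prior)


def surprisal(prior, likelihood):
    """Negative log probability of the input under the model."""
    return -math.log(evidence(prior, likelihood))


def posterior(prior, likelihood):
    """Exact Bayesian posterior p(cause | input)."""
    z = evidence(prior, likelihood)
    return {h: prior[h] * likelihood[h] / z for h in prior}


def divergence(q, p):
    """Kullback-Leibler divergence KL(q || p); never negative."""
    return sum(q[h] * math.log(q[h] / p[h]) for h in q if q[h] > 0)


def free_energy(q, prior, likelihood):
    """Average of log q(cause) - log p(cause, input) under the hypothesis q."""
    return sum(
        q[h] * (math.log(q[h]) - math.log(prior[h] * likelihood[h]))
        for h in q if q[h] > 0
    )


guess = {"rain": 0.5, "sprinkler": 0.5}   # a rough current hypothesis
exact = posterior(PRIOR, LIKELIHOOD)

s = surprisal(PRIOR, LIKELIHOOD)
f_guess = free_energy(guess, PRIOR, LIKELIHOOD)
f_exact = free_energy(exact, PRIOR, LIKELIHOOD)

print(f"surprisal           = {s:.4f}")
print(f"free energy (guess) = {f_guess:.4f}")
print(f"free energy (exact) = {f_exact:.4f}")
```

Note that `free_energy` never calls `evidence`: it needs only the hypothesis and the joint model, which is why it can be evaluated when surprisal cannot. In this toy case the evidence is a two-term sum, so surprisal is easy to compute; the intractability arises only in models with very many interacting causes.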

Recognition and Model Inversion

Recognition is a very difficult problem, because "there is an intractably large number of ways that the interacting causes assumed by the generative model could give rise to various constellations of fantasized sensory input" (54). PEM solves (or avoids?) this problem by decentralizing processes like recognition: each level of the hierarchy has only to handle simpler causal models which allow for computationally tractable minimization procedures.

Questions:

1. I'm not quite clear on the relation between the personal and sub-personal levels in all this. Are both individual levels in the perceptual hierarchy and whole minds to be modelled as (different sorts of) Bayesian reasoners? To take a concrete example: is surprisal to be minimized by whole minds or by sub-personal components of minds? If the latter, then doesn't the phenotype as defined characterize a particular level in the hierarchy rather than a whole mind, as the discussion seems to imply?

2. It seems that the mind (or an individual sub-personal component of it) is 'supervised by the world' in at least two ways: i) the prediction error signal is partly generated by the world and feeds back into the model and ii) the wider cognitive architecture of the mind is partly determined by evolutionary pressures exerted by the world. How much do these two kinds of supervision have in common, and can we see them both as instances of some more general type of feedback-guided process?

3. How much of the work is done in the 'supervised-by-the-world' solution by the PEM mechanism itself and how much by the broad externalism about perceptual and semantic content?

2 comments:

Jakob Hohwy, 31 March 2014 at 12:12
These questions concern some of the deep, troubling, and philosophically crucial aspects of PEM. Some of them are addressed in later chapters, some in other work. They all seem fairly open and very interesting questions to me.

1. The intention is that the whole model rather than subcomponents of it make up the mind at a personal level. But the theory so far doesn’t guarantee this, as you spot. A couple of things speak in favour of treating the whole model as the agent. The interconnectedness of levels of the cortical hierarchy means it is neat to think of the agent in terms of all the levels combined (though strictly speaking layers could ‘peel off’; I discuss this in my forthcoming paper on ‘the self-evidencing brain’). In chapter 12, and a few other places, I discuss the self, as the internal model of oneself conceived as an environmental cause contributing to (non-linear changes in) the flow of sensory input; it may be that the most economical self-model is the one that treats the agent as the whole system rather than sub-components. Finally, it may be that the whole mind rather than its subcomponents is what best minimizes prediction error on average and in the longest run, and should therefore be considered the agent. Though there always has to be an internal model and some external hidden causes, it may be that the ‘size’ of the agent is somewhat context dependent, leaving a fuzzy line between personal and subpersonal levels.

2. Yes, I think we can see them both as instances of prediction error minimization (or free energy minimization, for a more general framework). Evolution is prediction error minimization over longer time scales, on this view. Obviously a huge discussion is buried here, which I studiously avoid in the book!

3. This is a question about which theory of content best fits PEM. I have some, fairly preliminary, discussion of this in Ch. 8. Though there are both internalist and externalist tenets in PEM, and the causal element in some respects is foregrounded, I also favour an internalist reading, such that everything depends on an internal process where the parameters of a model are refined to explain away sensory input – this looks to me like an approximation to a description or ‘network’ theory. I think the task is to make the ‘supervised by the world’ aspect central, and then explain the causal element on that basis, e.g., in terms of inference. But that certainly cannot be the end of that story.

Alastair Wilson, 2 April 2014 at 21:14
Thanks for this Jakob!

1. I suspect some will dislike the way PEM involves applying Bayes at both personal and subpersonal levels - it might seem suspicious that a framework developed for the personal level just happens to apply in a useful explanatory way at the subpersonal level. But PEM does have an account of why this is - of what organisms and levels in the hierarchy have in common which makes them both apt for Bayesian treatment. And there's plenty of precedent for this sort of thing elsewhere; we apply intentional and epistemic vocabulary both to individual bees and to whole hives. Probably people who insist that all such talk is metaphorical when applied to superorganisms will likewise tend to want to treat it as metaphorical when applied to levels in the hierarchy. Perhaps one motivation driving that thought is that the agent/non-agent distinction is necessarily non-vague and/or tied to consciousness, as some metaphysical pictures strongly support.

2. This is very cool. A topic for the next book?

3. I'll postpone more questions on this to chapter 8. My initial impression was that PEM was thoroughly externalist; the error-signal content seems 'world-involving' insofar as I understand that expression. But I now see how at least some motivations for internalism can be accounted for by the model-inversion element of PEM. It looks like there's potential for a nice reconciliation.

Friday, 28 March 2014
