Welcome to the second post of the online reading group in the Philosophy of Mind and Psychology hosted by the Philosophy@Birmingham blog. This month, Alastair Wilson (Birmingham Fellow in Philosophy specialising in Metaphysics and Philosophy of Physics) presents chapter 2 of The Predictive Mind by Jakob Hohwy (OUP 2013).
Chapter 2 - Prediction Error Minimization
Presented by Alastair Wilson
According to the prediction error minimization (PEM) model, perception is inferential. Chapter 2 addresses a crucial question for any inferential approach - where do the priors come from? - and argues that PEM offers a new answer to this question.
A statistical illustration

Translating talk of Bayesian priors into curve-fitting terms for illustrative purposes, what we want is a mechanism that optimizes the trade-off between accuracy and over-fitting. This means factoring expected noise levels in the incoming data into the decision of how low to force the prediction error. PEM implements this via the following feedback loop:
1. Use prior beliefs harnessed to an internal model to generate predictions of the sensory input
2. Compare these predictions with the actual sensory input; the difference is the prediction error
3. Revise the internal model's parameters in the light of the prediction error, and repeat
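The loop can be illustrated with a toy numerical sketch (my own, not from the book; all names and parameters here are hypothetical): a single expectation is revised by precision-weighted prediction error until it settles on the hidden cause of its noisy input.

```python
import random

# Toy prediction error minimization: infer a hidden worldly cause from
# noisy sensory samples by repeatedly predicting the input, measuring the
# prediction error, and revising the internal model. Values hypothetical.

true_cause = 5.0        # the actual state of the world (unknown to the model)
sensory_noise = 1.0     # real noise level in the incoming signal

belief = 0.0            # the model's prior expectation
expected_noise = 1.0    # the model's estimate of the noise level
learning_rate = 0.1

random.seed(0)
for _ in range(500):
    prediction = belief                                # top-down prediction
    sample = random.gauss(true_cause, sensory_noise)   # bottom-up input
    error = sample - prediction                        # prediction error
    # Precision weighting: the noisier the input is expected to be,
    # the less a given error revises the model.
    belief += learning_rate * error / expected_noise ** 2

print(round(belief, 1))  # settles near the true cause
```

Raising `expected_noise` makes the same prediction error less compelling, which is one way a system can avoid chasing noise (over-fitting), at the price of slower convergence.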
Reconceiving the Relation to the World

PEM goes beyond the analogy with trial-and-error procedures. The primary representational content of perception is encoded in the downwards/backwards connections between levels in the hierarchy rather than in the upwards/forwards connections. That is, perceptual content primarily consists in the predictions that higher levels are making about lower levels rather than in any top-down interpretation of the signals that higher levels receive from lower levels. "The functional role of the bottom-up signal from the world is then to be feedback on the internal models of the world." (47)
The solution Jakob favours is that the process of perceptual inference is 'supervised by the world': "supervision... emerges once we operate with a comparison of predicted and actual input since revision of our internal model parameters can happen in the light of the difference between them" (49). The point is that there is a difference between two kinds of information-bearing signal that we might be receiving from the world. On a passive or receptive model of perception, the signal is essentially a picture of the world itself. But on the PEM model, the signal is a picture of the difference between the world and our internal prediction of it. Jakob argues that the latter can play the role of a supervisory signal in a way that the former cannot.
The PEM mechanism requires that the brain should be able to make direct measurements of prediction error. If we have such a measurement capacity, then it will tend to maximize the mutual information between the state of a neural population and a genuine cause in the external world.
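As a hedged illustration of the mutual-information claim (a toy joint distribution I have invented, not an example from the book), the information a neural state carries about a worldly cause can be computed directly:

```python
import math

# Toy joint distribution (hypothetical) over a worldly cause W and a
# neural state N. Mutual information measures how much the state of the
# neural population tells us about the external cause.
joint = {                      # p(w, n)
    ('cause', 'fire'):      0.4,
    ('cause', 'silent'):    0.1,
    ('no cause', 'fire'):   0.1,
    ('no cause', 'silent'): 0.4,
}

# Marginals p(w) and p(n).
p_w, p_n = {}, {}
for (w, n), p in joint.items():
    p_w[w] = p_w.get(w, 0.0) + p
    p_n[n] = p_n.get(n, 0.0) + p

# I(W; N) = sum over (w, n) of p(w, n) * log2(p(w, n) / (p(w) p(n))).
mi = sum(p * math.log2(p / (p_w[w] * p_n[n]))
         for (w, n), p in joint.items())
print(round(mi, 3))  # → 0.278 bits
```

On this toy distribution the neural state carries about 0.28 bits of information about the cause; the chapter's claim is that a mechanism measuring and minimizing prediction error tends to drive this quantity up.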
A Deeper Perspective

The surprisal of an outcome is a measure of how unexpected the outcome is. Surprisal is relative to a model: agents with different internal models, and hence different expected outcomes, will find the same outcome differently surprising. An organism's phenotype can then be defined in terms of the states it most probably occupies. We can't minimize surprisal directly; that's too computationally complex to evaluate. But we can evaluate, and hence minimize, free energy, which is equal to surprisal plus a quantity called perceptual divergence - roughly, a measure of how far the internal model is from the optimal (posterior) model - which is always non-negative. Free energy thus bounds surprisal from above, and minimizing free energy is a tractable proxy for minimizing surprisal.
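The bound can be checked on a toy discrete model (my own illustration with hypothetical numbers; I take the divergence to be the standard Kullback-Leibler divergence between the approximate and the true posterior):

```python
import math

# Toy check (hypothetical numbers) that free energy = surprisal + divergence,
# and that the divergence is non-negative, so free energy bounds surprisal.

p_z = {'rain': 0.2, 'sun': 0.8}            # prior over hidden causes
p_x_given_z = {'rain': 0.9, 'sun': 0.3}    # p(wet pavement | cause)

# Marginal probability of the observed datum, and its surprisal.
p_x = sum(p_z[z] * p_x_given_z[z] for z in p_z)
surprisal = -math.log(p_x)

# Exact posterior over causes, by Bayes' theorem.
posterior = {z: p_z[z] * p_x_given_z[z] / p_x for z in p_z}

# The approximate recognition distribution the system actually uses.
q = {'rain': 0.5, 'sun': 0.5}

# Divergence of q from the true posterior (KL divergence, always >= 0).
divergence = sum(q[z] * math.log(q[z] / posterior[z]) for z in q)
free_energy = surprisal + divergence

assert divergence >= 0
assert free_energy >= surprisal  # free energy upper-bounds surprisal
```

Minimizing free energy over q drives the divergence toward zero, at which point free energy just is surprisal - which is why the former can serve as a tractable proxy for the latter.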
Recognition and Model Inversion

Recognition is a very difficult problem, because "there is an intractably large number of ways that the interacting causes assumed by the generative model could give rise to various constellations of fantasized sensory input" (54). PEM solves (or avoids?) this problem by decentralizing processes like recognition: each level of the hierarchy has only to handle simpler causal models which allow for computationally tractable minimization procedures.
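A minimal sketch of this decentralization (my own toy construction, not Hohwy's): two levels, each responding only to its local prediction error, so that no single level has to invert the whole generative model.

```python
# Two-level toy hierarchy: level 1 predicts the sensory input, level 2
# predicts level 1. Each level minimizes only its own local prediction
# error. Hypothetical values and learning rate throughout.

levels = [0.0, 0.0]   # expectations at level 1 and level 2
lr = 0.1

def step(sensory_input):
    e1 = sensory_input - levels[0]   # local error at level 1
    e2 = levels[0] - levels[1]       # local error at level 2
    levels[0] += lr * (e1 - e2)      # driven from below, constrained from above
    levels[1] += lr * e2

for _ in range(500):
    step(3.0)

print([round(v, 2) for v in levels])  # → [3.0, 3.0]: both levels settle
```

Each update uses only information available at that level (the error just below and just above it), which is what keeps the minimization computationally tractable.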
Questions

1. I'm not quite clear on the relation between the personal and sub-personal levels in all this. Are both individual levels in the perceptual hierarchy and whole minds to be modelled as (different sorts of) Bayesian reasoners? To take a concrete example: is surprisal to be minimized by whole minds or by sub-personal components of minds? If the latter, then doesn't the phenotype as defined characterize a particular level in the hierarchy rather than a whole mind, as the discussion seems to imply?
2. It seems that the mind (or an individual sub-personal component of it) is 'supervised by the world' in at least two ways: i) the prediction error signal is partly generated by the world and feeds back into the model and ii) the wider cognitive architecture of the mind is partly determined by evolutionary pressures exerted by the world. How much do these two kinds of supervision have in common, and can we see them both as instances of some more general type of feedback-guided process?
3. How much of the work is done in the 'supervised-by-the-world' solution by the PEM mechanism itself and how much by the broad externalism about perceptual and semantic content?