Welcome to the second post of the online reading group in the Philosophy of Mind and Psychology hosted by the Philosophy@Birmingham blog. This month, Alastair Wilson (Birmingham Fellow in Philosophy specialising in Metaphysics and Philosophy of Physics) presents chapter 2 of The Predictive Mind by Jakob Hohwy (OUP 2013).
Chapter 2 - Prediction Error Minimization
Presented by Alastair Wilson
According to the prediction error minimization (PEM) model, perception is inferential. Chapter 2 addresses a crucial question for any inferential approach (where do the priors come from?) and argues that PEM offers a new answer to this question.
A Statistical Illustration
Translating talk of Bayesian priors
into curve-fitting for illustrative purposes, what we want is a mechanism which
will optimize the trade-off between accuracy and over-fitting. This means that
there is a need to factor in expected noise levels in the incoming data when
determining how low to force the prediction error. PEM implements this via the
following feedback loop:
Use prior beliefs harnessed to an internal model to generate predictions of the sensory input → compare the predictions against the actual input → use the resulting prediction error to revise the model's parameters, and repeat.
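To make the trade-off concrete, here is a minimal sketch of my own (a toy illustration, not anything from the book): polynomials of increasing degree are fitted to noisy samples of a smooth curve, and each fit is scored on fresh data. Forcing the training error too low means fitting the noise rather than the worldly regularity.

    # Toy version of the accuracy/over-fitting trade-off described above.
    # All names and numbers here are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    true_curve = lambda x: np.sin(x)   # the hidden worldly regularity
    noise_sd = 0.3                     # expected noise level in the input

    x_train = np.linspace(0, 2 * np.pi, 20)
    y_train = true_curve(x_train) + rng.normal(0, noise_sd, x_train.size)
    x_test = np.linspace(0, 2 * np.pi, 200)
    y_test = true_curve(x_test) + rng.normal(0, noise_sd, x_test.size)

    for degree in (1, 3, 9):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")

On a typical run the training error falls as the degree rises while the error on fresh data does not: the high-degree fit has forced its prediction error below the expected noise level, which is exactly what the mechanism above is meant to avoid.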
Reconceiving the Relation to the World
PEM goes beyond the analogy with trial-and-error procedures. The primary representational content of perception is encoded in the downwards/backwards connections between levels in the hierarchy rather than in the upwards/forwards connections. That is, perceptual content primarily consists in the predictions that higher levels are making about lower levels rather than in any top-down interpretation of the signals that higher levels receive from lower levels. "The functional role of the bottom-up signal from the world is then to be feedback on the internal models of the world." (47)
The solution Jakob favours to the problem of how perceptual inference can be trained without an external supervisor is that the process is 'supervised by the world':
"supervision... emerges once we operate with a comparison of predicted and
actual input since revision of our internal model parameters can happen in the
light of the difference between them" (49). The point is that there is a
difference between two kinds of information-bearing signal that we might be
receiving from the world. On a passive or receptive model of perception, the
signal is essentially a picture of the world itself. But on the PEM model, the
signal is a picture of the difference between the world and our internal
prediction of it. Jakob argues that the latter can play the role of a
supervisory signal in a way that the former cannot.
The PEM mechanism requires that the brain be able to make direct measurements of prediction error. If the brain has such a measurement capacity, then minimizing prediction error will tend to maximize the mutual information between the state of a neural population and a genuine cause in the external world.
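As a toy sketch of this claim (my own illustration, with made-up variable names, reporting correlation rather than mutual information proper; for jointly Gaussian variables the two rise and fall together):

    # An internal estimate revised only by measured prediction error ends
    # up carrying information about the hidden cause driving the input.
    import numpy as np

    rng = np.random.default_rng(1)
    causes, estimates = [], []
    for trial in range(500):
        v = rng.normal(0, 2)            # hidden worldly cause, new each trial
        mu = 0.0                        # internal hypothesis, starts ignorant
        for _ in range(50):             # perceptual inference within a trial
            u = v + rng.normal(0, 0.5)  # noisy sensory sample
            error = u - mu              # the only signal the model receives
            mu += 0.1 * error           # revise the hypothesis to shrink error
        causes.append(v)
        estimates.append(mu)

    print("correlation(cause, estimate):", np.corrcoef(causes, estimates)[0, 1])

The estimate never sees the cause directly, only the mismatch between prediction and input, yet the two end up almost perfectly correlated.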
A Deeper Perspective
The surprisal of an outcome is a measure of how unexpected the outcome is. Surprisal is relative to a model, so different agents, with different internal models and hence different expected outcomes, will find the same outcomes differently surprising. We can then define our phenotype in terms of the states we most probably occupy. We can't optimize surprisal directly; it is too computationally complex to evaluate. But we can evaluate, and hence optimize, free energy, which is equal to surprisal plus a non-negative quantity called perceptual divergence, a measure of how far the model's current hypothesis about the hidden causes is from the true posterior given the data. Minimizing free energy is thus a proxy for minimizing surprisal.
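For readers who want the bookkeeping spelled out, the decomposition behind this paragraph can be written as follows (standard free-energy notation, not the book's own symbols):

    % u: sensory input, m: the agent's model, v: hidden worldly causes,
    % q(v): the model's current approximate posterior over those causes.
    F(q, u) = \underbrace{-\log p(u \mid m)}_{\text{surprisal}}
            + \underbrace{D_{\mathrm{KL}}\big( q(v) \,\|\, p(v \mid u, m) \big)}_{\text{perceptual divergence} \,\ge\, 0}

Since the divergence term is never negative, free energy is an upper bound on surprisal; and since it can be evaluated using only quantities internal to the model, minimizing it is a tractable stand-in for minimizing surprisal itself.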
Recognition and Model Inversion
Recognition is a very difficult
problem, because "there is an intractably large number of ways that the
interacting causes assumed by the generative model could give rise to various
constellations of fantasized sensory input" (54). PEM solves (or avoids?)
this problem by decentralizing processes like recognition: each level of the
hierarchy has only to handle simpler causal models which allow for
computationally tractable minimization procedures.
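Here is a minimal sketch of the decentralizing idea (my own toy, assuming one-parameter linear models at each level, far simpler than anything in the book): each level revises its hypothesis using only the prediction errors immediately above and below it, yet the deep cause is recovered without any unit inverting the whole generative model at once.

    # Two-level toy: each level predicts the level below and updates on
    # local prediction errors only. Weights and values are assumptions.
    import numpy as np

    rng = np.random.default_rng(2)
    w1, w2 = 2.0, 0.5                       # generative weights per level
    x2_true = 1.5                           # deep hidden cause
    x1_true = w2 * x2_true                  # shallow cause it generates
    u = w1 * x1_true + rng.normal(0, 0.05)  # noisy sensory input

    x1, x2 = 0.0, 0.0                       # hypotheses at each level
    lr = 0.05
    for _ in range(2000):
        e0 = u - w1 * x1                    # error at the sensory level
        e1 = x1 - w2 * x2                   # error between the two levels
        x1 += lr * (w1 * e0 - e1)           # gradient step on local errors
        x2 += lr * (w2 * e1)
    print(f"x1 = {x1:.2f} (true {x1_true:.2f}), x2 = {x2:.2f} (true {x2_true:.2f})")

Each update is a gradient step on the summed squared errors, but each level only ever consults its local errors; that is the sense in which the hard global inversion problem gets broken into tractable pieces.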
Questions:
1. I'm not quite clear on the
relation between the personal and sub-personal levels in all this. Are both
individual levels in the perceptual hierarchy and whole minds to be modelled as (different sorts of) Bayesian
reasoners? To take a concrete example: is surprisal to be minimized by whole
minds or by sub-personal components of minds? If the latter, then doesn't the
phenotype as defined characterize a particular level in the hierarchy rather
than a whole mind, as the discussion seems to imply?
2. It seems that the mind (or an individual sub-personal
component of it) is 'supervised by the world' in at least two ways: i) the
prediction error signal is partly generated by the world and feeds back into
the model and ii) the wider cognitive architecture of the mind is partly
determined by evolutionary pressures exerted by the world. How much do these
two kinds of supervision have in common, and can we see them both as instances
of some more general type of feedback-guided process?
3. How much of the work is done
in the 'supervised-by-the-world' solution
by the PEM mechanism itself and how much by the broad externalism about
perceptual and semantic content?
These questions concern some of the deep, troubling, and philosophically crucial aspects of PEM. Some of them are addressed in later chapters, some in other work. They all seem fairly open and very interesting questions to me.
1. The intention is that the whole model, rather than subcomponents of it, makes up the mind at a personal level. But the theory so far doesn't guarantee this, as you spot. A couple of things speak in favour of treating the whole model as the agent. The interconnectedness of levels of the cortical hierarchy means it is neat to think of the agent in terms of all the levels combined (though strictly speaking layers could 'peel off'; I discuss this in my forthcoming paper on 'the self-evidencing brain'). In chapter 12, and a few other places, I discuss the self as the internal model of oneself, conceived as an environmental cause contributing to (non-linear changes in) the flow of sensory input; it may be that the most economical self-model is the one that treats the agent as the whole system rather than as sub-components. Finally, it may be that the whole mind, rather than its subcomponents, is what best minimizes prediction error on average and in the long run, and should therefore be considered the agent. Though there always has to be an internal model and some external hidden causes, it may be that the 'size' of the agent is somewhat context dependent, leaving a fuzzy line between personal and subpersonal levels.
2. Yes, I think we can see them both as instances of prediction error minimization (or free energy minimization, for a more general framework). Evolution is prediction error minimization over longer time scales, on this view. Obviously a huge discussion is buried here, which I studiously avoid in the book!
3. This is a question about which theory of content best fits PEM. I have some, fairly preliminary, discussion of this in Ch8. Though there are both internalist and externalist tenets in PEM, and the causal element is in some respects foregrounded, I also favour an internalist reading, such that everything depends on an internal process where the parameters of a model are refined to explain away sensory input; this looks to me like an approximation to a description or 'network' theory. I think the task is to make the 'supervised by the world' aspect central, and then explain the causal element on that basis, e.g., in terms of inference. But that certainly cannot be the end of that story.
Thanks for this Jakob!
1. I suspect some will dislike the way PEM involves applying Bayes at both personal and subpersonal levels: it might seem suspicious that a framework developed for the personal level just happens to apply in a useful explanatory way at the subpersonal level. But PEM does have an account of why this is so, of what organisms and levels in the hierarchy have in common which makes them both apt for Bayesian treatment. And there's plenty of precedent for this sort of thing elsewhere; we apply intentional and epistemic vocabulary both to individual bees and to whole hives. Probably people who insist that all such talk is metaphorical when applied to superorganisms will likewise tend to treat it as metaphorical when applied to levels in the hierarchy. Perhaps one motivation driving that thought is that the agent/non-agent distinction is necessarily non-vague and/or tied to consciousness, as some metaphysical pictures strongly support.
2. This is very cool. A topic for the next book?
3. I'll postpone more questions on this to chapter 8. My initial impression was that PEM was thoroughly externalist; the error-signal content seems 'world-involving' insofar as I understand that expression. But I now see how at least some motivations for internalism can be accounted for by the model-inversion element of PEM. It looks like there's potential for a nice reconciliation.