Friday 30 May 2014

Philosophy of Mind and Psychology Reading Group -- The Predictive Mind chapter 4

Zoe Jenkin
Welcome to the fourth post of the online reading group in the Philosophy of Mind and Psychology hosted by the Philosophy@Birmingham blog. This month, Zoe Jenkin (Harvard) presents chapter 4 of The Predictive Mind by Jakob Hohwy (OUP 2013).


Chapter 4 - Action and Expected Experience
Presented by Zoe Jenkin


The first three chapters of The Predictive Mind sketch how prediction error minimization underlies all perceptual processing, explaining various features of the mind using one unified framework. Chapter four addresses the question of how action fits into the PEM framework, arguing not only that PEM can adequately accommodate action, but also that action plays a crucial role in minimizing prediction error. We end up with a picture on which any given prediction error (a discrepancy between the system’s prediction and the sensory input) can in principle be minimized in one of two ways—by revising one’s priors and generating a new hypothesis, or by acting so as to selectively sample the world in a way that makes the input data match the selected hypothesis. An example of such selective sampling: if the system predicts that there is a face before it, it will fixate the region where the prediction dictates a nose should be, and scan for a surface with a characteristically nose-like slope. Hohwy notes that this active, selective sampling method will be more efficient than random sampling, because it targets regions of space where the hypothesis makes a particular or unique prediction and so can easily be confirmed or disconfirmed. On this view, “perceiving and acting are but two different ways of doing the same thing” (71), where that “thing” is minimizing prediction error.
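To fix ideas, here is a minimal sketch of the two routes for a single scalar sensory signal (my own toy illustration, not Hohwy’s formalism; all names and numbers are made up):

```python
def prediction_error(predicted, observed):
    return observed - predicted

# Route 1: perceptual inference -- revise the hypothesis toward the input.
def revise_hypothesis(hypothesis, error, learning_rate=0.5):
    return hypothesis + learning_rate * error

# Route 2: active inference -- hold the hypothesis fixed and act so that the
# sampled input (crudely simulated here) comes to match it.
def act_to_sample(hypothesis, world_sample, step=0.5):
    return world_sample + step * (hypothesis - world_sample)

hypothesis, sample = 1.0, 0.0
error = prediction_error(hypothesis, sample)   # -1.0
print(revise_hypothesis(hypothesis, error))    # mind moves toward world: 0.5
print(act_to_sample(hypothesis, sample))       # sampled world moves toward mind: 0.5
```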

This raises the question of how the mind decides which of these courses of action to take in any individual case—should it revise its priors and generate a new hypothesis, or should it act so as to change the sensory input and match the current hypothesis? Hohwy indicates that precision predictions facilitate this decision. If the input that would be obtained upon acting is predicted to be more precise than the occurrent input, the action route is taken. If, in contrast, the input that would be obtained upon acting is predicted to be less precise than the occurrent input, the prediction revision route is taken (78). Both of these routes are fundamentally ways of minimizing prediction error. The core difference on this picture between beliefs and perceptions on the one hand, and desires on the other, is in their directions of fit—beliefs and perceptions involve making the mind fit the world, while desires (or mismatches of the sort that lead to action) involve making the world fit the mind (83).
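A schematic version of that decision rule (my gloss on the discussion around p. 78, with purely illustrative numbers):

```python
def choose_route(precision_of_occurrent_input, expected_precision_after_acting):
    # Sketch only: higher expected precision from acting favours the action route.
    if expected_precision_after_acting > precision_of_occurrent_input:
        return "act: make the world fit the mind"
    return "revise: make the mind fit the world"

print(choose_route(2.0, 5.0))  # acting promises more precise input -> act
print(choose_route(5.0, 2.0))  # occurrent input already precise -> revise
```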

One worry that arises right off the bat is, if the goal of the system is to minimize prediction error, why don’t we just always close our eyes and predict a dark room? This would create a state of no prediction error, because the sensory input would match the predicted state of the system, and thus seem to fulfill the goal of minimizing prediction error. Hohwy’s reply to this worry is that prior probabilities provide a constraint on the sorts of hypotheses that the system will generate, and thereby on those that can be confirmed or disconfirmed through selective sampling. The hypothesis of being in a dark room will have a low prior probability in most contexts, so it will not end up being selected as the contents of perception. Action can only be used to test the reliability of hypotheses over space and time, not to generate them in the first place.
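Schematically (my toy illustration, with made-up priors and an arbitrary threshold): because hypothesis selection is gated by the priors, the dark-room hypothesis is never handed to active inference for testing in the first place.

```python
priors = {"dark room": 0.01, "face before me": 0.60, "blank wall": 0.39}

# Only reasonably probable hypotheses are put forward for selective sampling;
# the 0.2 cut-off is purely illustrative.
candidates_for_active_testing = [h for h, p in priors.items() if p > 0.2]
print(candidates_for_active_testing)  # the dark-room hypothesis never makes the cut
```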

This is related to a more general illicit confirmation worry: by engaging in selective sampling to test any hypothesis (dark room or otherwise), won’t we just confirm whatever prediction we generate, by taking as input whatever bit of the world would confirm it, thereby eliminating the check on predictions that sensory input was supposed to provide? This seems at least at first glance analogous to confirmation bias in empirical science. Hohwy’s reply to this worry takes two different forms for the interoceptive and exteroceptive senses. For the interoceptive senses, this sort of cycle is exactly what we want, because it allows us to maintain homeostasis. For the exteroceptive senses, while this cycle is not what we want in order to arrive at veridical perception of the world, the cycle won’t actually occur, because only hypotheses with high priors will be tested in the first place, and then the world will place significant constraints on which hypotheses can be confirmed. Even if selective sampling does allow for some leeway in which hypotheses might be confirmed, ultimately if the hypothesis is significantly off base, the brain will revert to revision by perceptual inference.

Another worry that arises is, if action and perception serve the same basic function of minimizing prediction error, why do we need action at all when we have Bayesian inference? Hohwy’s response to this question is that once a perceptual inference is made, action can strengthen our confidence in that inference, because it can confirm that hypothesis across alternative space and time contexts. This will help decrease uncertainty, especially when multiple competing hypotheses have relatively similar posterior probabilities. Also, this testing by active sampling is quick and efficient, because it only involves generating the predictions across space and time of the selected hypothesis, instead of running the entire Bayesian calculus multiple times on various sets of priors. For these reasons, while action has the same overall goal as prediction, it achieves that goal in some ways that are uniquely useful to the system.

Another important feature of how action works in a PEM system is that the system includes models of itself, in addition to models of the external world. So in humans, the perceptual system will also make predictions about inner states such as heartbeat, arousal, and proprioceptive and kinesthetic states, allowing for the regulation of such internal systems. This explains both how the interoceptive senses function to regulate bodily states and how information about our inner states can figure in our predictions of the world (for example in calculations about whether you could leap over a puddle of a certain length). When action is taken so as to minimize error with respect to distal goal states, lower-level predictions are made with respect to all of the internal routes to those goals, including the small-scale muscle movements that would arise when pursuing a particular course of action. This allows for specificity in our determination of which route to a goal state to pursue. In these sorts of cases, the system’s models of internal and external states of the body and world will function together to generate action.

Here are some questions about the role of action in the PEM framework:


1. Hohwy says that the system determines whether to minimize prediction error by acting or by revising priors depending on the relative precision predictions about the occurrent vs. counterfactual input data. How does the mind know whether the data that will be gleaned upon acting will be more or less precise than the occurrent input data? This seems to depend on the quality of input data that the system would receive in the future. How does the calculation of these counterfactual precision predictions work?

2. Is the selective sampling that occurs selective in the sense that it targets areas that would be likely to confirm the hypothesis, or in the sense that it targets areas that would be likely to bear on the hypothesis one way or the other? Only the first sort of selective sampling seems to be the sort that would cause an illicit confirmation worry. However, the first sort seems like it would do much better than the second toward minimizing prediction error.

3. The PEM account attempts to reduce desires to beliefs with a different direction of fit. For example, having a desire for a muffin would be selecting a hypothesis with the content “there is a muffin before me” when there was no sensory input indicating muffin-presence. This would in turn explain the connection between desire and action, because you would be driven to act so as to change the sensory input so it would indicate that there was in fact a muffin before you (by, for example, getting yourself a muffin). But how would we ever get into the sort of predictive state that would drive action toward such a specific goal? In most contexts, your muffin prior is not high—it would only be high in certain situations, such as if you were near an oven from which a muffin smell was emanating, or if you were on a street near a bakery that you knew sold muffins. So given that the muffin prior is usually very low in contexts when it would not be reasonable to expect a muffin, how can we explain cases like one in which I am sitting at my desk and all of a sudden I want a muffin? This sort of out-of-the-blue (or at least unconnected to any occurrent sensory input) desire seems to occur all the time, and it does not seem obvious how it would be generated given what it seems the inputs to the Bayesian calculus would be in these contexts. How do we ever end up desiring anything unexpected?

4. Where does this picture, on which the difference between belief and desire is simply a difference in direction of fit, leave other mental attitudes? It does not on the face of it seem to leave much room for more finely-grained distinctions between fears, hopes, and other standardly accepted folk psychological states. Does the view hold that all of these can be reduced to beliefs or desires?

5. Does the PEM framework have any resources to explain associational connections between mental states, such as why reading the word “salt” makes me think of “pepper”, or why the smell of Guinness makes me think of Oxford? While it is possible there could be inferential connections between the two concepts/experiences in these cases, there also seem to be many psychological transitions that are driven simply by brute associations (that were likely formed due to past co-occurrence). The basic mechanisms of the PEM system are inferential, so this sort of associational triggering, whether perceptually or cognitively based, does not seem to fit in naturally. How does the PEM framework make sense of such cases?

6. Hohwy says that our inability to tickle ourselves (which survives even across extensive changes in body image) supports the idea that precision predictions help determine whether to act or revise predictions in any given case. This was an intriguing illustration and I am curious to hear more. How is this evidence for the view?

7. How separable is the PEM account of action from the rest of the PEM model of the mind? If we had particular reasons for rejecting the explanation of action as fundamentally a route to minimizing prediction error, could we still accept the broad strokes of the account of perceptual processing from the first three chapters? Or must they stand and fall together?

10 comments:

  1. Answer to Q1: This is a question about what swings the system from perceptual inference to active inference. It is an interesting question because it might tell us something new about what it takes to be an agent (I discuss it again at the end of Ch. 7). Part of the answer has to do with expected precisions of prediction error. We build up short term, medium term and long term representations of patterns of precisions; this means we have expectations about which sorts of contexts and environments we should actively gravitate towards to increase the precision of prediction error. Another part of the answer has to do with running counterfactuals “off-line” (imagined active inference), which allows us to build up salience maps that guide us towards prediction errors that we, based on the map, expect to be precise. Further, we have prior expectations for the kinds of sensory flows we tend to have and these priors can outcompete the current hypothesis; this is partly because these priors have good expected precision (we attend to them) and partly because the current hypothesis (e.g., “I am not moving”) will tend to decrease its prior. This latter bit is because all the higher levels in the hierarchy represent causes that will manifest in the near and not so near future and thereby conspire to introduce changes in the current sensory data – I’ll return to this in the tickle question, #6 below.
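    A toy rendering of the salience-map idea (the region names and numbers are mine, purely for illustration): score candidate fixations by the precision of the prediction error they are expected to deliver, and sample the best one.

```python
expected_precision = {
    "nose region":  4.0,   # the selected hypothesis makes a sharp, testable prediction here
    "cheek region": 1.5,
    "background":   0.3,
}
next_fixation = max(expected_precision, key=expected_precision.get)
print(next_fixation)  # "nose region"
```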

    Answer to Q2. I have had conversations about this question with a number of people, and it is a good one. I haven't seen any research that recognizably addresses this (though machine learning is a big and technical field so some text books might have something). So I can speculate. As I understand things, active inference is driven by predictions of precise prediction error (e.g., salience maps for eye movements). So one would think that this means active inference is mostly about finding evidence in favour of a selected hypothesis; i.e., testing to confirm. I think this also takes care of some instances of disconfirmation because finding evidence in favour of one hypothesis thereby "explains away" competing hypotheses (cf. Pearl on rain vs. sprinkler as explanation of one’s wet lawn, where the two hypotheses become dependent conditional on the new evidence of the neighbor’s lawn also being wet). I think it is very likely that the system also devises strategies for getting precise disconfirming evidence in some cases (perhaps the kinds of cases Andy Clark would describe as quick and dirty processing). There might be cases where the system has learnt that there is more precision, and quicker prediction error minimization gains, from evidence that disconfirms a hypothesis. It would be nice to have a good example here. Perhaps rubbing a painful spot, which disconfirms that there is tissue damage? In the rubber hand illusion, people want to move their real hand to disconfirm that the rubber hand is really theirs (though it's tricky since moving the hand simultaneously confirms the hypothesis that my hand is where I believe it is). It is probably also the case that where there is a plethora of possible high precision prediction error to explore, the system chooses to explore prediction error that confirms one hypothesis and at the same time disconfirms another relevant competitor (e.g., eye movement that helps determine whether the face I look at is male or female, rather than determines whether the face I look at is male or Klingon). There is however probably more to this than I can give here, as it also taps into topics about Bayesian model selection in the brain, which may go beyond vanilla PEM.
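    To make the explaining-away point concrete, here is a worked toy version of Pearl's example (the probabilities are mine, chosen only for illustration): conditioning on my wet lawn makes rain and sprinkler compete, and the further evidence of the neighbour's wet lawn lowers the posterior on the sprinkler.

```python
from itertools import product

P_rain, P_sprinkler = 0.2, 0.3           # independent prior causes

def p_my_wet(rain, sprinkler):
    return 0.95 if (rain or sprinkler) else 0.05

def p_neighbour_wet(rain):
    return 0.9 if rain else 0.05          # only rain wets the neighbour's lawn

def posterior_sprinkler(observe_neighbour):
    num = den = 0.0
    for rain, sprinkler in product([True, False], repeat=2):
        p = (P_rain if rain else 1 - P_rain) * (P_sprinkler if sprinkler else 1 - P_sprinkler)
        p *= p_my_wet(rain, sprinkler)                 # my lawn observed wet
        if observe_neighbour:
            p *= p_neighbour_wet(rain)                 # neighbour's lawn observed wet
        den += p
        if sprinkler:
            num += p
    return num / den

print(posterior_sprinkler(False))  # ~0.64: P(sprinkler | my lawn wet)
print(posterior_sprinkler(True))   # ~0.34: rain now explains away the sprinkler
```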


  2. Answer to Q3. Perhaps this says more about my own muffin expectations than anything else, but I suspect that there is a great deal more regularity in our (interoceptive and exteroceptive) environment than we perhaps realize; I also suspect that we build up lots and lots of associations based on these regularities. In other words, I don’t think something like a muffin desire can only be triggered in a very direct way by the presence of clearly muffin-related stimuli. I think it is quite likely that some interoceptive prediction error is associated on the long time scale with getting sugary stuff, e.g., if it happens at a certain time of day. Then it is a matter of translating this down the hierarchy to issue in a reasonably precise expected sensory flow that can return me to the expected state. It seems likely that those of us who live in affluent societies have a certain set of sugary priors that can be associated with this flow, and that among these priors the muffin looms large. It is also possible that if no hypothesis sticks out specifically as good for getting sugar, then random fluctuations will jitter the state around and one day you end up with a muffin, another with a yellow banana. Exploration (in the expectation of encountering precise prediction error) is another source of learning about desires and their satisfaction.

    Answer to Q4. This is a great question to think about. I did not deal with it in Ch 4, though philosophers tend to bring it up. I think it is one of the places in the book where there is room to explore, and where probably we’d end up with rather revisionary proposals. Notice that in the final chapter (12) there is a discussion of how PEM explains emotions, under which some of these issues would fit. That chapter recaps some ideas from an earlier paper in Mind & Language, and I also agree with Anil Seth’s great work on interoceptive prediction error here. Sticking with the question of attitudes, I think, in fact, that there is much room for fine-grained mental attitudes. This is because we have two directions of fit and a whole hierarchy to play with. Perhaps we can make a start by saying that fears are simply expectations of dramatically increasing prediction error, perhaps coupled with few and weak priors for how to make that expected prediction error go away. Hope would be an expectation that priors will soon form, which allows expected prediction error to be explained away...

    Answer to Q5. I think associations fit in nicely, even weak, rather spurious ones. Statistical regularities are all about associations, and they are our way of beginning to make sense of the world. We build up large, crude probabilistic maps, and only for some of the mapped associations do we later amass evidence to be confident that they hold, or test them under active inference, to see if they are invariant. But in cases where we don’t have strong evidence to go by it seems quite likely that processing can be led by such weak associations. Even if we don’t want pepper right now, it is strongly predicted by the presence of salt, so we are surprised if it is not there. Perhaps here also belong cases like the unexpected absence of the noise from the train during the night, an absence of input which produces a prediction error that wakes me up, thinking about trains. Inference is about what hidden causes caused sensory input, and there can be lots of uncertainty in such inference, for which all sorts of associations can be cues (e.g., “I smell Guinness, that could be caused by me being in Oxford!”, a hypothesis that is easy to disconfirm).

  3. Answer to Q6. The standard account of our inability to tickle ourselves is that the sensory consequences of our own movement are predicted precisely (via efference copies of motor commands) and thereby attenuated – hence less ticklish. When others tickle us we cannot predict and so it feels more ticklish. But Harriet Brown and colleagues are now developing a different account: we don’t feel self-tickle as much because at the moment of movement there is a more general dampening down of sensory input, and this dampening down is mandated by PEM. The reason is that movement requires a false hypothesis (“I am moving”) winning over a true hypothesis (“I am not moving”). The problem is that even if the muffin in front of me makes it increasingly likely that I am moving, the current proprioceptive and visual etc. evidence I receive keeps accumulating in favour of the true hypothesis. To swing this balance, the current sensory input should be dampened down, thereby causing the true hypothesis to lose strength and the false hypothesis to win (and become the true hypothesis by having its prediction error enslave action, giving me the muffin already). This dampening down is what we experience as the self-tickle effect. The question is whether this is a trick the brain has learnt over and above PEM, or whether it falls out of PEM itself. I think it falls out of PEM: we know that staying in the same situation (with the same inference) for too long is likely to be counterproductive in the longer run, because the world changes around us making old hypotheses lose probability. So we should constantly be wary of the current sensory input, i.e., decreasing the gain on it – which gives us this effect. George van Doorn and I have tickled lots of people (well, George has with his strange machine) in an attempt to find evidence for this story, namely finding that even self-generated but unpredictable tickle-sensations are attenuated, which would count against the efference copy story and in favour of this new story. So far we have done one loosely relevant study on self-tickling in illusory body-swapping (in Consciousness & Cognition) and just today we have been writing up our next study, which is a closer exploration of this – supporting the PEM story naturally: even unpredictable tickle fails to feel ticklish. Hopefully more in print on this soon.
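    A very crude numerical rendering of the balance-swinging idea (my own toy, not Brown and colleagues' model; the function and numbers are hypothetical): turning down the gain on current sensory evidence lets the prior in favour of "I am moving" win the competition.

```python
def log_odds_moving(prior_log_odds, evidence_against_moving, sensory_gain):
    # Positive result: "I am moving" wins; negative: "I am not moving" wins.
    return prior_log_odds - sensory_gain * evidence_against_moving

print(log_odds_moving(prior_log_odds=1.0, evidence_against_moving=2.0, sensory_gain=1.0))  # -1.0: stay put
print(log_odds_moving(prior_log_odds=1.0, evidence_against_moving=2.0, sensory_gain=0.2))  #  0.6: movement wins
```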

    Answer to Q7. There are many people who work with Bayesian accounts of perception (even predictive coding accounts) who don’t need any notion of action to do their amazing work. So in that sense things are separable. But on the account of prediction error minimization that is in play here, active inference must be part of the story. Once we buy into the idea that prediction error minimization only makes sense on average and over the long run, then we must consider that it happens relative to a model, and a model is described in terms of the states we tend to occupy, which we could only do if we move around (to avoid unexpected states), which makes active inference central. Though Bayesian perspectives on perception can be useful heuristics even without active inference, we can only reap the explanatory dividends by buying into the whole story, i.e., the free energy principle. Relatedly, there are arguments out there suggesting that action cannot rely on explicit cost functions, for reasons of intractability, and that cost functions should therefore be absorbed into the prior landscape and dealt with in inferential terms. (Wanja Wiese has a nice review of the book in Minds & Machines, which further looks into the perception-action issue).

    I hope these answers are useful - I want to thank Zoe again for opening this great discussion. Even though I call these things 'answers' I think that there is still much that can be said about each of these questions. This is, in my view, what makes PEM such an interesting beast to play with.

  4. Thanks, all, for the great discussion so far, and sorry not to have jumped in earlier (though I’ve been following with interest)!

    I’ll start by saying that I began as a fan of PEM and that the discussion so far has reinforced this. Most of the questions have been well-taken calls for detail, not in-principle challenges to the framework itself, and I think Jakob’s responses have been satisfying.

    But I'm a bit bothered. One of the main alleged virtues of PEM is that it explains a whole lot using very little, and the complexity of the issues raised about error, predicted precision, switching between active and passive inference, etc, might seem to belie the simplicity of PEM-style explanation.

    So I want to suggest a few theses that would, if true, simplify the picture of how the PEM mechanism deals with these issues, by deflating some of the relevant distinctions. I take it that these points are compatible with Jakob’s explicit claims in the book, and congruent with what’s already been said in defense of PEM, though I’d be happy to learn how my thinking differs from others’ on this stuff:

    1) Higher-order prediction about precisions of signals is the same process as first-order prediction about sensory input, so we don’t need distinct higher-order hypotheses or a way of deciding when to rely on them.

    In particular, excluding the first layer in the hierarchy that predicts sensory registration events directly, each layer of neurons Ln functions both as a first-order prediction about activity in the layer Ln-1 below and as part of a higher-order prediction about the activity of Ln-2. Concretely, suppose L1 predicts L0’s activities, and L2 predicts L1’s. Then if L2 is a good model of L1, it will be able to make different predictions for each of L1’s states, and in particular, it will model the difference between more and less precise hypotheses about L0 constituted by varying activity in L1. But L2 and L1 taken together also constitute a model of L0, and a better one than L1 alone does (Hinton et al show that adding layers in this way, under certain conditions, always yields a better model of the domain generating the input: http://www.stats.ox.ac.uk/~teh/research/unsup/nc2006.pdf).
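    (A toy relaxation, my own and not from Hinton's paper, of the idea that the stacked layers jointly constitute one hypothesis about L0: each layer predicts the activity of the layer below, and settling the whole stack reduces the residual error at the input. Sizes, weights and step sizes are arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(0)
W0 = rng.normal(size=(8, 4)) / np.sqrt(8)   # maps L1 activity to a prediction of L0
W1 = rng.normal(size=(4, 2)) / np.sqrt(4)   # maps L2 activity to a prediction of L1

def settle(x0, steps=300, rate=0.1):
    x1, x2 = np.zeros(4), np.zeros(2)
    for _ in range(steps):
        e0 = x0 - W0 @ x1              # prediction error at the input layer
        e1 = x1 - W1 @ x2              # prediction error at the middle layer
        x1 += rate * (W0.T @ e0 - e1)  # driven up by e0, constrained by e1
        x2 += rate * (W1.T @ e1)       # the top layer only hears e1
    return x1, x2

x0 = rng.normal(size=8)
x1, x2 = settle(x0)
print(np.linalg.norm(x0), np.linalg.norm(x0 - W0 @ x1))  # residual error at L0 shrinks
```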

    2) No dedicated mechanism is needed to switch between passive and active inference. Top-down signals and bottom-up signals compete for “control” of the same representational space, and if top-down signals are sustained in the face of divergent input for whatever reason (low sugar levels in the muffin example, or emotional investment in a certain belief, say) they will tend to cause endogenous change in the system’s states (in the limit, action). If predictions about precision are just higher-order predictions about sensory input (see last point), we can see the choice of passive or active inference on a given occasion as falling out of the contents of the currently active predictions.

    3) Associations are the basic currency of PEM. The framework is at bottom inspired by advances in machine learning and in particular in computational modeling of neural nets, i.e. connectionist networks. Connectionist models, interpreted as psychological theories, are usually taken to be associationist. So it would be weird if these models had trouble accounting for associations.

    The alleged problem is that the connections between representations posited in PEM are inferential, and that there can be “brute associations” with no rational status. But I don’t think the latter has been shown. If the causal influence of one level in the hierarchy on others implements association (and I don’t see why it wouldn’t), then PEM suggests that all associations have some minimally rational (statistical, broadly Bayesian) etiology and function. This isn't common sense, but I don’t think the theoretical distinction between inference and association is sufficiently sharp to indict PEM on this matter, given that we lack examples of brute associations.

  5. This being my first post, I should say that I am a big fan of PEM.

    Alex, regarding point 3, I do not think it is entirely accurate to characterise PEM as having a connectionist pedigree or even being inspired by such accounts. There is a great deal of baggage that can come with a connectionist account that PEM would explicitly reject, e.g. a lack of true dynamics, difficulty pinning down aspects of representation, lack of intention and/or agency, Bayes not necessarily being built in from the ground up, and difficulties with appropriate/sensible learning rules (back prop etc.).

    One of the important and somewhat novel aspects of PEM is that it is grounded in physical law; at its heart, PEM (Free Energy) is based in Quantum Mechanics. Specifically, the Fokker-Planck (or Kolmogorov forward) equations that detail how states evolve over time are formally equivalent to both Schrödinger's equation and Feynman's Path Integral formulation.
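    (For readers who want the equation being referred to: the standard form of the Fokker-Planck equation for the density p(x,t) of states with flow f and diffusion Γ is given below. This is textbook material rather than anything specific to the book; the correspondence usually cited is that, for gradient flows, the same operator can be rewritten as a Schrödinger-type operator evolving in imaginary time.)

```latex
\frac{\partial p(x,t)}{\partial t}
  = -\nabla \cdot \bigl( f(x)\, p(x,t) \bigr)
  + \nabla \cdot \bigl( \Gamma \, \nabla p(x,t) \bigr)
```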

    Replies
    1. Hi Bryan - thanks, this is interesting. Can you direct me to some discussions of the relation to quantum mechanics?

      I'm not sure how strong a grounding claim you're making here - is it that the PEM dynamics are compatible with quantum mechanics, or that prediction-error-minimizing brains are instances of simple or generic quantum systems, or that some principle of quantum mechanics implies PEM? Schrodinger's equation is time-symmetric, so a free-energy-minimization dynamical process is going to need additional time-asymmetric boundary conditions to explain it also.

    2. Hi Alastair,

      Not a claim as strong as instances of quantum systems, only insofar as any macroscopic system is subject to the laws of thermodynamics. The weakest claim is that PEM dynamics (Free Energy minimisation) are compatible with quantum mechanics, in that there is a formal equivalence between the Fokker-Planck equations at the heart of PEM and the Path Integral and Schrodinger's equation. See here for more details:

      Friston K. Life as we know it. J R Soc Interface. 2013 Jul 3;10(86):20130475.
      http://www.fil.ion.ucl.ac.uk/~karl/Life%20as%20we%20know%20it.pdf

      Friston K. A Free Energy Principle for Biological Systems. Entropy 2012, 14, 2100-2121; doi:10.3390/e14112100.
      http://www.fil.ion.ucl.ac.uk/~karl/A%20Free%20Energy%20Principle%20for%20Biological%20Systems.pdf

      Friston K, Stephan K, Li B, and Daunizeau J. Generalised Filtering. Mathematical Problems in Engineering. vol. 2010, Article ID 621670, 34 pages, 2010.
      http://www.fil.ion.ucl.ac.uk/~karl/Generalised%20Filtering.pdf

      Balaji B, and Friston K. Bayesian state estimation using generalized coordinates. Proc. SPIE 8050, 80501Y (2011).
      http://www.fil.ion.ucl.ac.uk/~karl/Bayesian%20State%20Estimation%20Using%20Generalized%20Coordinates.pdf

  6. Hi BryanP,
    The newer artificial neural networks that overcome the limits of back-propagation, etc, use precisely PEM-type mechanisms. Free energy minimization within a hierarchical generative model is proposed as an approach to neural network learning by Geoff Hinton and colleagues in a 1995 paper:
    http://www.gatsby.ucl.ac.uk/~dayan/papers/hdfn95.pdf

    Other work of Hinton's discusses the connection between efficiency of representation in neural networks and minimization of free energy in more detail (I'm still working on understanding this latter paper myself): http://www.learning.cs.toronto.edu/~hinton/absps/cvq.pdf

    Given the pervasive citation of Hinton in PEM literature I've read, I thought that this type of work was driving much of the paradigm. I may be wrong about the history, but at a minimum contemporary connectionism and PEM are deeply compatible.

  7. Hi Alex,

    The second paper looks more like using the free energy of a generative model as a means of estimating the MDL for learning in an autoencoder?

    I agree that Hinton's work (along with that of Helmholtz and others) is crucial to PEM (Free Energy). But PEM departs from neural networks in a number of ways; most notably there is no network structure, no input or output layers, nor any hidden layers (at least explicitly). The two key features of PEM (Free Energy) that distinguish it are the distinction between external and internal states via a Markov Blanket and the gradient descent learning of state vectors (via Generalised Filtering).
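    (To illustrate the "gradient descent on state vectors" idea in one dimension — this is a generic predictive-coding toy of my own, not Generalised Filtering itself — the internal estimate mu simply descends the gradient of a precision-weighted squared-error bound:)

```python
def free_energy(mu, obs, prior_mean, pi_obs=1.0, pi_prior=1.0):
    # Precision-weighted squared errors of a one-level Gaussian model (toy bound).
    return 0.5 * (pi_obs * (obs - mu) ** 2 + pi_prior * (mu - prior_mean) ** 2)

def descend(obs, prior_mean, steps=100, rate=0.1, pi_obs=1.0, pi_prior=1.0):
    mu = prior_mean
    for _ in range(steps):
        grad = -pi_obs * (obs - mu) + pi_prior * (mu - prior_mean)  # dF/dmu
        mu -= rate * grad
    return mu

print(descend(obs=2.0, prior_mean=0.0))              # settles between prior and data (1.0)
print(descend(obs=2.0, prior_mean=0.0, pi_obs=4.0))  # closer to the data when the input is precise (1.6)
```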

    I think, given these advances, it does PEM a disservice to label it as computational modelling of neural nets; the two are compatible, but perhaps not deeply so.

    Before I go too much further I need to get down some thoughts of my own on Zoe's questions.

  8. General point. I think Alex is absolutely right in reminding us of the explanatory simplicity of PEM. This is why PEM is so attractive. As he says, nothing I’ve said detracts from this simple story, though the picture can come across as complicated. The complications arise when we are applying the basic idea to difficult issues like the distinction between action and perception. Even though the mechanism is the same everywhere, there are a few moving parts (predictions, precisions, action, complexity) which can all interact in different ways, in different parts of the hierarchy.

    Point 1. This sounds right. The idea is that each level does the same type of job, and that levels together predict the input level activity. So the overall hypothesis spans levels, and each level is hypothesizing only about its own input (levels are conditionally independent). I sometimes talk about difficult decisions about when to engage higher level hypotheses and when not. This relates to precisions, and therefore to assessment of changes in context; in other words whether a change in the signal should be regarded as mere noise, or as the result of interacting causes. This is hard to do on a case-by-case basis and may require very long term prediction error minimization. It is the problem of how informative hyperpriors should be. I think we should expect a fair bit of individual variability in how people update the precisions of their hyperpriors, for this reason.

    Point 2. I agree that there can be no mechanism that in itself chooses between perception and action. Instead the picture is very much like the one Alex sets out here. This is also what the attenuation and self-tickle story bears out: whether there is action is just a matter of which hypothesis wins, and which hypothesis begins to lose weight. We might have learned to exploit this optimally with endogenous attentional allocation, which shifts weights around (and this is of course itself a matter of learning a long term regularity in precisions).

    Point 3. I agree on this point: it is all about picking up regularities in the pattern of input. As to the broader issue of how much this has to do with connectionism, I think both Alex and Bryan are right. There is much in common with connectionist (back propagation etc.) ideas, and Hinton’s machine learning approach is in the connectionist mould. But Bryan is right that there is huge influence from statistical physics here (part of which comes via Hinton’s work). The really challenging but also interesting part of the free energy way of telling the story comes from these ideas and how they connect to accounts of self-organisation and so on (see the papers Bryan links to). I don’t think there is an answer as to what the uniquely best way of describing PEM is; it depends somewhat on explanatory purposes. I never found connectionism itself so interesting because it seemed to me to lack the epistemic (Bayesian) element (i.e., no self-supervised connectionist account ever seemed to succeed until PEM came around). On the other hand the (enactivist) literature on self-organisation also seemed to lack a proper epistemic element so I always struggled to understand it (spruiking myself some more, the enactivist relation to PEM is discussed in my recent Noûs paper https://www.dropbox.com/s/ds7p7e5lwxm2nb2/Hohwy%20The%20self-evidencing%20brain%20Nous%20Web.pdf).
