First question: What are proper grounds for low confidence in a signal?
The signal can buck the trend in different ways. Importantly, a signal can change such that even if it keeps the same mean, it loses precision (e.g., a sound becomes blurry). This bucks a precision trend, and it is sufficient for decreasing trust in the signal. This is a reasonable response (and is seen in the ventriloquist effect). We build up expectations that deal with such state-dependent changes in uncertainty. Sometimes we will appeal to higher-level, slower regularities that interact with the low-level expectations in question (my headphones are running out of battery, that's why the sound has become blurry). This will often help in a particular case, but might not (I might instead learn I have developed otosclerosis, or I might need to adjust to new levels of irreducible noise). But if I do rely on the battery hypothesis, then I downgrade my trust in the current signal. A similar story can be told for a change in mean, as when the location of the sound-source seems to change. Here I might appeal to an expectation about interfering causes that push the sound-source around. But if I assume the sound shifts due to some kind of echo chamber, then I'd be justified in trusting the signal less. The underlying story here is that, in the long run, the brain will end up apportioning noise to the right sources, and will thus be able to make the right decision about any given signal.
Noise comes in different guises. There is irreducible noise, which the brain can’t do anything about (risk). There is expected uncertainty, variability in the signal that we know about and expect. And there is unexpected uncertainty, variability that we haven’t predicted (due to state-dependent levels of noise). Much of the work concerns unexpected uncertainty. The assumption is that often such uncertainty arises because there are interacting factors, which can be inferred.
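To make the precision-weighting idea concrete, here is a minimal numerical sketch (my own illustration, not from the text), assuming two Gaussian estimates of a source's location, as in the ventriloquist effect:

```python
def combine(mu_a, var_a, mu_b, var_b):
    # Bayes-optimal combination of two Gaussian cues: each cue is
    # weighted by its precision (inverse variance), so the less
    # noisy cue dominates the combined estimate.
    prec_a, prec_b = 1 / var_a, 1 / var_b
    mu = (prec_a * mu_a + prec_b * mu_b) / (prec_a + prec_b)
    var = 1 / (prec_a + prec_b)
    return mu, var

# Precise visual cue at location 0, blurry auditory cue at 10
# (arbitrary units):
mu, var = combine(0.0, 1.0, 10.0, 9.0)
# The combined location lands near the visual cue (mu ≈ 1.0), which
# is why the voice seems to come from the dummy's mouth.
```

If the auditory signal then becomes noisier still (the battery case), its variance grows, its weight shrinks, and the estimate shifts even further towards vision.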
All this is taken care of by the brain in unconscious perceptual inference, mechanistically by plasticity at various time scales. The somewhat elaborate illustration with the dam-plugging was meant to capture this mechanistic aspect (and how it ensures representation). If the brain is able to anticipate the ‘inflow’ of sensory input both at fast and slower time-scales, then it’ll minimize its prediction error (on average and in the long run), and if it does that then it is guaranteed to approximate Bayesian inference. Of course, there is much buried here in terms of how something like the brain can implement Bayes.
Second question: When there’s a discrepancy between the expectation and the signal, is that a prediction error?
There are a few different things to respond to in this question. I think it is best not to have anything but prediction error as ascending signals in the brain (so there is no 'discrepancy' that is different from prediction error). Prediction error can pertain both to the mean and to the precision. So a new mean (from yellow to grey, say) can be a prediction error, but so can a new precision (from clearly yellow to a blurry yellow, say). If I expect to go into an environment with lots of blurriness (e.g., dusk), then the brain somewhat inhibits the prediction error I get there, which lets colour perception be more driven by priors. Of course, if my predictions are truly perfect (in whatever environment), then there is no prediction error (then sensory input is completely attenuated by the model). But this situation never arises, due to irreducible and state-dependent noise. Crucially, it makes sense to speak of expected prediction error (this is going to be central for understanding action, in ch4). Organisms strive to avoid prediction error, but they also strive to avoid bad, imprecise prediction error in so far as they cannot entirely avoid prediction error (or rather, as they integrate out uncertainty about what kind of prediction error they expect, as per the free energy principle).
In this way there is only one functional role for second-order statistics in the brain: controlling the gain on prediction error. Sometimes the gain is high (when the signal is trusted), and sometimes it is low (when the signal is not trusted). This gain control relates to expected precisions. It is all part of Bayes, but remember that Bayes here is empirical, hierarchical Bayes, which automatically has room for higher-level hyperparameters (expectations for what the next mean should be) and higher-level hyperpriors (expectations for what the next precision should be). In this sense, first- and second-order statistics are already there in the Bayesian framework. For the case of the grey banana, we have to ask whether the 'grey' signal is precise or not. If precise, then it is likely to drive perceptual inference, so we see the banana as grey (since a precise, narrow signal around the 'grey' mean makes it very unlikely that there is any evidence for a 'yellow' mean). If the signal is instead imprecise, then it will be inhibited (in proportion to its imprecision) and will exert less influence on the perceptual inference; this makes sense, since even an imprecise signal has a mean ('grey'), and this mean should not be allowed to determine perception, because it is embedded in a poor signal. At the same time, since it is an imprecise signal, it will (as we assume here) carry some evidence for a 'yellow' mean. Combined with a strong prior for yellow bananas, we might see it as yellow-ish (as indeed recent studies show).
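As a toy illustration of this trade-off (my own sketch, not the book's model), place colours on a hypothetical one-dimensional 'hue' axis with 0 = grey and 1 = yellow, and combine prior and signal by their precisions; the precision values here are invented purely for illustration:

```python
def posterior_mean(prior_mu, prior_prec, signal_mu, signal_prec):
    # Conjugate Gaussian update: the posterior mean is the
    # precision-weighted average of prior and signal means.
    return ((prior_prec * prior_mu + signal_prec * signal_mu)
            / (prior_prec + signal_prec))

# Strong prior for yellow bananas (mean 1.0), 'grey' signal (mean 0.0).
seen_precise   = posterior_mean(1.0, 4.0, 0.0, 36.0)  # trusted signal
seen_imprecise = posterior_mean(1.0, 4.0, 0.0, 1.0)   # poor signal

# A precise grey signal drives inference (posterior 0.1, near grey);
# an imprecise one is largely overruled by the prior (posterior 0.8,
# i.e., the yellow-ish banana).
```

Lowering the signal's precision is exactly the gain reduction described above: the same 'grey' mean arrives, but it moves the posterior far less.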
Third question: Does the signal come in the form of a probability distribution?
I am not entirely sure I get the full intention of this question. I think it is safe to say that the signal, or the prediction error, comes as a probability distribution: it has a mean and a variance. We might think of it as a set of samples (a certain firing pattern of a population of neurons when confronted with the banana; they each ask "yellow?", and get a set of more or less consistent answers back). I think there is computational neuroscience out there that treats such firing patterns as encoding a probability distribution, independently of this particular Bayesian story. It makes good sense, I think: on a given occasion, the prediction error is a set of samples, which cluster imperfectly around the 'yellow' mean (e.g., a few samples will be towards the 'red' mean).
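A quick way to picture this (the numbers are purely illustrative, not from any dataset): treat the population's responses as samples on a hypothetical hue axis (0 = grey, 1 = yellow) and read off the implied mean and precision:

```python
import statistics

# Hypothetical responses of a neural population asked "yellow?":
# most samples cluster near 'yellow' (1.0), one stray sample sits
# away from the cluster.
samples = [0.94, 1.02, 0.88, 1.10, 0.97, 0.35, 0.99, 1.05]

mu = statistics.mean(samples)                 # the signal's mean (~0.91)
precision = 1 / statistics.variance(samples)  # inverse variance (~18)
```

On this picture a 'blurrier' signal is just a wider scatter of samples, and hence a lower precision to report upwards.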
It is worth noting that with a hierarchical structure to rely on, these sensory samples can often afford to be very sparse and yet inference will be reliable; I just need a few high-precision cues that a given pattern is instantiated and then I'll know. Given all my prior knowledge, the sensory input need not embody all the information right here and now. Illusions probably exploit this propensity to 'jump' to conclusions, creating false inference (e.g., the pac-man corners in the Kanizsa triangle).
Fourth question: What is the impact of second-order perceptual inference on perceptual processing, and when and how does it differ functionally from first-order perceptual inference?
The banana-dialogues very nicely set out the statistical decisions one can engage in when confronted with different types of mean/precision combinations combined with different types of prior expectations for means and precisions. Some of my answers to the previous questions begin to address these. Option 1: in general if there is great precision in a signal then it’ll tend to drive inference, even against long-held priors. Much of the book goes on about this, e.g. for our experiments on the rubber hand illusion, where priors about the body are pushed aside. But the brain is shifty, it might try some subtle compromise (seeing the grey banana as slightly yellow-ish). Option 2: I discuss this above too: this seems a likely scenario for a weak signal. Option 3: here we can appeal to the idea that precision optimization is just attention (an idea which comes up in ch9 especially), and that attention comes in endogenous and exogenous forms. So I might have an expectation about a signal’s precision that sets the gain at a low ‘2’ (=I endogenously attend away from it) but later its intrinsic signal strength increases (someone changes the batteries) and with it its precision improves. This means that even if the gain is set at 2 for the old, weak signal, more of this new, strong signal will filter through (=I exogenously attend to it).
There is, as far as I know, no certain answer to what the brain does in these kinds of cases. Much depends on prior learning, and – we suspect – on individual differences. Given a new and unexpected level of imprecision the brain needs to decide whether to recruit higher levels or whether to trust the signal more, or less. Only quite subtle high level and long term (meta-)learning can really give the answers here (e.g., over the long term it may be best in terms of prediction error minimization to explain away uncertainty hierarchically). We have a fair bit of experimental evidence that these kinds of differences play out along the autism spectrum. Well-functioning individuals can differ somewhat in the extent to which they engage the hierarchy when there is unexpected uncertainty, with individuals low on autism traits more likely to do so than individuals high on autism traits (perhaps the former are more likely to be found in diverse, changing environments). But clinical individuals seem to be even more hierarchically restricted, which may be counterproductive. I mention this to highlight that we are doing experiments that speak to all this, and that we should not expect univocal answers to how well individuals approximate Bayes in the long run. (I discuss these issues much throughout later parts of the book).
Fifth question: Where can we locate second-order inferences in Bayes’ theorem?
I address some of this above too. Noise and uncertainty are both integral parts of Bayes, and especially of empirical, hierarchical Bayes. There is no separate noise-'monitor'; there is just learning of means and precisions all the way up the hierarchy. Changes in precisions are encoded and learned in hierarchical Bayes because Bayes without them won't work in a world with state-dependent noise (its learning rule would be inappropriate). If we compress everything into Bayes' rule, I think we get that a low-precision prediction error will give us a low likelihood. I don't think the posterior (p(h|e)) comes into it, except of course in the shape of the 'old' posterior, which is today's prior.
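To see how a low-precision prediction error shows up as a flat likelihood in Bayes' rule, here is a two-hypothesis sketch (the probabilities are my own, chosen only for illustration):

```python
def bayes(prior, likelihood):
    # p(h|e) ∝ p(e|h) * p(h), normalised over the hypotheses.
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

prior = {"yellow": 0.9, "grey": 0.1}  # strong prior for yellow bananas

# A precise 'grey' signal yields a sharply peaked likelihood,
# enough to overturn the prior:
sharp = bayes(prior, {"yellow": 0.05, "grey": 0.95})  # grey wins (~0.68)

# An imprecise signal yields a nearly flat likelihood, so the
# posterior stays close to the prior:
flat = bayes(prior, {"yellow": 0.45, "grey": 0.55})   # yellow wins (~0.88)
```

So imprecision does not need a separate slot in the rule: it is already there as the spread of the likelihood.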
Sixth question: Do you think experience takes the form of a probability distribution, like any other signal does in PEM? If not then what makes it special?
There is probably more in this question than I am able to address here. (There are, for example, elements here of a nice question that Michael Madary asked of Clark, in Frontiers). One answer is that perception is often driven by precise prediction errors, which dampens down the presence of alternative possible experiences (action serves this kind of function, 'deepening' the selected hypothesis at the expense of others; hence without action there should be more of the sea of probability distributions, and there is some evidence of this in binocular rivalry when introspective reports are not required). When there is imprecision, perceptual inference begins to waver and becomes ambiguous (e.g., some phases of the rubber hand illusion). Another answer is that when one hypothesis wins, other hypotheses are 'explained away' (this is the classic example from Pearl of rain/sprinkler given wet grass): conditional on the new evidence E, Hi increases its probability and Hj decreases its probability, even though they were independent before the new evidence came in. This speaks to some kind of lateral reciprocal inhibition in the brain. A further answer is that the brain 'cleans up' its own hypotheses in the light of its assessment of noise (there are some comments on this at the end of ch5). None of these mechanisms relate to phenomenality; it's all just more Bayes. I do speculate about phenomenality later, namely that selection of a content into consciousness happens when there is enough evidence for Hi for active inference to kick in and sensory consequences of Hi to be generated, given action (ch10).
I probably have not answered all the questions fully or satisfactorily here. The questions are really nice to think about however, and allow me to highlight the crucial role of precisions in the PEM framework.