# Tutorial 2: Statistical Inference#

Week 0, Day 5: Probability & Statistics

Content creators: Ulrik Beierholm

Content reviewers: Natalie Schaworonkow, Keith van Antwerp, Anoop Kulkarni, Pooya Pakarian, Hyosub Kim

Production editors: Ethan Cheng, Ella Batty

#Tutorial Objectives

This tutorial builds on Tutorial 1 by explaining how to do inference through inverting the generative process.

By completing the exercises in this tutorial, you should:

• understand what the likelihood function is, and have some intuition of why it is important

• know how to summarise the Gaussian distribution using mean and variance

• know how to maximise a likelihood function

• be able to do simple inference in both classical and Bayesian ways

• (Optional) understand how Bayes Net can be used to model causal relationships

# Setup#

## Install and import feedback gadget#

Hide code cell source
# @title Install and import feedback gadget

!pip3 install vibecheck datatops --quiet

from vibecheck import DatatopsContentReviewContainer
def content_review(notebook_section: str):
return DatatopsContentReviewContainer(
"",  # No text prompt
notebook_section,
{
"url": "https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab",
"name": "neuromatch-precourse",
"user_key": "8zxfvwxw",
},
).render()

feedback_prefix = "W0D5_T2"

# Imports
import numpy as np
import matplotlib.pyplot as plt
import scipy as sp
from scipy.stats import norm
from numpy.random import default_rng  # a default random number generator


## Figure settings#

Hide code cell source
# @title Figure settings
import logging
logging.getLogger('matplotlib.font_manager').disabled = True
import ipywidgets as widgets  # interactive display
from ipywidgets import interact, fixed, HBox, Layout, VBox, interactive, Label, interact_manual
%config InlineBackend.figure_format = 'retina'


## Plotting functions#

Hide code cell source
# @title Plotting functions

def plot_hist(data, xlabel, figtitle = None, num_bins = None):
""" Plot the given data as a histogram.

Args:
data (ndarray): array with data to plot as histogram
xlabel (str): label of x-axis
figtitle (str): title of histogram plot (default is no title)
num_bins (int): number of bins for histogram (default is 10)

Returns:
count (ndarray): number of samples in each histogram bin
bins (ndarray): center of each histogram bin
"""
fig, ax = plt.subplots()
ax.set_xlabel(xlabel)
ax.set_ylabel('Count')
if num_bins is not None:
count, bins, _ = plt.hist(data, max(data), bins=num_bins)
else:
count, bins, _ = plt.hist(data, max(data))  # 10 bins default
if figtitle is not None:
fig.suptitle(figtitle, size=16)
plt.show()
return count, bins

def plot_gaussian_samples_true(samples, xspace, mu, sigma, xlabel, ylabel):
""" Plot a histogram of the data samples on the same plot as the gaussian
distribution specified by the give mu and sigma values.

Args:
samples (ndarray): data samples for gaussian distribution
xspace (ndarray): x values to sample from normal distribution
mu (scalar): mean parameter of normal distribution
sigma (scalar): variance parameter of normal distribution
xlabel (str): the label of the x-axis of the histogram
ylabel (str): the label of the y-axis of the histogram

Returns:
Nothing.
"""
fig, ax = plt.subplots()
ax.set_xlabel(xlabel)
ax.set_ylabel(ylabel)
# num_samples = samples.shape[0]

count, bins, _ = plt.hist(samples, density=True)  # probability density function

plt.plot(xspace, norm.pdf(xspace, mu, sigma), 'r-')
plt.show()

def plot_likelihoods(likelihoods, mean_vals, variance_vals):
""" Plot the likelihood values on a heatmap plot where the x and y axes match
the mean and variance parameter values the likelihoods were computed for.

Args:
likelihoods (ndarray): array of computed likelihood values
mean_vals (ndarray): array of mean parameter values for which the
likelihood was computed
variance_vals (ndarray): array of variance parameter values for which the
likelihood was computed

Returns:
Nothing.
"""
fig, ax = plt.subplots()
im = ax.imshow(likelihoods)

cbar = ax.figure.colorbar(im, ax=ax)
cbar.ax.set_ylabel('log likelihood', rotation=-90, va="bottom")

ax.set_xticks(np.arange(len(mean_vals)))
ax.set_yticks(np.arange(len(variance_vals)))
ax.set_xticklabels(mean_vals)
ax.set_yticklabels(variance_vals)
ax.set_xlabel('Mean')
ax.set_ylabel('Variance')
plt.show()

def posterior_plot(x, likelihood=None, prior=None,
posterior_pointwise=None, ax=None):
"""
Plots normalized Gaussian distributions and posterior.

Args:
x (numpy array of floats):         points at which the likelihood has been evaluated
auditory (numpy array of floats):  normalized probabilities for auditory likelihood evaluated at each x
visual (numpy array of floats):    normalized probabilities for visual likelihood evaluated at each x
posterior (numpy array of floats): normalized probabilities for the posterior evaluated at each x
ax: Axis in which to plot. If None, create new axis.

Returns:
Nothing.
"""
if likelihood is None:
likelihood = np.zeros_like(x)

if prior is None:
prior = np.zeros_like(x)

if posterior_pointwise is None:
posterior_pointwise = np.zeros_like(x)

if ax is None:
fig, ax = plt.subplots()

ax.plot(x, likelihood, '-C1', linewidth=2, label='Auditory')
ax.plot(x, prior, '-C0', linewidth=2, label='Visual')
ax.plot(x, posterior_pointwise, '-C2', linewidth=2, label='Posterior')
ax.legend()
ax.set_ylabel('Probability')
ax.set_xlabel('Orientation (Degrees)')
plt.show()

return ax

def plot_classical_vs_bayesian_normal(num_points, mu_classic, var_classic,
mu_bayes, var_bayes):
""" Helper function to plot optimal normal distribution parameters for varying
observed sample sizes using both classic and Bayesian inference methods.

Args:
num_points (int): max observed sample size to perform inference with
mu_classic (ndarray): estimated mean parameter for each observed sample size
using classic inference method
var_classic (ndarray): estimated variance parameter for each observed sample size
using classic inference method
mu_bayes (ndarray): estimated mean parameter for each observed sample size
using Bayesian inference method
var_bayes (ndarray): estimated variance parameter for each observed sample size
using Bayesian inference method

Returns:
Nothing.
"""
xspace = np.linspace(0, num_points, num_points)
fig, ax = plt.subplots()
ax.set_xlabel('n data points')
ax.set_ylabel('mu')
plt.plot(xspace, mu_classic,'r-', label="Classical")
plt.plot(xspace, mu_bayes,'b-', label="Bayes")
plt.legend()
plt.show()

fig, ax = plt.subplots()
ax.set_xlabel('n data points')
ax.set_ylabel('sigma^2')
plt.plot(xspace, var_classic,'r-', label="Classical")
plt.plot(xspace, var_bayes,'b-', label="Bayes")
plt.legend()
plt.show()


# Section 1: Basic probability#

## Section 1.1: Basic probability theory#

### Submit your feedback#

Hide code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Basic_Probability_Video")


This video covers basic probability theory, including complementary probability, conditional probability, joint probability, and marginalisation.

Click here for text recap of video

Previously we were only looking at sampling or properties of a single variables, but as we will now move on to statistical inference, it is useful to go over basic probability theory.

As a reminder, probability has to be in the range 0 to 1 $$P(A) \in [0,1]$$

and the complementary can always be defined as

$$P(\neg A) = 1-P(A)$$

When we have two variables, the conditional probability of $$A$$ given $$B$$ is

$$P (A|B) = P (A \cap B)/P (B)=P (A, B)/P (B)$$

while the joint probability of $$A$$ and $$B$$ is

$$P(A \cap B)=P(A,B) = P(B|A)P(A) = P(A|B)P(B)$$

We can then also define the process of marginalisation (for discrete variables) as

$$P(A)=\sum P(A,B)=\sum P(A|B)P(B)$$

where the summation is over the possible values of $$B$$.

As an example if $$B$$ is a binary variable that can take values $$B+$$ or $$B0$$ then $$P(A)=\sum P(A,B)=P(A|B+)P(B+)+ P(A|B0)P(B0)$$.

For continuous variables marginalization is given as $$P(A)=\int P(A,B) dB=\int P(A|B)P(B) dB$$

### Math Exercise 1.1: Probability example#

To remind ourselves of how to use basic probability theory we will do a short exercise (no coding needed!), based on measurement of binary probabilistic neural responses. As shown by Hubel and Wiesel in 1959 there are neurons in primary visual cortex that respond to different orientations of visual stimuli, with different neurons being sensitive to different orientations. The numbers in the following are however purely fictional.

Imagine that your collaborator tells you that they have recorded the activity of visual neurons while presenting either a horizontal or vertical grid as a visual stimulus. The activity of the neurons is measured as binary: they are either active or inactive in response to the stimulus.

After recording from a large number of neurons they find that when presenting a horizontal grid, on average 40% of neurons are active, while 30% respond to vertical grids.

We will use the following notation to indicate the probability that a randomly chosen neuron responds to horizontal grids

(105)#$$$P(h_+)=0.4$$$

and this to show the probability it responds to vertical:

(106)#$$$P(v_+)=0.3$$$

We can find the complementary event, that the neuron does not respond to the horizontal grid, using the fact that these events must add up to 1. We see that the probability the neuron does not respond to the horizontal grid ($$h_0$$) is

(107)#$$$P(h_0)=1-P(h_+)=0.6$$$

and that the probability to not respond to vertical is

(108)#$$$P(v_0)=1-P(v_+)=0.7$$$

We will practice computing various probabilities in this framework.

### A) Product#

Assuming that the horizontal and vertical orientation selectivity are independent, what is the probability that a randomly chosen neuron is sensitive to both horizontal and vertical orientations?

Hint: Two events are independent if the outcome of one does not affect the outcome of the other.

Click for solution

### B) Joint probability generally#

A collaborator informs you that actually these are not independent. Of those neurons that respond to vertical, only 10 percent also respond to horizontal, i.e. the probability of responding to horizonal given that it responds to vertical is $$P(h+|v+)=0.1$$

Given this new information, what is now the probability that a randomly chosen neuron is sensitive to both horizontal and vertical orientations?

Click for solution

### C) Conditional probability#

You start measuring from a neuron and find that it responds to horizontal orientations. What is now the probability that it also responds to vertical, i.e., $$P(v_+|h_+)$$)?

Click for solution

### D) Marginal probability#

Lastly, let’s check that everything has been done correctly. Given our knowledge about the conditional probabilities, we should be able to use marginalisation to recover the marginal probability of a random neuron responding to vertical orientations, i.e.,$$P(v_+)$$. We know from above that this should equal 0.3.

Calculate $$P(v_+)$$ based on the conditional probabilities for $$P(v_+|h_+)$$ and $$P(v_+|h_0)$$ (the latter which you will need to calculate).

Click for solution

#### Submit your feedback#

Hide code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Probability_Example_Main_Exercise")


## Section 1.2: Markov chains#

### Submit your feedback#

Hide code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Markov_Chains_Video")


### Coding exercise 1.2 Markov chains#

We will practice more probability theory by looking at Markov chains. The Markov property specifies that you can fully encapsulate the important properties of a system based on its current state at the current time, any previous history does not matter. It is memoryless.

As an example imagine that a rat is able to move freely between 3 areas: a dark rest area ($$state=1$$), a nesting area ($$state=2$$) and a bright area for collecting food ($$state=3$$). Every 5 minutes (timepoint $$i$$) we record the rat’s location. We can use a categorical distribution to look at the probability that the rat moves to one state from another.

The table below shows the probability of the rat transitioning from one area to another between timepoints ($$state_i$$ to $$state_{i+1}$$).

(109)#\[\begin{matrix} \hline state_{i} &P(state_{i+1}=1|state_i=*) &P(state_{i