\n", " \n", "

\n", "\n", "See [bonus section 1](#b1) for in-depth instructions for how to code up such a network in PyTorch. For now, however, we'll leave these details aside and focus on training this network, called `CNN`, and analyzing its internal representations.\n", "\n", "Run the next cell to train such a network to solve this task. After initializing our CNN model, it builds a dataset of oriented grating stimuli to use for training it. These are then passed into a function called `train()` that uses SGD to optimize the model's parameters, taking similar arguments as the `train()` function we wrote in Tutorial 1.\n", "\n", "Note that it may take ~30 seconds for the training to complete." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "help(train)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# Set random seeds for reproducibility\n", "np.random.seed(12)\n", "torch.manual_seed(12)\n", "\n", "# Initialize CNN model\n", "net = CNN(h, w)\n", "\n", "# Build training set to train it on\n", "n_train = 1000 # size of training set\n", "\n", "# sample n_train random orientations between -90 and +90 degrees\n", "ori = (np.random.rand(n_train) - 0.5) * 180\n", "\n", "# build orientated grating stimuli\n", "stimuli = torch.stack([grating(i) for i in ori])\n", "\n", "# stimulus tilt: 1. if tilted right, 0. if tilted left, as a column vector\n", "tilt = torch.tensor(ori > 0).type(torch.float).unsqueeze(-1)\n", "\n", "# Train model\n", "train(net, stimuli, tilt)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 1.3: Load data\n", "\n", "*Estimated timing to here from start of tutorial: 15 min*\n", "\n", "In the next cell, we provide code for loading in some data from [Stringer _et al._, 2021](https://doi.org/10.1016/j.cell.2021.03.042), which contains the responses of about ~20,000 neurons in the mouse primary visual cortex to grating stimuli like those used to train our network (this is the same data used in Tutorial 1). These data are stored in two variables:\n", "* `resp_v1` is a matrix where each row contains the responses of all neurons to a single stimulus.\n", "* `ori` is a vector with the orientations of each stimulus, in degrees. As in the above convention, negative angles denote stimuli tilted to the left and positive angles denote stimuli tilted to the right.\n", "\n", "We will then extract our deep CNN model's representations of these same stimuli (i.e. oriented gratings with the orientations in `ori`). We will run the same stimuli through our CNN model and use the helper function `get_hidden_activity()` to store the model's internal representations. The output of this function is a Python `dict`, which contains a matrix of population responses (just like `resp_v1`) for each layer of the network specified by the `layer_labels` argument. We'll focus on looking at the representations in\n", "* the output of the first convolutional layer, stored in the model as `'pool'` (see Bonus Section 1 for the details of the CNN architecture to understand why it's called this way)\n", "* the 10-dimensional output of the fully connected layer, stored in the model as `'fc'`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "both", "execution": {} }, "outputs": [], "source": [ "# Load mouse V1 data\n", "resp_v1, ori = load_data()\n", "\n", "# Extract model internal representations of each stimulus in the V1 data\n", "# construct grating stimuli for each orientation presented in the V1 data\n", "stimuli = torch.stack([grating(a.item()) for a in ori])\n", "layer_labels = ['pool', 'fc']\n", "resp_model = get_hidden_activity(net, stimuli, layer_labels)\n", "\n", "# Aggregate all responses into one dict\n", "resp_dict = {}\n", "resp_dict['V1 data'] = resp_v1\n", "for k, v in resp_model.items():\n", " label = f\"model\\n'{k}' layer\"\n", " resp_dict[label] = v" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 2: Quantitative comparisons of CNNs and neural activity\n", "\n", "*Estimated timing to here from start of tutorial: 20 min*\n", "\n", "Let's now analyze the internal representations of our deep CNN model of orientation discrimination and compare them to population responses in the mouse primary visual cortex. \n", "\n", "In this section, we'll try to quantitatively compare CNN and primary visual cortex representations. In the next section, we will visualize their representations and get some intuition for their structure.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 2: Quantitative comparisons of CNNs and neural activity\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 2: Quantitative comparisons of CNNs and neural activity\n", "from ipywidgets import widgets\n", "\n", "out2 = widgets.Output()\n", "with out2:\n", " from IPython.display import IFrame\n", " class BiliVideo(IFrame):\n", " def __init__(self, id, page=1, width=400, height=300, **kwargs):\n", " self.id=id\n", " src = 'https://player.bilibili.com/player.html?bvid={0}&page={1}'.format(id, page)\n", " super(BiliVideo, self).__init__(src, width, height, **kwargs)\n", "\n", " video = BiliVideo(id=\"BV1KT4y1j7nn\", width=730, height=410, fs=1)\n", " print('Video available at https://www.bilibili.com/video/{0}'.format(video.id))\n", " display(video)\n", "\n", "out1 = widgets.Output()\n", "with out1:\n", " from IPython.display import YouTubeVideo\n", " video = YouTubeVideo(id=\"2Jbk7jFBvbU\", width=730, height=410, fs=1, rel=0)\n", " print('Video available at https://youtube.com/watch?v=' + video.id)\n", " display(video)\n", "\n", "out = widgets.Tab([out1, out2])\n", "out.set_title(0, 'Youtube')\n", "out.set_title(1, 'Bilibili')\n", "\n", "display(out)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "We noticed above some similarities and differences between the population responses in the mouse primary visual cortex and in different layers in our model. Let's now try to quantify this.\n", "\n", "To do this, we'll use a technique called [**Representational Similarity Analysis**](https://www.frontiersin.org/articles/10.3389/neuro.06.004.2008/full?utm_source=FWEB&utm_medium=NBLOG&utm_campaign=ECO_10YA_top-research). The idea is to look at the similarity structure between representations of different stimuli. We can say that a brain area and a model use a similar representational scheme if stimuli that are represented (dis)similarly in the brain are represented (dis)similarly in the model as well.\n", "\n", " " ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 2.1: Representational dissimilarity matrix (RDM)\n", "\n", "To quantify this, we begin by computing the **representational dissimilarity matrix (RDM)** for the mouse V1 data and each model layer. This matrix, which we'll call $\\mathbf{M}$, is computed as one minus the correlation coefficients between population responses to each stimulus. We can efficiently compute this by using the $z$-scored responses. \n", "\n", "The $z$-scored response of all neurons $\\mathbf{r}$ to stimulus $s$ is the response mean-subtracted across neurons $i$ and normalized to standard deviation 1 across neurons $i$ where $N$ is the total number of neurons:\n", "\n", "\\begin{equation}\n", "\\mathbf{z}^{(s)} = \\frac{\\mathbf{r}^{(s)} - \\mu^{(s)}}\n", "{\\sigma^{(s)}}\n", "\\end{equation}\n", "\n", "where $\\mu^{(s)} = \\frac{1}{N}\\sum_{i=1}^N r_i^{(s)}$ and \n", "$\\sigma^{(s)} = \\sqrt{\\frac{1}{N}\\sum_{i=1}^N \\left( r_i^{(s)} - \\mu^{(s)} \\right)^2}$.\n", "\n", "Then the full matrix can be computed as:\n", "\n", "\\begin{equation}\n", "\\mathbf{M} = 1 - \\frac{1}{N} \\mathbf{ZZ}^\\top \\\\\n", "\\end{equation}\n", "\n", "where $\\mathbf{Z}$ is the z-scored response matrix with rows $\\mathbf{r}^{(s)}$ and N is the number of neurons (or units). See [bonus section 3](#b3) for full explanation." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Coding Exercise 2.1: Compute RDMs\n", "\n", "Complete the function `RDM()` for computing the RDM for a given set of population responses to each stimulus. Use the above formula in terms of $z$-scored population responses. We will use the helper function `zscore()` to compute the matrix of $z$-scored responses.\n", "\n", "The subsequent cell uses this function to plot the RDM of the population responses in the V1 data and in each layer of our model CNN.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "def RDM(resp):\n", " \"\"\"Compute the representational dissimilarity matrix (RDM)\n", "\n", " Args:\n", " resp (ndarray): S x N matrix with population responses to\n", " each stimulus in each row\n", "\n", " Returns:\n", " ndarray: S x S representational dissimilarity matrix\n", " \"\"\"\n", " #########################################################\n", " ## TO DO for students: compute representational dissimilarity matrix\n", " # Fill out function and remove\n", " raise NotImplementedError(\"Student exercise: complete function RDM\")\n", " #########################################################\n", "\n", " # z-score responses to each stimulus\n", " zresp = zscore(resp, axis=1)\n", "\n", " # Compute RDM\n", " RDM = ...\n", "\n", " return RDM\n", "\n", "\n", "# Compute RDMs for each layer\n", "rdm_dict = {label: RDM(resp) for label, resp in resp_dict.items()}\n", "\n", "# Plot RDMs\n", "plot_multiple_rdm(rdm_dict)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/main//tutorials/W1D5_DeepLearning/solutions/W1D5_Tutorial3_Solution_ed074d46.py)\n", "\n", "*Example output:*\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Video 3: Coding Exercise 2.1 solution discussion\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 3: Coding Exercise 2.1 solution discussion\n", "from ipywidgets import widgets\n", "\n", "out2 = widgets.Output()\n", "with out2:\n", " from IPython.display import IFrame\n", " class BiliVideo(IFrame):\n", " def __init__(self, id, page=1, width=400, height=300, **kwargs):\n", " self.id=id\n", " src = 'https://player.bilibili.com/player.html?bvid={0}&page={1}'.format(id, page)\n", " super(BiliVideo, self).__init__(src, width, height, **kwargs)\n", "\n", " video = BiliVideo(id=\"BV16a4y1a7nc\", width=730, height=410, fs=1)\n", " print('Video available at https://www.bilibili.com/video/{0}'.format(video.id))\n", " display(video)\n", "\n", "out1 = widgets.Output()\n", "with out1:\n", " from IPython.display import YouTubeVideo\n", " video = YouTubeVideo(id=\"otzR-KXDjus\", width=730, height=410, fs=1, rel=0)\n", " print('Video available at https://youtube.com/watch?v=' + video.id)\n", " display(video)\n", "\n", "out = widgets.Tab([out1, out2])\n", "out.set_title(0, 'Youtube')\n", "out.set_title(1, 'Bilibili')\n", "\n", "display(out)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 2.2: Determining representation similarity\n", "\n", "*Estimated timing to here from start of tutorial: 35 min*\n", "\n", "To quantify how similar the representations are, we can simply correlate their dissimilarity matrices. For this, we'll again use the correlation coefficient. Note that dissimilarity matrices are symmetric ($M_{ss'} = M_{s's}$), so we should only use the off-diagonal terms on one side of the diagonal when computing this correlation to avoid overcounting. Moreover, we should leave out the diagonal terms, which are always equal to 0, so will always be perfectly correlated across any pair of RDM's.\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Coding Exercise 2.2: Correlate RDMs\n", "\n", "Complete the function `correlate_rdms()` below that computes this correlation. The code for extracting the off-diagonal terms is provided.\n", "\n", "We will then use a function to compute the correlation between the RDM's for each layer of our model CNN and that of the V1 data. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "def correlate_rdms(rdm1, rdm2):\n", " \"\"\"Correlate off-diagonal elements of two RDM's\n", "\n", " Args:\n", " rdm1 (np.ndarray): S x S representational dissimilarity matrix\n", " rdm2 (np.ndarray): S x S representational dissimilarity matrix to\n", " correlate with rdm1\n", "\n", " Returns:\n", " float: correlation coefficient between the off-diagonal elements\n", " of rdm1 and rdm2\n", "\n", " \"\"\"\n", "\n", " # Extract off-diagonal elements of each RDM\n", " ioffdiag = np.triu_indices(rdm1.shape[0], k=1) # indices of off-diagonal elements\n", " rdm1_offdiag = rdm1[ioffdiag]\n", " rdm2_offdiag = rdm2[ioffdiag]\n", "\n", " #########################################################\n", " ## TO DO for students: compute correlation coefficient\n", " # Fill out function and remove\n", " raise NotImplementedError(\"Student exercise: complete correlate rdms\")\n", " #########################################################\n", " corr_coef = np.corrcoef(..., ...)[0,1]\n", "\n", " return corr_coef\n", "\n", "\n", "# Split RDMs into V1 responses and model responses\n", "rdm_model = rdm_dict.copy()\n", "rdm_v1 = rdm_model.pop('V1 data')\n", "\n", "# Correlate off-diagonal terms of dissimilarity matrices\n", "rdm_sim = {label: correlate_rdms(rdm_v1, rdm) for label, rdm in rdm_model.items()}\n", "\n", "# Visualize\n", "plot_rdm_rdm_correlations(rdm_sim)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/main//tutorials/W1D5_DeepLearning/solutions/W1D5_Tutorial3_Solution_ce2f98e6.py)\n", "\n", "*Example output:*\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "According to this metric, which layer's representations most resemble those in the data? Does this agree with your intuitions from coding exercise 2.1?" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 2.3: Further understanding RDMs\n", "\n", "*Estimated timing to here from start of tutorial: 45 min*\n", "\n", "To better understand how these correlations in RDM's arise, we can try plotting individual rows of the RDM matrix. The resulting curves show the similarity of the responses to each stimulus with that to one specific stimulus." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "ori_list = [-75, -25, 25, 75]\n", "plot_rdm_rows(ori_list, rdm_dict, ori)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 3: Qualitative comparisons of CNNs and neural activity\n", "\n", "To visualize the representations in the data and in each of these model layers, we'll use two classic techniques from systems neuroscience:\n", "\n", "1. **tuning curves**: plotting the response of single neurons (or units, in the case of the deep network) as a function of the stimulus orientation\n", "\n", "2. **dimensionality reduction**: plotting full population responses to each stimulus in two dimensions via dimensionality reduction. We'll use the non-linear dimensionality reduction technique t-SNE for this. We use dimensionality reduction because there are many units and it's difficult to visualize all of them at once. We use a non-linear dimensionality reduction technique because it can capture complex relationships between stimuli (see W1D5 for more details)." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 3.1: Tuning Curves\n", "\n", "*Estimated timing to here from start of tutorial: 50 min*" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "\n", "Below, we show some example tuning curves for different neurons and units in the CNN we trained above. How are the single neuron responses similar/different between the model and the data? Try running this cell multiple times to get an idea of shared properties in the tuning curves of the neurons within each population.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Execute this cell to visualize tuning curves\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "#@title\n", "#@markdown Execute this cell to visualize tuning curves\n", "\n", "fig, axs = plt.subplots(1, len(resp_dict), figsize=(len(resp_dict) * 6, 6))\n", "\n", "for i, (label, resp) in enumerate(resp_dict.items()):\n", "\n", " ax = axs[i]\n", " ax.set_title('%s responses' % label)\n", "\n", " # Pick three random neurons whose tuning curves to plot\n", " ineurons = np.random.choice(resp.shape[1], 3, replace=False)\n", "\n", " # Plot tuning curves of ineurons\n", " ax.plot(ori, resp[:, ineurons])\n", "\n", " ax.set_xticks(np.linspace(-90, 90, 5))\n", " ax.set_xlabel('stimulus orientation')\n", " ax.set_ylabel('neural response')\n", "\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 3.2: Dimensionality reduction of representations\n", "\n", "*Estimated timing to here from start of tutorial: 55 min*\n", "\n", "We can visualize a dimensionality-reduced version of the internal representations of the mouse primary visual cortex or CNN internal representations in order to potentially uncover informative structure. Here, we use PCA to reduce the dimensionality to 20 dimensions, and then use tSNE to further reduce dimensionality to 2 dimensions. We use the first step of PCA so that tSNE runs faster (this is standard practice in the field)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Execute this cell to visualize low-d representations\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Execute this cell to visualize low-d representations\n", "def plot_resp_lowd(resp_dict):\n", " \"\"\"Plot a low-dimensional representation of each dataset in resp_dict.\"\"\"\n", " n_col = len(resp_dict)\n", " fig, axs = plt.subplots(1, n_col, figsize=(4.5 * len(resp_dict), 4.5))\n", " for i, (label, resp) in enumerate(resp_dict.items()):\n", "\n", " ax = axs[i]\n", " ax.set_title('%s responses' % label)\n", "\n", " # First do PCA to reduce dimensionality to 20 dimensions so that tSNE is faster\n", " resp_lowd = PCA(n_components=min(20, resp.shape[1]), random_state=0).fit_transform(resp)\n", "\n", " # Then do tSNE to reduce dimensionality to 2 dimensions\n", " resp_lowd = TSNE(n_components=2, random_state=0).fit_transform(resp_lowd)\n", "\n", " # Plot dimensionality-reduced population responses 'resp_lowd'\n", " # on 2D axes, with each point colored by stimulus orientation\n", " x, y = resp_lowd[:, 0], resp_lowd[:, 1]\n", " pts = ax.scatter(x, y, c=ori, cmap='twilight', vmin=-90, vmax=90)\n", " fig.colorbar(pts, ax=ax, ticks=np.linspace(-90, 90, 5), label='Stimulus orientation')\n", "\n", " ax.set_xlabel('Dimension 1')\n", " ax.set_ylabel('Dimension 2')\n", " ax.set_xticks([])\n", " ax.set_yticks([])\n", "\n", "with plt.xkcd():\n", " plot_resp_lowd(resp_dict)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Think! 3.2: Visualizing reduced dimensionality representations\n", "\n", "Interpret the figure above. Why do these representations look the way they do? Here are a few specific questions to think about:\n", " * How are the population responses similar/different between the model and the data? Can you explain these population-level responses from the single neuron responses seen in the previous exercise, or vice-versa?\n", " * How do the representations in the different layers of the model differ, and how does this relate to the orientation discrimination task the model was optimized for?\n", " * Which layer of our deep network encoding model most closely resembles the V1 data?\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/main//tutorials/W1D5_DeepLearning/solutions/W1D5_Tutorial3_Solution_ff17ff9e.py)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Summary\n", "\n", "*Estimated timing of tutorial: 1 hour, 10 minutes*\n", "\n", "In this notebook, we learned \n", "* how to use deep learning to build a normative encoding model of the visual system\n", "* how to use RSA to evaluate how the model's representations match to those in the brain\n", "\n", "Our approach was to optimize a deep convolutional network to solve an orientation discrimination task. But note that many other approaches could have been taken.\n", "\n", "Firstly, there are many other \"normative\" ways to solve this orientation discrimination task. We could have used different neural network architectures, or even used a completely different algorithm that didn't involve a neural network at all, but instead used other kinds of image transformations (e.g. Fourier transforms). Neural network approaches, however, are special in that they explicitly use abstract distributed representations to compute, which feels a lot closer to the kinds of algorithms the brain uses. *Convolutional* neural networks in particular are well-suited for building normative models of the visual system.\n", "\n", "Secondly, our choice of visual task was mostly arbitrary. For example, we could have trained our network to directly estimate the orientation of the stimulus, rather than just discriminating between two classes of tilt. Or, we could have trained the network to perform a more naturalistic task, such as recognizing the rotation of an arbitrary image. Or we could try a task like object recognition. Is this something that mice compute in their visual cortex?\n", "\n", "Training on different tasks could lead to different representations of the oriented grating stimuli, which might match the observed V1 representations better or worse.\n", "\n", "**See Section 3 in the Bonus Tutorial to fit a convolutional neural network directly to neural activity to build an encoding model.**" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Bonus" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "\n", "## Bonus Section 1: Building CNN's with PyTorch\n", "\n", "Here we walk through building the different types of layers in a CNN using PyTorch, culminating in the CNN model used above." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Bonus Section 1.1: Fully connected layers\n", "\n", "In a fully connected layer, each unit computes a weighted sum over all the input units and applies a non-linear function to this weighted sum. You have used such layers many times already in parts 1 and 2. As you have already seen, these are implemented in PyTorch using the `nn.Linear` class.\n", "\n", " See the next cell for code for constructing a deep network with one fully connected layer that will classify an input image as being tilted left or right. Specifically, its output is the predicted probability of the input image being tilted right. To ensure that its output is a probability (i.e. a number between 0 and 1), we use a sigmoid activation function to squash the output into this range (implemented with `torch.sigmoid()`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "class FC(nn.Module):\n", " \"\"\"Deep network with one fully connected layer\n", "\n", " Args:\n", " h_in (int): height of input image, in pixels (i.e. number of rows)\n", " w_in (int): width of input image, in pixels (i.e. number of columns)\n", "\n", " Attributes:\n", " fc (nn.Linear): weights and biases of fully connected layer\n", " out (nn.Linear): weights and biases of output layer\n", "\n", " \"\"\"\n", "\n", " def __init__(self, h_in, w_in):\n", " super().__init__()\n", " self.dims = h_in * w_in # dimensions of flattened input\n", " self.fc = nn.Linear(self.dims, 10) # flattened input image --> 10D representation\n", " self.out = nn.Linear(10, 1) # 10D representation --> scalar\n", "\n", " def forward(self, x):\n", " \"\"\"Classify grating stimulus as tilted right or left\n", "\n", " Args:\n", " x (torch.Tensor): p x 48 x 64 tensor with pixel grayscale values for\n", " each of p stimulus images.\n", "\n", " Returns:\n", " torch.Tensor: p x 1 tensor with network outputs for each input provided\n", " in x. Each output should be interpreted as the probability of the\n", " corresponding stimulus being tilted right.\n", "\n", " \"\"\"\n", " x = x.view(-1, self.dims) # flatten each input image into a vector\n", " x = torch.relu(self.fc(x)) # output of fully connected layer\n", " x = torch.sigmoid(self.out(x)) # network output\n", " return x" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Bonus Section 1.2: Convolutional layers\n", "\n", "In a convolutional layer, each unit computes a weighted sum over a two-dimensional $K \\times K$ patch of inputs. As we saw in part 2, the units are arranged in **channels** (see figure below), whereby units in the same channel compute the same weighted sum over different parts of the input, using the weights of that channel's **convolutional filter (or kernel)**. The output of a convolutional layer is thus a three-dimensional tensor of shape $C^{out} \\times H \\times W$, where $C^{out}$ is the number of channels (i.e. the number of convolutional filters/kernels), and $H$ and $W$ are the height and width of the input.\n", "\n", "\n", " \n", "

\n", "\n", "Such layers can be implemented in Python using the PyTorch class `nn.Conv2d as we saw in Tutorial 2 (documentation [here](https://pytorch.org/docs/master/generated/torch.nn.Conv2d.html)).\n", " \n", "See the next cell for code incorporating a convolutional layer with 8 convolutional filters of size 5 $\\times$ 5 into our above fully connected network. Note that we have to flatten the multi-channel output in order to pass it on to the fully connected layer." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "class ConvFC(nn.Module):\n", " \"\"\"Deep network with one convolutional layer and one fully connected layer\n", "\n", " Args:\n", " h_in (int): height of input image, in pixels (i.e. number of rows)\n", " w_in (int): width of input image, in pixels (i.e. number of columns)\n", "\n", " Attributes:\n", " conv (nn.Conv2d): filter weights of convolutional layer\n", " dims (tuple of ints): dimensions of output from conv layer\n", " fc (nn.Linear): weights and biases of fully connected layer\n", " out (nn.Linear): weights and biases of output layer\n", "\n", " \"\"\"\n", "\n", " def __init__(self, h_in, w_in):\n", " super().__init__()\n", " C_in = 1 # input stimuli have only 1 input channel\n", " C_out = 6 # number of output channels (i.e. of convolutional kernels to convolve the input with)\n", " K = 7 # size of each convolutional kernel (should be odd number for the padding to work as expected)\n", " self.conv = nn.Conv2d(C_in, C_out, kernel_size=K, padding=K//2) # add padding to ensure that each channel has same dimensionality as input\n", " self.dims = (C_out, h_in, w_in) # dimensions of conv layer output\n", " self.fc = nn.Linear(np.prod(self.dims), 10) # flattened conv output --> 10D representation\n", " self.out = nn.Linear(10, 1) # 10D representation --> scalar\n", "\n", " def forward(self, x):\n", " \"\"\"Classify grating stimulus as tilted right or left\n", "\n", " Args:\n", " x (torch.Tensor): p x 48 x 64 tensor with pixel grayscale values for\n", " each of p stimulus images.\n", "\n", " Returns:\n", " torch.Tensor: p x 1 tensor with network outputs for each input provided\n", " in x. Each output should be interpreted as the probability of the\n", " corresponding stimulus being tilted right.\n", "\n", " \"\"\"\n", " x = x.unsqueeze(1) # p x 1 x 48 x 64, add a singleton dimension for the single stimulus channel\n", " x = torch.relu(self.conv(x)) # output of convolutional layer\n", " x = x.view(-1, np.prod(self.dims)) # flatten convolutional layer outputs into a vector\n", " x = torch.relu(self.fc(x)) # output of fully connected layer\n", " x = torch.sigmoid(self.out(x)) # network output\n", " return x" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Bonus Section 1.3: Max pooling layers\n", "\n", "In a max pooling layer, each unit computes the maximum over a small two-dimensional $K^{pool} \\times K^{pool}$ patch of inputs. Given a multi-channel input of dimensions $C \\times H \\times W$, the output of a max pooling layer has dimensions $C \\times H^{out} \\times W^{out}$, where:\n", "\n", "\\begin{align}\n", "H^{out} &= \\left\\lfloor \\frac{H}{K^{pool}} \\right\\rfloor\\\\\n", "W^{out} &= \\left\\lfloor \\frac{W}{K^{pool}} \\right\\rfloor\n", "\\end{align}\n", "\n", "where $\\lfloor\\cdot\\rfloor$ denotes rounding down to the nearest integer below (i.e., floor division `//` in Python).\n", "\n", "Max pooling layers can be implemented with the PyTorch `nn.MaxPool2d` class, which takes as a single argument the size $K^{pool}$ of the pooling patch. See the next cell for an example, which builds upon the previous example by adding in a max pooling layer just after the convolutional layer. Note again that we need to calculate the dimensions of its output in order to set the dimensions of the subsequent fully connected layer." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "class ConvPoolFC(nn.Module):\n", " \"\"\"Deep network with one convolutional layer followed by a max pooling layer\n", " and one fully connected layer\n", "\n", " Args:\n", " h_in (int): height of input image, in pixels (i.e. number of rows)\n", " w_in (int): width of input image, in pixels (i.e. number of columns)\n", "\n", " Attributes:\n", " conv (nn.Conv2d): filter weights of convolutional layer\n", " pool (nn.MaxPool2d): max pooling layer\n", " dims (tuple of ints): dimensions of output from pool layer\n", " fc (nn.Linear): weights and biases of fully connected layer\n", " out (nn.Linear): weights and biases of output layer\n", "\n", " \"\"\"\n", "\n", " def __init__(self, h_in, w_in):\n", " super().__init__()\n", " C_in = 1 # input stimuli have only 1 input channel\n", " C_out = 6 # number of output channels (i.e. of convolutional kernels to convolve the input with)\n", " K = 7 # size of each convolutional kernel\n", " Kpool = 8 # size of patches over which to pool\n", " self.conv = nn.Conv2d(C_in, C_out, kernel_size=K, padding=K//2) # add padding to ensure that each channel has same dimensionality as input\n", " self.pool = nn.MaxPool2d(Kpool)\n", " self.dims = (C_out, h_in // Kpool, w_in // Kpool) # dimensions of pool layer output\n", " self.fc = nn.Linear(np.prod(self.dims), 10) # flattened pool output --> 10D representation\n", " self.out = nn.Linear(10, 1) # 10D representation --> scalar\n", "\n", " def forward(self, x):\n", " \"\"\"Classify grating stimulus as tilted right or left\n", "\n", " Args:\n", " x (torch.Tensor): p x 48 x 64 tensor with pixel grayscale values for\n", " each of p stimulus images.\n", "\n", " Returns:\n", " torch.Tensor: p x 1 tensor with network outputs for each input provided\n", " in x. Each output should be interpreted as the probability of the\n", " corresponding stimulus being tilted right.\n", "\n", " \"\"\"\n", " x = x.unsqueeze(1) # p x 1 x 48 x 64, add a singleton dimension for the single stimulus channel\n", " x = torch.relu(self.conv(x)) # output of convolutional layer\n", " x = self.pool(x) # output of pooling layer\n", " x = x.view(-1, np.prod(self.dims)) # flatten pooling layer outputs into a vector\n", " x = torch.relu(self.fc(x)) # output of fully connected layer\n", " x = torch.sigmoid(self.out(x)) # network output\n", " return x" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "This pooling layer completes the CNN model trained above to perform orientation discrimination. We can think of this architecture as having two primary layers:\n", "1. a convolutional + pooling layer\n", "2. a fully connected layer\n", "\n", "We group together the convolution and pooling layers into one, as they really form one full unit of convolutional processing, where each patch of the image is passed through a convolutional filter and pooled with neighboring patches. It is standard practice to follow up any convolutional layer with a pooling layer, so they are generally treated as a single block of processing." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "\n", "## Bonus Section 2: Orientation discrimination as a binary classification problem\n", "\n", "What loss function should we minimize to optimize orientation discrimination performance? We first note that the orientation discrimination task is a **binary classification problem**, where the goal is to classify a given stimulus into one of two classes: being tilted left or being tilted right. \n", "\n", "Our goal is thus to output a high probability of the stimulus being tilted right (i.e. large $p$) whenever the stimulus is tilted right, and a high probability of the stimulus being tilted left (i.e. large $1-p \\Leftrightarrow$ small $p$) whenever the stimulus is tilted left.\n", "\n", "Let $\\tilde{y}^{(n)}$ be the label of the $n$th stimulus in the mini-batch, indicating its true tilt:\n", "\n", "\\begin{equation}\n", "\\tilde{y}^{(n)} =\n", "\\begin{cases}\n", "1 &\\text{if stimulus }n\\text{ is tilted right} \\\\\n", "0 &\\text{if stimulus }n\\text{ is tilted left}\n", "\\end{cases}\n", "\\end{equation}\n", "\n", "Let $p^{(n)}$ be the predicted probability of that stimulus being tilted right assigned by our network. Note that that $1-p^{(n)}$ is the predicted probability of that stimulus being tilted left. We'd now like to modify the parameters so as to maximize the predicted probability of the true class $\\tilde{y}^{(n)}$. One way to formalize this is as maximizing the *log* probability\n", "\n", "\\begin{align}\n", "\\log \\left( \\text{predicted probability of stimulus } n \\text{ being of class } \\tilde{y}^{(n)}\\right) &= \n", "\\begin{cases}\n", "\\log p^{(n)} &\\text{if }\\tilde{y}^{(n)} = 1 \\\\\n", "\\log (1 - p^{(n)}) &\\text{if }\\tilde{y}^{(n)} = 0\n", "\\end{cases}\n", "\\\\\n", "&= \\tilde{y}^{(n)} \\log p^{(n)} + (1 - \\tilde{y}^{(n)})\\log(1 - p^{(n)})\n", "\\end{align}\n", "\n", "You should recognize this expression as the log likelihood of the Bernoulli distribution under the predicted probability $p^{(n)}$. This is the same quantity that is maximized in logistic regression, where the predicted probability $p^{(n)}$ is just a simple linear sum of its inputs (rather than a complicated non-linear operation, like in the deep networks used here).\n", "\n", "To turn this into a loss function, we simply multiply it by -1, resulting in the so-called **binary cross-entropy**, or **negative log likelihood**. Summing over $P$ samples in a batch, the binary cross entropy loss is given by\n", "\n", "\\begin{equation}\n", "L = -\\sum_{n=1}^P \\tilde{y}^{(n)} \\log p^{(n)} + (1 - \\tilde{y}^{(n)})\\log(1 - p^{(n)})\n", "\\end{equation}\n", "\n", "The binary cross-entropy loss can be implemented in PyTorch using the `nn.BCELoss()` loss function (cf. [documentation](https://pytorch.org/docs/master/generated/torch.nn.BCELoss.html)). \n", "\n", "Feel free to check out the code used to optimize the CNN in the `train()` function defined in the hidden cell of helper functions at the top of the notebook. Because the CNN's used here have lots of parameters, we have to use two tricks that we didn't use in the previous parts of this tutorial:\n", "1. We have to use *stochastic* gradient descent (SGD), rather than just gradient descent (GD).\n", "2. We have to use [momentum](https://distill.pub/2017/momentum/) in our SGD updates. This is easily incorporated into our PyTorch implementation by just setting the `momentum` argument of the built-in `optim.SGD` optimizer." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "\n", "## Bonus Section 3: RDM Z-Score Explanation\n", "\n", "If $r^{(s)}_i$ is the response of the $i$th neuron to the $s$th stimulus, then\n", "\n", "\\begin{gather}\n", "M_{ss'} = 1 - \\frac{\\text{Cov}\\left[ r_i^{(s)}, r_i^{(s')} \\right]}{\\sqrt{\\text{Var}\\left[ r_i^{(s)} \\right] \\text{Var}\\left[ r_i^{(s')} \\right]}} = 1 - \\frac{\\sum_{i=1}^N (r_i^{(s)} - \\bar{r}^{(s)})(r_i^{(s')} - \\bar{r}^{(s')}) }{\\sqrt{\\sum_{i=1}^N \\left( r_i^{(s)} - \\bar{r}^{(s)} \\right)^2 \\sum_{i=1}^N \\left( r_i^{(s')} - \\bar{r}^{(s')} \\right)^2 }} \\\\\n", "\\bar{r}^{(s)} = \\frac{1}{N} \\sum_{i=1}^N r_i^{(s)}\n", "\\end{gather}\n", "\n", "This can be computed efficiently by using the $z$-scored responses\n", "\\begin{equation}\n", "z_i^{(s)} = \\frac{r_i^{(s)} - \\bar{r}^{(s)}}{\\sqrt{\\frac{1}{N}\\sum_{i=1}^N \\left( r_i^{(s)} - \\bar{r}^{(s)} \\right)^2}} \\Rightarrow M_{ss'} = 1 - \\frac{1}{N}\\sum_{i=1}^N z_i^{(s)}z_i^{(s')}\n", "\\end{equation}\n", "such that the full matrix can be computed through the matrix multiplication\n", "\\begin{gather}\n", "\\mathbf{M} = 1 - \\frac{1}{N} \\mathbf{ZZ}^\\top \\\\\n", "\\mathbf{Z} = \n", "\\begin{bmatrix}\n", "z_1^{(1)} & z_2^{(1)} & \\ldots & z_N^{(1)} \\\\\n", "z_1^{(2)} & z_2^{(2)} & \\ldots & z_N^{(2)} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "z_1^{(S)} & z_2^{(S)} & \\ldots & z_N^{(S)}\n", "\\end{bmatrix}\n", "\\end{gather}\n", "\n", "\n", "where $S$ is the total number of stimuli. Note that $\\mathbf{Z}$ is an $S \\times N$ matrix, and $\\mathbf{M}$ is an $S \\times S$ matrix." ] } ], "metadata": { "colab": { "collapsed_sections": [], "include_colab_link": true, "name": "W1D5_Tutorial3", "provenance": [], "toc_visible": true }, "kernel": { "display_name": "Python 3", "language": "python", "name": "python3" }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.13" } }, "nbformat": 4, "nbformat_minor": 0 }