{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {},
    "id": "view-in-github"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/NeuromatchAcademy/course-content/blob/main/tutorials/W1D5_DeepLearning/student/W1D5_Tutorial4.ipynb\" target=\"_blank\"><img alt=\"Open In Colab\" src=\"https://colab.research.google.com/assets/colab-badge.svg\"/></a>   <a href=\"https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/tutorials/W1D5_DeepLearning/student/W1D5_Tutorial4.ipynb\" target=\"_blank\"><img alt=\"Open in Kaggle\" src=\"https://kaggle.com/static/images/open-in-kaggle.svg\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Bonus Tutorial: Diving Deeper into Decoding & Encoding\n",
    "\n",
    "**Week 1, Day 5: Deep Learning**\n",
    "\n",
    "**By Neuromatch Academy**\n",
    "\n",
    "**Content creators**: Jorge A. Menendez, Carsen Stringer\n",
    "\n",
    "**Content reviewers**: Roozbeh Farhoodi, Madineh Sarvestani, Kshitij Dwivedi, Spiros Chavlis, Ella Batty, Michael Waskom\n",
    "\n",
    "**Production editor:** Spiros Chavlis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Tutorial Objectives\n",
    "In this tutorial, we will dive deeper into our decoding model from Tutorial 1 and we will fit a convolutional neural network directly to neural activities.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Setup\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Install and import feedback gadget\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Install and import feedback gadget\n",
    "\n",
    "!pip3 install vibecheck datatops --quiet\n",
    "\n",
    "from vibecheck import DatatopsContentReviewContainer\n",
    "def content_review(notebook_section: str):\n",
    "    return DatatopsContentReviewContainer(\n",
    "        \"\",  # No text prompt\n",
    "        notebook_section,\n",
    "        {\n",
    "            \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n",
    "            \"name\": \"neuromatch_cn\",\n",
    "            \"user_key\": \"y1x3mpx5\",\n",
    "        },\n",
    "    ).render()\n",
    "\n",
    "\n",
    "feedback_prefix = \"W1D5_T4_Bonus\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "both",
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# Imports\n",
    "import os\n",
    "import numpy as np\n",
    "\n",
    "import torch\n",
    "from torch import nn\n",
    "from torch import optim\n",
    "\n",
    "import matplotlib as mpl\n",
    "from matplotlib import pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Figure Settings\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Figure Settings\n",
    "import logging\n",
    "logging.getLogger('matplotlib.font_manager').disabled = True\n",
    "\n",
    "%config InlineBackend.figure_format = 'retina'\n",
    "plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Plotting Functions\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Plotting Functions\n",
    "\n",
    "def plot_data_matrix(X, ax):\n",
    "  \"\"\"Visualize data matrix of neural responses using a heatmap\n",
    "\n",
    "  Args:\n",
    "    X (torch.Tensor or np.ndarray): matrix of neural responses to visualize\n",
    "        with a heatmap\n",
    "    ax (matplotlib axes): where to plot\n",
    "  \"\"\"\n",
    "  cax = ax.imshow(X, cmap=mpl.cm.pink, vmin=np.percentile(X, 1), vmax=np.percentile(X, 99))\n",
    "  cbar = plt.colorbar(cax, ax=ax, label='normalized neural response')\n",
    "\n",
    "  ax.set_aspect('auto')\n",
    "  ax.set_xticks([])\n",
    "  ax.set_yticks([])\n",
    "\n",
    "\n",
    "def plot_decoded_results(train_loss, test_loss, test_labels,\n",
    "                         predicted_test_labels, n_classes):\n",
    "  \"\"\" Plot decoding results in the form of network training loss and test predictions\n",
    "\n",
    "  Args:\n",
    "    train_loss (list): training error over iterations\n",
    "    test_labels (torch.Tensor): n_test x 1 tensor with orientations of the\n",
    "      stimuli corresponding to each row of train_data, in radians\n",
    "    predicted_test_labels (torch.Tensor): n_test x 1 tensor with predicted orientations of the\n",
    "      stimuli from decoding neural network\n",
    "    n_classes: number of classes\n",
    "\n",
    "  \"\"\"\n",
    "\n",
    "  # Plot results\n",
    "  fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))\n",
    "\n",
    "  # Plot the training loss over iterations of GD\n",
    "  ax1.plot(train_loss)\n",
    "  # Plot the testing loss over iterations of GD\n",
    "  ax1.plot(test_loss)\n",
    "  ax1.legend(['train loss', 'test loss'])\n",
    "\n",
    "  # Plot true stimulus orientation vs. predicted class\n",
    "  ax2.plot(stimuli_test.squeeze(), predicted_test_labels, '.')\n",
    "\n",
    "  ax1.set_xlim([0, None])\n",
    "  ax1.set_ylim([0, None])\n",
    "  ax1.set_xlabel('iterations of gradient descent')\n",
    "  ax1.set_ylabel('negative log likelihood')\n",
    "  ax2.set_xlabel('true stimulus orientation ($^o$)')\n",
    "  ax2.set_ylabel('decoded orientation bin')\n",
    "  ax2.set_xticks(np.linspace(0, 360, n_classes + 1))\n",
    "  ax2.set_yticks(np.arange(n_classes))\n",
    "  class_bins = [f'{i * 360 / n_classes: .0f}$^o$ - {(i + 1) * 360 / n_classes: .0f}$^o$' for i in range(n_classes)]\n",
    "  ax2.set_yticklabels(class_bins);\n",
    "\n",
    "  # Draw bin edges as vertical lines\n",
    "  ax2.set_ylim(ax2.get_ylim())  # fix y-axis limits\n",
    "  for i in range(n_classes):\n",
    "    lower = i * 360 / n_classes\n",
    "    upper = (i + 1) * 360 / n_classes\n",
    "    ax2.plot([lower, lower], ax2.get_ylim(), '-', color=\"0.7\",\n",
    "             linewidth=1, zorder=-1)\n",
    "    ax2.plot([upper, upper], ax2.get_ylim(), '-', color=\"0.7\",\n",
    "             linewidth=1, zorder=-1)\n",
    "\n",
    "  plt.tight_layout()\n",
    "\n",
    "\n",
    "def visualize_weights(W_in_sorted, W_out):\n",
    "  plt.figure(figsize=(10, 4))\n",
    "  plt.subplot(1, 2, 1)\n",
    "  plt.imshow(W_in_sorted, aspect='auto', cmap='bwr', vmin=-1e-2, vmax=1e-2)\n",
    "  plt.colorbar()\n",
    "  plt.xlabel('sorted neurons')\n",
    "  plt.ylabel('hidden units')\n",
    "  plt.title('$W_{in}$')\n",
    "\n",
    "  plt.subplot(1, 2, 2)\n",
    "  plt.imshow(W_out.T, cmap='bwr', vmin=-3, vmax=3)\n",
    "  plt.xticks([])\n",
    "  plt.xlabel('output')\n",
    "  plt.ylabel('hidden units')\n",
    "  plt.colorbar()\n",
    "  plt.title('$W_{out}$')\n",
    "\n",
    "  plt.show()\n",
    "\n",
    "\n",
    "def visualize_hidden_units(W_in_sorted, h):\n",
    "  plt.figure(figsize=(10, 8))\n",
    "  plt.subplot(2, 2, 1)\n",
    "  plt.imshow(W_in_sorted, aspect='auto', cmap='bwr', vmin=-1e-2, vmax=1e-2)\n",
    "  plt.xlabel('sorted neurons')\n",
    "  plt.ylabel('hidden units')\n",
    "  plt.colorbar()\n",
    "  plt.title('$W_{in}$')\n",
    "\n",
    "  plt.subplot(2, 2, 2)\n",
    "  plt.imshow(h, aspect='auto')\n",
    "  plt.xlabel('stimulus orientation ($^\\circ$)')\n",
    "  plt.ylabel('hidden units')\n",
    "  plt.colorbar()\n",
    "  plt.title('$\\mathbf{h}$')\n",
    "\n",
    "  plt.subplot(2, 2, 4)\n",
    "  plt.plot(h.T)\n",
    "  plt.xlabel('stimulus orientation ($^\\circ$)')\n",
    "  plt.ylabel('hidden unit activity')\n",
    "  plt.title('$\\mathbf{h}$ tuning curves')\n",
    "\n",
    "  plt.show()\n",
    "\n",
    "\n",
    "def plot_weights(weights, channels=[0], colorbar=True):\n",
    "  \"\"\" plot convolutional channel weights\n",
    "  Args:\n",
    "        weights: weights of convolutional filters (conv_channels x K x K)\n",
    "        channels: which conv channels to plot\n",
    "  \"\"\"\n",
    "  wmax = torch.abs(weights).max()\n",
    "  fig, axs = plt.subplots(1, len(channels), figsize=(12, 2.5))\n",
    "  for i, channel in enumerate(channels):\n",
    "    im = axs[i].imshow(weights[channel,0], vmin=-wmax, vmax=wmax, cmap='bwr')\n",
    "    axs[i].set_title('channel %d'%channel)\n",
    "\n",
    "  if colorbar:\n",
    "    ax = fig.add_axes([1, 0.1, 0.05, 0.8])\n",
    "    plt.colorbar(im, ax=ax)\n",
    "    ax.axis('off')\n",
    "  plt.show()\n",
    "\n",
    "\n",
    "def plot_tuning(ax, stimuli, respi_train, respi_test, neuron_index, linewidth=2):\n",
    "  \"\"\"Plot the tuning curve of a neuron\"\"\"\n",
    "\n",
    "  ax.plot(stimuli, respi_train, 'y', linewidth=linewidth)  # plot its responses as a function of stimulus orientation\n",
    "  ax.plot(stimuli, respi_test, 'm', linewidth=linewidth)  # plot its responses as a function of stimulus orientation\n",
    "  ax.set_title('neuron %i' % neuron_index)\n",
    "  ax.set_xlabel('stimulus orientation ($^o$)')\n",
    "  ax.set_ylabel('neural response')\n",
    "  ax.set_xticks(np.linspace(0, 360, 5))\n",
    "  ax.set_ylim([-0.5, 2.4])\n",
    "\n",
    "\n",
    "def plot_prediction(ax, y_pred, y_train, y_test, show=False):\n",
    "  \"\"\" plot prediction of neural response + test neural response \"\"\"\n",
    "  ax.plot(y_train, 'y', linewidth=1)\n",
    "  ax.plot(y_test,color='m')\n",
    "  ax.plot(y_pred, 'g', linewidth=3)\n",
    "  ax.set_xlabel('stimulus bin')\n",
    "  ax.set_ylabel('response')\n",
    "  if show:\n",
    "    plt.show()\n",
    "\n",
    "\n",
    "def plot_training_curves(train_loss, test_loss):\n",
    "  \"\"\"\n",
    "  Args:\n",
    "    train_loss (list): training error over iterations\n",
    "    test_loss (list): n_test x 1 tensor with orientations of the\n",
    "      stimuli corresponding to each row of train_data, in radians\n",
    "    predicted_test_labels (torch.Tensor): n_test x 1 tensor with predicted orientations of the\n",
    "      stimuli from decoding neural network\n",
    "  \"\"\"\n",
    "  f, ax = plt.subplots()\n",
    "  # Plot the training loss over iterations of GD\n",
    "  ax.plot(train_loss)\n",
    "  # Plot the testing loss over iterations of GD\n",
    "  ax.plot(test_loss, '.', markersize=10)\n",
    "  ax.legend(['train loss', 'test loss'])\n",
    "  ax.set(xlabel=\"Gradient descent iteration\", ylabel=\"Mean squared error\")\n",
    "  plt.show()\n",
    "\n",
    "\n",
    "def identityLine():\n",
    "  \"\"\"\n",
    "  Plot the identity line y=x\n",
    "  \"\"\"\n",
    "  ax = plt.gca()\n",
    "  lims = np.array([ax.get_xlim(), ax.get_ylim()])\n",
    "  minval = lims[:, 0].min()\n",
    "  maxval = lims[:, 1].max()\n",
    "  equal_lims = [minval, maxval]\n",
    "  ax.set_xlim(equal_lims)\n",
    "  ax.set_ylim(equal_lims)\n",
    "  line = ax.plot([minval, maxval], [minval, maxval], color=\"0.7\")\n",
    "  line[0].set_zorder(-1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Helper Functions\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Helper Functions\n",
    "\n",
    "def load_data(data_name, bin_width=1):\n",
    "  \"\"\"Load mouse V1 data from Stringer et al. (2019)\n",
    "\n",
    "  Data from study reported in this preprint:\n",
    "  https://www.biorxiv.org/content/10.1101/679324v2.abstract\n",
    "\n",
    "  These data comprise time-averaged responses of ~20,000 neurons\n",
    "  to ~4,000 stimulus gratings of different orientations, recorded\n",
    "  through Calcium imaging. The responses have been normalized by\n",
    "  spontaneous levels of activity and then z-scored over stimuli, so\n",
    "  expect negative numbers. They have also been binned and averaged\n",
    "  to each degree of orientation.\n",
    "\n",
    "  This function returns the relevant data (neural responses and\n",
    "  stimulus orientations) in a torch.Tensor of data type torch.float32\n",
    "  in order to match the default data type for nn.Parameters in\n",
    "  Google Colab.\n",
    "\n",
    "  This function will actually average responses to stimuli with orientations\n",
    "  falling within bins specified by the bin_width argument. This helps\n",
    "  produce individual neural \"responses\" with smoother and more\n",
    "  interpretable tuning curves.\n",
    "\n",
    "  Args:\n",
    "    bin_width (float): size of stimulus bins over which to average neural\n",
    "      responses\n",
    "\n",
    "  Returns:\n",
    "    resp (torch.Tensor): n_stimuli x n_neurons matrix of neural responses,\n",
    "        each row contains the responses of each neuron to a given stimulus.\n",
    "        As mentioned above, neural \"response\" is actually an average over\n",
    "        responses to stimuli with similar angles falling within specified bins.\n",
    "    stimuli: (torch.Tensor): n_stimuli x 1 column vector with orientation\n",
    "        of each stimulus, in degrees. This is actually the mean orientation\n",
    "        of all stimuli in each bin.\n",
    "\n",
    "  \"\"\"\n",
    "  with np.load(data_name) as dobj:\n",
    "    data = dict(**dobj)\n",
    "  resp = data['resp']\n",
    "  stimuli = data['stimuli']\n",
    "\n",
    "  if bin_width > 1:\n",
    "    # Bin neural responses and stimuli\n",
    "    bins = np.digitize(stimuli, np.arange(0, 360 + bin_width, bin_width))\n",
    "    stimuli_binned = np.array([stimuli[bins == i].mean() for i in np.unique(bins)])\n",
    "    resp_binned = np.array([resp[bins == i, :].mean(0) for i in np.unique(bins)])\n",
    "  else:\n",
    "    resp_binned = resp\n",
    "    stimuli_binned = stimuli\n",
    "\n",
    "  # Return as torch.Tensor\n",
    "  resp_tensor = torch.tensor(resp_binned, dtype=torch.float32)\n",
    "  stimuli_tensor = torch.tensor(stimuli_binned, dtype=torch.float32).unsqueeze(1)  # add singleton dimension to make a column vector\n",
    "\n",
    "  return resp_tensor, stimuli_tensor\n",
    "\n",
    "\n",
    "def load_data_split(data_name):\n",
    "  \"\"\"Load mouse V1 data from Stringer et al. (2019)\n",
    "\n",
    "  Data from study reported in this preprint:\n",
    "  https://www.biorxiv.org/content/10.1101/679324v2.abstract\n",
    "\n",
    "  These data comprise time-averaged responses of ~20,000 neurons\n",
    "  to ~4,000 stimulus gratings of different orientations, recorded\n",
    "  through Calcium imaginge. The responses have been normalized by\n",
    "  spontaneous levels of activity and then z-scored over stimuli, so\n",
    "  expect negative numbers. The responses were split into train and\n",
    "  test and then each set were averaged in bins of 6 degrees.\n",
    "\n",
    "  This function returns the relevant data (neural responses and\n",
    "  stimulus orientations) in a torch.Tensor of data type torch.float32\n",
    "  in order to match the default data type for nn.Parameters in\n",
    "  Google Colab.\n",
    "\n",
    "  It will hold out some of the trials when averaging to allow us to have test\n",
    "  tuning curves.\n",
    "\n",
    "  Args:\n",
    "    data_name (str): filename to load\n",
    "\n",
    "  Returns:\n",
    "    resp_train (torch.Tensor): n_stimuli x n_neurons matrix of neural responses,\n",
    "        each row contains the responses of each neuron to a given stimulus.\n",
    "        As mentioned above, neural \"response\" is actually an average over\n",
    "        responses to stimuli with similar angles falling within specified bins.\n",
    "    resp_test (torch.Tensor): n_stimuli x n_neurons matrix of neural responses,\n",
    "        each row contains the responses of each neuron to a given stimulus.\n",
    "        As mentioned above, neural \"response\" is actually an average over\n",
    "        responses to stimuli with similar angles falling within specified bins\n",
    "    stimuli: (torch.Tensor): n_stimuli x 1 column vector with orientation\n",
    "        of each stimulus, in degrees. This is actually the mean orientation\n",
    "        of all stimuli in each bin.\n",
    "\n",
    "  \"\"\"\n",
    "  with np.load(data_name) as dobj:\n",
    "    data = dict(**dobj)\n",
    "  resp_train = data['resp_train']\n",
    "  resp_test = data['resp_test']\n",
    "  stimuli = data['stimuli']\n",
    "\n",
    "  # Return as torch.Tensor\n",
    "  resp_train_tensor = torch.tensor(resp_train, dtype=torch.float32)\n",
    "  resp_test_tensor = torch.tensor(resp_test, dtype=torch.float32)\n",
    "  stimuli_tensor = torch.tensor(stimuli, dtype=torch.float32)\n",
    "\n",
    "  return resp_train_tensor, resp_test_tensor, stimuli_tensor\n",
    "\n",
    "\n",
    "def get_data(n_stim, train_data, train_labels):\n",
    "  \"\"\" Return n_stim randomly drawn stimuli/resp pairs\n",
    "\n",
    "  Args:\n",
    "    n_stim (scalar): number of stimuli to draw\n",
    "    resp (torch.Tensor):\n",
    "    train_data (torch.Tensor): n_train x n_neurons tensor with neural\n",
    "      responses to train on\n",
    "    train_labels (torch.Tensor): n_train x 1 tensor with orientations of the\n",
    "      stimuli corresponding to each row of train_data, in radians\n",
    "\n",
    "  Returns:\n",
    "    (torch.Tensor, torch.Tensor): n_stim x n_neurons tensor of neural responses and n_stim x 1 of orientations respectively\n",
    "  \"\"\"\n",
    "  n_stimuli = train_labels.shape[0]\n",
    "  istim = np.random.choice(n_stimuli, n_stim)\n",
    "  r = train_data[istim]  # neural responses to this stimulus\n",
    "  ori = train_labels[istim]  # true stimulus orientation\n",
    "\n",
    "  return r, ori\n",
    "\n",
    "\n",
    "def stimulus_class(ori, n_classes):\n",
    "  \"\"\"Get stimulus class from stimulus orientation\n",
    "\n",
    "  Args:\n",
    "    ori (torch.Tensor): orientations of stimuli to return classes for\n",
    "    n_classes (int): total number of classes\n",
    "\n",
    "  Returns:\n",
    "    torch.Tensor: 1D tensor with the classes for each stimulus\n",
    "\n",
    "  \"\"\"\n",
    "  bins = np.linspace(0, 360, n_classes + 1)\n",
    "  return torch.tensor(np.digitize(ori.squeeze(), bins)) - 1  # minus 1 to accommodate Python indexing\n",
    "\n",
    "def grating(angle, sf=1 / 28, res=0.1, patch=False):\n",
    "  \"\"\"Generate oriented grating stimulus\n",
    "\n",
    "  Args:\n",
    "    angle (float): orientation of grating (angle from vertical), in degrees\n",
    "    sf (float): controls spatial frequency of the grating\n",
    "    res (float): resolution of image. Smaller values will make the image\n",
    "      smaller in terms of pixels. res=1.0 corresponds to 640 x 480 pixels.\n",
    "    patch (boolean): set to True to make the grating a localized\n",
    "      patch on the left side of the image. If False, then the\n",
    "      grating occupies the full image.\n",
    "\n",
    "  Returns:\n",
    "    torch.Tensor: (res * 480) x (res * 640) pixel oriented grating image\n",
    "\n",
    "  \"\"\"\n",
    "\n",
    "  angle = np.deg2rad(angle)  # transform to radians\n",
    "\n",
    "  wpix, hpix = 640, 480  # width and height of image in pixels for res=1.0\n",
    "\n",
    "  xx, yy = np.meshgrid(sf * np.arange(0, wpix * res) / res, sf * np.arange(0, hpix * res) / res)\n",
    "\n",
    "  if patch:\n",
    "    gratings = np.cos(xx * np.cos(angle + .1) + yy * np.sin(angle + .1))  # phase shift to make it better fit within patch\n",
    "    gratings[gratings < 0] = 0\n",
    "    gratings[gratings > 0] = 1\n",
    "    xcent = gratings.shape[1] * .75\n",
    "    ycent = gratings.shape[0] / 2\n",
    "    xxc, yyc = np.meshgrid(np.arange(0, gratings.shape[1]), np.arange(0, gratings.shape[0]))\n",
    "    icirc = ((xxc - xcent) ** 2 + (yyc - ycent) ** 2) ** 0.5 < wpix / 3 / 2 * res\n",
    "    gratings[~icirc] = 0.5\n",
    "\n",
    "  else:\n",
    "    gratings = np.cos(xx * np.cos(angle) + yy * np.sin(angle))\n",
    "    gratings[gratings < 0] = 0\n",
    "    gratings[gratings > 0] = 1\n",
    "\n",
    "  gratings -= 0.5\n",
    "\n",
    "  # Return torch tensor\n",
    "  return torch.tensor(gratings, dtype=torch.float32)\n",
    "\n",
    "\n",
    "def filters(out_channels=6, K=7):\n",
    "  \"\"\" make example filters, some center-surround and gabors\n",
    "  Returns:\n",
    "      filters: out_channels x K x K\n",
    "  \"\"\"\n",
    "  grid = np.linspace(-K/2, K/2, K).astype(np.float32)\n",
    "  xx,yy = np.meshgrid(grid, grid, indexing='ij')\n",
    "\n",
    "  # create center-surround filters\n",
    "  sigma = 1.1\n",
    "  gaussian = np.exp(-(xx**2 + yy**2)**0.5/(2*sigma**2))\n",
    "  wide_gaussian = np.exp(-(xx**2 + yy**2)**0.5/(2*(sigma*2)**2))\n",
    "  center_surround = gaussian - 0.5 * wide_gaussian\n",
    "\n",
    "  # create gabor filters\n",
    "  thetas = np.linspace(0, 180, out_channels-2+1)[:-1] * np.pi/180\n",
    "  gabors = np.zeros((len(thetas), K, K), np.float32)\n",
    "  lam = 10\n",
    "  phi = np.pi/2\n",
    "  gaussian = np.exp(-(xx**2 + yy**2)**0.5/(2*(sigma*0.4)**2))\n",
    "  for i,theta in enumerate(thetas):\n",
    "    x = xx*np.cos(theta) + yy*np.sin(theta)\n",
    "    gabors[i] = gaussian * np.cos(2*np.pi*x/lam + phi)\n",
    "\n",
    "  filters = np.concatenate((center_surround[np.newaxis,:,:],\n",
    "                            -1*center_surround[np.newaxis,:,:],\n",
    "                            gabors),\n",
    "                           axis=0)\n",
    "  filters /= np.abs(filters).max(axis=(1,2))[:,np.newaxis,np.newaxis]\n",
    "  filters -= filters.mean(axis=(1,2))[:,np.newaxis,np.newaxis]\n",
    "  # convert to torch\n",
    "  filters = torch.from_numpy(filters)\n",
    "  # add channel axis\n",
    "  filters = filters.unsqueeze(1)\n",
    "\n",
    "  return filters\n",
    "\n",
    "\n",
    "def regularized_MSE_loss(output, target, weights=None, L2_penalty=0, L1_penalty=0):\n",
    "  \"\"\"loss function for MSE\n",
    "\n",
    "  Args:\n",
    "    output (torch.Tensor): output of network\n",
    "    target (torch.Tensor): neural response network is trying to predict\n",
    "    weights (torch.Tensor): fully-connected layer weights of network (net.out_layer.weight)\n",
    "    L2_penalty : scaling factor of sum of squared weights\n",
    "    L1_penalty : scalaing factor for sum of absolute weights\n",
    "\n",
    "  Returns:\n",
    "    (torch.Tensor) mean-squared error with L1 and L2 penalties added\n",
    "\n",
    "  \"\"\"\n",
    "\n",
    "  loss_fn = nn.MSELoss()\n",
    "  loss = loss_fn(output, target)\n",
    "\n",
    "  if weights is not None:\n",
    "    L2 = L2_penalty * torch.square(weights).sum()\n",
    "    L1 = L1_penalty * torch.abs(weights).sum()\n",
    "    loss += L1 + L2\n",
    "\n",
    "  return loss"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Data retrieval and loading\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "#@title Data retrieval and loading\n",
    "import hashlib\n",
    "import requests\n",
    "\n",
    "fname = \"W3D4_stringer_oribinned1.npz\"\n",
    "url = \"https://osf.io/683xc/download\"\n",
    "expected_md5 = \"436599dfd8ebe6019f066c38aed20580\"\n",
    "\n",
    "if not os.path.isfile(fname):\n",
    "  try:\n",
    "    r = requests.get(url)\n",
    "  except requests.ConnectionError:\n",
    "    print(\"!!! Failed to download data !!!\")\n",
    "  else:\n",
    "    if r.status_code != requests.codes.ok:\n",
    "      print(\"!!! Failed to download data !!!\")\n",
    "    elif hashlib.md5(r.content).hexdigest() != expected_md5:\n",
    "      print(\"!!! Data download appears corrupted !!!\")\n",
    "    else:\n",
    "      with open(fname, \"wb\") as fid:\n",
    "        fid.write(r.content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Set device (GPU or CPU). Execute `set_device()`\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Set device (GPU or CPU). Execute `set_device()`\n",
    "\n",
    "# inform the user if the notebook uses GPU or CPU.\n",
    "\n",
    "def set_device():\n",
    "  \"\"\"\n",
    "  Set the device. CUDA if available, CPU otherwise\n",
    "\n",
    "  Args:\n",
    "    None\n",
    "\n",
    "  Returns:\n",
    "    Nothing\n",
    "  \"\"\"\n",
    "  device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
    "  if device != \"cuda\":\n",
    "    print(\"WARNING: For this notebook to perform best, \"\n",
    "        \"if possible, in the menu under `Runtime` -> \"\n",
    "        \"`Change runtime type.`  select `GPU` \")\n",
    "  else:\n",
    "    print(\"GPU is enabled in this notebook.\")\n",
    "\n",
    "  return device"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Calling the function `set_device`, we can choose the GPU, if it is available. If not, the system will automatically choose the CPU. We can also manually choose the device by calling:\n",
    "\n",
    "```python\n",
    "device = torch.device('cpu')\n",
    "```\n",
    "\n",
    "or\n",
    "\n",
    "```python\n",
    "device = torch.device('cuda')\n",
    "```\n",
    "\n",
    "To enable the GPU in google colab, you choose 'Runtime' --> 'Change runtime type', and from the dropdown menu you select 'GPU' under 'Hardware accelerator'."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# Choose the GPU, if it is available\n",
    "device = set_device()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Section 1: Decoding - Investigating model and evaluating performance\n",
    "\n",
    "In this section, we will return to our decoding model from Tutorial 1 and further investigate its performance, and then improve it in the next section. Let's first load the data again and train our model, as we did in Tutorial 1.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Execute this cell to load and visualize data\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Execute this cell to load and visualize data\n",
    "\n",
    "# Load data\n",
    "resp_all, stimuli_all = load_data(fname)  # argument to this function specifies bin width\n",
    "n_stimuli, n_neurons = resp_all.shape\n",
    "\n",
    "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))\n",
    "\n",
    "# Visualize data matrix\n",
    "plot_data_matrix(resp_all[:100, :].T, ax1)  # plot responses of first 100 neurons\n",
    "ax1.set_xlabel('stimulus')\n",
    "ax1.set_ylabel('neuron')\n",
    "\n",
    "# Plot tuning curves of three random neurons\n",
    "ineurons = np.random.choice(n_neurons, 3, replace=False)  # pick three random neurons\n",
    "ax2.plot(stimuli_all, resp_all[:, ineurons])\n",
    "ax2.set_xlabel('stimulus orientation ($^o$)')\n",
    "ax2.set_ylabel('neural response')\n",
    "ax2.set_xticks(np.linspace(0, 360, 5))\n",
    "\n",
    "fig.suptitle(f'{n_neurons} neurons in response to {n_stimuli} stimuli')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Execute this cell to split into training and test sets\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Execute this cell to split into training and test sets\n",
    "\n",
    "# Set random seeds for reproducibility\n",
    "np.random.seed(4)\n",
    "torch.manual_seed(4)\n",
    "\n",
    "# Split data into training set and testing set\n",
    "n_train = int(0.6 * n_stimuli)  # use 60% of all data for training set\n",
    "ishuffle = torch.randperm(n_stimuli)\n",
    "itrain = ishuffle[:n_train]  # indices of data samples to include in training set\n",
    "itest = ishuffle[n_train:]  # indices of data samples to include in testing set\n",
    "stimuli_test = stimuli_all[itest]\n",
    "resp_test = resp_all[itest]\n",
    "stimuli_train = stimuli_all[itrain]\n",
    "resp_train = resp_all[itrain]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Execute this cell to train the network\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Execute this cell to train the network\n",
    "\n",
    "class DeepNetReLU(nn.Module):\n",
    "  \"\"\" network with a single hidden layer h with a RELU \"\"\"\n",
    "\n",
    "  def __init__(self, n_inputs, n_hidden):\n",
    "    super().__init__()  # needed to invoke the properties of the parent class nn.Module\n",
    "    self.in_layer = nn.Linear(n_inputs, n_hidden) # neural activity --> hidden units\n",
    "    self.out_layer = nn.Linear(n_hidden, 1) # hidden units --> output\n",
    "\n",
    "  def forward(self, r):\n",
    "\n",
    "    h = torch.relu(self.in_layer(r)) # h is size (n_inputs, n_hidden)\n",
    "    y = self.out_layer(h) # y is size (n_inputs, 1)\n",
    "\n",
    "    return y\n",
    "\n",
    "\n",
    "def train(net, loss_fn, train_data, train_labels,\n",
    "          n_epochs=50, learning_rate=1e-4):\n",
    "  \"\"\"Run gradient descent to optimize parameters of a given network\n",
    "\n",
    "  Args:\n",
    "    net (nn.Module): PyTorch network whose parameters to optimize\n",
    "    loss_fn: built-in PyTorch loss function to minimize\n",
    "    train_data (torch.Tensor): n_train x n_neurons tensor with neural\n",
    "      responses to train on\n",
    "    train_labels (torch.Tensor): n_train x 1 tensor with orientations of the\n",
    "      stimuli corresponding to each row of train_data\n",
    "    n_epochs (int, optional): number of epochs of gradient descent to run\n",
    "    learning_rate (float, optional): learning rate to use for gradient descent\n",
    "\n",
    "  Returns:\n",
    "    (list): training loss over iterations\n",
    "\n",
    "  \"\"\"\n",
    "\n",
    "  # Initialize PyTorch SGD optimizer\n",
    "  optimizer = optim.SGD(net.parameters(), lr=learning_rate)\n",
    "\n",
    "  # Placeholder to save the loss at each iteration\n",
    "  train_loss = []\n",
    "\n",
    "  # Loop over epochs\n",
    "  for i in range(n_epochs):\n",
    "\n",
    "    # compute network output from inputs in train_data\n",
    "    out = net(train_data)  # compute network output from inputs in train_data\n",
    "\n",
    "    # evaluate loss function\n",
    "    loss = loss_fn(out, train_labels)\n",
    "\n",
    "    # Clear previous gradients\n",
    "    optimizer.zero_grad()\n",
    "\n",
    "    # Compute gradients\n",
    "    loss.backward()\n",
    "\n",
    "    # Update weights\n",
    "    optimizer.step()\n",
    "\n",
    "    # Store current value of loss\n",
    "    train_loss.append(loss.item())  # .item() needed to transform the tensor output of loss_fn to a scalar\n",
    "\n",
    "    # Track progress\n",
    "    if (i + 1) % (n_epochs // 5) == 0:\n",
    "      print(f'iteration {i + 1}/{n_epochs} | loss: {loss.item():.3f}')\n",
    "\n",
    "  return train_loss\n",
    "\n",
    "\n",
    "# Set random seeds for reproducibility\n",
    "np.random.seed(1)\n",
    "torch.manual_seed(1)\n",
    "\n",
    "# Initialize network with 10 hidden units\n",
    "net = DeepNetReLU(n_neurons, 10).to(device)\n",
    "\n",
    "# Initialize built-in PyTorch MSE loss function\n",
    "loss_fn = nn.MSELoss().to(device)\n",
    "\n",
    "# Run gradient descent on data\n",
    "train_loss = train(net, loss_fn, resp_train.to(device), stimuli_train.to(device))\n",
    "\n",
    "# Plot the training loss over iterations of GD\n",
    "plt.plot(train_loss)\n",
    "plt.xlim([0, None])\n",
    "plt.ylim([0, None])\n",
    "plt.xlabel('iterations of gradient descent')\n",
    "plt.ylabel('mean squared error')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 1.1: Peering inside the decoding model\n",
    "\n",
    "We have built a model to perform decoding that takes as input neural activity and outputs the estimated angle of the stimulus. We can imagine that an animal that needs to determine angles would have a brain area that acts like the hidden layer in our model. It transforms the neural activity from the visual cortex and outputs a decision. Decisions about orientations of edges could include figuring out how to jump onto a branch, how to avoid obstacles, or determining the type of an object, e.g., food or predator.\n",
    "\n",
    "What sort of connectivity would this brain area have with the visual cortex? Determining this experimentally would be very difficult, perhaps we can look at the model we have and see if its structure constrains the type of connectivity we'd expect.\n",
    "\n",
    "Below we will visualize the weights from the neurons in visual cortex to the hidden units $\\mathbf{W}_{in}$, and the weights from the hidden units to the output orientation $\\mathbf{W}_{out}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "**PyTorch Note**:\n",
    "\n",
    "An important thing to note in the code below is the `.detach()` method. The PyTorch `nn.Module` class is special in that, behind the scenes, each of the variables inside it are linked to each other in a computational graph, for the purposes of automatic differentiation (the algorithm used in `.backward()` to compute gradients). As a result, if you want to do anything that is not a `torch` operation to the parameters or outputs of an `nn.Module` class, you'll need to first \"detach\" it from its computational graph, and \"attach\" it on the CPU with the `.cpu()` method. This is what the `.detach()` and `cpu()` methods do. In this code below, we need to call it on the weights of the network. We also convert the variable from a PyTorch tensor to a numpy array using the `.numpy()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "W_in = net.in_layer.weight.detach().cpu().numpy()  # we can run .detach(), .cpu() and .numpy() to get a numpy array\n",
    "print(f'shape of W_in: {W_in.shape}')\n",
    "\n",
    "W_out = net.out_layer.weight.detach().cpu().numpy()  # we can run .detach() .cpu() and .numpy() to get a numpy array\n",
    "print(f'shape of W_out: {W_out.shape}')\n",
    "\n",
    "plt.figure(figsize=(10, 4))\n",
    "plt.subplot(121)\n",
    "plt.imshow(W_in, aspect='auto', cmap='bwr', vmin=-1e-2, vmax=1e-2)\n",
    "plt.xlabel('neurons')\n",
    "plt.ylabel('hidden units')\n",
    "plt.colorbar()\n",
    "plt.title('$W_{in}$')\n",
    "\n",
    "plt.subplot(122)\n",
    "plt.imshow(W_out.T, cmap='bwr', vmin=-3, vmax=3)\n",
    "plt.xticks([])\n",
    "plt.xlabel('output')\n",
    "plt.ylabel('hidden units')\n",
    "plt.colorbar()\n",
    "plt.title('$W_{out}$')\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Coding Exercise 1.1: Visualizing weights\n",
    "\n",
    "It's difficult to see any structure in this weight matrix. How might we visualize it in a better way?\n",
    "\n",
    "Perhaps we can sort the neurons by their preferred orientation. We will use the `resp_all` matrix which is 360 stimuli (360$^\\circ$ of angles) by number of neurons. How do we find the preferred orientation?\n",
    "\n",
    "Let's visualize one column of this `resp_all` matrix first as we did at the beginning of the notebook. Can you see how we might want to first process this tuning curve before choosing the preferred orientation?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "idx = 235\n",
    "plt.plot(resp_all[:, idx])\n",
    "plt.ylabel('neural response')\n",
    "plt.xlabel('stimulus orientation ($^\\circ$)')\n",
    "plt.title(f'neuron {idx}')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Looking at this tuning curve, there is a bit of noise across orientations, so let's smooth with a gaussian filter and then find the position of the maximum for each neuron. After getting the maximum position aka the \"preferred orientation\" for each neuron, we will re-sort the $\\mathbf{W}_{in}$ matrix. The maximum position in a matrix can be computed using the `.argmax(axis=_)` function in python -- make sure you specify the right axis though! Next, to get the indices of a matrix sorted we will need to use the `.argsort()` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "from scipy.ndimage import gaussian_filter1d\n",
    "\n",
    "# first let's smooth the tuning curves resp_all to make sure we get\n",
    "# an accurate peak that isn't just noise\n",
    "# resp_all is size (n_stimuli, n_neurons)\n",
    "resp_smoothed = gaussian_filter1d(resp_all, 5, axis=0)\n",
    "# resp_smoothed is size (n_stimuli, n_neurons)\n",
    "\n",
    "############################################################################\n",
    "## TO DO for students\n",
    "# Fill out function and remove\n",
    "raise NotImplementedError(\"Student exercise: find preferred orientation\")\n",
    "############################################################################\n",
    "\n",
    "## find position of max response for each neuron\n",
    "## aka preferred orientation for each neuron\n",
    "preferred_orientation = ...\n",
    "\n",
    "# Resort W_in matrix by preferred orientation\n",
    "isort = preferred_orientation.argsort()\n",
    "W_in_sorted = W_in[:, isort]\n",
    "\n",
    "# plot resorted W_in matrix\n",
    "visualize_weights(W_in_sorted, W_out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/main/tutorials/W1D5_DeepLearning/solutions/W1D5_Tutorial4_Solution_f0b29255.py)\n",
    "\n",
    "*Example output:*\n",
    "\n",
    "<img alt='Solution hint' align='left' width=945.0 height=378.0 src=https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/tutorials/W1D5_DeepLearning/static/W1D5_Tutorial4_Solution_f0b29255_0.png>\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Visualizing_weights_Exercise\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "We can plot the activity of hidden units across various stimuli to better understand the hidden units. Recall the formula for the hidden units\n",
    "\n",
    "\\begin{equation}\n",
    "\\mathbf{h}^{(n)} = \\phi(\\mathbf{W}^{in} \\mathbf{r}^{(n)} + \\mathbf{b}^{in})\n",
    "\\end{equation}\n",
    "\n",
    "We can compute the activity $\\mathbf{h}^{(n)}$ directly using $\\mathbf{W}^{in}$ and $\\mathbf{b}^{in}$ or we can modify our network above to return `h` in the `.forward()` method. In this case, we'll compute using the equation, but in practice the second method is recommended."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "W_in = net.in_layer.weight  # size (10, n_neurons)\n",
    "b_in = net.in_layer.bias.unsqueeze(1)  # size (10, 1)\n",
    "\n",
    "# Compute hidden unit activity h\n",
    "h = torch.relu(W_in @ resp_all.T.to(device) + b_in)\n",
    "h = h.detach().cpu().numpy()  # we can run .detach(), .cpu() and .numpy() to get a numpy array\n",
    "\n",
    "# Visualize\n",
    "visualize_hidden_units(W_in_sorted, h)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Think! 1.1: Interpreting weights\n",
    "\n",
    "We have just visualized how the model transforms neural activity to hidden layer activity. How should we interpret these matrices? Here are some guiding questions to explore:\n",
    "* Why are some of the $\\mathbf{W}_{in}$ weights close to zero for some of the hidden units? Do these correspond to close to zero weights in $\\mathbf{W}_{out}$?\n",
    "* Note how each hidden unit seems to have the strongest weights to two groups of neurons in $\\mathbf{W}_{in}$, corresponding to two different sets of preferred orientations. Why do you think that is? What might this tell us about the structure of the tuning curves of the neurons?\n",
    "* It appears that there is at least one hidden unit active at each orientation, which is necessary to decode across all orientations. What would happen if some orientations did not activate any hidden units?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/main/tutorials/W1D5_DeepLearning/solutions/W1D5_Tutorial4_Solution_4a594500.py)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Interpreting_weights_Discussion\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 1.2: Generalization performance with test data\n",
    "\n",
    "Note that gradient descent is essentially an algorithm for fitting the network's parameters to a given set of training data. Selecting this training data is thus crucial for ensuring that the optimized parameters **generalize** to unseen data they weren't trained on. In our case, for example, we want to make sure that our trained network is good at decoding stimulus orientations from neural responses to any orientation, not just those in our data set.\n",
    "\n",
    "To ensure this, we have split up the full data set into a **training set** and a **testing set**. In Coding Exercise 3.2, we trained a deep network by optimizing the parameters on a training set. We will now evaluate how good the optimized parameters are by using the trained network to decode stimulus orientations from neural responses in the testing set. Good decoding performance on this testing set should then be indicative of good decoding performance on the neurons' responses to any other stimulus orientation. This procedure is commonly used in machine learning (not just in deep learning)and is typically referred to as **cross-validation**.\n",
    "\n",
    "We will compute the MSE on the test data and plot the decoded stimulus orientations as a function of the true stimulus.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Execute this cell to evaluate and plot test error\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Execute this cell to evaluate and plot test error\n",
    "\n",
    "out = net(resp_test.to(device))  # decode stimulus orientation for neural responses in testing set\n",
    "ori = stimuli_test  # true stimulus orientations\n",
    "test_loss = loss_fn(out.to(device), ori.to(device))  # MSE on testing set (Hint: use loss_fn initialized in previous exercise)\n",
    "\n",
    "plt.plot(ori, out.detach().cpu(), '.')  # N.B. need to use .detach() to pass network output into plt.plot()\n",
    "identityLine()  # draw the identity line y=x; deviations from this indicate bad decoding!\n",
    "plt.title(f'MSE on testing set: {test_loss.item():.2f}')  # N.B. need to use .item() to turn test_loss into a scalar\n",
    "plt.xlabel('true stimulus orientation ($^o$)')\n",
    "plt.ylabel('decoded stimulus orientation ($^o$)')\n",
    "axticks = np.linspace(0, 360, 5)\n",
    "plt.xticks(axticks)\n",
    "plt.yticks(axticks)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "If interested, please see the next section to think more about model criticism, improve the loss function accordingly, and add regularization."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Section 2: Decoding - Evaluating & improving models"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "## Section 2.1: Model criticism\n",
    "\n",
    "Let's now take a step back and think about how our model is succeeding/failing and how to improve it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Execute this cell to plot decoding error\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Execute this cell to plot decoding error\n",
    "\n",
    "out = net(resp_test.to(device)).to(device)  # decode stimulus orientation for neural responses in testing set\n",
    "ori = stimuli_test.to(device)  # true stimulus orientations\n",
    "error = out - ori  # decoding error\n",
    "\n",
    "\n",
    "plt.plot(ori.detach().cpu(), error.detach().cpu(), '.')   # plot decoding error as a function of true orientation (make sure all arguments to plt.plot() have been detached from PyTorch network!)\n",
    "\n",
    "# Plotting\n",
    "plt.xlabel('true stimulus orientation ($^o$)')\n",
    "plt.ylabel('decoding error ($^o$)')\n",
    "plt.xticks(np.linspace(0, 360, 5))\n",
    "plt.yticks(np.linspace(-360, 360, 9))\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Think! 2.1: Delving into error problems\n",
    "\n",
    "In the cell below, we will plot the *decoding error* for each neural response in the testing set. The decoding error is defined as the decoded stimulus orientation minus true stimulus orientation\n",
    "\n",
    "\\begin{equation}\n",
    "\\text{decoding error} = y^{(n)} - \\tilde{y}^{(n)}\n",
    "\\end{equation}\n",
    "\n",
    "In particular, we plot decoding error as a function of the true stimulus orientation.\n",
    "\n",
    "* Are some stimulus orientations harder to decode than others?\n",
    "* If so, in what sense? Are the decoded orientations for these stimuli more variable and/or are they biased?\n",
    "* Can you explain this variability/bias? What makes these stimulus orientations different from the others?\n",
    "* (Will be addressed in next exercise) Can you think of a way to modify the deep network in order to avoid this?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/main/tutorials/W1D5_DeepLearning/solutions/W1D5_Tutorial4_Solution_10c7128b.py)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Delving_into_error_problems_Discussion\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 2.2: Improving the loss function\n",
    "As illustrated in the previous exercise, the squared error is not a good loss function for circular quantities like angles, since two angles that are very close (e.g. $1^o$ and $359^o$) might actually have a very large squared error.\n",
    "\n",
    "Here, we'll avoid this problem by changing our loss function to treat our decoding problem as a **classification problem**. Rather than estimating the *exact* angle of the stimulus, we'll now aim to construct a decoder that classifies the stimulus into one of $C$ classes, corresponding to different bins of angles of width $b = \\frac{360}{C}$. The true class $\\tilde{y}^{(n)}$ of stimulus $i$ is now given by\n",
    "\n",
    "\\begin{equation}\n",
    "\\tilde{y}^{(n)} =\n",
    "\\begin{cases}\n",
    "1 &\\text{if angle of stimulus $n$ is in the range } [0, b] \\\\\n",
    "2 &\\text{if angle of stimulus $n$ is in the range } [b, 2b] \\\\\n",
    "3 &\\text{if angle of stimulus $n$ is in the range } [2b, 3b] \\\\\n",
    "\\vdots \\\\\n",
    "C &\\text{if angle of stimulus $n$ is in the range } [(C-1)b, 360]\n",
    "\\end{cases}\n",
    "\\end{equation}\n",
    "\n",
    "We have a helper function `stimulus_class` that will extract `n_classes` stimulus classes for us from the stimulus orientations."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "To decode the stimulus class from neural responses, we'll use a deep network that outputs a $C$-dimensional vector of probabilities $\\mathbf{p} = \\begin{bmatrix} p_1, p_2, \\ldots, p_C \\end{bmatrix}^T$, corresponding to the estimated probabilities of the stimulus belonging to each class $1, 2, \\ldots, C$.\n",
    "\n",
    "To ensure the network's outputs are indeed probabilities (i.e. they are positive numbers between 0 and 1, and sum to 1), we'll use a [softmax function](https://en.wikipedia.org/wiki/Softmax_function) to transform the real-valued outputs from the hidden layer into probabilities. Letting $\\sigma(\\cdot)$ denote this softmax function, the equations describing our network are\n",
    "\n",
    "\\begin{align}\n",
    "\\mathbf{h}^{(n)} &= \\phi(\\mathbf{W}^{in} \\mathbf{r}^{(n)} + \\mathbf{b}^{in}), && [\\mathbf{W}^{in}: M \\times N], \\\\\n",
    "\\mathbf{p}^{(n)} &= \\sigma(\\mathbf{W}^{out} \\mathbf{h}^{(n)} + \\mathbf{b}^{out}),  && [\\mathbf{W}^{out}: C \\times M],\n",
    "\\end{align}\n",
    "\n",
    "The decoded stimulus class is then given by that assigned the highest probability by the network:\n",
    "\n",
    "\\begin{equation}\n",
    "y^{(n)} = \\underset{i}{\\arg\\max} \\,\\, p_i\n",
    "\\end{equation}\n",
    "\n",
    "The softmax function can be implemented in PyTorch simply using `torch.softmax()`.\n",
    "\n",
    "Often *log* probabilities are easier to work with than actual probabilities, because probabilities tend to be very small numbers that computers have trouble representing. We'll therefore actually use the logarithm of the softmax as the output of our network,\n",
    "\n",
    "\\begin{equation}\n",
    "\\mathbf{l}^{(n)} = \\log \\left( \\mathbf{p}^{(n)} \\right)\n",
    "\\end{equation}\n",
    "\n",
    "which can implemented in PyTorch together with the softmax via an `nn.LogSoftmax` layer. The nice thing about the logarithmic function is that it's *monotonic*, so if one probability is larger/smaller than another, then its logarithm is also larger/smaller than the other's. We therefore have that\n",
    "\n",
    "\\begin{equation}\n",
    "y^{(n)} = \\underset{i}{\\arg\\max} \\,\\, p_i^{(n)} = \\underset{i}{\\arg\\max} \\, \\log p_i^{(n)} = \\underset{i}{\\arg\\max} \\,\\, l_i^{(n)}\n",
    "\\end{equation}\n",
    "\n",
    "See the next cell for code for constructing a deep network with one hidden layer that of ReLU's that outputs a vector of log probabilities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# Deep network for classification\n",
    "class DeepNetSoftmax(nn.Module):\n",
    "  \"\"\"Deep Network with one hidden layer, for classification\n",
    "\n",
    "  Args:\n",
    "    n_inputs (int): number of input units\n",
    "    n_hidden (int): number of units in hidden layer\n",
    "    n_classes (int): number of outputs, i.e. number of classes to output\n",
    "      probabilities for\n",
    "\n",
    "  Attributes:\n",
    "    in_layer (nn.Linear): weights and biases of input layer\n",
    "    out_layer (nn.Linear): weights and biases of output layer\n",
    "\n",
    "  \"\"\"\n",
    "\n",
    "  def __init__(self, n_inputs, n_hidden, n_classes):\n",
    "    super().__init__()  # needed to invoke the properties of the parent class nn.Module\n",
    "    self.in_layer = nn.Linear(n_inputs, n_hidden)  # neural activity --> hidden units\n",
    "    self.out_layer = nn.Linear(n_hidden, n_classes)  # hidden units --> outputs\n",
    "    self.logprob = nn.LogSoftmax(dim=1)  # probabilities across columns should sum to 1 (each output row corresponds to a different input)\n",
    "\n",
    "  def forward(self, r):\n",
    "    \"\"\"Predict stimulus orientation bin from neural responses\n",
    "\n",
    "    Args:\n",
    "      r (torch.Tensor): n_stimuli x n_inputs tensor with neural responses to n_stimuli\n",
    "\n",
    "    Returns:\n",
    "      torch.Tensor: n_stimuli x n_classes tensor with predicted class probabilities\n",
    "\n",
    "    \"\"\"\n",
    "    h = torch.relu(self.in_layer(r))\n",
    "    logp = self.logprob(self.out_layer(h))\n",
    "    return logp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "What should our loss function now be? Ideally, we want the probabilities outputted by our network to be such that the probability of the true stimulus class is high. One way to formalize this is to say that we want to maximize the *log* probability of the true stimulus class $\\tilde{y}^{(n)}$ under the class probabilities predicted by the network,\n",
    "\n",
    "\\begin{equation}\n",
    "\\log \\left( \\text{predicted probability of stimulus } n \\text{ being of class } \\tilde{y}^{(n)} \\right) = \\log p^{(n)}_{\\tilde{y}^{(n)}} = l^{(n)}_{\\tilde{y}^{(n)}}\n",
    "\\end{equation}\n",
    "\n",
    "To turn this into a loss function to be *minimized*, we can then simply multiply it by -1: maximizing the log probability is the same as minimizing the *negative* log probability. Summing over a batch of $P$ inputs, our loss function is then given by\n",
    "\n",
    "\\begin{equation}\n",
    "L = -\\sum_{n=1}^P \\log p^{(n)}_{\\tilde{y}^{(n)}} = -\\sum_{n=1}^P l^{(n)}_{\\tilde{y}^{(n)}}\n",
    "\\end{equation}\n",
    "\n",
    "In the deep learning community, this loss function is typically referred to as the **cross-entropy**, or **negative log likelihood**. The corresponding built-in loss function in PyTorch is `nn.NLLLoss()` (documentation [here](https://pytorch.org/docs/master/generated/torch.nn.CrossEntropyLoss.html))."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Coding Exercise 2.2: A new loss function\n",
    "In the next cell, we've provided most of the code to train and test a network to decode stimulus orientations via classification, by minimizing the negative log likelihood. Fill in the missing pieces.\n",
    "\n",
    "Once you've done this, have a look at the plotted results. Does changing the loss function from mean squared error to a classification loss solve our problems? Note that errors may still occur -- but are these errors as bad as the ones that our network above was making?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Run this cell to create train function that uses test_data and L1 and L2 terms for next exercise\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Run this cell to create train function that uses test_data and L1 and L2 terms for next exercise\n",
    "def train(net, loss_fn, train_data, train_labels,\n",
    "          n_iter=50, learning_rate=1e-4,\n",
    "          test_data=None, test_labels=None,\n",
    "          L2_penalty=0, L1_penalty=0):\n",
    "  \"\"\"Run gradient descent to optimize parameters of a given network\n",
    "\n",
    "  Args:\n",
    "    net (nn.Module): PyTorch network whose parameters to optimize\n",
    "    loss_fn: built-in PyTorch loss function to minimize\n",
    "    train_data (torch.Tensor): n_train x n_neurons tensor with neural\n",
    "      responses to train on\n",
    "    train_labels (torch.Tensor): n_train x 1 tensor with orientations of the\n",
    "      stimuli corresponding to each row of train_data\n",
    "    n_iter (int, optional): number of iterations of gradient descent to run\n",
    "    learning_rate (float, optional): learning rate to use for gradient descent\n",
    "    test_data (torch.Tensor, optional): n_test x n_neurons tensor with neural\n",
    "      responses to test on\n",
    "    test_labels (torch.Tensor, optional): n_test x 1 tensor with orientations of\n",
    "      the stimuli corresponding to each row of test_data\n",
    "    L2_penalty (float, optional): l2 penalty regularizer coefficient\n",
    "    L1_penalty (float, optional): l1 penalty regularizer coefficient\n",
    "\n",
    "  Returns:\n",
    "    (list): training loss over iterations\n",
    "\n",
    "  \"\"\"\n",
    "\n",
    "  # Initialize PyTorch SGD optimizer\n",
    "  optimizer = optim.SGD(net.parameters(), lr=learning_rate)\n",
    "\n",
    "  # Placeholder to save the loss at each iteration\n",
    "  train_loss = []\n",
    "  test_loss = []\n",
    "\n",
    "  # Loop over epochs\n",
    "  for i in range(n_iter):\n",
    "\n",
    "    # compute network output from inputs in train_data\n",
    "    out = net(train_data)  # compute network output from inputs in train_data\n",
    "\n",
    "    # evaluate loss function\n",
    "    if L2_penalty==0 and L1_penalty==0:\n",
    "      # normal loss function\n",
    "      loss = loss_fn(out, train_labels)\n",
    "    else:\n",
    "      # custom loss function from bonus exercise 3.3\n",
    "      loss = loss_fn(out, train_labels, net.in_layer.weight,\n",
    "                     L2_penalty, L1_penalty)\n",
    "\n",
    "    # Clear previous gradients\n",
    "    optimizer.zero_grad()\n",
    "    # Compute gradients\n",
    "    loss.backward()\n",
    "\n",
    "    # Update weights\n",
    "    optimizer.step()\n",
    "\n",
    "    # Store current value of loss\n",
    "    train_loss.append(loss.item())  # .item() needed to transform the tensor output of loss_fn to a scalar\n",
    "\n",
    "    # Get loss for test_data, if given (we will use this in the bonus exercise 3.2 and 3.3)\n",
    "    if test_data is not None:\n",
    "      out_test = net(test_data)\n",
    "      # evaluate loss function\n",
    "      if L2_penalty==0 and L1_penalty==0:\n",
    "        # normal loss function\n",
    "        loss_test = loss_fn(out_test, test_labels)\n",
    "      else:\n",
    "        # (BONUS code) custom loss function from Bonus exercise 3.3\n",
    "        loss_test = loss_fn(out_test, test_labels, net.in_layer.weight,\n",
    "                            L2_penalty, L1_penalty)\n",
    "      test_loss.append(loss_test.item())  # .item() needed to transform the tensor output of loss_fn to a scalar\n",
    "\n",
    "    # Track progress\n",
    "    if (i + 1) % (n_iter // 5) == 0:\n",
    "      if test_data is None:\n",
    "        print(f'iteration {i + 1}/{n_iter} | loss: {loss.item():.3f}')\n",
    "      else:\n",
    "        print(f'iteration {i + 1}/{n_iter} | loss: {loss.item():.3f} | test_loss: {loss_test.item():.3f}')\n",
    "\n",
    "  if test_data is None:\n",
    "    return train_loss\n",
    "  else:\n",
    "    return train_loss, test_loss"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "def decode_orientation(net, n_classes, loss_fn,\n",
    "                       train_data, train_labels, test_data, test_labels,\n",
    "                       n_iter=1000, L2_penalty=0, L1_penalty=0, device='cpu'):\n",
    "  \"\"\" Initialize, train, and test deep network to decode binned orientation from neural responses\n",
    "\n",
    "  Args:\n",
    "    net (nn.Module): deep network to run\n",
    "    n_classes (scalar): number of classes in which to bin orientation\n",
    "    loss_fn (function): loss function to run\n",
    "    train_data (torch.Tensor): n_train x n_neurons tensor with neural\n",
    "      responses to train on\n",
    "    train_labels (torch.Tensor): n_train x 1 tensor with orientations of the\n",
    "      stimuli corresponding to each row of train_data, in radians\n",
    "    test_data (torch.Tensor): n_test x n_neurons tensor with neural\n",
    "      responses to train on\n",
    "    test_labels (torch.Tensor): n_test x 1 tensor with orientations of the\n",
    "      stimuli corresponding to each row of train_data, in radians\n",
    "    n_iter (int, optional): number of iterations to run optimization\n",
    "    L2_penalty (float, optional): l2 penalty regularizer coefficient\n",
    "    L1_penalty (float, optional): l1 penalty regularizer coefficient\n",
    "\n",
    "  Returns:\n",
    "    (list, torch.Tensor): training loss over iterations, n_test x 1 tensor with predicted orientations of the\n",
    "      stimuli from decoding neural network\n",
    "  \"\"\"\n",
    "\n",
    "  # Bin stimulus orientations in training set\n",
    "  train_binned_labels = stimulus_class(train_labels, n_classes).to(device)\n",
    "  test_binned_labels = stimulus_class(test_labels, n_classes).to(device)\n",
    "  train_data = train_data.to(device)\n",
    "  test_data = test_data.to(device)\n",
    "\n",
    "  # Run GD on training set data, using learning rate of 0.1\n",
    "  # (add optional arguments test_data and test_binned_labels!)\n",
    "  train_loss, test_loss = train(net.to(device), loss_fn, train_data, train_binned_labels,\n",
    "                                learning_rate=0.1, test_data=test_data,\n",
    "                                test_labels=test_binned_labels, n_iter=n_iter,\n",
    "                                L2_penalty=L2_penalty, L1_penalty=L1_penalty)\n",
    "\n",
    "  # Decode neural responses in testing set data\n",
    "  out = net(test_data).to(device)\n",
    "  out_labels = np.argmax(out.detach().cpu(), axis=1)  # predicted classes\n",
    "\n",
    "  frac_correct = (out_labels == test_binned_labels.cpu()).sum() / len(test_binned_labels)\n",
    "  print(f'>>> fraction correct = {frac_correct:.3f}')\n",
    "\n",
    "  return train_loss, test_loss, out_labels\n",
    "\n",
    "\n",
    "# Set random seeds for reproducibility\n",
    "np.random.seed(1)\n",
    "torch.manual_seed(1)\n",
    "\n",
    "n_classes = 20\n",
    "\n",
    "############################################################################\n",
    "## TO DO for students\n",
    "# Fill out function and remove\n",
    "raise NotImplementedError(\"Student exercise: make network and loss\")\n",
    "############################################################################\n",
    "\n",
    "# Initialize network\n",
    "net = ...  # use M=20 hidden units\n",
    "\n",
    "# Initialize built-in PyTorch negative log likelihood loss function\n",
    "loss_fn = ...\n",
    "\n",
    "# Train network and run it on test images\n",
    "# this function uses the train function you wrote before\n",
    "train_loss, test_loss, predicted_test_labels = decode_orientation(net, n_classes, loss_fn.to(device),\n",
    "                                                                  resp_train, stimuli_train,\n",
    "                                                                  resp_test, stimuli_test,\n",
    "                                                                  device=device)\n",
    "\n",
    "# Plot results\n",
    "plot_decoded_results(train_loss, test_loss, stimuli_test, predicted_test_labels, n_classes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/main/tutorials/W1D5_DeepLearning/solutions/W1D5_Tutorial4_Solution_e924146b.py)\n",
    "\n",
    "*Example output:*\n",
    "\n",
    "<img alt='Solution hint' align='left' width=1577.0 height=576.0 src=https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/tutorials/W1D5_DeepLearning/static/W1D5_Tutorial4_Solution_e924146b_5.png>\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_A_new_loss_function_Exercise\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "How do the weights $W_{in}$ from the neurons to the hidden layer look now?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "W_in = net.in_layer.weight.detach().cpu().numpy() # we can run detach, cpu and numpy to get a numpy array\n",
    "print(f'shape of W_in: {W_in.shape}')\n",
    "\n",
    "W_out = net.out_layer.weight.detach().cpu().numpy()\n",
    "\n",
    "# plot resorted W_in matrix\n",
    "visualize_weights(W_in[:, isort], W_out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 2.3: Regularization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Video 1: Regularization\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Video 1: Regularization\n",
    "from ipywidgets import widgets\n",
    "from IPython.display import YouTubeVideo\n",
    "from IPython.display import IFrame\n",
    "from IPython.display import display\n",
    "\n",
    "\n",
    "class PlayVideo(IFrame):\n",
    "  def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
    "    self.id = id\n",
    "    if source == 'Bilibili':\n",
    "      src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
    "    elif source == 'Osf':\n",
    "      src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
    "    super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
    "\n",
    "\n",
    "def display_videos(video_ids, W=400, H=300, fs=1):\n",
    "  tab_contents = []\n",
    "  for i, video_id in enumerate(video_ids):\n",
    "    out = widgets.Output()\n",
    "    with out:\n",
    "      if video_ids[i][0] == 'Youtube':\n",
    "        video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
    "                             height=H, fs=fs, rel=0)\n",
    "        print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
    "      else:\n",
    "        video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
    "                          height=H, fs=fs, autoplay=False)\n",
    "        if video_ids[i][0] == 'Bilibili':\n",
    "          print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
    "        elif video_ids[i][0] == 'Osf':\n",
    "          print(f'Video available at https://osf.io/{video.id}')\n",
    "      display(video)\n",
    "    tab_contents.append(out)\n",
    "  return tab_contents\n",
    "\n",
    "\n",
    "video_ids = [('Youtube', 'Qnn5OPHKo5w'), ('Bilibili', 'BV1na4y1a7ug')]\n",
    "tab_contents = display_videos(video_ids, W=730, H=410)\n",
    "tabs = widgets.Tab()\n",
    "tabs.children = tab_contents\n",
    "for i in range(len(tab_contents)):\n",
    "  tabs.set_title(i, video_ids[i][0])\n",
    "display(tabs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Regularization_Video\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "As discussed in the lecture, it is often important to incorporate regularization terms into the loss function to avoid overfitting. In particular, in this case, we will use these terms to enforce sparsity in the linear layer from neurons to hidden units.\n",
    "\n",
    "Here we'll consider the classic L2 regularization penalty $\\mathcal{R}_{L2}$, which is the sum of squares of each weight in the network $\\sum_{ij} {\\mathbf{W}^{out}_{ij}}^2$ times a constant that we call `L2_penalty`.\n",
    "\n",
    "We will also add an L1 regularization penalty $\\mathcal{R}_{L1}$ to enforce sparsity of the weights, which is the sum of the absolute values of the weights $\\sum_{ij} |{\\mathbf{W}^{out}_{ij}}|$ times a constant that we call `L1_penalty`.\n",
    "\n",
    "We will add both of these to the loss function:\n",
    "\n",
    "\\begin{equation}\n",
    "L = (y - \\tilde{y})^2 + \\mathcal{R}_{L2} + \\mathcal{R}_{L1}\n",
    "\\end{equation}\n",
    "\n",
    "The parameters `L2_penalty` and `L1_penalty` are inputs to the train function."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Coding Exercise 2.3: Add regularization to training\n",
    "\n",
    "We will create a new loss function that adds L1 and L2 regularization.\n",
    "In particular, you will:\n",
    "* add L2 loss penalty to the weights\n",
    "* add L1 loss penalty to the weights\n",
    "\n",
    "\n",
    "We will then train the network using this loss function. Full training will take a few minutes: if you want to train for just a few steps to speed up the code while iterating on your code, you can decrease the n_iter input from 500.\n",
    "\n",
    "Hint: since we are using `torch` instead of `np`, we will use `torch.abs` instead of `np.absolute`. You can use `torch.sum` or `.sum()` to sum over a tensor.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "def regularized_loss(output, target, weights, L2_penalty=0, L1_penalty=0,\n",
    "                     device='cpu'):\n",
    "  \"\"\"loss function with L2 and L1 regularization\n",
    "\n",
    "  Args:\n",
    "    output (torch.Tensor): output of network\n",
    "    target (torch.Tensor): neural response network is trying to predict\n",
    "    weights (torch.Tensor): linear layer weights from neurons to hidden units (net.in_layer.weight)\n",
    "    L2_penalty : scaling factor of sum of squared weights\n",
    "    L1_penalty : scaling factor for sum of absolute weights\n",
    "\n",
    "  Returns:\n",
    "    (torch.Tensor) mean-squared error with L1 and L2 penalties added\n",
    "\n",
    "  \"\"\"\n",
    "\n",
    "  ##############################################################################\n",
    "  # TO DO: add L1 and L2 regularization to the loss function\n",
    "  raise NotImplementedError(\"Student exercise: complete regularized_loss\")\n",
    "  ##############################################################################\n",
    "\n",
    "  loss_fn = nn.NLLLoss()\n",
    "  loss = loss_fn(output, target)\n",
    "\n",
    "  L2 = L2_penalty * ...\n",
    "  L1 = L1_penalty * ...\n",
    "  loss += L1 + L2\n",
    "\n",
    "  return loss.to(device)\n",
    "\n",
    "\n",
    "# Set random seeds for reproducibility\n",
    "np.random.seed(1)\n",
    "torch.manual_seed(1)\n",
    "\n",
    "n_classes = 20\n",
    "\n",
    "# Initialize network\n",
    "net = DeepNetSoftmax(n_neurons, 20, n_classes)  # use M=20 hidden units\n",
    "\n",
    "# Here you can play with L2_penalty > 0, L1_penalty > 0\n",
    "train_loss, test_loss, predicted_test_labels = decode_orientation(net, n_classes,\n",
    "                                                                  regularized_loss,\n",
    "                                                                  resp_train, stimuli_train,\n",
    "                                                                  resp_test, stimuli_test,\n",
    "                                                                  n_iter=1000,\n",
    "                                                                  L2_penalty=1e-2,\n",
    "                                                                  L1_penalty=5e-4,\n",
    "                                                                  device=device)\n",
    "\n",
    "# Plot results\n",
    "plot_decoded_results(train_loss, test_loss, stimuli_test, predicted_test_labels, n_classes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/main/tutorials/W1D5_DeepLearning/solutions/W1D5_Tutorial4_Solution_9e7e87e5.py)\n",
    "\n",
    "*Example output:*\n",
    "\n",
    "<img alt='Solution hint' align='left' width=1577.0 height=575.0 src=https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/tutorials/W1D5_DeepLearning/static/W1D5_Tutorial4_Solution_9e7e87e5_5.png>\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Add_regularization_to_training_Exercise\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "It seems we were overfitting a little because we increased the accuracy a small amount by adding an L1 and L2 regularization penalty. What errors are still being made by the model?\n",
    "\n",
    "Let's see how the weights look after adding `L1_penalty > 0`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "W_in = net.in_layer.weight.detach().cpu().numpy()  # we can run detach, cpu and numpy to get a numpy array\n",
    "print(f'shape of W_in: {W_in.shape}')\n",
    "\n",
    "W_out = net.out_layer.weight.detach().cpu().numpy()\n",
    "\n",
    "visualize_weights(W_in[:, isort], W_out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "The weights appear to be sparser than before."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Section 3: Encoding - Convolutional Networks for Encoding"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Video 2: Convolutional Encoding Model\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Video 2: Convolutional Encoding Model\n",
    "from ipywidgets import widgets\n",
    "from IPython.display import YouTubeVideo\n",
    "from IPython.display import IFrame\n",
    "from IPython.display import display\n",
    "\n",
    "\n",
    "class PlayVideo(IFrame):\n",
    "  def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
    "    self.id = id\n",
    "    if source == 'Bilibili':\n",
    "      src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
    "    elif source == 'Osf':\n",
    "      src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
    "    super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
    "\n",
    "\n",
    "def display_videos(video_ids, W=400, H=300, fs=1):\n",
    "  tab_contents = []\n",
    "  for i, video_id in enumerate(video_ids):\n",
    "    out = widgets.Output()\n",
    "    with out:\n",
    "      if video_ids[i][0] == 'Youtube':\n",
    "        video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
    "                             height=H, fs=fs, rel=0)\n",
    "        print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
    "      else:\n",
    "        video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
    "                          height=H, fs=fs, autoplay=False)\n",
    "        if video_ids[i][0] == 'Bilibili':\n",
    "          print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
    "        elif video_ids[i][0] == 'Osf':\n",
    "          print(f'Video available at https://osf.io/{video.id}')\n",
    "      display(video)\n",
    "    tab_contents.append(out)\n",
    "  return tab_contents\n",
    "\n",
    "\n",
    "video_ids = [('Youtube', 'UNBOPZf0QNQ'), ('Bilibili', 'BV1Eh41167WP')]\n",
    "tab_contents = display_videos(video_ids, W=730, H=410)\n",
    "tabs = widgets.Tab()\n",
    "tabs.children = tab_contents\n",
    "for i in range(len(tab_contents)):\n",
    "  tabs.set_title(i, video_ids[i][0])\n",
    "display(tabs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Convolutional_Encoding_Model_Video\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "\n",
    "In neuroscience, we often want to understand how the brain represents external stimuli. One approach to discovering these representations is to build an encoding model that takes as input the external stimuli (in this case grating stimuli) and outputs the neural responses.\n",
    "\n",
    "Because the visual cortex is often thought to be a convolutional network where the same filters are combined across the visual field, we will use a model with a convolutional layer. We learned how to build a convolutional layer in the previous section. We will add to this convolutional layer a fully connected layer from the output of the convolutions to the neurons. We will then visualize the weights of this fully connected layer."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Execute this cell to download the data\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Execute this cell to download the data\n",
    "import hashlib\n",
    "import requests\n",
    "\n",
    "fname = \"W3D4_stringer_oribinned6_split.npz\"\n",
    "url = \"https://osf.io/p3aeb/download\"\n",
    "expected_md5 = \"b3f7245c6221234a676b71a1f43c3bb5\"\n",
    "\n",
    "if not os.path.isfile(fname):\n",
    "  try:\n",
    "    r = requests.get(url)\n",
    "  except requests.ConnectionError:\n",
    "    print(\"!!! Failed to download data !!!\")\n",
    "  else:\n",
    "    if r.status_code != requests.codes.ok:\n",
    "      print(\"!!! Failed to download data !!!\")\n",
    "    elif hashlib.md5(r.content).hexdigest() != expected_md5:\n",
    "      print(\"!!! Data download appears corrupted !!!\")\n",
    "    else:\n",
    "      with open(fname, \"wb\") as fid:\n",
    "        fid.write(r.content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 3.1: Neural tuning curves\n",
    "\n",
    "In the next cell, we plot the turning curves of a random subset of neurons. We have binned the stimuli orientations more than in Tutorial 1. We create the gratings images as above for the 60 orientations below, and save them to a variable `grating_stimuli`.\n",
    "\n",
    "Rerun the cell to look at different example neurons and observe the diversity of tuning curves in the population. How can we fit these neural responses with an encoding model?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Execute this cell to load data, create stimuli, and plot neural tuning curves\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Execute this cell to load data, create stimuli, and plot neural tuning curves\n",
    "\n",
    "### Load data and bin at 8 degrees\n",
    "# responses are split into test and train\n",
    "resp_train, resp_test, stimuli = load_data_split(fname)\n",
    "n_stimuli, n_neurons = resp_train.shape\n",
    "print(f'resp_train contains averaged responses of {n_neurons} neurons'\n",
    "      f'to {n_stimuli} binned stimuli')\n",
    "\n",
    "# also make stimuli into images\n",
    "orientations = np.linspace(0, 360, 61)[:-1] - 180\n",
    "grating_stimuli = np.zeros((60, 1, 12, 16), np.float32)\n",
    "for i, ori in enumerate(orientations):\n",
    "  grating_stimuli[i,0] = grating(ori, res=0.025)#[18:30, 24:40]\n",
    "\n",
    "grating_stimuli = torch.from_numpy(grating_stimuli)\n",
    "print('grating_stimuli contains 60 stimuli of size 12 x 16')\n",
    "\n",
    "# Visualize tuning curves\n",
    "fig, axs = plt.subplots(3, 5, figsize=(15,7))\n",
    "for k, ax in enumerate(axs.flatten()):\n",
    "  neuron_index = np.random.choice(n_neurons)  # pick random neuron\n",
    "  plot_tuning(ax, stimuli, resp_train[:, neuron_index], resp_test[:, neuron_index], neuron_index, linewidth=2)\n",
    "  if k == 0:\n",
    "    ax.text(1.0, 0.9, 'train', color='y', transform=ax.transAxes)\n",
    "    ax.text(1.0, 0.65, 'test', color='m', transform=ax.transAxes)\n",
    "\n",
    "fig.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 3.2: Adding a fully-connected layer to create encoding model\n",
    "\n",
    "We will build a torch model like above with a convolutional layer. Additionally, we will add a fully connected linear layer from the convolutional units to the neurons. We will use 6 convolutional channels ($C^{out}$) and a kernel size ($K$) of 7 with a stride of 1 and padding of $K/2$ (same as above). The stimulus is size `(12, 16)`. Then the convolutional unit activations will go through a linear layer to be transformed into neural responses."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Think! 3.2: Number of units and weights\n",
    "\n",
    "* How many units will the convolutional layer have?\n",
    "* How many weights will the fully connected linear layer from convolutional units to neurons have?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/main/tutorials/W1D5_DeepLearning/solutions/W1D5_Tutorial4_Solution_c35524a1.py)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Number_of_units_and_weights_Discussion\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Coding Exercise 3.2: Add linear layer\n",
    "\n",
    "Remember in Tutorial 1 we used linear layers. Use your knowledge from Tutorial 1 to add a linear layer to the model we created above."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Execute to get `train` function for our neural encoding model\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Execute to get `train` function for our neural encoding model\n",
    "\n",
    "def train(net, custom_loss, train_data, train_labels,\n",
    "          test_data=None, test_labels=None,\n",
    "          learning_rate=10, n_iter=500, L2_penalty=0., L1_penalty=0.):\n",
    "  \"\"\"Run gradient descent for network without batches\n",
    "\n",
    "  Args:\n",
    "    net (nn.Module): deep network whose parameters to optimize with SGD\n",
    "    custom_loss: loss function for network\n",
    "    train_data: training data (n_train x input features)\n",
    "    train_labels: training labels (n_train x output features)\n",
    "    test_data: test data (n_train x input features)\n",
    "    test_labels: test labels (n_train x output features)\n",
    "    learning_rate (float): learning rate for gradient descent\n",
    "    n_epochs (int): number of epochs to run gradient descent\n",
    "    L2_penalty (float): magnitude of L2 penalty\n",
    "    L1_penalty (float): magnitude of L1 penalty\n",
    "\n",
    "  Returns:\n",
    "    train_loss: training loss across iterations\n",
    "    test_loss: testing loss across iterations\n",
    "\n",
    "  \"\"\"\n",
    "  optimizer = optim.SGD(net.parameters(), lr=learning_rate, momentum=0.9) # Initialize PyTorch SGD optimizer\n",
    "  train_loss = np.nan * np.zeros(n_iter)  # Placeholder for train loss\n",
    "  test_loss = np.nan * np.zeros(n_iter)  # Placeholder for test loss\n",
    "\n",
    "  # Loop over epochs\n",
    "  for i in range(n_iter):\n",
    "    if n_iter < 10:\n",
    "      for param_group in self.optimizer.param_groups:\n",
    "        param_group['lr'] = np.linspace(0, learning_rate, 10)[n_iter]\n",
    "    y_pred = net(train_data) # Forward pass: compute predicted y by passing train_data to the model.\n",
    "\n",
    "    if L2_penalty>0 or L1_penalty>0:\n",
    "      weights = net.out_layer.weight\n",
    "      loss = custom_loss(y_pred, train_labels, weights, L2_penalty, L1_penalty)\n",
    "    else:\n",
    "      loss = custom_loss(y_pred, train_labels)\n",
    "\n",
    "    ### Update parameters\n",
    "    optimizer.zero_grad() # zero out gradients\n",
    "    loss.backward() # Backward pass: compute gradient of the loss with respect to model parameters\n",
    "    optimizer.step() # step parameters in gradient direction\n",
    "    train_loss[i] = loss.item()  # .item() transforms the tensor to a scalar and does .detach() for us\n",
    "\n",
    "    # Track progress\n",
    "    if (i+1) % (n_iter // 10) == 0 or i==0:\n",
    "      if test_data is not None and test_labels is not None:\n",
    "        y_pred = net(test_data)\n",
    "        if L2_penalty>0 or L1_penalty>0:\n",
    "          loss = custom_loss(y_pred, test_labels, weights, L2_penalty, L1_penalty)\n",
    "        else:\n",
    "          loss = custom_loss(y_pred, test_labels)\n",
    "        test_loss[i] = loss.item()\n",
    "        print(f'iteration {i+1}/{n_iter} | train loss: {train_loss[i]:.4f} | test loss: {test_loss[i]:.4f}')\n",
    "      else:\n",
    "        print(f'iteration {i+1}/{n_iter} | train loss: {train_loss[i]:.4f}')\n",
    "\n",
    "  return train_loss, test_loss"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "class ConvFC(nn.Module):\n",
    "  \"\"\"Deep network with one convolutional layer + one fully connected layer\n",
    "\n",
    "  Attributes:\n",
    "    conv (nn.Conv1d): convolutional layer\n",
    "    dims (tuple): shape of convolutional layer output\n",
    "    out_layer (nn.Linear): linear layer\n",
    "\n",
    "  \"\"\"\n",
    "\n",
    "  def __init__(self, n_neurons, c_in=1, c_out=6, K=7, b=12*16,\n",
    "               filters=None):\n",
    "    \"\"\" initialize layer\n",
    "    Args:\n",
    "        c_in: number of input stimulus channels\n",
    "        c_out: number of convolutional channels\n",
    "        K: size of each convolutional filter\n",
    "        h: number of stimulus bins, n_bins\n",
    "    \"\"\"\n",
    "    super().__init__()\n",
    "    self.conv = nn.Conv2d(c_in, c_out, kernel_size=K, padding=K//2, stride=1)\n",
    "    self.dims = (c_out, b)  # dimensions of conv layer output\n",
    "    M = np.prod(self.dims)  # number of hidden units\n",
    "\n",
    "    ################################################################################\n",
    "    ## TO DO for students: add fully connected layer to network (self.out_layer)\n",
    "    # Fill out function and remove\n",
    "    raise NotImplementedError(\"Student exercise: add fully connected layer to initialize network\")\n",
    "    ################################################################################\n",
    "    self.out_layer = nn.Linear(M, ...)\n",
    "\n",
    "    # initialize weights\n",
    "    if filters is not None:\n",
    "      self.conv.weight = nn.Parameter(filters)\n",
    "      self.conv.bias = nn.Parameter(torch.zeros((c_out,), dtype=torch.float32))\n",
    "\n",
    "    nn.init.normal_(self.out_layer.weight, std=0.01)  # initialize weights to be small\n",
    "\n",
    "  def forward(self, s):\n",
    "    \"\"\" Predict neural responses to stimuli s\n",
    "\n",
    "    Args:\n",
    "        s (torch.Tensor): n_stimuli x c_in x h x w tensor with stimuli\n",
    "\n",
    "    Returns:\n",
    "        y (torch.Tensor): n_stimuli x n_neurons tensor with predicted neural responses\n",
    "\n",
    "    \"\"\"\n",
    "    a = self.conv(s)  # output of convolutional layer\n",
    "    a = a.view(-1, np.prod(self.dims))  # flatten each convolutional layer output into a vector\n",
    "\n",
    "    ################################################################################\n",
    "    ## TO DO for students: add fully connected layer to forward pass of network (self.out_layer)\n",
    "    # Fill out function and remove\n",
    "    raise NotImplementedError(\"Student exercise: add fully connected layer to network\")\n",
    "    ################################################################################\n",
    "    y = ...\n",
    "\n",
    "    return y\n",
    "\n",
    "\n",
    "# Initialize network\n",
    "n_neurons = resp_train.shape[1]\n",
    "## Initialize with filters from Tutorial 2\n",
    "example_filters = filters(out_channels=6, K=7)\n",
    "\n",
    "net = ConvFC(n_neurons, filters = example_filters).to(device)\n",
    "\n",
    "# Run GD on training set data\n",
    "# ** this time we are also providing the test data to estimate the test loss\n",
    "train_loss, test_loss = train(net, regularized_MSE_loss,\n",
    "                              train_data=grating_stimuli.to(device), train_labels=resp_train.to(device),\n",
    "                              test_data=grating_stimuli.to(device), test_labels=resp_test.to(device),\n",
    "                              n_iter=200, learning_rate=2,\n",
    "                              L2_penalty=5e-4, L1_penalty=1e-6)\n",
    "\n",
    "# Visualize\n",
    "plot_training_curves(train_loss, test_loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/main/tutorials/W1D5_DeepLearning/solutions/W1D5_Tutorial4_Solution_5dffefa9.py)\n",
    "\n",
    "*Example output:*\n",
    "\n",
    "<img alt='Solution hint' align='left' width=777.0 height=577.0 src=https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/tutorials/W1D5_DeepLearning/static/W1D5_Tutorial4_Solution_5dffefa9_11.png>\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Add_linear_layer_Exercise\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "How well can we fit single neuron tuning curves with this model? What aspects of the tuning curves are we capturing?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Execute this cell to examine predictions for random subsets of neurons\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Execute this cell to examine predictions for random subsets of neurons\n",
    "\n",
    "y_pred = net(grating_stimuli.to(device))\n",
    "# Visualize tuning curves & plot neural predictions\n",
    "fig, axs = plt.subplots(2, 5, figsize=(15,6))\n",
    "for k, ax in enumerate(axs.flatten()):\n",
    "  ineur = np.random.choice(n_neurons)\n",
    "  plot_prediction(ax, y_pred[:, ineur].detach().cpu(),\n",
    "                  resp_train[:, ineur],\n",
    "                  resp_test[:, ineur])\n",
    "  if k==0:\n",
    "    ax.text(.1, 1., 'train', color='y', transform=ax.transAxes)\n",
    "    ax.text(.1, 0.9, 'test', color='m', transform=ax.transAxes)\n",
    "    ax.text(.1, 0.8, 'prediction', color='g', transform=ax.transAxes)\n",
    "\n",
    "fig.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "We can see if the convolutional channels changed at all from their initialization as center-surround and Gabor filters. If they don't then it means that they were a sufficient basis set to explain the responses of the neurons to orientations to the accuracy seen above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# get weights of conv layer in convLayer\n",
    "out_channels = 6  # how many convolutional channels to have in our layer\n",
    "weights = net.conv.weight.detach().cpu()\n",
    "print(weights.shape)  # Can you identify what each of the dimensions are?\n",
    "\n",
    "plot_weights(weights, channels=np.arange(0, out_channels))"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "collapsed_sections": [],
   "gpuType": "T4",
   "include_colab_link": true,
   "name": "W1D5_Tutorial4",
   "provenance": [],
   "toc_visible": true
  },
  "kernel": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.21"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}