{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {}, "id": "view-in-github" }, "source": [ "\"Open   \"Open" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Tutorial 3: Simultaneous fitting/regression\n", "\n", "**Week 3, Day 5: Network Causality**\n", "\n", "**By Neuromatch Academy**\n", "\n", "**Content creators**: Ari Benjamin, Tony Liu, Konrad Kording\n", "\n", "**Content reviewers**: Mike X Cohen, Madineh Sarvestani, Yoni Friedman, Ella Batty, Michael Waskom\n", "\n", "**Production editors:** Gagana B, Spiros Chavlis" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Tutorial objectives\n", "\n", "*Estimated timing of tutorial: 20 min*\n", "\n", "This is tutorial 3 on our day of examining causality. Below is the high level outline of what we'll cover today, with the sections we will focus on in this notebook in bold:\n", "\n", "1. Master definitions of causality\n", "2. Understand that estimating causality is possible\n", "3. Learn 4 different methods and understand when they fail\n", " * perturbations\n", " * correlations\n", " * **simultaneous fitting/regression**\n", " * instrumental variables\n", "\n", "**Tutorial 3 objectives**\n", "\n", "In tutorial 2 we explored correlation as an approximation for causation and learned that correlation $\\neq$ causation for larger networks. However, computing correlations is a rather simple approach, and you may be wondering: will more sophisticated techniques allow us to better estimate causality? Can't we control things?\n", "\n", "Here we'll use some common advanced (but controversial) methods that estimate causality from observational data. These methods rely on fitting a function to our data directly, instead of trying to use perturbations or correlations. Since we have the full closed-form equation of our system, we can try these methods and see how well they work in estimating causal connectivity when there are no perturbations. 
Specifically, we will:\n", "\n", "- Learn about more advanced (but also controversial) techniques for estimating causality\n", " * conditional probabilities (**regression**)\n", "- Explore limitations and failure modes\n", " * understand the problem of **omitted variable bias**\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @markdown\n", "from IPython.display import IFrame\n", "from ipywidgets import widgets\n", "out = widgets.Output()\n", "with out:\n", " print(f\"If you want to download the slides: https://osf.io/download/gp4m9/\")\n", " display(IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/gp4m9/?direct%26mode=render%26action=download%26mode=render\", width=730, height=410))\n", "display(out)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install and import feedback gadget\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Install and import feedback gadget\n", "\n", "!pip3 install vibecheck datatops --quiet\n", "\n", "from vibecheck import DatatopsContentReviewContainer\n", "def content_review(notebook_section: str):\n", " return DatatopsContentReviewContainer(\n", " \"\", # No text prompt\n", " notebook_section,\n", " {\n", " \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n", " \"name\": \"neuromatch_cn\",\n", " \"user_key\": \"y1x3mpx5\",\n", " },\n", " ).render()\n", "\n", "\n", "feedback_prefix = \"W3D5_T3\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# Imports\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "from sklearn.multioutput import MultiOutputRegressor\n", "from sklearn.linear_model import Lasso" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Figure Settings\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Figure Settings\n", "import logging\n", "logging.getLogger('matplotlib.font_manager').disabled = True\n", "\n", "import ipywidgets as widgets # interactive display\n", "%config InlineBackend.figure_format = 'retina'\n", "plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting Functions\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Plotting Functions\n", "\n", "def see_neurons(A, ax, ratio_observed=1, arrows=True, show=False):\n", " \"\"\"\n", " Visualizes the connectivity matrix.\n", "\n", " Args:\n", " A (np.ndarray): the connectivity matrix of shape (n_neurons, n_neurons)\n", " ax (plt.axis): the matplotlib axis to display on\n", "\n", " Returns:\n", " Nothing, but visualizes A.\n", " \"\"\"\n", " n = len(A)\n", "\n", " ax.set_aspect('equal')\n", " thetas = np.linspace(0, np.pi * 2, n, endpoint=False)\n", " x, y = np.cos(thetas), np.sin(thetas),\n", " if arrows:\n", " for i in range(n):\n", " for j in range(n):\n", " if A[i, j] > 0:\n", " 
ax.arrow(x[i], y[i], x[j] - x[i], y[j] - y[i], color='k',\n", " head_width=.05, width = A[i, j] / 25, shape='right',\n", " length_includes_head=True, alpha=.2)\n", " if ratio_observed < 1:\n", " nn = int(n * ratio_observed)\n", " ax.scatter(x[:nn], y[:nn], c='r', s=150, label='Observed')\n", " ax.scatter(x[nn:], y[nn:], c='b', s=150, label='Unobserved')\n", " ax.legend(fontsize=15)\n", " else:\n", " ax.scatter(x, y, c='k', s=150)\n", " ax.axis('off')\n", " if show:\n", " plt.show()\n", "\n", "\n", "def plot_connectivity_matrix(A, ax=None):\n", " \"\"\"Plot the (weighted) connectivity matrix A as a heatmap\n", "\n", " Args:\n", " A (ndarray): connectivity matrix (n_neurons by n_neurons)\n", " ax: axis on which to display connectivity matrix\n", " \"\"\"\n", " if ax is None:\n", " ax = plt.gca()\n", " lim = np.abs(A).max()\n", " ax.imshow(A, vmin=-lim, vmax=lim, cmap=\"coolwarm\")\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Helper Functions\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Helper Functions\n", "\n", "def sigmoid(x):\n", " \"\"\"\n", " Compute sigmoid nonlinearity element-wise on x.\n", "\n", " Args:\n", " x (np.ndarray): the numpy data array we want to transform\n", " Returns\n", " (np.ndarray): x with sigmoid nonlinearity applied\n", " \"\"\"\n", " return 1 / (1 + np.exp(-x))\n", "\n", "\n", "def create_connectivity(n_neurons, random_state=42, p=0.9):\n", " \"\"\"\n", " Generate our nxn causal connectivity matrix.\n", "\n", " Args:\n", " n_neurons (int): the number of neurons in our system.\n", " random_state (int): random seed for reproducibility\n", " p (float): probability that any given entry of A is zero\n", "\n", " Returns:\n", " A (np.ndarray): sparse connectivity matrix (a fraction 1 - p of entries are nonzero)\n", " \"\"\"\n", " np.random.seed(random_state)\n", " A_0 = np.random.choice([0, 1], size=(n_neurons, n_neurons), p=[p, 1 - p])\n", "\n", " # set the timescale of the dynamical system to about 100 steps\n", " _, s_vals, _ = np.linalg.svd(A_0)\n", " if s_vals[0] != 0 and not np.isnan(s_vals[0]):\n", " A = A_0 / (1.01 * s_vals[0])\n", " else:\n", " eps = 1e-12\n", " A = eps*np.ones_like(A_0) # if denominator is zero, set A to a small value\n", "\n", " return A\n", "\n", "\n", "def simulate_neurons(A, timesteps, random_state=42):\n", " \"\"\"\n", " Simulates a dynamical system for the specified number of neurons and timesteps.\n", "\n", " Args:\n", " A (np.array): the connectivity matrix\n", " timesteps (int): the number of timesteps to simulate our system.\n", " random_state (int): random seed for reproducibility\n", "\n", " Returns:\n", " X (np.ndarray): simulated activity matrix of shape (n_neurons, timesteps).\n", " \"\"\"\n", " np.random.seed(random_state)\n", "\n", "\n", " n_neurons = len(A)\n", " X = np.zeros((n_neurons, timesteps))\n", "\n", " for t in range(timesteps - 1):\n", " epsilon = np.random.multivariate_normal(np.zeros(n_neurons),\n", " np.eye(n_neurons))\n", " X[:, t + 1] = sigmoid(A.dot(X[:, t]) + epsilon)\n", "\n", " assert epsilon.shape == (n_neurons, )\n", " return X\n", "\n", "\n", "def get_sys_corr(n_neurons, timesteps, random_state=42, neuron_idx=None):\n", " \"\"\"\n", " A wrapper function for our correlation calculations between A and R.\n", "\n", " Args:\n", " n_neurons (int): the number of neurons in our system.\n", " timesteps (int): the number of timesteps to simulate our system.\n", " random_state (int): seed for reproducibility\n", " neuron_idx (int): optionally provide a neuron idx to slice out\n", "\n", " 
Returns:\n", " A single float correlation value representing the similarity between A and R\n", " \"\"\"\n", "\n", " A = create_connectivity(n_neurons, random_state)\n", " X = simulate_neurons(A, timesteps)\n", "\n", " R = correlation_for_all_neurons(X)\n", "\n", " return np.corrcoef(A.flatten(), R.flatten())[0, 1]\n", "\n", "\n", "def correlation_for_all_neurons(X):\n", " \"\"\"\n", " Computes the connectivity matrix for the all neurons using correlations\n", "\n", " Args:\n", " X: the matrix of activities\n", "\n", " Returns:\n", " estimated_connectivity (np.ndarray): estimated connectivity for the\n", " selected neuron, of shape (n_neurons,)\n", " \"\"\"\n", " n_neurons = len(X)\n", " S = np.concatenate([X[:, 1:], X[:, :-1]], axis=0)\n", " R = np.corrcoef(S)[:n_neurons, n_neurons:]\n", " return R" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "The helper functions defined above are:\n", "- `sigmoid`: computes sigmoid nonlinearity element-wise on input, from Tutorial 1\n", "- `create_connectivity`: generates nxn causal connectivity matrix., from Tutorial 1\n", "- `simulate_neurons`: simulates a dynamical system for the specified number of neurons and timesteps, from Tutorial 1\n", "- `get_sys_corr`: a wrapper function for correlation calculations between A and R, from Tutorial 2\n", "- `correlation_for_all_neurons`: computes the connectivity matrix for the all neurons using correlations, from Tutorial 2" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 1: Regression: recovering connectivity by model fitting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 1: Regression approach\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 1: Regression approach\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'Av4LaXZdgDo'), ('Bilibili', 'BV1m54y1q78b')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in 
range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Regression_approach_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "You may be familiar with the idea that correlation only implies causation when there are no hidden *confounders*. This aligns with our intuition that correlation only implies causality when no alternative variables could explain away a correlation.\n", "\n", "**A confounding example**:\n", "Suppose you observe that people who sleep more do better in school. It's a nice correlation. But what else could explain it? Maybe people who sleep more are richer, don't work a second job, and have time to actually do homework. If you want to ask if sleep *causes* better grades, and want to answer that with correlations, you have to control for all possible confounds.\n", "\n", "A confound is any variable that affects both the outcome and your original covariate. In our example, confounds are things that affect both sleep and grades.\n", "\n", "**Controlling for a confound**:\n", "Confonds can be controlled for by adding them as covariates in a regression. But for your coefficients to be causal effects, you need three things:\n", "\n", "1. **All** confounds are included as covariates\n", "2. Your regression assumes the same mathematical form of how covariates relate to outcomes (linear, GLM, etc.)\n", "3. No covariates are caused *by* both the treatment (original variable) and the outcome. These are [colliders](https://en.wikipedia.org/wiki/Collider_(statistics)); we won't introduce it today (but Google it on your own time! Colliders are very counterintuitive.)\n", "\n", "In the real world it is very hard to guarantee these conditions are met. In the brain it's even harder (as we can't measure all neurons). Luckily today we simulated the system ourselves." 
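] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Before moving on, here is a minimal toy sketch (not part of the original exercises) of what \"controlling for a confound\" looks like in code. The variables `z`, `x`, and `y` below are hypothetical stand-ins for a confound, a treatment, and an outcome: regressing the outcome on the treatment alone yields a biased coefficient, whereas adding the confound as a covariate recovers an estimate close to the true effect." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# Toy illustration of confounding and of controlling for it by adding a covariate\n", "# (hypothetical example, not part of the original exercises)\n", "import numpy as np\n", "\n", "np.random.seed(0)\n", "n_samples = 5000\n", "true_effect_of_x = 0.2\n", "\n", "z = np.random.randn(n_samples) # confound (e.g., wealth)\n", "x = 0.8 * z + np.random.randn(n_samples) # treatment (e.g., hours of sleep)\n", "y = true_effect_of_x * x + 1.5 * z + np.random.randn(n_samples) # outcome (e.g., grades)\n", "\n", "# Regressing y on x alone: the coefficient also absorbs the confound's influence\n", "naive = np.linalg.lstsq(x[:, None], y, rcond=None)[0]\n", "\n", "# Regressing y on x and z: the coefficient on x is close to the true effect\n", "controlled = np.linalg.lstsq(np.column_stack([x, z]), y, rcond=None)[0]\n", "\n", "print(f\"True causal effect of x on y: {true_effect_of_x}\")\n", "print(f\"Estimate without controlling for z: {naive[0]:.2f} (biased upward)\")\n", "print(f\"Estimate controlling for z: {controlled[0]:.2f}\")"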
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 2: Fitting a GLM\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 2: Fitting a GLM\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'GvMj9hRv5Ak'), ('Bilibili', 'BV16p4y1S7yE')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Fitting_a_GLM_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "\n", "\n", "Recall that in our system each neuron affects every other via:\n", "\n", "$$\n", "\\vec{x}_{t+1} = \\sigma(A\\vec{x}_t + \\epsilon_t),\n", "$$\n", "\n", "where $\\sigma$ is our sigmoid nonlinearity from before: $\\sigma(x) = \\frac{1}{1 + e^{-x}}$\n", "\n", "Our system is a closed system, too, so there are no omitted variables. The regression coefficients should therefore be the causal effects. Are they?" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "We will use a regression approach to estimate the causal influence of all neurons on neuron #1. Specifically, we will use linear regression to determine the $A$ in:\n", "\n", "\\begin{equation}\n", "\\sigma^{-1}(\\vec{x}_{t+1}) = A\\vec{x}_t + \\epsilon_t ,\n", "\\end{equation}\n", "\n", "where $\\sigma^{-1}$ is the inverse sigmoid transformation, also sometimes referred to as the **logit** transformation: $\\sigma^{-1}(x) = \\log(\\frac{x}{1-x})$.\n", "\n", "Let $T$ be the total number of timesteps, so the simulated activities are $\\vec{x}_0, \\vec{x}_1, \\dots, \\vec{x}_{T-1}$. Let $W$ contain the $\\vec{x}_t$ values up to the second-to-last timestep $T-2$:\n", "\n", "\\begin{equation}\n", "W =\n", "\\begin{bmatrix}\n", "\\mid & \\mid & ... & \\mid \\\\\n", "\\vec{x}_0 & \\vec{x}_1 & ... & \\vec{x}_{T-2} \\\\\n", "\\mid & \\mid & ... & \\mid\n", "\\end{bmatrix}_{n \\times (T-1)}\n", "\\end{equation}\n", "\n", "Let $Y$ be the $\\vec{x}_{t+1}$ values for a selected neuron, indexed by $i$, starting from the second timestep up to the last timestep $T-1$:\n", "\n", "\\begin{equation}\n", "Y =\n", "\\begin{bmatrix}\n", "x_{i,1} & x_{i,2} & ... & x_{i, T-1} \\\\\n", "\\end{bmatrix}_{1 \\times (T-1)}\n", "\\end{equation}\n", "\n", "You will then fit the following model:\n", "\n", "\\begin{equation}\n", "\\sigma^{-1}(Y^\\top) = W^\\top V\n", "\\end{equation}\n", "\n", "where $V$ is the $n \\times 1$ vector of regression coefficients, which will be the estimated connectivity between the selected neuron and the rest of the neurons.\n", "\n", "**Review**: As you learned in Week 1, *lasso* a.k.a. **$L_1$ regularization** causes the coefficients to be sparse, containing mostly zeros. Think about why we want this here." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Coding Exercise 1: Use linear regression plus lasso to estimate causal connectivities\n", "\n", "You will now create a function that fits the above regression model to estimate $V$. We will then call this function to examine how close the regression-based estimate, compared with the correlation-based estimate, is to the true causal connectivity.\n", "\n", "**Code**:\n", "\n", "You'll notice that we've transposed both $Y$ and $W$ here and in the code we've already provided below. Why is that?\n", "\n", "This is because the machine learning models provided in scikit-learn expect the *rows* of the input data to be the observations, while the *columns* are the variables. We have that inverted in our definitions of $Y$ and $W$, with the timesteps of our system (the observations) as the columns. 
So we transpose both matrices to make the matrix orientation correct for scikit-learn.\n", "\n", "\n", "- Because of the abstraction provided by scikit-learn, fitting this regression will just be a call to initialize the `Lasso()` estimator and a call to the `fit()` function\n", "- Use the following hyperparameters for the `Lasso` estimator:\n", " - `alpha = 0.01`\n", " - `fit_intercept = False`\n", "- How do we obtain $V$ from the fitted model?\n", "\n", "We will use the helper function `logit`.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# Set parameters\n", "n_neurons = 50 # the size of our system\n", "timesteps = 10000 # the number of timesteps to take\n", "random_state = 42\n", "neuron_idx = 1\n", "\n", "# Set up system and simulate\n", "A = create_connectivity(n_neurons, random_state)\n", "X = simulate_neurons(A, timesteps)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Execute this cell to enable helper function `logit`\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Execute this cell to enable helper function `logit`\n", "\n", "def logit(x):\n", " \"\"\"\n", " Applies the logit (inverse sigmoid) transformation\n", "\n", " Args:\n", " x (np.ndarray): the numpy data array we want to transform\n", " Returns\n", " (np.ndarray): x with logit nonlinearity applied\n", " \"\"\"\n", " return np.log(x/(1-x))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "both", "execution": {} }, "outputs": [], "source": [ "def get_regression_estimate(X, neuron_idx):\n", " \"\"\"\n", " Estimates the connectivity matrix using lasso regression.\n", "\n", " Args:\n", " X (np.ndarray): our simulated system of shape (n_neurons, timesteps)\n", " neuron_idx (int): a neuron index to compute connectivity for\n", "\n", " Returns:\n", " V (np.ndarray): estimated connectivity matrix of shape (n_neurons, n_neurons).\n", " if neuron_idx is specified, V is of shape (n_neurons,).\n", " \"\"\"\n", " # Extract Y and W as defined above\n", " W = X[:, :-1].transpose()\n", " Y = X[[neuron_idx], 1:].transpose()\n", "\n", " # Apply inverse sigmoid transformation\n", " Y = logit(Y)\n", "\n", " ############################################################################\n", " ## TODO: Insert your code here to fit a regressor with Lasso. 
Lasso captures\n", " ## our assumption that most connections are precisely 0.\n", " ## Fill in the function and remove the NotImplementedError\n", " raise NotImplementedError(\"Please complete the regression exercise\")\n", " ############################################################################\n", "\n", " # Initialize regression model with no intercept and alpha=0.01\n", " regression = ...\n", "\n", " # Fit regression to the data\n", " regression.fit(...)\n", "\n", " V = regression.coef_\n", "\n", " return V\n", "\n", "\n", "# Estimate causality with regression\n", "V = get_regression_estimate(X, neuron_idx)\n", "\n", "print(f\"Regression: correlation of estimated with true connectivity: {np.corrcoef(A[neuron_idx, :], V)[1, 0]:.3f}\")\n", "print(f\"Lagged correlation of estimated with true connectivity: {get_sys_corr(n_neurons, timesteps, random_state, neuron_idx=neuron_idx):.3f}\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "You should find that:\n", "\n", "```\n", "Regression: correlation of estimated with true connectivity: 0.865\n", "Lagged correlation of estimated with true connectivity: 0.703\n", "```" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/main/tutorials/W3D5_NetworkCausality/solutions/W3D5_Tutorial3_Solution_9134c469.py)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Linear_regression_with_lasso_to_estimate_causal_connectivities_Exercise\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "\n", "\n", "We can see from these numbers that multiple regression is better than simple correlation for estimating connectivity." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 2: Partially Observed Systems\n", "\n", "*Estimated timing to here from start of tutorial: 10 min*\n", "\n", "If we are unable to observe the entire system, **omitted variable bias** becomes a problem. 
If we don't have access to all of the neurons, and therefore can't control for them, can we still estimate causal effects accurately?\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 3: Omitted variable bias\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 3: Omitted variable bias\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', '5CCib6CTMac'), ('Bilibili', 'BV1ov411i7dc')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Omitted_variable_bias_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "**Video correction**: the labels \"connectivity from\"/\"connectivity to\" are swapped in the video but fixed in the figures/demos below." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "We first visualize different subsets of the connectivity matrix when we observe 75% of the neurons vs 25%.\n", "\n", "Recall the meaning of entries in our connectivity matrix: $A[i,j] = 1$ means a connection of strength $1$ **from** neuron $j$ **to** neuron $i$. In other words, the column indexes the source neuron (\"connectivity from\") and the row indexes the target neuron (\"connectivity to\"), matching the update rule $\\vec{x}_{t+1} = \\sigma(A\\vec{x}_t + \\epsilon_t)$ and the axis labels in the figures below."
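] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "If you want to double-check this convention against the simulation code, the short illustrative cell below (an addition, not part of the original exercises) builds a connectivity matrix with a single nonzero entry, $A[2, 7]$, and computes lagged correlations in both directions. Only the direction implied by the update rule (neuron $7$ driving neuron $2$) shows a clear relationship." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# Sanity check of the connectivity convention (illustrative, not part of the\n", "# original exercises): with the update rule x_{t+1} = sigmoid(A x_t + noise),\n", "# entry A[i, j] carries the influence of neuron j onto neuron i.\n", "A_check = np.zeros((10, 10))\n", "A_check[2, 7] = 2.0 # a single connection: neuron 7 -> neuron 2\n", "X_check = simulate_neurons(A_check, 5000)\n", "\n", "# Lagged correlations in both directions\n", "r_7_to_2 = np.corrcoef(X_check[7, :-1], X_check[2, 1:])[0, 1]\n", "r_2_to_7 = np.corrcoef(X_check[2, :-1], X_check[7, 1:])[0, 1]\n", "\n", "print(f\"corr(x_7(t), x_2(t+1)) = {r_7_to_2:.2f} (clearly positive: 7 drives 2)\")\n", "print(f\"corr(x_2(t), x_7(t+1)) = {r_2_to_7:.2f} (near zero: 2 does not drive 7)\")"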
] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Execute this cell to visualize subsets of connectivity matrix\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Execute this cell to visualize subsets of connectivity matrix\n", "\n", "# Run this cell to visualize the subsets of variables we observe\n", "n_neurons = 25\n", "A = create_connectivity(n_neurons)\n", "\n", "fig, axs = plt.subplots(2, 2, figsize=(10, 10))\n", "ratio_observed = [0.75, 0.25] # the proportion of neurons observed in our system\n", "\n", "for i, ratio in enumerate(ratio_observed):\n", " sel_idx = int(n_neurons * ratio)\n", "\n", " offset = np.zeros((n_neurons, n_neurons))\n", " axs[i, 1].title.set_text(f\"{int(ratio * 100)}% neurons observed\")\n", " offset[:sel_idx, :sel_idx] = 1 + A[:sel_idx, :sel_idx]\n", " im = axs[i, 1].imshow(offset, cmap=\"coolwarm\", vmin=0, vmax=A.max() + 1)\n", " axs[i, 1].set_xlabel(\"Connectivity from\")\n", " axs[i, 1].set_ylabel(\"Connectivity to\")\n", " plt.colorbar(im, ax=axs[i, 1], fraction=0.046, pad=0.04)\n", " see_neurons(A, axs[i, 0], ratio)\n", "\n", "plt.suptitle(\"Visualizing subsets of the connectivity matrix\")\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Interactive Demo 3: Regression performance as a function of the number of observed neurons\n", "\n", "We will first change the number of observed neurons in the network and inspect the resulting estimates of connectivity in this interactive demo. How does the estimated connectivity differ?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Execute this cell to get helper functions `get_regression_estimate_full_connectivity` and `get_regression_corr_full_connectivity`\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Execute this cell to get helper functions `get_regression_estimate_full_connectivity` and `get_regression_corr_full_connectivity`\n", "\n", "def get_regression_estimate_full_connectivity(X):\n", " \"\"\"\n", " Estimates the connectivity matrix using lasso regression.\n", "\n", " Args:\n", " X (np.ndarray): our simulated system of shape (n_neurons, timesteps)\n", " neuron_idx (int): optionally provide a neuron idx to compute connectivity for\n", " Returns:\n", " V (np.ndarray): estimated connectivity matrix of shape (n_neurons, n_neurons).\n", " if neuron_idx is specified, V is of shape (n_neurons,).\n", " \"\"\"\n", " n_neurons = X.shape[0]\n", "\n", " # Extract Y and W as defined above\n", " W = X[:, :-1].transpose()\n", " Y = X[:, 1:].transpose()\n", "\n", " # apply inverse sigmoid transformation\n", " Y = logit(Y)\n", "\n", " # fit multioutput regression\n", " reg = MultiOutputRegressor(Lasso(fit_intercept=False,\n", " alpha=0.01, max_iter=250),\n", " n_jobs=-1)\n", " reg.fit(W, Y)\n", "\n", " V = np.zeros((n_neurons, n_neurons))\n", " for i, estimator in enumerate(reg.estimators_):\n", " V[i, :] = estimator.coef_\n", "\n", " return V\n", "\n", "\n", "def get_regression_corr_full_connectivity(n_neurons, A, X, observed_ratio,\n", " regression_args):\n", " \"\"\"\n", " A wrapper function for our correlation calculations between A and the V estimated\n", " from regression.\n", "\n", " Args:\n", " n_neurons (int): number of neurons\n", " A (np.ndarray): connectivity 
matrix\n", " X (np.ndarray): dynamical system\n", " observed_ratio (float): the proportion of n_neurons observed, must be betweem 0 and 1.\n", " regression_args (dict): dictionary of lasso regression arguments and hyperparameters\n", "\n", " Returns:\n", " A single float correlation value representing the similarity between A and R\n", " \"\"\"\n", " assert (observed_ratio > 0) and (observed_ratio <= 1)\n", "\n", " sel_idx = np.clip(int(n_neurons*observed_ratio), 1, n_neurons)\n", "\n", " sel_X = X[:sel_idx, :]\n", " sel_A = A[:sel_idx, :sel_idx]\n", "\n", " sel_V = get_regression_estimate_full_connectivity(sel_X)\n", " return np.corrcoef(sel_A.flatten(), sel_V.flatten())[1, 0], sel_V" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Execute this cell to enable the widgets\n", "\n", "# @markdown Note: The plots will take a few seconds to update after moving the slider.\n", "\n", "n_neurons = 50\n", "A = create_connectivity(n_neurons, random_state=42)\n", "X = simulate_neurons(A, 4000, random_state=42)\n", "\n", "reg_args = {\n", " \"fit_intercept\": False,\n", " \"alpha\": 0.001\n", "}\n", "\n", "@widgets.interact(n_observed=widgets.IntSlider(min=5, max=45, step=5,\n", " continuous_update=False))\n", "def plot_observed(n_observed):\n", " to_neuron = 0\n", " fig, axs = plt.subplots(1, 3, figsize=(15, 5))\n", " sel_idx = n_observed\n", " ratio = (n_observed) / n_neurons\n", " offset = np.zeros((n_neurons, n_neurons))\n", " axs[0].title.set_text(f\"{int(ratio * 100)}% neurons observed\")\n", " offset[:sel_idx, :sel_idx] = 1 + A[:sel_idx, :sel_idx]\n", " im = axs[1].imshow(offset, cmap=\"coolwarm\", vmin=0, vmax=A.max() + 1)\n", " plt.colorbar(im, ax=axs[1], fraction=0.046, pad=0.04)\n", "\n", " see_neurons(A,axs[0], ratio, False)\n", " corr, R = get_regression_corr_full_connectivity(n_neurons, A, X,\n", " ratio, reg_args)\n", " big_R = np.zeros(A.shape)\n", " big_R[:sel_idx, :sel_idx] = 1 + R\n", " im = axs[2].imshow(big_R, cmap=\"coolwarm\", vmin=0, vmax=A.max() + 1)\n", " plt.colorbar(im, ax=axs[2],fraction=0.046, pad=0.04)\n", " c = 'w' if n_observed<(n_neurons-3) else 'k'\n", " axs[2].text(0, n_observed + 3, f\"Correlation: {corr:.2f}\", color=c, size=15)\n", " axs[1].title.set_text(\"True connectivity\")\n", " axs[1].set_xlabel(\"Connectivity from\")\n", " axs[1].set_ylabel(\"Connectivity to\")\n", " axs[2].title.set_text(\"Estimated connectivity\")\n", " axs[2].set_xlabel(\"Connectivity from\")\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Next, we will inspect a plot of the correlation between true and estimated connectivity matrices vs the percent of neurons observed over multiple trials.\n", "What is the relationship that you see between performance and the number of neurons observed?\n", "\n", "**Note:** the cell below will take about 25-30 seconds to run." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Plot correlation vs. subsampling\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Plot correlation vs. 
subsampling\n", "\n", "# we'll simulate many systems for various ratios of observed neurons\n", "n_neurons = 50\n", "timesteps = 5000\n", "ratio_observed = [1, 0.75, 0.5, .25, .125] # the proportion of neurons observed in our system\n", "n_trials = 3 # run it this many times to get variability in our results\n", "\n", "reg_args = {\n", " \"fit_intercept\": False,\n", " \"alpha\": 0.001\n", "}\n", "\n", "corr_data = np.zeros((n_trials, len(ratio_observed)))\n", "for trial in range(n_trials):\n", "\n", " A = create_connectivity(n_neurons, random_state=trial)\n", " X = simulate_neurons(A, timesteps)\n", " print(f\"simulating trial {trial + 1} of {n_trials}\")\n", "\n", " for j, ratio in enumerate(ratio_observed):\n", " result,_ = get_regression_corr_full_connectivity(n_neurons, A, X,\n", " ratio, reg_args)\n", " corr_data[trial, j] = result\n", "\n", "corr_mean = np.nanmean(corr_data, axis=0)\n", "corr_std = np.nanstd(corr_data, axis=0)\n", "\n", "plt.plot(np.asarray(ratio_observed) * 100, corr_mean)\n", "plt.fill_between(np.asarray(ratio_observed) * 100,\n", " corr_mean - corr_std, corr_mean + corr_std,\n", " alpha=.2)\n", "plt.xlim([100, 10])\n", "plt.xlabel(\"Percent of neurons observed\")\n", "plt.ylabel(\"connectivity matrices correlation\")\n", "plt.title(\"Performance of regression\\nas a function of the number of neurons observed\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Regression_performance_as_a_function_of_the_number_of_observed_neurons_Interactive_Demo\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Summary\n", "\n", "*Estimated timing of tutorial: 20 min*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 4: Summary\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 4: Summary\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " 
display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'T1uGf1H31wE'), ('Bilibili', 'BV1bh411o73r')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Summary\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "In this tutorial, we learned:\n", "\n", "1. To use regression for estimating causality\n", "2. The problem of omitted variable bias, and how it arises in practice" ] } ], "metadata": { "colab": { "collapsed_sections": [], "include_colab_link": true, "name": "W3D5_Tutorial3", "provenance": [], "toc_visible": true }, "kernel": { "display_name": "Python 3", "language": "python", "name": "python3" }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.17" } }, "nbformat": 4, "nbformat_minor": 0 }