Tutorial 1: Framing the Question#
Week 2, Day 1: Modeling Practice
By Neuromatch Academy
Content creators: Marius ‘t Hart, Megan Peters, Paul Schrater, Gunnar Blohm, Konrad Kording, Marius Pachitariu
Content reviewers: Eric DeWitt, Tara van Viegen, Marius Pachitariu
Production editors: Ella Batty, Spiros Chavlis
Tutorial objectives#
Last week you gained some understanding of what models can buy us in neuroscience. But how do you build a model? Today, we will try to clarify the process of computational modeling by thinking through the logic of modeling based on your project ideas. We will work through the first four steps of modeling (Blohm et al., 2020):
Framing the question
finding a phenomenon and a question to ask about it
understanding the state of the art
determining the basic ingredients
formulating specific, mathematically defined hypotheses
Demos#
We will demo the modeling process to you based on the train illusion. In parallel to the computational model, you will develop questions and hypotheses for a data analysis project based on the same phenomenon. This will be done during “Think!” sections.
Enjoy!
Setup#
Install and import feedback gadget#
Show code cell source
# @title Install and import feedback gadget
!pip3 install vibecheck datatops --quiet
from vibecheck import DatatopsContentReviewContainer
def content_review(notebook_section: str):
    return DatatopsContentReviewContainer(
        "",  # No text prompt
        notebook_section,
        {
            "url": "https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab",
            "name": "neuromatch_cn",
            "user_key": "y1x3mpx5",
        },
    ).render()
feedback_prefix = "W2D1_T1"
# Imports
import numpy as np
import matplotlib.pyplot as plt
# for random distributions:
from scipy.stats import norm, poisson
# for logistic regression:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
Figure settings#
Show code cell source
# @title Figure settings
import logging
logging.getLogger('matplotlib.font_manager').disabled = True
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
plt.style.use("https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle")
Plotting Functions#
Show code cell source
# @title Plotting Functions
def rasterplot(spikes, movement, trial):
    # spikes has shape [movements, trials, neurons, timepoints]
    [movements, trials, neurons, timepoints] = np.shape(spikes)
    trial_spikes = spikes[movement, trial, :, :]
    # convert spike time indices to seconds, centered on movement onset (t = 0 at index 150, dt = 1/100):
    trial_events = [((trial_spikes[x, :] > 0).nonzero()[0] - 150) / 100 for x in range(neurons)]
    plt.figure()
    plt.eventplot(trial_events, linewidths=1)
    plt.title('movement: %d - trial: %d' % (movement, trial))
    plt.ylabel('neuron')
    plt.xlabel('time [s]')
    plt.show()

def plotCrossValAccuracies(accuracies):
    f, ax = plt.subplots(figsize=(8, 3))
    ax.boxplot(accuracies, vert=False, widths=.7)
    ax.scatter(accuracies, np.ones(len(accuracies)))
    ax.set(xlabel="Accuracy", yticks=[],
           title=f"Average test accuracy: {accuracies.mean():.2%}")
    ax.spines["left"].set_visible(False)
    plt.show()
Generate Data#
Show code cell source
# @title Generate Data
def generateSpikeTrains():
    gain = 2
    neurons = 50
    movements = [0, 1, 2]
    repetitions = 800
    np.random.seed(37)

    # set up the basic parameters:
    dt = 1/100
    start, stop = -1.5, 1.5
    t = np.arange(start, stop + dt, dt)  # a time interval
    Velocity_sigma = 0.5  # std dev of the velocity profile
    Velocity_Profile = norm.pdf(t, 0, Velocity_sigma) / norm.pdf(0, 0, Velocity_sigma)  # Gaussian velocity profile, normalized to a peak of 1

    # set up the neuron properties:
    Gains = np.random.rand(neurons) * gain  # random sensitivity between 0 and `gain`
    FRs = (np.random.rand(neurons) * 60) - 10  # random base firing rate between -10 and 50

    # output matrix will have this shape:
    target_shape = [len(movements), repetitions, neurons, len(Velocity_Profile)]

    # build matrix for spikes; first, they depend on the velocity profile:
    Spikes = np.repeat(Velocity_Profile.reshape([1, 1, 1, len(Velocity_Profile)]),
                       len(movements) * repetitions * neurons, axis=2).reshape(target_shape)

    # multiplied by gains:
    S_gains = np.repeat(np.repeat(Gains.reshape([1, 1, neurons]), len(movements) * repetitions, axis=1).reshape(target_shape[:3]),
                        len(Velocity_Profile)).reshape(target_shape)
    Spikes = Spikes * S_gains

    # and multiplied by the movement:
    S_moves = np.repeat(np.array(movements).reshape([len(movements), 1, 1, 1]),
                        repetitions * neurons * len(Velocity_Profile), axis=3).reshape(target_shape)
    Spikes = Spikes * S_moves

    # on top of a baseline firing rate:
    S_FR = np.repeat(np.repeat(FRs.reshape([1, 1, neurons]),
                               len(movements) * repetitions, axis=1).reshape(target_shape[:3]),
                     len(Velocity_Profile)).reshape(target_shape)
    Spikes = Spikes + S_FR

    # the Poisson random number generator cannot run on input lower than 0:
    Spikes = np.where(Spikes < 0, 0, Spikes)

    # so far, these were expected firing rates per second; correct for dt:
    Spikes = poisson.rvs(Spikes * dt)

    return Spikes


def subsetPerception(spikes):
    movements = [0, 1, 2]
    split = 400
    subset = 40
    hwin = 3

    [num_movements, repetitions, neurons, timepoints] = np.shape(spikes)

    decision = np.zeros([num_movements, repetitions])

    # ground truth for logistic regression:
    y_train = np.repeat([0, 1, 1], split)
    y_test = np.repeat([0, 1, 1], repetitions - split)

    m_train = np.repeat(movements, split)
    m_test = np.repeat(movements, repetitions - split)

    # reproduce the time points:
    dt = 1/100
    start, stop = -1.5, 1.5
    t = np.arange(start, stop + dt, dt)

    w_idx = list((abs(t) < (hwin * dt)).nonzero()[0])
    w_0 = min(w_idx)
    w_1 = max(w_idx) + 1  # +1 because Python slicing excludes the end index

    # get the total spike counts from stationary and movement trials:
    spikes_stat = np.sum(spikes[0, :, :, :], axis=2)
    spikes_move = np.sum(spikes[1:, :, :, :], axis=3)

    train_spikes_stat = spikes_stat[:split, :]
    train_spikes_move = spikes_move[:, :split, :].reshape([-1, neurons])

    test_spikes_stat = spikes_stat[split:, :]
    test_spikes_move = spikes_move[:, split:, :].reshape([-1, neurons])

    # data to use to predict y:
    x_train = np.concatenate((train_spikes_stat, train_spikes_move))
    x_test = np.concatenate((test_spikes_stat, test_spikes_move))

    # this line creates a logistic regression model object, and immediately fits it:
    population_model = LogisticRegression(solver='liblinear', random_state=0).fit(x_train, y_train)
    # solver, one of: 'liblinear', 'newton-cg', 'lbfgs', 'sag', and 'saga'
    # some of those require certain other options
    # print(population_model.coef_)       # slope
    # print(population_model.intercept_)  # intercept

    ground_truth = np.array(population_model.predict(x_test))
    ground_truth = ground_truth.reshape([3, -1])

    output = {}
    output['perception'] = ground_truth
    output['spikes'] = spikes[:, split:, :subset, :]

    return output


def getData():
    spikes = generateSpikeTrains()
    dataset = subsetPerception(spikes=spikes)
    return dataset


dataset = getData()
perception = dataset['perception']
spikes = dataset['spikes']
dataset = getData()
perception = dataset['perception']
spikes = dataset['spikes']
Step 0: Introduction#
Video 1: Introduction to tutorial#
Submit your feedback#
Show code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Introduction_to_tutorial_Video")
Step 1: Finding a phenomenon and a question to ask about it#
Video 2: Asking a question#
Submit your feedback#
Show code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Asking_a_question_Video")
Example projects step 1#
Show code cell source
# @title Example projects step 1
from ipywidgets import widgets
from IPython.display import Markdown
markdown1 = '''
## Think! 1: Asking a question
<br>
<font size='3pt'>
The train illusion occurs when sitting on a train and viewing another train outside the window. Suddenly, the other train *seems* to move, i.e. you experience visual motion of the other train relative to your train. But which train is actually moving?
Often people mix this up. In particular, they think their own train might be moving when it's the other train that moves, or vice versa. The illusion is usually resolved once you get a view of the surroundings that lets you disambiguate the relative motion, or if you experience strong vibrations indicating that it is indeed your own train that is in motion.

We assume that we have built the train illusion model (see the other example project colab). That model predicts that accumulated sensory evidence from vestibular signals determines the decision of whether self-motion is experienced or not. We now have vestibular neuron data (simulated in our case, but let's pretend) and would like to see if that prediction holds true.

The data contain *N* neurons and *M* trials for each of 3 motion conditions: no self-motion, slowly accelerating self-motion, and faster accelerating self-motion. In our data, *N* = 40 and *M* = 400.
**Discuss between yourselves and come up with a question for this data analysis project.**
</font>
'''
out1 = widgets.Output()
with out1:
    display(Markdown(markdown1))

out = widgets.Tab([out1])
out.set_title(0, 'Data analysis')
display(out)
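As a quick, optional sanity check (our addition, not part of the original demo), you can confirm that the dataset loaded above matches this description: 3 motion conditions, *M* = 400 trials, *N* = 40 neurons, and one perceptual judgement per trial.

# Optional sanity check of the simulated dataset (illustrative):
print(spikes.shape)      # expected: (3, 400, 40, 301), i.e. conditions x trials x neurons x timepoints
print(perception.shape)  # expected: (3, 400), one self-motion judgement per trial
rasterplot(spikes, movement=1, trial=22)  # spike raster for one example trial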
As a reminder, here is what you should discuss and write down:

What exact aspect of this neural data (with N neurons and M trials) needs modeling?
Define an evaluation method!
How will you know your modeling is good?
E.g., comparison to specific data (quantitative method of comparison? See the sketch below for one possibility.)
Think of an experiment that could test your model.
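For instance, one quantitative evaluation method could be cross-validated decoding accuracy. Below is a minimal sketch under our own assumptions (it is not the prescribed answer); it reuses the `cross_val_score` import and the `plotCrossValAccuracies` helper loaded above.

# Hypothetical quantitative evaluation: cross-validated decoding accuracy.
# Features: total spike count per trial and neuron; labels: self-motion or not.
X = spikes.sum(axis=-1).reshape(-1, spikes.shape[2])  # -> (1200 trials, 40 neurons)
y = np.repeat([0, 1, 1], spikes.shape[1])             # 0 = stationary condition
accuracies = cross_val_score(LogisticRegression(solver='liblinear'), X, y, cv=8)
plotCrossValAccuracies(accuracies)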
Make sure to avoid the pitfalls!
Click here for a recap on pitfalls
Question is too general
- Remember: science advances one small step at a time. Get the small step right…
Precise aspect of phenomenon you want to model is unclear
- You will fail to ask a meaningful question
You have already chosen a toolkit
- This will prevent you from thinking deeply about the best way to answer your scientific question
You don’t have a clear goal
- What do you want to get out of modeling?
You don’t have a potential experiment in mind
- This will help concretize your objectives and think through the logic behind your goal
Note
The hardest part is Step 1. Once that is properly set up, all the other steps should be easier. BUT: you will often think that Step 1 is done, only to figure out in a later step (anywhere, really) that you were not as clear on your question and goal as you thought. Revisiting Step 1 is a frequent necessity. Don't feel bad about it. You can revisit Step 1 later; for now, let's move on to the next step.
Submit your feedback#
Show code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Asking_your_own_question_Discussion")
Step 2: Understanding the state of the art & background#
Here you will learn how to do a literature review (to be done for your project AFTER this tutorial!).
Video 3: Literature Review & Background Knowledge#
Submit your feedback#
Show code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Literature_Review_and_Background_Knowledge_Video")
Submit your feedback#
Show code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Literature_review_Discussion")
Step 3: Determining the basic ingredients#
Video 4: Determining basic ingredients#
Submit your feedback#
Show code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Submit_your_feedback_Video")
Example projects step 3#
Show code cell source
# @title Example projects step 3
from ipywidgets import widgets
from IPython.display import Markdown, Math
markdown1 = r'''
## Think! 3: Determine your basic ingredients
<br>
<font size='3pt'>
The idea of our project is that the vestibular signals are noisy, so they might be misinterpreted by the brain. We did some brainstorming and think that we need to somehow extract the self-motion judgements from the spike counts of our neurons. Based on that, our algorithm needs to make a decision: was there self-motion or not? This is a classic 2-choice classification problem.
We will have to transform the raw spike data into the right input for the algorithm (spike pre-processing).
In order to address our question, we need to design an appropriate computational data analysis pipeline.
**What ingredients do we need? What variables from the data should we extract, and what methods should we apply to them?**
*Discuss for about 10 minutes*
</font>
'''
# No idea why this is necessary but math doesn't render properly without it
display(Markdown(r""))
out1 = widgets.Output()
with out1:
    display(Markdown(markdown1))

out = widgets.Tab([out1])
out.set_title(0, 'Data analysis project')
display(out)
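To make the "ingredients" concrete, here is a minimal sketch of one possible pre-processing step; the 1-second window is our assumption, not the prescribed solution. It reduces each trial to a single spike count per neuron, taken from a window around peak acceleration (t = 0).

# One possible pre-processing ingredient (illustrative):
# per-trial spike counts from a window around peak acceleration.
dt = 1 / 100
t = np.arange(-1.5, 1.5 + dt, dt)              # time axis used to simulate the data
window = np.abs(t) < 0.5                       # hypothetical 1 s window around t = 0
counts = spikes[:, :, :, window].sum(axis=-1)  # -> (conditions, trials, neurons)
print(counts.shape)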
Make sure to avoid the pitfalls!
Click here for a recap on pitfalls
I’m experienced, I don’t need to think about ingredients anymore
- Or so you think…
I can’t think of any ingredients
- Think about the potential experiment. What are your stimuli? What parameters? What would you control? What do you measure?
I have all inputs and outputs
- Good! But what will link them? Thinking about that will start shaping your model and hypotheses
I can’t think of any links (= mechanisms)
- You will acquire a library of potential mechanisms as you keep modeling and learning
- But the literature will often give you hints through hypotheses
- If you still can't think of links, then maybe you're missing ingredients?
Submit your feedback#
Show code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Determine_your_basic_ingredients_Discussion")
Step 4: Formulating specific, mathematically defined hypotheses#
Video 5: Formulating a hypothesis#
Submit your feedback#
Show code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Formulating_your_hypothesis_Video")
Example projects step 4#
Show code cell source
# @title Example projects step 4
from ipywidgets import widgets
from IPython.display import Markdown
# Not writing in latex because that didn't render in jupyterbook
markdown1 = r'''
## Think! 4: Formulating your hypothesis
<br>
<font size='3pt'>
We think that noise in the signal drives whether or not people perceive self-motion. Maybe the brain uses the strongest signal at peak acceleration to decide on self-motion, but we actually think it is better to accumulate evidence over some period of time. We want to test this. The noise idea also means that when the signal-to-noise ratio is higher, the brain does better, and this would be the case in the faster acceleration condition. We want to test this too.
**Come up with hypotheses focusing on specific details of our overall research question.**
Can you write down your hypotheses mathematically, using for example the ingredients and variables from the previous step?
*Work on this for 15 minutes.*
</font>
'''
# No idea why this is necessary but math doesn't render properly without it
display(Markdown(r""))
out1 = widgets.Output()
with out1:
    display(Markdown(markdown1))

out = widgets.Tab([out1])
out.set_title(0, 'Data analysis project')
display(out)
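As an illustration of what "mathematically defined" can look like (our formalization, not the only valid one), let $r_i(t)$ be the spike count of vestibular neuron $i$ in time bin $t$. The two candidate read-outs from the prompt can then be written as

$$E_{\text{peak}} = \max_t \sum_{i=1}^{N} r_i(t), \qquad E_{\text{accum}} = \sum_{t=t_0}^{t_1} \sum_{i=1}^{N} r_i(t),$$

with the judgement "self-motion" whenever the evidence $E$ exceeds some threshold $\theta$. Hypothesis 1: $E_{\text{accum}}$ predicts the perceptual judgements better than $E_{\text{peak}}$. Hypothesis 2: predictions are more accurate in the faster-acceleration condition, where the signal-to-noise ratio is higher.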
Make sure to avoid the pitfalls!
Click here for a recap on pitfalls
I don’t need hypotheses, I will just play around with the model
- Hypotheses help determine and specify goals. You can (and should) still play…
My hypotheses don’t match my question (or vice versa)
- This is a normal part of the process!
- You need to loop back to Step 1 and revisit your question / phenomenon / goals
I can’t write down a math hypothesis
- Often that means you lack ingredients and/or clarity on the hypothesis
- OR: you have a “structural” hypothesis, i.e. you expect a certain model component to be crucial in explaining the phenomenon / answering the question
If you want to learn more about steps 5-10 of modeling from Blohm et al. (2020), you can read the paper later, or check out the optional notebook available here. However, at Neuromatch Academy, we would like you to use a new app built specifically to help with projects: the NMA project planner.
Submit your feedback#
Show code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Formulating_your_hypothesis_Discussion")
NMA project planner#
For the rest of the project time today, and until the end of the summer school, you will use an online tool (here) that helps you organize the questions, hypotheses, datasets, methods, and other ingredients of your project. The tool is built on a large language model that takes your answers, compares them to the requirements of each section, and then gives you feedback on how well your answers match the requirements and what you can do to improve your score. The fields you complete persist on the computer where you complete them, and they can easily be saved, loaded, and shared within the group. Start filling out the tool together (one person shares their screen), and then share the saved file with everyone in the group.
Note that for NMA we do not consider the order in which you complete the project planner important. For some projects, it might make sense to start with the dataset, since you need to understand it well before you can formulate questions about it. For other types of projects, you might want to start with the question, for example if you plan to build your own model of a phenomenon, as in the train illusion. The only thing that matters is that by the end of the summer school all sections have been filled in, and that you have good scores on all sections (7 or above; we won't check, don't worry).
Watch Konrad tell you more about the app he has built in the video below!
Video 6: The NMA project planner#
Summary#
In this tutorial, we worked through some steps of the process of modeling.
We defined a phenomenon and formulated a question (step 1)
We collected information about the state of the art on the topic (step 2)
We determined the basic ingredients (step 3), and used these to formulate a specific mathematically defined hypothesis (step 4)
We further introduced the NMA project planner, which will guide you throughout the entire project development over the next two weeks. For the rest of today, you will use the project planner and the additional guidance in the project guide to make progress. Any progress is good progress, on any of the sections in the project planner. Different projects may progress through the sections in a different order, but what matters is that at the end all the sections are filled out. Find the order that works best for you and your team, and revisit each section as often as needed to make a solid overall project plan.
Reading#
Blohm G, Kording KP, Schrater PR (2020). A How-to-Model Guide for Neuroscience. eNeuro, 7(1) ENEURO.0352-19.2019. doi: 10.1523/ENEURO.0352-19.2019
Kording KP, Blohm G, Schrater P, Kay K (2020). Appreciating the variety of goals in computational neuroscience. Neurons, Behavior, Data Analysis, and Theory 3(6). arXiv: 2002.03211, url: https://nbdt.scholasticahq.com/article/16723-appreciating-the-variety-of-goals-in-computational-neuroscience
Schrater PR, Peters MK, Kording KP, Blohm G (2019). Modeling in Neuroscience as a Decision Process. OSF pre-print. url: https://osf.io/w56vt/