# Modeling Steps 1 - 4

## Contents

# Modeling Steps 1 - 4¶

**By Neuromatch Academy**

**Content creators:** Marius ‘t Hart, Megan Peters, Paul Schrater, Gunnar Blohm

**Content reviewers:** Eric DeWitt, Tara van Viegen, Marius Pachitariu

**Production editors:** Ella Batty

**Note:** that this is the same as W1D2 Tutorial 1 - we provide it here as well for ease of access.

# Tutorial objectives¶

Yesterday you gained some understanding of what models can buy us in neuroscience. But how do you build a model? Today, we will try to clarify the process of computational modeling, by thinking through the logic of modeling based on your project ideas.

We assume that you have a general idea of a project in mind, i.e., a preliminary question, and/or phenomenon you would like to understand. You should have started developing a project idea yesterday with this brainstorming demo. Maybe you have a goal in mind. We will now work through the 4 first steps of modeling (Blohm et al., 2019):

**Framing the question**

finding a phenomenon and a question to ask about it

understanding the state of the art

determining the basic ingredients

formulating specific, mathematically defined hypotheses

The remaining steps 5-10 will be covered in a second notebook that you can consult throughout the modeling process when you work on your projects.

**Importantly**, we will guide you through Steps 1-4 today. After you do more work on projects, you likely have to revite some or all of these steps *before* you move on the the remaining steps of modeling.

**Note**: there will be no coding today. It’s important that you think through the different steps of this how-to-model tutorial to maximize your chance of succeeding in your group projects. **Also**: “Models” here can be data analysis pipelines, not just computational models…

**Think! Sections**: All activities you should perform are labeled with **Think!**. These are discussion based exercises and can be found in the Table of Content on the left side of the notebook. Make sure you complete all within a section before moving on!

## Demos¶

We will demo the modeling process to you based on the train illusion. The introductory video will explain the phenomenon to you. Then we will do roleplay to showcase some common pitfalls to you based on a computational modeling project around the train illusion. In addition to the computational model, we will also provide a data neuroscience project example to you so you can appreciate similarities and differences.

Enjoy!

## Video 1: Introduction to tutorial¶

# Setup¶

```
# Imports
import numpy as np
import matplotlib.pyplot as plt
# for random distributions:
from scipy.stats import norm, poisson
# for logistic regression:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
```

## Plotting Functions¶

```
# @title Plotting Functions
def rasterplot(spikes,movement,trial):
[movements, trials, neurons, timepoints] = np.shape(spikes)
trial_spikes = spikes[movement,trial,:,:]
trial_events = [((trial_spikes[x,:] > 0).nonzero()[0]-150)/100 for x in range(neurons)]
plt.figure()
dt=1/100
plt.eventplot(trial_events, linewidths=1);
plt.title('movement: %d - trial: %d'%(movement, trial))
plt.ylabel('neuron')
plt.xlabel('time [s]')
def plotCrossValAccuracies(accuracies):
f, ax = plt.subplots(figsize=(8, 3))
ax.boxplot(accuracies, vert=False, widths=.7)
ax.scatter(accuracies, np.ones(8))
ax.set(
xlabel="Accuracy",
yticks=[],
title=f"Average test accuracy: {accuracies.mean():.2%}"
)
ax.spines["left"].set_visible(False)
```

## Generate Data¶

```
#@title Generate Data
def generateSpikeTrains():
gain = 2
neurons = 50
movements = [0,1,2]
repetitions = 800
np.random.seed(37)
# set up the basic parameters:
dt = 1/100
start, stop = -1.5, 1.5
t = np.arange(start, stop+dt, dt) # a time interval
Velocity_sigma = 0.5 # std dev of the velocity profile
Velocity_Profile = norm.pdf(t,0,Velocity_sigma)/norm.pdf(0,0,Velocity_sigma) # The Gaussian velocity profile, normalized to a peak of 1
# set up the neuron properties:
Gains = np.random.rand(neurons) * gain # random sensitivity between 0 and `gain`
FRs = (np.random.rand(neurons) * 60 ) - 10 # random base firing rate between -10 and 50
# output matrix will have this shape:
target_shape = [len(movements), repetitions, neurons, len(Velocity_Profile)]
# build matrix for spikes, first, they depend on the velocity profile:
Spikes = np.repeat(Velocity_Profile.reshape([1,1,1,len(Velocity_Profile)]),len(movements)*repetitions*neurons,axis=2).reshape(target_shape)
# multiplied by gains:
S_gains = np.repeat(np.repeat(Gains.reshape([1,1,neurons]), len(movements)*repetitions, axis=1).reshape(target_shape[:3]), len(Velocity_Profile)).reshape(target_shape)
Spikes = Spikes * S_gains
# and multiplied by the movement:
S_moves = np.repeat( np.array(movements).reshape([len(movements),1,1,1]), repetitions*neurons*len(Velocity_Profile), axis=3 ).reshape(target_shape)
Spikes = Spikes * S_moves
# on top of a baseline firing rate:
S_FR = np.repeat(np.repeat(FRs.reshape([1,1,neurons]), len(movements)*repetitions, axis=1).reshape(target_shape[:3]), len(Velocity_Profile)).reshape(target_shape)
Spikes = Spikes + S_FR
# can not run the poisson random number generator on input lower than 0:
Spikes = np.where(Spikes < 0, 0, Spikes)
# so far, these were expected firing rates per second, correct for dt:
Spikes = poisson.rvs(Spikes * dt)
return(Spikes)
def subsetPerception(spikes):
movements = [0,1,2]
split = 400
subset = 40
hwin = 3
[num_movements, repetitions, neurons, timepoints] = np.shape(spikes)
decision = np.zeros([num_movements, repetitions])
# ground truth for logistic regression:
y_train = np.repeat([0,1,1],split)
y_test = np.repeat([0,1,1],repetitions-split)
m_train = np.repeat(movements, split)
m_test = np.repeat(movements, split)
# reproduce the time points:
dt = 1/100
start, stop = -1.5, 1.5
t = np.arange(start, stop+dt, dt)
w_idx = list( (abs(t) < (hwin*dt)).nonzero()[0] )
w_0 = min(w_idx)
w_1 = max(w_idx)+1 # python...
# get the total spike counts from stationary and movement trials:
spikes_stat = np.sum( spikes[0,:,:,:], axis=2)
spikes_move = np.sum( spikes[1:,:,:,:], axis=3)
train_spikes_stat = spikes_stat[:split,:]
train_spikes_move = spikes_move[:,:split,:].reshape([-1,neurons])
test_spikes_stat = spikes_stat[split:,:]
test_spikes_move = spikes_move[:,split:,:].reshape([-1,neurons])
# data to use to predict y:
x_train = np.concatenate((train_spikes_stat, train_spikes_move))
x_test = np.concatenate(( test_spikes_stat, test_spikes_move))
# this line creates a logistics regression model object, and immediately fits it:
population_model = LogisticRegression(solver='liblinear', random_state=0).fit(x_train, y_train)
# solver, one of: 'liblinear', 'newton-cg', 'lbfgs', 'sag', and 'saga'
# some of those require certain other options
#print(population_model.coef_) # slope
#print(population_model.intercept_) # intercept
ground_truth = np.array(population_model.predict(x_test))
ground_truth = ground_truth.reshape([3,-1])
output = {}
output['perception'] = ground_truth
output['spikes'] = spikes[:,split:,:subset,:]
return(output)
def getData():
spikes = generateSpikeTrains()
dataset = subsetPerception(spikes=spikes)
return(dataset)
dataset = getData()
perception = dataset['perception']
spikes = dataset['spikes']
```

# Step 1: Finding a phenomenon and a question to ask about it¶

## Video 2: Asking a question¶

## Example projects step 1¶

```
# @title Example projects step 1
from ipywidgets import widgets
from IPython.display import Markdown
markdown1 = '''
## Step 1
<br>
<font size='3pt'>
The train illusion occurs when sitting on a train and viewing another train outside the window. Suddenly, the other train *seems* to move, i.e. you experience visual motion of the other train relative to your train. But which train is actually moving?
Often people have the wrong percept. In particular, they think their own train might be moving when it's the other train that moves; or vice versa. The illusion is usually resolved once you gain vision of the surroundings that lets you disambiguate the relative motion; or if you experience strong vibrations indicating that it is indeed your own train that is in motion.
We asked the following (arbitrary) question for our demo project: "How do noisy vestibular estimates of motion lead to illusory percepts of self motion?"
</font>
'''
markdown2 = '''
## Step 1
<br>
<font size='3pt'>
The train illusion occurs when sitting on a train and viewing another train outside the window. Suddenly, the other train *seems* to move, i.e. you experience visual motion of the other train relative to your train. But which train is actually moving?
Often people mix this up. In particular, they think their own train might be moving when it's the other train that moves; or vice versa. The illusion is usually resolved once you gain vision of the surroundings that lets you disambiguate the relative motion; or if you experience strong vibrations indicating that it is indeed your own train that is in motion.
We assume that we have build the train illusion model (see the other example project colab). That model predicts that accumulated sensory evidence from vestibular signals determines the decision of whether self-motion is experienced or not. We now have vestibular neuron data (simulated in our case, but let's pretend) and would like to see if that prediction holds true.
The data contains *N* neurons and *M* trials for each of 3 motion conditions: no self-motion, slowly accelerating self-motion and faster accelerating self-motion. In our data,
*N* = 40 and *M* = 400.
**So we can ask the following question**: "Does accumulated vestibular neuron activity correlate with self-motion judgements?"
</font>
'''
out2 = widgets.Output()
with out2:
display(Markdown(markdown2))
out1 = widgets.Output()
with out1:
display(Markdown(markdown1))
out = widgets.Tab([out1, out2])
out.set_title(0, 'Computational Model')
out.set_title(1, 'Data Analysis')
display(out)
```

## Think! 1: Asking your own question¶

*Please discuss the following for about 25 min*

You should already have a project idea from your brainstorming yesterday. **Write down the phenomenon, question and goal(s) if you have them.**

As a reminder, here is what you should discuss and write down:

What exact aspect of data needs modeling?

Answer this question clearly and precisely! Otherwise you will get lost (almost guaranteed)

Write everything down!

Also identify aspects of data that you do not want to address (yet)

Define an evaluation method!

How will you know your modeling is good?

E.g. comparison to specific data (quantitative method of comparison?)

For computational models: think of an experiment that could test your model

You essentially want your model to interface with this experiment, i.e. you want to simulate this experiment

You can find interesting questions by looking for phenomena that differ from your expectations. In *what* way does it differ? *How* could that be explained (starting to think about mechanistic questions and structural hypotheses)? *Why* could it be the way it is? What experiment could you design to investigate this phenomenon? What kind of data would you need?

**Make sure to avoid the pitfalls!**

## Click here for a recap on pitfalls

Question is too general

Remember: science advances one small step at the time. Get the small step right…

Precise aspect of phenomenon you want to model is unclear

You will fail to ask a meaningful question

You have already chosen a toolkit

This will prevent you from thinking deeply about the best way to answer your scientific question

You don’t have a clear goal

What do you want to get out of modeling?

You don’t have a potential experiment in mind

This will help concretize your objectives and think through the logic behind your goal

**Note**:

The hardest part is Step 1. Once that is properly set up, all other should be easier. **BUT**: often you think that Step 1 is done only to figure out in later steps (anywhere really) that you were not as clear on your question and goal than you thought. Revisiting Step 1 is frequent necessity. Don’t feel bad about it. You can revisit Step 1 later; for now, let’s move on to the nest step.

# Step 2: Understanding the state of the art & background¶

Here you will do a literature review (**to be done AFTER this tutorial!**).

## Video 3: Literature Review & Background Knowledge¶

## Example projects step 2¶

```
# @title Example projects step 2
from ipywidgets import widgets
from IPython.display import Markdown
markdown1 = '''
## Step 2
<br>
<font size='3pt'>
You have learned all about the vestibular system in the Intro video. This is also where you would do a literature search to learn more about what's known about self-motion perception and vestibular signals. You would also want to examine any attempts to model self-motion, perceptual decision making and vestibular processing.</font>
'''
markdown21 = '''
## Step 2
<br>
<font size='3pt'>
While it seems a well-known fact that vestibular signals are noisy, we should check if we can also find this in the literature.
Let's also see what's in our data, there should be a 4d array called `spikes` that has spike counts (positive integers), a 2d array called `perception` with self-motion judgements (0=no motion or 1=motion). Let's see what this data looks like:
</font><br>
'''
markdown22 = '''
<br>
<font size='3pt'>
In the `spikes` array, we see our 3 acceleration conditions (first dimension), with 400 trials each (second dimensions) and simultaneous recordings from 40 neurons (third dimension), across 3 seconds in 10 ms bins (fourth dimension). The first two dimensions are also there in the `perception` array.
Perfect perception would have looked like [0, 1, 1]. The average judgements are far from correct (lots of self-motion illusions) but they do make some sense: it's closer to 0 in the no-motion condition and closer to 1 in both of the real-motion conditions.
The idea of our project is that the vestibular signals are noisy so that they might be mis-interpreted by the brain. Let's see if we can reproduce the stimuli from the data:
</font>
<br>
'''
markdown23 = '''
<br>
<font size='3pt'>
Blue is the no-motion condition, and produces flat average spike counts across the 3 s time interval. The orange and green line do show a bell-shaped curve that corresponds to the acceleration profile. But there also seems to be considerable noise: exactly what we need. Let's see what the spike trains for a single trial look like:
</font>
<br>
'''
markdown24 = '''
<br>
<font size='3pt'>
You can change the trial number in the bit of code above to compare what the rasterplots look like in different trials. You'll notice that they all look kind of the same: the 3 conditions are very hard (impossible?) to distinguish by eye-balling.
Now that we have seen the data, let's see if we can extract self-motion judgements from the spike counts.
</font>
<br>
'''
display(Markdown(r""))
out2 = widgets.Output()
with out2:
display(Markdown(markdown21))
print(f'The shape of `spikes` is: {np.shape(spikes)}')
print(f'The shape of `perception` is: {np.shape(perception)}')
print(f'The mean of `perception` is: {np.mean(perception, axis=1)}')
display(Markdown(markdown22))
for move_no in range(3):
plt.plot(np.arange(-1.5,1.5+(1/100),(1/100)),np.mean(np.mean(spikes[move_no,:,:,:], axis=0), axis=0), label=['no motion', '$1 m/s^2$', '$2 m/s^2$'][move_no])
plt.xlabel('time [s]');
plt.ylabel('averaged spike counts');
plt.legend()
plt.show()
display(Markdown(markdown23))
for move in range(3):
rasterplot(spikes = spikes, movement = move, trial = 0)
plt.show()
display(Markdown(markdown24))
out1 = widgets.Output()
with out1:
display(Markdown(markdown1))
out = widgets.Tab([out1, out2])
out.set_title(0, 'Computational Model')
out.set_title(1, 'Data Analysis')
display(out)
```

Here you will do a literature review (**to be done AFTER this tutorial!**). For the projects, do not spend too much time on this. A thorough literature review could take weeks or months depending on your prior knowledge of the field…

The important thing for your project here is not to exhaustively survey the literature but rather to learn the process of modeling. 1-2 days of digging into the literature should be enough!

**Here is what you should get out of it**:

Survey the literature

What’s known?

What has already been done?

Previous models as a starting point?

What hypotheses have been emitted in the field?

Are there any alternative / complementary modeling approaches?

What skill sets are required?

Do I need learn something before I can start?

Ensure that no important aspect is missed

Potentially provides specific data sets / alternative modeling approaches for comparison

**Do this AFTER the tutorial**

# Step 3: Determining the basic ingredients¶

## Video 4: Determining basic ingredients¶

## Example projects step 3¶

```
# @title Example projects step 3
from ipywidgets import widgets
from IPython.display import Markdown, Math
markdown1 = r'''
## Step 3
<br>
<font size='3pt'>
We determined that we probably needed the following ingredients for our model:
* Vestibular input: *v(t)*
* Binary decision output: *d* - time dependent?
* Decision threshold: θ
* A filter (maybe running average?): *f*
* An integration mechanism to get from vestibular acceleration to sensed velocity: ∫
</font>
'''
markdown2 = '''
## Step 3
<br>
<font size='3pt'>
In order to address our question we need to design an appropriate computational data analysis pipeline. We did some brainstorming and think that we need to somehow extract the self-motion judgements from the spike counts of our neurons. Based on that, our algorithm needs to make a decision: was there self motion or not? This is a classical 2-choice classification problem. We will have to transform the raw spike data into the right input for the algorithm (spike pre-processing).
So we determined that we probably needed the following ingredients:
* spike trains *S* of 3-second trials (10ms spike bins)
* ground truth movement *m<sub>r</sub>* (real) and perceived movement *m<sub>p</sub>*
* some form of classifier *C* giving us a classification *c*
* spike pre-processing
</font>
'''
# No idea why this is necessary but math doesn't render properly without it
display(Markdown(r""))
out2 = widgets.Output()
with out2:
display(Markdown(markdown2))
out1 = widgets.Output()
with out1:
display(Markdown(markdown1))
out = widgets.Tab([out1, out2])
out.set_title(0, 'Computational Model')
out.set_title(1, 'Data Analysis')
display(out)
```

## Think! 3: Determine your basic ingredients¶

*Please discuss the following for about 25 min*

This will allow you to think deeper about what your modeling project will need. It’s a crucial step before you can formulate hypotheses because you first need to understand what your modeling approach will need. There are 2 aspects you want to think about:

What parameters / variables are needed?]

Constants?

Do they change over space, time, conditions…?

What details can be omitted?

Constraints, initial conditions?

Model inputs / outputs?

Variables needed to describe the process to be modelled?

Brainstorming!

What can be observed / measured? latent variables?

Where do these variables come from?

Do any abstract concepts need to be instantiated as variables?

E.g., value, utility, uncertainty, cost, salience, goals, strategy, plant, dynamics

Instantiate them so that they relate to potential measurements!

This is a step where your prior knowledge and intuition is tested. You want to end up with an inventory of *specific* concepts and/or interactions that need to be instantiated.

**Make sure to avoid the pitfalls!**

## Click here for a recap on pitfalls

I’m experienced, I don’t need to think about ingredients anymore

Or so you think…

I can’t think of any ingredients

Think about the potential experiment. What are your stimuli? What parameters? What would you control? What do you measure?

I have all inputs and outputs

Good! But what will link them? Thinking about that will start shaping your model and hypotheses

I can’t think of any links (= mechanisms)

You will acquire a library of potential mechanisms as you keep modeling and learning

But the literature will often give you hints through hypotheses

If you still can’t think of links, then maybe you’re missing ingredients?

# Step 4: Formulating specific, mathematically defined hypotheses¶

## Video 5: Formulating a hypothesis¶

## Example projects step 4¶

```
# @title Example projects step 4
from ipywidgets import widgets
from IPython.display import Markdown
# Not writing in latex because that didn't render in jupyterbook
markdown1 = r'''
## Step 4
<br>
<font size='3pt'>
Our main hypothesis is that the strength of the illusion has a linear relationship to the amplitude of vestibular noise.
Mathematically, this would write as
<div align="center">
<em>S</em> = <em>k</em> ⋅ <em>N</em>
</div>
where *S* is the illusion strength and *N* is the noise level, and *k* is a free parameter.
>we could simply use the frequency of occurance across repetitions as the "strength of the illusion"
We would get the noise as the standard deviation of *v(t)*, i.e.
<div align="center">
<em>N</em> = <b>E</b>[<em>v(t)</em><sup>2</sup>],
</div>
where **E** stands for the expected value.
Do we need to take the average across time points?
> doesn't really matter because we have the generative process, so we can just use the σ that we define
</font>
'''
markdown2 = '''
## Step 4
<br>
<font size='3pt'>
We think that noise in the signal drives whether or not people perceive self motion. Maybe the brain uses the strongest signal at peak acceleration to decide on self motion, but we actually think it is better to accumulate evidence over some period of time. We want to test this. The noise idea also means that when the signal-to-noise ratio is higher, the brain does better, and this would be in the faster acceleration condition. We want to test this too.
We came up with the following hypotheses focussing on specific details of our overall research question:
* Hyp 1: Accumulated vestibular spike rates explain self-motion judgements better than average spike rates around peak acceleration.
* Hyp 2: Classification performance should be better for faster vs slower self-motion.
> There are many other hypotheses you could come up with, but for simplicity, let's go with those.
Mathematically, we can write our hypotheses as follows (using our above ingredients):
* Hyp 1: **E**(c<sub>accum</sub>) > **E**(c<sub>win</sub>)
* Hyp 2: **E**(c<sub>fast</sub>) > **E**(c<sub>slow</sub>)
Where **E** denotes taking the expected value (in this case the mean) of its argument: classification outcome in a given trial type.
</font>
'''
# No idea why this is necessary but math doesn't render properly without it
display(Markdown(r""))
out2 = widgets.Output()
with out2:
display(Markdown(markdown2))
out1 = widgets.Output()
with out1:
display(Markdown(markdown1))
out = widgets.Tab([out1, out2])
out.set_title(0, 'Computational Model')
out.set_title(1, 'Data Analysis')
display(out)
```

## Think! 4: Formulating your hypothesis¶

*Please discuss the following for about 25 min*

Once you have your question and goal lines up, you have done a literature review (let’s assume for now) and you have thought about ingredients needed for your model, you’re now ready to start thinking about *specific* hypotheses.

Formulating hypotheses really consists of two consecutive steps:

You think about the hypotheses in words by relating ingredients identified in Step 3

What is the model mechanism expected to do?

How are different parameters expected to influence model results?

You then express these hypotheses in mathematical language by giving the ingredients identified in Step 3 specific variable names.

Be explicit, e.g., \(y(t)=f(x(t),k)\) but \(z(t)\) doesn’t influence \(y\)

There are also “structural hypotheses” that make assumptions on what model components you hypothesize will be crucial to capture the phenomenon at hand.

**Important**: Formulating the hypotheses is the last step before starting to model. This step determines the model approach and ingredients. It provides a more detailed description of the question / goal from Step 1. The more precise the hypotheses, the easier the model will be to justify.

**Make sure to avoid the pitfalls!**

## Click here for a recap on pitfalls

I don’t need hypotheses, I will just play around with the model

Hypotheses help determine and specify goals. You can (and should) still play…

My hypotheses don’t match my question (or vice versa)

This is a normal part of the process!

You need to loop back to Step 1 and revisit your question / phenomenon / goals

I can’t write down a math hypothesis

Often that means you lack ingredients and/or clarity on the hypothesis

OR: you have a “structural” hypothesis, i.e., you expect a certain model component to be crucial in explaining the phenomenon / answering the question

# Summary¶

In this tutorial, we worked through some steps of the process of modeling.

We defined a phenomenon and formulated a question (step 1)

We collected information the state-of-the-art about the topic (step 2)

We determined the basic ingredients (step 3), and used these to formulate a specific mathematically defined hypothesis (step 4)

You are now in a position that you could start modeling without getting lost. But remember: you might have to work through steps 1-4 again after doing a literature review and/or if there were other pitfalls you identified along the way (which is totally normal).

# Next steps¶

In a follow-up notebook, we will continue with the steps 5-10 to guide you through the implementation and completion stages of the projects. You can also find this in the Modeling Steps section of the Project Booklet.

# Reading¶

Blohm G, Kording KP, Schrater PR (2020). *A How-to-Model Guide for Neuroscience*. eNeuro, 7(1) ENEURO.0352-19.2019. doi: 10.1523/ENEURO.0352-19.2019

Kording KP, Blohm G, Schrater P, Kay K (2020). *Appreciating the variety of goals in computational neuroscience*. Neurons, Behavior, Data Analysis, and Theory 3(6). url: nbdt.scholasticahq.com/article/16723-appreciating-the-variety-of-goals-in-computational-neuroscience

Schrater PR, Peters MK, Kording KP, Blohm G (2019). *Modeling in Neuroscience as a Decision Process*. OSF pre-print. url: https://osf.io/w56vt/