Friday, December 27, 2013

"Neural processing unit"?

Based on the Izhikevich model? See more at Qualcomm:

Other instances of the NPU idea:
Also featured on NYT recently.

Thursday, December 26, 2013

Cascade-based graph inference

Based on the work of Gomez-Rodriguez et al. (2011). So far, I have read up to Section 3.2. The first task is to understand the subroutine for identifying the MLE cascade structure from a list of hit times. OK; putting it into practice.

I'd like to begin by repeating the authors' synthetic experiments. There are two immediate tasks:
  1. Given a ground truth graph $G$, and cascade parametrization ($\beta$ and $P_c(u,v)$), generate synthetic cascades.
  2. Given cascade $c$, estimate the maximum likelihood cascade tree.
Beginning the implementation... Python ("Snap.py") or Matlab? I had trouble choosing every time I began a programming assignment in CS 224w... The former has some convenient functions built in, but I feel there's a "barrier" between whatever I want to do and the syntactically correct implementation; Matlab, on the other hand, lets me do whatever I want, but forces me to implement everything from first principles.

Decided to proceed with Python, for its dictionary type.
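As a first pass at task 2, here is a minimal sketch of the MLE-tree subroutine. I'm assuming (as a simplification of the Gomez-Rodriguez transmission model) an exponential edge likelihood in the time difference; the function name and the `alpha` rate are my own placeholders, not from the paper:

```python
import math

def mle_cascade_tree(hit_times, alpha=1.0):
    """Given {node: hit time} for one cascade, pick for each infected node
    the max-likelihood parent among earlier-infected nodes.
    Assumes an exponential transmission model: log P(u->v) = -alpha*(t_v - t_u).
    """
    tree = {}
    for v, tv in hit_times.items():
        best_u, best_ll = None, -math.inf
        for u, tu in hit_times.items():
            if tu < tv:  # only earlier nodes can be parents
                ll = -alpha * (tv - tu)
                if ll > best_ll:
                    best_u, best_ll = u, ll
        tree[v] = best_u  # None marks the cascade root
    return tree
```

Because the per-edge likelihood decreases with the time gap, each node simply attaches to its most recent predecessor; the real model's per-pair parameters $P_c(u,v)$ would reweight this choice.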

At the same time, I'll try out software for easy graph visualization (instead of drawing in my notebook manually each time):

Monday, December 2, 2013

Granger causality from first principles

Rather than using a black box (the GCCA toolbox), I implemented the Granger causality workflow in Matlab from first principles -- e.g. multivariate (vector) autoregressive (VAR) modeling. By working through the logic step by step, I now have the basic framework that will allow me to make FDR-based directed edge inferences from sets of time series. However, while the system should be able to make significance-based edge inferences, I have strong reservations about the ability of a VAR model to fundamentally describe the spike train data. Nevertheless, I will continue my exploration of G-causality based directed edge inference.
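For reference, the core of the pairwise computation can be sketched as follows. This is a simplified univariate version of what I implemented (OLS via `lstsq`; the function name is mine, not the Matlab code):

```python
import numpy as np

def granger_score(x, y, p=2):
    """Pairwise Granger score of y -> x: compare an AR(p) prediction of x
    from its own past against a prediction from the joint past of x and y.
    Score = log(var_restricted / var_full); > 0 means y helps predict x."""
    n = len(x)
    # Lagged design matrices: column k holds the series delayed by k+1 steps
    X_own = np.column_stack([x[p - k - 1:n - k - 1] for k in range(p)])
    X_joint = np.column_stack(
        [x[p - k - 1:n - k - 1] for k in range(p)] +
        [y[p - k - 1:n - k - 1] for k in range(p)])
    target = x[p:]
    # Residuals of the restricted (own-past) and full (joint-past) OLS fits
    res_r = target - X_own @ np.linalg.lstsq(X_own, target, rcond=None)[0]
    res_f = target - X_joint @ np.linalg.lstsq(X_joint, target, rcond=None)[0]
    return np.log(np.var(res_r) / np.var(res_f))
```

The conditional version replaces the "own past" model with one that already includes all other series $\{X_i\}$, so the score isolates the marginal contribution of $y$.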

I implemented both pairwise and conditional Granger causality methods. Here's a quick estimate of the amount of work associated with fitting the pairwise and conditional models:

Null distribution for the G-causal score:

Exponential fit appears to be decent -- I will proceed with the exponential fit for converting the test statistic (G-causal score) to a $p$-value.

Actually, on closer examination (larger number of simulations) the null distribution of the Granger score is not exponential:
Instead, I will use an empirical CCDF for the computation of $p$-value (which is what I did previously for max correlation).
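The empirical-CCDF conversion itself is a one-liner; a sketch (the +1 correction is my own conservative convention, so that $p$ is never exactly zero):

```python
import numpy as np

def empirical_pvalue(score, null_scores):
    """p-value of an observed test statistic from the empirical CCDF of
    null-simulation scores: fraction of null draws at least as extreme."""
    null_scores = np.asarray(null_scores)
    return (np.sum(null_scores >= score) + 1) / (len(null_scores) + 1)
```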

Tuesday, November 19, 2013

Granger causality analysis -- building a foundation

Previously, I had used a "high level" Granger causality toolbox for the inference of directed edges. While there were some early results, I don't like to leave things as a "black box" without understanding the mechanics. In particular, I had observed that the $p$-values from the Granger toolbox seemed to be incorrectly scaled (which causes the FDR method to break down), and I had some serious concerns about the ability of the MVAR model to capture neural processes. So, I decided to implement Granger causality analysis from first principles, which comes down to VAR modeling of the time series data.

Getting started with VAR modeling, I found the MathWorks tutorial to be really helpful.

Friday, November 15, 2013

Scooped!

My term project has been scooped!
Actually, there is quite a bit of work -- especially on PLoS ONE, it seems -- on applying causal measures to neural data (e.g. electrode arrays, etc.).

A lot of recent work also pays attention to transfer entropy (TE), which I find very interesting. However, I am leaving TE beyond the scope of the CS224w work. I do find conceptual similarities between TE and Granger causality.

Wednesday, November 13, 2013

Undirected edge inference -- temporal filtering

For reference, here is the impulse response of today's calcium indicators, taken from the article on GCaMP6:
Time constants are rather high, almost 1 s for the GCaMP6 variants.

I implemented a basic filtering system, where the impulse response is a simple exponential function characterized with a time constant $\tau$. Here are some representative traces and the filtered output, as well as how the filtering operation modifies the cross-correlation. (The neuron pair shown below are synaptically connected):
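The filter itself is just a convolution with a causal exponential kernel; a minimal sketch of what I mean (the function name and the 5τ kernel truncation are my choices):

```python
import numpy as np

def calcium_filter(spikes, tau=1.0, dt=0.001):
    """Convolve a spike train with a causal exponential impulse response
    exp(-t/tau) -- a crude stand-in for a calcium indicator like GCaMP6.
    tau and dt are in seconds; the kernel is truncated at 5 time constants."""
    t = np.arange(0.0, 5 * tau, dt)
    kernel = np.exp(-t / tau)
    return np.convolve(spikes, kernel)[:len(spikes)]
```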

It can already be surmised that the temporal filtering will degrade our ability to detect significant correlations. Some thoughts that arise:
  • First, it may be the case that the filtering result will depend on the width of the action potential. While the Izhikevich model with the current parameter set produces spikes with reasonable widths, it still may be the case that the widths are dependent on the numerical integration step size.
  • Secondly, I wonder if there is a simple closed form formula for determining the resulting cross-correlation when the time series has been temporally filtered. (This seems very plausible.) On the other hand, I'm learning about the limits of cross-correlation for the current task... closed-form formulas for other measures will be harder to obtain.
    • Yes, there should be a simple formula here -- will verify soon.
  • Thirdly, it appears that we have to "retrain" a $p$-score classifier following temporal filtering.
I now suspect that the "inferrability" of the undirected network will be sharply degraded by the temporal filter. It suggests one line of investigation: is there a mechanism that may allow us to recover the network (e.g. multiple trials, etc.) despite the temporally degraded signals?
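For the record, here is the closed-form relation I suspect exists, assuming both traces pass through the same LTI filter $h$ (to be verified against simulation): if $y_i = h * x_i$, then

$$R_{y_iy_j}(\tau) = (h \star h) * R_{x_ix_j}(\tau), \qquad (h \star h)(\tau) = \int h(t)\,h(t+\tau)\,dt,$$

i.e. the cross-correlation is smeared by the (deterministic) autocorrelation of the kernel. For the causal exponential kernel $h(t) = e^{-t/\tau_c}\,u(t)$, this gives $(h \star h)(\tau) = \frac{\tau_c}{2}\, e^{-|\tau|/\tau_c}$, which would directly explain the broadening seen after filtering.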

I resynthesized the CDF of cross correlations given temporal filtering of $\tau = 5$ ms. It can be seen that the distribution is different from that without the temporal filtering:

Undirected edge inference -- some failure modes

I considered the undirected edge inference task for higher edge densities. My "standard" model currently uses $M=100$, $N=100$, which is a very sparsely connected ($1\%$) network. I considered how the difficulty of the task varies for $M=500$, $M=1000$.

At $M=500$, the inference run finds many more edges than the true number. I think that the cross-correlation does not have the means to distinguish between direct and complex descendant relationships, and that it will tend to close triads. It seems to me that Granger causality would better handle dense graphs (since G-causality asks whether $X_1$ influences $X_2$ when other $\left\{X_i\right\}$ are given).

At $M=1000$, the Izhikevich model itself "locks up," and all spikes become "phase-locked." It appears that the coupling coefficient $\alpha$ that I am using is too high to support such a dense network.

These observations suggest that I should consider a case with lower $\alpha$, so as to explore varying edge densities in the inference.

It also suggests to me to try out bipartite populations of neurons with dominant synaptic connections between the parts. This actually models the kind of experiments that I want to eventually undertake.

Tuesday, November 12, 2013

FDR-based undirected edge inference validation

Right now, I suspect that the G-causality based inference is "too eager" to predict an edge for a given FDR level $q$, while the undirected inference results seem to be better calibrated to $q$. Here, I'd like to explicitly quantify the FDR calibration for undirected edge inference.

Here is a single instance of $N=100$ undirected inference, where I shmoo the FDR level $q$ and measure the actual FDR observed by the inference method. I also plot the standard ROC of the inference:

Note that the FDR is not the same quantity as the FPR. See the Wikipedia page on ROC for details.

Here is the above graph, based on 50 inference runs:


It is interesting how well the undirected inference performs for the parameters I've picked for the Izhikevich numerical model. This gives a clean baseline against which to see the effects of temporal filtering on the inference performance!

Monday, November 11, 2013

G-causality analysis of neuron traces

A bit more background on G-causality: Scholarpedia (run by Izhikevich!) article on Granger causality.

I am concerned about the assumption G-causality analysis makes that the time series can be described by a linear (multivariate autoregressive) model. My initial exploration of the G-causality toolbox suggests that, while the analysis seems to recover directed edges in the neural network for my initial (small) toy models, the assumption of a linear model may be invalid for my application. In particular, the toolbox complains that my residuals after the model fit are not white (by the Durbin-Watson test), which suggests that the model is not good for the neural traces.

That said, it does seem to "work". Here, the analysis correctly reproduced three directed edges in a set of five neurons:
Actually, the Durbin-Watson test is not implemented in the toolbox. It always throws a warning... Nevertheless, I still want a sanity check of the validity of the MVAR fit on the neural traces... When I "turn on" the true DW test for residual whiteness, the neural data supposedly passes.

But clearly, the problem here is that I'm just using the toolbox superficially without knowing about its workings... Now, to dig in.

First question to consider is how well the multivariate autoregressive (MVAR) model assumed by G-causality describes the data. As it turns out, the description is quite good, as long as one is careful about the sampling rate of the signal:
(I worry about overfitting here...)

Next, we want to know the maximum lag (the "model order") we should use in the G-causality analysis. Based on the delay in the cross correlations, we should use up to 10 ms of lag:
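One way to sanity-check the lag choice against the cross-correlation heuristic is an information-criterion sweep over candidate orders. A univariate sketch (AIC here; the helper name is mine, and the toolbox presumably offers AIC/BIC for the full MVAR case):

```python
import numpy as np

def var1_aic(x, p):
    """AIC of a univariate AR(p) OLS fit -- a simple proxy for choosing
    the model order ("maximum lag") before running Granger analysis."""
    n = len(x)
    X = np.column_stack([x[p - k - 1:n - k - 1] for k in range(p)])
    target = x[p:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    return (n - p) * np.log(np.var(resid)) + 2 * p

# Pick the order minimizing AIC over candidates, e.g.:
# best_p = min(range(1, 11), key=lambda p: var1_aic(x, p))
```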

Making inferences of directed edges with the G-causality toolbox:

Some more inference instances:
It is quite interesting that the Granger analysis can infer directed edges! (Q: So far, I have been using exclusively excitatory synapses. Can the method also pick up inhibitory synapses?)

I find that the "FDR level" knob seems to understate the rate of false positives for the G-causality inference. I am not sure how the toolbox calculates the $p$-values associated with each Granger edge, which might be the cause of this discrepancy.

Sunday, November 10, 2013

FDR-based inference on Izhikevich simulation using cross-correlation as metric

Simulation of Izhikevich neurons (10 and 100 neurons):

In the above simulation, all neurons are independent (no synaptic connections). Here is a distribution of $C_{ij}^\mathrm{max}$ for uncorrelated neurons (i.e. the "distribution of the test statistic under the null hypothesis"):
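For concreteness, the statistic being histogrammed is $C_{ij}^\mathrm{max} = \max_\tau |C_{ij}(\tau)|$; a direct (unoptimized) sketch:

```python
import numpy as np

def max_xcorr(x, y, max_lag=50):
    """Test statistic C_ij^max: peak absolute normalized cross-correlation
    over lags in [-max_lag, max_lag]."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.dot(x[:n - lag], y[lag:]) / (n - lag)
        else:
            c = np.dot(x[-lag:], y[:n + lag]) / (n + lag)
        best = max(best, abs(c))
    return best
```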

Now here is a set of neurons with some synaptic connections between them. (Not easy at all to discern the underlying graphical structure.)

We consider the distribution of $p$-values for synaptic pairs in the Izhikevich model as a function of the "synaptic strength" constant $\alpha$:
The above shows that a synaptic strength of $\alpha \approx 10$ is necessary to discern the connectivity structure in the Izhikevich model. Note that $\alpha \approx 10$ does not yield a totally trivial (i.e. deterministic firing) relationship between two synaptically connected neurons. Here is a trace of two neurons with connectivity of $\alpha = 10$ that shows that the relationship is not necessarily deterministic:

For that matter, $\alpha = 20$ doesn't yield deterministic relations either, but there are more (almost) coincident spikes between the two:

(Note: An independent characterization for the "interpretation" of $\alpha$ could be an empirically derived probability that the post-synaptic neuron fires within some interval of receiving the pre-synaptic spike.)

Next, let's try some inference on a model of $N=10$ neurons with $M=10$ synapses. (My eventual target is to run simulations on $N=100$ or even $N=10^3$ neurons.) I visualize the inference result as an adjacency matrix, where a filled green rectangle represents a correctly inferred edge, an unfilled green rectangle represents a missed edge (false negative), and a filled red rectangle represents a falsely inferred edge (false positive). All inference instances are for $\alpha=10$.

Also, examples of inference on $N=100$:


Next, I want to obtain the ROC curve.

Friday, November 8, 2013

Granger causality ("G-causality")

Basic intuition: "one series can be causal to the other if we can better predict the second time series by incorporating knowledge of the first one."

Trying out a Matlab package for Granger causality on simulated neural traces. I simulated three traces where neuron 1 drives neuron 2, and neuron 3 is isolated. The G-causality inference produced the correct result:
On the other hand, I don't know how well the traces can be modeled by the multivariate autoregressive (AR) model assumed by G-causality. (The toolbox gives a warning that the residuals after the AR model fit are not white.)

Thursday, October 31, 2013

Izhikevich numerical stability: Normalizing the coupling strength

In the Izhikevich numerical model, the inputs to a neuron are represented by the term $I$ in the dynamical equation for $v$. In the example code provided, "purely random inputs" to a neuron (called "thalamic input", given the analogy of cortical neuron modeling) are drawn from a Gaussian random variable at each time step. If the amplitude of the Gaussian random variable is held constant, the number of spikes induced by the random input is highly dependent on the time step of integration.

(I observed this phenomenon as I was trying to use decreasing time steps in integration, to show that the simulated traces would converge to a stationary value.)

I think a reasonable way to handle the unwanted variation in the strength of the random input effect is to pre-synthesize the random input traces at the beginning, and interpolate the result for a specified integration time.
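A sketch of what I have in mind (the names and the 1 ms base grid are placeholders; linear interpolation via `np.interp`):

```python
import numpy as np

def thalamic_input(t_max=1000.0, dt_base=1.0, dt_sim=0.125, sigma=5.0, seed=0):
    """Pre-synthesize the random ("thalamic") input on a fixed base grid,
    then interpolate onto the simulation grid, so that the drive a neuron
    sees does not change as the integration step dt_sim is refined."""
    rng = np.random.default_rng(seed)
    t_base = np.arange(0.0, t_max, dt_base)
    i_base = sigma * rng.standard_normal(len(t_base))
    t_sim = np.arange(0.0, t_max, dt_sim)
    return np.interp(t_sim, t_base, i_base)
```

With this construction, halving `dt_sim` only resamples the same underlying trace, so the induced spike count should converge as the step size shrinks.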

Friday, October 18, 2013

Basic question #5: Can we improve the numerical stability of the Izhikevich model?

Indeed, by decreasing the integration step size from 0.5 ms (the Izhikevich default) down to 1/32 ms, the peak of the action potential stabilizes at ~30 mV, which is the physiologically realistic value:

Only then (with well-behaved action potential amplitudes) is it reasonable to compute the "analog" cross-correlation between two traces:


On the other hand, it occurred to me that I need to think carefully about the scaling of the random input ("thalamic input") as the step size is changed. Presumably, there is a nice theoretic justification for the appropriate scaling. We should also verify that, with correct scaling, the firing rate does not depend (too strongly) on the integration step size.

Thursday, October 17, 2013

Basic question #4: Robustness of the FDR network inference to larger (but unconnected) graphs

The basic motivation for using the FDR method for inference is to automatically distinguish the significant edges in a given network, in a conservative way.

As a test of the FDR procedure, I will revisit the case of $N=9$, $M=9$. (It just occurred to me that I was not careful about checking for repeated edges in the underlying directed edge construction. Furthermore, it is possible to get both $n_i \to n_j$ and $n_j \to n_i$, in which case the "true" number of inferrable undirected edges will be less than the nominal $M$.)

Wednesday, October 16, 2013

Basic question #3: Simulation of neural signals

Up to this point, my experiments have involved randomly generated noise. However, the intent of this work is to look for network connectivity in recordings of neuron populations. More specifically, I wish to develop a "null model" based on neuronal simulations.

Based on a brief literature review, it appears that Izhikevich's numerical code for neuron simulation has nonzero traction. (For instance, Kramer's simulated neural data is based on the Izhikevich model). More information about the Izhikevich model can be obtained here. Izhikevich seems to be an interesting guy, and I'm eager to find out more about his work.

On the other hand, in the longer run, it's probably good to test the network inference framework on multiple simulation models, so that we are not misled by idiosyncrasies of a particular numerical model.

A first stab at generating a few Izhikevich neurons:

Now, to get at the model in a bit more detail, I would like to see the traces in a more controlled setting: evolution of two neurons where neuron 1 has an excitatory synapse on neuron 2. Here's a "first pass" simulation of two neurons with a specific nonzero coupling between them. I also plot the sample cross-correlation between the two temporal traces:

Here are some thoughts on Izhikevich's numerical model:

  • The paper that describes the model is pretty sparse on "biological intuition."
  • It seems to me that the >100 mV amplitude spiking that I see in my "local implementation" (which is basically copied and pasted from the Izhikevich paper) is a numerical glitch. Two observations: (1) I find that if I step the numerical model at timesteps other than 0.5 ms (actually, using finer steps), the numerical trajectories become divergent; (2) The bottom panel of Fig. 3 in the paper shows a pretty regular sized amplitude train, which appears to be on the order of <100 mV based on the noise characteristics.
I think getting a handle on the numerical simulation will be important. I suspect that the pulse amplitudes -- which suffer from numeric noise (as I claim above) -- are throwing off the cross-correlation metric.

Another possibility is to use the "digital" firing traces (i.e. did a cell fire at a particular time or not) instead of the "analog" (voltage) time traces.

Next, my observations on how cell-to-cell connectivity is implemented.

Numerical implementation of synapses in the Izhikevich model:

In the Izhikevich numerical model, each neuron has two state variables $(u, v)$ that proceed according to the following differential equation:
  • $v' = 0.04 v^2 + 5v + 140 -u + I$
  • $u' = a\cdot (bv - u)$
where $v$ represents the membrane potential of that neuron. In the DE for $v'$, the term $I$ is a forcing term that represents the inputs to that neuron. In particular, in Izhikevich's example code the forcing term consists of two parts: a random term (denoted "thalamic input", so presumably we're modeling cortical neurons), and a term that derives from the connection matrix $S_{ij}$.
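In code, the update loop is tiny. A single-neuron sketch using plain forward Euler and the regular-spiking parameters $(a, b, c, d) = (0.02, 0.2, -65, 8)$ from the paper (Izhikevich's own script actually takes two half-steps for $v$; the single full step here is my simplification):

```python
import numpy as np

def izhikevich(I, dt=0.125, a=0.02, b=0.2, c=-65.0, d=8.0):
    """Forward-Euler integration of the Izhikevich equations above for one
    neuron driven by input current I (one sample per step of size dt, in ms).
    Returns the voltage trace; spikes are recorded at a +30 mV peak, then
    (v, u) are reset to (c, u + d)."""
    v, u = -65.0, b * -65.0
    trace = np.empty(len(I))
    for t, It in enumerate(I):
        v += dt * (0.04 * v * v + 5 * v + 140 - u + It)
        u += dt * a * (b * v - u)
        if v >= 30.0:        # spike: clamp the recorded peak, then reset
            v, u = c, u + d
            trace[t] = 30.0
        else:
            trace[t] = v
    return trace
```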

How to scale the connection strength $S_{ij}$?

In the numerical model, we can set any arbitrary value for $S_{ij}$. Clearly, our ability to detect the cross-correlation between two neurons will be aided by choosing a large value of the connection strength. However, it is important to use values for $S_{ij}$ that are biologically relevant.

Unfortunately, the fact that the differential equation is given in "dimensionless units" (even though $v$ is supposed to represent voltage on a mV scale and time on a ms scale), together with the lack of biological insight in the paper, makes it hard to determine a reasonable range of $S_{ij}$ values.

Instead, I propose to perform the following experiment. I will hook up a two neuron system where neuron 2 (N2) drives neuron 1 (N1) through a coupling term $S_{1,2}=\alpha$. I will remove the "thalamic" input from N1, so that N1 is driven exclusively by N2 (otherwise, it is difficult to distinguish the effect of N2). I will characterize $\alpha$ by the amplitude of the excitatory post-synaptic potential (EPSP).

Based on this simulation, I find that the (positive) magnitude of $\alpha$ is roughly the amplitude of the EPSP. According to Wikipedia, the amplitude of PSPs can be "as low as 0.4 mV to as high as 20 mV". With $S_{1,2}=20$, action potentials from N2 are able to drive action potentials in N1. So, it appears that a range of $S_{ij} \leq 20$ should be biologically feasible.

Tuesday, October 15, 2013

Basic question #2: Simple instance of FDR inference

The intent in this post is to basically replicate (and understand) the simple "pink noise data" simulation experiment described in Kramer2009. Some remarks:
  • Kramer's independent noise signals are colored (with $1/f^{0.33}$ spectrum), whereas I will do my initial experiment with white noise.
  • Kramer's experiment uses a coupling strength of $\alpha=0.4$. That is, if node 1 connects to node 2, the signals are $x_1[t] = w_1[t]$ and $x_2[t] = w_2[t] + 0.4 w_1[t]$ where $w_i$ are independent noise signals. So, the optimal lag in the cross-correlation is expected to be zero.
I followed the FDR procedure as described in Kramer2009. As with the "pink noise data" experiment, I used a network of $N=9$ nodes and $M=9$ edges. For each inference, some parameters of interest are:
  • Number of false inferred edges (this is the quantity that the FDR method intends to control -- in expectation);
  • Number of inferred edges (relative to the true edge count $M$).
Here is a single inference run, showing the ordered $p$-values, using the "extremum" method (e.g. Fig. 2(c)) with "FDR level" of $q=0.1$:
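For completeness, the declaration step on the ordered $p$-values is just the Benjamini-Hochberg procedure; a vectorized sketch of what I implemented:

```python
import numpy as np

def bh_edges(pvals, q=0.1):
    """Benjamini-Hochberg: boolean mask of the p-values declared significant
    at FDR level q. Find the largest k with p_(k) <= q*k/m, then declare
    the k smallest p-values (here, candidate edges)."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask
```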

Next, I ran $N_{inf}=10^4$ instances of the inference problem, to see the general distribution of inference results.

Here is the distribution of num_inferred_edges:
Recall that the true number of edges is $M=9$. So, it appears that the current FDR procedure tends to under-report the expected number of edges (the mode appears to be $7$). This is consistent with the FDR process being somewhat "conservative" in its estimate of edges.

It is important to keep in mind that there will be variations in the number of inferred edges. The time series data does not unambiguously reveal the presence or absence of a coupling.

Here is the distribution of num_false_edges = num_inferred_edges - num_correct:
Based on appearance, I attempted to model the distribution as a geometric distribution -- which is apparently not quite right.

Actually, the more "appropriate" horizontal scale for the above distribution is the proportion of false positives, since the FDR level $q$ is "an upper bound on the expected proportion of false positives among all declared edges in [the] inferred network" [Kramer2009]. Here is that distribution:
I find a mean false edge proportion of $0.07$ in my $N_{inf}=10^4$ dataset, which is below the $q=0.1$ bound as advertised.

One more thing. We might look at the variation in false detection rate conditioned on the number of inferred edges.

Initial agenda: the warm-up

Here are some warm-up tasks:

  1. Run through a single complete instance of Kramer's FDR (false discovery rate) mechanism for a small random network with correlated noise. (Basically, replicate Section IV.A of Kramer2009.)
    1. Consider the null distribution of $C_{ij}$ for colored noise, and as a function of the signal length $n$.
    2. Also, examine the distribution of $s_{ij} = \mathrm{max}_\tau|C_{ij}(\tau)|$ whose CDF $P[z]$ is given in Eq. 4 of Kramer2009.
  2. Run through many instances of the inference on many small random networks, to verify if the expected number of spurious edges is consistent on multiple instances of the inference task (where ground truth is known).
    1. I may want to explore the parameter space a bit, considering the accuracy of the inference as a function of the coupling strength $\alpha$, as well as the general validity of FDR.
  3. Acquaint myself with Izhikevich's synthetic neuron simulation routine, for the generation of simulated neural data. Note that Izhikevich's script is referred to in a few computational papers on network inference.
  4. Acquaint myself with the "frequency domain bootstrap" method in Kramer2009, which will be relevant for experimental cases where one cannot simply analytically estimate or simulate a distribution.
I am also interested in coupling measures that are not the sample cross-correlation (many are listed in Kramer2009). In particular, I am interested in asymmetric measures that can yield directed dependencies. On this thought, a concept that is repeatedly mentioned in the literature is "Granger causality."

Basic question #1: Distribution of sample cross-correlations

To get started, I will examine some basic claims found in Kramer et al.'s "Network inference with confidence from multivariate time series." The paper proposes a computational methodology which, according to the authors, will allow one to reconstruct the connectivity network with confidence (i.e. an estimate of the correctness of the inferred graph, as quantified by the "false discovery rate"). This approach is clearly an improvement over ad hoc threshold-based methods, and greatly appeals to me.

Kramer defines the sample cross correlation (Eq. 1) as:
$$C_{ij}[l] = \frac{1}{\hat{\sigma}_i\hat{\sigma}_j\,(n-|l|)} \sum_{t=1}^{n-l} (x_i[t]-\bar{x}_i)(x_j[t+l]-\bar{x}_j)$$
Actually, Kramer's normalization of $(n-2l)$ doesn't make sense to me for two reasons: (1) there's a spurious factor of $2$ which does not capture the number of overlap terms in the case of finite signals, and (2) $l$ should have the absolute value applied. I find that my corrected formula is consistent with Eq. 2 in Kramer2009, whereas the original definition is not.

Kramer also cites the following (Bartlett's) estimate for the variance of $C_{ij}[l]$ when signals $x_i$ and $x_j$ are uncoupled:
$$var(C_{ij}[l]) = \frac{1}{n-|l|}\sum_{\tau=-n}^n C_{ii}[\tau]C_{jj}[\tau]$$.
(Again, I added the absolute value.)

The corresponding mean is clearly $E\left[C_{ij}[l]\right]=0$.

For this inaugural post, I would like to verify the Bartlett estimate. I begin with two uncorrelated white Gaussian noise signals $x_1[t]$ and $x_2[t]$.

Here is the simulated distribution of $C_{ij}[l]$ at a few values of $l$, and a normal distribution whose variance is given by the Bartlett estimate. Not bad at all!:
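The check above boils down to the following computation; a direct sketch (I truncate the sum at `max_lag` rather than $\pm n$, since for white noise the autocorrelation tails contribute only noise):

```python
import numpy as np

def bartlett_var(x, y, lag, max_lag=100):
    """Bartlett estimate of var(C_ij[lag]) for uncoupled signals:
    (1/(n-|lag|)) * sum_tau C_ii[tau] * C_jj[tau], truncated at max_lag."""
    n = len(x)

    def xcorr(a, b, tau):
        a = (a - a.mean()) / a.std()
        b = (b - b.mean()) / b.std()
        if tau >= 0:
            return np.dot(a[:n - tau], b[tau:]) / (n - tau)
        return np.dot(a[-tau:], b[:n + tau]) / (n + tau)

    return sum(xcorr(x, x, t) * xcorr(y, y, t)
               for t in range(-max_lag, max_lag + 1)) / (n - abs(lag))
```

For white noise the sum is dominated by the $\tau=0$ term ($C_{ii}[0] = C_{jj}[0] = 1$), so the estimate reduces to roughly $1/n$, as expected.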

Next, Kramer asks us to consider the Fisher transformation of the $C_{ij}$, as follows:
$$C_{ij}^F[\tau] = \frac{1}{2}\log\frac{1+C_{ij}[\tau]}{1-C_{ij}[\tau]}$$.

Oh, this bit is trivial. The Fisher transform maps $(-1, 1) \to (-\infty, \infty)$, so $C_{ij}^F$ is better described by the normal distribution than $C_{ij}$. I checked the correspondence of the above experiment when the $C_{ij}$ values underwent a Fisher transform. The agreement with the Bartlett-estimated distribution is still good (the transform does little to change the values of $C_{ij}$ above).

Next, let me consider the distribution of $C_{ij}$ in the case of an actual correlation between $x_1$ and $x_2$. I will follow Kramer's example and define the two signals as follows: $x_1 = w_1$ and $x_2 = w_2 + \alpha w_1$ ($\alpha = 0.4$), where $w_i$ are independent WGN instances. As I understand it, Kramer's framework does not require estimating the distribution of $C_{ij}$ under the alternative hypothesis ("H1: coupling"), but I want to see the distribution since I have the scripts already written:

Interestingly, it does not appear to be the case that the distribution is simply the normal distribution with a mean at the coupling value $\alpha$ and the Bartlett estimate of the variance. The distribution of $C_{ij}$ (left panel in above figure) is definitely not well described by this candidate distribution; the distribution of $C_{ij}^F$ fares better (right panel) but there is still a systematic offset.

[2013 10 16]: Actually, I wonder if the mean of the Bartlett estimate normal needs to be inverse Fisher-transformed...

Note that the correlated $C_{ij}$ at $\alpha=0.4$ would all be extremely improbable (very low $p$-value) under the null distribution. So, correlation at $\alpha=0.4$ is highly detectable, whereas $\alpha \approx 0.1$ would be harder to distinguish, judging from the null hypothesis distribution.