Tuesday, October 15, 2013

Basic question #1: Distribution of sample cross-correlations

To get started, I will examine some basic claims found in Kramer et al.'s "Network inference with confidence from multivariate time series." The paper proposes a computational methodology which, according to the authors, allows one to reconstruct the connectivity network with confidence (i.e. with an estimate of the correctness of the inferred graph, as quantified by the "false discovery rate"). This approach is clearly an improvement over ad hoc threshold-based methods, and greatly appeals to me.

Kramer defines the sample cross-correlation (Eq. 1) as:
$$C_{ij}[l] = \frac{1}{\hat{\sigma}_i\hat{\sigma}_j(n-|l|)} \sum_{t=1}^{n-l} (x_i[t]-\bar{x}_i)(x_j[t+l]-\bar{x}_j)$$
Actually, Kramer's normalization of $(n-2l)$ doesn't make sense to me for two reasons: (1) the factor of $2$ is spurious, since two finite signals of length $n$ overlap in $n-|l|$ terms at lag $l$, and (2) $l$ should have the absolute value applied so that negative lags are handled. I find that my corrected formula is consistent with Eq. 2 in Kramer2009, whereas the original definition is not.
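To make the corrected definition concrete, here is a minimal NumPy sketch of the $(n-|l|)$-normalized sample cross-correlation. The function name `sample_xcorr` is my own, and I am assuming that $\hat{\sigma}$ is the standard deviation with the $1/n$ convention (NumPy's default); the paper may use a different convention.

```python
import numpy as np

def sample_xcorr(xi, xj, lag):
    """Sample cross-correlation C_ij[lag], normalized by (n - |lag|).

    Assumption: sigma-hat is the population-style standard deviation
    (ddof=0), which is NumPy's default for .std().
    """
    xi = np.asarray(xi, dtype=float)
    xj = np.asarray(xj, dtype=float)
    n = len(xi)
    if len(xj) != n:
        raise ValueError("signals must have the same length")
    if abs(lag) >= n:
        raise ValueError("|lag| must be smaller than the signal length")

    di = xi - xi.mean()
    dj = xj - xj.mean()

    # Pair x_i[t] with x_j[t + lag] over the n - |lag| overlapping samples.
    if lag >= 0:
        overlap = di[:n - lag] * dj[lag:]
    else:
        overlap = di[-lag:] * dj[:n + lag]

    return overlap.sum() / (xi.std() * xj.std() * (n - abs(lag)))

# Small demonstration on two independent noise signals.
rng = np.random.default_rng(0)
a = rng.standard_normal(1000)
b = rng.standard_normal(1000)
for lag in (-5, 0, 5):
    print(lag, sample_xcorr(a, b, lag))
```

With this normalization the zero-lag autocorrelation `sample_xcorr(a, a, 0)` equals 1, and swapping the signals flips the sign of the lag.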

Kramer also cites the following (Bartlett's) estimate for the variance of $C_{ij}[l]$ when signals $x_i$ and $x_j$ are uncoupled:
$$\mathrm{var}(C_{ij}[l]) = \frac{1}{n-|l|}\sum_{\tau=-n}^{n} C_{ii}[\tau]C_{jj}[\tau].$$
(Again, I added the absolute value.)
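The estimate above can be sketched in a few lines of NumPy. One practical caveat: with the $(n-|\tau|)$ normalization the autocorrelation is undefined at $\tau = \pm n$, so this sketch truncates the sum at a user-chosen `max_lag`. That truncation, and the names `autocorr` and `bartlett_variance`, are my assumptions and not something taken from the paper.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation C_xx[tau] for tau = 0..max_lag,
    using the (n - tau) normalization."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    var = np.mean(d * d)
    return np.array(
        [np.dot(d[:n - k], d[k:]) / (var * (n - k)) for k in range(max_lag + 1)]
    )

def bartlett_variance(xi, xj, lag, max_lag):
    """Bartlett estimate of var(C_ij[lag]) for uncoupled signals.

    Assumption: the sum over tau is truncated to |tau| <= max_lag,
    because the autocorrelation cannot be evaluated at |tau| = n.
    """
    n = len(xi)
    cii = autocorr(xi, max_lag)
    cjj = autocorr(xj, max_lag)
    # Autocorrelations are even in tau, so each tau > 0 term counts twice.
    total = cii[0] * cjj[0] + 2.0 * np.sum(cii[1:] * cjj[1:])
    return total / (n - abs(lag))

rng = np.random.default_rng(0)
x1 = rng.standard_normal(2000)
x2 = rng.standard_normal(2000)
print(bartlett_variance(x1, x2, lag=0, max_lag=20), 1.0 / 2000)
```

For white noise only the $\tau = 0$ term contributes on average, so the estimate should land near $1/(n-|l|)$.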

The corresponding mean is clearly $E\left[C_{ij}[l]\right]=0$.

For this inaugural post, I would like to verify the Bartlett estimate. I begin with two uncorrelated white Gaussian noise signals $x_1[t]$ and $x_2[t]$.

Here is the simulated distribution of $C_{ij}[l]$ at a few values of $l$, together with a normal distribution whose variance is given by the Bartlett estimate. Not bad at all!
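For anyone who wants to reproduce this kind of check, here is a rough Monte Carlo sketch. It is not the script behind the figure: the signal length, number of trials, lags, and the function name `null_xcorr_samples` are all placeholder choices of mine. It compares the empirical variance of $C_{12}[l]$ against $1/(n-|l|)$, which is what the Bartlett estimate reduces to on average for white noise.

```python
import numpy as np

def null_xcorr_samples(n, lag, trials, rng):
    """Draw `trials` independent pairs of white Gaussian noise signals of
    length n and return C_12[lag] for each pair (lag >= 0 assumed)."""
    x1 = rng.standard_normal((trials, n))
    x2 = rng.standard_normal((trials, n))
    d1 = x1 - x1.mean(axis=1, keepdims=True)
    d2 = x2 - x2.mean(axis=1, keepdims=True)
    # Sum over the n - lag overlapping samples, one row per trial.
    num = np.sum(d1[:, :n - lag] * d2[:, lag:], axis=1)
    return num / (x1.std(axis=1) * x2.std(axis=1) * (n - lag))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, trials = 500, 2000  # hypothetical values, not from the post
    for lag in (0, 10, 100):
        c = null_xcorr_samples(n, lag, trials, rng)
        print(f"lag={lag:3d}  empirical var={c.var():.5f}  "
              f"1/(n-|l|)={1.0 / (n - lag):.5f}")
```

A histogram of `c` overlaid with a zero-mean normal density of that variance gives the same kind of comparison as described above.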

Next, Kramer asks us to consider the Fisher transformation of the $C_{ij}$, as follows:
$$C_{ij}^F[l] = \frac{1}{2}\log\frac{1+C_{ij}[l]}{1-C_{ij}[l]}.$$

Oh, this bit is trivial. The Fisher transform maps $(-1, 1) \to (-\infty, \infty)$, so the unbounded $C_{ij}^F$ is better described by the normal distribution than the bounded $C_{ij}$. I repeated the above experiment with the Fisher transform applied to the $C_{ij}$ values. The agreement with the Bartlett-estimated distribution is still good (the transform is nearly the identity for small $|C_{ij}|$, so it does little to change the values of $C_{ij}$ above).
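As a quick numerical illustration of why the transform barely matters near zero, here is a small sketch. The function name `fisher_transform` is mine; the formula is the one given above, which is the same thing as `np.arctanh`.

```python
import numpy as np

def fisher_transform(c):
    """Fisher transform 0.5 * log((1 + c) / (1 - c)), valid for |c| < 1."""
    c = np.asarray(c, dtype=float)
    return 0.5 * np.log((1.0 + c) / (1.0 - c))

# Compare the transformed and untransformed values over a range of c.
for c in (0.01, 0.03, 0.1, 0.4, 0.9):
    print(f"c={c:4.2f}  fisher={float(fisher_transform(c)):.5f}  "
          f"difference={float(fisher_transform(c)) - c:.5f}")
```

The difference grows roughly like $c^3/3$, so it is negligible for the small values seen under the null hypothesis and only becomes noticeable for strongly correlated signals.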

Next, let me consider the distribution of $C_{ij}$ in the case of an actual correlation between $x_1$ and $x_2$. I will follow Kramer's example and define the two signals as follows: $x_1 = w_1$ and $x_2 = w_2 + \alpha w_1$ ($\alpha = 0.4$), where the $w_i$ are independent WGN instances. As I understand it, Kramer's framework does not require an estimate of the distribution of $C_{ij}$ under the alternative hypothesis ("H1: Coupling"), but I want to see the distribution since I have the scripts already written:

Interestingly, it does not appear to be the case that the distribution is simply the normal distribution with a mean at the coupling value $\alpha$ and the Bartlett estimate of the variance. The distribution of $C_{ij}$ (left panel in above figure) is definitely not well described by this candidate distribution; the distribution of $C_{ij}^F$ fares better (right panel) but there is still a systematic offset.
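One thing that may be worth checking numerically: with $x_1 = w_1$ and $x_2 = w_2 + \alpha w_1$ built from unit-variance noise, $x_2$ has variance $1+\alpha^2$, so the population correlation at zero lag is $\alpha/\sqrt{1+\alpha^2}$ rather than $\alpha$ itself. The sketch below simulates the coupled pair and compares the mean of $C_{12}[0]$ with both candidates. The signal length, trial count, and function names are hypothetical, and I am not claiming this accounts for the whole offset seen in the figure.

```python
import numpy as np

def coupled_xcorr_samples(alpha, n, trials, rng):
    """Zero-lag sample cross-correlation of x1 = w1 and x2 = w2 + alpha*w1,
    where w1 and w2 are independent unit-variance white Gaussian noise."""
    w1 = rng.standard_normal((trials, n))
    w2 = rng.standard_normal((trials, n))
    x1 = w1
    x2 = w2 + alpha * w1
    d1 = x1 - x1.mean(axis=1, keepdims=True)
    d2 = x2 - x2.mean(axis=1, keepdims=True)
    return np.mean(d1 * d2, axis=1) / (x1.std(axis=1) * x2.std(axis=1))

def population_correlation(alpha):
    """corr(x1, x2) = cov / (sigma_1 * sigma_2) = alpha / sqrt(1 + alpha^2)."""
    return alpha / np.sqrt(1.0 + alpha**2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha = 0.4
    c = coupled_xcorr_samples(alpha, n=1000, trials=2000, rng=rng)
    print("mean of C_12[0]          :", c.mean())
    print("alpha                    :", alpha)
    print("alpha / sqrt(1 + alpha^2):", population_correlation(alpha))
    # Same comparison after the Fisher transform (arctanh).
    print("mean of Fisher C_12[0]   :", np.arctanh(c).mean())
    print("arctanh of population rho:", np.arctanh(population_correlation(alpha)))
```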

[2013 10 16]: Actually, I wonder if the mean of the Bartlett estimate normal needs to be inverse Fisher-transformed...

Note that the correlated $C_{ij}$ at $\alpha=0.4$ would all be extremely improbable (very low $p$-value) under the null distribution. So, correlation at $\alpha=0.4$ is highly detectable, whereas $\alpha \approx 0.1$ would be harder to distinguish, judging from the null hypothesis distribution.
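To put rough numbers on "detectable", here is a sketch of a two-sided $p$-value under the null distribution. It assumes white-noise signals, so that the Bartlett variance is approximately $1/(n-|l|)$, and it treats the null as exactly normal. The signal length `n = 100` is a placeholder of mine, and the input correlations are the population values $\alpha/\sqrt{1+\alpha^2}$ for the construction above, not measured ones.

```python
import math

def null_p_value(c, n, lag=0):
    """Two-sided p-value of an observed C_ij[lag] = c under a zero-mean
    normal null with variance 1 / (n - |lag|).

    Assumption: both signals are white, so the Bartlett sum is about 1.
    """
    z = abs(c) * math.sqrt(n - abs(lag))
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))

if __name__ == "__main__":
    n = 100  # hypothetical signal length
    for alpha in (0.1, 0.4):
        rho = alpha / math.sqrt(1.0 + alpha**2)
        print(f"alpha={alpha}: rho={rho:.4f}, p={null_p_value(rho, n):.3g}")
```

How hard $\alpha \approx 0.1$ is to detect depends strongly on $n$, since the null standard deviation shrinks like $1/\sqrt{n}$.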
