
Tuesday, October 15, 2013

Basic question #1: Distribution of sample cross-correlations

To get started, I will examine some basic claims found in Kramer et al.'s "Network inference with confidence from multivariate time series." The paper proposes a computational methodology that, according to the authors, allows one to reconstruct the connectivity network with confidence (i.e., with an estimate of the correctness of the inferred graph, quantified by the "false discovery rate"). This approach is clearly an improvement over ad hoc threshold-based methods, and it greatly appeals to me.

Kramer defines the sample cross correlation (Eq. 1) as:
C_{ij}[l] = \frac{1}{\hat{\sigma}_i\hat{\sigma}_j(n-|l|)} \sum_{t=1}^{n-l} (x_i[t]-\bar{x}_i)(x_j[t+l]-\bar{x}_j)

Actually, Kramer's normalization of (n-2l) doesn't make sense to me for two reasons: (1) the factor of 2 looks spurious, since two finite signals of length n overlap in n-|l| terms at lag l, not n-2l, and (2) the lag should enter through its absolute value |l|. The formula above therefore uses (n-|l|). I find that this corrected formula is consistent with Eq. 2 in Kramer2009, whereas the original definition is not.
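To make the normalization concrete, here is a minimal sketch of the sample cross-correlation with the (n-|l|) factor. The function name `sample_xcorr` is made up for illustration, indices are 0-based, and the 1/n convention for the standard deviation estimates is my assumption (the post does not say which one is used).

```python
import math


def sample_xcorr(x_i, x_j, lag):
    """Sample cross-correlation C_ij[lag], normalized by (n - |lag|)."""
    n = len(x_i)
    if len(x_j) != n:
        raise ValueError("signals must have equal length")
    if abs(lag) >= n:
        raise ValueError("|lag| must be smaller than the signal length")

    mean_i = sum(x_i) / n
    mean_j = sum(x_j) / n
    # Standard deviations over the full records (1/n convention, an assumption).
    sd_i = math.sqrt(sum((v - mean_i) ** 2 for v in x_i) / n)
    sd_j = math.sqrt(sum((v - mean_j) ** 2 for v in x_j) / n)

    # Keep only the indices where both t and t + lag fall inside the record.
    start = max(0, -lag)
    stop = min(n, n - lag)
    acc = sum((x_i[t] - mean_i) * (x_j[t + lag] - mean_j)
              for t in range(start, stop))

    # The sum has exactly (n - |lag|) terms, hence this normalization.
    return acc / (sd_i * sd_j * (n - abs(lag)))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.0, 1.0, 4.0, 3.0, 6.0]
    print(sample_xcorr(x, x, 0))   # autocorrelation at lag 0
    print(sample_xcorr(x, y, 1), sample_xcorr(y, x, -1))
```

The last line prints the same value twice, because C_{ij}[l] and C_{ji}[-l] sum the same pairs of samples. That symmetry only holds when negative lags are handled through |l|.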

Kramer also cites the following (Bartlett's) estimate for the variance of C_{ij}[l] when signals x_i and x_j are uncoupled:
var(C_{ij}[l]) = \frac{1}{n-|l|}\sum_{\tau=-n}^n C_{ii}[\tau]C_{jj}[\tau].
(Again, I added the absolute value.)
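As a rough illustration of how this estimate can be computed from a single pair of signals, here is a minimal sketch. Two things are my assumptions rather than anything from Kramer or this post: the helper names, and the truncation of the sum to |\tau| <= max_lag. I truncate because sample autocorrelations at lags close to n rest on only a handful of terms and are very noisy, and at |\tau| = n there is no overlap at all.

```python
import math
import random


def xcorr(a, b, lag):
    """Sample cross-correlation with the (n - |lag|) normalization."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sd_a = math.sqrt(sum((v - mean_a) ** 2 for v in a) / n)
    sd_b = math.sqrt(sum((v - mean_b) ** 2 for v in b) / n)
    ts = range(max(0, -lag), min(n, n - lag))
    acc = sum((a[t] - mean_a) * (b[t + lag] - mean_b) for t in ts)
    return acc / (sd_a * sd_b * (n - abs(lag)))


def bartlett_variance(x_i, x_j, lag, max_lag=20):
    """Bartlett-style variance estimate for C_ij[lag] under no coupling.

    The sum over tau is truncated to |tau| <= max_lag (an assumption).
    """
    n = len(x_i)
    total = sum(xcorr(x_i, x_i, tau) * xcorr(x_j, x_j, tau)
                for tau in range(-max_lag, max_lag + 1))
    return total / (n - abs(lag))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 500
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for lag in (0, 5, 10):
        print(lag, bartlett_variance(x1, x2, lag), 1.0 / (n - abs(lag)))
```

For white noise the autocorrelations are close to zero away from \tau = 0, so the estimate should land near 1/(n-|l|), which is the second number printed on each line.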

The corresponding mean is clearly E\left[C_{ij}[l]\right]=0.

For this inaugural post, I would like to verify the Bartlett estimate. I begin with two uncorrelated white Gaussian noise signals, x_1[t] and x_2[t].

Here is the simulated distribution of C_{ij}[l] at a few values of l, alongside a normal distribution whose variance is given by the Bartlett estimate. Not bad at all!
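A check of this kind can be sketched as a small Monte Carlo experiment. This is not my original script. The signal length n = 200, the trial count, and the lags are illustrative choices. For ideal white noise the autocorrelations vanish away from lag 0, so the Bartlett sum reduces to roughly 1/(n-|l|), which is what the simulated variance is compared against.

```python
import math
import random
import statistics


def xcorr(a, b, lag):
    """Sample cross-correlation with the (n - |lag|) normalization."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sd_a = math.sqrt(sum((v - mean_a) ** 2 for v in a) / n)
    sd_b = math.sqrt(sum((v - mean_b) ** 2 for v in b) / n)
    ts = range(max(0, -lag), min(n, n - lag))
    acc = sum((a[t] - mean_a) * (b[t + lag] - mean_b) for t in ts)
    return acc / (sd_a * sd_b * (n - abs(lag)))


def simulate_null(n, lag, trials, seed=0):
    """Draw `trials` independent WGN pairs and return C_12[lag] for each."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        values.append(xcorr(x1, x2, lag))
    return values


if __name__ == "__main__":
    n, trials = 200, 2000
    for lag in (0, 5, 20):
        c = simulate_null(n, lag, trials)
        print(f"lag={lag:2d}  mean={statistics.mean(c):+.4f}  "
              f"var={statistics.pvariance(c):.5f}  "
              f"1/(n-|l|)={1.0 / (n - lag):.5f}")
```

The printed mean should sit near zero and the simulated variance near 1/(n-|l|), up to Monte Carlo noise.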

Next, Kramer asks us to consider the Fisher transformation of the C_{ij}, as follows:
C_{ij}^F[l] = \frac{1}{2}\log\frac{1+C_{ij}[l]}{1-C_{ij}[l]}.

Oh, this bit is trivial. The Fisher transform maps (-1, 1) \to (-\infty, \infty), so C_{ij}^F is better described by the normal distribution than the bounded C_{ij}. I repeated the above experiment with the C_{ij} values Fisher-transformed. The agreement with the Bartlett-estimated distribution is still good (the transform does little to change C_{ij} values this close to zero).
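A quick numerical look at why the transform barely matters here: the Fisher transform is the inverse hyperbolic tangent, which is nearly the identity for small arguments. The function name `fisher` and the sample values are only for illustration.

```python
import math


def fisher(c):
    """Fisher transform of a correlation value c in (-1, 1)."""
    return 0.5 * math.log((1.0 + c) / (1.0 - c))


if __name__ == "__main__":
    for c in (0.05, 0.1, 0.4, 0.9):
        # math.atanh is the same function, printed as a cross-check.
        print(f"c={c:.2f}  fisher={fisher(c):.4f}  atanh={math.atanh(c):.4f}")
```

Values near zero come out almost unchanged, while values approaching 1 are stretched considerably.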

Next, let me consider the distribution of C_{ij} in the case of an actual correlation between x_1 and x_2. I will follow Kramer's example and define the two signals as follows: x_1 = w_1 and x_2 = w_2 + \alpha w_1 (\alpha = 0.4), where the w_i are independent WGN instances. As I understand it, Kramer's framework does not require estimating the distribution of C_{ij} under the alternative hypothesis ("H1: Coupling"), but I want to see the distribution since I have the scripts already written:

Interestingly, the distribution does not appear to be simply the normal distribution with its mean at the coupling value \alpha and the Bartlett estimate of the variance. The distribution of C_{ij} (left panel in the above figure) is definitely not well described by this candidate distribution; the distribution of C_{ij}^F fares better (right panel), but there is still a systematic offset.

[2013 10 16]: Actually, I wonder if the mean of the Bartlett estimate normal needs to be inverse Fisher-transformed...
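One possible contributor to the offset is worth checking numerically. If w_1 and w_2 have equal variance (an assumption, since the post does not state it), the population correlation between x_1 = w_1 and x_2 = w_2 + \alpha w_1 is \alpha/\sqrt{1+\alpha^2}, which is about 0.371 for \alpha = 0.4, not \alpha itself. The sketch below looks at lag 0 only, with illustrative values of n and the trial count. It also prints the inverse Fisher transform (tanh) of the mean of the Fisher-transformed values, for comparison.

```python
import math
import random
import statistics


def corr0(a, b):
    """Zero-lag sample correlation, i.e. C_ij[0]."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_aa = sum((v - mean_a) ** 2 for v in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    s_ab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    return s_ab / math.sqrt(s_aa * s_bb)


def simulate_coupled(alpha, n, trials, seed=0):
    """Return C_12[0] for `trials` draws of x1 = w1, x2 = w2 + alpha * w1."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        w1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        w2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x2 = [b + alpha * a for a, b in zip(w1, w2)]
        values.append(corr0(w1, x2))
    return values


if __name__ == "__main__":
    alpha, n, trials = 0.4, 200, 1000
    c = simulate_coupled(alpha, n, trials)
    mean_c = statistics.mean(c)
    mean_f = statistics.mean(math.atanh(v) for v in c)
    print("alpha                     :", alpha)
    print("alpha / sqrt(1 + alpha^2) :", alpha / math.sqrt(1.0 + alpha ** 2))
    print("mean of C_12[0]           :", mean_c)
    print("tanh(mean of Fisher C)    :", math.tanh(mean_f))
```

If the simulated mean sits near \alpha/\sqrt{1+\alpha^2} rather than \alpha, then at least part of the offset comes from using \alpha as the candidate mean. That would not by itself say anything about the shape of the distribution.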

Note that the correlated C_{ij} at \alpha=0.4 would all be extremely improbable (very low p-value) under the null distribution. So, correlation at \alpha=0.4 is highly detectable, whereas \alpha \approx 0.1 would be harder to distinguish, judging from the null hypothesis distribution.
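To put rough numbers on this, here is a minimal sketch of a two-sided p-value under a normal null with mean 0 and variance 1/(n-|l|), the white-noise limit of the Bartlett estimate. The record length n = 200 is a hypothetical value chosen for illustration, and I plug in 0.4 and 0.1 directly as observed correlation values.

```python
import math


def null_p_value(c, n, lag=0):
    """Two-sided p-value of an observed C_ij[lag] under N(0, 1/(n - |lag|))."""
    sd = 1.0 / math.sqrt(n - abs(lag))
    z = abs(c) / sd
    # For a standard normal, P(|Z| >= z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    n = 200  # hypothetical record length
    for c in (0.4, 0.1):
        print(f"C = {c}: p = {null_p_value(c, n):.3g}")
```

With these assumptions the value 0.4 lies many standard deviations out in the null tail, while 0.1 is within about one and a half. Longer records shrink the null variance and would make the smaller value easier to detect.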
