To get started, I will examine some basic claims found in Kramer et al.'s "Network inference with confidence from multivariate time series." The paper proposes a computational methodology which, according to the authors, allows one to reconstruct the connectivity network with confidence (i.e., with an estimate of the correctness of the inferred graph, as quantified by the "false discovery rate"). This approach is clearly an improvement over ad hoc threshold-based methods, and greatly appeals to me.
Kramer defines the sample cross correlation (Eq. 1) as:
C_{ij}[l] = \frac{1}{\hat{\sigma}_i\hat{\sigma}_j(n-|l|)} \sum_{t=1}^{n-l} (x_i[t]-\bar{x}_i)(x_j[t+l]-\bar{x}_j)
Actually, the formula above is my corrected version. Kramer's normalization of (n-2l) doesn't make sense to me for two reasons: (1) there is a spurious factor of 2, so it does not count the number of overlapping terms for finite signals, and (2) l should appear as an absolute value. I find that my corrected formula is consistent with Eq. 2 in Kramer2009, whereas the original definition is not.
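As a concrete reference for the n-|l| normalization, here is a minimal NumPy sketch of the sample cross-correlation. The function name `cross_correlation` and the choice of the population standard deviation (`ddof=0`) for \hat{\sigma} are my assumptions, not something taken from Kramer's paper; the lag is assumed to be an integer with |l| < n.

```python
import numpy as np

def cross_correlation(x_i, x_j, lag):
    """Sample cross-correlation at an integer lag, normalized by the overlap n - |lag|."""
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    n = len(x_i)
    d_i = x_i - x_i.mean()
    d_j = x_j - x_j.mean()
    if lag >= 0:
        # Pair x_i[t] with x_j[t + lag]; there are n - lag overlapping terms.
        prods = d_i[: n - lag] * d_j[lag:]
    else:
        # Negative lag: pair x_i[t] with x_j[t - |lag|].
        prods = d_i[-lag:] * d_j[: n + lag]
    # Assumption: sigma-hat is the full-sample standard deviation (ddof=0).
    return prods.sum() / (x_i.std() * x_j.std() * (n - abs(lag)))

# Illustrative usage on two independent noise signals.
rng = np.random.default_rng(1)
x = rng.standard_normal(300)
y = rng.standard_normal(300)
print(cross_correlation(x, y, 3), cross_correlation(y, x, -3))
```

With this normalization the zero-lag value reduces to the ordinary Pearson correlation coefficient, and swapping the two signals while flipping the sign of the lag gives the same number.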
Kramer also cites the following (Bartlett's) estimate for the variance of C_{ij}[l] when signals x_i and x_j are uncoupled:
var(C_{ij}[l]) = \frac{1}{n-|l|}\sum_{\tau=-n}^n C_{ii}[\tau]C_{jj}[\tau].
(Again, I added the absolute value.)
The corresponding mean is clearly E\left[C_{ij}[l]\right]=0.
For this inaugural post, I would like to verify the Bartlett estimate. I begin with two uncorrelated white Gaussian noise signals, x_1[t] and x_2[t].
Here is the simulated distribution of C_{ij}[l] at a few values of l, together with a normal distribution whose variance is given by the Bartlett estimate. Not bad at all!
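The check described above can be sketched roughly as follows. This is not the script behind the figure: the signal length, lag, number of trials, and the helper names `xcorr` and `bartlett_variance` are illustrative choices of mine. I also truncate the Bartlett sum at |\tau| <= `max_lag` (an assumption), since the sample autocorrelations at very large lags are built from only a handful of terms.

```python
import numpy as np

def xcorr(x_i, x_j, lag):
    # Sample cross-correlation with the n - |lag| normalization.
    n = len(x_i)
    d_i, d_j = x_i - x_i.mean(), x_j - x_j.mean()
    if lag >= 0:
        s = np.sum(d_i[: n - lag] * d_j[lag:])
    else:
        s = np.sum(d_i[-lag:] * d_j[: n + lag])
    return s / (x_i.std() * x_j.std() * (n - abs(lag)))

def bartlett_variance(x_i, x_j, lag, max_lag):
    # Assumption: the sum over tau is truncated at |tau| <= max_lag.
    n = len(x_i)
    total = sum(
        xcorr(x_i, x_i, tau) * xcorr(x_j, x_j, tau)
        for tau in range(-max_lag, max_lag + 1)
    )
    return total / (n - abs(lag))

rng = np.random.default_rng(0)
n, lag, trials = 200, 5, 2000  # illustrative values

# Empirical null distribution: many independent pairs of white Gaussian noise.
samples = np.array([
    xcorr(rng.standard_normal(n), rng.standard_normal(n), lag)
    for _ in range(trials)
])
empirical_var = samples.var()

# Bartlett estimate computed from a single pair of signals.
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
bartlett_var = bartlett_variance(x1, x2, lag, max_lag=20)

print(samples.mean(), empirical_var, bartlett_var, 1.0 / (n - lag))
```

For white noise only the \tau = 0 term of the sum contributes in expectation, so the Bartlett estimate should sit near 1/(n-|l|), and the empirical variance of the simulated values can be compared against both numbers.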
Next, Kramer asks us to consider the Fisher transformation of the C_{ij}, as follows:
C_{ij}^F[l] = \frac{1}{2}\log\frac{1+C_{ij}[l]}{1-C_{ij}[l]}.
Oh, this bit is trivial. The Fisher transform maps (-1, 1) \to (-\infty, \infty), so C_{ij}^F is better described by a normal distribution than C_{ij} is. I repeated the above experiment with the C_{ij} values Fisher-transformed. The agreement with the Bartlett-estimated distribution is still good (the transform is nearly the identity for values this close to zero, so it changes the C_{ij} above very little).
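To make the "does little to change the values" remark concrete, here is a small sketch. The range of \pm 0.1 is an illustrative stand-in for typical null values of C_{ij}, not a number taken from my simulations, and `fisher_transform` is just a name I picked. The transform is the same function as the inverse hyperbolic tangent, which NumPy provides as `np.arctanh`.

```python
import numpy as np

def fisher_transform(c):
    """Fisher transform of correlation values strictly inside (-1, 1)."""
    c = np.asarray(c, dtype=float)
    return 0.5 * np.log((1.0 + c) / (1.0 - c))

# Illustrative range of small correlation values (assumed, not measured).
c_small = np.linspace(-0.1, 0.1, 21)

# Largest absolute change the transform makes over that range.
max_shift = np.max(np.abs(fisher_transform(c_small) - c_small))

# The closed-form expression agrees with NumPy's arctanh.
same_as_arctanh = np.allclose(fisher_transform(c_small), np.arctanh(c_small))

print(max_shift, same_as_arctanh)
```

The leading correction to the identity is c^3/3, which is why the transformed and untransformed null distributions look nearly the same when the correlations are small.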
Next, let me consider the distribution of C_{ij} in the case of an actual correlation between x_1 and x_2. I will follow Kramer's example and define the two signals as follows: x_1 = w_1 and x_2 = w_2 + \alpha w_1 (\alpha = 0.4), where the w_i are independent WGN instances. As I understand it, Kramer's framework does not require estimating the distribution of C_{ij} under the alternative hypothesis ("H1: Coupling"), but I want to see the distribution since I have the scripts already written:
Interestingly, it does not appear to be the case that the distribution is simply the normal distribution with a mean at the coupling value \alpha and the Bartlett estimate of the variance. The distribution of C_{ij} (left panel in above figure) is definitely not well described by this candidate distribution; the distribution of C_{ij}^F fares better (right panel) but there is still a systematic offset.
[2013 10 16]: Actually, I wonder if the mean of the Bartlett estimate normal needs to be inverse Fisher-transformed...
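One possible contributor to the offset, offered as a sketch rather than a diagnosis: if w_1 and w_2 have equal variance (an assumption; I only specified that they are independent WGN), then x_2 = w_2 + \alpha w_1 has variance 1 + \alpha^2 times that of x_1, and the zero-lag correlation between x_1 and x_2 is \alpha/\sqrt{1+\alpha^2} rather than \alpha itself. The snippet below compares the simulated mean of C_{12}[0] against both candidates; `n` and `trials` are illustrative values, and `np.corrcoef` is used because it coincides with the zero-lag sample cross-correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n, trials = 0.4, 1000, 500  # n and trials are illustrative choices

c0 = np.empty(trials)
for k in range(trials):
    w1 = rng.standard_normal(n)  # unit-variance WGN (assumed)
    w2 = rng.standard_normal(n)
    x1 = w1
    x2 = w2 + alpha * w1
    c0[k] = np.corrcoef(x1, x2)[0, 1]  # zero-lag sample correlation

# Correlation implied by the signal definitions under equal noise variances.
rho = alpha / np.sqrt(1.0 + alpha**2)

print("mean C_12[0]:", c0.mean(), " rho:", rho, " alpha:", alpha)
print("mean Fisher value:", np.arctanh(c0).mean(), " arctanh(rho):", np.arctanh(rho))
```

If this is relevant to the figure, the natural center for the candidate normal would be \alpha/\sqrt{1+\alpha^2} for C_{ij} and its Fisher transform for C_{ij}^F. Whether that accounts for all of the offset I observed is something the sketch alone does not settle.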
Note that the correlated C_{ij} at \alpha=0.4 would all be extremely improbable (very low p-value) under the null distribution. So, correlation at \alpha=0.4 is highly detectable, whereas \alpha \approx 0.1 would be harder to distinguish, judging from the null hypothesis distribution.
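To put rough numbers on this, here is a sketch of a two-sided p-value under a zero-mean normal null. Everything specific in it is an assumption for illustration: the null variance is taken as 1/n (the white-noise Bartlett value at zero lag), the signal length `n = 200` is hypothetical, and the "typical" observed correlation for each \alpha is taken to be \alpha/\sqrt{1+\alpha^2}, which holds when w_1 and w_2 have equal variance.

```python
import math

def null_p_value(c, n):
    """Two-sided p-value of c under a zero-mean normal null with variance 1/n."""
    z = abs(c) * math.sqrt(n)
    return math.erfc(z / math.sqrt(2.0))

n = 200  # hypothetical signal length
p_values = {}
for alpha in (0.4, 0.1):
    # Typical zero-lag correlation for this coupling (equal-variance assumption).
    rho = alpha / math.sqrt(1.0 + alpha**2)
    p_values[alpha] = null_p_value(rho, n)
    print(alpha, rho, p_values[alpha])
```

The detectability of a given \alpha depends strongly on n under these assumptions: the null standard deviation shrinks like 1/\sqrt{n}, so a coupling that is hard to distinguish with short signals becomes easier to detect with longer ones.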