Tuesday, October 15, 2013

Basic question #2: Simple instance of FDR inference

This post aims to replicate (and understand) the simple "pink noise data" simulation experiment described in Kramer2009. Some remarks:
  • Kramer's independent noise signals are colored (with $1/f^{0.33}$ spectrum), whereas I will do my initial experiment with white noise.
  • Kramer's experiment uses a coupling strength of $\alpha=0.4$. That is, if node 1 connects to node 2, the signals are $x_1[t] = w_1[t]$ and $x_2[t] = w_2[t] + 0.4 w_1[t]$ where $w_i$ are independent noise signals. So, the optimal lag in the cross-correlation is expected to be zero.
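
To make the coupling model concrete, here is a minimal sketch (not the code used for the figures in this post) that generates one coupled pair of white-noise signals and scans the cross-correlation over a few lags. The function names, the sample count of 2000, the lag window of $\pm 10$ and the seed are all illustrative assumptions. With unit-variance noise the zero-lag correlation should be near $\alpha/\sqrt{1+\alpha^2} \approx 0.37$ for $\alpha=0.4$.

```python
import random
from math import sqrt


def simulate_pair(n_samples, alpha=0.4, seed=0):
    """Return (x1, x2) with x1[t] = w1[t] and x2[t] = w2[t] + alpha * w1[t]."""
    rng = random.Random(seed)  # seed is an arbitrary choice for repeatability
    w1 = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    w2 = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    x1 = w1
    x2 = [b + alpha * a for a, b in zip(w1, w2)]
    return x1, x2


def cross_correlation(x, y, lag):
    """Normalized sample cross-correlation between x[t] and y[t + lag]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = sqrt(sum((v - my) ** 2 for v in y) / n)
    if lag >= 0:
        pairs = zip(x[: n - lag], y[lag:])
    else:
        pairs = zip(x[-lag:], y[: n + lag])
    # Dividing by n (not by the overlap length) gives the usual biased estimate.
    return sum((a - mx) * (b - my) for a, b in pairs) / (n * sx * sy)


x1, x2 = simulate_pair(2000)
cc = {lag: cross_correlation(x1, x2, lag) for lag in range(-10, 11)}
best_lag = max(cc, key=lambda lag: abs(cc[lag]))
print(best_lag, round(cc[best_lag], 3))
```

Taking the lag with the largest absolute correlation mirrors the "extremum" idea; the significance test attached to that extremum is a separate step and is not shown here.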
I followed the false discovery rate (FDR) procedure as described in Kramer2009. As with the "pink noise data" experiment, I used a network of $N=9$ nodes and $M=9$ edges. For each inference, some quantities of interest are:
  • Number of falsely inferred edges (the FDR method is intended to control, in expectation, the proportion of these among all inferred edges);
  • Number of inferred edges (relative to the true edge count $M$).
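
As a rough illustration of the thresholding step, the sketch below assumes the procedure is the standard Benjamini-Hochberg step-up rule: sort the $m$ $p$-values, find the largest rank $k$ with $p_{(k)} \le qk/m$, and declare the $k$ smallest as edges. The function name and the example $p$-values are made up for illustration; if edges are undirected, $N=9$ nodes would give $m = N(N-1)/2 = 36$ candidate pairs.

```python
def benjamini_hochberg(p_values, q=0.1):
    """Return sorted indices of the hypotheses rejected at FDR level q.

    Assumes the standard Benjamini-Hochberg step-up rule.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up: keep the LARGEST rank whose p-value is under the line q*k/m.
        if p_values[idx] <= q * rank / m:
            k_max = rank
    return sorted(order[:k_max])


# Hypothetical p-values for six candidate edges.
example_p = [0.001, 0.5, 0.004, 0.03, 0.9, 0.012]
declared = benjamini_hochberg(example_p, q=0.1)
print(declared)
```

Note the step-up behavior: a $p$-value above its own threshold is still declared if a larger $p$-value further along the ordering falls below its threshold.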
Here is a single inference run, showing the ordered $p$-values, using the "extremum" method (e.g. Fig. 2(c)) with an "FDR level" of $q=0.1$:

Next, I ran $N_{inf}=10^4$ instances of the inference problem, to see the general distribution of inference results.

Here is the distribution of num_inferred_edges:
Recall that the true number of edges is $M=9$. So, it appears that the current FDR procedure tends to under-report the true number of edges (the mode appears to be $7$). This is consistent with the FDR procedure being somewhat "conservative" in the edges it declares.

It is important to keep in mind that there will be variations in the number of inferred edges. The time series data does not unambiguously reveal the presence or absence of a coupling.

Here is the distribution of num_false_edges = num_inferred_edges - num_correct:
Based on its appearance, I tried modeling the distribution as a geometric distribution, but the fit is not quite right.

Actually, the more "appropriate" horizontal scale for the above distribution is the proportion of false positives, since the FDR level $q$ is "an upper bound on the expected proportion of false positives among all declared edges in [the] inferred network" [Kramer2009]. Here is that distribution:
I find a mean false edge proportion of $0.07$ in my $N_{inf}=10^4$ dataset, which is below the $q=0.1$ bound as advertised.
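
For reference, the per-run proportion and its average can be computed as in the sketch below. The function names and the toy edge sets are hypothetical and are not the data behind the $0.07$ figure. A run that declares no edges is counted as having proportion $0$, which is the usual convention in the definition of the FDR.

```python
def false_edge_proportion(inferred_edges, true_edges):
    """Fraction of declared edges that are not in the true network."""
    inferred = set(inferred_edges)
    if not inferred:
        return 0.0  # convention: no declared edges means proportion 0
    false_edges = inferred - set(true_edges)
    return len(false_edges) / len(inferred)


def mean_false_edge_proportion(runs, true_edges):
    """Average the per-run proportion over all inference runs."""
    return sum(false_edge_proportion(r, true_edges) for r in runs) / len(runs)


# Toy data: three true edges and three hypothetical inference runs.
true_edges = {(0, 1), (1, 2), (2, 3)}
runs = [
    [(0, 1), (1, 2)],   # both correct -> 0.0
    [(0, 1), (4, 5)],   # one of two is false -> 0.5
    [],                 # nothing declared -> 0.0
]
mean_prop = mean_false_edge_proportion(runs, true_edges)
print(mean_prop)
```

The average of the per-run proportions is the empirical estimate to compare against $q$; it is not the same as the total number of false edges divided by the total number of declared edges.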

One more thing. We might look at how the false edge proportion varies when conditioned on the number of inferred edges.
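
One way to do this conditioning, sketched under the assumption that each run is summarized as a `(num_inferred_edges, num_false_edges)` pair, is to group runs by their inferred edge count and average the proportion within each group. The function name and the sample pairs are illustrative only.

```python
from collections import defaultdict


def mean_proportion_by_edge_count(run_summaries):
    """Map each inferred-edge count to the mean false edge proportion."""
    groups = defaultdict(list)
    for num_inferred, num_false in run_summaries:
        if num_inferred > 0:  # the proportion is undefined with zero declared edges
            groups[num_inferred].append(num_false / num_inferred)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


# Hypothetical (num_inferred_edges, num_false_edges) pairs.
summaries = [(7, 0), (7, 1), (8, 1), (9, 0), (0, 0)]
by_count = mean_proportion_by_edge_count(summaries)
print(by_count)
```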
