Wednesday, January 1, 2014

Cascade-based graph inference (2)

Determination of the maximum likelihood tree $T^*$ given a cascade $c$ on a graph $G$.

To me, the main insight for solving this problem is that, given $c$ and $G$, $T^*$ can be identified efficiently by a node-wise optimization over the active nodes in $c$: since every active node has exactly one infection source, each node's parent can be chosen independently of the rest of the tree. On the other hand, I am not entirely certain whether $w(i,j) = \log(P'(i,j)) - \log(\epsilon)$ (the arXiv preprint) or $w(i,j) = \log(P'(i,j)) - \log(\epsilon P(i,j))$ (KDD '12) is actually the right weight to use. It seems to me that we ought to use $w(i,j) = \log(P'(i,j))$ directly, and that we don't necessarily have to make the approximation of Eq. 11 (i.e. assuming all unactivated edges are $\epsilon$-edges). Let me try out my idea first.
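Here's a minimal sketch of that node-wise step, assuming the cascade is stored as a dict of hit times and the edge weight is supplied as a function; the data layout and names are mine, not the paper's:

```python
import math

def mle_tree(hit_times, parents_of, log_w):
    """Node-wise MLE tree: each active node independently picks the single
    candidate parent that maximizes the edge weight (layout assumed, not
    from the paper).
    hit_times: dict node -> hit time (active nodes of c only).
    parents_of: dict node -> iterable of in-neighbors in G.
    log_w: function (i, j, dt) -> log-weight of edge i -> j with delay dt,
           e.g. log P'(i, j) under whichever weighting is being tested.
    Returns dict child -> chosen parent (the cascade root maps to None)."""
    tree = {}
    for j, t_j in hit_times.items():
        best_parent, best_w = None, -math.inf
        for i in parents_of.get(j, ()):
            # only nodes infected strictly earlier are eligible parents
            if i in hit_times and hit_times[i] < t_j:
                w = log_w(i, j, t_j - hit_times[i])
                if w > best_w:
                    best_parent, best_w = i, w
        tree[j] = best_parent
    return tree
```

Because the per-node choices are independent, swapping in any of the three candidate weights is just a different `log_w`, which should make the comparison easy.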

Thursday, December 26, 2013

Cascade-based graph inference

Based on the work of Gomez-Rodriguez et al. (2011). So far, I have read up to Section 3.2. The first task is to understand the subroutine for identifying the MLE cascade structure from a list of hit times. OK; putting it into practice.

I'd like to begin by repeating the authors' synthetic experiments. There are two immediate tasks:
  1. Given a ground-truth graph $G$ and a cascade parametrization ($\beta$ and $P_c(u,v)$), generate synthetic cascades.
  2. Given cascade $c$, estimate the maximum likelihood cascade tree.
Beginning the implementation... Python ("Snap.py") or Matlab? I had trouble choosing every time I began a programming assignment in CS 224W... The former has some convenient built-in functions, but I feel there's a "barrier" between whatever I want to do and the syntactically correct implementation; Matlab, on the other hand, lets me do whatever I want, but forces me to implement everything from first principles.

Decided to proceed with Python, mainly for its built-in dictionary type.
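For task 1, here is a minimal sketch of the cascade generator I have in mind. This is my reading of the paper's transmission model, not the authors' code: the adjacency layout, the exponential delay with rate `alpha`, and the event-queue bookkeeping are all my assumptions.

```python
import heapq
import random

def generate_cascade(adj, beta, root, alpha=1.0):
    """Simulate one cascade: each infected node u attempts to infect each
    neighbor v with probability beta; a successful attempt offers hit time
    t_u + Exp(alpha). A node's hit time is its earliest accepted offer.
    (Model details assumed, not taken from the paper's code.)
    adj: dict node -> list of out-neighbors.
    Returns dict node -> hit time for the activated nodes."""
    hit = {}
    heap = [(0.0, root)]
    while heap:
        t_u, u = heapq.heappop(heap)
        if u in hit:  # already infected by an earlier offer
            continue
        hit[u] = t_u
        for v in adj.get(u, []):
            if v not in hit and random.random() < beta:
                heapq.heappush(heap, (t_u + random.expovariate(alpha), v))
    return hit
```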

At the same time, I'll try out software for easy graph visualization (instead of drawing graphs in my notebook by hand each time).

Monday, December 2, 2013

Granger causality from first principles

Rather than using a black box (the GCCA toolbox), I implemented the Granger causality workflow in Matlab from first principles, e.g. multivariate (vector) autoregressive (VAR) modeling. By working through the logic step by step, I now have the basic framework that will allow me to make FDR-based directed-edge inferences from sets of time series. However, while the system should be able to make significance-based edge inferences, I have strong reservations about the ability of a VAR model to fundamentally describe spike train data. Nevertheless, I will continue with my exploration of G-causality-based directed-edge inference.
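For the edge-selection step, what I have in mind is the standard Benjamini-Hochberg procedure over the per-edge $p$-values. A quick Python sketch for reference (my implementation is in Matlab, and the dict-of-edges layout here is just for illustration):

```python
def bh_select(pvals, q=0.05):
    """Benjamini-Hochberg FDR selection over candidate directed edges.
    pvals: dict edge -> p-value.  Returns the set of edges declared
    significant at FDR level q (largest k with p_(k) <= q*k/m rule)."""
    items = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(items)
    k = 0
    for i, (_, p) in enumerate(items, start=1):
        if p <= q * i / m:
            k = i
    return {edge for edge, _ in items[:k]}
```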

I implemented both pairwise and conditional Granger causality methods. As a quick estimate of the amount of work: for $n$ channels, the pairwise analysis fits a restricted/full bivariate model pair for each of the $n(n-1)$ ordered pairs, while the conditional analysis works with the full $n$-dimensional VAR plus one reduced model per candidate source, so each fit is considerably more expensive.
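For reference, a single pairwise comparison boils down to two least-squares fits and a variance ratio. A Python sketch of the textbook computation (my actual code is in Matlab; `p` here is the assumed model order):

```python
import numpy as np

def granger_score(x, y, p):
    """Pairwise Granger score for x -> y.  Restricted model: y(t) on its
    own p lags; full model: y(t) on p lags of both y and x.
    Score = ln(var_restricted / var_full) of the residuals."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    T = len(y)
    Y = y[p:]
    lags_y = np.column_stack([y[p - k:T - k] for k in range(1, p + 1)])
    lags_x = np.column_stack([x[p - k:T - k] for k in range(1, p + 1)])
    ones = np.ones((T - p, 1))
    X_r = np.hstack([ones, lags_y])          # restricted design
    X_f = np.hstack([ones, lags_y, lags_x])  # full design
    res_r = Y - X_r @ np.linalg.lstsq(X_r, Y, rcond=None)[0]
    res_f = Y - X_f @ np.linalg.lstsq(X_f, Y, rcond=None)[0]
    return float(np.log(res_r.var() / res_f.var()))
```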

Null distribution for the G-causal score:

The exponential fit appears to be decent -- I will proceed with it for converting the test statistic (the G-causal score) to a $p$-value.

Actually, on closer examination (with a larger number of simulations), the null distribution of the Granger score is not exponential. Instead, I will use an empirical CCDF to compute the $p$-value (which is what I did previously for max correlation).
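Concretely, the conversion is just the empirical tail fraction of the simulated null scores; a small sketch (the add-one correction is my choice, so that $p$ is never exactly zero):

```python
import numpy as np

def empirical_pvalue(score, null_scores):
    """p-value from the empirical CCDF of the null: the fraction of
    simulated null scores at least as large as the observed score."""
    null_scores = np.asarray(null_scores)
    return (1 + np.sum(null_scores >= score)) / (1 + len(null_scores))
```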

Tuesday, November 19, 2013

Granger causality analysis -- building a foundation

Previously, I had used a "high-level" Granger causality toolbox for the inference of directed edges. While there were some early results, I don't like to leave things as a "black box" without understanding the mechanics. In particular, I had observed that the $p$-values from the Granger toolbox seemed to be incorrectly scaled (which causes the FDR method to break down), and I had serious concerns about the ability of the MVAR model to capture neural processes. So, I decided to implement Granger causality analysis from first principles, i.e. VAR modeling of the time series data.

Getting started with VAR modeling, I found the MathWorks tutorial to be really helpful.
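As a note to self, the core of the whole pipeline is just a lagged least-squares regression. A minimal Python sketch of the textbook VAR($p$) estimator (not the tutorial's code; the array layout and names are my choice):

```python
import numpy as np

def fit_var(X, p):
    """Least-squares fit of a VAR(p) model (textbook estimator; layout
    assumed for illustration).
    X: (T, n) array of T observations of n channels.
    Returns (coeffs, intercept, resid), where coeffs has shape (p, n, n)
    and the model is X[t] ~ intercept + sum_k coeffs[k] @ X[t-1-k]."""
    T, n = X.shape
    Y = X[p:]                                                 # targets
    Z = np.hstack([X[p - k:T - k] for k in range(1, p + 1)])  # lagged design
    Z = np.hstack([np.ones((T - p, 1)), Z])                   # intercept column
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)                 # (1 + n*p, n)
    intercept = B[0]
    coeffs = B[1:].reshape(p, n, n).transpose(0, 2, 1)
    resid = Y - Z @ B
    return coeffs, intercept, resid
```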

Friday, November 15, 2013

Scooped!

My term project has been scooped!
Actually, there is quite a bit of existing work -- especially in PLoS ONE, it seems -- on applying causal measures to neural data (e.g. from electrode arrays).

A lot of recent work also pays attention to transfer entropy (TE), which I find very interesting; I see conceptual similarities between TE and Granger causality. However, I am leaving TE beyond the scope of the CS 224W work.