Friday, December 27, 2013

Thursday, December 26, 2013

Cascade-based graph inference

Based on the work of Gomez-Rodriguez et al. (2011). So far, I have read up to Section 3.2. The first task is to understand the subroutine that identifies the MLE cascade structure from a list of hit times. OK, time to put it into practice.

I'd like to begin by repeating the authors' synthetic experiments. There are two immediate tasks:
  1. Given a ground truth graph $G$ and a cascade parametrization ($\beta$ and $P_c(u,v)$), generate synthetic cascades.
  2. Given a cascade $c$, estimate the maximum likelihood cascade tree.
Beginning the implementation... Python ("Snap.py") or Matlab? I had trouble choosing every time I began a programming assignment in CS 224w... The former has some convenient functions built-in, but I feel that there's some "barrier" between whatever I want to do and the syntactically-correct implementation; on the other hand, Matlab lets me do whatever I want, but forces me to implement everything from first principles.

Decided to proceed with Python, mainly for its built-in dictionary type.
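A minimal sketch of the two tasks in plain Python, using dictionaries for both the graph and the hit times. Several things here are assumptions rather than details taken from the paper: the exponential form $P_c(u,v) \propto e^{-\Delta t/\alpha}$ for the transmission likelihood, the parameter name `alpha`, the restriction of candidate parents to edges of a known graph, and the simplification that the tree likelihood factorizes over edges so that each node's best parent can be chosen independently. The function names are hypothetical.

```python
import heapq
import math
import random


def generate_cascade(graph, beta, alpha, source, rng):
    """Simulate one cascade; returns a dict mapping node -> hit time.

    graph: dict mapping node -> iterable of out-neighbors.
    beta: probability that an infected node transmits along a given edge.
    alpha: mean of the (assumed) exponential transmission delay.
    """
    hit_times = {}
    queue = [(0.0, source)]  # candidate infections, ordered by time
    while queue:
        t, u = heapq.heappop(queue)
        if u in hit_times:
            continue  # u was already infected at an earlier time
        hit_times[u] = t
        for v in graph.get(u, ()):
            # Each edge is tried once, when its tail node becomes infected.
            if v not in hit_times and rng.random() < beta:
                delay = rng.expovariate(1.0 / alpha)
                heapq.heappush(queue, (t + delay, v))
    return hit_times


def transmission_likelihood(t_u, t_v, alpha):
    """Assumed exponential model; zero if u was not infected before v."""
    dt = t_v - t_u
    return math.exp(-dt / alpha) / alpha if dt > 0 else 0.0


def mle_cascade_tree(hit_times, graph, alpha):
    """For each infected node, pick the most likely parent among earlier hits.

    Returns a dict mapping child -> parent; the cascade root has no entry.
    """
    parents = {}
    for v, t_v in hit_times.items():
        best_u, best_p = None, 0.0
        for u, t_u in hit_times.items():
            if v in graph.get(u, ()):
                p = transmission_likelihood(t_u, t_v, alpha)
                if p > best_p:
                    best_u, best_p = u, p
        if best_u is not None:
            parents[v] = best_u
    return parents
```

For example, `generate_cascade({0: [1], 1: [2], 2: []}, beta=1.0, alpha=1.0, source=0, rng=random.Random(0))` infects all three nodes of the chain in order, and `mle_cascade_tree` on the resulting hit times recovers the chain itself. Under the exponential assumption the best parent is simply the most recently infected in-neighbor, so a different choice of $P_c(u,v)$ would only change `transmission_likelihood`.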

At the same time, I am trying out software for easy graph visualization (instead of drawing each graph manually in my notebook).

Monday, December 2, 2013

Granger causality from first principles

Rather than using a black box (the GCCA toolbox), I implemented the Granger causality workflow in Matlab from first principles, including the multivariate (vector) autoregressive (VAR) modeling. Having worked through the logic step by step, I now have the basic framework that will allow me to make FDR-based directed edge inferences from sets of time series. However, while the system should be able to make significance-based edge inferences, I have strong reservations about whether a VAR model can fundamentally describe the spike train data. Nevertheless, I will continue exploring G-causality-based directed edge inference.
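To make the pairwise case concrete, here is a minimal sketch in Python with NumPy (not the Matlab implementation described above). It assumes the usual log-variance-ratio definition of the G-causal score: fit an autoregressive model of $x$ on its own lags, fit a second model that also includes lags of $y$, and take the log of the ratio of residual variances. The function names and the `order` parameter are hypothetical.

```python
import numpy as np


def lagged_design(series, order):
    """Matrix whose row for time t holds series[t-1], ..., series[t-order]."""
    T = len(series)
    return np.column_stack(
        [series[order - k:T - k] for k in range(1, order + 1)]
    )


def residual_variance(target, design):
    """Variance of the least-squares residual, with an intercept term."""
    design = np.column_stack([np.ones(len(target)), design])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return (target - design @ coef).var()


def pairwise_granger(x, y, order):
    """G-causal score for the direction y -> x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = x[order:]
    restricted = lagged_design(x, order)  # own past only
    full = np.column_stack([restricted, lagged_design(y, order)])
    return np.log(
        residual_variance(target, restricted) / residual_variance(target, full)
    )
```

If `x` is driven by the previous value of `y` plus noise, `pairwise_granger(x, y, order)` should come out clearly positive while `pairwise_granger(y, x, order)` stays near zero. The conditional version would add the lags of the remaining series to both the restricted and the full design matrices.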

I implemented both pairwise and conditional Granger causality methods. Here's a quick estimate of the amount of work associated with fitting the pairwise and conditional models:

Null distribution for the G-causal score:

The exponential fit appears to be decent, so I will proceed with it for converting the test statistic (G-causal score) to a $p$-value.

Actually, on closer examination (with a larger number of simulations), the null distribution of the Granger score is not exponential.
Instead, I will use an empirical CCDF to compute the $p$-value (which is what I did previously for max correlation).
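A minimal sketch of the empirical CCDF lookup in Python with NumPy, assuming the null scores come from simulations in which no causal edge is present. The function name is hypothetical, and the add-one correction is my own choice here (it keeps the $p$-value from being exactly zero when the observed score exceeds every simulated one), not something taken from the workflow above.

```python
import numpy as np


def empirical_p_value(observed, null_scores):
    """Fraction of null scores at least as large as the observed score.

    Uses the add-one correction (count + 1) / (N + 1), which is an
    assumption of this sketch.
    """
    null_scores = np.asarray(null_scores, dtype=float)
    exceed = np.count_nonzero(null_scores >= observed)
    return (exceed + 1) / (null_scores.size + 1)
```

The resolution of the resulting $p$-values is limited by the number of null simulations: with $N$ simulated scores, the smallest attainable value is $1/(N+1)$.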