Determination of maximum likelihood tree $T^*$ given cascade $c$ on graph $G$.
The main insight for solving this problem is that $T^*$ can be identified efficiently, given $c$ and $G$, by a node-wise optimization over the active nodes in $c$ (i.e., each active node has exactly one infection source). On the other hand, I am not entirely certain whether $w(i,j) = \log(P'(i,j))-\log(\epsilon)$ (arXiv preprint) or $w(i,j) = \log(P'(i,j))-\log(\epsilon P(i,j))$ (KDD '12) is the right weight to use. It seems to me that we ought to use $w(i,j)=\log(P'(i,j))$ directly, and that we don't necessarily need the approximation of Eq. 11 (i.e., treating all unactivated edges as $\epsilon$-edges). Let me try out my idea first.
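A minimal sketch of that node-wise optimization, with hypothetical names; `w` stands in for whichever edge-weight definition turns out to be correct:

```python
def mle_tree(hit_times, in_neighbors, w):
    """Node-wise MLE tree: each active node independently picks the
    single potential parent that maximizes the edge weight w(i, j).

    hit_times:    dict node -> infection time (active nodes only)
    in_neighbors: dict node -> iterable of in-neighbors in G
    w:            function w(i, j) -> log-weight of edge i -> j
                  (hypothetical signature; plug in the chosen definition)
    """
    tree = {}
    for j, t_j in hit_times.items():
        # candidate parents: in-neighbors infected strictly earlier
        cands = [i for i in in_neighbors.get(j, ())
                 if i in hit_times and hit_times[i] < t_j]
        if not cands:
            continue  # j has no feasible parent: a root of the cascade
        tree[j] = max(cands, key=lambda i: w(i, j))
    return tree  # dict: child -> chosen parent
```

Because each active node's choice is independent of the others', the per-cascade cost is linear in the number of in-edges of the active set.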
Neural network inference from its time series
Wednesday, January 1, 2014
Friday, December 27, 2013
"Neural processing unit"?
Thursday, December 26, 2013
Cascade-based graph inference
Based on the work of Gomez-Rodriguez et al. (2011). So far, I have read up to Section 3.2. The first task is to understand the subroutine for identifying the MLE cascade structure from a list of hit times. OK -- putting it into practice.
I'd like to begin by repeating the authors' synthetic experiments. There are two immediate tasks:
- Given a ground-truth graph $G$ and a cascade parametrization ($\beta$ and $P_c(u,v)$), generate synthetic cascades.
- Given cascade $c$, estimate the maximum likelihood cascade tree.
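For the first task, here is a simplified generative sketch with hypothetical parameter choices -- a uniform infection probability `beta` and exponential incubation delays -- rather than the exact parametrization from the paper:

```python
import random

def generate_cascade(adj, beta, seed_node, t_max=10.0):
    """Generate one synthetic cascade on a directed graph.

    adj:  dict node -> iterable of out-neighbors
    beta: per-edge infection probability (assumed uniform here)
    Simplification: each newly infected node u tries to infect each
    not-yet-infected neighbor v independently with probability beta;
    on success, the incubation delay is drawn from an exponential
    distribution (one common choice, not necessarily the paper's).
    """
    hit = {seed_node: 0.0}     # node -> hit time
    frontier = [seed_node]
    while frontier:
        u = frontier.pop()
        for v in adj.get(u, ()):
            if v in hit:
                continue       # already infected; keep first hit time
            if random.random() < beta:
                t = hit[u] + random.expovariate(1.0)  # incubation delay
                if t <= t_max:
                    hit[v] = t
                    frontier.append(v)
    return hit  # nodes absent from the dict were never infected
```

Running this repeatedly from random seed nodes should give a set of cascades to feed the tree-inference step.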
Beginning the implementation... Python ("Snap.py") or Matlab? I had trouble choosing every time I began a programming assignment in CS 224W. The former has some convenient functions built in, but I feel there's a "barrier" between whatever I want to do and the syntactically correct implementation; Matlab, on the other hand, lets me do whatever I want, but forces me to implement everything from first principles.
Decided to proceed with Python, for its dictionaries.
At the same time, I'll try out software for easy graph visualization (instead of drawing in my notebook manually each time):
Monday, December 9, 2013
Monday, December 2, 2013
Granger causality from first principles
Rather than using a black box (the GCCA toolbox), I implemented the Granger causality workflow in Matlab from first principles -- e.g., multivariate (vector) autoregressive (VAR) modeling. By working through the logic step by step, I now have the basic framework that will allow me to make FDR-based directed-edge inferences from sets of time series. However, while the system should be able to make significance-based edge inferences, I have strong reservations about the ability of a VAR model to fundamentally describe spike-train data. Nevertheless, I will continue my exploration of G-causality-based directed-edge inference.
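A minimal sketch of the pairwise piece (written here in Python rather than my Matlab code, and assuming an ordinary least-squares lag-$p$ fit with the log-variance-ratio score):

```python
import numpy as np

def granger_score(x, y, p=2):
    """Pairwise Granger causality score x -> y with lag order p.

    Fits a restricted model (y regressed on its own p lags) and a
    full model (y on its own lags plus x's lags), then returns
    F = ln(RSS_restricted / RSS_full), which is >= 0 since the
    restricted model is nested in the full one.
    """
    T = len(y)
    Y = y[p:]
    # lagged design matrices: row t holds values at times t-1 .. t-p
    lags_y = np.column_stack([y[p - k:T - k] for k in range(1, p + 1)])
    lags_x = np.column_stack([x[p - k:T - k] for k in range(1, p + 1)])
    ones = np.ones((T - p, 1))
    Xr = np.hstack([ones, lags_y])           # restricted regressors
    Xf = np.hstack([ones, lags_y, lags_x])   # full regressors
    rss = lambda X: np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)
    return float(np.log(rss(Xr) / rss(Xf)))
```

Conditional G-causality is the same comparison with the lags of all remaining series included in both models.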
I implemented both pairwise and conditional Granger causality methods. Here's a quick estimate of the amount of work associated with fitting the pairwise and conditional models:
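As a rough, hypothetical bookkeeping of the fit counts (my own accounting, not a figure from any reference):

```python
def fit_counts(n):
    """Back-of-envelope regression counts for n time series
    (hypothetical bookkeeping under one common formulation).

    Pairwise:    per ordered pair (i -> j), one restricted and one
                 full fit -> 2 * n * (n - 1) small (2-variable) fits.
    Conditional: per target j, one full fit on all n series' lags plus
                 (n - 1) reduced fits, each omitting one candidate
                 source -> n * n fits, each n-dimensional.
    """
    pairwise_fits = 2 * n * (n - 1)
    conditional_fits = n * n
    return pairwise_fits, conditional_fits
```

So the conditional variant needs fewer regressions, but each one is much larger (dimension grows with $n$), which is where the real cost lives.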
Null distribution for the G-causal score:
The exponential fit appears to be decent -- I will proceed with it for converting the test statistic (the G-causal score) to a $p$-value.
Actually, on closer examination (with a larger number of simulations), the null distribution of the Granger score is not exponential:
Instead, I will use an empirical CCDF to compute the $p$-value (as I did previously for max correlation).
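A minimal sketch of that empirical-CCDF conversion (the +1 correction is my own choice, to keep the $p$-value from ever being exactly 0):

```python
import numpy as np

def empirical_pvalue(score, null_scores):
    """p-value from the empirical CCDF of a simulated null:
    the fraction of null scores at least as extreme as the observed
    score, with a +1 correction so p is never exactly 0.
    """
    null_scores = np.asarray(null_scores)
    return (1 + np.sum(null_scores >= score)) / (1 + len(null_scores))
```

The resolution is limited by the number of null simulations, so very small $p$-values saturate at $1/(N+1)$.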
Tuesday, November 19, 2013
Granger causality analysis -- building a foundation
Previously, I had used a "high-level" Granger causality toolbox for the inference of directed edges. While there were some early results, I don't like to leave things as a "black box" without understanding the mechanics. In particular, I had observed that the $p$-values from the Granger toolbox seemed to be incorrectly scaled (which causes the FDR method to break down), and I had some serious concerns about the ability of the MVAR model to capture neural processes. So, I decided to implement Granger causality analysis from first principles, i.e., VAR modeling of the time series data.
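For reference, the FDR step I want to run on top of correctly scaled $p$-values is the Benjamini-Hochberg procedure; a minimal sketch:

```python
import numpy as np

def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg FDR procedure: sort the m p-values, find the
    largest k with p_(k) <= alpha * k / m, and reject hypotheses
    corresponding to p_(1) .. p_(k).  Returns a boolean mask aligned
    with the original (unsorted) ordering.
    """
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest passing rank (0-based)
        reject[order[:k + 1]] = True
    return reject
```

This is exactly the step that falls apart when the input $p$-values are mis-scaled, since the thresholds $\alpha k/m$ assume $p$-values uniform on $[0,1]$ under the null.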
Getting started with VAR modeling, I found the MathWorks tutorial really helpful.
Friday, November 15, 2013
Scooped!
My term project has been scooped!
- http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3704386/
- http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002653
- http://www.mit.edu/people/sturaga/papers/TuragaNIPS2013.pdf
Actually, there is quite a bit of work -- especially in PLoS ONE, it seems -- on applying causal measures to neural data (e.g., electrode arrays).
A lot of recent work also pays close attention to transfer entropy (TE), which I find very interesting; however, I am leaving TE beyond the scope of the CS224w work. I see conceptual similarities between TE and Granger causality.