Questions about usage of DirectLiNGAM #78
Comments
|
p.s. It would be better to use a different method, e.g., something implemented in dowhy, to compute the causal effects of your continuous variables on the binary treatment based on an estimated causal graph, rather than using the output of DirectLiNGAM. DirectLiNGAM assumes all the variables are continuous when it computes the causal effects. |
Thanks for your reply! With regards to your first reply,
With regards to your second reply, I am currently only using the output of DirectLiNGAM as our estimated causal graph. For instance, the adjacency matrix from DirectLiNGAM is converted to a NetworkX DiGraph before being passed directly to DoWhy, which interprets all non-zero entries as an edge and zero entries as no edge. The backdoor criterion in DoWhy is then applied to identify confounders within this graph. Therefore, in a way, I am only relying on the zero/non-zero pattern of DirectLiNGAM's adjacency matrix (the magnitude and sign of these values do not matter, I guess), and not on the causal effects that the LiNGAM class provides. Were your concerns referring to the fact that the values in the adjacency matrix (not just the zeros and non-zeros) are still employed during independence testing? (i.e., Thanks once again for the speedy and helpful replies! |
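The conversion described above can be sketched as follows. This is a minimal illustration, not the poster's actual pipeline: the helper name `adjacency_to_digraph`, the variable names, and the toy matrix are hypothetical, and it assumes the convention that a non-zero entry `B[i, j]` means an edge from variable `j` to variable `i` (row is the effect, column is the cause), which is worth checking against the LiNGAM documentation before relying on it.

```python
import numpy as np
import networkx as nx


def adjacency_to_digraph(B, names=None, threshold=0.0):
    """Build a DiGraph from a weighted adjacency matrix.

    Assumption: B[i, j] != 0 encodes an edge from variable j to variable i.
    Only the zero/non-zero pattern is used; magnitudes and signs are dropped.
    """
    B = np.asarray(B, dtype=float)
    n = B.shape[0]
    names = list(names) if names is not None else list(range(n))
    graph = nx.DiGraph()
    graph.add_nodes_from(names)  # keep isolated variables as nodes too
    for i in range(n):
        for j in range(n):
            if abs(B[i, j]) > threshold:
                graph.add_edge(names[j], names[i])  # cause -> effect
    return graph


# Toy matrix (hypothetical values): X -> T, X -> Y, T -> Y.
B = np.array([
    [0.0, 0.0, 0.0],  # X has no parents
    [0.8, 0.0, 0.0],  # T <- X
    [0.5, 1.2, 0.0],  # Y <- X, Y <- T
])
graph = adjacency_to_digraph(B, names=["X", "T", "Y"])
print(sorted(graph.edges()))
```

Writing the loop out explicitly makes the direction convention visible; a one-line conversion of the untransposed matrix would silently reverse every edge if the convention is the other way round.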
Ok, then, the direct edges from the continuous variables to the binary treatment might not be properly pruned. DirectLiNGAM uses sparse linear regression to prune directed edges, assuming all the variables are continuous. Another method, such as sparse logistic regression with the continuous variables as explanatory variables and the treatment as the response variable, would be better for estimating whether directed edges exist from those continuous variables to the binary treatment (and outcome), though DirectLiNGAM can still estimate the causal structure among those continuous variables. |
Based on your recommendation, would such a scenario below work?
log(P(Y=1) / (1 - P(Y=1))) = β1·X1 + β2·X2 + βT·T + ... Is it therefore safe to assume that the β values obtained from the sparse logistic regression above can be used as the values for the adjacency matrix? |
My suggestion would be something like
2 and 3. Adaptive logistic regression (4.1 of the original adaptive lasso paper: http://users.stat.umn.edu/~zouxx019/Papers/adalasso.pdf) having all the continuous variables as the features and the binary treatment as the target. Do the same for the binary outcome. Draw directed edges from the continuous variables to the treatment and outcome based on the sparsity patterns of the adaptive logistic regression coefficients.
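One way the adaptive logistic regression step might look in practice is sketched below with scikit-learn. This is an assumption-laden illustration rather than the method used in the thread: the function name `adaptive_lasso_logistic`, the ridge-penalized initial fit (used here in place of an unpenalized estimate), the values of `gamma` and `C`, and the simulated data are all hypothetical choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def adaptive_lasso_logistic(X, y, gamma=1.0, C=1.0, eps=1e-8):
    """Adaptive-lasso logistic coefficients on standardized features."""
    Xs = StandardScaler().fit_transform(X)
    # Step 1: an initial (default L2-penalized) fit supplies the adaptive weights.
    init = LogisticRegression(max_iter=1000).fit(Xs, y)
    scale = np.abs(init.coef_.ravel()) ** gamma + eps
    # Step 2: a plain L1 fit on rescaled columns is equivalent to an L1 penalty
    # weighted by 1 / |initial coefficient|**gamma on the original columns.
    l1 = LogisticRegression(penalty="l1", solver="liblinear", C=C, max_iter=1000)
    l1.fit(Xs * scale, y)
    return l1.coef_.ravel() * scale  # map back to the standardized scale


# Simulated data (assumption): only columns 0 and 2 affect the binary treatment.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
logits = 2.0 * X[:, 0] - 1.5 * X[:, 2]
treatment = (rng.random(2000) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

coef = adaptive_lasso_logistic(X, treatment, C=0.05)
parents = np.flatnonzero(np.abs(coef) > 1e-6)  # candidate edges into treatment
print(parents, coef.round(3))
```

Only the sparsity pattern (`parents`) would be used to draw edges into the treatment; the same call with the binary outcome as `y` would give the candidate edges into the outcome.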
|
Hi Dr. Shimizu, the plan that you've suggested seems to be going well so far, and there aren't any downstream issues (e.g., in confounder identification and causal estimation) as of now! Thanks so much for your assistance and quick replies!! I've also tried replacing HSIC with unconditional FCIT (fcit package), which does not seem to have caused any OOM issues thus far! However, I'd still like to clarify some doubts about your implementation of HSIC:
Thank you! |
Hi, DirectLiNGAM tries to find a DAG that minimizes dependence between error terms. DirectLiNGAM does not use HSIC to prune edges. Rather, HSIC is used to see if the error terms in the estimated DAG are independent. This is to find possible violations of the independence assumption. |
Oh I see, thanks for the clarification! Do you then think it's possible to use such independence tests (HSIC or FCIT) to prune edges derived from the adjacency matrix of DirectLiNGAM as described in my question above? |
Yeah, that could be an alternative way for pruning edges. |
Hello, I’d first like to thank you for this incredible package (along with the interesting papers on LiNGAM you’ve published)!
I’m currently trying to employ this package in my Causal Inference pipeline (causal discovery portion).
More specifically, I am currently using DirectLiNGAM with a prior knowledge matrix (specifically to enforce an edge from the treatment to the outcome variable, and no other outgoing edges from either the treatment or the outcome variable). BottomUpParceLiNGAM would have been the ideal model, but it doesn't work due to scalability and immediate out-of-memory issues.
After running a couple of experiments with DirectLiNGAM, I have 3 questions I’d like to clarify with you if possible:
get_error_independence_p_values and bootstrap? As the former (specifically during hsic_test_gamma) causes out-of-memory issues for even the smallest dataset (e.g., 250k x 155), while the latter takes too long (i.e., the default fit in DirectLiNGAM with the above-mentioned prior-knowledge matrix ranges between 20 hours - 5 days for the datasets I currently have)
Thank you very much!
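A rough back-of-the-envelope calculation suggests why a kernel-based test can run out of memory at this scale. The sketch below assumes the HSIC implementation materializes dense n x n float64 Gram matrices, which is typical for kernel tests but is an assumption about hsic_test_gamma, not something confirmed in this thread; the helper names and the 8 GB budget are hypothetical.

```python
def gram_matrix_bytes(n_samples, bytes_per_value=8):
    """Bytes needed for one dense n x n kernel (Gram) matrix."""
    return n_samples * n_samples * bytes_per_value


def max_samples_for_budget(budget_bytes, n_matrices=2, bytes_per_value=8):
    """Largest n such that n_matrices dense n x n matrices fit in the budget."""
    return int((budget_bytes / (n_matrices * bytes_per_value)) ** 0.5)


# One Gram matrix for the 250k-row dataset mentioned above.
full = gram_matrix_bytes(250_000)
print(f"{full / 1e9:.0f} GB per Gram matrix at n=250,000")

# Rows that would fit if two such matrices must share an 8 GB budget.
print(max_samples_for_budget(8e9), "rows fit two matrices in 8 GB")
```

Under these assumptions, running the test on a random subsample of rows is one possible workaround, at the cost of reduced power to detect dependence.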