Hybrid-Based Approaches
Running examples
CADIMULC is a light Python repository without sophisticated library API design. Documentation on this page is meant to provide introductory materials of the practical tool as to causal discovery. For running example, please simply check out Quick Tutorials for the straightforward usage in the "micro" workflow of causal discovery.
Class: MLC-LiNGAM [1]
cadimulc.hybrid_algorithms.hybrid_algorithms.MLCLiNGAM
Bases: HybridFrameworkBase
MLC-LiNGAM stands for a hybrid causal discovery method for the LiNGAM approach with multiple latent confounders. It serves as an enhancement of LiNGAM* via combining the advantages of constraint-based and functional-based causality methodology.
The LiNGAM causal discovery approach
LiNGAM, the linear non-Gaussian acyclic model, is known as one of the structural-identifiable SCMs.
MLC-LiNGAM was proposed to alleviate the following issues:
- how to detect the latent confounders;
- how to uncover the causal relations among observed and latent variables.
__init__(pc_alpha=0.05, _latent_confounder_detection=[])
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pc_alpha |
float
|
Significance level of independence tests (p_value), which is required by the constraint-based methodology incorporated in the initial stage of the hybrid causal discovery framework. |
0.05
|
The output corresponding to the MLC-LiNGAM algorithm
In the field of causal discovery, causal graphs are usually represented as the directed acyclic graph (DAG) or the causal order. The existence of latent confounders, however, might result in the causal relations that cannot be determined by algorithms.
Correspondingly, the estimated causal graph, by the MLC-LiNGAM or the Nonlinear-MLC algorithm in CADIMULC, is represented as the partial directed acyclic graph or the partial causal order.
Primary Method: fit
Fitting data via the MLC-LiNGAM causal discovery algorithm:
- Stage-I: Utilize the constraint-based method to learn a causal skeleton.
- Stage-II: Identify the causal directions by conducting regression and independence tests on the adjacent pairs in the causal skeleton.
- Stage-III: Detect the latent confounders with the help of the maximal clique patterns raised by the latent confounders, and uncover the causal structure with latent variables.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset |
ndarray
|
The observational dataset shown as a matrix or table, with a format of "sample (n) * dimension (d)." (input as Pandas dataframe is also acceptable) |
required |
Returns:
Name | Type | Description |
---|---|---|
self |
object
|
Update the |
Source code in cadimulc\hybrid_algorithms\hybrid_algorithms.py
Private Method: _stage_1_learning
Stage-I: Causal skeleton construction (based on the PC-stable algorithm).
Stage-I begins with a complete undirected graph and performs conditional independence tests to delete the edges between independent variables pairs, reducing the computational cost of subsequent regressions and independence tests.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset |
ndarray
|
The observational dataset shown as a matrix or table, with a format of "sample (n) * dimension (d)." (input as Pandas dataframe is also acceptable) |
required |
Returns:
Name | Type | Description |
---|---|---|
self |
object
|
Update |
Source code in cadimulc\hybrid_algorithms\hybrid_algorithms.py
Private Method: _stage_2_learning
Stage-II: Partial causal order identification.
Based on the causal skeleton by stage-I, stage II in MLC-LiNGAM identifies causal directions among the adjacent variables that are implied by the skeleton. Causal orders relative to all variables can be partially determined by regression and independence tests, if variables that are relatively exogenous or endogenous do not be affected by latent confounders.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph_pattern_manager |
An auxiliary module embedded in the MLC-LiNGAM algorithm, managing adjacency matrices amidst the procedure between causal skeleton learning and causal direction orientation. |
required |
Returns:
Name | Type | Description |
---|---|---|
self |
object
|
Update |
Source code in cadimulc\hybrid_algorithms\hybrid_algorithms.py
Private Method: _stage_3_learning
Stage-III: Latent confounders' detection
Stage-III will learn more causal orders if some variables are not affected by the latent confounders but are in the remaining subset. Meanwhile, the stage-III learning makes use of the causal skeleton information to reduce the testing space of remaining variables from all subsets to typical maximal cliques.
Notice that the maximal cliques, including the undirected relations that cannot be determined, are possibly formed by latent confounders. This in turn provides insight to detect the latent confounders, and uncover the causal relations among observed and latent variables.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
graph_pattern_manager |
An auxiliary module embedded in the MLC-LiNGAM algorithm, featuring the algorithmic behavior of the maximal-cliques pattern recognition. |
required |
Returns:
Name | Type | Description |
---|---|---|
self |
object
|
Update |
Source code in cadimulc\hybrid_algorithms\hybrid_algorithms.py
1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 |
|
Class: Nonlinear-MLC [2]
cadimulc.hybrid_algorithms.hybrid_algorithms.NonlinearMLC
Bases: HybridFrameworkBase
The hybrid algorithm Nonlinear-MLC, incorporation of the constraint-based and functional-based causal discovery methodology, is developed for the general causal inference over non-linear data in presence of multiple unknown factors.
A primary feature of Nonlinear-MLC lies in exploiting the non-linear causal identification with multiple latent confounders (proposed as the Latent-ANMs causal identification), which is on the basic of the well-known ANMs* method.
The ANMs causal discovery approach
ANMs, the additive-noise-models, is known as one of the structural-identifiable SCMs.
__init__(ind_test='kernel_ci', pc_alpha=0.05, _regressor=LinearGAM())
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ind_test |
str
|
Popular non-linear independence-tests methods are recommended: Kernel-based Conditional Independence tests (KCI); Hilbert-Schmidt Independence Criterion for General Additive Models (HSIC-GAMs). |
'kernel_ci'
|
pc_alpha |
float
|
Significance level of independence tests (p_value), which is required by the constraint-based methodology incorporated in the initial stage of the hybrid causal discovery framework. |
0.05
|
Primary Method: fit
Fitting data via the Nonlinear-MLC causal discovery algorithm.
The procedure comprises the causal skeleton learning in the initial stage, along with the causal identification procedure involving non-linear regression and independence tests for the subsequence. Following the well-known divide-and-conquer strategy, non-linear causal inference are conducted over the maximal cliques recognized from the estimated causal skeleton.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset |
ndarray
|
The observational dataset shown as a matrix or table, with a format of "sample (n) * dimension (d)." (input as Pandas dataframe is also acceptable) |
required |
Returns:
Name | Type | Description |
---|---|---|
self |
object
|
Update the |
Source code in cadimulc\hybrid_algorithms\hybrid_algorithms.py
528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 |
|
Private Method: _clique_based_causal_inference
For each of the undetermined maximal cliques (e.g. at least one edge within a maximal clique remains undirected) with respect to the whole maximal-clique patterns, the algorithm conducts non-linear regression and independence tests with the additional explanatory variables selected from the undetermined maximal clique.
This strategy is argued to enhance the efficiency and robustness as to the non-linear causal discovery with multiple latent confounders, serving as the essence of the Nonlinear-MLC algorithm (See "Latent-ANMs Lemma" for details in the relevant paper, Section 3).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
undetermined_maximal_cliques |
list[list]
|
A list of undetermined maximal cliques in which the element
(maximal clique) involves at least one edge remaining undirected
(e.g. |
required |
Returns:
Type | Description |
---|---|
list
|
The list of determined pairs over the inputting undetermined cliques
after searching
(e.g. |
Source code in cadimulc\hybrid_algorithms\hybrid_algorithms.py
623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 |
|
Auxiliary Class: GraphPatternManager
cadimulc.hybrid_algorithms.hybrid_algorithms.GraphPatternManager
Bases: object
An auxiliary module embedded in the MLC-LiNGAM and Nonlinear-MLC algorithms, featuring the algorithmic behavior of the maximal-cliques pattern recognition.
The module as well manages adjacency matrices amidst the procedure between causal skeleton learning and causal direction orientation.
Primary Method: recognize_maximal_cliques_pattern
Recognize the maximal-cliques pattern based on a causal skeleton (undirected graph) or a partial causal adjacency matrix (partially directed graph).
Reference by recognize_maximal_cliques_pattern
Bron, Coen, and Joep Kerbosch. "Algorithm 457: finding all cliques of an undirected graph." Communications of the ACM 16, no. 9 (1973): 575-577. (Implementation by NetworkX)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
causal_skeleton |
ndarray
|
The undirected graph corresponding to the causal graph. |
required |
adjacency_matrix |
ndarray | None
|
The (partially) directed acyclic graph (DAG) corresponding to the causal graph. |
None
|
Returns:
Name | Type | Description |
---|---|---|
maximal_cliques |
list[list]
|
A whole list of "maximal cliques", along with each of the "maximal clique" element that is as well in form of list. |
Source code in cadimulc\hybrid_algorithms\hybrid_algorithms.py
Primary Method: find_adjacent_set
Given a causal skeleton (or a subset of the causal skeleton), find out the adjacent variables for each of the variable.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
causal_skeleton |
ndarray
|
The undirected graph corresponding to the causal graph. |
required |
Returns:
Name | Type | Description |
---|---|---|
adjacent_set |
dict
|
A dictionary that describes the adjacency relations:
|
Source code in cadimulc\hybrid_algorithms\hybrid_algorithms.py
Base Class: HybridFrameworkBase
cadimulc.hybrid_algorithms.hybrid_framework.HybridFrameworkBase
A hybrid causal discovery framework with established implementations of discovering the causal skeleton by the Peter-Clark algorithm (PC algorithm). The framework is incorporated into the initial stage of both the Nonlinear-MLC and MLC-LiNGAM causal discovery algorithms.
__init__(pc_alpha=0.5, _dataset=None, _dim=None, _skeleton=None, _adjacency_matrix=None, _parents_set={})
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pc_alpha |
float
|
Significance level of independence tests (p_value), which is required by the constraint-based methodology incorporated in the initial stage of the hybrid causal discovery framework. |
0.5
|
_dataset |
ndarray
|
The observational dataset shown as a matrix or table, with a format of "sample (n) * dimension (d)." (input as Pandas dataframe is also acceptable) |
None
|
_dim |
int
|
int The variable dimension corresponding to the causal graph. |
None
|
_skeleton |
ndarray
|
The estimated undirected graph corresponding to the causal graph. |
None
|
_adjacency_matrix |
ndarray
|
The estimated directed acyclic graph (DAG) corresponding to the causal graph. |
None
|
_parents_set |
dict
|
dict The child-parents relations associating with the adjacency matrix. |
{}
|
Primary Method: _causal_skeleton_learning
Causal skeleton construction (based on the PC-stable algorithm).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset |
ndarray
|
The observational dataset shown as a matrix or table, with a format of "sample (n) * dimension (d)." (input as Pandas dataframe is also acceptable) |
required |
Returns:
Name | Type | Description |
---|---|---|
self |
object
|
Update |
Source code in cadimulc\hybrid_algorithms\hybrid_framework.py
Reference
[1] Chen, Wei, Ruichu Cai, Kun Zhang, and Zhifeng Hao. "Causal discovery in linear non-gaussian acyclic model with multiple latent confounders. " IEEE Transactions on Neural Networks and Learning Systems. 2021.
[2] Chen, Xuanzhi, Wei Chen, Ruichu Cai. "Non-linear Causal Discovery for Additive Noise Model with Multiple Latent Confounders". Xuanzhi's Personal Website. 2023.
[3] Shimizu, Shohei, Patrik O. Hoyer, Aapo Hyvärinen, Antti Kerminen, and Michael Jordan. "A linear non-Gaussian acyclic model for causal discovery." Journal of Machine Learning Research. 2006.
[4] Hoyer, Patrik, Dominik Janzing, Joris M. Mooij, Jonas Peters, and Bernhard Schölkopf. "Nonlinear causal discovery with additive noise models." Advances in neural information processing systems. 2008.
[5] Spirtes, Peter, Clark N. Glymour, and Richard Scheines. Causation, prediction, and search. MIT press, 2000.