
Hybrid-Based Approaches


Running examples

CADIMULC is a lightweight Python repository without a sophisticated library API design. The documentation on this page provides introductory material for the practical causal discovery tool. For running examples, simply check out Quick Tutorials for straightforward usage of the "micro" workflow of causal discovery.

Class: MLC-LiNGAM [1]

cadimulc.hybrid_algorithms.hybrid_algorithms.MLCLiNGAM

Bases: HybridFrameworkBase

MLC-LiNGAM is a hybrid causal discovery method that extends the LiNGAM approach to settings with multiple latent confounders. It enhances LiNGAM* by combining the advantages of constraint-based and functional-based causal discovery methodology.

The LiNGAM causal discovery approach

LiNGAM, the linear non-Gaussian acyclic model, is known as one of the structurally identifiable SCMs.

MLC-LiNGAM was proposed to address the following issues:

  • how to detect the latent confounders;
  • how to uncover the causal relations among observed and latent variables.
__init__(pc_alpha=0.05, _latent_confounder_detection=[])

Parameters:

Name Type Description Default
pc_alpha float

Significance level of independence tests (p_value), which is required by the constraint-based methodology incorporated in the initial stage of the hybrid causal discovery framework.

0.05

The output corresponding to the MLC-LiNGAM algorithm

In the field of causal discovery, causal graphs are usually represented as a directed acyclic graph (DAG) or a causal order. The existence of latent confounders, however, may leave some causal relations undetermined by algorithms.

Correspondingly, the causal graph estimated by the MLC-LiNGAM or the Nonlinear-MLC algorithm in CADIMULC is represented as a partially directed acyclic graph or a partial causal order.
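The partial structure can be read directly off the 0/1 adjacency matrix: a symmetric pair of ones marks an edge left undetermined (e.g. due to a latent confounder), while an asymmetric entry marks a determined direction. A minimal sketch with illustrative values (not actual CADIMULC output; the helper function is hypothetical):

```python
import numpy as np

# Hypothetical 3-variable result for illustration only.
adjacency_matrix = np.array([
    [0, 0, 1],
    [1, 0, 1],
    [1, 1, 0],
])

def edge_state(adj, i, j):
    """Classify the relation between variables i and j (illustrative helper)."""
    if adj[i][j] == 1 and adj[j][i] == 1:
        return "undetermined"   # edge present, direction unresolved
    elif adj[i][j] == 1 or adj[j][i] == 1:
        return "directed"       # edge present with a determined direction
    return "none"               # no edge in the estimated graph
```

Here the pair (0, 1) is a determined direction, while (0, 2) and (1, 2) remain undetermined.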


Primary Method: fit

Fitting data via the MLC-LiNGAM causal discovery algorithm:

  • Stage-I: Utilize the constraint-based method to learn a causal skeleton.
  • Stage-II: Identify the causal directions by conducting regression and independence tests on the adjacent pairs in the causal skeleton.
  • Stage-III: Detect the latent confounders with the help of the maximal clique patterns raised by the latent confounders, and uncover the causal structure with latent variables.

Parameters:

Name Type Description Default
dataset ndarray

The observational dataset shown as a matrix or table, with a format of "sample (n) * dimension (d)." (A Pandas DataFrame is also acceptable as input.)

required

Returns:

Name Type Description
self object

Update the adjacency_matrix represented as an estimated causal graph. The adjacency_matrix is a (d * d) numpy array with 0/1 elements characterizing the causal direction.

Source code in cadimulc\hybrid_algorithms\hybrid_algorithms.py
def fit(self, dataset: ndarray) -> object:
    """
    Fitting data via the *MLC-LiNGAM* causal discovery algorithm:

    - **Stage-I**: Utilize the constraint-based method to learn a **causal skeleton**.
    - **Stage-II**: Identify the causal directions by conducting **regression** and **independence tests**
    on the adjacent pairs in the causal skeleton.
    - **Stage-III**: Detect the latent confounders with the help of the **maximal clique patterns**
    raised by the latent confounders,
    and uncover the causal structure with latent variables.

    Parameters:
        dataset:
            The observational dataset shown as a matrix or table,
            with a format of "sample (n) * dimension (d)."
            (input as Pandas dataframe is also acceptable)

    Returns:
        self:
            Update the ``adjacency_matrix`` represented as an estimated causal graph.
            The ``adjacency_matrix`` is a (d * d) numpy array with 0/1 elements
            characterizing the causal direction.
    """

    # stage-1: causal skeleton reconstruction (PC-stable algorithm)
    self._stage_1_learning(dataset)

    graph_pattern_manager = GraphPatternManager(init_graph=self._skeleton)

    # stage-2: partial causal orders identification
    self._stage_2_learning(graph_pattern_manager)

    graph_pattern_manager.store_last_managing_adjacency_matrix()

    # stage-3: latent confounders' detection
    self._stage_3_learning(graph_pattern_manager)

    return self

Private Method: _stage_1_learning

Stage-I: Causal skeleton construction (based on the PC-stable algorithm).

Stage-I begins with a complete undirected graph and performs conditional independence tests to delete the edges between independent variable pairs, reducing the computational cost of subsequent regressions and independence tests.
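The edge-deletion idea can be sketched on a toy chain x0 → x1 → x2: the marginal association between x0 and x2 keeps their edge at first, but conditioning on x1 renders them independent, so the edge x0–x2 is deleted. A minimal sketch using plain (partial) correlation as the test statistic (the library's actual conditional independence tests may differ):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
# Toy chain: x0 -> x1 -> x2 with Gaussian noise.
x0 = rng.normal(size=n)
x1 = x0 + 0.5 * rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)

def partial_corr(a, b, c):
    """Correlation between a and b after linearly regressing out c."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

marginal = np.corrcoef(x0, x2)[0, 1]    # strong: edge x0-x2 survives order-0 tests
conditional = partial_corr(x0, x2, x1)  # near zero: edge x0-x2 is deleted
```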

Parameters:

Name Type Description Default
dataset ndarray

The observational dataset shown as a matrix or table, with a format of "sample (n) * dimension (d)." (A Pandas DataFrame is also acceptable as input.)

required

Returns:

Name Type Description
self object

Update _skeleton as the estimated undirected graph corresponding to the causal graph, initialize _adjacency_matrix via a copy of _skeleton, and record _stage1_time as the stage-1 computational time.

Source code in cadimulc\hybrid_algorithms\hybrid_algorithms.py
def _stage_1_learning(self, dataset: ndarray) -> object:
    """
    **Stage-I**: Causal skeleton construction (based on the PC-stable algorithm).

    Stage-I begins with a complete undirected graph and performs **conditional
    independence tests** to delete the edges between independent variable pairs,
    reducing the computational cost of subsequent regressions and independence tests.

    Parameters:
        dataset:
            The observational dataset shown as a matrix or table,
            with a format of "sample (n) * dimension (d)."
            (input as Pandas dataframe is also acceptable)

    Returns:
        self:
            Update `_skeleton` as the estimated undirected graph corresponding to
            the causal graph, initialize `_adjacency_matrix` via a copy of `_skeleton`,
            and record `_stage1_time` as the stage-1 computational time.
    """

    self._causal_skeleton_learning(dataset)

    return self

Private Method: _stage_2_learning

Stage-II: Partial causal order identification.

Based on the causal skeleton learned in stage I, stage II of MLC-LiNGAM identifies causal directions among the adjacent variables implied by the skeleton. Causal orders relative to all variables can be partially determined by regression and independence tests, if the relatively exogenous or endogenous variables are not affected by latent confounders.
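The pairwise principle behind these regression and independence tests can be sketched in isolation: under a linear model with non-Gaussian noise, the residual of regressing effect on cause is independent of the cause, whereas the reverse regression leaves a dependent residual. The score below is a crude surrogate for a proper independence test, used for illustration only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def residual_of(x, y):
    """Residual of linearly regressing y on x (1-D arrays)."""
    reg = LinearRegression().fit(x.reshape(-1, 1), y)
    return y - reg.predict(x.reshape(-1, 1))

def dependence_score(a, b):
    """Correlation between squared values: a crude surrogate for an
    independence test (near zero when a and b are independent)."""
    return abs(np.corrcoef(a ** 2, b ** 2)[0, 1])

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 5000)             # non-Gaussian (uniform) cause
y = 2.0 * x + rng.uniform(-1, 1, 5000)   # linear effect

forward = dependence_score(x, residual_of(x, y))   # x -> y: residual independent
backward = dependence_score(y, residual_of(y, x))  # reverse: residual dependent
```

A genuine implementation would replace `dependence_score` with a kernel-based independence test, as the hybrid framework does.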

Parameters:

Name Type Description Default
graph_pattern_manager

An auxiliary module embedded in the MLC-LiNGAM algorithm, managing adjacency matrices amidst the procedure between causal skeleton learning and causal direction orientation.

required

Returns:

Name Type Description
self object

Update _adjacency_matrix as the estimated (partial) directed acyclic graph (DAG) corresponding to the causal graph, and record _stage2_time as the stage-2 computational time.

Source code in cadimulc\hybrid_algorithms\hybrid_algorithms.py
def _stage_2_learning(self, graph_pattern_manager) -> object:
    """
    **Stage-II**: Partial causal order identification.

    Based on the causal skeleton by stage-I,
    stage II in *MLC-LiNGAM* identifies causal directions among the adjacent variables
    that are implied by the skeleton.
    Causal orders relative to all variables can be partially determined by **regression
    and independence tests**, if variables that are relatively exogenous or endogenous
    are not affected by latent confounders.

    Parameters:
        graph_pattern_manager:
            An auxiliary module embedded in the MLC-LiNGAM algorithm,
            managing adjacency matrices amidst the procedure between causal skeleton
            learning and causal direction orientation.

    Returns:
        self:
            Update `_adjacency_matrix` as the estimated (partial) directed acyclic
            graph (DAG) corresponding to the causal graph,
            and record `_stage2_time` as the stage-2 computational time.
    """

    # Arguments for testing:
    #   _skeleton(attribute): ndarray
    #   _dataset(attribute): ndarray
    #   _dim(attribute): int

    start = time.perf_counter()

    # Reconstruction of the causal skeleton entails specific pairs of adjacent variables,
    # rather than all pairs of variables.
    causal_skeleton = self._skeleton

    # MLC-LiNGAM performs regression and independence tests efficiently
    # based on the adjacency set.
    adjacent_set = GraphPatternManager.find_adjacent_set(
        causal_skeleton=causal_skeleton
    )

    # Apply Algorithm-2 (given by the MLC-LiNGAM algorithm).
    self._algorithm_2(
        corresponding_adjacent_set=adjacent_set,
        corresponding_dataset=cp.copy(self._dataset),
        corresponding_variables=np.arange(self._dim),
        graph_pattern_manager=graph_pattern_manager
    )

    # Record computational time.
    end = time.perf_counter()
    self._stage2_time = end - start

    return self

Private Method: _stage_3_learning

Stage-III: Latent confounders' detection

Stage-III learns additional causal orders if some variables in the remaining subset are not affected by the latent confounders. Meanwhile, the stage-III learning makes use of the causal skeleton information to reduce the testing space over the remaining variables from all subsets to typical maximal cliques.

Notice that maximal cliques containing undirected relations that cannot be determined are possibly formed by latent confounders. This in turn provides insight for detecting the latent confounders and uncovering the causal relations among observed and latent variables.
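The residual-replacement trick used within cliques can be sketched in isolation: when two clique members share a determined parent outside the clique, regressing that parent out removes the association it induces, so subsequent tests act on "deconfounded" data. A minimal sketch on synthetic data (not the CADIMULC API):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 2000
parent = rng.uniform(-1, 1, n)             # determined parent outside the clique
x_i = parent + rng.uniform(-1, 1, n)       # clique member driven by the parent
x_j = 2.0 * parent + rng.uniform(-1, 1, n) # another member sharing that parent

def regress_out(x, y):
    """Replace y by its residual after linearly regressing out x."""
    reg = LinearRegression().fit(x.reshape(-1, 1), y)
    return y - reg.predict(x.reshape(-1, 1))

res_i = regress_out(parent, x_i)
res_j = regress_out(parent, x_j)

before = abs(np.corrcoef(x_i, x_j)[0, 1])     # association via the shared parent
after = abs(np.corrcoef(res_i, res_j)[0, 1])  # vanishes once the parent is removed
```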

Parameters:

Name Type Description Default
graph_pattern_manager

An auxiliary module embedded in the MLC-LiNGAM algorithm, featuring the algorithmic behavior of the maximal-cliques pattern recognition.

required

Returns:

Name Type Description
self object

Update _adjacency_matrix as the estimated (partial) directed acyclic graph (DAG) corresponding to the causal graph, _latent_confounder_detection as the undirected maximal cliques after stage-III learning, and record _stage3_time as the stage-3 computational time.

Source code in cadimulc\hybrid_algorithms\hybrid_algorithms.py
def _stage_3_learning(self, graph_pattern_manager) -> object:
    """
    **Stage-III**: Latent confounders' detection

    Stage-III will learn more causal orders if some variables are not affected
    by the latent confounders but are in the remaining subset.
    Meanwhile, the stage-III learning makes use of the causal skeleton information
    to reduce the testing space of remaining variables from all subsets to typical
    **maximal cliques**.

    Notice that the maximal cliques, including the undirected relations that cannot
    be determined, are possibly formed by latent confounders. This in turn provides
    insight to detect the latent confounders, and uncover the causal relations
    among observed and latent variables.

    Parameters:
        graph_pattern_manager:
            An auxiliary module embedded in the MLC-LiNGAM algorithm,
            featuring the algorithmic behavior of the maximal-cliques pattern recognition.

    Returns:
        self:
            Update ``_adjacency_matrix`` as the estimated (partial) directed acyclic
            graph (DAG) corresponding to the causal graph,
            ``_latent_confounder_detection`` as the undirected maximal cliques after
            stage-III learning, and record ``_stage3_time`` as the stage-3
            computational time.
    """

    # Arguments for testing:
    #   _skeleton (attribute)
    #   _adjacency_matrix (attribute)
    #   _parents_set (attribute)
    #   _dataset(attribute): ndarray

    start = time.perf_counter()

    # Recognize the maximal-clique pattern based on the causal skeleton.
    maximal_cliques_completely_undetermined = (
        GraphPatternManager.recognize_maximal_cliques_pattern(
            causal_skeleton=self._skeleton,
            adjacency_matrix=self._adjacency_matrix
        )
    )

    # Setup of regression, referring to MLC-LiNGAM default settings.
    regressor = LinearRegression()
    residuals_dataset = cp.copy(self._dataset)

    # Replace the variables in the clique with their corresponding residuals via
    # regressing out the effect of their confounded parents that are outside the clique.
    for maximal_clique in maximal_cliques_completely_undetermined:
        # Record: Each variable requires a single replacement if necessary.
        variables_replaced = {}
        for variable in maximal_clique:
            variables_replaced[variable] = set()

        # Get undetermined pairs within a clique.
        for i in maximal_clique:
            for j in maximal_clique[maximal_clique.index(i) + 1:]:
                parents_i = graph_pattern_manager.managing_parents_set[i]
                parents_j = graph_pattern_manager.managing_parents_set[j]

                # Conduct residuals replacement if the variables share the same parents.
                if (parents_i & parents_j) != set():
                    confounded_parents = parents_i & parents_j

                    for confounder in confounded_parents:
                        data_confounder = residuals_dataset[:, confounder]

                        if confounder not in variables_replaced[i]:
                            variables_replaced[i].add(confounder)

                            data_i = residuals_dataset[:, i]
                            residuals_i = get_residuals_scm(
                                explanatory_data=data_confounder,
                                explained_data=data_i,
                                regressor=regressor
                            )
                            residuals_dataset[:, i] = residuals_i.squeeze()

                        if confounder not in variables_replaced[j]:
                            variables_replaced[j].add(confounder)

                            data_j = residuals_dataset[:, j]
                            residuals_j = get_residuals_scm(
                                explanatory_data=data_confounder,
                                explained_data=data_j,
                                regressor=regressor
                            )
                            residuals_dataset[:, j] = residuals_j.squeeze()

    # Apply Algorithm-2 on the maximal cliques.
    for maximal_clique in maximal_cliques_completely_undetermined:
        # Get adjacent set with respect to the variables within maximal cliques.
        adjacent_set_clique = {}
        for variable in maximal_clique:
            adjacent_set_clique[variable] = set(maximal_clique) - {variable}

        # Apply Algorithm-2 (given by the MLC-LiNGAM algorithm).
        self._algorithm_2(
            corresponding_adjacent_set=adjacent_set_clique,
            corresponding_dataset=residuals_dataset,
            corresponding_variables=np.array(maximal_clique),
            graph_pattern_manager=graph_pattern_manager,
            _specify_adjacency=True,
            _adjacent_set=adjacent_set_clique
        )

    # Update latent confounder detection
    graph_pattern_manager.store_last_managing_adjacency_matrix()
    self._latent_confounder_detection = (
        graph_pattern_manager.get_undetermined_cliques(
            maximal_cliques=maximal_cliques_completely_undetermined
        )
    )

    # Record computational time.
    end = time.perf_counter()
    self._stage3_time = end - start

    return self

Class: Nonlinear-MLC [2]

cadimulc.hybrid_algorithms.hybrid_algorithms.NonlinearMLC

Bases: HybridFrameworkBase

The hybrid algorithm Nonlinear-MLC, an incorporation of constraint-based and functional-based causal discovery methodology, is developed for general causal inference over non-linear data in the presence of multiple unknown factors.

A primary feature of Nonlinear-MLC lies in exploiting non-linear causal identification with multiple latent confounders (proposed as the Latent-ANMs causal identification), which builds on the well-known ANMs* method.

The ANMs causal discovery approach

ANMs, the additive noise models, are known as one of the structurally identifiable SCMs.
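The ANM principle mirrors the linear case but with a non-linear regressor: for y = f(x) + noise, the forward residual is independent of x, while no additive model fits the reverse direction. A minimal sketch using a k-NN regressor and a crude squared-correlation surrogate for the independence test (illustration only; the library itself relies on GAM regression and kernel-based tests):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def residual_of(x, y):
    """Residual of non-parametrically regressing y on x."""
    reg = KNeighborsRegressor(n_neighbors=20).fit(x.reshape(-1, 1), y)
    return y - reg.predict(x.reshape(-1, 1))

def dependence_score(a, b):
    """Crude independence surrogate: correlation between squared values."""
    return abs(np.corrcoef(a ** 2, b ** 2)[0, 1])

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 3000)
y = x ** 3 + x + rng.uniform(-1, 1, 3000)  # additive-noise mechanism x -> y

forward = dependence_score(x, residual_of(x, y))   # causal direction: small
backward = dependence_score(y, residual_of(y, x))  # reverse: noticeably larger
```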

__init__(ind_test='kernel_ci', pc_alpha=0.05, _regressor=LinearGAM())

Parameters:

Name Type Description Default
ind_test str

Popular non-linear independence-tests methods are recommended: Kernel-based Conditional Independence tests (KCI); Hilbert-Schmidt Independence Criterion for General Additive Models (HSIC-GAMs).

'kernel_ci'
pc_alpha float

Significance level of independence tests (p_value), which is required by the constraint-based methodology incorporated in the initial stage of the hybrid causal discovery framework.

0.05

Primary Method: fit

Fitting data via the Nonlinear-MLC causal discovery algorithm.

The procedure comprises causal skeleton learning in the initial stage, followed by a causal identification procedure involving non-linear regression and independence tests. Following the well-known divide-and-conquer strategy, non-linear causal inference is conducted over the maximal cliques recognized from the estimated causal skeleton.

Parameters:

Name Type Description Default
dataset ndarray

The observational dataset shown as a matrix or table, with a format of "sample (n) * dimension (d)." (A Pandas DataFrame is also acceptable as input.)

required

Returns:

Name Type Description
self object

Update the adjacency_matrix represented as an estimated causal graph. The adjacency_matrix is a (d * d) numpy array with 0/1 elements characterizing the causal direction.

Source code in cadimulc\hybrid_algorithms\hybrid_algorithms.py
def fit(self, dataset: ndarray) -> object:
    """
    Fitting data via the *Nonlinear-MLC* causal discovery algorithm.

    The procedure comprises the **causal skeleton learning** in the initial stage,
    followed by a causal identification procedure involving **non-linear regression**
    and **independence tests**.
    Following the well-known **divide-and-conquer** strategy, non-linear causal inference
    is conducted over the maximal cliques recognized from the estimated causal skeleton.

    Parameters:
        dataset:
            The observational dataset shown as a matrix or table,
            with a format of "sample (n) * dimension (d)."
            (input as Pandas dataframe is also acceptable)

    Returns:
        self:
            Update the ``adjacency_matrix`` represented as an estimated causal graph.
            The ``adjacency_matrix`` is a (d * d) numpy array with 0/1 elements
            characterizing the causal direction.
    """

    start = time.perf_counter()

    # Reconstruct a causal skeleton using the PC-stable algorithm.
    self._causal_skeleton_learning(dataset)

    # Recognize the maximal-clique pattern based on the causal skeleton.
    maximal_cliques = GraphPatternManager.recognize_maximal_cliques_pattern(
        causal_skeleton=self._skeleton
    )

    # Initialize a graph pattern manager for subsequent learning.
    graph_pattern_manager = GraphPatternManager(
        init_graph=self._skeleton
    )

    # Perform the nonlinear-mlc causal discovery.
    continue_search = True
    while continue_search:

        # Obtain the cliques that remain at least one edge undetermined.
        undetermined_maximal_cliques = (
            graph_pattern_manager.get_undetermined_cliques(maximal_cliques)
        )

        # End if all edges over the cliques have been determined.
        if len(undetermined_maximal_cliques) == 0:
            break

        # Temporally store the adjacency matrix ahead of a search round.
        graph_pattern_manager.store_last_managing_adjacency_matrix()

        # In light of the L-ANMs theory (proposed in paper), start the search round
        # by conducting non-linear causal inference based on maximal cliques.
        determined_pairs = self._clique_based_causal_inference(
            undetermined_maximal_cliques=undetermined_maximal_cliques
        )

        # Orient determined causal directions after a search round over maximal cliques.
        graph_pattern_manager.identify_directed_causal_pair(
            determined_pairs=determined_pairs
        )

        # Update the causal adjacency matrix and parent-relations set
        # after a search round over maximal cliques.
        self._adjacency_matrix = (
            graph_pattern_manager.managing_adjacency_matrix
        )
        self._parents_set = (
            graph_pattern_manager.managing_parents_set
        )

        # Check if new causal relations have been determined
        # after the last round searching.
        newly_determined = (
            graph_pattern_manager.check_newly_determined(
                undetermined_maximal_cliques
            )
        )

        # End if there is none of new causal relation advancing the further search.
        if not newly_determined:
            continue_search = False

    end = time.perf_counter()

    self._running_time += (end - start)

    return self

Private Method: _clique_based_causal_inference

For each of the undetermined maximal cliques (i.e. at least one edge within the clique remains undirected) with respect to the whole maximal-clique pattern, the algorithm conducts non-linear regression and independence tests with additional explanatory variables selected from the undetermined maximal clique.

This strategy is argued to enhance the efficiency and robustness of non-linear causal discovery with multiple latent confounders, serving as the essence of the Nonlinear-MLC algorithm (see "Latent-ANMs Lemma" in Section 3 of the relevant paper).

Parameters:

Name Type Description Default
undetermined_maximal_cliques list[list]

A list of undetermined maximal cliques in which each element (a maximal clique) involves at least one edge remaining undirected (e.g. [[X, Y, Z]] could stand for a maximal clique <X, Y, Z> with both determined and undetermined relations 'X <-> Y', 'Y <-> Z', and 'X -> Z').

required

Returns:

Type Description
list

The list of determined pairs over the input undetermined cliques after searching (e.g. [[X, Y], [Y, Z]] stands for two determined pairs "X -> Y" and "Y -> Z").

Source code in cadimulc\hybrid_algorithms\hybrid_algorithms.py
def _clique_based_causal_inference(
        self,
        undetermined_maximal_cliques: list[list]
) -> list:
    """
    For each of the undetermined maximal cliques (i.e. at least one edge
    within a maximal clique remains undirected) with respect to the whole
    maximal-clique patterns,
    the algorithm conducts non-linear regression and independence tests with
    the **additional explanatory variables** selected from the
    undetermined maximal clique.

    This strategy is argued to enhance the **efficiency** and **robustness** as to
    the **non-linear causal discovery with multiple latent confounders**,
    serving as the essence of the *Nonlinear-MLC* algorithm (See "Latent-ANMs Lemma" for
    details in the [relevant paper](https://xuanzhichen.github.io/work/papers/nonlinear_mlc.pdf),
    Section 3).

    Parameters:
        undetermined_maximal_cliques:
            A list of undetermined maximal cliques in which the element
            (maximal clique) involves at least one edge remaining undirected
            (e.g. `[[X, Y, Z]]` could stand for a maximal clique <X, Y, Z>,
            with both determined and undetermined relations
            'X <-> Y', 'Y <-> Z', and 'X -> Z').

    Returns:
        The list of determined pairs over the input undetermined cliques
        after searching
        (e.g. `[[X, Y], [Y, Z]]` stands for two determined pairs
        "X -> Y" and "Y -> Z").
    """

    # Arguments for Testing:
    #     _adjacency_matrix(attribute): ndarray
    #     _dataset(attribute): ndarray
    #     _parents_set(attribute): dict

    determined_pairs = []

    # Conduct non-linear causal inference based on each maximal clique unit.
    for undetermined_maximal_clique in undetermined_maximal_cliques:

        # Initialize the lists with elements of undetermined causal relations.
        # e.g. the element (cause, effect) specifies "cause -> effect"
        undetermined_pairs = []

        # Get undetermined pairs within a clique.
        for i in undetermined_maximal_clique:
            for j in undetermined_maximal_clique[
                undetermined_maximal_clique.index(i) + 1:
            ]:
                if (self._adjacency_matrix[i][j] == 1) and (
                    self._adjacency_matrix[j][i] == 1
                ):
                    undetermined_pairs.append([i, j])

        # Conduct pairwise non-linear regression and independence tests.
        for pair in undetermined_pairs:
            determined = False

            p_value_max = self.pc_alpha
            causation = copy_and_rename(pair)

            # Unravel the pairwise inferred directions respectively.
            pair_temp = cp.copy(pair)
            pair_temp.reverse()
            pair_reversed = copy_and_rename(pair_temp)

            for cause, effect in zip(pair, pair_reversed):

                # ================= Empirical Regressor Construction ==================

                # initialization of explanatory-and-explained variables
                explanatory_vars = set()
                explained_var = set()

                # basic explanatory-and-explained variables: cause-effect
                explanatory_vars.add(cause)
                explained_var.add(effect)  # namely the effect variable

                # Add explanatory variables to strengthen empirical regression:

                # determined parent-relations amidst the algorithm memory
                explanatory_vars = explanatory_vars | set(self._parents_set[effect])

                # undetermined connections within the maximal clique
                explanatory_vars = explanatory_vars | (
                        set(undetermined_maximal_clique) - {effect}
                )

                # Regress the effect variable on empirical explanatory variables
                # (in an attempt to cancel unobserved confounding).

                explanatory_data = cp.copy(
                    self._dataset[:, list(explanatory_vars)]
                )

                # namely the data with respect to the effect variable
                explained_data = cp.copy(
                    self._dataset[:, list(explained_var)]
                )

                # regressing residuals via fitting SCMs

                # Development notes:
                # The following IF branch is added due to a bug, occurring when
                # reinitializing the regressor (GAM) instance for fitting pairwise data.
                if explanatory_data.shape[1] == 1:
                    explanatory_data = check_1dim_array(explanatory_data)
                    explained_data = check_1dim_array(explained_data)

                    regressor = LinearGAM()
                    regressor.fit(explanatory_data, explained_data)
                    est_explained_data = regressor.predict(explanatory_data)
                    est_explained_data = check_1dim_array(est_explained_data)

                    residuals = explained_data - est_explained_data

                else:
                    residuals = get_residuals_scm(
                        explanatory_data=explanatory_data,
                        explained_data=explained_data,
                        regressor=self.regressor
                    )

                # Remove effects of parent-relations from the cause variable
                # (in an attempt to cancel unobserved confounding).

                cause_parents = list(self._parents_set[cause])

                if len(cause_parents) > 0:
                    # Development notes:
                    # The following IF branch is added due to a bug, occurring when
                    # reinitializing the regressor (GAM) instance for fitting pairwise data.
                    if len(cause_parents) == 1:
                        cause_explanatory_data = check_1dim_array(
                            cp.copy(self._dataset[:, cause_parents])
                        )
                        cause_explained_data = check_1dim_array(
                            cp.copy(self._dataset[:, cause])
                        )

                        regressor = LinearGAM()
                        regressor.fit(cause_explanatory_data, cause_explained_data)
                        est_cause_explained_data = regressor.predict(cause_explanatory_data)
                        est_cause_explained_data = check_1dim_array(est_cause_explained_data)

                        cause_residuals = cause_explained_data - est_cause_explained_data
                        cause_data = copy_and_rename(cause_residuals)
                    else:
                        cause_data = get_residuals_scm(
                            explanatory_data=self._dataset[:, cause_parents],
                            explained_data=self._dataset[:, cause],
                            regressor=self.regressor
                        )
                else:
                    cause_data = cp.copy(self._dataset[:, cause])

                # ========================== Independence Test ========================

                # Conduct the independence test
                # between the cause variable and regressing residuals.
                p_value = conduct_ind_test(
                    explanatory_data=cause_data,
                    residuals=residuals,
                    ind_test_method=self.ind_test
                )

                # One single inferred causal direction is determined given the
                # maximal p-value exceeding the threshold of the significant level.
                if p_value > p_value_max:
                    determined = True

                    p_value_max = p_value
                    causation = (cause, effect)

            if determined:
                determined_pairs.append(causation)

    return determined_pairs

Auxiliary Class: GraphPatternManager

cadimulc.hybrid_algorithms.hybrid_algorithms.GraphPatternManager

Bases: object

An auxiliary module embedded in the MLC-LiNGAM and Nonlinear-MLC algorithms, featuring the algorithmic behavior of the maximal-cliques pattern recognition.

The module as well manages adjacency matrices amidst the procedure between causal skeleton learning and causal direction orientation.


Primary Method: recognize_maximal_cliques_pattern

Recognize the maximal-cliques pattern based on a causal skeleton (undirected graph) or a partial causal adjacency matrix (partially directed graph).

Reference by recognize_maximal_cliques_pattern

Bron, Coen, and Joep Kerbosch. "Algorithm 457: finding all cliques of an undirected graph." Communications of the ACM 16, no. 9 (1973): 575-577. (Implementation by NetworkX)
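The recognition step can be reproduced with NetworkX directly. A minimal sketch on a toy skeleton, discarding trivial single-node "cliques" as the method's description specifies:

```python
import numpy as np
import networkx as nx

# Toy causal skeleton (symmetric 0/1 matrix): a triangle 0-1-2 plus an edge 2-3.
causal_skeleton = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

undirected_graph = nx.from_numpy_array(causal_skeleton)
maximal_cliques = [
    sorted(clique)
    for clique in nx.find_cliques(undirected_graph)  # Bron-Kerbosch algorithm
    if len(clique) > 1                               # drop trivial one-node cliques
]
# maximal_cliques contains [0, 1, 2] and [2, 3]
```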

Parameters:

Name Type Description Default
causal_skeleton ndarray

The undirected graph corresponding to the causal graph.

required
adjacency_matrix ndarray | None

The (partially) directed acyclic graph (DAG) corresponding to the causal graph.

None

Returns:

Name Type Description
maximal_cliques list[list]

A list of maximal cliques, in which each maximal clique element is itself a list.

Source code in cadimulc\hybrid_algorithms\hybrid_algorithms.py
@staticmethod
def recognize_maximal_cliques_pattern(
        causal_skeleton: ndarray,
        adjacency_matrix: ndarray | None = None
) -> list[list]:
    """
    Recognize the maximal-cliques pattern based on a causal skeleton (undirected graph)
    or a partial causal adjacency matrix (partially directed graph).

    !!! note "Reference by ``recognize_maximal_cliques_pattern``"
        Bron, Coen, and Joep Kerbosch.
        "Algorithm 457: finding all cliques of an undirected graph."
        Communications of the ACM 16, no. 9 (1973): 575-577.
        (Implementation by [NetworkX](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.clique.find_cliques.html))

    Parameters:
        causal_skeleton:
            The undirected graph corresponding to the causal graph.
        adjacency_matrix:
            The (partially) directed acyclic graph (DAG) corresponding to the causal graph.

    Returns:
        maximal_cliques:
            A list of maximal cliques, where each maximal-clique element
            is itself a list of variable indices.
    """
    # Remove edges whose directions have already been determined from the skeleton.
    if adjacency_matrix is not None:
        dim = adjacency_matrix.shape[0]
        for i in range(dim):
            for j in range(dim):
                if (adjacency_matrix[i][j] == 1) and (adjacency_matrix[j][i] == 0):
                    causal_skeleton[i][j] = 0
                    causal_skeleton[j][i] = 0

    # Search maximal cliques by the Bron-Kerbosch algorithm.
    undirected_graph_nx = nx.from_numpy_array(causal_skeleton)
    clique_iter = nx.find_cliques(undirected_graph_nx)
    maximal_cliques_temp = [clique for clique in clique_iter]

    # Remove the trivial graph from maximal_cliques.
    maximal_cliques = []
    for clique in maximal_cliques_temp:
        if len(clique) > 1:
            maximal_cliques.append(clique)

    return maximal_cliques
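As a minimal, self-contained sketch (assuming only NumPy, NetworkX, and a toy 4-variable skeleton), the maximal-clique recognition step above reduces to:

```python
import numpy as np
import networkx as nx

# Toy causal skeleton over 4 variables: a triangle {0, 1, 2} plus an edge 2-3.
causal_skeleton = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

# Bron-Kerbosch maximal-clique search via NetworkX, as in the method above.
undirected_graph_nx = nx.from_numpy_array(causal_skeleton)
maximal_cliques = [
    sorted(clique)
    for clique in nx.find_cliques(undirected_graph_nx)
    if len(clique) > 1  # drop trivial single-node cliques
]

print(sorted(maximal_cliques))  # -> [[0, 1, 2], [2, 3]]
```

Passing a partially directed `adjacency_matrix` as well would first strip the already-oriented edges from the skeleton, so only undetermined substructures enter the clique search.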

Primary Method: find_adjacent_set

Given a causal skeleton (or a subset of the causal skeleton), find the adjacent variables for each variable.

Parameters:

Name Type Description Default
causal_skeleton ndarray

The undirected graph corresponding to the causal graph.

required

Returns:

Name Type Description
adjacent_set dict

A dictionary that describes the adjacency relations: {variable: (adjacent variables)}.

Source code in cadimulc/hybrid_algorithms/hybrid_algorithms.py
@staticmethod
def find_adjacent_set(causal_skeleton: ndarray) -> dict:
    """
    Given a causal skeleton (or a subset of the causal skeleton),
    find the adjacent variables for each variable.

    Parameters:
        causal_skeleton:
            The undirected graph corresponding to the causal graph.

    Returns:
        adjacent_set:
            A dictionary that describes the adjacency relations:
            `{variable: (adjacent variables)}`.
    """

    dim = causal_skeleton.shape[1]
    adjacent_set = {}

    for i in range(dim):
        adjacent_set[i] = set()

    for i in range(dim):
        for j in range(dim):
            if i != j:
                if causal_skeleton[i][j] == 1:
                    adjacent_set[i].add(j)

    return adjacent_set
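The adjacency-set construction can be reproduced standalone. The sketch below (same toy skeleton assumed, mirroring `find_adjacent_set`) builds the `{variable: (adjacent variables)}` dictionary:

```python
import numpy as np

# Toy causal skeleton: a triangle {0, 1, 2} plus an edge 2-3.
causal_skeleton = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

# Collect, for each variable, the set of variables it is adjacent to.
dim = causal_skeleton.shape[1]
adjacent_set = {i: set() for i in range(dim)}
for i in range(dim):
    for j in range(dim):
        if i != j and causal_skeleton[i][j] == 1:
            adjacent_set[i].add(j)

print(adjacent_set)  # -> {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
```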

Base Class: HybridFrameworkBase

cadimulc.hybrid_algorithms.hybrid_framework.HybridFrameworkBase

A hybrid causal discovery framework with an established implementation of causal skeleton discovery by the Peter-Clark (PC) algorithm. The framework is incorporated into the initial stage of both the Nonlinear-MLC and MLC-LiNGAM causal discovery algorithms.

__init__(pc_alpha=0.5, _dataset=None, _dim=None, _skeleton=None, _adjacency_matrix=None, _parents_set={})

Parameters:

Name Type Description Default
pc_alpha float

Significance level of independence tests (p_value), which is required by the constraint-based methodology incorporated in the initial stage of the hybrid causal discovery framework.

0.5
_dataset ndarray

The observational dataset, given as a matrix or table with a format of "sample (n) * dimension (d)"; input as a Pandas DataFrame is also acceptable.

None
_dim int

The variable dimension corresponding to the causal graph.

None
_skeleton ndarray

The estimated undirected graph corresponding to the causal graph.

None
_adjacency_matrix ndarray

The estimated directed acyclic graph (DAG) corresponding to the causal graph.

None
_parents_set dict

The child-parents relations associated with the adjacency matrix.

{}

Primary Method: _causal_skeleton_learning

Causal skeleton construction (based on the PC-stable algorithm).

Parameters:

Name Type Description Default
dataset ndarray

The observational dataset, given as a matrix or table with a format of "sample (n) * dimension (d)"; input as a Pandas DataFrame is also acceptable.

required

Returns:

Name Type Description
self object

Update _skeleton as the estimated undirected graph corresponding to the causal graph, initialize _adjacency_matrix as a copy of _skeleton, and record _stage1_time as the stage-1 computational time (causal skeleton learning is usually the first stage in hybrid-based causal discovery algorithms).

Source code in cadimulc/hybrid_algorithms/hybrid_framework.py
def _causal_skeleton_learning(self, dataset: ndarray) -> object:
    """
    Causal skeleton construction (based on the PC-stable algorithm).

    Parameters:
        dataset:
        The observational dataset, given as a matrix or table
        with a format of "sample (n) * dimension (d)";
        input as a Pandas DataFrame is also acceptable.

    Returns:
        self:
            Update `_skeleton` as the estimated undirected graph corresponding to
            the causal graph, initialize `_adjacency_matrix` as a copy of `_skeleton`,
            and record `_stage1_time` as the stage-1 computational time
            (causal skeleton learning is usually the first stage in
            hybrid-based causal discovery algorithms).
    """

    # Arguments for testing:
    #   pc_alpha(parameter): float
    #   _dataset(attribute): dataframe

    self._dim = dataset.shape[1]
    self._dataset = dataset
    for i in range(self._dim):
        self._parents_set[i] = set()

    data = cp.copy(self._dataset)

    # Development notes:
    # Unify the linear independence test even for nonlinear-mlc.
    skeleton, running_time = get_skeleton_from_pc(
        data=data,
        alpha=self.pc_alpha,
        ind_test_type='linear'
    )

    self._skeleton = cp.copy(skeleton)
    self._adjacency_matrix = cp.copy(skeleton)
    self._stage1_time = running_time

    return self
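`get_skeleton_from_pc` is the framework's internal helper; the sketch below is only a simplified illustration of the skeleton-learning idea, not the PC-stable implementation. It prunes a fully connected skeleton using order-1 partial-correlation tests on simulated chain data (`x0 -> x1 -> x2`), where `x0` and `x2` are dependent only through `x1`; the threshold `0.1` is an assumed toy cutoff, not the framework's `pc_alpha` significance level:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(seed=0)

# Simulated linear chain: x0 -> x1 -> x2.
n = 2000
x0 = rng.normal(size=n)
x1 = 2.0 * x0 + rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(size=n)
data = np.column_stack([x0, x1, x2])

dim = data.shape[1]
# Start from the fully connected skeleton (no self-loops).
skeleton = np.ones((dim, dim), dtype=int) - np.eye(dim, dtype=int)

def partial_corr(data, i, j, k):
    # Correlation between the residuals of i and j after regressing out k
    # (data are zero-mean, so no intercept is needed).
    def residual(y, x):
        beta = np.dot(x, y) / np.dot(x, x)
        return y - beta * x
    r_i = residual(data[:, i], data[:, k])
    r_j = residual(data[:, j], data[:, k])
    return np.corrcoef(r_i, r_j)[0, 1]

# Order-1 pruning: remove edge i-j if some single conditioning
# variable k renders i and j (nearly) uncorrelated.
threshold = 0.1
for i, j in combinations(range(dim), 2):
    for k in range(dim):
        if k in (i, j):
            continue
        if abs(partial_corr(data, i, j, k)) < threshold:
            skeleton[i][j] = skeleton[j][i] = 0
            break
```

For the chain above, the edge `0-2` is removed (conditioning on `x1` screens it off) while `0-1` and `1-2` survive, which is exactly the skeleton a constraint-based stage hands to the later orientation stages.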

Reference

[1] Chen, Wei, Ruichu Cai, Kun Zhang, and Zhifeng Hao. "Causal discovery in linear non-Gaussian acyclic model with multiple latent confounders." IEEE Transactions on Neural Networks and Learning Systems. 2021.

[2] Chen, Xuanzhi, Wei Chen, Ruichu Cai. "Non-linear Causal Discovery for Additive Noise Model with Multiple Latent Confounders". Xuanzhi's Personal Website. 2023.

[3] Shimizu, Shohei, Patrik O. Hoyer, Aapo Hyvärinen, Antti Kerminen, and Michael Jordan. "A linear non-Gaussian acyclic model for causal discovery." Journal of Machine Learning Research. 2006.

[4] Hoyer, Patrik, Dominik Janzing, Joris M. Mooij, Jonas Peters, and Bernhard Schölkopf. "Nonlinear causal discovery with additive noise models." Advances in neural information processing systems. 2008.

[5] Spirtes, Peter, Clark N. Glymour, and Richard Scheines. Causation, prediction, and search. MIT press, 2000.